Biopython/C2/Writing-Sequence-Files/Khasi

Revision as of 17:42, 2 February 2018 by Hezekiah2016 (Talk | contribs)

Time Narration
00:01 Hello everyone. Welcome to this tutorial on Writing Sequence Files.

(Khublei ia phi baroh. Ngi pdiang sngewbha ia phi sha kane ka jingbatai shaphang ka Writing Sequence Files.

00:07 In this tutorial, we will learn: * How to create Sequence Record Objects

(Ha kane ka tutorial, ngin sa nang ban : * Kumno ban shna Sequence Record Objects


00:13 Write sequence files

(Ban write ia ki sequences files

00:15 Convert between file formats

(Pynkylla hapdeng file formats

00:19 And, sort records in a file by length.

(Bad, jied records ha ka file da ka jingjrong.

00:23 To follow this tutorial, you should be familiar with

(Ban kham sngewthuh ia kane ka tutorial, phi dei ban long kiba shemphang bad

00:27 undergraduate Biochemistry or Bioinformatics

(undergraduate Biochemistry lane Bioinformatics.

00:31 and basic Python programming.

(Bad basic Python programming.

00:34 Refer to the Python tutorials at the given link.

(Pyndonkam da ka Python tutorials na ka link harum .

00:38 To record this tutorial, I am using: * Ubuntu OS version 14.10

(Ban record ia kane ka tutorial, nga pyndonkam da ka: * Ubuntu OS version 14.10

00:45 Python version 2.7.8

(Python version 2.7.8

00:48 IPython interpreter version 2.3.0 and * Biopython version 1.64.

(IPython interpreter version 2.3.0 bad * Biopython version 1.64.

00:55 We have earlier learnt about parse and read functions to read contents of a file.

(Ha kaba nyngkong ngi lah dep pule kumno ban parse bad read functions ban read ia ki contents jong ka file.

01:03 In this tutorial, we will learn how to use the write function to write sequences to a file.

(Ha kane ka tutorial, ngin sa nang kumno ban pyndonkam ia ka write function sha ka write sequences jong ka file.

01:09 And, use the convert function for inter-conversion between various file formats.

(Bad ban pyndonkam Convert function na ka bynta ka inter-conversion hapdeng ba bun ki file formats.

01:16 Let me now demonstrate how to use write function.

(Mynta ngan pyni ia phi kumno ban pyndonkam ia write function.

01:20 Here is a text file with a protein sequence.

(Hangne don ka text file ba don u protein sequence.

01:24 The sequence shown here is insulin protein.

(Ka sequence ba pyni hangne dei u insulin protein.

01:28 The file also has information such as the GI accession number and a description.

(Kane ka file ka don ruh ki jingpyntip kum GI accession number bad description.

01:36 We will now create a file for this sequence in FASTA format.

(Mynta ngin ia shna ka sequence jong kane ka file ha ka FASTA format.

01:41 The first step is to create a sequence record object.

(Nyngkong eh ngi shna sequence record object.

01:45 More information about Sequence Record Objects:

(Ki jingtip ba kham bniah shaphang Sequence Record Objects:

01:49 It is the basic data type for the sequence input/output interface.

(Ka dei ka basic data type na ka bynta ka sequence input/output interface.

01:55 In a sequence record object, a sequence is associated with higher-level features such as identifiers and descriptions.

(Ha ka sequence record object, ia ka sequence la pyniasoh bad ka features ba kham halor kum ki identifiers bad descriptions.

02:04 Open the terminal by pressing Ctrl, Alt and t keys simultaneously.

(Plie ia ka terminal da kaba nion sah ia ki Ctrl, Alt and t keys.

02:10 At the prompt, type: ipython, press Enter.

(Ha ka prompt, type: ipython, nion Enter.

02:15 At the prompt, type the following lines:

(Ha ka prompt, type kumne harum:

02:18 from Bio dot Seq module import Seq class.

(from Bio dot Seq module import Seq class.

02:24 from Bio dot SeqRecord module import Sequence Record class

(from Bio dot SeqRecord module import Sequence Record class

02:31 Next, from Bio dot Alphabet module import generic protein class.

(Hadien , from Bio dot Alphabet module import generic protein class.

02:38 Next, I will save the sequence record object in a variable record1.

(Nangta , ngan sa save ia ka sequence record object ha ka variable record1.

02:45 Copy the sequence, id and description from the text file and paste it in the respective lines on the terminal.

(Copy ia ka sequence, id bad description na ka text file bad paste ia ka ha ki lines ba lah buh ha ka terminal.

02:56 Press Enter.

(Nion Enter.

02:58 To view the output, type: record1.

(Ban peit ia ka output, type: record1.

03:02 Press Enter.

(Nion Enter.

03:04 The output shows the insulin protein sequence as a sequence record object.

(Ka output ka pyni ia ka insulin protein sequence kum sequence record object.

03:10 It shows the sequence along with id and description.

(Ka pyni ia ka sequence ryngkat bad id bad description.
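
The steps narrated above might look like the following at the IPython prompt. This is a minimal sketch, not the tutorial's exact input: the sequence, id and description are short placeholders standing in for the values copied from the text file. The tutorial (Biopython 1.64) also imports generic_protein from Bio.Alphabet; that module was removed in Biopython 1.78, so the sketch leaves it out in order to run on current releases.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder values (assumptions): substitute the sequence, id and
# description copied from the text file used in the tutorial.
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),
    id="demo|0000|placeholder",
    description="placeholder protein record",
)

# Typing `record1` at the prompt shows the whole object; the individual
# fields can also be inspected one by one.
print(record1.id)
print(record1.description)
print(len(record1.seq))
```

With Biopython 1.64, as used in the recording, generic_protein would be passed as the second argument to Seq.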

03:13 We will use the write function to convert the above sequence record object to a FASTA file.

(Ngin pyndonkam write function ban pynkylla ia ka sequence record object haneng sha ka FASTA file.

03:21 Import SeqIO module from Bio package.

(Import SeqIO module na Bio package.

03:26 Next, type the command with the write function to convert the sequence record object to a FASTA file.

(Hadien , type ia ka command line bad ka write function ban pynkylla ia ka sequence object sha ka FASTA file.

03:40 The write function takes 3 arguments.

(Ka write function ka shim 3 arguments .

03:44 The first one is the variable storing the sequence record object.

(Ka ba nyngkong ka dei ka variable ba store ia ka sequence record object.

03:49 The second is the file name to write the FASTA file.

(Ka ba ar ka dei ka file name ban write ia ka FASTA file.

03:54 The third is the file format to write. Press Enter.

(Ka ba lai ka dei ka file format ban write. Nion Enter.

03:58 The output shows 1; that is, we have converted one sequence record object to a FASTA file.

(Ka Output ka pyni kawei, kata ka dei, ngi pynkylla ia kawei ka sequence record object sha ka FASTA file.

04:07 The file in FASTA format is saved in the home folder as "example.fasta".

(Ka file ha ka FASTA format la save ha ka home folder kum "example.fasta".
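
A sketch of the write step described above, under stated assumptions: the record is a placeholder rather than the insulin record, and the file goes to a temporary directory instead of the home folder so that no existing "example.fasta" is overwritten.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder record (assumption) standing in for record1 from the tutorial.
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),
    id="demo_id",
    description="placeholder protein",
)

# The tutorial writes "example.fasta" to the home folder; a temporary
# directory is used here so nothing on disk is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), "example.fasta")

# Three arguments: the record(s), the output file name, the format name.
count = SeqIO.write(record1, out_path, "fasta")
print(count)  # number of records written

with open(out_path) as handle:
    fasta_text = handle.read()
print(fasta_text)
```

The return value is the number of records written, which is the figure the narration refers to.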

04:13 Let me warn you, the output will over-write any pre-existing file of the same name.

(Ngan shu maham ia phi ba ka output kan over-write ia kano kano ka file ba lah don lypa kaba don kajuh ka kyrteng.

04:18 To view the file, navigate to the file in the home folder.

(Ban peit ia ka file, leit sha ka file ha ka home folder.

04:24 Open this file in a text editor.

(Plie ia kane ka file ha ka text editor.

04:27 The protein sequence is now in FASTA format.

(Ka protein sequence ka lah don mynta ha ka FASTA format.

04:31 Close the text editor.

(Khang ia u text editor.

04:33 Many bioinformatics tools take different input file formats.

(Bun ki bioinformatics tools ki shim bun jait ki input file formats.

04:38 So, sometimes there is a need to inter-convert between sequence file formats.

(Te, teng teng ngi hap ban inter-convert hapdeng ki sequence file formats.

04:44 We can do file conversions using convert function in SeqIO module.

(Ngi lah ban pynkylla ia ki files da kaba pyndonkam convert function ha SeqIO module.

04:50 For demonstration, I will convert a GenBank file to a FASTA file.

(Ban pyni nuksa, ngan pynkylla ia u GenBank file sha ka FASTA file.

04:55 I have a GenBank file in my home folder.

(Nga don u GenBank file ha ka home folder jong nga.

04:59 Let me open this in a text editor.

(To ngan plie ia une ha u text editor.

05:02 The file contains HIV genome in GenBank format.

(U file u don HIV genome ha GenBank format.

05:07 This GenBank file has descriptions of all the genes in the genome, in the first part of the file.

(Une u GenBank file u don ki jingbatai jong baroh ki genes ha ka genome, ha ka bynta ba nyngkong jong ka file.

05:14 It is followed by a complete genome sequence.

(Nangta sa bud da u genome sequence ba pura.

05:18 Close the text editor. Type the following lines on the terminal.

(Khang ia u text editor. Type ia ki line ha terminal kumne harum.

05:23 Here the convert function converts the complete genome sequence present in the GenBank file to a FASTA file. Press Enter.

(Hangne u convert function u pynkylla ia baroh ka genome sequence kiba don ha GenBank file sha FASTA file. Nion Enter.

05:33 The new file in the FASTA format is now saved as HIV.fasta in the home folder.

(Ka file ba thymmai ha ka FASTA format mynta la shah save kum HIV.fasta ha ka home folder.
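
The conversion described above can be sketched as follows. HIV.gb is not reproduced here, so a tiny made-up DNA record is first written out in GenBank format as a stand-in (an assumption), and in-memory handles replace the file names. The molecule_type annotation is what Biopython 1.78 and later need in order to write GenBank output.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Stand-in for HIV.gb (assumption): a tiny placeholder DNA record.
demo = SeqRecord(
    Seq("ATGGGTGCGAGAGCGTCAGTATTAAGCGGG"),
    id="DEMO0001",
    name="DEMO0001",
    description="placeholder genome",
    annotations={"molecule_type": "DNA"},
)
genbank_handle = StringIO()
SeqIO.write(demo, genbank_handle, "genbank")
genbank_handle.seek(0)

# convert takes: input file (or handle), input format,
# output file (or handle), output format.
fasta_handle = StringIO()
count = SeqIO.convert(genbank_handle, "genbank", fasta_handle, "fasta")
fasta_text = fasta_handle.getvalue()
print(count)
print(fasta_text)
```

With the tutorial's files, the two handles would be replaced by the names "HIV.gb" and "HIV.fasta".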

05:39 Navigate to the file and open in the text editor.

(Leit sha ka file bad plie da u text editor.

05:46 Close the text editor.

(Khang ia u text editor.

05:49 Even though we can convert the file formats easily using convert function, it has limitations.

(Wat la ka long kaba suk ban pynkylla ia ka file formats da kaba pyndonkam convert function, hynrei ka don da u pud.

05:56 Writing some formats requires information which other file formats don’t contain.

(Ban write ia katto katne ki formats, donkam ia ki jingtip ba kiwei ki formats kim don.

06:02 For example: We can convert a GenBank file to a FASTA file, but we can't do the reverse.

(Nuksa: Ngi lah ban pynkylla ia u GenBank file sha u FASTA file. Hynrei ngim lah ban leh da khongpong.

06:09 Similarly, we can turn a FASTQ file into a FASTA file but can’t do the reverse.

(Kumjuh ruh, ngi lah ban pynkylla ia FASTQ file sha ka FASTA file hynrei ngim lah ban leh da khongpong pat.
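
The FASTQ limitation can be illustrated with a small sketch. The single read below is made up for the example. Converting FASTQ to FASTA simply drops the quality scores, while the reverse fails because a FASTA record has no quality scores to write; in the Biopython releases this sketch was written against, that failure surfaces as a ValueError.

```python
from io import StringIO

from Bio import SeqIO

# One made-up read in FASTQ format (assumption, for illustration only).
fastq_text = "@read1\nACGTACGT\n+\nIIIIIIII\n"

# FASTQ -> FASTA: the quality line is discarded.
fasta_handle = StringIO()
n_converted = SeqIO.convert(StringIO(fastq_text), "fastq", fasta_handle, "fasta")
fasta_text = fasta_handle.getvalue()
print(fasta_text)

# FASTA -> FASTQ: there are no quality scores to write, so this fails.
try:
    SeqIO.convert(StringIO(fasta_text), "fasta", StringIO(), "fastq")
    reverse_error = None
except ValueError as err:
    reverse_error = str(err)
print(reverse_error)
```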

06:15 For more information regarding convert function, type the help command.

(Ban tip khap bniah shaphang ka convert function, type ia u help command.

06:21 Press Enter.

(Nion Enter.)

06:24 Press 'q' on the keyboard to get back to the prompt.

(Nion 'q' ha ka keyboard ban ioh biang ia ka prompt.
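
The help command itself is not shown in the narration; at the prompt it is presumably help(SeqIO.convert). The sketch below reads the same documentation without opening the pager, which is convenient in a script.

```python
import inspect

from Bio import SeqIO

# Interactive equivalent at the prompt: help(SeqIO.convert)
doc = inspect.getdoc(SeqIO.convert)

# Print only the summary line of the documentation.
print(doc.splitlines()[0])
```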

06:28 We can also extract individual genes from the HIV genome in GenBank format.

(Ngi lah ruh ban sei ia ki genes ba shimet na ka HIV genome ha GenBank format.

06:35 These individual genes can be saved in FASTA or any other formats.

(Kine ki genes ba shimet lah ban save ruh ha FASTA lane kino kino ki formats.

06:41 For this, type the following code at the prompt.

(Ban leh kumta , type ia u code kumne ha ka prompt.

06:47 This code will write all individual CDS gene sequences, their ids and gene names to a file.

(Une u code un write lut ia baroh ki CDS gene sequences, ki ids bad kyrteng jong ki gene ha ka file.

06:56 The file is saved as “HIV_geneseq.fasta” in your home folder. Press Enter.

(Ia kane ka file la save kum “HIV_geneseq.fasta” ha ka home folder jong phi. Nion Enter.
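
The tutorial's code is not reproduced in this script, so the following is only one possible sketch of the idea. It assumes a small made-up record with two CDS features in place of the parsed HIV.gb record, and it writes to an in-memory handle instead of "HIV_geneseq.fasta". The gene names and coordinates are hypothetical.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Stand-in for SeqIO.read("HIV.gb", "genbank") (assumption): a short
# record with two hypothetical CDS features and one non-CDS feature.
genome = SeqRecord(
    Seq("AAATGGCCTAATTTATGCCCGGGTAGAAA"),
    id="DEMO0001",
    description="placeholder genome",
    features=[
        SeqFeature(FeatureLocation(0, 29, strand=+1), type="source"),
        SeqFeature(FeatureLocation(2, 11, strand=+1), type="CDS",
                   qualifiers={"gene": ["geneA"]}),
        SeqFeature(FeatureLocation(14, 26, strand=+1), type="CDS",
                   qualifiers={"gene": ["geneB"]}),
    ],
)

gene_records = []
for feature in genome.features:
    if feature.type != "CDS":
        continue  # keep coding sequences only
    gene_name = feature.qualifiers.get("gene", ["unknown"])[0]
    gene_seq = feature.extract(genome.seq)  # slice out the CDS
    gene_records.append(
        SeqRecord(gene_seq, id=genome.id, description=gene_name)
    )

out_handle = StringIO()
written = SeqIO.write(gene_records, out_handle, "fasta")
fasta_text = out_handle.getvalue()
print(fasta_text)
```

Each FASTA header then carries the record id followed by the gene name, and the body holds the extracted CDS sequence.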

07:07 Using Biopython tools, we can sort the records in a file by length.

(Da kaba pyndonkam u Biopython tools, ngi lah ban jied ia ki records jong ka file da ka jingjrong jong ka.

07:12 Here, I have opened a FASTA file “hemoglobin.fasta” which has six records.

(Hangne, nga lah plie ia u FASTA file “hemoglobin.fasta” uba don hynriew tylli ki records.

07:19 Each record is of a different length.

(Man ka record ki pher ha ka jingjrong.

07:23 Type the following lines to arrange the longest record first.

(Type kumne harum ban buh da ka record ba jrong na ba sdang .

07:27 The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your home folder.

(Ka file ba thymmai kaba don u sequence ba la jied yn shah save kum "sorted_hemoglobin.fasta" ha ka home folder jong phi.

07:38 For short records first, reverse the arguments in the records.sort command line.

(Ban ioh ia ki records ba lyngkot nyngkong, pynkylla ia ki arguments ha ka records.sort command line.
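
Sorting by length can be sketched as below. The three records are made up and stand in for "hemoglobin.fasta". The tutorial was recorded with Python 2.7 and its own records.sort line is not shown here; this sketch uses the key and reverse arguments, which work in both Python 2.7 and Python 3.

```python
from io import StringIO

from Bio import SeqIO

# Made-up records (assumption) in place of hemoglobin.fasta.
fasta_text = (
    ">short\nMVLS\n"
    ">long\nMVLSPADKTNVKA\n"
    ">medium\nMVLSPADK\n"
)

records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Longest record first; drop reverse=True for shortest first.
records.sort(key=len, reverse=True)
sorted_ids = [record.id for record in records]
print(sorted_ids)

# The tutorial saves the result as "sorted_hemoglobin.fasta";
# an in-memory handle is used here instead.
out_handle = StringIO()
SeqIO.write(records, out_handle, "fasta")
sorted_text = out_handle.getvalue()
```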

07:45 Let's summarize. In this tutorial, we have learnt: * to create Sequence Record Objects

(To ngin ia lum kyllum . Ha kane ka tutorial, ngi lah nang ban:* to create Sequence Record Objects

07:51 Write sequence files using write function of Sequence Input/Output module.

(Write ia ki sequence files da kaba pyndonkam write function jong Sequence Input/Output module.

07:58 Convert between sequence file formats using convert function.

(Pynkylla hapdeng sequence file formats da kaba pyndonkam convert function.

08:03 And, sort records in a file by length.

(Bad, wad ia ki records ha ka file da ka jingjrong.

08:07 For the assignment:

(Na ka bynta ka assignment:

08:09 Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV.

(Extract ia u gene "HIV1gp3" ha positions 4587 sha 5165 na ka genomic sequence jong HIV.

08:21 The file “HIV.gb” is included in the code files of this tutorial.

(Ka file “HIV.gb” la kynthup lang ha ka code files jong kane ka tutorial.

08:28 Your completed assignment will have the following code.

(Ka assignment ba lah dep jong phi kan don ia u code ba harum.
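
The assignment code is not included in this script. One possible approach, offered only as a sketch, is to slice the genome record. It assumes the gene lies on the forward strand and that the positions are 1-based and inclusive, as in GenBank coordinates; a repeated placeholder sequence stands in for the record read from "HIV.gb".

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Stand-in genome (assumption). With the tutorial's file this would be:
#   genome = SeqIO.read("HIV.gb", "genbank")
genome = SeqRecord(
    Seq("ACGT" * 1500),
    id="DEMO0001",
    description="placeholder genome",
)

# GenBank positions are 1-based and inclusive; Python slices are
# 0-based and end-exclusive, hence start - 1.
start, end = 4587, 5165
gene = genome[start - 1:end]
gene.id = "HIV1gp3"

print(gene.id, len(gene))
```

The resulting record could then be passed to SeqIO.write with the "fasta" format, as in the earlier part of the tutorial.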

08:43 The video at the following link summarizes the Spoken Tutorial project.

(Ka video ha ka link ba bud ka batai bniah shaphang ka Spoken Tutorial project.

08:48 Please download and watch it. The Spoken Tutorial Project team conducts workshops and gives certificates for those who pass an online test.

(Sngewbha download bad peit ia ka. Ka Spoken Tutorial Project team ka ju pynlong workshops bad ai certificates ruh ia kito kiba pass ia ka online test.

08:57 For more details, please write to us.

(Na ka bynta ka jingtip ba kham bniah, sngewbha thoh sha ngi.

09:00 The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.

(Ia ka Spoken Tutorial Project la bei tyngka da ka NMEICT, MHRD, Government of India.

09:06 More information on this mission is available at the link shown.

(Ki jingtip ba kham pura ia kane ka mission lah ban ioh na ka link harum .

09:10 This is Snehalatha from IIT Bombay, signing off. Thank you for joining.

(Nga dei ka Snehalatha na IIT Bombay, signing off. Khublei shibun )

Contributors and Content Editors

Hezekiah2016