Difference between revisions of "Biopython/C2/Writing-Sequence-Files/English-timed"

(Created page with " {|Border=1 ! <center>Time</center> ! <center>Narration</center> |- | 00:01 | Hello everyone. |- | 00:02 | Welcome to this tutorial on '''Writing Sequence Files'''. |- | 00...")
 
 
(4 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
 
{|Border=1
 
{|Border=1
! <center>Time</center>
+
|'''Time'''
! <center>Narration</center>
+
|'''Narration'''
  
 
|-
 
|-
 
| 00:01
 
| 00:01
| Hello everyone.
+
| Hello everyone.Welcome to this tutorial on '''Writing Sequence Files'''.
 
+
|-
+
| 00:02
+
| Welcome to this tutorial on '''Writing Sequence Files'''.
+
  
 
|-
 
|-
 
| 00:07
 
| 00:07
| In this tutorial, we will learn: How to create Sequence Record Objects.
+
| In this tutorial, we will learn: * How to create '''Sequence Record Objects'''
  
 
|-
 
|-
 
| 00:13
 
| 00:13
| Write sequences files.
+
| Write sequences files
  
 
|-
 
|-
 
| 00:15
 
| 00:15
| Convert between file formats.
+
| Convert between '''file format'''s
  
 
|-
 
|-
 
| 00:19
 
| 00:19
| And sort records in a file by length.
+
| And, sort '''record'''s in a file by length.
  
 
|-
 
|-
 
| 00:23
 
| 00:23
| To follow this tutorial you should be familiar with,
+
| To follow this tutorial, you should be familiar with
  
 
|-
 
|-
 
| 00:27
 
| 00:27
Undergraduate '''Biochemistry''' or '''Bioinformatics'''
+
undergraduate Biochemistry or Bioinformatics
  
 
|-
 
|-
 
| 00:31
 
| 00:31
| And basic '''Python''' programming  
+
| and basic '''Python''' programming.
  
 
|-
 
|-
Line 46: Line 41:
 
|-
 
|-
 
| 00:38
 
| 00:38
|  To record this tutorial I am using '''Ubuntu OS''' version. 14.10
+
|  To record this tutorial, I am using: * '''Ubuntu OS''' version 14.10
 
   
 
   
 
|-
 
|-
 
| 00:45
 
| 00:45
|'''Python''' version 2.7.8
+
| '''Python''' version 2.7.8
 +
 
 
|-
 
|-
 
| 00:48
 
| 00:48
| '''Ipython interpretor''' version 2.3.0 and '''Biopython''' version 1.64
+
| '''Ipython interpretor''' version 2.3.0 and * '''Biopython''' version 1.64.
  
 
|-
 
|-
 
| 00:55
 
| 00:55
|We have earlier learnt about '''parse''' and '''read''' functions to read contents of a file.
+
|We have earlier learnt about '''parse''' and '''read''' '''function'''s to read contents of a file.
  
 
|-
 
|-
 
|01:03
 
|01:03
|In this tutorial we will learn how to use '''write''' function to write sequences to a file.
+
|In this tutorial, we will learn how to use '''write''' function to write sequences to a file.
  
 
|-
 
|-
 
| 01:09
 
| 01:09
|And use '''Convert''' function for interconversion between various file formats.  
+
|And, use '''Convert''' function for inter-conversion between various '''file format'''s.  
  
 
|-
 
|-
Line 81: Line 77:
 
|-
 
|-
 
|01:28
 
|01:28
|The file also has information such as '''GI''' accession number and also description.
+
|The file also has information such as '''GI accession number''' and also '''description'''.
  
 
|-
 
|-
Line 89: Line 85:
 
|-
 
|-
 
|01:41
 
|01:41
|The first step is to create sequence record object.
+
|The first step is to create '''sequence record object'''.
  
 
|-
 
|-
Line 97: Line 93:
 
|-
 
|-
 
|01:49
 
|01:49
|It is the basic data type for the sequence input/output interface.
+
|It is the basic data type for the '''sequence input/output interface'''.
  
 
|-
 
|-
 
|01:55
 
|01:55
|In sequence record object, a sequence is associated with higher level features: such as identifiers and descriptions.
+
|In sequence record object, a sequence is associated with higher level features such as '''identifier'''s and descriptions.
  
 
|-
 
|-
 
|02:04
 
|02:04
|Open the terminal by pressing ctrl, alt and t keys simultaneously .
+
|Open the terminal by pressing '''Ctrl, Alt''' and '''t''' keys simultaneously .
  
 
|-
 
|-
 
| 02:10
 
| 02:10
|At the prompt type '''ipython''', Press Enter.
+
|At the prompt, type: '''ipython''', press '''Enter'''.
  
 
|-
 
|-
 
|02:15
 
|02:15
|At the prompt type the following lines:
+
|At the prompt, type the following lines:
  
 
|-
 
|-
Line 125: Line 121:
 
|-
 
|-
 
|02:31
 
|02:31
| Next '''from Bio dot Alphabet module import generic protein class'''
+
| Next, '''from Bio dot Alphabet module import generic protein class'''.
  
 
|-
 
|-
 
|02:38
 
|02:38
|Next I will save the sequence record object in a variable '''record1.'''
+
|Next, I will save the sequence record object in a variable '''record1.'''
  
 
|-
 
|-
 
|02:45
 
|02:45
|Copy the sequence, id and description from the text file and paste in the respective lines on the terminal.
+
|'''Copy''' the '''sequence, id''' and '''description''' from the text file and '''paste''' it in the respective lines on the terminal.
  
 
|-
 
|-
Line 141: Line 137:
 
|-
 
|-
 
| 02:58
 
| 02:58
|To view the output, type, '''record1'''.
+
|To view the output, type: '''record1'''.
  
 
|-
 
|-
Line 153: Line 149:
 
|-
 
|-
 
|03:10
 
|03:10
|It shows the sequence along with id and description.
+
|It shows the sequence along with '''id''' and '''description'''.
  
 
|-
 
|-
 
| 03:13
 
| 03:13
|We will use write function to convert the above sequence record object to a '''FASTA''' file.
+
|We will use '''write''' function to convert the above sequence record object to a '''FASTA''' file.
  
 
|-
 
|-
Line 165: Line 161:
 
|-
 
|-
 
|03:26
 
|03:26
|Next type the command line with a '''write''' function to convert the sequence object to '''FASTA''' file.
+
|Next, type the '''command line''' with a '''write''' function to convert the sequence object to '''FASTA''' file.
  
 
|-
 
|-
 
|03:40
 
|03:40
|The '''write '''function takes 3 arguments
+
|The '''write '''function takes 3 '''argument'''s.
  
 
|-
 
|-
 
|03:44
 
|03:44
|The first one is the variable storing the sequence record object.
+
|The first one is the variable storing the '''sequence record object'''.
  
 
|-
 
|-
 
|03:49
 
|03:49
|The Second is the file name to write the '''FASTA''' file.
+
|The second is the file name to write the '''FASTA''' file.
  
 
|-
 
|-
 
|03:54
 
|03:54
|The Third is the file format to write. Press '''enter.'''
+
|The third is the file format to write. Press '''Enter'''.
  
 
|-
 
|-
 
| 03:58
 
| 03:58
|The Output shows “one”, that is we have converted one sequence record object to a '''FASTA''' file.
+
|The Output shows one, that is, we have converted one '''sequence record object''' to a '''FASTA''' file.
  
 
|-
 
|-
 
|04:07
 
|04:07
|The file in '''FASTA''' format is saved in the home folder as '''"example.fasta".'''
+
|The file in '''FASTA''' format is saved in the '''home''' folder as "example.fasta".
  
 
|-
 
|-
 
|04:13
 
|04:13
|Let me warn you:
+
|Let me warn you,the output will over-write any pre-existing file of the same name.
 
+
|-
+
|04:14
+
|The output will over-write any pre-existing file of the same name.
+
 
+
 
|-
 
|-
 
| 04:18
 
| 04:18
Line 221: Line 213:
 
|-
 
|-
 
|04:38
 
|04:38
|So sometimes there is a need to interconvert between sequence file formats.
+
|So, sometimes there is a need to inter-convert between sequence file formats.
  
 
|-
 
|-
Line 229: Line 221:
 
|-
 
|-
 
| 04:50
 
| 04:50
|For demonstration I will convert a '''GenBank''' file to a '''FASTA''' file.
+
|For demonstration, I will convert a '''GenBank''' file to a '''FASTA''' file.
  
 
|-
 
|-
 
|04:55
 
|04:55
|Have a '''GenBank''' file in my home folder.
+
|Have a '''GenBank''' file in my '''home''' folder.
  
 
|-
 
|-
Line 245: Line 237:
 
|-
 
|-
 
|05:07
 
|05:07
|This '''GenBank''' file has descriptions of all the '''genes''' in the '''genome''' in the first part of the file.  
+
|This '''GenBank''' file has descriptions of all the '''genes''' in the '''genome''', in the first part of the file.  
  
 
|-
 
|-
 
|05:14
 
|05:14
 
|It is followed by a complete '''genome''' sequence.
 
|It is followed by a complete '''genome''' sequence.
 +
 
|-
 
|-
 
|05:18
 
|05:18
|Close the text editor.
+
|Close the text editor. Type the following lines on the terminal.
 
+
|-
+
| 05:19
+
|Type the following lines on the terminal.
+
  
 
|-
 
|-
 
|05:23
 
|05:23
|Here the '''convert''' function converts the complete '''genome''' sequence present in the '''GenBank''' file to '''FASTA''' file. Press '''enter'''
+
|Here the '''convert''' function converts the complete '''genome''' sequence present in the '''GenBank''' file to '''FASTA''' file. Press '''Enter'''.
  
 
|-
 
|-
 
|05:33
 
|05:33
|The new file in '''FASTA''' format is now saved as '''HIV.fasta''' in the home folder.
+
|The new file in the '''FASTA''' format is now saved as '''HIV.fasta''' in the '''home''' folder.
  
 
|-
 
|-
Line 288: Line 277:
 
|-
 
|-
 
|06:09
 
|06:09
|Similarly we can turn a '''FASTQ''' file into a '''FASTA''' file, but can’t do the reverse.
+
|Similarly, we can turn a '''FASTQ''' file into a '''FASTA''' file but can’t do the reverse.
  
 
|-
 
|-
 
|06:15
 
|06:15
|For more information regarding convert function, type the '''help''' command.
+
|For more information regarding '''convert''' function, type the '''help''' command.
  
 
|-
 
|-
 
|06:21
 
|06:21
|Press '''enter'''.
+
|Press '''Enter'''.
  
 
|-
 
|-
 
|06:24
 
|06:24
|Press “'''q'''” on the key board to get back to the prompt.  
+
|Press 'q' on the key board to get back to the prompt.  
  
 
|-
 
|-
Line 312: Line 301:
 
|-
 
|-
 
|06:41
 
|06:41
| For this type the following code at the prompt.
+
| For this, type the following code at the '''prompt'''.
  
 
|-
 
|-
 
|06:47
 
|06:47
|This code will write all individual '''CDS''' gene sequences , their ids and name of the gene in a file.  
+
|This code will write all individual '''CDS''' gene sequences, their ids and name of the '''gene''' in a file.  
  
 
|-
 
|-
 
|06:56
 
|06:56
|The file is saved as “'''HIV_geneseq.fasta'''” in your '''home''' folder. Press '''enter'''
+
|The file is saved as “HIV_geneseq.fasta” in your '''home''' folder. Press '''Enter'''.
  
 
|-
 
|-
 
| 07:07
 
| 07:07
|Using '''Biopython''' tools we can sort the records in a file by length.
+
|Using '''Biopython''' tools, we can sort the records in a file by length.
  
 
|-
 
|-
 
|07:12
 
|07:12
|Here I have opened a FASTA file “'''hemoglobin.fasta'''” which has six records.
+
|Here, I have opened a FASTA file “hemoglobin.fasta” which has six records.
  
 
|-
 
|-
 
|07:19
 
|07:19
|Each record is of a different length.
+
|Each '''record''' is of a different length.
  
 
|-
 
|-
Line 340: Line 329:
 
|-
 
|-
 
|07:27
 
|07:27
|The new file with the sorted sequences will be saved as''' "sorted_hemoglobin.fasta" '''in your '''home''' folder
+
|The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your '''home''' folder.
  
 
|-
 
|-
 
|07:38
 
|07:38
|For Short records first, reverse the arguments in the '''records.sort''' command line.
+
|For short records first, reverse the arguments in the '''records.sort''' command line.
  
 
|-
 
|-
 
| 07:45
 
| 07:45
| Lets Summarize,
+
| Let's summarize.In this tutorial, we have learnt :* to create Sequence Record Objects
|-
+
|07:46
+
|In this tutorial we have learnt : to create Sequence Record Objects.
+
  
 
|-
 
|-
 
|07:51
 
|07:51
|Write sequence files using''' write''' function of Sequence Input/Output module.
+
| Write sequence files using''' write''' function of '''Sequence Input/Output''' module.
  
 
|-
 
|-
 
|07:58
 
|07:58
|Convert between sequence file formats using '''convert''' function.
+
| Convert between '''sequence file format'''s using '''convert''' function.
  
 
|-
 
|-
 
|08:03
 
|08:03
|And sort records in a file by length.
+
| And, sort records in a file by length.
  
 
|-
 
|-
 
| 08:07
 
| 08:07
|For the Assignment:  
+
|For the assignment:  
  
 
|-
 
|-
 
|08:09
 
|08:09
|Extract the gene "'''HIV1gp3'''" at positions 4587 to 5165 from the '''genomic''' sequence of HIV.
+
|'''Extract''' the gene "HIV1gp3" at positions 4587 to 5165 from the '''genomic''' sequence of HIV.
  
 
|-
 
|-
 
|08:21
 
|08:21
|The file “'''HIV.gb'''” is included in code files of this tutorial.
+
|The file “HIV.gb” is included in the code files of this tutorial.
  
 
|-
 
|-
Line 387: Line 373:
 
|-
 
|-
 
|08:48
 
|08:48
|Please download and watch it.  
+
|Please download and watch it. The Spoken Tutorial Project team conducts workshops and gives certificates for those who pass an online test.  
 
+
|-
+
|08:49
+
|The Spoken Tutorial Project Team conducts workshops and gives certificates for those who pass an online test.  
+
  
 
|-
 
|-
Line 403: Line 385:
 
|-
 
|-
 
|09:06
 
|09:06
|More information on this Mission is available at the link shown.  
+
|More information on this mission is available at the link shown.  
  
 
|-
 
|-
 
| 09:10
 
| 09:10
| This is Snehalatha from IIT Bombay signing off. Thank you for joining.  
+
| This is Snehalatha from '''IIT Bombay''', signing off. Thank you for joining.  
  
 
|}
 
|}

Latest revision as of 18:27, 23 March 2017

Time Narration
00:01 Hello everyone. Welcome to this tutorial on Writing Sequence Files.
00:07 In this tutorial, we will learn how to create Sequence Record Objects,
00:13 write sequence files,
00:15 convert between file formats,
00:19 and sort records in a file by length.
00:23 To follow this tutorial, you should be familiar with
00:27 undergraduate Biochemistry or Bioinformatics
00:31 and basic Python programming.
00:34 Refer to the Python tutorials at the given link.
00:38 To record this tutorial, I am using Ubuntu OS version 14.10,
00:45 Python version 2.7.8,
00:48 IPython interpreter version 2.3.0 and Biopython version 1.64.
00:55 We have earlier learnt about the parse and read functions, which read the contents of a file.
01:03 In this tutorial, we will learn how to use the write function to write sequences to a file,
01:09 and how to use the convert function to convert between various file formats.
01:16 Let me now demonstrate how to use the write function.
01:20 Here is a text file with a protein sequence.
01:24 The sequence shown here is that of the insulin protein.
01:28 The file also has information such as the GI accession number and a description.
01:36 We will now create a file for this sequence in FASTA format.
01:41 The first step is to create a sequence record object.
01:45 More information about Sequence Record Objects:
01:49 It is the basic data type for the sequence input/output interface.
01:55 In a sequence record object, a sequence is associated with higher-level features such as identifiers and descriptions.
02:04 Open the terminal by pressing the Ctrl, Alt and t keys simultaneously.
02:10 At the prompt, type ipython and press Enter.
02:15 At the prompt, type the following lines:
02:18 from Bio dot Seq module import Seq class.
02:24 from Bio dot SeqRecord module import Sequence Record class
02:31 Next, from Bio dot Alphabet module import generic protein class.
02:38 Next, I will save the sequence record object in a variable record1.
02:45 Copy the sequence, id and description from the text file and paste them into the respective lines on the terminal.
02:56 Press Enter.
02:58 To view the output, type: record1.
03:02 Press Enter.
03:04 The output shows the insulin protein sequence as a sequence record object.
03:10 It shows the sequence along with the id and description.
03:13 We will use the write function to convert the above sequence record object to a FASTA file.
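The lines typed between 02:18 and 02:56 are not reproduced in this transcript. A minimal sketch of what they might look like follows. The sequence fragment, id and description are placeholders, not the values shown in the video. The tutorial uses Biopython 1.64; Bio.Alphabet was removed in Biopython 1.78, so the sketch falls back to a plain Seq on newer versions.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder fragment; the full sequence from the video is not in the transcript.
protein_text = "MALWMRLLPLLALLALWGPDPAAA"

try:
    # Available in Biopython releases before 1.78 (the tutorial uses 1.64).
    from Bio.Alphabet import generic_protein
    protein_seq = Seq(protein_text, generic_protein)
except ImportError:
    # Newer Biopython: Seq objects no longer carry an alphabet.
    protein_seq = Seq(protein_text)

# The id and description are hypothetical stand-ins for the values in the text file.
record1 = SeqRecord(
    protein_seq,
    id="example_id",
    description="placeholder description",
)

# Printing the record shows the id, the description and the sequence.
print(record1)
```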
03:21 Import SeqIO module from Bio package.
03:26 Next, type the command that uses the write function to convert the sequence record object to a FASTA file.
03:40 The write function takes 3 arguments.
03:44 The first one is the variable storing the sequence record object.
03:49 The second is the file name to write the FASTA file.
03:54 The third is the file format to write. Press Enter.
03:58 The output shows one; that is, we have converted one sequence record object to a FASTA file.
04:07 The file in FASTA format is saved in the home folder as "example.fasta".
04:13 Let me warn you: the output will overwrite any pre-existing file of the same name.
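The write call described at 03:26 to 03:58 might look like the sketch below. The record contents are placeholders, and the file is written to a temporary folder rather than the home folder so that nothing is overwritten by accident. The existence check is an addition for illustration and is not part of the tutorial.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder record; the tutorial's actual values are not in the transcript.
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),
    id="example_id",
    description="placeholder description",
)

# The tutorial saves "example.fasta" in the home folder; a temporary folder is used here.
out_path = os.path.join(tempfile.mkdtemp(), "example.fasta")

# SeqIO.write replaces an existing file without asking, so check first if that matters.
if os.path.exists(out_path):
    raise FileExistsError(out_path)

# Arguments: the record (or records), the file name, and the output format.
count = SeqIO.write(record1, out_path, "fasta")
print(count)  # number of records written
```

The return value is the number of records written, which is the "one" mentioned in the narration.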
04:18 To view the file, navigate to the file in the home folder.
04:24 Open this file in a text editor.
04:27 The protein sequence is now in FASTA format.
04:31 Close the text editor.
04:33 Many bioinformatics tools take different input file formats.
04:38 So, sometimes there is a need to inter-convert between sequence file formats.
04:44 We can do file conversions using the convert function in the SeqIO module.
04:50 For demonstration, I will convert a GenBank file to a FASTA file.
04:55 I have a GenBank file in my home folder.
04:59 Let me open this in a text editor.
05:02 The file contains the HIV genome in GenBank format.
05:07 This GenBank file has descriptions of all the genes in the genome, in the first part of the file.
05:14 It is followed by a complete genome sequence.
05:18 Close the text editor. Type the following lines on the terminal.
05:23 Here, the convert function converts the complete genome sequence present in the GenBank file to a FASTA file. Press Enter.
05:33 The new file in the FASTA format is now saved as HIV.fasta in the home folder.
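The conversion step can be sketched as follows. Because HIV.gb is not available here, the sketch first writes a tiny made-up GenBank file (the record DEMO1 is hypothetical) and then converts it. It assumes Biopython 1.78 or later, where the GenBank writer reads the molecule type from the record annotations.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

workdir = tempfile.mkdtemp()
gb_path = os.path.join(workdir, "demo.gb")
fasta_path = os.path.join(workdir, "demo.fasta")

# Build a small stand-in GenBank file (hypothetical record, not the HIV genome).
demo = SeqRecord(Seq("ATGGCCTAATGA"), id="DEMO1", name="DEMO1", description="demo record")
demo.annotations["molecule_type"] = "DNA"  # needed by the GenBank writer in Biopython >= 1.78
SeqIO.write(demo, gb_path, "genbank")

# Arguments: input file, input format, output file, output format.
count = SeqIO.convert(gb_path, "genbank", fasta_path, "fasta")

# Read the FASTA file back to confirm the sequence survived the conversion.
converted = SeqIO.read(fasta_path, "fasta")
print(count, converted.id, converted.seq)
```

With the tutorial's files, the equivalent call would presumably be `SeqIO.convert("HIV.gb", "genbank", "HIV.fasta", "fasta")`.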
05:39 Navigate to the file and open it in the text editor.
05:46 Close the text editor.
05:49 Even though we can convert between file formats easily using the convert function, it has limitations.
05:56 Writing some formats requires information which other file formats don’t contain.
06:02 For example, we can convert a GenBank file to a FASTA file, but we can't do the reverse.
06:09 Similarly, we can turn a FASTQ file into a FASTA file but can’t do the reverse.
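The FASTQ limitation can be seen in a small sketch. A FASTA record carries no quality scores, so asking Biopython to write it as FASTQ raises a ValueError. The record `demo` below is made up for illustration, and in-memory handles are used instead of files.

```python
import io

from Bio import SeqIO

# A one-record FASTA "file" held in memory (hypothetical data).
fasta_handle = io.StringIO(">demo\nACGT\n")
fastq_handle = io.StringIO()

try:
    SeqIO.convert(fasta_handle, "fasta", fastq_handle, "fastq")
    failed = False
except ValueError as err:
    # FASTQ needs per-base quality scores, which FASTA does not contain.
    failed = True
    print("Conversion failed:", err)
```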
06:15 For more information regarding the convert function, type the help command.
06:21 Press Enter.
06:24 Press 'q' on the keyboard to get back to the prompt.
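The exact help command is not shown in the transcript. One likely form at the IPython prompt is `help(SeqIO.convert)`, which opens the pager mentioned above. The sketch below prints the same documentation text without a pager.

```python
from Bio import SeqIO

# help(SeqIO.convert) shows this text in a pager; printing the docstring avoids the pager.
convert_doc = SeqIO.convert.__doc__
print(convert_doc)
```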
06:28 We can also extract individual genes from the HIV genome in GenBank format.
06:35 These individual genes can be saved in FASTA or any other format.
06:41 For this, type the following code at the prompt.
06:47 This code will write all individual CDS gene sequences, along with their ids and gene names, to a file.
06:56 The file is saved as “HIV_geneseq.fasta” in your home folder. Press Enter.
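The code typed at 06:41 is not included in the transcript. A minimal sketch of one way to do it is given below. The helper `cds_records` and the demo genome are hypothetical, and the sketch assumes each CDS feature carries a "gene" qualifier, which may not hold for every GenBank file.

```python
import io

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def cds_records(genome):
    """Yield one SeqRecord per CDS feature of a genome record (hypothetical helper)."""
    for feature in genome.features:
        if feature.type != "CDS":
            continue
        # Fall back to "unknown" when the feature has no gene name.
        gene = feature.qualifiers.get("gene", ["unknown"])[0]
        yield SeqRecord(
            feature.extract(genome.seq),
            id=gene,
            description="CDS from %s" % genome.id,
        )


# Made-up genome with a single CDS covering 0-based positions 2..10.
genome = SeqRecord(Seq("AAATGGCCTAAGG"), id="DEMO1")
genome.features.append(
    SeqFeature(FeatureLocation(2, 11, strand=1), type="CDS", qualifiers={"gene": ["demo_gene"]})
)

# Write the extracted genes in FASTA format to an in-memory handle.
out_handle = io.StringIO()
count = SeqIO.write(cds_records(genome), out_handle, "fasta")
print(out_handle.getvalue())
```

With the tutorial's files, the genome would presumably come from `SeqIO.read("HIV.gb", "genbank")` and the output would go to "HIV_geneseq.fasta".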
07:07 Using Biopython tools, we can sort the records in a file by length.
07:12 Here, I have opened a FASTA file “hemoglobin.fasta” which has six records.
07:19 Each record is of a different length.
07:23 Type the following lines to arrange the longest record first.
07:27 The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your home folder.
07:38 To put short records first, reverse the arguments in the records.sort command line.
07:45 Let's summarize. In this tutorial, we have learnt to create Sequence Record Objects,
07:51 write sequence files using the write function of the Sequence Input/Output module,
07:58 convert between sequence file formats using the convert function,
08:03 and sort records in a file by length.
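The sorting lines are not shown in the transcript. The narration suggests a comparison function, as used with Python 2.7. The sketch below uses the `key` and `reverse` arguments of `list.sort` instead, which also work on Python 3. The three records are made up and stand in for "hemoglobin.fasta".

```python
import io

from Bio import SeqIO

# Stand-in for "hemoglobin.fasta": three made-up records of different lengths.
fasta_text = ">a\nACGT\n>b\nACGTACGT\n>c\nAC\n"

records = list(SeqIO.parse(io.StringIO(fasta_text), "fasta"))

# len() of a SeqRecord is its sequence length; reverse=True puts the longest first.
records.sort(key=len, reverse=True)

# For shortest first, drop reverse=True (the equivalent of swapping the
# arguments in a comparison function).
sorted_handle = io.StringIO()
count = SeqIO.write(records, sorted_handle, "fasta")
print([record.id for record in records])
```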
08:07 For the assignment:
08:09 Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV.
08:21 The file “HIV.gb” is included in the code files of this tutorial.
08:28 Your completed assignment will have the following code.
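The assignment code shown in the video is not part of this transcript. One possible approach is sketched below, assuming the positions 4587 to 5165 are 1-based and inclusive, as in GenBank feature tables. The helper `extract_region` and the demo record are hypothetical.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def extract_region(record, start, end):
    """Return the sub-record for 1-based, inclusive positions start..end."""
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return record[start - 1:end]


# Small made-up record used in place of the HIV genome.
genome = SeqRecord(Seq("AACCGGTT"), id="demo")
gene = extract_region(genome, 3, 6)
print(gene.seq)

# With the tutorial's file this would presumably be:
#   genome = SeqIO.read("HIV.gb", "genbank")
#   gene = extract_region(genome, 4587, 5165)   # 579 bases
```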
08:43 The video at the following link summarizes the Spoken Tutorial project.
08:48 Please download and watch it. The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test.
08:57 For more details, please write to us.
09:00 The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
09:06 More information on this mission is available at the link shown.
09:10 This is Snehalatha from IIT Bombay, signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Pratik kamble, Priyacst, Sandhya.np14