Difference between revisions of "Biopython/C2/Writing-Sequence-Files/English-timed"

Revision as of 23:55, 2 August 2016

Time	Narration
00:01	Hello everyone.
00:02	Welcome to this tutorial on Writing Sequence Files.
00:07	In this tutorial, we will learn: * how to create sequence record objects
00:13	* how to write sequence files
00:15	* how to convert between file formats
00:19	* and how to sort records in a file by length.
00:23	To follow this tutorial, you should be familiar with
00:27	undergraduate Biochemistry or Bioinformatics
00:31	and basic Python programming.
00:34	Refer to the Python tutorials at the given link.
00:38	To record this tutorial, I am using: * Ubuntu OS version 14.10
00:45	* Python version 2.7.8
00:48	* IPython interpreter version 2.3.0 and * Biopython version 1.64.
00:55	In an earlier tutorial, we learnt about the parse and read functions, which read the contents of a file.
01:03	In this tutorial, we will learn how to use the write function to write sequences to a file.
01:09	We will also use the convert function to convert between various file formats.
01:16	Let me now demonstrate how to use the write function.
01:20	Here is a text file with a protein sequence.
01:24	The sequence shown here is of the insulin protein.
01:28	The file also contains information such as the GI accession number and a description.
01:36	We will now create a file for this sequence in FASTA format.
01:41	The first step is to create a sequence record object.
01:45	A little more about sequence record objects:
01:49	It is the basic data type for the sequence input/output interface, SeqIO.
01:55	In a sequence record object, a sequence is associated with higher-level features such as identifiers and descriptions.
02:04	Open the terminal by pressing the Ctrl, Alt and T keys simultaneously.
02:10	At the prompt, type ipython and press Enter.
02:15	At the prompt, type the following lines:
02:18	From the Bio dot Seq module, import the Seq class.
02:24	From the Bio dot SeqRecord module, import the SeqRecord class.
02:31	Next, from the Bio dot Alphabet module, import generic_protein.
02:38	Next, I will save the sequence record object in a variable named record1.
02:45	Copy the sequence, id and description from the text file and paste them into the respective lines in the terminal.
02:56	Press Enter.
02:58	To view the output, type: record1.
03:02	Press Enter.
03:04	The output shows the insulin protein sequence as a sequence record object.
03:10	It shows the sequence along with id and description.
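The commands typed at the prompt in this step appear only on screen. Below is a minimal sketch of the same step, assuming Python 3 and a recent Biopython release. Because `Bio.Alphabet` (and with it `generic_protein`) was removed in Biopython 1.78, this sketch records the molecule type in the record's `annotations` instead. The sequence, id and description are placeholders, not the contents of the tutorial's text file.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder values: replace them with the sequence, id and description
# copied from your own text file.
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),  # placeholder protein fragment
    id="placeholder_id",
    description="placeholder description",
    # Recent Biopython: molecule type replaces the removed generic_protein alphabet
    annotations={"molecule_type": "protein"},
)

print(record1)
```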
03:13	We will use the write function to write the above sequence record object to a FASTA file.
03:21	Import the SeqIO module from the Bio package.
03:26	Next, type the command that calls the write function to write the sequence record object to a FASTA file.
03:40	The write function takes 3 arguments.
03:44	The first one is the variable storing the sequence record object.
03:49	The second is the name of the FASTA file to write.
03:54	The third is the file format to write. Press Enter.
03:58	The output shows 1, the number of records written: we have written one sequence record object to a FASTA file.
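A sketch of the write step, assuming Python 3, a recent Biopython and a placeholder record standing in for record1. `SeqIO.write` takes the record or records, a file name or open handle, and a format name, and returns the number of records written, which is the 1 seen at the prompt. To keep it self-contained, the sketch writes to an in-memory handle rather than to "example.fasta".

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder record standing in for record1
record1 = SeqRecord(Seq("MALWMRLLPLLALLALWGPDPAAA"), id="placeholder_id",
                    description="placeholder description")

# Arguments: record(s) to write, file name or handle, format name
handle = StringIO()
count = SeqIO.write(record1, handle, "fasta")
fasta_text = handle.getvalue()

print(count)  # number of records written
print(fasta_text)
```

Passing a file name instead, as in `SeqIO.write(record1, "example.fasta", "fasta")`, writes the same text to disk.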
04:07	The file in FASTA format is saved in the home folder as "example.fasta".
04:13	A word of warning:
04:14	the write function will overwrite any pre-existing file of the same name.
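One way to avoid overwriting an existing file, not shown in the tutorial, is to open the output yourself in Python 3's exclusive-creation mode `"x"`, which raises `FileExistsError` if the file already exists, and hand the open handle to `SeqIO.write`. This sketch assumes Python 3 and a recent Biopython; the helper name `write_new_fasta` is hypothetical.

```python
from Bio import SeqIO


def write_new_fasta(records, path):
    """Hypothetical helper: write records to a new FASTA file, never overwriting.

    Mode "x" makes open() raise FileExistsError if `path` already exists,
    so an existing file is left untouched.
    """
    with open(path, "x") as handle:
        return SeqIO.write(records, handle, "fasta")
```

For example, `write_new_fasta(record1, "example.fasta")` writes the file the first time and raises `FileExistsError` on a second call.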
04:18	To view the file, navigate to it in the home folder.
04:24	Open this file in a text editor.
04:27	The protein sequence is now in FASTA format.
04:31	Close the text editor.
04:33	Different bioinformatics tools accept different input file formats.
04:38	So, we sometimes need to convert between sequence file formats.
04:44	We can do file conversions using the convert function in the SeqIO module.
04:50	For demonstration, I will convert a GenBank file to a FASTA file.
04:55	I have a GenBank file in my home folder.
04:59	Let me open this in a text editor.
05:02	The file contains the HIV genome in GenBank format.
05:07	The first part of this GenBank file describes all the genes in the genome.
05:14	This is followed by the complete genome sequence.
05:18	Close the text editor.
05:19	Type the following lines in the terminal.
05:23	Here, the convert function writes the complete genome sequence from the GenBank file to a FASTA file. Press Enter.
05:33	The new file in FASTA format is now saved as HIV.fasta in the home folder.
05:39	Navigate to the file and open it in the text editor.
05:46	Close the text editor.
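The exact lines typed here appear only on screen. With the file names used in this tutorial, the call would be `SeqIO.convert("HIV.gb", "genbank", "HIV.fasta", "fasta")`, which returns the number of records converted. So that the sketch runs without the HIV file, it assumes Python 3 and a recent Biopython, builds a small placeholder GenBank record in memory (recent releases refuse to write GenBank without a `molecule_type` annotation), and converts it through in-memory handles.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder stand-in for HIV.gb: a tiny DNA record written in GenBank format
toy = SeqRecord(
    Seq("ATGAAACCCGGGTTTTAA"),
    id="TOY0001",
    name="TOY0001",
    description="placeholder genome",
    annotations={"molecule_type": "DNA"},  # required to write GenBank
)
genbank_in = StringIO()
SeqIO.write(toy, genbank_in, "genbank")
genbank_in.seek(0)  # rewind so convert() reads from the start

# convert(input, input_format, output, output_format) returns the record count
fasta_out = StringIO()
count = SeqIO.convert(genbank_in, "genbank", fasta_out, "fasta")
print(count)
print(fasta_out.getvalue())
```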
05:49	Although the convert function makes file conversion easy, it has limitations.
05:56	Writing some formats requires information that other file formats don't contain.
06:02	For example, we can convert a GenBank file to a FASTA file, but we can't do the reverse.
06:09	Similarly, we can turn a FASTQ file into a FASTA file, but not the reverse, because FASTA files have no quality scores.
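The FASTQ limitation can be seen directly. This sketch, assuming Python 3 and a recent Biopython, converts a one-read placeholder FASTQ record to FASTA and then tries the reverse. The FASTQ writer needs per-base quality scores that a FASTA record does not have, so Biopython raises `ValueError`.

```python
from io import StringIO

from Bio import SeqIO

# Placeholder FASTQ record: header, sequence, '+' separator, quality string
fastq_text = "@read1\nACGTACGT\n+\nIIIIIIII\n"

# FASTQ -> FASTA works: the quality line is simply dropped
fasta_out = StringIO()
SeqIO.convert(StringIO(fastq_text), "fastq", fasta_out, "fasta")
fasta_text = fasta_out.getvalue()
print(fasta_text)

# FASTA -> FASTQ fails: there are no quality scores to write
reverse_failed = False
try:
    SeqIO.convert(StringIO(fasta_text), "fasta", StringIO(), "fastq")
except ValueError as err:
    reverse_failed = True
    print("FASTA to FASTQ failed:", err)
```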
06:15	For more information about the convert function, type the help command.
06:21	Press Enter.
06:24	Press 'q' on the keyboard to get back to the prompt.
06:28	We can also extract individual genes from the HIV genome in GenBank format.
06:35	These individual genes can be saved in FASTA or any other format.
06:41	For this, type the following code at the prompt.
06:47	This code writes the sequence of each individual CDS (coding sequence) feature, together with its id and gene name, to a file.
06:56	The file is saved as “HIV_geneseq.fasta” in your home folder. Press Enter.
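The code typed here appears only on screen. One possible sketch, assuming Python 3 and a recent Biopython, walks the features of the parsed GenBank record and writes each CDS as its own FASTA record. It assumes each CDS carries `protein_id` and `gene` qualifiers, as NCBI GenBank records usually do, and falls back to placeholder names when they are missing; the function names are hypothetical.

```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def cds_records(genome):
    """Yield one SeqRecord per CDS feature of a parsed GenBank record."""
    for index, feature in enumerate(genome.features):
        if feature.type != "CDS":
            continue
        qualifiers = feature.qualifiers
        # Assumed qualifiers; placeholder names are used if they are missing
        cds_id = qualifiers.get("protein_id", ["cds_%d" % index])[0]
        gene_name = qualifiers.get("gene", ["unknown_gene"])[0]
        # extract() follows the feature's location, including strand and joins
        yield SeqRecord(feature.extract(genome.seq), id=cds_id, description=gene_name)


def write_cds_fasta(genbank_path, fasta_path):
    """Hypothetical helper: write every CDS in a one-record GenBank file as FASTA."""
    genome = SeqIO.read(genbank_path, "genbank")
    return SeqIO.write(cds_records(genome), fasta_path, "fasta")
```

For the tutorial's files this would be `write_cds_fasta("HIV.gb", "HIV_geneseq.fasta")`. Using `feature.extract` rather than slicing by hand means features on the reverse strand or split across joined locations are still extracted correctly.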
07:07	Using Biopython tools, we can sort the records in a file by length.
07:12	Here, I have opened a FASTA file, “hemoglobin.fasta”, which has six records.
07:19	Each record has a different length.
07:23	Type the following lines to sort the records with the longest record first.
07:27	The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your home folder.
07:38	To put the shortest records first, reverse the arguments in the records.sort command.
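The `records.sort` line typed here appears only on screen. Reversing its arguments suggests a Python 2 style comparison function, which Python 3 no longer accepts. This sketch, assuming Python 3 and a recent Biopython, sorts with `key=len` and uses a `longest_first` flag in place of swapping arguments; the function names are hypothetical.

```python
from Bio import SeqIO


def sort_by_length(records, longest_first=True):
    """Return a new list of records sorted by sequence length."""
    # len(record) is the sequence length; reverse=True puts the longest first
    return sorted(records, key=len, reverse=longest_first)


def write_sorted_fasta(in_path, out_path, longest_first=True):
    """Hypothetical helper: read a FASTA file, sort it by length, write it out."""
    records = SeqIO.parse(in_path, "fasta")
    return SeqIO.write(sort_by_length(records, longest_first), out_path, "fasta")
```

`write_sorted_fasta("hemoglobin.fasta", "sorted_hemoglobin.fasta")` puts the longest record first, and passing `longest_first=False` puts the shortest first. Note that `sorted` holds every record in memory, which is fine for six records but not for very large files.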
07:45	Let's summarize.
07:46	In this tutorial, we have learnt to: * create sequence record objects
07:51	* write sequence files using the write function of the SeqIO (sequence input/output) module
07:58	* convert between sequence file formats using the convert function
08:03	* and sort records in a file by length.
08:07	For the assignment:
08:09	Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV.
08:21	The file “HIV.gb” is included in the code files of this tutorial.
08:28	Your completed assignment will have the following code.
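The completed assignment code is shown only on screen. One possible approach, not the official solution, is sketched below, assuming Python 3, a recent Biopython, and that positions 4587 to 5165 are 1-based and inclusive, as in GenBank feature tables, on the forward strand. Python slices are 0-based and exclude the end, so the region is `seq[4586:5165]`, which is 5165 - 4587 + 1 = 579 bases long. The function names and the output file name `HIV1gp3.fasta` are hypothetical.

```python
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def extract_region(genome, start, end, new_id, description=""):
    """Return positions start..end of genome (1-based, inclusive) as a SeqRecord."""
    # 1-based inclusive start..end corresponds to the 0-based slice [start - 1:end]
    return SeqRecord(genome.seq[start - 1:end], id=new_id, description=description)


def write_hiv1gp3(genbank_path="HIV.gb", fasta_path="HIV1gp3.fasta"):
    """Hypothetical helper: save positions 4587..5165 of the genome as FASTA."""
    genome = SeqIO.read(genbank_path, "genbank")
    gene = extract_region(genome, 4587, 5165, "HIV1gp3")
    return SeqIO.write(gene, fasta_path, "fasta")
```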
08:43	The video at the following link summarizes the Spoken Tutorial project.
08:48	Please download and watch it.
08:49	The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test.
08:57	For more details, please write to us.
09:00	The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
09:06	More information on this mission is available at the link shown.
09:10	This is Snehalatha from IIT Bombay, signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Pratik kamble, Priyacst, Sandhya.np14
