Difference between revisions of "Biopython/C2/Writing-Sequence-Files/English-timed"

Revision as of 16:26, 4 August 2016

Time	Narration
00:01	Hello everyone.
00:02	Welcome to this tutorial on Writing Sequence Files.
00:07	In this tutorial, we will learn: * How to create Sequence Record Objects
00:13	* Write sequence files
00:15	* Convert between file formats
00:19	* And, sort records in a file by length.
00:23	To follow this tutorial, you should be familiar with
00:27	undergraduate Biochemistry or Bioinformatics
00:31	and basic Python programming.
00:34	Refer to the Python tutorials at the given link.
00:38	To record this tutorial, I am using: * Ubuntu OS version 14.10
00:45	* Python version 2.7.8
00:48	* IPython interpreter version 2.3.0 and * Biopython version 1.64.
00:55	Earlier, we learnt about the parse and read functions, which read the contents of a file.
01:03	In this tutorial, we will learn how to use the write function to write sequences to a file.
01:09	We will also use the convert function to convert between various file formats.
01:16	Let me now demonstrate how to use the write function.
01:20	Here is a text file with a protein sequence.
01:24	The sequence shown here is that of the insulin protein.
01:28	The file also has information such as the GI accession number and a description.
01:36	We will now create a file for this sequence in FASTA format.
01:41	The first step is to create a sequence record object.
01:45	More information about Sequence Record Objects:
01:49	It is the basic data type for the sequence input/output interface.
01:55	In a sequence record object, a sequence is associated with higher-level features such as identifiers and descriptions.
02:04	Open the terminal by pressing the Ctrl, Alt and t keys simultaneously.
02:10	At the prompt, type: ipython, press Enter.
02:15	At the prompt, type the following lines:
02:18	from Bio dot Seq module import Seq class.
02:24	from Bio dot SeqRecord module import Sequence Record class
02:31	Next, from Bio dot Alphabet module import generic protein class.
02:38	Next, I will save the sequence record object in a variable record1.
02:45	Copy the sequence, id and description from the text file and paste it in the respective lines on the terminal.
02:56	Press Enter.
02:58	To view the output, type: record1.
03:02	Press Enter.
03:04	The output shows the insulin protein sequence as a sequence record object.
03:10	It shows the sequence along with id and description.
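The steps narrated above can be sketched as follows. This is a minimal illustration, not the exact lines typed in the recording: the sequence is a short placeholder peptide, and the id and description are made-up values standing in for what you would copy from the text file. The try/except is an addition for readers on newer releases, since Bio.Alphabet was removed in Biopython 1.78.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder values (assumptions): replace them with the sequence, id and
# description copied from your own text file.
protein = "MALWMRLLPLLALLALWGPDPAAA"

try:
    # Biopython releases before 1.78, such as the 1.64 used in this tutorial
    from Bio.Alphabet import generic_protein
    seq = Seq(protein, generic_protein)
except ImportError:
    # Bio.Alphabet no longer exists in Biopython 1.78 and later
    seq = Seq(protein)

# Associate the sequence with an identifier and a description
record1 = SeqRecord(seq, id="example_id", description="placeholder description")

print(record1)
```

Typing record1 on its own line at the IPython prompt shows the same object in its repr form.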
03:13	We will use the write function to write the above sequence record object to a FASTA file.
03:21	Import the SeqIO module from the Bio package.
03:26	Next, type the command that uses the write function to write the sequence record object to a FASTA file.
03:40	The write function takes 3 arguments.
03:44	The first one is the variable storing the sequence record object.
03:49	The second is the name of the file to write to.
03:54	The third is the file format to write. Press Enter.
03:58	The output shows one, that is, we have written one sequence record object to a FASTA file.
04:07	The file in FASTA format is saved in the home folder as "example.fasta".
04:13	Let me warn you,
04:14	the output will over-write any pre-existing file of the same name.
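A minimal sketch of the write step described above, assuming a placeholder record rather than the insulin record from the recording. The file is written to a scratch directory here, precisely because SeqIO.write replaces an existing file of the same name without asking; in the tutorial the second argument would simply be "example.fasta".

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder record (assumption): stands in for record1 from the tutorial
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),
    id="example_id",
    description="placeholder description",
)

# Scratch location so that no existing example.fasta is overwritten
out_path = os.path.join(tempfile.mkdtemp(), "example.fasta")

# Arguments: the record(s), the output file name, the output format.
# The return value is the number of records written.
count = SeqIO.write(record1, out_path, "fasta")
print(count)

# Read the file back to see the FASTA text
with open(out_path) as handle:
    fasta_text = handle.read()
print(fasta_text)
```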
04:18	To view the file, navigate to the file in the home folder.
04:24	Open this file in a text editor.
04:27	The protein sequence is now in FASTA format.
04:31	Close the text editor.
04:33	Many bioinformatics tools take different input file formats.
04:38	So, sometimes there is a need to inter-convert between sequence file formats.
04:44	We can do file conversions using the convert function in the SeqIO module.
04:50	For demonstration, I will convert a GenBank file to a FASTA file.
04:55	I have a GenBank file in my home folder.
04:59	Let me open this in a text editor.
05:02	The file contains the HIV genome in GenBank format.
05:07	This GenBank file has descriptions of all the genes in the genome, in the first part of the file.
05:14	It is followed by the complete genome sequence.
05:18	Close the text editor.
05:19	Type the following lines on the terminal.
05:23	Here the convert function converts the complete genome sequence present in the GenBank file to a FASTA file. Press Enter.
05:33	The new file in the FASTA format is now saved as HIV.fasta in the home folder.
05:39	Navigate to the file and open it in the text editor.
05:46	Close the text editor.
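The conversion step can be sketched as below. Since HIV.gb may not be at hand, the example first writes a tiny made-up GenBank record (toy.gb, a hypothetical stand-in) and then converts it; with the tutorial's file the call would take "HIV.gb" and "HIV.fasta" as the file names. The molecule_type annotation is an assumption about your installation: it is required by the GenBank writer in Biopython 1.78 and later.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

workdir = tempfile.mkdtemp()
gb_path = os.path.join(workdir, "toy.gb")        # stand-in for HIV.gb
fasta_path = os.path.join(workdir, "toy.fasta")  # stand-in for HIV.fasta

# A made-up record saved in GenBank format so the example is self-contained
toy = SeqRecord(
    Seq("ATGGCCTAAGGGATGAAATGA"),
    id="TOY0001.1",
    name="TOY0001",
    description="toy genome",
    annotations={"molecule_type": "DNA"},  # needed by the GenBank writer in 1.78+
)
SeqIO.write(toy, gb_path, "genbank")

# Arguments: input file, input format, output file, output format.
# The return value is the number of records converted.
count = SeqIO.convert(gb_path, "genbank", fasta_path, "fasta")
print(count)

with open(fasta_path) as handle:
    fasta_text = handle.read()
print(fasta_text)
```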
05:49	Even though we can convert file formats easily using the convert function, it has limitations.
05:56	Writing some formats requires information which other file formats don’t contain.
06:02	For example: We can convert a GenBank file to a FASTA file, but we can't do the reverse.
06:09	Similarly, we can turn a FASTQ file into a FASTA file but can’t do the reverse.
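The FASTQ case can be illustrated with a short sketch, assuming a single made-up read held in memory rather than a real file. Converting to FASTA simply drops the quality scores; going back fails because a FASTA record has no quality scores for the FASTQ writer to use.

```python
from io import StringIO

from Bio import SeqIO

# One made-up FASTQ read (assumption): 4 bases, each with quality character "I"
fastq_text = "@read1\nACGT\n+\nIIII\n"

# FASTQ to FASTA works: the quality line is discarded
fasta_out = StringIO()
n = SeqIO.convert(StringIO(fastq_text), "fastq", fasta_out, "fasta")
fasta_text = fasta_out.getvalue()
print(n)
print(fasta_text)

# FASTA to FASTQ fails: the records carry no quality scores
try:
    SeqIO.convert(StringIO(fasta_text), "fasta", StringIO(), "fastq")
    reverse_ok = True
except ValueError as err:
    reverse_ok = False
    print("FASTA to FASTQ failed:", err)
```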
06:15	For more information regarding the convert function, type the help command.
06:21	Press Enter.
06:24	Press 'q' on the keyboard to get back to the prompt.
06:28	We can also extract individual genes from the HIV genome in GenBank format.
06:35	These individual genes can be saved in FASTA or any other formats.
06:41	For this, type the following code at the prompt.
06:47	This code will write all individual CDS gene sequences, their ids and the gene names to a file.
06:56	The file is saved as “HIV_geneseq.fasta” in your home folder. Press Enter.
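One way such an extraction could look is sketched below. It is not the code from the recording: the genome is a made-up 21-base record with two hypothetical CDS features (toyA and toyB), built in memory so the example runs on its own. With the tutorial's data, the record would instead come from reading HIV.gb in "genbank" format, and the output would go to "HIV_geneseq.fasta".

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Made-up genome with two CDS features (assumption, stands in for HIV.gb)
genome = SeqRecord(Seq("ATGGCCTAAGGGATGAAATGA"), id="TOY0001.1", description="toy genome")
genome.features = [
    SeqFeature(FeatureLocation(0, 21), type="source"),
    SeqFeature(FeatureLocation(0, 9, strand=1), type="CDS", qualifiers={"gene": ["toyA"]}),
    SeqFeature(FeatureLocation(12, 21, strand=1), type="CDS", qualifiers={"gene": ["toyB"]}),
]

gene_records = []
for feature in genome.features:
    if feature.type != "CDS":
        continue  # skip everything that is not a coding sequence
    # Qualifier values are lists; fall back to "unknown" if no gene name is given
    gene_name = feature.qualifiers.get("gene", ["unknown"])[0]
    # Cut the feature's span out of the parent sequence
    gene_seq = feature.extract(genome.seq)
    gene_records.append(SeqRecord(gene_seq, id=genome.id, description=gene_name))

# Written to an in-memory handle here; a file name works the same way
out = StringIO()
count = SeqIO.write(gene_records, out, "fasta")
print(count)
print(out.getvalue())
```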
07:07	Using Biopython tools, we can sort the records in a file by length.
07:12	Here, I have opened a FASTA file “hemoglobin.fasta” which has six records.
07:19	Each record is of a different length.
07:23	Type the following lines to arrange the longest record first.
07:27	The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your home folder.
07:38	To put the shortest records first, reverse the order of the arguments in the records.sort command.
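A sketch of the sorting step, assuming three made-up records in place of the six in hemoglobin.fasta. The narration's "reverse the arguments" suggests the recording used a Python 2 comparison function; that form of sort is not available in Python 3, so this version uses a key function with the reverse flag instead.

```python
from io import StringIO

from Bio import SeqIO

# Three made-up records of different lengths (assumption, stands in for
# hemoglobin.fasta)
fasta_text = ">short\nMVL\n>long\nMVLSPADKTNVK\n>medium\nMVLSPAD\n"

# Load every record into a list so it can be sorted in memory
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Longest record first; use reverse=False for shortest first
records.sort(key=lambda rec: len(rec.seq), reverse=True)

# Written to an in-memory handle here; in the tutorial the target
# would be the file name "sorted_hemoglobin.fasta"
out = StringIO()
count = SeqIO.write(records, out, "fasta")
sorted_ids = [rec.id for rec in records]
print(sorted_ids)
```

Loading all records into a list is fine for small files like this one; very large files would need a different approach.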
07:45	Let's summarize.
07:46	In this tutorial, we have learnt: * To create Sequence Record Objects
07:51	* Write sequence files using the write function of the Sequence Input/Output module.
07:58	* Convert between sequence file formats using the convert function.
08:03	* And, sort records in a file by length.
08:07	For the assignment:
08:09	Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV.
08:21	The file “HIV.gb” is included in the code files of this tutorial.
08:28	Your completed assignment will have the following code.
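The code shown in the recording is not reproduced in this transcript. As a hint only, the sketch below shows how positions map onto Python slices, assuming the positions in the assignment are 1-based and inclusive, as in GenBank coordinates. The helper slice_by_position and the toy sequence are hypothetical.

```python
from Bio.Seq import Seq


def slice_by_position(seq, start, end):
    """Return the residues from 1-based position start to end, inclusive."""
    # Python slices are 0-based and exclude the end index
    return seq[start - 1:end]


# Made-up sequence (assumption) to show the index arithmetic
toy = Seq("AACCGGTTAACC")
fragment = slice_by_position(toy, 3, 6)
print(fragment)
```

Under the same assumption, applying slice_by_position to the sequence of the record read from HIV.gb with positions 4587 and 5165 would return 579 bases.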
08:43	The video at the following link summarizes the Spoken Tutorial project.
08:48	Please download and watch it.
08:49	The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test.
08:57	For more details, please write to us.
09:00	The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
09:06	More information on this mission is available at the link shown.
09:10	This is Snehalatha from IIT Bombay, signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Pratik kamble, Priyacst, Sandhya.np14

@@ Line 14: / Line 14: @@
 |-
 | 00:07
-| In this tutorial, we will learn: * How to create Sequence Record Objects
+| In this tutorial, we will learn: * How to create '''Sequence Record Objects'''
 |-
@@ Line 22: / Line 22: @@
 |-
 | 00:15
-|* Convert between file formats
+|* Convert between '''file format'''s
 |-
 | 00:19
-|*  And, sort records in a file by length.
+|*  And, sort '''record'''s in a file by length.
 |-
@@ Line 81: / Line 81: @@
 |-
 |01:28
-|The file also has information such as '''GI accession number''' and also description.
+|The file also has information such as '''GI accession number''' and also '''description'''.
 |-
@@ Line 125: / Line 125: @@
 |-
 |02:31
-| Next '''from Bio dot Alphabet module import generic protein class'''
+| Next, '''from Bio dot Alphabet module import generic protein class'''.
 |-
@@ Line 133: / Line 133: @@
 |-
 |02:45
-|'''Copy''' the sequence, id and description from the text file and '''paste''' in the respective lines on the terminal.
+|'''Copy''' the '''sequence, id''' and '''description''' from the text file and '''paste''' it in the respective lines on the terminal.
 |-
@@ Line 153: / Line 153: @@
 |-
 |03:10
-|It shows the sequence along with id and description.
+|It shows the sequence along with '''id''' and '''description'''.
 |-
@@ Line 173: / Line 173: @@
 |-
 |03:44
-|The first one is the variable storing the sequence record object.
+|The first one is the variable storing the '''sequence record object'''.
 |-
@@ Line 185: / Line 185: @@
 |-
 | 03:58
-|The Output shows “one”, that is, we have converted one '''sequence record object''' to a '''FASTA''' file.
+|The Output shows one, that is, we have converted one '''sequence record object''' to a '''FASTA''' file.
 |-
@@ Line 197: / Line 197: @@
 |-
 |04:14
-|the output will over-write any pre existing file of the same name.
+|the output will over-write any pre-existing file of the same name.
 |-
@@ Line 264: / Line 264: @@
 |-
 |05:33
-|The new file in '''FASTA''' format is now saved as '''HIV.fasta''' in the '''home''' folder.
+|The new file in the '''FASTA''' format is now saved as '''HIV.fasta''' in the '''home''' folder.
 |-
@@ Line 316: / Line 316: @@
 |-
 |06:47
-|This code will write all individual '''CDS''' gene sequences , their ids and name of the '''gene''' in a file.
+|This code will write all individual '''CDS''' gene sequences, their ids and name of the '''gene''' in a file.
 |-
 |06:56
-|The file is saved as “'''HIV_geneseq.fasta'''” in your '''home''' folder. Press '''Enter'''.
+|The file is saved as “HIV_geneseq.fasta” in your '''home''' folder. Press '''Enter'''.
 |-
@@ Line 351: / Line 351: @@
 |-
 |07:46
-|In this tutorial, we have learnt :* to create Sequence Record Objects.
+|In this tutorial, we have learnt :* to create Sequence Record Objects
 |-
 |07:51
-|* Write sequence files using''' write''' function of Sequence Input/Output module.
+|* Write sequence files using''' write''' function of '''Sequence Input/Output''' module.
 |-
@@ Line 375: / Line 375: @@
 |-
 |08:21
-|The file “HIV.gb” is included in code files of this tutorial.
+|The file “HIV.gb” is included in the code files of this tutorial.
 |-
