Biopython/C2/Writing-Sequence-Files/English-timed

Time	Narration
00:01	Hello everyone. Welcome to this tutorial on Writing Sequence Files.
00:07	In this tutorial, we will learn how to: Create Sequence Record Objects
00:13	Write sequence files
00:15	Convert between file formats
00:19	And, sort records in a file by length.
00:23	To follow this tutorial, you should be familiar with
00:27	undergraduate Biochemistry or Bioinformatics
00:31	and basic Python programming.
00:34	Refer to the Python tutorials at the given link.
00:38	To record this tutorial, I am using: Ubuntu OS version 14.10
00:45	Python version 2.7.8
00:48	IPython interpreter version 2.3.0 and Biopython version 1.64.
00:55	We have earlier learnt about the parse and read functions, which read the contents of a file.
01:03	In this tutorial, we will learn how to use the write function to write sequences to a file.
01:09	We will also use the convert function to convert between various file formats.
01:16	Let me now demonstrate how to use the write function.
01:20	Here is a text file with a protein sequence.
01:24	The sequence shown here is insulin protein.
01:28	The file also has information such as the GI accession number and a description.
01:36	We will now create a file for this sequence in FASTA format.
01:41	The first step is to create a sequence record object.
01:45	More information about Sequence Record Objects:
01:49	It is the basic data type for the sequence input/output interface.
01:55	In a sequence record object, a sequence is associated with higher-level features such as identifiers and descriptions.
02:04	Open the terminal by pressing the Ctrl, Alt and t keys simultaneously.
02:10	At the prompt, type: ipython, press Enter.
02:15	At the prompt, type the following lines:
02:18	from Bio dot Seq module import Seq class.
02:24	from Bio dot SeqRecord module import Sequence Record class
02:31	Next, from Bio dot Alphabet module import generic protein class.
02:38	Next, I will save the sequence record object in a variable record1.
02:45	Copy the sequence, id and description from the text file and paste it in the respective lines on the terminal.
02:56	Press Enter.
02:58	To view the output, type: record1.
03:02	Press Enter.
03:04	The output shows the insulin protein sequence as a sequence record object.
03:10	It shows the sequence along with id and description.
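The lines typed on screen in this step are not reproduced in the narration. A minimal sketch of what they could look like follows. The sequence, id and description are placeholders, not the values from the tutorial's text file. The narration also imports generic_protein from Bio.Alphabet; that module was removed in Biopython 1.78, so this sketch leaves the alphabet out, which should run on both older and newer releases.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder values: replace them with the sequence, id and description
# copied from the text file used in the tutorial.
record1 = SeqRecord(
    Seq("FVNQHLCGSHLVEALYLVCGERGFFYTPKT"),
    id="placeholder_id",
    description="placeholder description",
)

# Inspect the pieces that make up the sequence record object.
print(record1.id)
print(record1.description)
print(record1.seq)
```

Typing record1 on its own at the IPython prompt, as in the narration, shows the same fields in a single summary.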
03:13	We will use the write function to convert the above sequence record object to a FASTA file.
03:21	Import the SeqIO module from the Bio package.
03:26	Next, type the command with the write function to convert the sequence record object to a FASTA file.
03:40	The write function takes 3 arguments.
03:44	The first one is the variable storing the sequence record object.
03:49	The second is the file name to write the FASTA file.
03:54	The third is the file format to write. Press Enter.
03:58	The output shows 1, that is, we have converted one sequence record object to a FASTA file.
04:07	The file in FASTA format is saved in the home folder as "example.fasta".
04:13	Let me warn you, the output will overwrite any pre-existing file of the same name.
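The write command described above could look like the following sketch. The record is rebuilt with placeholder values so that the example runs on its own; only the file name "example.fasta" and the three-argument call come from the narration.

```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder record; in the tutorial this is the insulin record1.
record1 = SeqRecord(
    Seq("FVNQHLCGSHLVEALYLVCGERGFFYTPKT"),
    id="placeholder_id",
    description="placeholder description",
)

# Arguments: the record (or records), the output file name, the output format.
# The return value is the number of records written.
# Note: an existing "example.fasta" in the current folder is overwritten.
count = SeqIO.write(record1, "example.fasta", "fasta")
print(count)
```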
04:18	To view the file, navigate to the file in the home folder.
04:24	Open this file in a text editor.
04:27	The protein sequence is now in FASTA format.
04:31	Close the text editor.
04:33	Many bioinformatics tools take different input file formats.
04:38	So, sometimes there is a need to inter-convert between sequence file formats.
04:44	We can do file conversions using the convert function in the SeqIO module.
04:50	For demonstration, I will convert a GenBank file to a FASTA file.
04:55	I have a GenBank file in my home folder.
04:59	Let me open this in a text editor.
05:02	The file contains the HIV genome in GenBank format.
05:07	This GenBank file has descriptions of all the genes in the genome, in the first part of the file.
05:14	It is followed by a complete genome sequence.
05:18	Close the text editor.
05:19	Type the following lines on the terminal.
05:23	Here, the convert function converts the complete genome sequence present in the GenBank file to a FASTA file. Press Enter.
05:33	The new file in the FASTA format is now saved as HIV.fasta in the home folder.
05:39	Navigate to the file and open in the text editor.
05:46	Close the text editor.
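The conversion lines themselves are not part of the narration. A minimal sketch, assuming the input file is named "HIV.gb" (the name given for the code files later in this tutorial), is shown below. The helper name genbank_to_fasta is made up for this example.

```python
import os
from Bio import SeqIO


def genbank_to_fasta(gb_path, fasta_path):
    """Convert every record in a GenBank file to FASTA; return the record count."""
    # Argument order: input file, input format, output file, output format.
    return SeqIO.convert(gb_path, "genbank", fasta_path, "fasta")


# Runs only when the tutorial's GenBank file is in the current folder.
if os.path.exists("HIV.gb"):
    print(genbank_to_fasta("HIV.gb", "HIV.fasta"))
```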
05:49	Even though we can convert file formats easily using the convert function, it has limitations.
05:56	Writing some formats requires information which other file formats don’t contain.
06:02	For example, we can convert a GenBank file to a FASTA file, but we can't do the reverse.
06:09	Similarly, we can turn a FASTQ file into a FASTA file but can’t do the reverse.
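This limitation can be checked with a small experiment. The sketch below writes a one-record FASTQ file invented for the example into a temporary folder, converts it to FASTA, and then attempts the reverse conversion. It assumes that Biopython reports the missing quality scores as a ValueError.

```python
import os
import tempfile
from Bio import SeqIO

workdir = tempfile.mkdtemp()
fastq_path = os.path.join(workdir, "demo.fastq")
fasta_path = os.path.join(workdir, "demo.fasta")
back_path = os.path.join(workdir, "back.fastq")

# A one-record FASTQ file made up for this example.
with open(fastq_path, "w") as handle:
    handle.write("@read1\nACGTACGT\n+\nIIIIIIII\n")

# FASTQ to FASTA works: the quality line is simply dropped.
forward_count = SeqIO.convert(fastq_path, "fastq", fasta_path, "fasta")

# FASTA to FASTQ fails: a FASTA record carries no quality scores.
try:
    SeqIO.convert(fasta_path, "fasta", back_path, "fastq")
    reverse_error = None
except ValueError as err:
    reverse_error = str(err)

print(forward_count)
print(reverse_error)
```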
06:15	For more information regarding the convert function, type the help command.
06:21	Press Enter.
06:24	Press 'q' on the keyboard to get back to the prompt.
06:28	We can also extract individual genes from the HIV genome in GenBank format.
06:35	These individual genes can be saved in FASTA or any other formats.
06:41	For this, type the following code at the prompt.
06:47	This code will write all the individual CDS gene sequences, along with their ids and gene names, to a file.
06:56	The file is saved as “HIV_geneseq.fasta” in your home folder. Press Enter.
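The code typed in this step is not included in the narration. One possible sketch is given below. It assumes that the GenBank file is named "HIV.gb", that it holds a single record, and that each CDS feature stores its gene name in the gene qualifier. The helper name cds_records and the "unknown_gene" fallback are made up for this example.

```python
import os
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def cds_records(record):
    """Yield one SeqRecord per CDS feature of a parsed GenBank record."""
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # The gene qualifier may be absent, so fall back to a placeholder.
        gene = feature.qualifiers.get("gene", ["unknown_gene"])[0]
        # extract() returns the part of the genome covered by the feature.
        yield SeqRecord(feature.extract(record.seq), id=record.id, description=gene)


# Runs only when the tutorial's GenBank file is in the current folder.
if os.path.exists("HIV.gb"):
    genome = SeqIO.read("HIV.gb", "genbank")
    print(SeqIO.write(cds_records(genome), "HIV_geneseq.fasta", "fasta"))
```

Each FASTA header then carries the genome id followed by the gene name.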
07:07	Using Biopython tools, we can sort the records in a file by length.
07:12	Here, I have opened a FASTA file “hemoglobin.fasta” which has six records.
07:19	Each record is of a different length.
07:23	Type the following lines to arrange the longest record first.
07:27	The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your home folder.
07:38	For short records first, reverse the arguments in the records.sort command line.
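The sorting lines are not reproduced in the narration. The sketch below shows one way to do it, using the file names given above. It sorts with key=len and a reverse flag, which may differ from the records.sort call typed in the video. The helper name sort_fasta_by_length and its longest_first parameter are made up for this example.

```python
import os
from Bio import SeqIO


def sort_fasta_by_length(in_path, out_path, longest_first=True):
    """Write the records of a FASTA file to out_path, ordered by length."""
    # Load all records into memory; fine for a small file such as this one.
    records = list(SeqIO.parse(in_path, "fasta"))
    # len(record) is the length of the record's sequence.
    records.sort(key=len, reverse=longest_first)
    return SeqIO.write(records, out_path, "fasta")


# Runs only when the tutorial's FASTA file is in the current folder.
if os.path.exists("hemoglobin.fasta"):
    print(sort_fasta_by_length("hemoglobin.fasta", "sorted_hemoglobin.fasta"))
```

Passing longest_first=False places the shortest record first.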
07:45	Let's summarize. In this tutorial, we have learnt to: Create Sequence Record Objects
07:51	Write sequence files using the write function of the Sequence Input/Output module.
07:58	Convert between sequence file formats using convert function.
08:03	And, sort records in a file by length.
08:07	For the assignment:
08:09	Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV.
08:21	The file “HIV.gb” is included in the code files of this tutorial.
08:28	Your completed assignment will have the following code.
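The assignment code shown in the video is not part of the narration. The sketch below is one possible approach, not necessarily the one on screen. It assumes that positions 4587 to 5165 are 1-based and inclusive, as in GenBank feature locations, and that "HIV.gb" holds a single record. The helper name slice_region is made up for this example.

```python
import os
from Bio import SeqIO


def slice_region(record, start, end):
    """Return the sub-record for 1-based, inclusive positions start..end."""
    # Python slices are 0-based and exclude the end index.
    return record[start - 1:end]


# Runs only when the tutorial's GenBank file is in the current folder.
if os.path.exists("HIV.gb"):
    genome = SeqIO.read("HIV.gb", "genbank")
    gene = slice_region(genome, 4587, 5165)
    print(gene.seq)
```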
08:43	The video at the following link summarizes the Spoken Tutorial project.
08:48	Please download and watch it. The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test.
08:57	For more details, please write to us.
09:00	The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
09:06	More information on this mission is available at the link shown.
09:10	This is Snehalatha from IIT Bombay, signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Pratik kamble, Priyacst, Sandhya.np14
