Difference between revisions of "Biopython/C2/Writing-Sequence-Files/English-timed"
From Script | Spoken-Tutorial
Sandhya.np14 (Talk | contribs) |
Revision as of 16:26, 4 August 2016
Time | Narration |
---|---|
00:01 | Hello everyone. |
00:02 | Welcome to this tutorial on Writing Sequence Files. |
00:07 | In this tutorial, we will learn: * How to create Sequence Record Objects |
00:13 | * Write sequence files |
00:15 | * Convert between file formats |
00:19 | * And, sort records in a file by length. |
00:23 | To follow this tutorial, you should be familiar with |
00:27 | undergraduate Biochemistry or Bioinformatics |
00:31 | and basic Python programming. |
00:34 | Refer to the Python tutorials at the given link. |
00:38 | To record this tutorial, I am using: * Ubuntu OS version 14.10 |
00:45 | * Python version 2.7.8 |
00:48 | * IPython interpreter version 2.3.0 and * Biopython version 1.64. |
00:55 | We have earlier learnt about parse and read functions to read contents of a file. |
01:03 | In this tutorial, we will learn how to use the write function to write sequences to a file. |
01:09 | And, use the convert function for inter-conversion between various file formats. |
01:16 | Let me now demonstrate how to use the write function. |
01:20 | Here is a text file with a protein sequence. |
01:24 | The sequence shown here is insulin protein. |
01:28 | The file also has information such as GI accession number and also description. |
01:36 | We will now create a file for this sequence in FASTA format. |
01:41 | The first step is to create a sequence record object. |
01:45 | More information about Sequence Record Objects: |
01:49 | It is the basic data type for the sequence input/output interface. |
01:55 | In a sequence record object, a sequence is associated with higher level features such as identifiers and descriptions. |
02:04 | Open the terminal by pressing Ctrl, Alt and t keys simultaneously. |
02:10 | At the prompt, type: ipython, press Enter. |
02:15 | At the prompt, type the following lines: |
02:18 | from Bio dot Seq module import Seq class. |
02:24 | from Bio dot SeqRecord module import Sequence Record class. |
02:31 | Next, from Bio dot Alphabet module import generic protein class. |
02:38 | Next, I will save the sequence record object in a variable record1. |
02:45 | Copy the sequence, id and description from the text file and paste it in the respective lines on the terminal. |
02:56 | Press Enter. |
02:58 | To view the output, type: record1. |
03:02 | Press Enter. |
03:04 | The output shows the insulin protein sequence as a sequence record object. |
03:10 | It shows the sequence along with id and description. |
03:13 | We will use the write function to convert the above sequence record object to a FASTA file. |
03:21 | Import SeqIO module from Bio package. |
03:26 | Next, type the command line with a write function to convert the sequence record object to a FASTA file. |
03:40 | The write function takes 3 arguments. |
03:44 | The first one is the variable storing the sequence record object. |
03:49 | The second is the file name to write the FASTA file. |
03:54 | The third is the file format to write. Press Enter. |
03:58 | The Output shows one, that is, we have converted one sequence record object to a FASTA file. |
04:07 | The file in FASTA format is saved in the home folder as "example.fasta". |
04:13 | Let me warn you, |
04:14 | the output will over-write any pre-existing file of the same name. |
04:18 | To view the file, navigate to the file in the home folder. |
04:24 | Open this file in a text editor. |
04:27 | The protein sequence is now in FASTA format. |
04:31 | Close the text editor. |
04:33 | Many bioinformatics tools take different input file formats. |
04:38 | So, sometimes there is a need to inter-convert between sequence file formats. |
04:44 | We can do file conversions using convert function in SeqIO module. |
04:50 | For demonstration, I will convert a GenBank file to a FASTA file. |
04:55 | I have a GenBank file in my home folder. |
04:59 | Let me open this in a text editor. |
05:02 | The file contains HIV genome in GenBank format. |
05:07 | This GenBank file has descriptions of all the genes in the genome, in the first part of the file. |
05:14 | It is followed by a complete genome sequence. |
05:18 | Close the text editor. |
05:19 | Type the following lines on the terminal. |
05:23 | Here, the convert function converts the complete genome sequence present in the GenBank file to a FASTA file. Press Enter. |
05:33 | The new file in the FASTA format is now saved as HIV.fasta in the home folder. |
05:39 | Navigate to the file and open it in a text editor. |
05:46 | Close the text editor. |
05:49 | Even though we can convert the file formats easily using convert function, it has limitations. |
05:56 | Writing some formats requires information which other file formats don’t contain. |
06:02 | For example: We can convert a GenBank file to a FASTA file, but we can't do the reverse. |
06:09 | Similarly, we can turn a FASTQ file into a FASTA file but can’t do the reverse. |
06:15 | For more information regarding convert function, type the help command. |
06:21 | Press Enter. |
06:24 | Press 'q' on the keyboard to get back to the prompt. |
06:28 | We can also extract individual genes from the HIV genome in GenBank format. |
06:35 | These individual genes can be saved in FASTA or any other formats. |
06:41 | For this, type the following code at the prompt. |
06:47 | This code will write all individual CDS gene sequences, their ids and name of the gene in a file. |
06:56 | The file is saved as “HIV_geneseq.fasta” in your home folder. Press Enter. |
07:07 | Using Biopython tools, we can sort the records in a file by length. |
07:12 | Here, I have opened a FASTA file “hemoglobin.fasta” which has six records. |
07:19 | Each record is of a different length. |
07:23 | Type the following lines to arrange the longest record first. |
07:27 | The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your home folder. |
07:38 | To put the shortest records first, reverse the arguments in the records.sort command line. |
07:45 | Let's summarize. |
07:46 | In this tutorial, we have learnt: * To create Sequence Record Objects |
07:51 | * Write sequence files using the write function of the Sequence Input/Output module. |
07:58 | * Convert between sequence file formats using convert function. |
08:03 | * And, sort records in a file by length. |
08:07 | For the assignment: |
08:09 | Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV. |
08:21 | The file “HIV.gb” is included in the code files of this tutorial. |
08:28 | Your completed assignment will have the following code. |
08:43 | The video at the following link summarizes the Spoken Tutorial project. |
08:48 | Please download and watch it. |
08:49 | The Spoken Tutorial Project team conducts workshops and gives certificates for those who pass an online test. |
08:57 | For more details, please write to us. |
09:00 | The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. |
09:06 | More information on this mission is available at the link shown. |
09:10 | This is Snehalatha from IIT Bombay, signing off. Thank you for joining. |
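
The steps narrated between 02:18 and 03:58 (building a sequence record object and writing it with the write function) might look roughly like the sketch below. The sequence, id and description are hypothetical placeholders, not the insulin data pasted in the tutorial, and the file is written to a temporary folder rather than the home folder. The tutorial (Biopython 1.64) also imports generic_protein from Bio.Alphabet; that module was removed in Biopython 1.78, so it is left out here.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder sequence, id and description (hypothetical values; the
# tutorial pastes the real insulin data from its text file instead).
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),
    id="example_id",
    description="placeholder protein",
)

# Write into a temporary folder so that no existing file is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), "example.fasta")

# The three arguments: the record, the output file name, the output format.
count = SeqIO.write(record1, out_path, "fasta")
print(count)  # number of records written

# Read the file back to inspect the FASTA text.
with open(out_path) as handle:
    fasta_text = handle.read()
print(fasta_text)
```

The value returned by SeqIO.write is the number of records written, which is the "one" mentioned at 03:58.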
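
The convert function described between 04:44 and 06:09 takes an input file, its format, an output file and the output format. As a minimal sketch, the example below converts a small hand-written FASTQ file (two made-up reads, an assumption for illustration) to FASTA, which is one of the conversions the narration says is possible. For the tutorial's GenBank file, the call would presumably be SeqIO.convert("HIV.gb", "genbank", "HIV.fasta", "fasta").

```python
import os
import tempfile

from Bio import SeqIO

workdir = tempfile.mkdtemp()
fastq_path = os.path.join(workdir, "reads.fastq")  # hypothetical file name
fasta_path = os.path.join(workdir, "reads.fasta")  # hypothetical file name

# A tiny FASTQ file with two made-up reads and uniform quality scores.
with open(fastq_path, "w") as handle:
    handle.write("@read1\nACGTACGT\n+\nIIIIIIII\n")
    handle.write("@read2\nGGCC\n+\nIIII\n")

# Arguments: input file, input format, output file, output format.
count = SeqIO.convert(fastq_path, "fastq", fasta_path, "fasta")
print(count)  # number of records converted

# Parse the new FASTA file to confirm what was written.
records = list(SeqIO.parse(fasta_path, "fasta"))
for rec in records:
    print(rec.id, rec.seq)
```

The reverse direction fails because FASTA carries no quality scores, which is the limitation mentioned at 06:09.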
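
Sorting records by length, as described between 07:07 and 07:38, can be sketched as follows. Three made-up records stand in for the six in hemoglobin.fasta, and the file names are hypothetical. The tutorial runs on Python 2 and speaks of reversing the arguments in records.sort, which suggests a comparison-function form; this sketch instead uses key=len with reverse=True, which also works on Python 3.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.fasta")
out_path = os.path.join(workdir, "sorted.fasta")

# Three made-up records of different lengths.
toy_records = [
    SeqRecord(Seq("MVL"), id="short", description=""),
    SeqRecord(Seq("MVLSPADKTN"), id="long", description=""),
    SeqRecord(Seq("MVLSPA"), id="medium", description=""),
]
SeqIO.write(toy_records, in_path, "fasta")

# Load all records into a list, then sort with the longest record first.
records = list(SeqIO.parse(in_path, "fasta"))
records.sort(key=len, reverse=True)
SeqIO.write(records, out_path, "fasta")

# Read the sorted file back to check the order.
sorted_ids = [rec.id for rec in SeqIO.parse(out_path, "fasta")]
print(sorted_ids)
```

Dropping reverse=True puts the shortest record first.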