Difference between revisions of "Biopython/C2/Writing-Sequence-Files/English-timed"
From Script | Spoken-Tutorial
Sandhya.np14 (Talk | contribs) |
Latest revision as of 18:27, 23 March 2017
Time | Narration |
00:01 | Hello everyone. Welcome to this tutorial on Writing Sequence Files. |
00:07 | In this tutorial, we will learn: * How to create Sequence Record Objects |
00:13 | Write sequence files |
00:15 | Convert between file formats |
00:19 | And, sort records in a file by length. |
00:23 | To follow this tutorial, you should be familiar with |
00:27 | undergraduate Biochemistry or Bioinformatics |
00:31 | and basic Python programming. |
00:34 | Refer to the Python tutorials at the given link. |
00:38 | To record this tutorial, I am using: * Ubuntu OS version 14.10 |
00:45 | Python version 2.7.8 |
00:48 | IPython interpreter version 2.3.0 and * Biopython version 1.64. |
00:55 | We have earlier learnt about parse and read functions to read contents of a file. |
01:03 | In this tutorial, we will learn how to use write function to write sequences to a file. |
01:09 | And use the convert function for inter-conversion between various file formats. |
01:16 | Let me now demonstrate how to use write function. |
01:20 | Here is a text file with a protein sequence. |
01:24 | The sequence shown here is insulin protein. |
01:28 | The file also has information such as GI accession number and also description. |
01:36 | We will now create a file for this sequence in FASTA format. |
01:41 | The first step is to create sequence record object. |
01:45 | More information about Sequence Record Objects: |
01:49 | It is the basic data type for the sequence input/output interface. |
01:55 | In sequence record object, a sequence is associated with higher level features such as identifiers and descriptions. |
02:04 | Open the terminal by pressing Ctrl, Alt and T keys simultaneously. |
02:10 | At the prompt, type: ipython, press Enter. |
02:15 | At the prompt, type the following lines: |
02:18 | from Bio dot Seq module import Seq class. |
02:24 | from Bio dot SeqRecord module import Sequence Record class |
02:31 | Next, from Bio dot Alphabet module import generic protein class. |
02:38 | Next, I will save the sequence record object in a variable record1. |
02:45 | Copy the sequence, id and description from the text file and paste it in the respective lines on the terminal. |
02:56 | Press Enter. |
02:58 | To view the output, type: record1. |
03:02 | Press Enter. |
03:04 | The output shows the insulin protein sequence as sequence record object. |
03:10 | It shows the sequence along with id and description. |
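The lines typed at the prompt are not reproduced in this transcript. A minimal sketch of creating a sequence record object could look like the following. The sequence fragment, id and description are placeholders, not the values from the tutorial's text file. The sketch also assumes Python 3 and a recent Biopython release: the recording uses Biopython 1.64, where `generic_protein` from `Bio.Alphabet` is passed as a second argument to `Seq`, but `Bio.Alphabet` was removed in Biopython 1.78, so it is omitted here.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder values: substitute the sequence, id and description
# copied from your own text file.
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),
    id="example_id",
    description="illustrative insulin fragment",
)

# Printing the record shows the id, description and sequence together.
print(record1)
```

Typing `record1` on its own at the IPython prompt, as in the narration, displays the same object in its `repr` form.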
03:13 | We will use write function to convert the above sequence record object to a FASTA file. |
03:21 | Import SeqIO module from Bio package. |
03:26 | Next, type the command line with a write function to convert the sequence object to FASTA file. |
03:40 | The write function takes 3 arguments. |
03:44 | The first one is the variable storing the sequence record object. |
03:49 | The second is the file name to write the FASTA file. |
03:54 | The third is the file format to write. Press Enter. |
03:58 | The output shows 1; that is, we have converted one sequence record object to a FASTA file. |
04:07 | The file in FASTA format is saved in the home folder as "example.fasta". |
04:13 | Let me warn you, the output will overwrite any pre-existing file of the same name. |
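As a hedged sketch of the write step described above, the three arguments to `SeqIO.write` are the record, the output file name and the format. The record contents are placeholders, and the sketch writes into a temporary folder (an assumption made here so that no existing `example.fasta` in your home folder is overwritten).

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder record; in the tutorial this is the insulin record1.
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),
    id="example_id",
    description="illustrative insulin fragment",
)

# A temporary folder is used so that no existing file is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), "example.fasta")

# Arguments: the record(s), the output file name, the format to write.
count = SeqIO.write(record1, out_path, "fasta")
print(count)  # number of records written

with open(out_path) as handle:
    fasta_text = handle.read()
print(fasta_text)
```

The return value is the number of records written, which is why the prompt shows 1 for a single record.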
04:18 | To view the file, navigate to the file in the home folder. |
04:24 | Open this file in a text editor. |
04:27 | The protein sequence is now in FASTA format. |
04:31 | Close the text editor. |
04:33 | Many bioinformatics tools take different input file formats. |
04:38 | So, sometimes there is a need to inter-convert between sequence file formats. |
04:44 | We can do file conversions using convert function in SeqIO module. |
04:50 | For demonstration, I will convert a GenBank file to a FASTA file. |
04:55 | I have a GenBank file in my home folder. |
04:59 | Let me open this in a text editor. |
05:02 | The file contains HIV genome in GenBank format. |
05:07 | This GenBank file has descriptions of all the genes in the genome, in the first part of the file. |
05:14 | It is followed by a complete genome sequence. |
05:18 | Close the text editor. Type the following lines on the terminal. |
05:23 | Here the convert function converts the complete genome sequence present in the GenBank file to FASTA file. Press Enter. |
05:33 | The new file in the FASTA format is now saved as HIV.fasta in the home folder. |
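The conversion step can be sketched as follows. Because the tutorial's `HIV.gb` file is not available here, the sketch first writes a tiny stand-in GenBank file (`demo.gb`, a hypothetical name with made-up contents) into a temporary folder and then converts it. It assumes a recent Biopython release (1.78 or later), where the `molecule_type` annotation is needed to write GenBank output.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

workdir = tempfile.mkdtemp()
gb_path = os.path.join(workdir, "demo.gb")        # stand-in for HIV.gb
fasta_path = os.path.join(workdir, "demo.fasta")  # stand-in for HIV.fasta

# Build a tiny GenBank file so the example is self-contained.
demo = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGCTGA"),
    id="DEMO0001.1",
    name="DEMO0001",
    description="hypothetical demo record",
    annotations={"molecule_type": "DNA"},
)
SeqIO.write(demo, gb_path, "genbank")

# Arguments: input file, input format, output file, output format.
converted = SeqIO.convert(gb_path, "genbank", fasta_path, "fasta")
print(converted)  # number of records converted

with open(fasta_path) as handle:
    fasta_text = handle.read()
print(fasta_text)
```

With the tutorial's own file, the call would take the form `SeqIO.convert("HIV.gb", "genbank", "HIV.fasta", "fasta")`.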
05:39 | Navigate to the file and open it in the text editor. |
05:46 | Close the text editor. |
05:49 | Even though we can convert the file formats easily using convert function, it has limitations. |
05:56 | Writing some formats requires information which other file formats don’t contain. |
06:02 | For example: We can convert a GenBank file to a FASTA file, but we can't do the reverse. |
06:09 | Similarly, we can turn a FASTQ file into a FASTA file but can’t do the reverse. |
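The FASTQ case can be illustrated with a small in-memory sketch. The read names and sequences below are made up. Converting FASTQ to FASTA simply drops the quality scores, while the reverse fails because a FASTA record carries no quality scores to write.

```python
from io import StringIO

from Bio import SeqIO

# FASTQ -> FASTA works: the quality line is simply discarded.
fastq_in = StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n")
fasta_out = StringIO()
ok_count = SeqIO.convert(fastq_in, "fastq", fasta_out, "fasta")
print(ok_count, fasta_out.getvalue())

# FASTA -> FASTQ fails: there are no quality scores to write.
fasta_in = StringIO(">read1\nACGTACGT\n")
fastq_out = StringIO()
try:
    SeqIO.convert(fasta_in, "fasta", fastq_out, "fastq")
    outcome = "converted"
except ValueError as err:
    outcome = "failed: %s" % err
print(outcome)
```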
06:15 | For more information regarding convert function, type the help command. |
06:21 | Press Enter. |
06:24 | Press 'q' on the keyboard to get back to the prompt. |
06:28 | We can also extract individual genes from the HIV genome in GenBank format. |
06:35 | These individual genes can be saved in FASTA or any other formats. |
06:41 | For this, type the following code at the prompt. |
06:47 | This code will write all individual CDS gene sequences, their ids and name of the gene in a file. |
06:56 | The file is saved as “HIV_geneseq.fasta” in your home folder. Press Enter. |
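The code typed in the video is not shown in this transcript, so the following is only one possible sketch of the idea: loop over the features of a GenBank record, keep those of type CDS, extract each sequence and write all of them to a FASTA file. To stay self-contained it builds a small hypothetical genome with two made-up genes (`geneA`, `geneB`) in memory and writes to an in-memory handle. With the tutorial's file, the record would instead come from `SeqIO.read("HIV.gb", "genbank")` and the output would go to `"HIV_geneseq.fasta"`.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical stand-in for the parsed GenBank record.
genome = SeqRecord(
    Seq("AAATGGCCTAAGGGATGCCCTGATTT"),
    id="DEMO0001.1",
    description="hypothetical demo genome",
)
genome.features = [
    SeqFeature(location=FeatureLocation(2, 11, strand=1), type="CDS",
               qualifiers={"gene": ["geneA"]}),
    SeqFeature(location=FeatureLocation(14, 23, strand=1), type="CDS",
               qualifiers={"gene": ["geneB"]}),
]

gene_records = []
for feature in genome.features:
    if feature.type != "CDS":
        continue  # skip anything that is not a coding sequence
    gene_name = feature.qualifiers.get("gene", ["unknown"])[0]
    gene_records.append(
        SeqRecord(feature.extract(genome.seq), id=genome.id,
                  description=gene_name)
    )

out = StringIO()
written = SeqIO.write(gene_records, out, "fasta")
print(written)
print(out.getvalue())
```

Each FASTA header then carries the record id followed by the gene name.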
07:07 | Using Biopython tools, we can sort the records in a file by length. |
07:12 | Here, I have opened a FASTA file “hemoglobin.fasta” which has six records. |
07:19 | Each record is of a different length. |
07:23 | Type the following lines to arrange the longest record first. |
07:27 | The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your home folder. |
07:38 | For short records first, reverse the arguments in the records.sort command line. |
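A minimal sketch of sorting by length, assuming Python 3 and using three made-up records in memory instead of the six records in `hemoglobin.fasta`. The recording uses Python 2.7, where `list.sort` also accepts a comparison function; the `key` and `reverse` arguments used below work in both Python 2.7 and Python 3.

```python
from io import StringIO

from Bio import SeqIO

# Made-up records standing in for hemoglobin.fasta.
fasta_text = (
    ">short\nMVLS\n"
    ">long\nMVLSPADKTNVKAAW\n"
    ">medium\nMVHLTPEEK\n"
)

records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Longest record first; use reverse=False for shortest first.
records.sort(key=lambda rec: len(rec.seq), reverse=True)

out = StringIO()
SeqIO.write(records, out, "fasta")
sorted_ids = [rec.id for rec in records]
print(sorted_ids)
```

With real files, the input handle would be replaced by `"hemoglobin.fasta"` and the output handle by `"sorted_hemoglobin.fasta"`.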
07:45 | Let's summarize. In this tutorial, we have learnt to: * Create Sequence Record Objects |
07:51 | Write sequence files using write function of Sequence Input/Output module. |
07:58 | Convert between sequence file formats using convert function. |
08:03 | And, sort records in a file by length. |
08:07 | For the assignment: |
08:09 | Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV. |
08:21 | The file “HIV.gb” is included in the code files of this tutorial. |
08:28 | Your completed assignment will have the following code. |
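The assignment code shown in the video is not reproduced in this transcript. One possible approach, which may differ from the video, is to slice the genome sequence. Positions in a GenBank file are 1-based and inclusive, while Python slices are 0-based and exclude the end, so positions 4587 to 5165 correspond to the slice `[4586:5165]`. The sketch below uses a synthetic sequence as a stand-in for the HIV genome; with the real file the record would come from `SeqIO.read("HIV.gb", "genbank")`.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Synthetic 6000-base stand-in for the HIV genome record.
genome = SeqRecord(Seq("ACGT" * 1500), id="DEMO0001.1",
                   description="synthetic stand-in genome")

# 1-based inclusive positions 4587..5165 -> 0-based slice [4586:5165].
gene_seq = genome.seq[4586:5165]

gene_record = SeqRecord(gene_seq, id=genome.id, description="HIV1gp3")
print(len(gene_record.seq))
```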
08:43 | The video at the following link summarizes the Spoken Tutorial project. |
08:48 | Please download and watch it. The Spoken Tutorial Project team conducts workshops and gives certificates for those who pass an online test. |
08:57 | For more details, please write to us. |
09:00 | The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. |
09:06 | More information on this mission is available at the link shown. |
09:10 | This is Snehalatha from IIT Bombay, signing off. Thank you for joining. |