Difference between revisions of "Biopython/C2/Writing-Sequence-Files/English-timed"
From Script | Spoken-Tutorial
PoojaMoolya (Talk | contribs) (Created page) |
Sandhya.np14 (Talk | contribs) |
Revision as of 23:55, 2 August 2016
Time | Narration |
---|---|
00:01 | Hello everyone. |
00:02 | Welcome to this tutorial on Writing Sequence Files. |
00:07 | In this tutorial, we will learn how to: * Create Sequence Record Objects |
00:13 | * Write sequence files |
00:15 | * Convert between file formats |
00:19 | * And, sort records in a file by length. |
00:23 | To follow this tutorial, you should be familiar with |
00:27 | undergraduate Biochemistry or Bioinformatics |
00:31 | and basic Python programming. |
00:34 | Refer to the Python tutorials at the given link. |
00:38 | To record this tutorial, I am using: * Ubuntu OS version 14.10 |
00:45 | * Python version 2.7.8 |
00:48 | * IPython interpreter version 2.3.0 and * Biopython version 1.64. |
00:55 | Earlier, we learnt about the parse and read functions, which read the contents of a file. |
01:03 | In this tutorial, we will learn how to use the write function to write sequences to a file. |
01:09 | And, how to use the convert function to convert between various file formats. |
01:16 | Let me now demonstrate how to use the write function. |
01:20 | Here is a text file with a protein sequence. |
01:24 | The sequence shown here is that of the insulin protein. |
01:28 | The file also has information such as the GI accession number and a description. |
01:36 | We will now create a file for this sequence in FASTA format. |
01:41 | The first step is to create a sequence record object. |
01:45 | More information about Sequence Record Objects: |
01:49 | It is the basic data type for the sequence input/output interface. |
01:55 | In a sequence record object, a sequence is associated with higher-level features such as identifiers and descriptions. |
02:04 | Open the terminal by pressing the Ctrl, Alt and t keys simultaneously. |
02:10 | At the prompt, type: ipython and press Enter. |
02:15 | At the prompt, type the following lines: |
02:18 | From the Bio dot Seq module, import the Seq class. |
02:24 | From the Bio dot SeqRecord module, import the Sequence Record class. |
02:31 | Next, from the Bio dot Alphabet module, import the generic protein alphabet. |
02:38 | Next, I will save the sequence record object in a variable record1. |
02:45 | Copy the sequence, id and description from the text file and paste them into the respective lines on the terminal. |
02:56 | Press Enter. |
02:58 | To view the output, type: record1. |
03:02 | Press Enter. |
03:04 | The output shows the insulin protein sequence as a sequence record object. |
03:10 | It shows the sequence along with its id and description. |
03:13 | We will use the write function to convert the above sequence record object to a FASTA file. |
03:21 | Import the SeqIO module from the Bio package. |
03:26 | Next, type the command with the write function to convert the sequence record object to a FASTA file. |
03:40 | The write function takes three arguments. |
03:44 | The first one is the variable storing the sequence record object. |
03:49 | The second is the name of the FASTA file to write. |
03:54 | The third is the file format to write. Press Enter. |
03:58 | The output shows “one”; that is, we have converted one sequence record object to a FASTA file. |
04:07 | The file in FASTA format is saved in the home folder as "example.fasta". |
04:13 | Let me warn you, |
04:14 | the output will overwrite any pre-existing file of the same name. |
04:18 | To view the file, navigate to the file in the home folder. |
04:24 | Open this file in a text editor. |
04:27 | The protein sequence is now in FASTA format. |
04:31 | Close the text editor. |
04:33 | Different bioinformatics tools take different input file formats. |
04:38 | So, sometimes there is a need to convert between sequence file formats. |
04:44 | We can do file conversions using the convert function in the SeqIO module. |
04:50 | For demonstration, I will convert a GenBank file to a FASTA file. |
04:55 | I have a GenBank file in my home folder. |
04:59 | Let me open this in a text editor. |
05:02 | The file contains the HIV genome in GenBank format. |
05:07 | This GenBank file has descriptions of all the genes in the genome, in the first part of the file. |
05:14 | It is followed by a complete genome sequence. |
05:18 | Close the text editor. |
05:19 | Type the following lines on the terminal. |
05:23 | Here, the convert function converts the complete genome sequence present in the GenBank file to a FASTA file. Press Enter. |
05:33 | The new file in FASTA format is now saved as HIV.fasta in the home folder. |
05:39 | Navigate to the file and open it in a text editor. |
05:46 | Close the text editor. |
05:49 | Even though we can convert the file formats easily using convert function, it has limitations. |
05:56 | Writing some formats requires information which other file formats don’t contain. |
06:02 | For example, we can convert a GenBank file to a FASTA file, but we can't do the reverse. |
06:09 | Similarly, we can turn a FASTQ file into a FASTA file but can’t do the reverse. |
06:15 | For more information about the convert function, type the help command. |
06:21 | Press Enter. |
06:24 | Press 'q' on the keyboard to get back to the prompt. |
06:28 | We can also extract individual genes from the HIV genome in GenBank format. |
06:35 | These individual genes can be saved in FASTA or any other format. |
06:41 | For this, type the following code at the prompt. |
06:47 | This code will write all the individual CDS gene sequences, along with their ids and gene names, to a file. |
06:56 | The file is saved as “HIV_geneseq.fasta” in your home folder. Press Enter. |
07:07 | Using Biopython tools, we can sort the records in a file by length. |
07:12 | Here, I have opened a FASTA file, “hemoglobin.fasta”, which has six records. |
07:19 | Each record is of a different length. |
07:23 | Type the following lines to arrange the longest record first. |
07:27 | The new file with the sorted sequences will be saved as "sorted_hemoglobin.fasta" in your home folder. |
07:38 | To place the shortest records first, reverse the arguments in the records.sort command line. |
07:45 | Let's summarize. |
07:46 | In this tutorial, we have learnt: * To create Sequence Record Objects. |
07:51 | * To write sequence files using the write function of the Sequence Input/Output module. |
07:58 | * To convert between sequence file formats using the convert function. |
08:03 | * And, sort records in a file by length. |
08:07 | For the assignment: |
08:09 | Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV. |
08:21 | The file “HIV.gb” is included in the code files of this tutorial. |
08:28 | Your completed assignment will have the following code. |
08:43 | The video at the following link summarizes the Spoken Tutorial project. |
08:48 | Please download and watch it. |
08:49 | The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test. |
08:57 | For more details, please write to us. |
09:00 | The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. |
09:06 | More information on this mission is available at the link shown. |
09:10 | This is Snehalatha from IIT Bombay, signing off. Thank you for joining. |
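
The narration from 02:18 to 04:07 describes building a sequence record object and writing it to a FASTA file, but the typed commands appear only on screen. The following is a minimal sketch of those steps, not the tutorial's exact code. It assumes Python 3 and a recent Biopython release, where the `Bio.Alphabet` module used in the recording (Biopython 1.64) no longer exists, so no alphabet is passed. The sequence, id and description are placeholders, and the output goes to an in-memory handle instead of "example.fasta".

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder values (assumption): the tutorial pastes the real insulin
# sequence, id and description from a text file shown on screen.
record1 = SeqRecord(
    Seq("MALWMRLLPLLALLALWGPDPAAA"),
    id="example_id",
    description="example insulin fragment (placeholder)",
)

# SeqIO.write takes the record(s), a file name or handle, and a format name,
# and returns the number of records written.
# Passing "example.fasta" instead of a handle would create that file and
# overwrite any existing file of the same name.
handle = StringIO()
count = SeqIO.write(record1, handle, "fasta")
fasta_text = handle.getvalue()

print(count)
print(fasta_text)
```

The printed count corresponds to the “one” mentioned at 03:58, and the text that follows it is what the saved FASTA file would contain.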
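
The narration from 04:44 to 06:09 covers the convert function and its limitation that some conversions cannot be reversed. The sketch below shows the same call shape on a small FASTQ to FASTA conversion, chosen here only because the input fits in a few lines; the two reads are made up, and in-memory handles stand in for files. It assumes Python 3 and a recent Biopython release.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up FASTQ reads: a sequence line plus a per-base quality line each.
fastq_text = (
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nGGCCTTAA\n+\nIIIIIIII\n"
)

# convert(input, input_format, output, output_format) returns the record count.
# With files on disk, the narrated GenBank step would have this shape:
#   SeqIO.convert("HIV.gb", "genbank", "HIV.fasta", "fasta")
fasta_handle = StringIO()
converted = SeqIO.convert(StringIO(fastq_text), "fastq", fasta_handle, "fasta")
fasta_text = fasta_handle.getvalue()

print(converted)
print(fasta_text)

# The reverse direction fails because FASTA holds no quality scores,
# which the FASTQ format requires.
try:
    SeqIO.convert(StringIO(fasta_text), "fasta", StringIO(), "fastq")
    reverse_failed = False
except ValueError as error:
    reverse_failed = True
    print("Cannot convert FASTA to FASTQ:", error)
```

At the IPython prompt, `help(SeqIO.convert)` shows the documentation referred to at 06:15.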
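
The narration from 06:28 to 06:56 describes writing each CDS gene sequence, with its id and gene name, to a FASTA file. The on-screen code is not part of the transcript, so the following is only a sketch of one way to do it. It builds a toy record with two hypothetical CDS features (`geneA`, `geneB`) instead of parsing “HIV.gb”, and it writes to an in-memory handle instead of “HIV_geneseq.fasta”. It assumes Python 3 and a recent Biopython release.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Toy stand-in (assumption) for a record parsed from a GenBank file, e.g.
#   genome = SeqIO.read("HIV.gb", "genbank")
genome = SeqRecord(
    Seq("ATGAAATTTGGGTAACCCATGCCCGGGTTTTAG"),
    id="toy_genome",
    description="toy genome (placeholder)",
    features=[
        SeqFeature(FeatureLocation(0, 33, strand=+1), type="source"),
        SeqFeature(FeatureLocation(0, 15, strand=+1), type="CDS",
                   qualifiers={"gene": ["geneA"]}),
        SeqFeature(FeatureLocation(18, 33, strand=+1), type="CDS",
                   qualifiers={"gene": ["geneB"]}),
    ],
)

# Keep only CDS features and cut each one out of the genome sequence.
genes = []
for feature in genome.features:
    if feature.type != "CDS":
        continue
    gene_name = feature.qualifiers.get("gene", ["unknown"])[0]
    gene_seq = feature.extract(genome.seq)
    genes.append(
        SeqRecord(gene_seq, id=gene_name, description="CDS from " + genome.id)
    )

handle = StringIO()
written = SeqIO.write(genes, handle, "fasta")
gene_fasta = handle.getvalue()

print(written)
print(gene_fasta)
```

Feature locations use Python's zero-based, end-exclusive coordinates, so `FeatureLocation(18, 33)` covers what a GenBank file would list as positions 19 to 33.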
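
The narration from 07:07 to 07:38 describes sorting the records of a FASTA file by length. The instruction to “reverse the arguments” suggests the recording used a Python 2 comparison function, which `list.sort` no longer accepts in Python 3. The sketch below therefore uses `key` and `reverse` instead; this is a substitution, not the tutorial's code. The three short records are made up and stand in for the six records of “hemoglobin.fasta”.

```python
from io import StringIO

from Bio import SeqIO

# Three made-up records of different lengths (placeholder input).
fasta_in = ">a\nMVLS\n>b\nMVLSPADKTN\n>c\nMVLSPAD\n"

records = list(SeqIO.parse(StringIO(fasta_in), "fasta"))

# Longest record first: sort on the length of each record, descending.
records.sort(key=len, reverse=True)
longest_first = [record.id for record in records]

# Shortest record first: the same key without reverse=True.
shortest_first = [record.id for record in sorted(records, key=len)]

# Write the sorted records; a file name such as "sorted_hemoglobin.fasta"
# could be passed in place of the handle.
out_handle = StringIO()
written = SeqIO.write(records, out_handle, "fasta")

print(longest_first)
print(shortest_first)
print(out_handle.getvalue())
```

Reading all records into a list is fine for small files like the one in the tutorial; very large files would need an approach that does not hold every record in memory.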