Difference between revisions of "Biopython/C2/Writing-Sequence-Files/English"
PoojaMoolya (Talk | contribs) |
Line 34:
* Undergraduate '''Biochemistry''' or '''Bioinformatics'''
* And basic '''Python''' programming
Refer to the '''Python''' tutorials at the given link.
Line 45:
To record this tutorial I am using
* '''Ubuntu OS''' version 14.10
* '''Python''' version 2.7.8
* '''IPython interpreter''' version 2.3.0
* And '''Biopython''' 1.64
Latest revision as of 15:34, 28 September 2015
|
|
---|---|
Slide Number 1
Title Slide |
Hello everyone.
|
Slide Number 2
Learning Objectives |
In this tutorial, we will learn how to,
|
Slide Number 3
Pre-requisites |
To follow this tutorial you should be familiar with,
* Undergraduate Biochemistry or Bioinformatics
* And basic Python programming
Refer to the Python tutorials at the given link. |
Slide Number 4
System Requirement |
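To record this tutorial I am using
* Ubuntu OS version 14.10
* Python version 2.7.8
* IPython interpreter version 2.3.0
* And Biopython 1.64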
|
Slide Number 5
SeqIO functions |
We have earlier learnt about the parse and read functions, which read the contents of a file.
|
Navigate to the file, “example-insulin”.
|
Here is a text file with a protein sequence.
The sequence shown here is insulin protein.
|
Slide Number 6
Sequence Record Objects |
More information about Sequence Record Objects:
such as identifiers and descriptions. |
Press the Ctrl, Alt and T keys simultaneously on the keyboard. | Open the terminal by pressing the Ctrl, Alt and T keys simultaneously. |
Type,
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import generic_protein |
At the prompt type ipython, press enter.
From the Bio dot SeqRecord module, import the Sequence Record class.
From the Bio dot Alphabet module, import the generic protein class.
|
Type,
+ “GPGAGSLQPLALEG
description= “insulin [Homo sapiens]”)
|
Next I will save the sequence record object in a variable record1.
|
Type, record1.
Press enter.
|
To view the output, type, record1.
Press enter.
It shows the sequence along with id and description. |
Type,
|
We will use the write function to save the above sequence record object to a FASTA file.
|
Highlight the output.
|
The output shows “one”, which means that one sequence record object has been written to the FASTA file.
The output will over-write any pre-existing file of the same name. |
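To make the result of the write step concrete, here is a plain-Python sketch of the FASTA layout: a header line starting with ">" that carries the id and description, followed by the sequence wrapped to a fixed width. It does not call Biopython. The helper name `format_fasta`, the 60-column width, and the placeholder id and sequence are assumptions for illustration, not the tutorial's actual record.

```python
def format_fasta(seq_id, description, sequence, width=60):
    """Return one FASTA entry: a '>' header line plus wrapped sequence lines."""
    header = ">%s %s" % (seq_id, description)
    # Break the sequence into chunks of at most `width` characters.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + chunks) + "\n"


# Hypothetical id and a made-up placeholder sequence (not real insulin data).
entry = format_fasta("example_id", "insulin [Homo sapiens]", "ACDEFGHIKL" * 7)
print(entry)
```

Opening a file with mode "w" and writing such an entry replaces any earlier content, which mirrors the over-write behaviour mentioned above.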
Navigate to the home folder and click on the file, “my_example.fasta”.
|
To view the file
|
Cursor on the terminal. | Many bioinformatics tools take different input file formats.
|
Navigate to home folder and click on “HIV.gb” .
|
For demonstration I will convert a GenBank file to a FASTA file.
|
Type the following lines on the terminal.
SeqIO.convert("HIV.gb", "genbank", "HIV.fasta", "fasta")
|
Type the following lines on the terminal.
Press enter
|
Navigate to the file and open my_example-2.fasta.
Close the text editor. |
Navigate to the file and open in the text editor.
|
Slide Number 7
Limitations of convert function. |
Even though we can convert file formats easily using the convert function, it has limitations.
|
Cursor on the terminal.
Type
>>> from Bio import SeqIO
>>> help(SeqIO.convert) |
For more information regarding convert function, type the help command.
|
Press “q” on the keyboard. | Press “q” on the keyboard to get back to the prompt. |
Cursor on the terminal.
|
We can also extract individual genes from the HIV genome in GenBank format.
|
Type the following at the prompt:
f = open('HIV_gene.fasta', 'w')
for genome in SeqIO.parse('HIV.gb', 'genbank'):
    for gene in genome.features:
        if gene.type == "CDS":
            gene_seq = gene.extract(genome.seq)
            gi = str(gene.qualifiers['db_xref']).split(":")[1].split("'")[0]
            f.write(">GeneId %s %s\n%s\n" % (gi, gene.qualifiers['product'], gene_seq))
f.close()
|
For this type the following code at the prompt.
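The expression that builds the identifier in the extraction code is dense, so the following sketch unpacks it using plain Python only. Biopython stores each feature qualifier as a list of strings; the code turns the whole `db_xref` list into text and cuts out whatever follows the first colon. The qualifier values and the helper `first_xref_value` are hypothetical stand-ins, not values read from “HIV.gb”.

```python
def first_xref_value(qualifiers):
    """Return the part after ':' in the first db_xref entry."""
    first = qualifiers["db_xref"][0]      # e.g. "GeneID:155459"
    return first.split(":", 1)[1]


# Hypothetical qualifiers for one CDS feature (values are lists of strings).
qualifiers = {"db_xref": ["GeneID:155459"], "product": ["example protein"]}

# The tutorial's expression works on the printed form of the whole list.
original = str(qualifiers["db_xref"]).split(":")[1].split("'")[0]

print(original, first_xref_value(qualifiers), qualifiers["product"][0])
```

Because `qualifiers['product']` is also a list, indexing it with `[0]` gives the bare product name rather than the bracketed list form in the header line.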
|
Navigate to home folder and open “hemoglobin.fasta”. | Using Biopython tools we can sort the records in a file by length.
|
At the prompt type,
records = list(SeqIO.parse("hemoglobin.fasta", "fasta"))
records.sort(cmp=lambda x, y: cmp(len(y), len(x)))
SeqIO.write(records, "sorted_hemoglobin.fasta", "fasta")
|
Type the following lines to arrange the longest record first.
|
Cursor on the terminal. | To place the shortest records first, reverse the arguments in the records.sort command. |
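The `cmp` argument used in the sort command exists only in Python 2, the version this tutorial was recorded with. Under Python 3 the same ordering can be sketched with `key=len`, since only the length of each record matters. The strings below are made-up stand-ins for sequence records; real SeqRecord objects also support `len()`, so the same calls should apply to them.

```python
# Stand-ins for sequence records: plain strings, since only len() matters here.
records = ["MVLSPADKTN", "MVHLT", "MGHFTEEDKATITSLW"]

# Longest first (a Python 3 replacement for the cmp-based call).
longest_first = sorted(records, key=len, reverse=True)

# Shortest first: drop reverse=True.
shortest_first = sorted(records, key=len)

print([len(r) for r in longest_first])
print([len(r) for r in shortest_first])
```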
Slide Number 8
Summary |
Let's summarize,
In this tutorial we have learnt to,
|
Slide Number 9
Assignment
record = SeqIO.read("HIV.gb", "genbank")
record
sub_record = record[4587:5165]  # GI = 19172951, ID 155459, "HIV1gp3"
SeqIO.write(sub_record, "sub_record-2.fasta", "fasta") |
For Assignment:
Extract the gene "HIV1gp3" at positions 4587 to 5165 from the genomic sequence of HIV. The file “HIV.gb” is included in code files of this tutorial.
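A note on the indices in the assignment: Python slices count from zero and exclude the end index, whereas GenBank feature tables count from one and include both ends. The sketch below shows the arithmetic on a made-up placeholder string. The helper `genbank_to_slice` is hypothetical, and whether 4587 in the assignment is meant as a zero-based or a one-based start is an assumption to check against the feature table in “HIV.gb”.

```python
def genbank_to_slice(start, end):
    """Convert a 1-based inclusive range to Python slice bounds."""
    return start - 1, end


genome = "ACGT" * 2500          # made-up 10,000-base placeholder sequence

# The slice as written in the assignment: zero-based start, end excluded.
as_written = genome[4587:5165]

# The same numbers read as one-based, inclusive GenBank positions.
lo, hi = genbank_to_slice(4587, 5165)
as_genbank = genome[lo:hi]

print(len(as_written), lo, hi, len(as_genbank))
```

The two readings differ by one base at the start, so it is worth confirming which convention the positions follow before saving the sub-record.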
|
Slide Number 10
Acknowledgement |
The video at the following link summarizes the Spoken Tutorial project.
Please download and watch it. |
Slide Number 11 | The Spoken Tutorial Project Team conducts workshops and gives certificates for those who pass an online test.
For more details, please write to us. |
Slide number 12 | Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
More information on this Mission is available at the link shown. |
Slide number 12 | This is Snehalatha from IIT Bombay signing off. Thank you for joining. |