Biopython/C2/Blast/English
|
|
---|---|
Slide Number 1
Title Slide |
Welcome to this tutorial on BLAST using Biopython tools. |
Slide Number 2
Learning Objectives |
In this tutorial, we will learn to
|
Slide Number 3
Pre-requisites |
To follow this tutorial you should be familiar with,
Refer to the Python tutorials at the given link. |
Slide Number 4
System Requirement |
To record this tutorial, I am using,
|
Slide Number 5
About BLAST
|
BLAST is the acronym for Basic Local Alignment Search Tool.
|
Slide Number 6
About BLAST |
The program compares nucleotide or protein sequences to sequences in databases
|
Slide Number 7
About BLAST |
There are two different ways to run BLAST:
Local BLAST on your machine.
|
Slide Number 8
About BLAST |
Running BLAST in Biopython has two steps.
|
Cursor on Slide Number 8 | We will open the terminal and run BLAST for a nucleotide sequence.
|
Cursor on the terminal.
Type, ipython and press enter. |
At the prompt type ipython and press enter. |
At the prompt type:
>>> from Bio.Blast import NCBIWWW
|
In this tutorial, I will demonstrate how to run BLAST over the internet using the NCBI BLAST service.
|
Type,
|
Next, to run BLAST over the internet:
|
Cursor on the terminal.
|
The qblast function takes three arguments:
|
Cursor on the terminal.
|
The query sequence can be given as a GI number or as the contents of a FASTA file.
It can also be a sequence record object. |
Highlight the query. | For this demonstration I am using the GI number for a nucleotide sequence.
|
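As a rough sketch of the call described above: qblast takes the BLAST program, the database and the query, in that order. The helper names `build_qblast_args` and `run_remote_blast` are hypothetical and not part of the recording or of Biopython. The Biopython import is deferred into the second function, so the argument-building part can be tried without a network connection.

```python
def build_qblast_args(query, program="blastn", database="nt"):
    """Return the three positional arguments in the order qblast expects.

    program  -- BLAST program to run, e.g. "blastn" for nucleotide queries
    database -- database to search, e.g. "nt"
    query    -- GI number, FASTA-formatted string, or a sequence
    """
    return (program, database, query)


def run_remote_blast(query, program="blastn", database="nt"):
    """Submit the query to the NCBI BLAST service and return the result handle."""
    # Imported here so the helper above can be used without Biopython installed.
    from Bio.Blast import NCBIWWW

    return NCBIWWW.qblast(*build_qblast_args(query, program, database))
```

Calling `run_remote_blast` with a GI number of your choice would contact the NCBI servers, so it needs Biopython and a working internet connection.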
Slide Number 9
qblast function. |
The qblast function also takes a number of other optional arguments.
|
Cursor on the terminal. | Back to the terminal.
|
Slide Number 10
|
Details about this are available at the NCBI BLAST webpage.
|
Cursor on the terminal. | The BLAST output is stored in the variable “result” in XML format.
Press enter. |
Cursor on the terminal. | Depending upon the speed of your internet connection, it may take a few minutes to complete the BLAST search. |
Type,
>>> save_file = open("my_blast.xml", "w")
>>> save_file.write(result.read())
>>> save_file.close()
>>> result.close()
|
It is important to save the XML output to disk before processing it further.
|
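The save step can be sketched as below, assuming the result handle behaves like an ordinary text file object. Once it has been read, a second `read()` returns an empty string, which is why the contents go straight to disk. The helper name `save_blast_result` is hypothetical, and the XML string is a made-up stand-in rather than real BLAST output, so the sketch runs without contacting NCBI.

```python
import io
import os
import tempfile


def save_blast_result(result_handle, path):
    """Write the full contents of a BLAST result handle to disk and close it."""
    # Read the handle once and write everything to the file in one go.
    with open(path, "w") as save_file:
        save_file.write(result_handle.read())
    result_handle.close()
    return path


# Stand-in for the handle returned by qblast (placeholder XML, not real output).
fake_result = io.StringIO("<BlastOutput></BlastOutput>")
xml_path = os.path.join(tempfile.mkdtemp(), "my_blast.xml")
save_blast_result(fake_result, xml_path)

# The saved file can be re-opened as many times as needed.
with open(xml_path) as saved:
    saved_text = saved.read()
```

Using a `with` block closes the output file even if writing fails part-way.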
Navigate to your home folder and locate the file.
| |
Open the text file.
from Bio.Blast import NCBIWWW
fasta_string = open("insulin.fasta").read()
result = NCBIWWW.qblast("blastn", "nt", fasta_string)
save_file = open("blast.xml", "w")
save_file.write(result.read())
save_file.close()
result.close()
result = open("blast.xml") |
Use the code in this text file if you want to use a FASTA file as a query.
|
Open the text file.
from Bio import SeqIO
from Bio.Blast import NCBIWWW
record = SeqIO.read("insulin.fasta", format="fasta")
result = NCBIWWW.qblast("blastn", "nt", record.seq)
save_file = open("blast.xml", "w")
save_file.write(result.read())
save_file.close()
result.close()
result = open("blast.xml")
|
Here is the code if you want to use a sequence record object read from a FASTA file as the query.
|
Type,
>>> result = open("my_blast.xml")
|
Back to the terminal.
|
>>> from Bio.Blast import NCBIXML
|
Next, import the NCBIXML module from the Bio.Blast package.
|
Type,
>>> records = NCBIXML.parse(result)
>>> blast_record = next(records)
Press Enter. |
Type the following lines to parse the BLAST output.
|
Cursor on the terminal. | Let us print out some information about all hits in our BLAST report whose expect value is below a particular threshold, here 0.01.
|
Type the following code.
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        if hsp.expect < 0.01:
            print('****Alignment****')
            print('sequence:', alignment.title)
            print('length:', alignment.length)
            print('score:', hsp.score)
            print('gaps:', hsp.gaps)
            print('e value:', hsp.expect)
            print(hsp.query[0:90] + '...')
            print(hsp.match[0:90] + '...')
            print(hsp.sbjct[0:90] + '...')
|
For each HSP, that is, high-scoring pair, we get the title, length, score, gaps and expect value.
|
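The filtering step above can also be wrapped in a small generator, sketched here under the assumption that the record exposes `alignments`, each with a list of `hsps` carrying an `expect` attribute, as used in the loop above. The function name `significant_hsps` is hypothetical, and the demo record is built from hand-made stand-in objects with invented values, so the sketch runs without a real BLAST report.

```python
from types import SimpleNamespace


def significant_hsps(blast_record, e_value_threshold=0.01):
    """Yield (alignment, hsp) pairs whose expect value is below the threshold."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_threshold:
                yield alignment, hsp


# Hand-made stand-in objects with made-up values, used only to exercise the loop.
demo_record = SimpleNamespace(
    alignments=[
        SimpleNamespace(
            title="demo hit A",
            length=120,
            hsps=[SimpleNamespace(expect=1e-30), SimpleNamespace(expect=0.5)],
        ),
        SimpleNamespace(
            title="demo hit B",
            length=80,
            hsps=[SimpleNamespace(expect=2.0)],
        ),
    ]
)

# Only the HSP with expect 1e-30 passes the default 0.01 threshold.
kept = [(a.title, h.expect) for a, h in significant_hsps(demo_record)]
```

With a parsed record in place of `demo_record`, the same generator could feed the print statements shown earlier.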
Highlight the output. | Observe the output:
For each alignment, we have the length, score, gaps, e-value and the alignment strings. |
Cursor on the terminal. | You can extract the required information using other functions available in the Bio.Blast package.
|
Slide Number 11
Summary
|
Let's summarize,
In this tutorial we have learnt to,
|
Slide Number 12
Assignment
result = NCBIWWW.qblast("blastp", "nr", 386828) non-redundant protein database (nr). |
For the assignment,
As shown in this text file.
|
Slide Number 13
Acknowledgement |
The video at the following link summarizes the Spoken Tutorial project.
Please download and watch it. |
Slide Number 14 | The Spoken Tutorial Project Team conducts workshops and gives certificates to those who pass an online test.
For more details, please write to us. |
Slide number 15 | Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
More information on this Mission is available at the link shown. |
Slide number 15 | This is Snehalatha from IIT Bombay signing off. Thank you for joining. |