Biopython/C2/Blast/English

Visual Cue	Narration
Slide Number 1 Title Slide	Welcome to this tutorial on BLAST using Biopython tools.
Slide Number 2 Learning Objectives	In this tutorial, we will learn to Run BLAST for the query sequence using Biopython tools And parse the BLAST output for further analysis.
Slide Number 3 Pre-requisites	To follow this tutorial you should be familiar with, Undergraduate Biochemistry or Bioinformatics And basic Python programming Refer to the Python tutorials at the given link.
Slide Number 4 System Requirement	To record this tutorial, I am using, Ubuntu OS version. 14.10 Python version 2.7.8 Ipython interpretor version 2.3.0 Biopython version 1.64 And a working Internet connection
Slide Number 5 About BLAST	BLAST is the acronym for Basic Local Alignment Search Tool. It is an algorithm for comparing sequence information.
Slide Number 6 About BLAST	The program compares nucleotide or protein sequences to sequences in databases and calculates the statistical significance of matches.
Slide Number 7 About BLAST	There are two different ways to run BLAST': Local BLAST on your machine. Run BLAST over Internet through NCBI servers.
Slide Number 8 About BLAST	Running BLAST in Biopython has two steps. First, run BLAST for your query sequence and get some output. Second, parse the BLAST output for further analysis.
Cursor on Slide Number 8	We will open the terminal and run BLAST for a nucleotide sequence. Open the terminal by pressing Ctrl, Alt and T keys simultaneously.
Cursor on the terminal. Type, ipython and press enter.	At the prompt type ipython and press Enter.
At the prompt type: >>> from Bio.Blast import NCBIWWW Press enter.	In this tutorial I will demonstrate how to run BLAST over internet using NCBI BLAST service. Type the following at the prompt: Import NCBIWWW module from Bio.Blast package. Press Enter.
Type, >>> result = NCBIWWW.qblast("blastn", "nt", "186429")	Next to run the BLAST over internet: Type the following at the prompt, We will use qblast function in the NCBIWWW module.
Cursor on the terminal. Highlight the arguments. >>> result = NCBIWWW.qblast("blastn", "nt", "186429")	The qblast function takes three arguments: The first argument is the blast program to use for the search. Second specifies the databases to search against. The third argument is a your query sequence.
Cursor on the terminal.	The input for the query sequence can be in the form of GI number or a FASTA file. Or it can also be a sequence record object.
Highlight the query.	For this demonstration, I am using the GI number for a nucleotide sequence. The GI number is for a nucleotide sequence of insulin.
Slide Number 9 qblast function.	The qblast function also takes a number of other option arguments. These arguments are analogous to the different parameters you can set on the BLAST web page. The qblast function will return the BLAST results in xml format.
Cursor on the terminal.	Back to the terminal. We have to use the appropriate Blast program, depending on whether our query is a nucleotide or protein sequence. Since our query is a nucleotide, we will use blastn program. And nt refers to the nucleotide database.
Slide Number 10 NCBI BLAST website	Details about this are available at the NCBI BLAST webpage. http://blast.ncbi.nlm.nih.gov/Blast.cgi
Cursor on the terminal.	The blast output is stored in the variable result in the form of an xml file. Press Enter.
Cursor on the terminal.	Depending upon the speed of your Internet, it may take a few minutes to complete the BLAST search.
Type, >>> save_file = open("blast.xml", "w") >>> save_file.write(result.read()) >>> save_file.close() >>> result.close() Press enter	It is important to save the xml file on the disk before processing further. Type the following lines to save the xml file. These lines of code will save the search result as blast.xml in the home folder.
	Navigate to your home folder and locate the file. Click on the file and check the contents of the file.
Open the text file. from Bio.Blast import NCBIWWW fasta_string = open("insulin.fasta").read() result = NCBIWWW.qblast("blastn", "nt", fasta_string) save_file = open("blast.xml", "w") save_file.write(result.read()) save_file.close() result.close() result= open("blast.xml")	Use the code in this text file if you want to use a FASTA file as a query.
Open the text file. from Bio.Blast import NCBIWWW from Bio import SeqIO record = SeqIO.read("insulin.fasta", format="fasta") result = NCBIWWW.qblast("blastn", "nt", record.seq) save_file = open("blast.xml", "w") save_file.write(result.read()) save_file.close() result.close() result= open("blast.xml")	Here is the code if you want to use sequence record object from a FASTA file as a query.
Type, >>>result = open("my_blast.xml") press Enter.	Back to the terminal. The next step is to parse the file to extract data. First step in parsing is to open the xml file for input. Type the following at the prompt: Press Enter.
>>> from Bio.Blast import NCBIXML press enter.	Next import the module NCBIXML from Bio.Blast package. Press Enter.
Type, >>>records = NCBIXM.parse(result) >>>blast_record = records.next() press enter	Type the following lines to parse the Blast output. A BLAST record contains all the information you want to extract from the BLAST output.
Cursor on the terminal.	Let us print out some information about all hits in our blast report greater than a particular threshold. Type the following For a match to be significant, expect score should be less than 0.01.
Type the following code. for alignment in blast_record.alignments: for hsp in alignment.hsps: if hsp.expect <0.01: print('Alignment') print('sequence:', alignment.title) print('length:', alignment.length) print('score:', hsp.score) print('gaps:', hsp.gaps) print('e value:', hsp.expect) print(hsp.query[0:90] + '...') print(hsp.match[0:90] + '...') print(hsp.sbjct[0:90] + '…') Press Enter key twice.	For each hsp that is high scoring pair, we get the title, length, gaps and expect value. We also print strings containing the query (query), the aligned database sequence (sbjct) and string specifying the match and mismatch positions (match). Press Enter key twice to get output.
Highlight the output.	Observe the output: For each alignment we have length, score, gaps, evalue and strings.
Cursor on the terminal.	You can extract the required information using other functions available in Bio.Blast package. We have come to the end of this tutorial.
Slide Number 11 Summary	Let's summarize, In this tutorial we have learnt to, Run BLAST for the query nucleotide sequence using GI number. And parse the BLAST output file using Bio.Blast.Record module.
Slide Number 12 Assignment (use blastp) result = NCBIWWW.qblast("blastp", "nr", 386828) non-redundant protein database (nr).	For the assignment, Run BLAST Search for a protein sequence of your choice. Save the output file and parse the data contained in the file. Your completed assignment should have the following lines of code : As shown in this text file. Since our query is protein sequence, here we have used blastp program. And nr non-redundant protein database for the BLAST search.
Slide Number 13 Acknowledgement	The video at the following link summarizes the Spoken Tutorial project. Please download and watch it.
Slide Number 14	The Spoken Tutorial Project Team conducts workshops and gives certificates for those who pass an online test. For more details, please write to us.
Slide number 15	Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. More information on this Mission is available at the link shown.
Slide number 15	This is Snehalatha from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

Nancyvarkey, Snehalathak

Biopython/C2/Blast/English

Contributors and Content Editors

Navigation menu

Personal tools

Namespaces

Variants

Views

Actions

Search

Navigation

Tools