Biopython/C2/Blast/English-timed

Time	Narration
00:01	Welcome to this tutorial on BLAST using Biopython tools.
00:06	In this tutorial, we will learn to run BLAST for a query sequence using Biopython tools
00:13	And, parse the BLAST output for further analysis.
00:17	To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics
00:24	and basic Python programming.
00:27	Refer to the Python tutorials at the given link.
00:31	To record this tutorial, I am using: Ubuntu Operating System version 14.10
00:37	Python version 2.7.8
00:41	IPython interpreter version 2.3.0
00:46	Biopython version 1.64 and a working Internet connection.
00:52	BLAST is the acronym for Basic Local Alignment Search Tool.
00:57	It is an algorithm for comparing sequence information.
01:02	The program compares nucleotide or protein sequences to sequences in databases and calculates the statistical significance of matches.
01:14	There are two different ways to run BLAST:
01:17	Run BLAST locally on your machine, or run BLAST over the Internet through NCBI servers.
01:24	Running BLAST in Biopython has two steps.
01:28	First, run BLAST for your query sequence and get some output.
01:33	Second, parse the BLAST output for further analysis.
01:38	We will open the terminal and run BLAST for a nucleotide sequence.
01:43	Open the terminal by pressing Ctrl, Alt and T keys simultaneously.
01:48	At the prompt, type: ipython and press Enter.
01:52	In this tutorial, I will demonstrate how to run BLAST over the Internet using the NCBI BLAST service.
02:01	Type the following at the prompt: from Bio.Blast import NCBIWWW. Press Enter.
02:14	Next, to run BLAST over the Internet, type the following at the prompt: result = NCBIWWW.qblast("blastn", "nt", "186429")
02:20	We will use the qblast function in the NCBIWWW module.
02:25	The qblast function takes three arguments:
02:29	The first argument is the BLAST program to use for the search.
02:33	The second argument specifies the database to search against.
02:38	The third argument is your query sequence.
02:43	The input for the query sequence can be in the form of a GI number or a FASTA file. It can also be a sequence record object.
02:53	For this demonstration, I am using the GI number for a nucleotide sequence.
02:58	The GI number is for a nucleotide sequence of insulin.
03:03	The qblast function also takes a number of other optional arguments.
03:09	These arguments are analogous to the different parameters you can set on the BLAST web page.
03:15	The qblast function will return the BLAST results in XML format.
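The call described above can be sketched as follows. This is a minimal sketch written for Python 3 and a recent Biopython install, not the versions listed at the start. The helper names `qblast_arguments` and `run_remote_blast` are hypothetical. `expect` and `hitlist_size` stand in for the optional arguments, and their values are illustrative only.

```python
def qblast_arguments(program, database, query, **optional):
    """Bundle the three required qblast arguments with any optional ones."""
    arguments = {"program": program, "database": database, "sequence": query}
    arguments.update(optional)
    return arguments


def run_remote_blast(arguments):
    """Send the search to the NCBI servers and return a handle to the XML output."""
    # Imported here so that building the arguments needs neither Biopython
    # nor a network connection.
    from Bio.Blast import NCBIWWW
    return NCBIWWW.qblast(**arguments)


# The search from the narration: blastn against "nt" with GI number 186429.
# expect and hitlist_size are illustrative values for two optional arguments.
search = qblast_arguments("blastn", "nt", "186429", expect=10.0, hitlist_size=50)
print(search)
# result = run_remote_blast(search)  # uncomment to contact NCBI; may take minutes
```

Keeping the argument bundle separate from the network call makes it easy to check what will be sent before waiting on the remote search.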
03:20	Back to the terminal.
03:22	We have to use the appropriate BLAST program,
03:25	depending on whether our query sequence is a nucleotide or protein sequence.
03:30	Since our query is a nucleotide sequence, we will use the blastn program, and "nt" refers to the nucleotide database.
03:39	Details about this are available at the NCBI BLAST webpage.
03:45	The BLAST output is stored in the variable result, as a handle to the XML data.
03:51	Press Enter.
03:53	Depending upon the speed of your Internet connection, it may take a few minutes to complete the BLAST search.
03:59	It is important to save the XML output to disk before processing it further.
04:05	Type the following lines to save the XML file.
04:09	These lines of code will save the search result as blast.xml in the home folder.
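The saving step might look like the sketch below. The function name `save_blast_xml` is hypothetical. An in-memory handle stands in for the object returned by qblast, and a temporary folder stands in for the home folder, so the example runs offline and leaves no file behind in your home folder.

```python
import io
import os
import tempfile


def save_blast_xml(result, path):
    """Write everything in the result handle to path and return the path."""
    with open(path, "w") as save_file:
        save_file.write(result.read())
    result.close()
    return path


# Stand-in for the handle returned by qblast, so the sketch runs offline.
result = io.StringIO("<BlastOutput></BlastOutput>")

# A temporary folder is used here; the narration saves to the home folder.
saved = save_blast_xml(result, os.path.join(tempfile.mkdtemp(), "blast.xml"))
print("Saved search result to", saved)
```

To match the narration, pass `os.path.expanduser("~/blast.xml")` as the path. The result handle can be read only once, which is why the output is written to disk before any parsing.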
04:18	Navigate to your home folder and locate the file.
04:21	Click on the file and check the contents of the file.
04:30	Use the code shown in this text file, if you want to use a FASTA file as a query.
04:36	Here is the code, if you want to use a sequence record object from a FASTA file as a query.
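The text files shown in the video are not reproduced in this transcript, so the following is only a sketch of what the two variants could look like. The file name `query.fasta`, the made-up sequence and all function names are assumptions. The two `blast_with_...` functions need Biopython and a network connection and are therefore defined but not called.

```python
import os
import tempfile


def read_fasta_text(path):
    """Return the whole FASTA file as one string, header line included."""
    with open(path) as fasta_file:
        return fasta_file.read()


def blast_with_fasta_text(path):
    """Use the raw FASTA text as the query."""
    from Bio.Blast import NCBIWWW  # needs Biopython and a network connection
    return NCBIWWW.qblast("blastn", "nt", read_fasta_text(path))


def blast_with_sequence_record(path):
    """Read one record with SeqIO and use it, in FASTA format, as the query."""
    from Bio import SeqIO
    from Bio.Blast import NCBIWWW
    record = SeqIO.read(path, "fasta")
    return NCBIWWW.qblast("blastn", "nt", record.format("fasta"))


# Offline demonstration with a made-up sequence in a temporary file.
example_path = os.path.join(tempfile.mkdtemp(), "query.fasta")
with open(example_path, "w") as example:
    example.write(">example made-up sequence\nATGGCCCTGTGGATGCGC\n")
query_text = read_fasta_text(example_path)
print(query_text)
```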
04:42	Back to the terminal.
04:44	The next step is to parse the file to extract data.
04:48	The first step in parsing is to open the XML file for input.
04:53	Type the following at the prompt. Press Enter.
04:57	Next, import the module NCBIXML from "Bio.Blast" package.
05:05	Press Enter.
05:07	Type the following lines to parse the BLAST output.
05:11	A BLAST record contains all the information you want to extract from the BLAST output.
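Opening the saved file and parsing it can be sketched as below. The function name `load_blast_record` is hypothetical, and the sketch assumes the file holds the result of a single query, as in this tutorial.

```python
def load_blast_record(path="blast.xml"):
    """Open the saved XML file and return the single BLAST record it holds."""
    # Imported here so that defining the function does not require Biopython.
    from Bio.Blast import NCBIXML
    with open(path) as result_handle:
        # read() expects exactly one query in the file; for several queries,
        # NCBIXML.parse(result_handle) yields one record per query instead.
        return NCBIXML.read(result_handle)


# Usage, once blast.xml exists in the current directory:
# blast_record = load_blast_record("blast.xml")
# print(len(blast_record.alignments))
```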
05:18	Let us print out some information about all the hits in our BLAST report that are more significant than a particular threshold.
05:27	Type the following code.
05:30	For a match to be significant, the expect value (E-value) should be less than 0.01.
05:37	For each HSP, that is, high-scoring segment pair, we get the title, length, HSP score, gaps and expect value.
05:49	We will also print strings containing the query, the aligned database sequence and a string specifying the match and mismatch positions.
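The filtering and printing just described can be sketched as follows. The function names are hypothetical. The attribute names (`alignments`, `hsps`, `title`, `length`, `score`, `gaps`, `expect`, `query`, `match`, `sbjct`) are those of the parsed Biopython record. A made-up stand-in record is used so that the example runs without blast.xml.

```python
from types import SimpleNamespace

E_VALUE_THRESHOLD = 0.01  # significance cut-off given in the narration


def significant_hits(blast_record, threshold=E_VALUE_THRESHOLD):
    """Yield (alignment, hsp) pairs whose expect value is below the threshold."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < threshold:
                yield alignment, hsp


def print_hits(blast_record, threshold=E_VALUE_THRESHOLD):
    """Print title, length, score, gaps, expect value and the three strings."""
    for alignment, hsp in significant_hits(blast_record, threshold):
        print("sequence:", alignment.title)
        print("length:", alignment.length)
        print("score:", hsp.score)
        print("gaps:", hsp.gaps)
        print("e value:", hsp.expect)
        print(hsp.query)   # query sequence
        print(hsp.match)   # match and mismatch positions
        print(hsp.sbjct)   # aligned database sequence


# Made-up stand-in for a parsed record, so the sketch runs without blast.xml.
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(title="made-up hit A", length=120, hsps=[
        SimpleNamespace(expect=1e-50, score=200.0, gaps=0,
                        query="ATGGCC", match="||||||", sbjct="ATGGCC"),
    ]),
    SimpleNamespace(title="made-up hit B", length=80, hsps=[
        SimpleNamespace(expect=0.5, score=20.0, gaps=1,
                        query="ATG-CC", match="||| ||", sbjct="ATGGCC"),
    ]),
])
print_hits(demo_record)  # only hit A passes the 0.01 threshold
```

With a real search, pass the record obtained from the NCBIXML parser to `print_hits` in place of `demo_record`.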
06:02	Press Enter key twice to get the output.
06:05	Observe the output.
06:09	For each alignment, we have the length, score, gaps, E-value and strings.
06:16	You can extract the required information using other functions available in the Bio.Blast package.
06:24	We have come to the end of this tutorial.
06:26	Let's summarize. In this tutorial, we have learnt to run BLAST for the query nucleotide sequence using a GI number.
06:36	And, parse the BLAST output with the NCBIXML module into Bio.Blast.Record objects.
06:43	For the assignment, run a BLAST search for a protein sequence of your choice.
06:50	Save the output file and parse the data contained in the file.
06:55	Your completed assignment should have the following lines of code, as shown in this file.
07:01	Observe the code. Since our query is a protein sequence, we have used the blastp program and "nr", that is, the non-redundant protein database, for the BLAST search.
07:16	The video at the following link summarizes the Spoken Tutorial project.
07:20	Please download and watch it.
07:22	The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test.
07:30	For more details, please write to us.
07:33	Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
07:40	More information on this mission is available at the link shown.
07:45	This is Snehalatha from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Sandhya.np14, Snehalathak
