Biopython/C2/Blast/English-timed

Time
Narration
00:01 Welcome to this tutorial on BLAST using Biopython tools.
00:06 In this tutorial, we will learn to run BLAST for a query sequence using Biopython tools
00:13 and to parse the BLAST output for further analysis.
00:17 To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics
00:24 and basic Python programming.
00:27 Refer to the Python tutorials at the given link.
00:31 To record this tutorial, I am using: Ubuntu Operating System version 14.10
00:37 Python version 2.7.8
00:41 IPython interpreter version 2.3.0
00:46 Biopython version 1.64 and a working Internet connection.
00:52 BLAST is the acronym for Basic Local Alignment Search Tool.
00:57 It is an algorithm for comparing sequence information.
01:02 The program compares nucleotide or protein sequences to sequences in databases and calculates the statistical significance of matches.
01:14 There are two different ways to run BLAST:
01:17 Run local BLAST on your machine, or run BLAST over the Internet through NCBI servers.
01:24 Running BLAST in Biopython has two steps.
01:28 First, run BLAST for your query sequence and get some output.
01:33 Second, parse the BLAST output for further analysis.
01:38 We will open the terminal and run BLAST for a nucleotide sequence.
01:43 Open the terminal by pressing Ctrl, Alt and T keys simultaneously.
01:48 At the prompt, type: ipython and press Enter.
01:52 In this tutorial, I will demonstrate how to run BLAST over the Internet using the NCBI BLAST service.
02:01 Type the following at the prompt: from Bio.Blast import NCBIWWW and press Enter.
02:14 Next, to run BLAST over the Internet, type the following at the prompt: result = NCBIWWW.qblast("blastn", "nt", "186429")
02:20 We will use the qblast function in the NCBIWWW module.
02:25 The qblast function takes three arguments:
02:29 The first argument is the BLAST program to use for the search.
02:33 The second argument specifies the database to search against.
02:38 The third argument is your query sequence.
02:43 The query sequence can be given as a GI number, as a FASTA file or as a sequence record object.
02:53 For this demonstration, I am using the GI number for a nucleotide sequence.
02:58 The GI number is for a nucleotide sequence of insulin.
03:03 The qblast function also takes a number of other optional arguments.
03:09 These arguments are analogous to the different parameters you can set on the BLAST web page.
03:15 The qblast function will return the BLAST results in XML format.
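The qblast call described above can be sketched as follows. This is a minimal sketch, not the exact code typed in the recording: the wrapper name run_blast is hypothetical, and expect and hitlist_size are mentioned only as examples of the optional arguments. The import sits inside the function, so defining it needs neither Biopython nor an Internet connection, while calling it needs both.

```python
# Positional arguments from the narration: program, database, query (a GI number).
QBLAST_ARGS = ("blastn", "nt", "186429")


def run_blast(program, database, query, **options):
    """Submit a query to NCBI BLAST and return a handle to the XML results.

    run_blast is a hypothetical wrapper. Anything in options is passed
    straight through to qblast, for example expect=0.01 or hitlist_size=10.
    """
    # Imported here so that the sketch can be loaded without Biopython installed.
    from Bio.Blast import NCBIWWW

    return NCBIWWW.qblast(program, database, query, **options)


# Example call (needs Biopython and a working Internet connection):
# result = run_blast(*QBLAST_ARGS, hitlist_size=10)
```

Keeping the three required arguments in one tuple makes it easy to see which values change when the query or the database changes.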
03:20 Back to the terminal.
03:22 We have to use the appropriate BLAST program,
03:25 depending on whether our query sequence is a nucleotide or protein sequence.
03:30 Since our query is a nucleotide sequence, we will use the blastn program; "nt" refers to the nucleotide database.
03:39 Details about this are available at the NCBI BLAST webpage.
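The choice of program and database can be written down as a small lookup. This is only an illustrative sketch: the names BLAST_CHOICES and choose_blast are hypothetical, and the table lists just the two pairs named in this tutorial, not every program that NCBI offers.

```python
# Program and database pairs named in this tutorial.
BLAST_CHOICES = {
    "nucleotide": ("blastn", "nt"),
    "protein": ("blastp", "nr"),
}


def choose_blast(query_type):
    """Return the (program, database) pair for a query type."""
    try:
        return BLAST_CHOICES[query_type]
    except KeyError:
        raise ValueError("query_type must be 'nucleotide' or 'protein'")


# Our query is a nucleotide sequence, so we pick blastn and nt.
program, database = choose_blast("nucleotide")
```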
03:45 The BLAST output, in XML format, is stored in the variable result.
03:51 Press Enter.
03:53 Depending upon the speed of your Internet connection, it may take a few minutes to complete the BLAST search.
03:59 It is important to save the XML file to disk before processing further.
04:05 Type the following lines to save the XML file.
04:09 These lines of code will save the search result as blast.xml in the home folder.
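The lines typed in the recording are not reproduced in this transcript, so here is one possible sketch of the saving step. The function name save_blast_xml is hypothetical. Saving first matters because the result handle can be read only once; after read() it is empty. To keep the sketch runnable offline, a stand-in handle and a temporary folder are used in place of a real BLAST result and the home folder.

```python
import io
import os
import tempfile


def save_blast_xml(result_handle, path):
    """Write everything readable from result_handle to path, then close the handle."""
    with open(path, "w") as out_file:
        out_file.write(result_handle.read())
    result_handle.close()
    return path


# Stand-in for the handle returned by qblast, so the sketch runs offline.
fake_result = io.StringIO(u"<BlastOutput></BlastOutput>")
demo_path = save_blast_xml(fake_result, os.path.join(tempfile.mkdtemp(), "blast.xml"))

# With a real result, saving to the home folder would look like:
# save_blast_xml(result, os.path.join(os.path.expanduser("~"), "blast.xml"))
```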
04:18 Navigate to your home folder and locate the file.
04:21 Click on the file and check the contents of the file.
04:30 Use the code shown in this text file if you want to use a FASTA file as a query.
04:36 Here is the code if you want to use a sequence record object from a FASTA file as a query.
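The text files shown on screen are not part of this transcript. The two query forms mentioned above might look like the following sketch. The function names, the file name query.fasta and the short sequence are all made up for illustration. The Biopython imports are placed inside the functions, so only the last lines, which need no Biopython, run when the sketch is loaded.

```python
import os
import tempfile


def read_fasta_text(path):
    """Return the whole FASTA file as one string."""
    with open(path) as fasta_file:
        return fasta_file.read()


def blast_fasta_text(path, program="blastn", database="nt"):
    """Use the text of a FASTA file as the query (needs Biopython and Internet)."""
    from Bio.Blast import NCBIWWW

    return NCBIWWW.qblast(program, database, read_fasta_text(path))


def blast_seq_record(path, program="blastn", database="nt"):
    """Use a sequence record object as the query (needs Biopython and Internet)."""
    from Bio import SeqIO
    from Bio.Blast import NCBIWWW

    record = SeqIO.read(path, "fasta")  # expects exactly one record in the file
    return NCBIWWW.qblast(program, database, record.format("fasta"))


# Offline demonstration with a made-up sequence in a temporary folder.
demo_fasta = os.path.join(tempfile.mkdtemp(), "query.fasta")
with open(demo_fasta, "w") as handle:
    handle.write(">hypothetical_query\nATGGCCCTGTGGATGCGCCTC\n")
fasta_text = read_fasta_text(demo_fasta)
```

Passing record.format("fasta") rather than the bare sequence keeps the identifier from the FASTA header in the query.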
04:42 Back to the terminal.
04:44 The next step is to parse the file to extract data.
04:48 The first step in parsing is to open the XML file for input.
04:53 Type the following at the prompt. Press Enter.
04:57 Next, import the NCBIXML module from the Bio.Blast package.
05:05 Press Enter.
05:07 Type the following lines to parse the BLAST output.
05:11 A BLAST record contains all the information you want to extract from the BLAST output.
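The opening and parsing steps can be combined as in the sketch below. It assumes the search result was saved as blast.xml and that the file holds the result of a single query; the function name read_blast_record is hypothetical. NCBIXML.read is used for a single query, and NCBIXML.parse would be the choice for a file with several queries.

```python
def read_blast_record(xml_path):
    """Parse a saved BLAST XML file that holds the results of one query."""
    # Imported here so that the sketch can be loaded without Biopython installed.
    from Bio.Blast import NCBIXML

    with open(xml_path) as result_handle:
        # NCBIXML.read expects exactly one query in the file.
        return NCBIXML.read(result_handle)


# Example call (needs Biopython and the saved file):
# blast_record = read_blast_record("blast.xml")
```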
05:18 Let us print out some information about all the hits in our BLAST report that pass a particular significance threshold.
05:27 Type the following code.
05:30 For a match to be significant, the expect value should be less than 0.01.
05:37 For each HSP, that is, high-scoring segment pair, we get the title, length, HSP score, gaps and expect value.
05:49 We will also print strings containing the query, the aligned database sequence and a string specifying the match and mismatch positions.
06:02 Press Enter key twice to get the output.
06:05 Observe the output.
06:09 For each alignment, we have the length, score, gaps, E-value and alignment strings.
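The filtering step described above can be sketched as follows. The code typed in the recording is not reproduced in this transcript, so the function names are hypothetical; the attribute names (alignments, hsps, title, length, score, gaps, expect, query, match, sbjct) are those of Biopython BLAST records. The sketch uses Python 3 syntax, whereas the recording uses Python 2.7, and a made-up stand-in record lets it run without Biopython.

```python
from types import SimpleNamespace

E_VALUE_THRESH = 0.01  # threshold named in the narration


def significant_hsps(blast_record, e_value_thresh=E_VALUE_THRESH):
    """Yield (alignment, hsp) pairs whose expect value is below the threshold."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_thresh:
                yield alignment, hsp


def print_hits(blast_record, e_value_thresh=E_VALUE_THRESH):
    """Print a short summary of every significant HSP."""
    for alignment, hsp in significant_hsps(blast_record, e_value_thresh):
        print("sequence:", alignment.title)
        print("length:", alignment.length)
        print("score:", hsp.score)
        print("gaps:", hsp.gaps)
        print("e value:", hsp.expect)
        print(hsp.query)   # query sequence
        print(hsp.match)   # match and mismatch positions
        print(hsp.sbjct)   # aligned database sequence


# Made-up stand-in for a parsed BLAST record: one strong HSP and one weak HSP.
strong = SimpleNamespace(expect=1e-30, score=200.0, gaps=0,
                         query="ATGC", match="||||", sbjct="ATGC")
weak = SimpleNamespace(expect=0.5, score=12.0, gaps=1,
                       query="ATGC", match="| ||", sbjct="ACGC")
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(title="hypothetical hit", length=4, hsps=[strong, weak]),
])
kept = list(significant_hsps(demo_record))
```

With a record parsed from the saved XML file, print_hits(blast_record) would print the same fields for the real hits.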
06:16 You can extract the required information using other functions available in the Bio.Blast package.
06:24 We have come to the end of this tutorial.
06:26 Let's summarize. In this tutorial, we have learnt to run BLAST for a query nucleotide sequence using its GI number
06:36 and to parse the BLAST output using the Bio.Blast.Record module.
06:43 For the assignment, run a BLAST search for a protein sequence of your choice.
06:50 Save the output file and parse the data contained in the file.
06:55 Your completed assignment should have the following lines of code, as shown in this file.
07:01 Observe the code. Since our query is a protein sequence, we have used the blastp program and "nr", that is, the non-redundant protein database, for the BLAST search.
07:16 The video at the following link summarizes the Spoken Tutorial project.
07:20 Please download and watch it.
07:22 The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test.
07:30 For more details, please write to us.
07:33 Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
07:40 More information on this mission is available at the link shown.
07:45 This is Snehalatha from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Sandhya.np14, Snehalathak