Biopython/C2/Blast/English-timed
Revision as of 12:20, 5 January 2017 by Snehalathak
Time | Narration |
---|---|
00:01 | Welcome to this tutorial on BLAST using Biopython tools. |
00:06 | In this tutorial, we will learn to: * run BLAST for a query sequence using Biopython tools |
00:13 | * and parse the BLAST output for further analysis. |
00:17 | To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics |
00:24 | and basic Python programming. |
00:27 | Refer to the Python tutorials at the given link. |
00:31 | To record this tutorial, I am using: * Ubuntu Operating System version 14.10 |
00:37 | * Python version 2.7.8 |
00:41 | * IPython interpreter version 2.3.0 |
00:46 | * Biopython version 1.64 and * a working Internet connection. |
00:52 | BLAST is the acronym for Basic Local Alignment Search Tool. |
00:57 | It is an algorithm for comparing sequence information. |
01:02 | The program compares nucleotide or protein sequences to sequences in databases and calculates the statistical significance of matches. |
01:14 | There are two different ways to run BLAST: |
01:17 | run BLAST locally on your machine, or run it over the Internet on NCBI servers. |
01:24 | Running BLAST in Biopython has two steps. |
01:28 | First, run BLAST for your query sequence and get some output. |
01:33 | Second, parse the BLAST output for further analysis. |
01:38 | We will open the terminal and run BLAST for a nucleotide sequence. |
01:43 | Open the terminal by pressing Ctrl, Alt and T keys simultaneously. |
01:48 | At the prompt, type: ipython and press Enter. |
01:52 | In this tutorial, I will demonstrate how to run BLAST over the Internet using the NCBI BLAST service. |
02:01 | Type the following at the prompt and press Enter: from Bio.Blast import NCBIWWW |
02:14 | Next, to run BLAST over the Internet, type the following at the prompt: result = NCBIWWW.qblast("blastn", "nt", "186429") |
02:20 | We will use the qblast function in the NCBIWWW module. |
02:25 | The qblast function takes three required arguments: |
02:29 | The first argument is the BLAST program to use for the search. |
02:33 | The second argument specifies the database to search against. |
02:38 | The third argument is your query sequence. |
02:43 | The query can be given as a GI number or as the contents of a FASTA file. It can also be taken from a sequence record (SeqRecord) object. |
02:53 | For this demonstration, I am using the GI number for a nucleotide sequence. |
02:58 | The GI number is for a nucleotide sequence of insulin. |
03:03 | The qblast function also takes a number of other, optional arguments. |
03:09 | These arguments are analogous to the different parameters you can set on the BLAST web page. |
03:15 | The qblast function returns the BLAST results in XML format. |
03:20 | Back to the terminal. |
03:22 | We have to use the appropriate BLAST program, |
03:25 | depending on whether our query sequence is a nucleotide or protein sequence. |
03:30 | Since our query is a nucleotide, we will use blastn program and "nt" refers to the nucleotide database. |
03:39 | Details about this are available at the NCBI BLAST webpage. |
03:45 | The BLAST output is stored in the variable result, as a file-like handle to the XML output. |
03:51 | Press Enter. |
03:53 | Depending on the speed of your Internet connection, the BLAST search may take a few minutes to complete. |
03:59 | It is important to save the XML output to disk before processing it further, because the result handle can be read only once. |
04:05 | Type the following lines to save the XML file. |
04:09 | These lines of code will save the search result as blast.xml in the home folder. |
04:18 | Navigate to your home folder and locate the file. |
04:21 | Open the file and check its contents. |
04:30 | Use the code shown in this text file if you want to use a FASTA file as the query. |
04:36 | Here is the code if you want to use a sequence record (SeqRecord) object, read from a FASTA file, as the query. |
04:42 | Back to the terminal. |
04:44 | The next step is to parse the file to extract data. |
04:48 | The first step in parsing is to open the XML file for reading. |
04:53 | Type the following at the prompt. Press Enter. |
04:57 | Next, import the NCBIXML module from the Bio.Blast package. |
05:05 | Press Enter. |
05:07 | Type the following lines to parse the BLAST output. |
05:11 | A BLAST record contains all the information you want to extract from the BLAST output. |
05:18 | Let us print some information about all the hits in our BLAST report whose E-value is below a chosen threshold. |
05:27 | Type the following code. |
05:30 | Here, we treat a match as significant if its expect value (E-value) is less than 0.01. |
05:37 | For each HSP, that is, high-scoring segment pair, we print the title, length, HSP score, gaps and expect value. |
05:49 | We will also print the query string, the aligned database (subject) sequence, and the match string that marks matching and mismatching positions. |
06:02 | Press Enter twice to get the output. |
06:05 | Observe the output. |
06:09 | For each alignment, we have the length, score, gaps, E-value and alignment strings. |
06:16 | You can extract any other information you need in the same way, using the other attributes and functions available in the Bio.Blast package. |
06:24 | We have come to the end of this tutorial. |
06:26 | Let's summarize. |
06:27 | In this tutorial, we have learnt to run BLAST for a nucleotide query sequence using its GI number, |
06:36 | and to parse the BLAST output with the NCBIXML module, which returns Bio.Blast.Record objects. |
06:43 | For the assignment, run a BLAST search for a protein sequence of your choice. |
06:50 | Save the output file and parse the data contained in the file. |
06:55 | Your completed assignment should have the following lines of code, as shown in this file. |
07:01 | Observe the code. Since our query is a protein sequence, we have used the blastp program and "nr", that is, the non-redundant protein database, for the BLAST search. |
07:16 | The video at the following link summarizes the Spoken Tutorial project. |
07:20 | Please download and watch it. |
07:22 | The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test. |
07:30 | For more details, please write to us. |
07:33 | Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. |
07:40 | More information on this mission is available at the link shown. |
07:45 | This is Snehalatha from IIT Bombay signing off. Thank you for joining. |
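
You can wrap the qblast call typed at 02:14 in a small helper, together with two of the optional arguments mentioned at 03:03. This is a minimal sketch, not the tutorial's own code. `run_blast` is a hypothetical name. `expect` (the E-value cutoff) and `hitlist_size` (the maximum number of hits) are two of qblast's keyword options, and the default values shown are illustrative. The extra `qblast` parameter is an assumption, added so the function can be tried offline with a stand-in. By default it calls Biopython's `NCBIWWW.qblast`, which needs an Internet connection.

```python
def run_blast(query, program="blastn", database="nt",
              expect=10.0, hitlist_size=50, qblast=None):
    """Submit a BLAST search and return a handle to the XML result.

    run_blast is a hypothetical helper. Pass a stand-in as `qblast`
    (for example in tests) to avoid any network call.
    """
    if qblast is None:
        # Deferred import: needs Biopython and Internet access
        from Bio.Blast import NCBIWWW
        qblast = NCBIWWW.qblast
    # Program, database and query are positional; the rest are options
    return qblast(program, database, query,
                  expect=expect, hitlist_size=hitlist_size)
```

For example, `run_blast("186429", expect=0.01, hitlist_size=10)` submits the same insulin GI query and returns a handle to the XML result.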
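
The tutorial pairs each query type with a program and a database: blastn with nt for the nucleotide query (03:30), and blastp with nr for the protein query in the assignment (07:01). That choice can be written as a lookup table. This is a minimal sketch. `choose_program` and `looks_like_nucleotide` are hypothetical helpers. The letter-based check is an assumed rough heuristic for raw sequence strings only: it cannot classify a GI number, and it would misclassify a peptide made only of the letters A, C, G, T, U and N.

```python
# Program and database pairs used in this tutorial
BLAST_SETTINGS = {
    "nucleotide": ("blastn", "nt"),
    "protein": ("blastp", "nr"),
}

NUCLEOTIDE_LETTERS = set("ACGTUN")


def looks_like_nucleotide(sequence):
    """Rough heuristic: True if every letter is A, C, G, T, U or N."""
    letters = {c for c in sequence.upper() if c.isalpha()}
    return bool(letters) and letters <= NUCLEOTIDE_LETTERS


def choose_program(sequence_type):
    """Return (program, database) for 'nucleotide' or 'protein'."""
    if sequence_type not in BLAST_SETTINGS:
        raise ValueError("unknown sequence type: %r" % (sequence_type,))
    return BLAST_SETTINGS[sequence_type]
```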
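
This transcript does not reproduce the lines typed at 04:05 to save the result. Below is a minimal sketch of the same idea. `save_blast_xml` is a hypothetical name, and the default path `~/blast.xml` matches the file name mentioned at 04:09. The result handle can be read only once, so the sketch reads its whole content and writes it out in one step.

```python
import os


def save_blast_xml(result_handle, path=None):
    """Write the XML from a BLAST result handle to disk; return the path.

    The handle can only be read once, so it is read fully here and
    then closed.
    """
    if path is None:
        path = os.path.join(os.path.expanduser("~"), "blast.xml")
    with open(path, "w") as save_file:
        save_file.write(result_handle.read())
    result_handle.close()
    return path
```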
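
This transcript does not reproduce the text files shown at 04:30 and 04:36. The sketch below shows one common way to use a FASTA file as the query, under stated assumptions. `blast_fasta_file` is a hypothetical helper, and the FASTA file is assumed to hold a single sequence. The `qblast` parameter exists only so the function can be tried offline.

- With `use_seqio=False`, the file's text is passed to qblast unchanged as a FASTA string.
- With `use_seqio=True`, Biopython's `SeqIO.read` loads the file into a SeqRecord, which is then re-formatted as FASTA so that the record's identifier is kept.

```python
def blast_fasta_file(path, program="blastn", database="nt",
                     use_seqio=False, qblast=None):
    """Run BLAST with the single sequence stored in a FASTA file."""
    if use_seqio:
        # Parse into a SeqRecord; SeqIO.read expects exactly one record
        from Bio import SeqIO
        record = SeqIO.read(path, "fasta")
        query = record.format("fasta")
    else:
        # Pass the file's text unchanged as a FASTA-formatted string
        with open(path) as fasta_file:
            query = fasta_file.read()
    if qblast is None:
        # Deferred import: needs Biopython and Internet access
        from Bio.Blast import NCBIWWW
        qblast = NCBIWWW.qblast
    return qblast(program, database, query)
```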
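
This transcript does not show the parsing lines typed between 04:53 and 05:07. With Biopython's NCBIXML module, `NCBIXML.read` returns one BLAST record and expects exactly one query in the file. `NCBIXML.parse` instead returns an iterator over several records. This is a minimal sketch. `load_blast_records` is a hypothetical helper, and the `xml_module` parameter is an assumption that lets a stand-in replace NCBIXML offline.

```python
def load_blast_records(path, single=True, xml_module=None):
    """Open a saved BLAST XML file and parse it into record object(s)."""
    if xml_module is None:
        from Bio.Blast import NCBIXML  # deferred import: needs Biopython
        xml_module = NCBIXML
    with open(path) as result_handle:
        if single:
            # One query in the file: a single record
            return xml_module.read(result_handle)
        # Several queries: parse() yields records lazily from the open
        # handle, so consume them before the file is closed
        return list(xml_module.parse(result_handle))
```

The `list(...)` call matters: if the iterator were returned directly, it would try to read from a file that is already closed.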
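
The code typed at 05:27 is not reproduced in this transcript. The sketch below follows the description from 05:30 to 05:49:

- It walks over `blast_record.alignments` and each alignment's `hsps`.
- It keeps the HSPs whose `expect` value is below 0.01.
- For each one, it prints the title, length, score, gaps, E-value and the three alignment strings.

The attribute names (`query`, `match`, `sbjct`, `score`, `gaps`, `expect`) are those of Biopython's BLAST record classes. The function names and the 75-character truncation are assumptions made for readability.

```python
E_VALUE_THRESH = 0.01  # significance threshold used in this tutorial


def significant_hsps(blast_record, threshold=E_VALUE_THRESH):
    """Yield (alignment, hsp) pairs whose E-value is below the threshold."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < threshold:
                yield alignment, hsp


def print_significant_hits(blast_record, threshold=E_VALUE_THRESH, width=75):
    """Print a summary of every significant HSP; return how many were shown."""
    count = 0
    for alignment, hsp in significant_hsps(blast_record, threshold):
        count += 1
        print("****Alignment****")
        print("sequence: %s" % alignment.title)
        print("length: %s" % alignment.length)
        print("score: %s" % hsp.score)
        print("gaps: %s" % (hsp.gaps,))
        print("e value: %s" % hsp.expect)
        # Query, match and subject strings, truncated to `width` characters
        print(hsp.query[:width])
        print(hsp.match[:width])
        print(hsp.sbjct[:width])
    return count
```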
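
As an example of extracting other information (06:16), each HSP parsed by NCBIXML also carries `identities` and `align_length`. Percent identity is the number of identical positions divided by the alignment length, times 100. This is a minimal sketch. `percent_identity` is a hypothetical helper that returns None when either value is missing or not usable.

```python
def percent_identity(hsp):
    """Percent of aligned positions that are identical, or None if unknown."""
    identities, length = hsp.identities, hsp.align_length
    if not isinstance(identities, int) or not length:
        return None
    return 100.0 * identities / length
```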