Biopython/C2/Blast/English-timed

| Time | Narration |
|---|---|
| 00:01 | Welcome to this tutorial on BLAST using Biopython tools. |
| 00:06 | In this tutorial, we will learn to run BLAST for a query sequence using Biopython tools |
| 00:13 | and to parse the BLAST output for further analysis. |
| 00:17 | To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics |
| 00:24 | and basic Python programming. |
| 00:27 | Refer to the Python tutorials at the given link. |
| 00:31 | To record this tutorial, I am using Ubuntu Operating System version 14.10, |
| 00:37 | Python version 2.7.8, |
| 00:41 | IPython interpreter version 2.3.0, |
| 00:46 | Biopython version 1.64 and a working Internet connection. |
| 00:52 | BLAST is the acronym for Basic Local Alignment Search Tool. |
| 00:57 | It is an algorithm for comparing sequence information. |
| 01:02 | The program compares nucleotide or protein sequences to sequences in databases and calculates the statistical significance of matches. |
| 01:14 | There are two different ways to run BLAST: |
| 01:17 | Run BLAST locally on your machine, or run BLAST over the Internet through the NCBI servers. |
| 01:24 | Running BLAST in Biopython has two steps. |
| 01:28 | First, run BLAST for your query sequence and get some output. |
| 01:33 | Second, parse the BLAST output for further analysis. |
| 01:38 | We will open the terminal and run BLAST for a nucleotide sequence. |
| 01:43 | Open the terminal by pressing Ctrl, Alt and T keys simultaneously. |
| 01:48 | At the prompt, type: ipython and press Enter. |
| 01:52 | In this tutorial, I will demonstrate how to run BLAST over the Internet using the NCBI BLAST service. |
| 02:01 | Type the following at the prompt: from Bio.Blast import NCBIWWW and press Enter. |
| 02:14 | Next, to run BLAST over the Internet, type the following at the prompt: result = NCBIWWW.qblast("blastn", "nt", "186429") |
| 02:20 | We will use the qblast function in the NCBIWWW module. |
| 02:25 | The qblast function takes three arguments: |
| 02:29 | The first argument is the blast program to use for the search. |
| 02:33 | The second argument specifies the database to search against. |
| 02:38 | The third argument is your query sequence. |
| 02:43 | The query sequence can be given as a GI number, as a FASTA file, or as a sequence record object. |
| 02:53 | For this demonstration, I am using the GI number for a nucleotide sequence. |
| 02:58 | The GI number is for a nucleotide sequence of insulin. |
| 03:03 | The qblast function also takes a number of other optional arguments. |
| 03:09 | These arguments are analogous to the different parameters you can set on the BLAST web page. |
| 03:15 | The qblast function will return the BLAST results in XML format. |
| 03:20 | Back to the terminal. |
| 03:22 | We have to use the appropriate BLAST program, |
| 03:25 | depending on whether our query sequence is a nucleotide or a protein sequence. |
| 03:30 | Since our query is a nucleotide sequence, we will use the blastn program; "nt" refers to the nucleotide database. |
| 03:39 | Details about this are available at the NCBI BLAST webpage. |
| 03:45 | The BLAST output is stored in the variable result, as a file-like handle to the XML data. |
| 03:51 | Press Enter. |
| 03:53 | Depending upon the speed of your Internet connection, it may take a few minutes to complete the BLAST search. |
| 03:59 | It is important to save the XML output to disk before processing it further, because the result handle can be read only once. |
| 04:05 | Type the following lines to save the XML file. |
| 04:09 | These lines of code will save the search result as blast.xml in the home folder. |
| 04:18 | Navigate to your home folder and locate the file. |
| 04:21 | Click on the file and check the contents of the file. |
| 04:30 | Use the code shown in this text file, if you want to use a FASTA file as a query. |
| 04:36 | Here is the code, if you want to use a sequence record object from a FASTA file as a query. |
| 04:42 | Back to the terminal. |
| 04:44 | The next step is to parse the file to extract data. |
| 04:48 | The first step in parsing is to open the XML file for input. |
| 04:53 | Type the following at the prompt. Press Enter. |
| 04:57 | Next, import the NCBIXML module from the Bio.Blast package. |
| 05:05 | Press Enter. |
| 05:07 | Type the following lines to parse the BLAST output. |
| 05:11 | A BLAST record contains all the information you want to extract from the BLAST output. |
| 05:18 | Let us print out some information about all the hits in our BLAST report that pass a particular significance threshold. |
| 05:27 | Type the following code. |
| 05:30 | For a match to be considered significant here, the expect value (E-value) should be less than 0.01. |
| 05:37 | For each HSP, that is, high-scoring segment pair, we get the title, length, HSP score, gaps and expect value. |
| 05:49 | We will also print strings containing the query, the aligned database sequence and a string specifying the match and mismatch positions. |
| 06:02 | Press Enter key twice to get the output. |
| 06:05 | Observe the output. |
| 06:09 | For each alignment, we have the length, score, gaps, E-value and the alignment strings. |
| 06:16 | You can extract the required information using other functions available in the Bio.Blast package. |
| 06:24 | We have come to the end of this tutorial. |
| 06:26 | Let's summarize. In this tutorial, we have learnt to run BLAST for a query nucleotide sequence using its GI number |
| 06:36 | and to parse the BLAST output using the NCBIXML parser, which returns Bio.Blast.Record objects. |
| 06:43 | For the assignment, run a BLAST search for a protein sequence of your choice. |
| 06:50 | Save the output file and parse the data contained in the file. |
| 06:55 | Your completed assignment should have the following lines of code, as shown in this file. |
| 07:01 | Observe the code. Since our query is a protein sequence, we have used the blastp program and "nr", that is, the non-redundant protein database, for the BLAST search. |
| 07:16 | The video at the following link summarizes the Spoken Tutorial project. |
| 07:20 | Please download and watch it. |
| 07:22 | The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test. |
| 07:30 | For more details, please write to us. |
| 07:33 | The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. |
| 07:40 | More information on this mission is available at the link shown. |
| 07:45 | This is Snehalatha from IIT Bombay signing off. Thank you for joining. |
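
The narration refers to code typed on screen that is not reproduced in this transcript. The following is a minimal sketch of what the two steps (run and save, then parse and filter) might look like. It is written for Python 3 rather than the Python 2.7.8 used in the recording, and it assumes Biopython is installed. The function names run_and_save, significant_hsps and report are hypothetical and chosen only for illustration; qblast, NCBIXML.read and the alignment and HSP attributes are Biopython's own. The GI number and the 0.01 threshold are the ones quoted in the narration.

```python
# Sketch only: the on-screen code is not part of the transcript, so the
# function names used here are hypothetical.

E_VALUE_THRESH = 0.01  # significance cut-off quoted in the narration


def run_and_save(query="186429", path="blast.xml"):
    """Run blastn against "nt" over the Internet and save the XML to disk."""
    # Imported inside the function: needs Biopython and an Internet connection.
    from Bio.Blast import NCBIWWW

    result = NCBIWWW.qblast("blastn", "nt", query)
    with open(path, "w") as out_handle:
        out_handle.write(result.read())  # the handle can be read only once
    result.close()
    return path


def significant_hsps(blast_record, threshold=E_VALUE_THRESH):
    """Yield (alignment, hsp) pairs whose expect value is below threshold."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < threshold:
                yield alignment, hsp


def report(path="blast.xml"):
    """Parse a saved BLAST XML file and print the significant HSPs."""
    from Bio.Blast import NCBIXML

    with open(path) as result_handle:
        blast_record = NCBIXML.read(result_handle)  # one query, one record
    for alignment, hsp in significant_hsps(blast_record):
        print("****Alignment****")
        print("sequence:", alignment.title)
        print("length:", alignment.length)
        print("score:", hsp.score)
        print("gaps:", hsp.gaps)
        print("e value:", hsp.expect)
        # Query, match/mismatch line and database sequence, truncated.
        print(hsp.query[:90] + "...")
        print(hsp.match[:90] + "...")
        print(hsp.sbjct[:90] + "...")
```

Calling run_and_save() and then report() would correspond to the two steps described in the tutorial. For a FASTA query, the text read from the FASTA file can be passed as the query argument instead of the GI number; for the assignment, "blastp" and "nr" would replace "blastn" and "nt" in the qblast call.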