Difference between revisions of "Biopython/C2/Blast/English-timed"
From Script | Spoken-Tutorial
Sandhya.np14 (Talk | contribs) |
Latest revision as of 21:18, 20 July 2018
|
|
---|---|
00:01 | Welcome to this tutorial on BLAST using Biopython tools. |
00:06 | In this tutorial, we will learn to run BLAST for a query sequence using Biopython tools |
00:13 | and to parse the BLAST output for further analysis. |
00:17 | To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics |
00:24 | and basic Python programming. |
00:27 | Refer to the Python tutorials at the given link. |
00:31 | To record this tutorial, I am using: Ubuntu Operating System version 14.10, |
00:37 | Python version 2.7.8, |
00:41 | IPython interpreter version 2.3.0, |
00:46 | Biopython version 1.64 and a working Internet connection. |
00:52 | BLAST is the acronym for Basic Local Alignment Search Tool. |
00:57 | It is an algorithm for comparing sequence information. |
01:02 | The program compares nucleotide or protein sequences to sequences in databases and calculates the statistical significance of matches. |
01:14 | There are two different ways to run BLAST: |
01:17 | Run BLAST locally on your machine, or run BLAST over the Internet through the NCBI servers. |
01:24 | Running BLAST in Biopython has two steps. |
01:28 | First, run BLAST for your query sequence and get some output. |
01:33 | Second, parse the BLAST output for further analysis. |
01:38 | We will open the terminal and run BLAST for a nucleotide sequence. |
01:43 | Open the terminal by pressing Ctrl, Alt and T keys simultaneously. |
01:48 | At the prompt, type: ipython and press Enter. |
01:52 | In this tutorial, I will demonstrate how to run BLAST over the Internet using the NCBI BLAST service. |
02:01 | Type the following at the prompt: from Bio.Blast import NCBIWWW and press Enter. |
02:14 | Next, to run BLAST over the Internet, type the following at the prompt: result = NCBIWWW.qblast("blastn", "nt", "186429") |
02:20 | We will use the qblast function in the NCBIWWW module. |
02:25 | The qblast function takes three arguments: |
02:29 | The first argument is the blast program to use for the search. |
02:33 | The second argument specifies the database to search against. |
02:38 | The third argument is your query sequence. |
02:43 | The query sequence can be given as a GI number, as a FASTA file or as a sequence record object. |
02:53 | For this demonstration, I am using the GI number for a nucleotide sequence. |
02:58 | The GI number is for a nucleotide sequence of insulin. |
03:03 | The qblast function also takes a number of other optional arguments. |
03:09 | These arguments are analogous to the different parameters you can set on the BLAST web page. |
03:15 | The qblast function will return the BLAST results in XML format. |
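The three-argument call described above can be sketched as a small helper. This is a minimal sketch, not the tutorial's own code: the name `run_blast` and its `qblast` parameter are assumptions introduced here so that the helper can be tried without a network connection. By default it falls back to `NCBIWWW.qblast` from Biopython.

```python
def run_blast(program, database, query, qblast=None):
    """Submit a query to BLAST and return the handle holding the XML results.

    program:  BLAST program to use, for example "blastn" for a nucleotide query
    database: database to search against, for example "nt"
    query:    GI number, FASTA text or sequence
    qblast:   hypothetical hook (not part of Biopython) for swapping in
              another search function, for example when testing offline
    """
    if qblast is None:
        # Imported lazily so the helper can be defined without Biopython installed.
        from Bio.Blast import NCBIWWW
        qblast = NCBIWWW.qblast
    return qblast(program, database, query)


# Example call (needs Biopython and a working Internet connection,
# and may take a few minutes to return):
# result = run_blast("blastn", "nt", "186429")
```

Passing the search function as a parameter is only a convenience for experimenting; at the IPython prompt, calling `NCBIWWW.qblast` directly is equivalent.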
03:20 | Back to the terminal. |
03:22 | We have to use the appropriate Blast program, |
03:25 | depending on whether our query sequence is a nucleotide or protein sequence. |
03:30 | Since our query is a nucleotide sequence, we will use the blastn program. "nt" refers to the nucleotide database. |
03:39 | Details about this are available at the NCBI BLAST webpage. |
03:45 | The BLAST output is stored in the variable result, in XML format. |
03:51 | Press Enter. |
03:53 | Depending upon the speed of your Internet connection, it may take a few minutes to complete the BLAST search. |
03:59 | It is important to save the XML file on the disk before processing further. |
04:05 | Type the following lines to save the XML file. |
04:09 | These lines of code will save the search result as blast.xml in the home folder. |
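The saving step can be sketched as follows. This is a minimal sketch under assumptions: the helper name `save_blast_xml` is hypothetical, and the result is assumed to be a file-like handle with a `read()` method, as returned by `NCBIWWW.qblast`. Such a handle can be read only once, which is one reason to write the results to disk before parsing.

```python
import os


def save_blast_xml(result_handle, path="blast.xml"):
    """Write the XML text held by a BLAST result handle to a file.

    Returns the absolute path of the saved file.
    """
    xml_text = result_handle.read()
    with open(path, "w") as out_file:
        out_file.write(xml_text)
    # The handle is exhausted after read(); close it and reopen the saved
    # file whenever the results are needed again.
    result_handle.close()
    return os.path.abspath(path)


# Example, assuming `result` came from a qblast call:
# save_blast_xml(result, "blast.xml")
```

A relative path such as `blast.xml` is written to the current working directory, which is the home folder when IPython is started from a freshly opened terminal.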
04:18 | Navigate to your home folder and locate the file. |
04:21 | Click on the file and check the contents of the file. |
04:30 | Use the code shown in this text file, if you want to use a FASTA file as a query. |
04:36 | Here is the code, if you want to use sequence record object from a FASTA file as a query. |
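The two alternative query forms can be sketched as below. The text files mentioned above are not reproduced here, so this is only an illustration under assumptions: the helper names and the file name `query.fasta` are hypothetical. The first helper returns the FASTA file as plain text; the second reads it into a sequence record with `SeqIO.read` and formats that record as FASTA text. Either return value can be passed as the third argument to `qblast`.

```python
def fasta_text_query(fasta_path):
    """Return the whole FASTA file as text, usable as the qblast query."""
    with open(fasta_path) as fasta_file:
        return fasta_file.read()


def seq_record_query(fasta_path):
    """Read one sequence record from a FASTA file and return it as FASTA text."""
    # Imported lazily; this helper requires Biopython.
    from Bio import SeqIO
    # SeqIO.read expects exactly one record in the file.
    record = SeqIO.read(fasta_path, "fasta")
    return record.format("fasta")


# Example calls (need Biopython and a working Internet connection):
# from Bio.Blast import NCBIWWW
# result = NCBIWWW.qblast("blastn", "nt", fasta_text_query("query.fasta"))
# result = NCBIWWW.qblast("blastn", "nt", seq_record_query("query.fasta"))
```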
04:42 | Back to the terminal. |
04:44 | The next step is to parse the file to extract data. |
04:48 | The first step in parsing is to open the xml file for input. |
04:53 | Type the following at the prompt. Press Enter. |
04:57 | Next, import the NCBIXML module from the Bio.Blast package. |
05:05 | Press Enter. |
05:07 | Type the following lines to parse the BLAST output. |
05:11 | A BLAST record contains all the information you want to extract from the BLAST output. |
05:18 | Let us print out some information about all the hits in our BLAST report that are more significant than a particular threshold. |
05:27 | Type the following code. |
05:30 | For a match to be significant, the expect value should be less than 0.01. |
05:37 | For each hsp, that is, high-scoring pair, we get the title, length, hsp score, gaps and expect value. |
05:49 | We will also print strings containing the query, the aligned database sequence and the string specifying the match and mismatch positions. |
06:02 | Press the Enter key twice to get the output. |
06:05 | Observe the output. |
06:09 | For each alignment, we have length, score, gaps, evalue and strings. |
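The parsing and filtering steps can be sketched as follows. This is a minimal sketch rather than the exact code typed in the video: the function names are assumptions, while the attribute names (`alignments`, `hsps`, `title`, `length`, `score`, `gaps`, `expect`, `query`, `match`, `sbjct`) are those of the record objects produced by `NCBIXML`.

```python
E_VALUE_THRESH = 0.01  # significance threshold quoted in the tutorial


def load_blast_record(xml_path):
    """Parse a saved BLAST XML file that holds the result of a single query."""
    # Imported lazily; this helper requires Biopython.
    from Bio.Blast import NCBIXML
    with open(xml_path) as result_handle:
        return NCBIXML.read(result_handle)


def significant_hits(blast_record, threshold=E_VALUE_THRESH):
    """Yield (alignment, hsp) pairs whose expect value is below the threshold."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < threshold:
                yield alignment, hsp


def describe_hit(alignment, hsp):
    """Return the fields named in the tutorial as one printable block of text."""
    lines = [
        "title: %s" % alignment.title,
        "length: %s" % alignment.length,
        "score: %s" % hsp.score,
        "gaps: %s" % hsp.gaps,
        "e-value: %s" % hsp.expect,
        hsp.query,   # aligned part of the query sequence
        hsp.match,   # match and mismatch positions
        hsp.sbjct,   # aligned part of the database sequence
    ]
    return "\n".join(lines)


# Example, assuming blast.xml was saved earlier:
# blast_record = load_blast_record("blast.xml")
# for alignment, hsp in significant_hits(blast_record):
#     print(describe_hit(alignment, hsp))
```

Keeping the filter separate from the printing makes it easy to change the threshold or to collect the hits in a list instead of printing them.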
06:16 | You can extract the required information using other functions available in the Bio.Blast package. |
06:24 | We have come to the end of this tutorial. |
06:26 | Let's summarize. In this tutorial, we have learnt to run BLAST for a query nucleotide sequence using its GI number. |
06:36 | We have also learnt to parse the BLAST output using the Bio.Blast.Record module. |
06:43 | For the assignment, run a BLAST search for a protein sequence of your choice. |
06:50 | Save the output file and parse the data contained in the file. |
06:55 | Your completed assignment should have the following lines of code, as shown in this file. |
07:01 | Observe the code. Since our query is a protein sequence, we have used the blastp program and "nr", that is, the non-redundant protein database, for the BLAST search. |
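The choice of program and database for the two kinds of query used in this tutorial can be sketched as a small lookup. The assignment file itself is not reproduced here, so this is only an illustration: the names `BLAST_SETTINGS` and `blast_settings` are assumptions, and the table covers only the two pairings mentioned in the tutorial.

```python
# Program and database pairings mentioned in the tutorial.
BLAST_SETTINGS = {
    "nucleotide": ("blastn", "nt"),  # nucleotide query against the nucleotide database
    "protein": ("blastp", "nr"),     # protein query against the non-redundant protein database
}


def blast_settings(query_type):
    """Return the (program, database) pair for a query type."""
    try:
        return BLAST_SETTINGS[query_type]
    except KeyError:
        raise ValueError("unknown query type: %r" % (query_type,))


# Example for the assignment (needs Biopython and a working Internet
# connection; protein_query is a placeholder for a sequence of your choice):
# from Bio.Blast import NCBIWWW
# program, database = blast_settings("protein")
# result = NCBIWWW.qblast(program, database, protein_query)
```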
07:16 | The video at the following link summarizes the Spoken Tutorial project. |
07:20 | Please download and watch it. |
07:22 | The Spoken Tutorial Project team conducts workshops and gives certificates for those who pass an online test. |
07:30 | For more details, please write to us. |
07:33 | Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. |
07:40 | More information on this mission is available at the link shown. |
07:45 | This is Snehalatha from IIT Bombay signing off. Thank you for joining. |