Difference between revisions of "Biopython/C2/Blast/English-timed"
From Script | Spoken-Tutorial
PoojaMoolya (Talk | contribs) (Created page with "{| Border=1 ! <center>Time</center> ! <center>Narration</center> |- | 00:01 | Welcome to this tutorial on''' BLAST''' using Biopython tools. |- | 00:06 | In this tutorial, w...") |
Sandhya.np14 (Talk | contribs) |
Revision as of 13:53, 3 August 2016
|
|
---|---|
00:01 | Welcome to this tutorial on BLAST using Biopython tools. |
00:06 | In this tutorial, we will learn to run BLAST for a query sequence using Biopython tools |
00:13 | and to parse the BLAST output for further analysis. |
00:17 | To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics |
00:24 | and basic Python programming. |
00:27 | Refer to the Python tutorials at the given link. |
00:31 | To record this tutorial, I am using Ubuntu Operating System version 14.10, |
00:37 | Python version 2.7.8, |
00:41 | IPython interpreter version 2.3.0, |
00:46 | Biopython version 1.64 and a working Internet connection. |
00:52 | BLAST is the acronym for Basic Local Alignment Search Tool. |
00:57 | It is an algorithm for comparing sequence information. |
01:02 | The program compares nucleotide or protein sequences to sequences in databases and calculates the statistical significance of matches. |
01:14 | There are two different ways to run BLAST: |
01:17 | run local BLAST on your machine, or run BLAST over the Internet through NCBI servers. |
01:24 | Running BLAST in Biopython has two steps. |
01:28 | First, run BLAST for your query sequence and get some output. |
01:33 | Second, parse the BLAST output for further analysis. |
01:38 | We will open the terminal and run BLAST for a nucleotide sequence. |
01:43 | Open the terminal by pressing Ctrl, Alt and T keys simultaneously. |
01:48 | At the prompt, type: ipython and press Enter. |
01:52 | In this tutorial, I will demonstrate how to run BLAST over the Internet using the NCBI BLAST service. |
02:01 | Type the following at the prompt: import the NCBIWWW module from the Bio.Blast package. Press Enter. |
02:14 | Next, to run BLAST over the Internet, type the following at the prompt. |
02:20 | We will use the qblast function in the NCBIWWW module. |
02:25 | The qblast function takes three arguments: |
02:29 | The first argument is the blast program to use for the search. |
02:33 | The second argument specifies the database to search against. |
02:38 | The third argument is your query sequence. |
02:43 | The input for the query sequence can be in the form of a GI number or a FASTA file. It can also be a sequence record object. |
02:53 | For this demonstration, I am using the GI number for a nucleotide sequence. |
02:58 | The GI number is for a nucleotide sequence of insulin. |
03:03 | The qblast function also takes a number of other optional arguments. |
03:09 | These arguments are analogous to the different parameters you can set on the BLAST web page. |
03:15 | The qblast function will return the BLAST results in xml format. |
03:20 | Back to the terminal. |
03:22 | We have to use the appropriate BLAST program, |
03:25 | depending on whether our query sequence is a nucleotide or protein sequence. |
03:30 | Since our query is a nucleotide sequence, we will use the blastn program; "nt" refers to the nucleotide database. |
03:39 | Details about this are available at the NCBI BLAST webpage. |
03:45 | The BLAST output is stored in the variable result, in XML format. |
03:51 | Press Enter. |
03:53 | Depending upon the speed of your Internet, it may take a few minutes to complete the BLAST search. |
03:59 | It is important to save the XML file to disk before processing it further. |
04:05 | Type the following lines to save the xml file. |
04:09 | These lines of code will save the search result as blast.xml in the home folder. |
04:18 | Navigate to your home folder and locate the file. |
04:21 | Click on the file and check the contents of the file. |
04:30 | Use the code shown in this text file if you want to use a FASTA file as a query. |
04:36 | Here is the code if you want to use a sequence record object from a FASTA file as a query. |
04:42 | Back to the terminal. |
04:44 | The next step is to parse the file to extract data. |
04:48 | The first step in parsing is to open the xml file for input. |
04:53 | Type the following at the prompt. Press Enter. |
04:57 | Next, import the NCBIXML module from the Bio.Blast package. |
05:05 | Press Enter. |
05:07 | Type the following lines to parse the BLAST output. |
05:11 | A BLAST record contains all the information you want to extract from the BLAST output. |
05:18 | Let us print out some information about all hits in our BLAST report that pass a particular significance threshold. |
05:27 | Type the following code. |
05:30 | For a match to be significant, the expect value (E-value) should be less than 0.01. |
05:37 | For each hsp, that is, high-scoring pair, we get the title, length, hsp score, gaps and expect value. |
05:49 | We also print strings containing the query, the aligned database sequence, and a string specifying the match and mismatch positions. |
06:02 | Press the Enter key twice to get the output. |
06:05 | Observe the output: |
06:09 | For each alignment, we have length, score, gaps, evalue and strings. |
06:16 | You can extract the required information using other functions available in the Bio.Blast package. |
06:24 | We have come to the end of this tutorial. |
06:26 | Let's summarize. |
06:27 | In this tutorial, we have learnt to run BLAST for a nucleotide query sequence using its GI number |
06:36 | and to parse the BLAST output using the Bio.Blast.Record module. |
06:43 | For the assignment, run a BLAST search for a protein sequence of your choice. |
06:50 | Save the output file and parse the data contained in the file. |
06:55 | Your completed assignment should have the following lines of code, as shown in this file. |
07:01 | Observe the code. Since our query is a protein sequence, we have used the blastp program and nr, that is, the non-redundant protein database, for the BLAST search. |
07:16 | The video at the following link summarizes the Spoken Tutorial project. |
07:20 | Please download and watch it. |
07:22 | The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test. |
07:30 | For more details, please write to us. |
07:33 | Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India. |
07:40 | More information on this mission is available at the link shown. |
07:45 | This is Snehalatha from IIT Bombay signing off. Thank you for joining. |
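
The narration refers several times to code typed at the prompt, but the transcript does not include it. The following is a minimal sketch of the workflow described above (run BLAST online, save the XML, parse it, print the significant hits), not the code shown in the video. The helper names (`run_online_blast`, `save_result`, `read_saved_record`, `significant_hits`, `print_hits`, `main`) are hypothetical and chosen only for illustration. The Biopython imports are placed inside the functions that need them, so the other helpers can be loaded and tried without Biopython or an Internet connection. The program and database pairs and the 0.01 expect-value cut-off come from the narration; the query itself is left as a parameter because the transcript does not give the insulin GI number.

```python
import os

# Program/database pairs named in the narration:
# blastn with "nt" for nucleotide queries, blastp with "nr" for protein queries.
BLAST_SETTINGS = {
    "nucleotide": ("blastn", "nt"),
    "protein": ("blastp", "nr"),
}

E_VALUE_THRESHOLD = 0.01  # significance cut-off quoted in the narration


def run_online_blast(query, molecule="nucleotide"):
    """Submit `query` (GI number, FASTA text or sequence) to NCBI BLAST."""
    # Imported here (an assumption of this sketch) so that the helpers
    # below can be used even when Biopython is not installed.
    from Bio.Blast import NCBIWWW

    program, database = BLAST_SETTINGS[molecule]
    return NCBIWWW.qblast(program, database, query)


def save_result(handle, path):
    """Write the XML held by `handle` to `path` and return the path."""
    data = handle.read()
    # Handles may yield text or bytes depending on the Biopython version.
    mode = "wb" if isinstance(data, bytes) else "w"
    with open(path, mode) as out_file:
        out_file.write(data)
    handle.close()
    return path


def read_saved_record(path):
    """Parse a saved single-query BLAST XML file into a BLAST record."""
    from Bio.Blast import NCBIXML

    with open(path) as xml_handle:
        return NCBIXML.read(xml_handle)


def significant_hits(blast_record, threshold=E_VALUE_THRESHOLD):
    """Yield (alignment, hsp) pairs whose expect value is below `threshold`."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < threshold:
                yield alignment, hsp


def print_hits(blast_record, threshold=E_VALUE_THRESHOLD):
    """Print title, length, score, gaps, E-value and the aligned strings."""
    for alignment, hsp in significant_hits(blast_record, threshold):
        print("title:", alignment.title)
        print("length:", alignment.length)
        print("score:", hsp.score)
        print("gaps:", hsp.gaps)
        print("e value:", hsp.expect)
        print(hsp.query[:75])   # query sequence
        print(hsp.match[:75])   # match/mismatch positions
        print(hsp.sbjct[:75])   # aligned database sequence


def main(query, molecule="nucleotide"):
    """Run the whole workflow; needs Biopython and an Internet connection."""
    result = run_online_blast(query, molecule)
    home = os.path.expanduser("~")
    xml_path = save_result(result, os.path.join(home, "blast.xml"))
    print_hits(read_saved_record(xml_path))
```

With Biopython installed and a working Internet connection, calling `main` with your own GI number (or with the text of a FASTA file read from disk) should reproduce the steps in the narration. For the assignment, passing `molecule="protein"` selects blastp and the nr database. Saving the XML before parsing means the search does not have to be repeated if a later step fails.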