Difference between revisions of "Biopython/C2/Blast/English-timed"

 
Revision as of 13:53, 3 August 2016

Time
Narration
00:01 Welcome to this tutorial on BLAST using Biopython tools.
00:06 In this tutorial, we will learn to run BLAST for a query sequence using Biopython tools
00:13 and to parse the BLAST output for further analysis.
00:17 To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics
00:24 and basic Python programming.
00:27 Refer to the Python tutorials at the given link.
00:31 To record this tutorial, I am using Ubuntu Operating System version 14.10,
00:37 Python version 2.7.8,
00:41 IPython interpreter version 2.3.0,
00:46 Biopython version 1.64 and a working Internet connection.
00:52 BLAST is the acronym for Basic Local Alignment Search Tool.
00:57 It is an algorithm for comparing sequence information.
01:02 The program compares nucleotide or protein sequences to sequences in databases and calculates the statistical significance of matches.
01:14 There are two different ways to run BLAST:
01:17 Run BLAST locally on your machine, or run BLAST over the Internet through the NCBI servers.
01:24 Running BLAST in Biopython has two steps.
01:28 First, run BLAST for your query sequence and get some output.
01:33 Second, parse the BLAST output for further analysis.
01:38 We will open the terminal and run BLAST for a nucleotide sequence.
01:43 Open the terminal by pressing Ctrl, Alt and T keys simultaneously.
01:48 At the prompt, type: ipython and press Enter.
01:52 In this tutorial, I will demonstrate how to run BLAST over the Internet using the NCBI BLAST service.
02:01 Type the following at the prompt: import the NCBIWWW module from the Bio.Blast package. Press Enter.
02:14 Next, to run BLAST over the Internet, type the following at the prompt.
02:20 We will use the qblast function in the NCBIWWW module.
02:25 The qblast function takes three arguments:
02:29 The first argument is the BLAST program to use for the search.
02:33 The second argument specifies the database to search against.
02:38 The third argument is your query sequence.
02:43 The query sequence can be given as a GI number or a FASTA file. It can also be a sequence record object.
02:53 For this demonstration, I am using the GI number for a nucleotide sequence.
02:58 The GI number is for a nucleotide sequence of insulin.
03:03 The qblast function also takes a number of other optional arguments.
03:09 These arguments are analogous to the different parameters you can set on the BLAST web page.
03:15 The qblast function will return the BLAST results in XML format.
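The call described above can be sketched as a small helper. This is only a sketch under stated assumptions: the helper name `run_blast_search` is made up for illustration, and the code is written so that it also runs on Python 3, whereas the recording uses Python 2.7.8. The one Biopython call it relies on is `NCBIWWW.qblast(program, database, sequence)`.

```python
def run_blast_search(program, database, query):
    """Send one query to the NCBI BLAST service and return a handle to the XML results.

    run_blast_search is a hypothetical helper name, not part of Biopython.
    """
    # Imported lazily: Biopython is only needed when a search is actually submitted.
    from Bio.Blast import NCBIWWW

    # The three positional arguments are the ones listed in the narration:
    # the BLAST program, the database to search, and the query sequence.
    return NCBIWWW.qblast(program, database, query)
```

A call would look like `result = run_blast_search("blastn", "nt", gi_number)`, where `gi_number` stands for your own identifier string; the GI number used in the recording is not reproduced here. Calling it needs a working Internet connection and may take a few minutes.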
03:20 Back to the terminal.
03:22 We have to use the appropriate BLAST program,
03:25 depending on whether our query sequence is a nucleotide or protein sequence.
03:30 Since our query is a nucleotide sequence, we will use the blastn program; "nt" refers to the nucleotide database.
03:39 Details about this are available at the NCBI BLAST webpage.
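The choice of program and database can be written down as a small lookup. This sketch covers only the two pairs named in this tutorial (blastn with nt, blastp with nr); the names `BLAST_SETTINGS` and `blast_settings_for` are assumptions made for illustration, and NCBI offers other programs and databases that are not listed here.

```python
# Program/database pairs mentioned in this tutorial (hypothetical constant name).
BLAST_SETTINGS = {
    "nucleotide": ("blastn", "nt"),  # nucleotide query, nucleotide database
    "protein": ("blastp", "nr"),     # protein query, non-redundant protein database
}


def blast_settings_for(sequence_type):
    """Return the (program, database) pair for 'nucleotide' or 'protein'."""
    try:
        return BLAST_SETTINGS[sequence_type]
    except KeyError:
        raise ValueError("unknown sequence type: %r" % (sequence_type,))
```

For a nucleotide query, `program, database = blast_settings_for("nucleotide")` gives the two values that are passed as the first two arguments of qblast.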
03:45 The BLAST output is stored in the variable result, in XML format.
03:51 Press Enter.
03:53 Depending upon the speed of your Internet, it may take a few minutes to complete the BLAST search.
03:59 It is important to save the XML output to a file on the disk before processing it further.
04:05 Type the following lines to save the XML file.
04:09 These lines of code will save the search result as blast.xml in the home folder.
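Saving the result could look like the sketch below. It assumes that `result_handle` is the object returned by qblast, or any other object with `read()` and `close()` methods; the names `save_blast_xml` and `DEFAULT_XML_PATH` are made up for illustration. The file mode is chosen from the data because a handle may give text or bytes depending on the library version.

```python
import os


def save_blast_xml(result_handle, path):
    """Write everything from result_handle to path, close the handle, return path."""
    data = result_handle.read()
    # Some handles yield text and others yield bytes; pick the matching file mode.
    mode = "wb" if isinstance(data, bytes) else "w"
    with open(path, mode) as out_file:
        out_file.write(data)
    # The handle can be read only once, so close it after saving.
    result_handle.close()
    return path


# Location described in the narration: blast.xml in the home folder.
DEFAULT_XML_PATH = os.path.join(os.path.expanduser("~"), "blast.xml")
```

With the search result in the variable `result`, the call would be `save_blast_xml(result, DEFAULT_XML_PATH)`. Because the handle is used up by reading, all later steps work from the saved file.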
04:18 Navigate to your home folder and locate the file.
04:21 Click on the file and check the contents of the file.
04:30 Use the code shown in this text file if you want to use a FASTA file as a query.
04:36 Here is the code if you want to use a sequence record object from a FASTA file as a query.
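The text file shown in the recording is not reproduced in this transcript, so the following is only a sketch of the two options. The function names and any file path you pass in are assumptions made for illustration; `SeqIO.read` and `record.format("fasta")` are standard Biopython calls.

```python
def fasta_file_as_query(path):
    """Return the full text of a FASTA file, usable as the third qblast argument."""
    with open(path) as fasta_file:
        return fasta_file.read()


def seq_record_as_query(path):
    """Read a single record with Bio.SeqIO and return it as FASTA-formatted text."""
    # Imported lazily: Biopython is only needed for this second option.
    from Bio import SeqIO

    record = SeqIO.read(path, "fasta")  # expects exactly one record in the file
    # record.format("fasta") keeps the identifier line;
    # passing record.seq instead would send only the bare sequence.
    return record.format("fasta")
```

Either return value can then be given to qblast in place of the GI number, for example as the third argument after "blastn" and "nt".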
04:42 Back to the terminal.
04:44 The next step is to parse the file to extract data.
04:48 The first step in parsing is to open the XML file for input.
04:53 Type the following at the prompt. Press Enter.
04:57 Next, import the NCBIXML module from the Bio.Blast package.
05:05 Press Enter.
05:07 Type the following lines to parse the BLAST output.
05:11 A BLAST record contains all the information you want to extract from the BLAST output.
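The three steps above (open the file, import NCBIXML, parse) can be sketched as one helper. The name `load_blast_record` is an assumption made for illustration. `NCBIXML.read` is the Biopython call for output that holds a single query; for output with several queries, `NCBIXML.parse` returns one record per query instead.

```python
def load_blast_record(xml_path):
    """Open a saved BLAST XML file and return the single BLAST record it contains."""
    # Imported lazily: Biopython is only needed when a file is actually parsed.
    from Bio.Blast import NCBIXML

    with open(xml_path) as result_handle:
        # read() parses the whole file and expects exactly one query result.
        return NCBIXML.read(result_handle)
```

With the file saved earlier, the call would be `blast_record = load_blast_record("blast.xml")`, adjusted to wherever the file was stored.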
05:18 Let us print out some information about all hits in our BLAST report that pass a particular significance threshold.
05:27 Type the following code.
05:30 For a match to be significant, the expect value should be less than 0.01.
05:37 For each hsp, that is, high scoring pair, we get the title, length, hsp score, gaps and expect value.
05:49 We also print strings containing the query, the aligned database sequence and a string specifying the match and mismatch positions.
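A loop of the kind described here might look like the sketch below. It assumes `blast_record` is the object returned by the parser; the function names are made up for illustration, and the code typed in the recording may differ in its details. The attributes used (`alignments`, `hsps`, `title`, `length`, `score`, `gaps`, `expect`, `query`, `match`, `sbjct`) are those of Biopython's BLAST record objects.

```python
E_VALUE_THRESH = 0.01  # significance threshold stated in the narration


def significant_hsps(blast_record, e_value_thresh=E_VALUE_THRESH):
    """Yield (alignment, hsp) pairs whose expect value is below the threshold."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_thresh:
                yield alignment, hsp


def print_hits(blast_record, e_value_thresh=E_VALUE_THRESH):
    """Print title, length, score, gaps, expect value and aligned strings per hit."""
    for alignment, hsp in significant_hsps(blast_record, e_value_thresh):
        print("****Alignment****")
        print("sequence:", alignment.title)
        print("length:", alignment.length)
        print("score:", hsp.score)
        print("gaps:", hsp.gaps)
        print("e value:", hsp.expect)
        # Show only the first 90 characters of each aligned string.
        print(hsp.query[:90] + "...")
        print(hsp.match[:90] + "...")
        print(hsp.sbjct[:90] + "...")
```

Calling `print_hits(blast_record)` prints one block per high scoring pair that passes the threshold. Keeping the filter in its own function makes it easy to collect the hits in a list instead of printing them.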
06:02 Press the Enter key twice to get the output.
06:05 Observe the output:
06:09 For each alignment, we have the length, score, gaps, e-value and strings.
06:16 You can extract the required information using other functions available in the Bio.Blast package.
06:24 We have come to the end of this tutorial.
06:26 Let's summarize.
06:27 In this tutorial, we have learnt to run BLAST for a query nucleotide sequence, using its GI number,
06:36 and to parse the BLAST output using the Bio.Blast.Record module.
06:43 For the assignment, run a BLAST search for a protein sequence of your choice.
06:50 Save the output file and parse the data contained in the file.
06:55 Your completed assignment should have the following lines of code, as shown in this file.
07:01 Observe the code. Since our query is a protein sequence, we have used the blastp program and nr, that is, the non-redundant protein database, for the BLAST search.
07:16 The video at the following link summarizes the Spoken Tutorial project.
07:20 Please download and watch it.
07:22 The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test.
07:30 For more details, please write to us.
07:33 Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
07:40 More information on this mission is available at the link shown.
07:45 This is Snehalatha from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Sandhya.np14, Snehalathak