Difference between revisions of "Biopython/C2/Blast/English"

From Script | Spoken-Tutorial
Jump to: navigation, search
(Created page with " {| style="border-spacing:0;" ! <center>Visual Cue</center> ! <center>Narration</center> |- | style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;...")
 
 
Line 16: Line 16:
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| In this tutorial, we will learn to
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| In this tutorial, we will learn to
  
* Run '''BLAST''' for the query sequence using Biopython tools.<br/>
+
* Run '''BLAST''' for the query sequence using Biopython tools  
  
 
* And parse the '''BLAST''' output for further analysis.
 
* And parse the '''BLAST''' output for further analysis.
 
 
  
 
|-
 
|-
Line 28: Line 26:
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| To follow this tutorial you should be familiar with,
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| To follow this tutorial you should be familiar with,
  
 +
* Undergraduate '''Biochemistry''' or '''Bioinformatics'''
 +
* And basic '''Python''' programming
  
* Undergraduate Biochemistry or Bioinformatics<br/>
+
Refer to the '''Python''' tutorials at the given link.
 
+
* And Basic Python programming
+
 
+
Refer to the Python tutorials at the given link.
+
  
 
|-
 
|-
Line 41: Line 37:
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| To record this tutorial, I am using,
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| To record this tutorial, I am using,
  
* Ubuntu OS version. 14.10
+
* '''Ubuntu''' OS version. 14.10
* Python version 2.7.8
+
* '''Python''' version 2.7.8
* Ipython interpretor version 2.3.0
+
* '''Ipython interpretor''' version 2.3.0
* Biopython version 1.64
+
* '''Biopython''' version 1.64
* And a Working internet connection
+
* And a working Internet connection
 
+
 
+
  
 
|-
 
|-
Line 65: Line 59:
  
 
'''About BLAST'''
 
'''About BLAST'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The program compares nucleotide or protein sequences to sequences in databases  
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The program  
 
+
*compares '''nucleotide''' or '''protein''' sequences to sequences in databases  
 
+
*and calculates the statistical significance of matches.
and calculates the statistical significance of matches.
+
  
 
|-
 
|-
Line 74: Line 67:
  
 
'''About BLAST'''
 
'''About BLAST'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| There are two different ways to run '''BLAST''':
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| There are two different ways to run ''BLAST''':
 
+
Local '''BLAST''' on your machine.
+
  
 +
*Local '''BLAST''' on your machine.
  
Run '''BLAST''' over Internet through NCBI servers.
+
*Run '''BLAST''' over Internet through NCBI servers.
  
 
|-
 
|-
Line 85: Line 77:
  
 
'''About BLAST'''
 
'''About BLAST'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Running '''BLAST''' in Biopython has two steps.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Running '''BLAST''' in '''Biopython''' has two steps.
  
  
Line 95: Line 87:
 
|-
 
|-
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| '''Cursor on Slide Number 8'''
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| '''Cursor on Slide Number 8'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| We will open the terminal and run '''BLAST''' for a nucleotide sequence.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| We will open the '''terminal''' and run '''BLAST''' for a nucleotide sequence.
  
  
Open the terminal by pressing ctrl, alt and t keys simultaneously.  
+
Open the terminal by pressing Ctrl, Alt and T keys simultaneously.  
  
 
|-
 
|-
Line 104: Line 96:
  
 
Type, ipython and press enter.
 
Type, ipython and press enter.
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| At the prompt type ipython and press enter.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| At the '''prompt''' type '''ipython''' and press '''Enter'''.
  
 
|-
 
|-
Line 113: Line 105:
  
 
'''Press enter.'''
 
'''Press enter.'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| In this tutorial I will demonstrate how to run BLAST over internet using NCBI BLAST service.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| In this tutorial I will demonstrate how to run '''BLAST''' over internet using '''NCBI BLAST''' service.
  
  
Type the following at the prompt:
+
Type the following at the '''prompt''':
  
  
Import '''NCBIWWW''' module from Bio.Blast package.
+
Import '''NCBIWWW''' module from '''Bio.Blast''' package.
 
+
 
+
Press enter.
+
 
+
  
  
 +
Press '''Enter'''.
  
 
|-
 
|-
Line 132: Line 121:
  
 
'''>>> result = NCBIWWW.qblast("blastn", "nt", "186429")'''
 
'''>>> result = NCBIWWW.qblast("blastn", "nt", "186429")'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Next to run the BLAST over internet:
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Next to run the '''BLAST''' over internet:
 
+
 
+
Type the following at the prompt,
+
 
+
  
We will use function '''qblast function '''in the '''NCBIWWW '''module.
 
  
 +
Type the following at the '''prompt''',
  
  
 +
We will use '''qblast function '''in the '''NCBIWWW '''module.
  
 
|-
 
|-
Line 151: Line 137:
  
 
'''>>> result = NCBIWWW.qblast("blastn", "nt", "186429")'''
 
'''>>> result = NCBIWWW.qblast("blastn", "nt", "186429")'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The qblast function takes three arguments:
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The '''qblast function''' takes three '''arguments''':
  
  
The first argument is the blast program to use for the search.
+
The first '''argument''' is the '''blast program''' to use for the search.
  
  
Second specifies the databases to search against
+
Second specifies the databases to search against.
 
+
 
+
The third argument is a your query sequence.
+
 
+
  
  
 +
The third '''argument''' is a your '''query sequence.'''
  
 
|-
 
|-
Line 170: Line 153:
  
  
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The input for the query sequence can be in the form of '''GI''' number or a '''FASTA''' file.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The input for the '''query sequence''' can be in the form of '''GI''' number or a '''FASTA''' file.
  
Or it can also be a sequence record object.
+
Or it can also be a '''sequence record object'''.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Highlight the query.
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Highlight the query.
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| For this demonstration I am using the GI number for a nucleotide sequence.  
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| For this demonstration, I am using the '''GI number''' for a '''nucleotide sequence'''.  
  
  
The '''GI''' number is for a nucleotide sequence of insulin.  
+
The '''GI number''' is for a '''nucleotide sequence''' of '''insulin'''.  
  
 
|-
 
|-
Line 185: Line 168:
  
 
'''qblast function.'''
 
'''qblast function.'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The '''qblast''' function also takes a number of other option arguments.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The '''qblast function''' also takes a number of other option '''arguments'''.
  
  
These arguments are analogous to the different parameters you can set on the '''BLAST''' web page.
+
These '''arguments''' are analogous to the different parameters you can set on the '''BLAST''' web page.
  
  
The '''qblast''' function will return the '''BLAST''' results in '''xml''' format.
+
The '''qblast function''' will return the '''BLAST''' results in '''xml''' format.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Cursor on the terminal.
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Cursor on the terminal.
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Back to the terminal.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Back to the '''terminal'''.
  
  
We have to use the appropriate Blast program,
+
We have to use the appropriate '''Blast''' program,
  
  
depending on whether your query is a nucleotide or protein sequence.  
+
depending on whether our query is a '''nucleotide''' or '''protein sequence'''.  
  
  
Since our query is a nucleotide, we will use '''blastn '''program.''' '''
+
Since our query is a '''nucleotide''', we will use '''blastn '''program.''' '''
  
  
And '''nt''' refers to the nucleotide database.
+
And '''nt''' refers to the '''nucleotide''' database.
  
 
|-
 
|-
Line 214: Line 197:
  
 
'''NCBI BLAST website'''
 
'''NCBI BLAST website'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Details about this are available at the NCBI BLAST webpage.  
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Details about this are available at the '''NCBI BLAST''' webpage.  
  
  
 
http://blast.ncbi.nlm.nih.gov/Blast.cgi
 
http://blast.ncbi.nlm.nih.gov/Blast.cgi
 
 
 
  
 
|-
 
|-
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Cursor on the terminal.
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Cursor on the terminal.
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The blast output is stored in the variable '''result'''in the form of an '''xml''' file.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| The blast output is stored in the variable '''result''' in the form of an '''xml''' file.
  
Press enter.
+
Press '''Enter'''.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Cursor on the terminal.
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Cursor on the terminal.
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Depending upon the speed of your internet, it may take a few minutes to complete the BLAST search.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Depending upon the speed of your Internet, it may take a few minutes to complete the '''BLAST''' search.
  
 
|-
 
|-
Line 309: Line 289:
  
  
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Here is the code if you want to use sequence record object from a '''FASTA''' file as a query.  
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Here is the code if you want to use '''sequence record object''' from a '''FASTA''' file as a query.  
 
+
 
+
 
+
  
 
|-
 
|-
Line 320: Line 297:
  
  
'''press enter.'''
+
'''press Enter.'''
  
  
  
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Back to the terminal.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Back to the '''terminal'''.
  
  
Line 333: Line 310:
  
  
Type the following at the prompt:
+
Type the following at the '''prompt''':
  
  
Press enter.
+
Press '''Enter'''.
  
 
|-
 
|-
Line 343: Line 320:
  
 
'''press enter.'''
 
'''press enter.'''
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Next import the module '''NCBIXML''' from Bio.Blast package.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| Next import the module '''NCBIXML''' from '''Bio.Blast''' package.
  
  
Press enter.
+
Press '''Enter'''.
  
 
|-
 
|-
Line 401: Line 378:
  
  
'''Press enter key twice.'''
+
'''Press Enter key twice.'''
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| For each '''hsp''' that is high scoring pair, we get the '''title''', '''length''', '''gaps''' and '''expect value'''.
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| For each '''hsp''' that is high scoring pair, we get the '''title''', '''length''', '''gaps''' and '''expect value'''.
  
Line 408: Line 385:
  
  
Press enter key twice to get output.
+
Press '''Enter''' key twice to get output.
 
+
 
+
 
+
  
 
|-
 
|-
Line 421: Line 395:
 
|-
 
|-
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Cursor on the terminal.
 
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;"| Cursor on the terminal.
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| You can extract the required information using other functions available in '''Bio.Blast''' package.
+
| style="background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;"| You can extract the required information using other '''functions''' available in '''Bio.Blast''' package.
  
  
Line 437: Line 411:
 
In this tutorial we have learnt to,
 
In this tutorial we have learnt to,
  
 +
*Run '''BLAST''' for the query nucleotide sequence using '''GI''' number.
  
Run '''BLAST''' for the query nucleotide sequence using '''GI''' number.
+
*And parse the '''BLAST''' output file using '''Bio.Blast.Record''' module.
 
+
 
+
And parse the '''BLAST''' output file using '''Bio.Blast.Record''' module.
+
  
 
|-
 
|-
Line 457: Line 429:
  
  
Run''' BLAST''' search for a protein sequence of your choice.  
+
Run''' BLAST'''
 +
 
 +
Search for a '''protein''' sequence of your choice.  
  
  
Line 468: Line 442:
  
  
Since our query is protein sequence, here we have used '''blastp '''program.
+
Since our query is '''protein''' sequence, here we have used '''blastp '''program.
  
  
And '''nr'''non-redundant protein database for the BLAST search.
+
And '''nr''' non-redundant '''protein''' database for the '''BLAST''' search.
  
 
|-
 
|-

Latest revision as of 22:25, 15 October 2015

Visual Cue
Narration
Slide Number 1

Title Slide

Welcome to this tutorial on BLAST using Biopython tools.
Slide Number 2

Learning Objectives

In this tutorial, we will learn to
  • Run BLAST for the query sequence using Biopython tools
  • And parse the BLAST output for further analysis.
Slide Number 3

Pre-requisites

To follow this tutorial you should be familiar with,
  • Undergraduate Biochemistry or Bioinformatics
  • And basic Python programming

Refer to the Python tutorials at the given link.

Slide Number 4

System Requirement

To record this tutorial, I am using,
  • Ubuntu OS version. 14.10
  • Python version 2.7.8
  • Ipython interpretor version 2.3.0
  • Biopython version 1.64
  • And a working Internet connection
Slide Number 5

About BLAST


BLAST is the acronym for Basic Local Alignment Search Tool.


It is an algorithm for comparing sequence information.

Slide Number 6

About BLAST

The program
  • compares nucleotide or protein sequences to sequences in databases
  • and calculates the statistical significance of matches.
Slide Number 7

About BLAST

There are two different ways to run BLAST':
  • Local BLAST on your machine.
  • Run BLAST over Internet through NCBI servers.
Slide Number 8

About BLAST

Running BLAST in Biopython has two steps.


First, run BLAST for your query sequence and get some output.


Second, parse the BLAST output for further analysis.

Cursor on Slide Number 8 We will open the terminal and run BLAST for a nucleotide sequence.


Open the terminal by pressing Ctrl, Alt and T keys simultaneously.

Cursor on the terminal.

Type, ipython and press enter.

At the prompt type ipython and press Enter.
At the prompt type:

>>> from Bio.Blast import NCBIWWW


Press enter.

In this tutorial I will demonstrate how to run BLAST over internet using NCBI BLAST service.


Type the following at the prompt:


Import NCBIWWW module from Bio.Blast package.


Press Enter.

Type,


>>> result = NCBIWWW.qblast("blastn", "nt", "186429")

Next to run the BLAST over internet:


Type the following at the prompt,


We will use qblast function in the NCBIWWW module.

Cursor on the terminal.


Highlight the arguments.


>>> result = NCBIWWW.qblast("blastn", "nt", "186429")

The qblast function takes three arguments:


The first argument is the blast program to use for the search.


Second specifies the databases to search against.


The third argument is a your query sequence.

Cursor on the terminal.


The input for the query sequence can be in the form of GI number or a FASTA file.

Or it can also be a sequence record object.

Highlight the query. For this demonstration, I am using the GI number for a nucleotide sequence.


The GI number is for a nucleotide sequence of insulin.

Slide Number 9

qblast function.

The qblast function also takes a number of other option arguments.


These arguments are analogous to the different parameters you can set on the BLAST web page.


The qblast function will return the BLAST results in xml format.

Cursor on the terminal. Back to the terminal.


We have to use the appropriate Blast program,


depending on whether our query is a nucleotide or protein sequence.


Since our query is a nucleotide, we will use blastn program.


And nt refers to the nucleotide database.

Slide Number 10


NCBI BLAST website

Details about this are available at the NCBI BLAST webpage.


http://blast.ncbi.nlm.nih.gov/Blast.cgi

Cursor on the terminal. The blast output is stored in the variable result in the form of an xml file.

Press Enter.

Cursor on the terminal. Depending upon the speed of your Internet, it may take a few minutes to complete the BLAST search.
Type,


>>> save_file = open("blast.xml", "w")

>>> save_file.write(result.read())

>>> save_file.close()

>>> result.close()


Press enter

It is important to save the xml file on the disk before processing further.


Type the following lines to save the xml file.


These lines of code will save the search result as blast.xml in the home folder.

Navigate to your home folder and locate the file.


Click on the file and check the contents of the file.

Open the text file.


from Bio.Blast import NCBIWWW

fasta_string = open("insulin.fasta").read()

result = NCBIWWW.qblast("blastn", "nt", fasta_string)

save_file = open("blast.xml", "w")

save_file.write(result.read())

save_file.close()

result.close()

result= open("blast.xml")

Use the code in this text file if you want to use a FASTA file as a query.



Open the text file.


from Bio.Blast import NCBIWWW

from Bio import SeqIO

record = SeqIO.read("insulin.fasta", format="fasta")

result = NCBIWWW.qblast("blastn", "nt", record.seq)

save_file = open("blast.xml", "w")

save_file.write(result.read())

save_file.close()

result.close()

result= open("blast.xml")


Here is the code if you want to use sequence record object from a FASTA file as a query.
Type,

>>>result = open("my_blast.xml")


press Enter.


Back to the terminal.


The next step is to parse the file to extract data.


First step in parsing is to open the xml file for input.


Type the following at the prompt:


Press Enter.

>>> from Bio.Blast import NCBIXML


press enter.

Next import the module NCBIXML from Bio.Blast package.


Press Enter.

Type,


>>>records = NCBIXM.parse(result)

>>>blast_record = records.next()

press enter

Type the following lines to parse the Blast output.


A BLAST record contains all the information you want to extract from the BLAST output.

Cursor on the terminal. Let us print out some information about all hits in our blast report greater than a particular threshold.


Type the following


For a match to be significant, expect score should be less than 0.01.

Type the following code.


for alignment in blast_record.alignments:

for hsp in alignment.hsps:

if hsp.expect <0.01:

print('****Alignment****')

print('sequence:', alignment.title)

print('length:', alignment.length)

print('score:', hsp.score)

print('gaps:', hsp.gaps)

print('e value:', hsp.expect)

print(hsp.query[0:90] + '...')

print(hsp.match[0:90] + '...')

print(hsp.sbjct[0:90] + '…')


Press Enter key twice.

For each hsp that is high scoring pair, we get the title, length, gaps and expect value.


We also print strings containing the query (query), the aligned database sequence (sbjct) and string specifying the match and mismatch positions (match).


Press Enter key twice to get output.

Highlight the output. Observe the output:

For each alignment we have length, score, gaps, evalue and strings.

Cursor on the terminal. You can extract the required information using other functions available in Bio.Blast package.


We have come to the end of this tutorial.

Slide Number 11

Summary


Let's summarize,

In this tutorial we have learnt to,

  • Run BLAST for the query nucleotide sequence using GI number.
  • And parse the BLAST output file using Bio.Blast.Record module.
Slide Number 12

Assignment


(use blastp)

result = NCBIWWW.qblast("blastp", "nr", 386828)

non-redundant protein database (nr).

For the assignment,


Run BLAST

Search for a protein sequence of your choice.


Save the output file and parse the data contained in the file.


Your completed assignment should have the following lines of code :

As shown in this text file.


Since our query is protein sequence, here we have used blastp program.


And nr non-redundant protein database for the BLAST search.

Slide Number 13

Acknowledgement

The video at the following link summarizes the Spoken Tutorial project.

Please download and watch it.

Slide Number 14 The Spoken Tutorial Project Team conducts workshops and gives certificates for those who pass an online test.

For more details, please write to us.

Slide number 15 Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.

More information on this Mission is available at the link shown.

Slide number 15 This is Snehalatha from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

Nancyvarkey, Snehalathak