Difference between revisions of "Biopython/C2/Introduction-to-Biopython/English"

Latest revision as of 10:46, 26 June 2015

Visual Cue
Narration
Slide Number 1

Title Slide

Welcome to this tutorial on Introduction to Biopython
Slide Number 2

Learning Objectives

In this tutorial, we will learn about
  • Important features of Biopython.
  • Information regarding download and installation on Linux Operating System.
  • And translation of a DNA sequence to a protein sequence using Biopython tools.
Slide Number 3

Pre-requisites

To follow this tutorial, you should be familiar with:
  • Undergraduate Biochemistry or Bioinformatics
  • And basic Python programming

Refer to the Python tutorials at the given link.

Slide Number 4

System Requirement

To record this tutorial I am using
  • Ubuntu OS version 12.04
  • Python version 2.7.3
  • IPython version 0.12.1
  • Biopython version 1.58
Slide Number 5

About Biopython

Biopython is a collection of modules for computational biology.

It can perform most of the tasks required in bioinformatics, from basic to advanced.

Slide Number 6

Biopython functionality

Biopython tools are used for:

1. Parsing, that is, extracting information from various file formats such as FASTA, GenBank etc.

2. Downloading data from database websites such as NCBI, ExPASy etc.

3. Running bioinformatics algorithms such as BLAST.

Slide Number 7

Biopython functionality

4. It has tools for performing common operations on sequences.

For example, obtaining complements, transcription, translation etc.

5. Code for dealing with alignments.

6. And code to split up tasks into separate processes.
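
To make the first item on the list concrete, here is a minimal plain-Python sketch of what parsing a FASTA file involves. It is only an illustration and not Biopython's own parser. The function name parse_fasta and the two sample records are made up for this example.

```python
def parse_fasta(text):
    """Return a dict mapping each record id to its sequence string."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the id is the first word after ">"
            words = line[1:].split()
            current_id = words[0] if words else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append it to the record being read
            records[current_id] += line
    return records


# Made-up sample data, used only to exercise the function
sample = """>seq1 example coding strand
ATGTTACACTCC
CGATGA
>seq2 another example
ATGGCC
"""

records = parse_fasta(sample)
print(records["seq1"])
```

A sequence split over several lines is joined back into one string, which is the main thing a FASTA parser has to take care of.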

Slide Number 8

Download

Information regarding download.

The Biopython package is not part of the Python distribution.

It needs to be downloaded independently.

For details, refer to the following link:

http://biopython.org/wiki/Download

Slide Number 9

Installation for Ubuntu/Linux systems

Installation on Linux system.
  • Install the Python, IPython and Biopython packages using Synaptic Package Manager.
  • Prerequisite software will be installed automatically.
  • Additional packages must be installed for graphic outputs and plots.
  • Open the terminal by pressing Ctrl, Alt and T keys simultaneously.
Cursor on the terminal. I have already installed Python, IPython and Biopython on my system.

Start the IPython interpreter by typing ipython and pressing Enter.

The IPython prompt appears on the screen.

Open the terminal and check installation of Biopython. To check the installation of Biopython, at the prompt type: import Bio

Press Enter.

If you don't get any error message, it means Biopython is installed.
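
The same check can be made from a script without triggering an error message. The sketch below uses only the Python standard library and assumes Python 3.4 or later, which is newer than the version used to record this tutorial. The function name biopython_installed is an illustrative choice.

```python
import importlib.util


def biopython_installed():
    """Return True if the Bio package can be found, without importing it."""
    return importlib.util.find_spec("Bio") is not None


if biopython_installed():
    print("Biopython is installed")
else:
    print("Biopython is not installed")
```

Note that the package name is passed as "Bio" with an uppercase B, for the same case-sensitivity reason that applies to the import statement.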


Here, let me remind you that the Python language is case sensitive.

Take care while typing keywords, variables or functions.

For instance, in the above line “i” in import is lowercase.

And “B” is uppercase in Bio.

Cursor on the terminal. In this tutorial, we will make use of Biopython modules to translate a DNA sequence.
Slide Number 10

DNA Translation

It involves the following steps.
  1. First, create a sequence object for the coding DNA strand.
  2. Next, transcribe the coding DNA strand to mRNA.
  3. Finally, translate the mRNA to a protein sequence.
Slide Number 11

Sequence Object

We will be using the coding DNA strand shown on this slide, as an example.

It codes for a small protein sequence.

The first step is to create a sequence object for the above coding DNA strand.

Let us go back to the terminal.

Open the terminal

Type:

>>> from Bio.Seq import Seq

For creating a sequence object, import the Seq class from the Seq module of the Bio package.

The Seq module provides the Seq class, whose methods store and process sequences.

At the prompt, type from Bio dot Seq import Seq

press Enter.

Cursor on the terminal. Next, specify the alphabet of the strand explicitly when creating your sequence object.

That is, specify whether the letters in the sequence code for nucleotides or amino acids.

>>> from Bio.Alphabet import IUPAC

To do so, we will use the IUPAC module from the Alphabet package.

At the prompt, type:

from Bio dot Alphabet import IUPAC

Press Enter.

Note that we have used the import and from statements to load the Seq and IUPAC modules.

Type >>> cdna = Seq("ATGTTACACTCCCGATGA", IUPAC.unambiguous_dna)

Press Enter.

cdna

Press Enter.

Output

Seq('ATGTTACACTCCCGATGA', IUPACUnambiguousDNA())

Store the sequence object in a variable called cdna.

At the prompt, type: cdna equal to Seq, followed by the sequence written as a normal string.

Enclose the sequence within double quotes and parentheses.


We know our sequence is a DNA fragment.

So, type the unambiguous DNA alphabet object, IUPAC dot unambiguous underscore dna, as the second argument.


For the output, type: cdna; press Enter.

The output shows the DNA sequence as a sequence object.
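
A note for readers on newer releases: the Bio.Alphabet module used above was removed in Biopython 1.78, and from that release onwards Seq is created from the sequence string alone. The sketch below shows one way to create the object on either side of that change. The helper name make_dna_seq is hypothetical, and it returns None when Biopython is not installed at all.

```python
def make_dna_seq(letters):
    """Return a Biopython Seq for `letters`, or None if Biopython is absent."""
    try:
        from Bio.Seq import Seq
    except ImportError:
        return None  # Biopython is not installed
    try:
        # Bio.Alphabet is importable only in Biopython releases before 1.78
        from Bio.Alphabet import IUPAC
    except ImportError:
        # Newer releases: no alphabet argument
        return Seq(letters)
    # Older releases, as used in this tutorial
    return Seq(letters, IUPAC.unambiguous_dna)


cdna = make_dna_seq("ATGTTACACTCCCGATGA")
print(cdna)
```

On newer releases the displayed form of the object no longer includes an alphabet, so the output will look slightly different from the one shown in this tutorial.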

Cursor on the terminal. Let’s transcribe the coding DNA strand into the corresponding mRNA.

We will use the Seq object's built-in “transcribe” method.

Type

>>> mrna = cdna.transcribe()

Press Enter.

Type

>>> mrna

Press Enter.

Seq('AUGUUACACUCCCGAUGA', IUPACUnambiguousRNA())

Type the following code:

Store the output in a variable mrna.


At the prompt type,

mrna equal to cdna dot transcribe open and close parentheses

press Enter.


For the output, type mrna

press Enter.

Highlight the output. Observe the output.


The transcribe method replaces the Thymine in the DNA sequence with Uracil.
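
As a plain-Python illustration of this replacement (a sketch of the idea, not Biopython's implementation), transcription of a coding strand amounts to a single string substitution. The function name transcribe below is our own stand-in for the Seq method.

```python
def transcribe(coding_dna):
    """Return the mRNA for a coding DNA strand by replacing T with U."""
    return coding_dna.upper().replace("T", "U")


mrna = transcribe("ATGTTACACTCCCGATGA")
print(mrna)  # AUGUUACACUCCCGAUGA
```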

Cursor on the terminal. Next, to translate this mRNA to the corresponding protein sequence, use the translate method.
Type

>>> protein = mrna.translate()

Press Enter.

Cursor on the terminal.

Output:

protein

Seq('MLHSR*', HasStopCodon(IUPACProtein(), '*'))

Type the following code:

protein equal to mrna dot translate open and close parentheses

press Enter.


The translate method translates an RNA or DNA sequence. If no genetic code is specified, it uses the standard genetic code.

The output shows an amino acid sequence.

The output also shows information regarding the presence of stop codons in the translated sequence.


Observe the asterisk at the end of the protein sequence.

It indicates the stop codon.
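
To see where each letter of the protein comes from, the translation can be sketched in plain Python. This is an illustration under our own naming (BASES, AMINO_ACIDS, CODON_TABLE and translate are not Biopython names). It builds the standard genetic code table, reads the sequence three letters at a time, and writes stop codons as an asterisk.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate an RNA or DNA string codon by codon; '*' marks a stop codon."""
    dna = sequence.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    # Letters other than A, C, G, T would raise KeyError in this simple sketch
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate("AUGUUACACUCCCGAUGA")
print(protein)  # MLHSR*
```

The last codon, UGA, is a stop codon, which is why the result ends with an asterisk.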

Cursor on the terminal. In the above code, we have used a coding DNA strand for transcription.

In Biopython, the transcribe method works only on the coding DNA strand.


However, in real biological systems, the process of transcription starts with a template strand.

Type,

coding_dna = template_dna.reverse_complement()

If you are starting with a template strand, convert it to the coding strand by using the reverse complement method, as shown on the terminal.
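
What the reverse complement does can be sketched in plain Python: each base is swapped for its partner and the strand is read in the opposite direction. This is an illustration, not Biopython's code. The template strand below does not appear in the tutorial. It was derived by hand as the reverse complement of the tutorial's coding strand.

```python
# Pairing rules: A with T, C with G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


template_dna = "TCATCGGGAGTGTAACAT"  # derived for this example
coding_dna = reverse_complement(template_dna)
print(coding_dna)  # ATGTTACACTCCCGATGA
```

Applying the function twice returns the original strand, which is a quick way to check the result.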
Cursor on the terminal. Follow the rest of the code as shown above, for the coding strand.
Cursor on the terminal. Using methods in Biopython, we have translated a DNA sequence to a protein sequence.
Cursor on the terminal. A DNA sequence of any size can be translated to a protein using this code.
Slide Number 12

Summary

Let's summarize.

In this tutorial we have learnt

  • Important features of Biopython.
  • Information regarding download and installation on Linux OS.
  • How to create a sequence object for the given DNA strand.
Slide Number 13

Summary

  • Transcription of the DNA sequence to mRNA.
  • Translation of mRNA to protein sequence.
Slide Number 14

Assignment

Now for the assignment.
  • Translate the given DNA sequence into protein sequence.
  • 'ATGGCCCTATAGTGTCTAAGCTAG'
  • Observe the output.
  • The protein sequence has an internal stop codon.
  • As it happens in nature, translate the DNA till the first in-frame stop codon.
Cursor on the terminal. Your completed assignment should have the following code.

Notice that we have used the 'to underscore stop' argument in the translate method.


Notice the output.

The stop codon itself is not translated.

The stop symbol is not included at the end of your protein sequence.
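
The idea of stopping at the first in-frame stop codon can be illustrated in plain Python. This sketch is not the assignment's Biopython code. It only locates the first stop codon of the standard genetic code (TAA, TAG or TGA) when the sequence is read in steps of three from the first base. The function name first_in_frame_stop is our own.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(dna):
    """Return the index of the first in-frame stop codon, or None if absent."""
    dna = dna.upper()
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return i
    return None


dna = "ATGGCCCTATAGTGTCTAAGCTAG"
stop = first_in_frame_stop(dna)
# Keep only the part that is translated, leaving out the stop codon itself
coding_part = dna if stop is None else dna[:stop]
print(stop, coding_part)  # 9 ATGGCCCTA
```

Only codons that start at positions 0, 3, 6 and so on are examined, so a stop triplet that straddles two codons is ignored.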

Slide Number 15

Acknowledgement

This video summarizes the Spoken Tutorial project.

If you do not have good bandwidth, you can download and watch it.

Slide Number 16

The Spoken Tutorial Project Team conducts workshops and gives certificates to those who pass an online test.

For more details, please write to us.

Slide Number 17

The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.

More information on this Mission is available at this link.

This is Snehalatha from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

Nancyvarkey, Snehalathak