Python for Biologists/C2/Introduction-to-Python-for-Biologists/English

Revision as of 10:58, 6 August 2014 by Trupti (Talk | contribs)


Title of script: Introduction to Python for Biologists

Author: Trupti Rajesh Kini & Snehalatha

Keywords: video tutorial, Python, DNA sequences, Protein sequences, Biologists


Visual Cue Narration
Slide 1 Welcome to the spoken-tutorial on Introduction to Python for Biologists.
Slide 2

Learning Objectives

In this tutorial we will learn,
  • Installation of Python/IPython interpreter.
  • Simple Python programs using examples of DNA and Protein sequences.



Slide 3

System Requirements


To record this tutorial, I am using
  • Ubuntu OS version 12.04
  • Python 3.2.3
  • IPython 0.12.1


Slide 4

Prerequisites


To practice this tutorial you should be familiar with,
  • Basic biochemistry

You can also refer to Spoken Tutorials on Python for better understanding of this tutorial.


These are available at www.spoken-tutorial.org

Slide 5

Why Python for biologists?

Some of the features of Python useful for biologists are as follows:


  • Python has many tools to write small
    programs that are useful in biology.
  • It has a consistent syntax.
  • It has built-in libraries for common tasks.
  • We can manipulate DNA and protein sequences easily.


Slide 6

Why Python for biologists?

  • It has a large user base as it is commonly used in bioinformatics.
  • Listed here are a few examples of bioinformatics tools in Python:

Biopython, Modeller, chemopy, BLASTorage, Pymol


For more information, refer to the following website:

http://pythonforbiologists.com

Slide 7

Installation

  • Python comes installed by default on Ubuntu.
  • IPython is an interactive terminal for Python.
  • To install Python on Windows, Mac OS and Android devices, visit www.python.org


Open the terminal by pressing Ctrl+Alt+T at the same time. First of all, let's check if IPython3 is installed on our system.


To do so, open the terminal by pressing Ctrl+Alt+T simultaneously.


Now, at the prompt, type ipython3 and press Enter.

Cursor on the terminal


Point to the “In[1]” prompt.

You will see a few lines of information about Python, such as the version number.


You will also see the IPython prompt In[1] on the terminal.

Cursor on the terminal. In case you don't, manually install IPython by typing


sudo apt-get install ipython3

and press Enter.

Cursor on the terminal. Wait for a few minutes for the installation to complete.

Python3 does not overwrite the default Python on the system.

Open the terminal

Type ipython3 and press Enter.

To check whether ipython3 is installed successfully on your system,


Type ipython3 at the prompt and press Enter.

Point to IPython prompt

point to In[1]

In[1] indicates that IPython3 is installed successfully.
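As an optional extra check, the version of Python behind the prompt can also be read from inside the interpreter. The sketch below is an illustration only; it relies on the standard sys module, and the name version_label is hypothetical (not part of the original script).

```python
import sys

# sys.version_info holds the interpreter version as numbers,
# e.g. major = 3 and minor = 2 for Python 3.2.3.
major, minor = sys.version_info[:2]

# version_label is a hypothetical name used only for this illustration.
version_label = "Python {}.{}".format(major, minor)
print(version_label)
```

If the major version printed is 3, the commands in the rest of this tutorial should behave as described.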
Cursor on terminal Let's type a few simple Python commands with an example of a DNA sequence.
Cursor on terminal To begin with, we will store data, i.e. a DNA sequence, in a variable called my_DNA.
Slide 8


What is a string?

In Python, data such as protein and DNA sequences are stored as strings.


A string is data in the form of text.

Type in the terminal,

my_DNA = "ATGCGCAT"

Highlight my_DNA

Press Enter

Type,

my_DNA equal to ATGCGCAT within double quotes


Press Enter.

Cursor on the panel. We call this assigning a variable.
Cursor on the panel When writing code, we can use the variable name instead of the string itself.
Type,

my_DNA and press Enter

To print this sequence, type

my_DNA and press Enter.

Highlight the output,

ATGCGCAT

This displays the DNA sequence ATGCGCAT within single quotes, as it is a string.
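The single quotes appear because the interactive prompt displays the representation of the string rather than its bare text. A minimal sketch of the difference, using the same sequence; here repr() stands in for what the prompt shows, and the name echoed is hypothetical:

```python
# Assign the DNA sequence to a variable.
my_DNA = "ATGCGCAT"

# Typing my_DNA alone at the prompt displays repr(my_DNA),
# which includes the surrounding single quotes.
echoed = repr(my_DNA)
print(echoed)   # 'ATGCGCAT'

# print() shows the text itself, without quotes.
print(my_DNA)   # ATGCGCAT
```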
Cursor on the terminal. Now let us print the sequence on two separate lines.
Press up arrow

Add \n and DNA after ATGCGCAT.

my_DNA = "ATGCGCAT\nDNA"

Press Enter

Press the up arrow on the keyboard till we get this command on the terminal.

my_DNA = "ATGCGCAT"

Let's edit this line.

Type \n and DNA after ATGCGCAT within double quotes.

Press Enter.

Type,

print(my_DNA) and press Enter

Type,

print(my_DNA) and press Enter.



Highlight the output

ATGCGCAT

DNA

The output prints the sequence on two separate lines as,

ATGCGCAT

DNA
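A short sketch of why the output breaks onto two lines: \n is a single newline character stored inside the string, and print() moves to a new line when it reaches it. The call to splitlines() and the name lines are additions for illustration, not part of the original script.

```python
# \n inside the quotes is one newline character, not two letters.
my_DNA = "ATGCGCAT\nDNA"

# print() starts a new line wherever \n occurs.
print(my_DNA)

# splitlines() is used here only to show the two parts separately.
lines = my_DNA.splitlines()
print(lines)   # ['ATGCGCAT', 'DNA']
```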


<<PAUSE>>

Slide 9

Assignment

As an assignment,
  • Using example of a short protein
    sequence: CNLTFTWPEADFYPI Protein
  • Print the above sequence on a single line, and also on two separate lines.

<<PAUSE>>

Cursor on the terminal Let's go back to the terminal and learn a few more functions and methods.
Another useful built-in tool in Python is the len function.


It is used to calculate the length of a string.

Press up arrow key

my_DNA = "ATGCGCAT"

Press the up arrow on the keyboard till we get this command on the terminal.

my_DNA = "ATGCGCAT"

Press Enter

Type:

len(my_DNA)

To find the length of the DNA sequence in a variable, type,

len within brackets my_DNA


Press Enter.

Cursor on the terminal The output on the screen shows the number 8.


This is the length of the DNA sequence stored in the variable my_DNA.
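The steps above can be sketched as follows. The name dna_length is hypothetical and only shows that the result of len can be kept for later use. The second part illustrates why my_DNA is re-assigned first: the earlier two-line version would also count the newline character and the letters of "DNA".

```python
# Re-assign the plain sequence, then measure it.
my_DNA = "ATGCGCAT"
dna_length = len(my_DNA)   # dna_length is a hypothetical name
print(dna_length)          # 8

# The earlier two-line string is longer: 8 bases + 1 newline + 3 letters.
two_line_length = len("ATGCGCAT\nDNA")
print(two_line_length)     # 12
```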

Slide 10


Assignment

Another assignment for you
  • Calculate the length of the given DNA sequence 'ATGGCATGCGC'
  • Store the output in a variable.


Let's go back to the terminal
In biochemistry, sequences are often represented in either lowercase or uppercase letters.



Type my_DNA = "ATGCGCAT"

Press Enter

Type my_DNA.lower()

Press Enter.

To convert the uppercase letters in a string to lowercase, we make use of the lower() method.

Type,

my_DNA = "ATGCGCAT" and press Enter.

Then type my_DNA.lower()

To call a method, we write:

  • the name of the variable first,
  • followed by a period (.),
  • then the name of the method,
  • then an opening and a closing parenthesis.

Press Enter.

Highlight  'atgcgcat' The output shows the string in lowercase.
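A minimal sketch of the method call described above. It also shows a detail the narration does not state: lower() returns a new string and leaves the original variable unchanged. The name lower_DNA is hypothetical.

```python
my_DNA = "ATGCGCAT"

# variable name, period, method name, parentheses
lower_DNA = my_DNA.lower()   # lower_DNA is a hypothetical name

print(lower_DNA)   # atgcgcat
print(my_DNA)      # ATGCGCAT (the original string is not modified)
```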
Slide 11

Assignment

As an assignment,

Using example of a short protein

sequence:  cnltftwpeadfypi

Convert the sequence to uppercase.

Hint: Use upper() method.

Back to terminal again
Type

my_protein = "alspadkanl"

Let's take an example of an amino acid sequence.

Store it in a variable called my_protein

my_protein = "alspadkanl"



Cursor on the terminal To find out the number of times an amino acid or a sequence of amino acids occurs in a string, we make use of the count method.



Type

my_protein.count('a')


Press Enter

For example, to find the number of times the amino acid Alanine occurs in the string,


Type

my_protein.count('a')


Press Enter

Cursor on the terminal The output shows the number 3.

There are 3 Alanines in the string.



Type

my_protein.count('l')

Press Enter

Similarly, to find the number of Leucines in the string,

Type

my_protein.count('l')

Press Enter



Cursor on the terminal We get an output of 2; there are 2 Leucines in the string.
Cursor on the terminal Similarly, we can use a DNA or an RNA sequence as a string to count the occurrences of bases.
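The same idea can be sketched for both kinds of sequence. The variable names holding the results are hypothetical, and the two-letter searches are added only to illustrate counting a short run of residues or bases rather than a single character.

```python
my_protein = "alspadkanl"
my_DNA = "ATGCGCAT"

# Single amino acids, as in the narration.
alanines = my_protein.count('a')
leucines = my_protein.count('l')
print(alanines, leucines)      # 3 2

# count also accepts a longer substring, e.g. 'an' (Alanine then Asparagine).
an_pairs = my_protein.count('an')
print(an_pairs)                # 1

# The same method works on a DNA string.
adenines = my_DNA.count('A')
gc_runs = my_DNA.count('GC')   # non-overlapping occurrences of "GC"
print(adenines, gc_runs)       # 2 2
```

Note that count is case-sensitive, so 'a' and 'A' are treated as different characters.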
Slide 12

Summary


Let us summarize,

In this tutorial we have learnt:


  • Installation of IPython Interpreter
  • Storing data in variables using examples of DNA and Protein sequences
  • Printing a sequence on a single line and on two separate lines


Slide 13

Summary

  • Find the length of the string
  • Change case of the string
  • Count the number of times a character appears in a string


Slide 13

Assignment

Here is an assignment,

Calculate GC content in the given DNA sequence.

'ATGGCATGCGC'



Slide 14

About Spoken Tutorial Project


The video available at the following link summarizes the Spoken Tutorial project. Please watch it.
Slide 15

About Spoken Tutorial workshops


The Spoken Tutorial Project Team conducts workshops and gives certificates to those who pass an online test.


For more details, please write to us.

Slide 16

Acknowledgement


Spoken Tutorial Project is supported by the NMEICT, MHRD, Government of India.

More information on this Mission is available at this link.

This script is contributed by Snehalatha and Trupti Kini.

And this is Trupti Kini from IIT Bombay signing off.

Thanks for joining.

Contributors and Content Editors

Nancyvarkey, Snehalathak, Trupti