Python for Biologists/C2/Introduction-to-Python-for-Biologists/English

From Script | Spoken-Tutorial
Revision as of 22:10, 12 August 2014 by Trupti (Talk | contribs)

Title of script: Introduction to Python for Biologists

Author: Trupti Rajesh Kini & Snehalatha

Keywords: video tutorial, Python, DNA sequences, Protein sequences, Biologists


Visual Cue Narration
Slide 1 Welcome to the spoken-tutorial on Introduction to Python for Biologists.
Slide 2

Learning Objectives

In this tutorial we will learn,
  • Installation of Python/IPython interpreter.
  • Simple Python programs using examples of DNA and Protein sequences.


Slide 3

System Requirements


To record this tutorial, I am using
  • Ubuntu OS version 12.04
  • Python 3.2.3
  • IPython 0.12.1


Slide 4

Prerequisites


To practice this tutorial you should be familiar with,
  • Basic biochemistry

You can also refer to Spoken Tutorials on Python for better understanding of this tutorial.

These are available at www.spoken-tutorial.org

Slide 5

Why Python for biologists?

Some of the features of Python useful for biologists are as follows:
  • Python has many tools to write small programs that are useful in biology.
  • It has a consistent syntax.
  • It has built-in libraries for common tasks.
  • We can manipulate DNA and protein sequences easily.


Slide 6

Why Python for biologists?

  • It has a large user base as it is commonly used in bioinformatics.
  • Listed here are a few examples of bioinformatics tools in Python:

Biopython, Modeller, chemopy, BLASTorage, Pymol

For more information, refer to the given website:

http://pythonforbiologists.com

Slide 7

Installation

  • Python comes installed, by default on Ubuntu.
  • IPython is an interactive terminal for Python
  • To install Python on Windows, Mac OS and Android devices, visit the given link
    www.python.org


Open terminal by pressing Ctrl+Alt+T at the same time. Open the terminal by pressing Ctrl+Alt+T simultaneously.

Python comes installed, by default on Ubuntu.



Type sudo apt-get install ipython3

and press Enter.

In case IPython is not installed, install the latest version manually by typing

sudo apt-get install ipython3

and press Enter.

Give root password if asked.

Cursor on the terminal. Wait for a few minutes for the installation to complete.

Note: Python 3 does not overwrite the default Python on the system.

Open the terminal

Type ipython3 and press Enter.

To check whether ipython3 is installed successfully on your system,

Type ipython3 and press Enter.

Cursor on the terminal

Highlight the prompt

You will see a few lines of information about Python, such as the version number.

You will also see the IPython prompt on the terminal.

The prompt indicates that IPython is installed successfully.

Cursor on terminal Let's type a few simple Python commands with an example of a DNA sequence.
Cursor on terminal To begin with, we will store data, i.e. a DNA sequence, in a variable called my_DNA.
Slide 8

What is a string?

In Python, data such as protein and DNA sequences are called strings.

A string is data in the form of text.

Type in the terminal,

my_DNA = "ATGCGCAT"

Highlight my_DNA

Press Enter

Let us go back to the terminal.

Type,

my_DNA is equal to, within double quotes, ATGCGCAT. Press Enter.

Highlight my_DNA We call this assigning a variable.
Highlight my_DNA When writing code, we can use the variable name instead of the string itself.
Type,

print(my_DNA) and press Enter

To print the DNA sequence, we will use the print function.

For that type,

print(my_DNA) and press Enter.

Highlight the output,

ATGCGCAT

We get the sequence as output.
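
The two steps above, assigning a variable and printing it, can be sketched together as a short script. This is only a minimal illustration using the same sequence and variable name as the tutorial.

```python
# Store a DNA sequence (a string) in a variable called my_DNA.
my_DNA = "ATGCGCAT"

# Use the variable name, not the string itself, to print the sequence.
print(my_DNA)  # prints ATGCGCAT
```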
Cursor on the terminal. Now let us print the sequence on two separate lines.
Press up arrow

Add \n and DNA after ATGCGCAT .

my_DNA = "ATGCGCAT\nDNA"

Press Enter

Press the up arrow on the keyboard till we get this command on the terminal.

my_DNA = "ATGCGCAT"

Let's edit this line.

Type \n and DNA after the sequence within double quotes.

Press Enter.

Type,

print(my_DNA) and press Enter

Type,

print(my_DNA) and press Enter.



Highlight the output

ATGCGCAT

DNA

The output prints the sequence on two separate lines.
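
As a small sketch of what happens here: \n inside the quotes is a single newline character, so print moves to a new line at that point. The sequence and the word DNA are the same as in the tutorial.

```python
# \n inside the string is one newline character, not a backslash and an n.
my_DNA = "ATGCGCAT\nDNA"

# print starts a new line wherever the newline character occurs.
print(my_DNA)
# prints:
# ATGCGCAT
# DNA
```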



Slide 9

Assignment

As an assignment,
  • Using the given example of a short protein sequence,
  • print the sequence on a single line, and then on two separate lines.


Cursor on the terminal Let us now learn a few more functions and methods.
Slide 10 Another useful built-in tool in Python is the len function.

It is used to calculate the length of a string.

Let us go back to the terminal.

Press up arrow key

my_DNA = "ATGCGCAT"

Press Enter

Let us go back to the terminal.

Press the up arrow on the keyboard till we get this command on the terminal.

my_DNA = "ATGCGCAT"

Press Enter

Type:

len(my_DNA)

To find the length of the DNA sequence in a variable, type,

len within brackets my_DNA

Press Enter.

Cursor on the terminal The output on the screen shows the number 8.

This is the length of the DNA sequence stored in the variable my_DNA.
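
The same step can be sketched as a script. At the IPython prompt the result of len(my_DNA) is displayed directly; in a script, print is needed to see it. The variable name dna_length is only illustrative and is not part of the tutorial.

```python
my_DNA = "ATGCGCAT"

# len returns the number of characters in the string.
# dna_length is a hypothetical name chosen for this example.
dna_length = len(my_DNA)

print(dna_length)  # prints 8
```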

Slide 11

Assignment

Another assignment for you
  • Calculate the length of the given DNA sequence 'ATGGCATGCGC'
  • and store the output in a variable.


Slide 12 In biochemistry, sequences are often represented in either lowercase or uppercase letters.



Type,

my_DNA = "ATGCGCAT"

Press Enter

Type my_DNA.lower()

Press Enter.

To convert the uppercase letters in a string to lowercase, we make use of the lower() method.

Let us go back to the terminal.

Type,

my_DNA = "ATGCGCAT". Press Enter.

Then type, my_DNA.lower().

To call a method, we write,

  • the name of the variable first,
  • followed by a period (.),
  • then the name of the method,
  • then an opening and a closing parenthesis.

Press Enter.

Highlight  'atgcgcat' The output shows the string in lowercase.
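
A minimal sketch of the method call described above. It also shows that lower() returns a new string and leaves the original variable unchanged. The name lower_DNA is illustrative and not part of the tutorial.

```python
my_DNA = "ATGCGCAT"

# variable name, then a period, then the method name, then parentheses
lower_DNA = my_DNA.lower()  # lower_DNA is a hypothetical name

print(lower_DNA)  # prints atgcgcat
print(my_DNA)     # prints ATGCGCAT (the original string is not changed)
```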
Slide 13

Assignment

As an assignment,

Using the given example of a short protein sequence, convert the sequence to uppercase.

Hint: Use upper() method.

Let us go back to terminal again
Type

my_protein = "alspadkanl"

Let's take an example of an amino acid sequence.

Store it in a variable called my_protein

my_protein = "alspadkanl"



Slide 14 To find out the number of times an amino acid or a sequence of amino acids occurs in a string, we make use of the count method.



Type

my_protein.count('a')

Press Enter

Let us go back to the terminal.

For example, to find the number of times the amino acid Alanine occurs in the string,

Type

my_protein.count('a')

Press Enter

Highlight 3 Output shows number 3.

There are 3 Alanines in the string.



Type

my_protein.count('l')

Press Enter

Similarly, to find the number of Leucines in the string,

Type

my_protein.count('l')

Press Enter



Highlight 2 We get 2 as the output. There are 2 Leucines in the string.
Cursor on the terminal Similarly, we can use a DNA or an RNA sequence as a string to count the occurrences of bases.
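
The counting steps above can be sketched as follows, using the tutorial's own protein and DNA sequences. The counts of 'A' and 'GC' in my_DNA are extra illustrations not shown in the tutorial. Note that count matches exact characters, so it is case-sensitive, and it counts non-overlapping occurrences.

```python
my_protein = "alspadkanl"

# Count single amino acids (one-letter codes, lowercase here).
print(my_protein.count('a'))  # prints 3
print(my_protein.count('l'))  # prints 2

my_DNA = "ATGCGCAT"

# The same method works on a DNA string, for one base or a short run of bases.
print(my_DNA.count('A'))   # prints 2
print(my_DNA.count('GC'))  # prints 2
```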
Slide 15

Summary


Let us summarize,

In this tutorial we learnt:

  • Installation of IPython Interpreter
  • Storing data in variables using examples of DNA and Protein sequences.
  • Printing a sequence on a single line and on two separate lines


Slide 16

Summary

  • Find the length of the string
  • Change case of the string
  • Count the number of times a character appears in a string


Slide 17

Assignment

Here is an assignment,

Calculate GC content in the given DNA sequence.

'ATGGCATGCGC'
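
One possible approach to this kind of calculation is sketched below. It uses the earlier sequence stored in my_DNA, not the assignment sequence. It assumes that GC content means the percentage of bases that are G or C. The names gc_count and gc_percent are illustrative and not part of the tutorial.

```python
my_DNA = "ATGCGCAT"

# Assumption: GC content = (number of G + number of C) / total length, as a percentage.
gc_count = my_DNA.count('G') + my_DNA.count('C')  # hypothetical variable name
gc_percent = gc_count / len(my_DNA) * 100         # hypothetical variable name

print(gc_count)    # prints 4
print(gc_percent)  # prints 50.0
```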



Slide 18

About Spoken Tutorial Project


The video available at the following link summarizes the Spoken Tutorial project. Please watch it.
Slide 19

About Spoken Tutorial workshops


The Spoken Tutorial Project Team conducts workshops and gives certificates to those who pass an online test.

For more details, please write to us.

Slide 20

Acknowledgement


Spoken Tutorial Project is supported by the NMEICT, MHRD, Government of India.

More information on this Mission is available at this link.

This script is contributed by Snehalatha and Trupti Kini.

And this is Trupti Kini from IIT Bombay signing off.

Thanks for joining.

Contributors and Content Editors

Nancyvarkey, Snehalathak, Trupti