Biopython/C2/Manipulating-Sequences/English

From Script | Spoken-Tutorial
Revision as of 17:25, 23 July 2015 by Snehalathak

Visual Cue
Narration
Slide Number 1

Title Slide

Welcome to this tutorial on Manipulating Sequences.
Slide Number 2

Learning Objectives

In this tutorial, we will use Biopython tools to:


1. Generate a random DNA sequence.


2. Slice a DNA sequence at specified locations.


3. Join two sequences together to form a new sequence, that is, concatenate them.

Slide Number 3

Learning Objectives (Biopython Functions)

4. Find the length of the sequence.


5. Count the occurrences of an individual base or part of the string.


6. Find the position of a particular base or part of the string.


7. Convert a sequence object to a mutable sequence object.

Slide Number 4

Pre-requisites

To follow this tutorial, you should be familiar with:


Undergraduate Biochemistry or Bioinformatics


Basic Python programming

If not, refer to the Python tutorials at the given link.

Slide Number 5

System Requirements

To record this tutorial, I am using:

Ubuntu OS version 14.10

Python version 2.7.8

IPython interpreter version 2.3.0

Biopython version 1.64

Press ctrl, alt and t simultaneously.

At the prompt, type ipython.

Let me open the terminal and start the IPython interpreter.


Press ctrl, alt and t keys simultaneously.


At the prompt, type ipython and press enter.


The IPython prompt appears on the screen.

Generating a random DNA sequence in Python


Using Biopython, we can generate a sequence object for a random DNA sequence of any specified length.


Let us now generate a sequence object for a DNA sequence of 20 bases.


At the prompt, type


import random


Press enter

At the prompt, type


from Bio.Seq import Seq

Next, import the Seq class from the Bio.Seq module.


Seq is often pronounced as seek.

Type,

from Bio.Seq import Seq


Press Enter

At the prompt type,


From Bio dot Seq import Seq

(from Bio.Seq import Seq )


Press Enter

Cursor on the terminal. We will use the Bio.Alphabet module to specify the alphabet of the DNA sequence.
Type,

>>>from Bio.Alphabet import generic_dna

Type,


from Bio dot alphabet import generic underscore dna.


(from Bio.Alphabet import generic_dna)


Press enter

Type,

dna1 = Seq(''.join(random.choice('AGTC') for _ in range(20)), generic_dna)


Press enter

Type the following command to create a sequence object for the random DNA sequence.


Store the sequence in a variable named dna1.


Please note: in this command, use two single quotes, not a double quote.


Press enter.
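The two single quotes matter because `''.join(...)` joins the randomly chosen letters using an empty string as the separator. As a minimal sketch, assuming only the Python standard library, the same idea can be written as a small function that returns a plain string; `random_dna` and its `seed` parameter are hypothetical names added for illustration, and the result can then be wrapped in `Seq(...)` as in the command above. Note that, to our knowledge, `Bio.Alphabet` (and therefore `generic_dna`) was removed in Biopython 1.78, so on newer releases the call is simply `Seq(random_dna(20))`.

```python
import random


def random_dna(length, seed=None):
    """Return a random DNA string of the given length (hypothetical helper)."""
    rng = random.Random(seed)  # separate generator; a fixed seed repeats the result
    # ''.join(...) glues the chosen letters together with an empty-string separator
    return ''.join(rng.choice('AGTC') for _ in range(length))


dna1 = random_dna(20)
print(dna1)       # a different 20-base string on each run
print(len(dna1))  # 20
```

Passing the same seed, for example `random_dna(20, seed=1)`, returns the same sequence every time, which can help when checking slice positions later in the tutorial.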

Type

>>> dna1

Press enter

For the output, type dna1.

Press enter

Highlight output. The output shows the sequence object for the random DNA sequence.
Cursor on the terminal.

Press up arrow key

Press enter.

Type

>>> dna1

Press enter

To generate a new sequence, press the up arrow key to recall the same command.


Press enter.

For the output, type the variable name, dna1.

Press enter

Cursor on the terminal. The output shows a new DNA sequence, which is different from the first one.
Slide Number 6 and 7

Sequence Objects

About Sequence Objects


Sequence objects usually behave like normal Python strings.


So the usual conventions for Python strings apply to them.


In Python, we count the characters in a string starting from 0 instead of 1.


The first character in the sequence is position zero.
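To see zero-based counting in action, here is a minimal sketch that uses a plain Python string as a stand-in for a sequence object (the tutorial notes that sequence objects act like strings; the example sequence is made up):

```python
# A plain string standing in for a sequence object
seq = "ATGCGTAC"

print(seq[0])   # A  (the first base is position 0)
print(seq[1])   # T  (the second base is position 1)
print(seq[-1])  # C  (negative indices count back from the end)
```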

Cursor on the terminal.


At the prompt type,


string1 = dna1[0:6]


Highlight the first fragment, from the 1st to the 6th character.


Back to the terminal.


Often you may need to work with only part of the sequence.


Now let's see how to extract parts of the sequence and store them as sequence objects.


For example, we will slice the DNA sequence at two positions.


First, between bases 6 and 7.


This will extract a fragment from the beginning of the sequence to the 6th base in the sequence.

Highlight the second fragment, from the 12th to the 20th character. The second slice will be between bases 11 and 12.


The second fragment will be from the 12th base to the end of the sequence.

At the prompt type,


Type the following command at the prompt to extract the first fragment.


string1 equal to dna1, within square brackets 0 colon 6.


(string1 = dna1[0:6] )


string1 is the variable to store the first fragment.


The rest of the command is ordinary Python slice notation.


Enclosed in these brackets are the start and stop positions separated by a colon.


The positions are inclusive of the start, but exclusive of the stop position.


Press Enter
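The slice positions can be checked on a plain string of 20 bases. This is a sketch under the assumption, stated in the tutorial, that slicing a sequence object follows normal Python string rules; `seq` is a hypothetical example, not the random dna1 from the recording:

```python
seq = "ATGCGTACGTTAGCCATGAA"  # hypothetical 20-base example

string1 = seq[0:6]    # bases 1-6   (indices 0-5; the stop index 6 is excluded)
string2 = seq[11:20]  # bases 12-20 (indices 11-19)

print(string1)  # ATGCGT
print(string2)  # AGCCATGAA
```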

Type,

string1

press enter

To view the output

Type string1 and press enter.


The output shows the first fragment as a sequence object.

Type


string2 = dna1[11:20]


press enter

To extract the second fragment from the sequence,


Press up arrow key and edit the command as follows:


Change the name of the variable to string2, and positions to 11 and 20.



Type string2

press enter.

For the output

Type

string2

press enter.


Now we also have the second fragment as a sequence object.

Type

dna2 = string1 + string2


press enter


Let us concatenate, that is, join the two fragments together to form a new sequence.


Store the new sequence in a variable dna2.


Type,


dna2 equal to string1 plus string2

(dna2 = string1 + string2)


press enter


Please note:

we cannot add sequences with incompatible alphabets.


That is, we cannot concatenate a DNA sequence and a protein sequence to form a new sequence.


The two sequences must have the same alphabet attribute.
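Plain Python strings carry no alphabet, so they will happily join a DNA string to a protein string. The hypothetical helper `concat_dna` below imitates the rule above by checking the letters before joining; it is an illustration only, not how Biopython itself implements alphabet checks.

```python
DNA_LETTERS = set("ACGT")


def concat_dna(a, b):
    """Join two DNA strings, refusing non-DNA input (hypothetical check)."""
    for s in (a, b):
        if not set(s.upper()) <= DNA_LETTERS:
            raise ValueError("not a DNA sequence: %r" % s)
    return a + b


print(concat_dna("ATGCGT", "AGCCATGAA"))  # ATGCGTAGCCATGAA

try:
    concat_dna("ATGCGT", "MKLV")  # a protein fragment
except ValueError as err:
    print(err)
```

A letter check cannot tell a short protein such as "ACG" (alanine, cysteine, glycine) from DNA, which is why recording the alphabet as an attribute of the sequence object is more reliable than inspecting its letters.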

Type

dna2

Press enter

To view the output, type dna2.

Press enter


The output shows a new sequence which is a combination of string1 and string2.

Cursor on the terminal. To find the length of the new sequence, we will use the len function.
Type

len(dna2) press Enter


Type

len, within parentheses dna2.


(len(dna2))


press Enter

The output shows that the sequence is 15 bases long: 6 bases from string1 plus 9 bases from string2.
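The 15 follows directly from the slice positions: a slice [start:stop] contains stop minus start bases, so 6 - 0 = 6 bases plus 20 - 11 = 9 bases gives 15. A sketch checking this arithmetic on a hypothetical 20-base string:

```python
seq = "ATGCGTACGTTAGCCATGAA"  # hypothetical 20-base stand-in for dna1

string1 = seq[0:6]    # 6 - 0 = 6 bases
string2 = seq[11:20]  # 20 - 11 = 9 bases
dna2 = string1 + string2

print(len(dna2))  # 15
```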

Type,

dna2.count("A")

Press enter


We can also count the number of individual bases present in the sequence.


To do so, we will use the count function.

For example, to count the number of adenines (A) present in the sequence,


Type the following command.


dna2 dot count, within parentheses, within double quotes, capital A.


(dna2.count("A"))


press enter


The output shows the number of adenines present in the sequence dna2.
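Like a Python string's count method, the count function counts non-overlapping occurrences, which matters when counting a pattern such as "AA" rather than a single base. A plain-string sketch, with a hypothetical 15-base stand-in for dna2:

```python
dna2 = "ATGCGTAGCCATGAA"  # hypothetical stand-in for dna2

print(dna2.count("A"))     # 5
print(dna2.count("GC"))    # 2
# Matches do not overlap: "AAAA" holds two "AA", not three
print("AAAA".count("AA"))  # 2
```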

Type

dna2.find("GC")

Press enter

To find a particular base or part of the string, we will use the find function.


Type


dna2 dot find, within parentheses, within double quotes, GC.


dna2.find("GC")


press enter


The output gives the position of the first occurrence of GC in the sequence, counting from 0. If GC does not occur in the sequence, find returns -1.
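A plain-string sketch of find, using a hypothetical stand-in for dna2. The optional second argument (a start index) is standard for Python's str.find and is assumed here to carry over to sequence objects:

```python
dna2 = "ATGCGTAGCCATGAA"  # hypothetical stand-in for dna2

print(dna2.find("GC"))     # 2  (index of the first match, counting from 0)
print(dna2.find("GC", 3))  # 7  (search again, starting at index 3)
print(dna2.find("TTT"))    # -1 (not found)
```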

Cursor on the terminal. A sequence object is read-only, so normally it cannot be edited.


To edit a sequence, we have to convert it to a mutable sequence object.
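Plain Python strings are read-only in the same way. In this sketch a list of single-letter strings stands in for the mutable sequence object; it is an analogy, not Biopython's MutableSeq. As far as we know, recent Biopython releases also deprecate tomutable() and toseq() in favour of MutableSeq(seq) and Seq(mutable_seq).

```python
seq = "ATGCGTAGCCATGAA"  # plain str standing in for a read-only sequence object

try:
    seq[5] = "A"  # strings refuse item assignment
except TypeError as err:
    print("TypeError: %s" % err)

dna3 = list(seq)  # a mutable copy: one list element per base
print(dna3[:6])   # ['A', 'T', 'G', 'C', 'G', 'T']
```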

Type,

dna3=dna2.tomutable()

press enter

To do so, type,

dna3 equal to dna2 dot tomutable open and close parenthesis.


(dna3=dna2.tomutable())


press enter

Type

dna3

press enter

For the output, type

dna3

press enter

Type

>>> dna3[5] = "A"

Press enter

Now the sequence object can be edited.


Let us replace a base from the sequence.


For example, to replace the base at position 5 with adenine, type


dna3 within square brackets 5 equal to, within double quotes, capital A.


(dna3[5] = "A")


Press enter


For the output, type dna3 and press enter.


Observe the output: the base at position 5 (a cytosine in this recording) is replaced with adenine.
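Continuing the list analogy (the list stands in for dna3 and holds a made-up sequence), replacing a single base is plain item assignment. Position 5 is the sixth base because counting starts at 0:

```python
dna3 = list("ATGCGTAGCCATGAA")  # mutable stand-in for dna3

print(dna3[5])        # T  (index 5 is the 6th base)
dna3[5] = "A"         # replace that single base with adenine
print("".join(dna3))  # ATGCGAAGCCATGAA
```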

Type,

dna3[6:10] = "ATGC"

press enter


Highlight the output.

To replace a part of the string,

type the following command.


dna3 within square brackets 6 colon 10 equal to, within double quotes, ATGC.


(dna3[6:10] = "ATGC")


press enter


For the output, type dna3 and press enter.


The output shows that the 4 bases at positions 6 to 9 are replaced with the new bases ATGC.
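Slice assignment replaces a range in one step. In this list-based sketch (a stand-in for dna3, not Biopython code), the slice [6:10] covers indices 6, 7, 8 and 9, so exactly four bases are swapped for the four letters of "ATGC":

```python
dna3 = list("ATGCGAAGCCATGAA")  # mutable stand-in for dna3

dna3[6:10] = "ATGC"   # indices 6, 7, 8 and 9 become A, T, G, C
print("".join(dna3))  # ATGCGAATGCATGAA
```

With a plain list, a replacement of a different length would grow or shrink the sequence, so matching the lengths keeps every later position unchanged.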

Type,

dna4 = dna3.toseq()


For the output type,

dna4

Press enter

Once you have edited your sequence object, convert it back to the read-only form.


Type the following:


dna4 equal to dna3 dot toseq open and close parenthesis.


(dna4=dna3.toseq())


Press enter.


For the output type dna4.

Press enter
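In the list analogy, converting back to the read-only form means building a new immutable string from the list. The sketch below shows that the read-only copy is independent of later edits; `"".join` here stands in for toseq() and is an assumption of the analogy:

```python
dna3 = list("ATGCGAATGCATGAA")  # mutable stand-in for dna3

dna4 = "".join(dna3)  # immutable str again, standing in for dna3.toseq()
print(dna4)           # ATGCGAATGCATGAA

dna3[0] = "G"         # a later edit to the mutable copy...
print(dna4)           # ATGCGAATGCATGAA  (...does not change dna4)
```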



Slide Number 8

Summary

Let's summarize,

In this tutorial, we have learnt to:

1. Generate a random DNA sequence.

2. Slice a DNA sequence at specified locations.

3. Join two sequences together to form a new sequence, that is, concatenate them.



Slide Number 9

Summary

4. Use the len, count and find functions.

5. Convert a sequence object to a mutable sequence object.

6. Replace a base or part of the string.

Slide Number 10

Assignment


For the assignment:


  • Generate a random DNA sequence of 30 bases.
  • Using Biopython tools, calculate the GC percentage and molecular weight of the sequence.
  • Your completed assignment should look as follows.


GC content of the DNA sequence.

Type,


from Bio.SeqUtils import GC

Press enter

Type

GC(dna)

Press enter


The output shows the GC content as a percentage.
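For reference, the GC content is the number of G and C bases divided by the sequence length, times 100. A minimal stdlib sketch of that formula follows; `gc_percent` is a hypothetical helper, and Biopython's own function may treat ambiguous letters differently. To our knowledge, recent Biopython releases deprecate GC() in favour of gc_fraction(), which returns a fraction between 0 and 1 rather than a percentage.

```python
def gc_percent(seq):
    """Percentage of G and C bases (hypothetical helper; case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero for an empty sequence
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


print(gc_percent("ATGCGAATGCATGAA"))  # 40.0  (6 of 15 bases are G or C)
```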
Type,

from Bio.SeqUtils import molecular_weight

press enter


Then type,

molecular_weight(dna)


press enter

The output shows the molecular weight of the DNA sequence.
Slide Number 11

Acknowledgement

This video summarizes the Spoken Tutorial project.

If you do not have good bandwidth, you can download and watch it.

Slide Number 12 We conduct workshops and give certificates.

Please contact us.

Slide number 13 Spoken-Tutorial project is supported by the National Mission on Education through ICT, MHRD, Government of India
Slide number 13 This is Snehalatha from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

Snehalathak