Biopython/C2/Manipulating-Sequences/English-timed

From Script | Spoken-Tutorial
Revision as of 12:13, 6 June 2016 by PoojaMoolya

Time
Narration
00:01 Welcome to this tutorial on Manipulating Sequences.
00:06 In this tutorial, we will use Biopython tools to: Generate a random DNA sequence.
00:13 Slice a DNA sequence at specified locations.
00:17 Join two sequences together to form a new sequence, that is, concatenate them.
00:22 Find the length of the sequence.
00:26 Count the number of individual bases or part of the string.
00:31 Find a particular base or part of the string.
00:35 Convert a sequence object to a mutable sequence object.
00:40 To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics
00:47 and basic Python programming.
00:51 If not, refer to the Python tutorials at the given link.
00:56 To record this tutorial, I am using Ubuntu OS version 14.10
01:03 Python version 2.7.8
01:07 IPython interpreter version 2.3.0
01:12 Biopython version 1.64
01:16 Let me open the terminal and start the IPython interpreter.
01:21 Press the Ctrl, Alt and T keys simultaneously.
01:26 At the prompt, type ipython and press Enter.
01:31 The IPython prompt appears on the screen.
01:35 Using Biopython we can generate a sequence object for a random DNA sequence of any specified length.
01:44 Let us now generate a sequence object for a DNA sequence of 20 bases.
01:50 At the prompt, type import random. Press Enter.
01:56 Next, import the Seq class from the Bio.Seq module.
02:01 Often Seq is pronounced as seek.
02:06 At the prompt, type from Bio dot Seq import Seq. Press Enter.
02:15 We will use the Bio.Alphabet module to specify the alphabet of the DNA sequence.
02:22 Type from Bio dot Alphabet import generic underscore dna. Press Enter.
02:32 Type the following command to create a sequence object for the random DNA sequence.
02:38 Store the sequence in a variable dna1.
02:42 Please note that this command uses two single quotes, not one double quote. Press Enter.
02:50 For the output, type dna1. Press Enter.
02:55 The output shows the sequence object for the random DNA sequence.
03:00 If you want a new sequence, press the up arrow key to recall the same command as above. Press Enter.
03:11 For the output, type the variable name, dna1. Press Enter.
03:17 The output shows a new DNA sequence, which is different from the first one.
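The command typed at 02:32 is not reproduced in this transcript. The following is a minimal sketch of one way such a sequence could be built, assuming random.choice over the four bases. The helper name random_dna is hypothetical. The alphabet argument is left out because Bio.Alphabet was removed from Biopython in release 1.78; with version 1.64, generic_dna could be passed as a second argument to Seq. If Biopython is not installed, the sketch falls back to a plain Python string.

```python
import random

try:
    from Bio.Seq import Seq
except ImportError:
    # Assumption: without Biopython, a plain str stands in for Seq.
    Seq = str


def random_dna(length):
    """Return a sequence object of `length` randomly chosen DNA bases."""
    # '' (two single quotes) is the empty string used to join the bases.
    bases = ''.join(random.choice('ACGT') for _ in range(length))
    return Seq(bases)


dna1 = random_dna(20)
print(repr(dna1))  # a different 20-base sequence on every run
```

Running the last two lines again gives a new sequence, which matches what the up arrow key does in the session above.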
03:23 About Sequence Objects
03:25 The sequence objects usually act like normal Python strings.
03:30 So follow the same conventions as you do for Python strings.
03:35 In Python, we count the characters in the string starting from 0 instead of 1.
03:41 The first character in the sequence is position zero.
03:45 Back to the terminal.
03:47 Often you may need to work with only a part of the sequence.
03:52 Now let's see how to extract parts of the string and store them as sequence objects.
03:58 For example, we will slice the DNA sequence at two positions.
04:04 First between bases 6 and 7.
04:08 This will extract a fragment from the beginning of the sequence to the 6th base in the sequence.
04:15 The second slice will be between bases 11 and 12.
04:20 The second fragment will be from the 12th base to the end of the sequence.
04:26 Type the following command at the prompt to extract the first fragment.
04:31 string1 equal to dna1 within square brackets 0 colon 6.
04:39 string1 is the variable to store the first fragment.
04:43 The rest of the command follows as in normal Python.
04:47 Enclosed in these brackets are the start and stop positions separated by a colon.
04:53 The slice includes the start position but excludes the stop position. Press Enter.
05:01 To view the output, type string1. Press Enter.
05:04 The output shows the first fragment as the sequence object.
05:10 To extract the second string from the sequence, press the up arrow key and edit the command as follows:
05:17 Change the name of the variable to string2, and positions to 11 and 20.
05:24 For the output type string2. Press enter.
05:30 Now we have the 2nd fragment also as a sequence object.
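The two slices can be sketched as below. The 20-base sequence is a fixed, made-up example rather than the random one from the session, so that the output is predictable. A plain string is used as a stand-in if Biopython is not installed.

```python
try:
    from Bio.Seq import Seq
except ImportError:
    # Assumption: without Biopython, a plain str stands in for Seq.
    Seq = str

dna1 = Seq("ATGCGTACGTTAGCCTAGGA")  # fixed 20-base example sequence

# Start is included, stop is excluded, and counting starts at 0.
string1 = dna1[0:6]    # indices 0 to 5: the 1st to the 6th base
string2 = dna1[11:20]  # indices 11 to 19: the 12th to the 20th base

print(string1)  # ATGCGT
print(string2)  # AGCCTAGGA
print(dna1[0])  # A, the first base is at position zero
```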
05:34 Let us concatenate, that is, add the two strings together to form a new fragment:
05:42 Store the new sequence in a variable dna2.
05:46 Type dna2 equal to string1 plus string2. Press Enter.
05:53 Please note: we cannot add sequences with incompatible alphabets.
05:59 That is, we cannot concatenate a DNA sequence and a protein sequence to form a new sequence.
06:07 The two sequences must have the same alphabet attribute.
06:12 To view the output, type dna2. Press enter
06:17 The output shows a new sequence which is a combination of string1 and string2.
06:23 To find the length of the new sequence, we will use the len function.
06:29 Type len and, within parentheses, dna2. Press Enter.
06:34 The output shows that the sequence is 15 bases long.
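A short sketch of concatenation and length, using two made-up fragments of 6 and 9 bases in place of the ones sliced from the random sequence. A plain string is used as a stand-in if Biopython is not installed.

```python
try:
    from Bio.Seq import Seq
except ImportError:
    # Assumption: without Biopython, a plain str stands in for Seq.
    Seq = str

string1 = Seq("ATGCGT")     # example first fragment, 6 bases
string2 = Seq("AGCCTAGGA")  # example second fragment, 9 bases

dna2 = string1 + string2    # concatenation gives a new sequence object

print(dna2)       # ATGCGTAGCCTAGGA
print(len(dna2))  # 15
```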
06:39 We can also count the number of individual bases present in the sequence.
06:44 To do so, we will use the count function.
06:47 For example, to count the number of adenines present in the sequence, type the following command: dna2 dot count, within parentheses, within double quotes, the letter A.
07:02 Press Enter.
07:04 The output shows the number of adenines present in the sequence dna2.
07:10 To find a particular base or part of the string, we will use the find function.
07:16 Type dna2 dot find, within parentheses, within double quotes, GC. Press Enter.
07:26 The output indicates the position of the first occurrence of GC in the string.
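The count and find calls can be tried on a fixed example sequence, as sketched below. The sequence is made up for illustration. As with Python strings, find returns -1 when the search string is not present. A plain string is used as a stand-in if Biopython is not installed.

```python
try:
    from Bio.Seq import Seq
except ImportError:
    # Assumption: without Biopython, a plain str stands in for Seq.
    Seq = str

dna2 = Seq("ATGCGTAGCCTAGGA")  # fixed 15-base example sequence

print(dna2.count("A"))   # 4, adenines at positions 0, 6, 11 and 14
print(dna2.count("GC"))  # 2, count also accepts more than one base
print(dna2.find("GC"))   # 2, position of the first GC (counting from 0)
print(dna2.find("TTT"))  # -1, TTT does not occur in the sequence
```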
07:32 Normally a sequence object cannot be edited.
07:35 To edit a sequence, we have to convert it to a mutable sequence object.
07:41 To do so, type dna3 equal to dna2 dot tomutable, open and close parentheses. Press Enter.
07:52 For the output, type dna3. Press enter
07:55 Now the sequence object can be edited.
07:59 Let us replace a base from the sequence.
08:01 For example, to replace the base present at position 5 with adenine, type dna3 within square brackets 5 equal to, within double quotes, the letter A. Press Enter.
08:19 For the output, type dna3. Press Enter.
08:24 Observe the output: the cytosine at position 5 is replaced with adenine.
08:31 To replace a part of the string, type the following command.
08:35 dna3 within square brackets 6 colon 10 equal to, within double quotes, ATGC. Press Enter.
08:45 For the output, type dna3. Press Enter.
08:52 The output shows that the 4 bases from positions 6 to 9 are replaced with the new bases ATGC.
09:01 Once you have edited your sequence object, convert it back to the “read only” form.
09:07 Type the following: dna4 equal to dna3 dot toseq, open and close parentheses. Press Enter.
09:19 For the output type dna4. Press enter.
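The editing steps can be sketched as below on a made-up sequence with a cytosine at position 5. The tomutable and toseq methods belong to the Biopython 1.64 used in this tutorial and, as far as I am aware, are no longer available in recent releases, so this sketch converts through the MutableSeq and Seq constructors instead; that substitution is an assumption made for portability, not the tutorial's own command. Without Biopython, a plain string and a list of characters are used as stand-ins.

```python
try:
    from Bio.Seq import Seq, MutableSeq
except ImportError:
    # Assumption: stand-ins so the sketch runs without Biopython.
    # str is read-only like Seq; a list of characters is editable.
    Seq, MutableSeq = str, list

dna2 = Seq("ATGCGCAGCCTAGGA")  # example sequence, C at position 5

# A read-only sequence object rejects item assignment.
try:
    dna2[5] = "A"
except TypeError:
    print("dna2 cannot be edited")

dna3 = MutableSeq(str(dna2))   # tutorial command: dna2.tomutable()

dna3[5] = "A"                  # replace one base
print("".join(dna3))           # ATGCGAAGCCTAGGA

dna3[6:10] = "ATGC"            # replace positions 6 to 9
print("".join(dna3))           # ATGCGAATGCTAGGA

dna4 = Seq("".join(dna3))      # tutorial command: dna3.toseq()
print(dna4)                    # ATGCGAATGCTAGGA
```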
09:25 Let's summarize,
09:27 In this tutorial, we have learnt to: Generate a random DNA sequence.
09:32 Slice a DNA sequence at specified locations.
09:36 Join two sequences together to form a new sequence, that is, concatenate them.
09:43 We have also learnt how to use len, count and find functions.
09:49 Convert a sequence object to a mutable sequence object and replace a base or part of the string.
09:57 For the assignment, generate a random DNA sequence of 30 bases.
10:02 Using Biopython tools, calculate the GC percentage and molecular weight of the sequence.
10:09 Your completed assignment will be as follows.
10:13 The output shows the GC content as a percentage.
10:18 The output shows the molecular weight of the DNA sequence.
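The transcript does not name the tools used in the completed assignment. One possible approach is sketched below, under two assumptions: the GC percentage is computed directly with count and len, and the molecular weight comes from the molecular_weight function in Bio.SeqUtils. If Biopython is not installed, the sketch uses a plain string and skips the molecular weight.

```python
import random

try:
    from Bio.Seq import Seq
    from Bio.SeqUtils import molecular_weight
except ImportError:
    # Assumption: without Biopython, only the GC percentage is computed.
    Seq = str
    molecular_weight = None

# Random DNA sequence of 30 bases.
dna = Seq(''.join(random.choice('ACGT') for _ in range(30)))
print(dna)

# GC percentage: share of G and C bases in the whole sequence.
gc_percent = 100.0 * (dna.count("G") + dna.count("C")) / len(dna)
print("GC content: %.2f%%" % gc_percent)

# Molecular weight, if Biopython is available.
if molecular_weight is not None:
    print("Molecular weight: %.2f" % molecular_weight(dna, seq_type="DNA"))
```

The values change on every run because the sequence is random.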
10:23 This video summarizes the Spoken Tutorial project.
10:26 If you do not have good bandwidth, you can download and watch it.
10:30 We conduct workshops and give certificates.
10:32 Please contact us.
10:35 The Spoken Tutorial project is supported by the National Mission on Education through ICT, MHRD, Government of India.
10:43 This is Snehalatha from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Pratik kamble, Priyacst, Sandhya.np14