Difference between revisions of "Biopython/C2/Manipulating-Sequences/English-timed"

Latest revision as of 18:25, 23 March 2017

Time Narration
00:01 Welcome to this tutorial on Manipulating Sequences.
00:06 In this tutorial, we will use Biopython tools to: Generate a random DNA sequence
00:13 Slice a DNA sequence at specified locations
00:17 Join two sequences together to form a new sequence, that is, to concatenate
00:22 Find the length of the sequence
00:26 Count the number of individual bases or part of the string
00:31 Find a particular base or part of the string.
00:35 Convert a sequence object to a mutable sequence object.
00:40 To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics
00:47 and basic Python programming.
00:51 If not, refer to the Python tutorials at the given link.
00:56 To record this tutorial, I am using: Ubuntu OS version 14.10
01:03 Python version 2.7.8
01:07 IPython interpreter version 2.3.0
01:12 Biopython version 1.64.
01:16 Let me open the terminal and start the IPython interpreter.
01:21 Press the Ctrl, Alt and T keys simultaneously.
01:26 At the prompt, type: "ipython" and press Enter.
01:31 The IPython prompt appears on the screen.
01:35 Using Biopython, we can generate a sequence object for a random DNA sequence of any specified length.
01:44 Let us now generate a sequence object for a DNA sequence of 20 bases.
01:50 At the prompt, type: "import random" and press Enter.
01:56 Next, import the Seq module from the Bio package.
02:01 Often Seq is pronounced as seek.
02:06 At the prompt, type: from Bio dot Seq import Seq. Press Enter.
02:15 We will use the Bio.Alphabet module to specify the alphabet of the DNA sequence.
02:22 Type: from Bio dot Alphabet import generic underscore dna. Press Enter.
02:32 Type the following command to create a sequence object for the random DNA sequence.
02:38 Store the sequence in a variable dna1.
02:42 Please note: in this command, use two single quotes instead of a double quote. Press Enter.
02:50 For the output, type: dna1. Press Enter.
02:55 The output shows the sequence object for the random DNA sequence.
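The narration does not spell out the command itself, so the following is only a sketch of how such a random sequence might be built. It assumes the bases are picked with random.choice and joined with an empty string, which is where the two single quotes come in. The generic_dna argument is left out because Bio.Alphabet was removed in Biopython 1.78, and the fallback to str is an assumption added so that the sketch also runs where Biopython is not installed.

```python
import random

try:
    from Bio.Seq import Seq  # Biopython, when it is installed
except ImportError:
    Seq = str  # assumption: fall back to a plain string so the sketch still runs

# '' (two single quotes) is the empty string that joins the random bases.
bases = ''.join(random.choice('ACGT') for _ in range(20))

# Biopython 1.64 also accepted generic_dna as a second argument here.
dna1 = Seq(bases)
print(repr(dna1))
```

Running the last three lines again produces a different sequence each time, which matches what the narration describes next.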
03:00 If you want a new sequence, press the up-arrow key to get the same command as above. Press Enter.
03:11 For the output, type the variable name dna1. Press Enter.
03:17 The output shows a new DNA sequence which is different from the first one.
03:23 About Sequence Objects:
03:25 The sequence objects usually act like normal Python strings.
03:30 So, follow the normal conventions as you do for Python strings.
03:35 In Python, we count the characters in the string starting from 0, instead of 1.
03:41 The first character in the sequence is position zero.
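As a small illustration of zero-based counting, the sketch below uses a plain Python string with a made-up sequence; a sequence object is indexed the same way.

```python
dna = "ATGCCGTA"  # hypothetical example sequence

first = dna[0]  # position 0 holds the first base
sixth = dna[5]  # position 5 holds the sixth base
print(first, sixth)
```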
03:45 Back to the terminal.
03:47 Often you may need to work with only a part of the sequence.
03:52 Now, let's see how to extract parts of the string and store them as sequence objects.
03:58 For example, we will slice the DNA sequence at two positions.
04:04 First, between bases 6 and 7.
04:08 This will extract a fragment from the beginning of the sequence to the 6th base in the sequence.
04:15 The second slice will be between bases 11 and 12.
04:20 The second fragment will be from the 12th base to the end of the sequence.
04:26 Type the following command, at the prompt, to extract the first fragment.
04:31 string1 equal to dna1 within brackets 0 colon 6.
04:39 string1 is the variable to store the first fragment.
04:43 The rest of the command follows as in normal Python.
04:47 Enclosed in these brackets are the start and the stop positions separated by a colon.
04:53 The positions are inclusive of the start but exclusive of the stop position. Press Enter.
05:01 To view the output, type: "string1" and press Enter.
05:04 The output shows the first fragment as the sequence object.
05:10 To extract the second string from the sequence, press the up-arrow key and edit the command as follows:
05:17 Change the name of the variable to string2 and the positions to 11 and 20.
05:24 For the output, type: "string2". Press Enter.
05:30 Now we have the 2nd fragment also as a sequence object.
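The two slicing commands can be sketched as follows. The 20-base sequence is a fixed, made-up example so that the result is reproducible, and the fallback to str is an assumption for machines without Biopython.

```python
try:
    from Bio.Seq import Seq  # Biopython, when it is installed
except ImportError:
    Seq = str  # assumption: plain strings slice the same way

dna1 = Seq("ATGCCGTAACGTTAGCCATG")  # hypothetical 20-base sequence

string1 = dna1[0:6]    # positions 0 to 5: the first six bases
string2 = dna1[11:20]  # positions 11 to 19: the 12th base to the end
print(string1)
print(string2)
```

Because the stop position is excluded, 0:6 yields six bases and 11:20 yields nine.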
05:34 Let us concatenate, that is, add the two strings together to form a new fragment.
05:42 Store the new sequence in a variable dna2.
05:46 Type: dna2 equal to string1 plus string2. Press Enter.
05:53 Please note: we cannot add sequences with incompatible alphabets.
05:59 That is, we cannot concatenate a DNA sequence and a protein sequence to form a new sequence.
06:07 The two sequences must have the same alphabet attribute.
06:12 To view the output, type: "dna2". Press Enter.
06:17 The output shows a new sequence which is a combination of string1 and string2.
06:23 To find the length of the new sequence, we will use the len function.
06:29 Type: "len" within parentheses "dna2". Press Enter.
06:34 The output shows that the sequence is 15 bases long.
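A minimal sketch of the concatenation and length check, using plain Python strings with made-up fragments of six and nine bases; sequence objects support the same + and len operations.

```python
string1 = "ATGCCG"     # hypothetical first fragment (6 bases)
string2 = "TTAGCCATG"  # hypothetical second fragment (9 bases)

dna2 = string1 + string2  # concatenate the two fragments
print(dna2, len(dna2))
```

The length is 15 because the fragments contribute 6 and 9 bases respectively.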
06:39 We can also count the number of individual bases present in the sequence.
06:44 To do so, we will use the count() function.
06:47 For example, to count the number of adenines present in the sequence, type the following command: dna2 dot count within parentheses within double quotes letter A.
07:02 Press Enter.
07:04 The output shows the number of adenines present in the sequence dna2.
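The counting step can be sketched with a plain Python string and a made-up sequence; the count method of a sequence object is called the same way. Note that count reports non-overlapping matches when it is given more than one base.

```python
dna2 = "ATGCCGTTAGCCATG"  # hypothetical sequence

a_count = dna2.count("A")    # occurrences of the single base A
cc_count = dna2.count("CC")  # occurrences of the two-base string CC
overlap = "AAAA".count("AA") # matches are counted without overlap
print(a_count, cc_count, overlap)
```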
07:10 To find a particular base or part of the string, we will use the find() function.
07:16 Type: dna2 dot find within parenthesis within double quotes "GC". Press Enter.
07:26 The output indicates the position of the first occurrence of GC in the string.
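A short sketch of the search, again on a plain Python string with a made-up sequence. The second and third calls are additions for illustration: find accepts an optional start position, and it returns -1 when nothing matches.

```python
dna2 = "ATGCCGTTAGCCATG"  # hypothetical sequence

first_gc = dna2.find("GC")    # position of the first GC
next_gc = dna2.find("GC", 3)  # search again, starting from position 3
missing = dna2.find("GGG")    # -1 signals that the string is absent
print(first_gc, next_gc, missing)
```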
07:32 Normally a sequence object cannot be edited.
07:35 To edit a sequence, we have to convert it to a mutable sequence object.
07:41 To do so, type: dna3 equal to dna2 dot tomutable open and close parentheses. Press Enter.
07:52 For the output, type: dna3. Press Enter.
07:55 Now the sequence object can be edited.
07:59 Let us replace a base from the sequence.
08:01 For example, to replace the base at position 5 with adenine, type: dna3 within brackets 5 equal to within double quotes letter A. Press Enter.
08:19 For the output, type: dna3. Press Enter.
08:24 Observe the output. The cytosine at position 5 is replaced with adenine.
08:31 To replace a part of the string, type the following command.
08:35 dna3 within brackets 6 colon 10 equal to within double quotes ATGC. Press Enter.
08:45 For the output, type: dna3. Press Enter.
08:52 The output shows that the 4 bases from position 6 to 9 are replaced with the new bases ATGC.
09:01 Once you have edited your sequence object, convert it back to the “read only” form.
09:07 Type the following: dna4 equal to dna3 dot toseq open and close parentheses. Press Enter.
09:19 For the output, type: dna4. Press Enter.
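The editing steps can be sketched in plain Python, so that the example runs without Biopython. Here a list of characters stands in for the mutable sequence object and a string stands in for the read-only one; these stand-ins and the sample sequence are assumptions for illustration, not the Biopython calls themselves. In this sample the base at position 5 happens to be G rather than the cytosine seen in the recording, because the tutorial's sequence is random.

```python
dna2 = "ATGCCGTTAGCCATG"  # hypothetical sequence

# A read-only sequence rejects item assignment, as a plain string does.
try:
    dna2[5] = "A"
except TypeError:
    print("dna2 cannot be edited in place")

dna3 = list(dna2)     # editable copy (stand-in for the mutable sequence object)
dna3[5] = "A"         # replace the single base at position 5
dna3[6:10] = "ATGC"   # replace positions 6, 7, 8 and 9; the stop 10 is excluded
dna4 = "".join(dna3)  # back to a read-only form
print(dna4)
```

The length stays at 15 because four bases are replaced by four new ones.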
09:25 Let's summarize.
09:27 In this tutorial, we have learnt to: Generate a random DNA sequence
09:32 Slice a DNA sequence at specified locations
09:36 Join two sequences together to form a new sequence, that is, to concatenate.
09:43 We have also learnt how to: Use len, count and find functions
09:49 Convert a sequence object to a mutable sequence object and replace a base or part of the string.
09:57 For the assignment, generate a random DNA sequence of 30 bases.
10:02 Using Biopython tools, calculate the GC percentage and molecular weight of the sequence.
10:09 Your completed assignment will be as follows.
10:13 The output shows the GC content as a percentage.
10:18 The output shows the molecular weight of the DNA sequence.
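The GC percentage part of the assignment can be checked by hand with the sketch below. It works on a fixed, made-up 30-base string rather than a random one, and it computes the value directly instead of calling Biopython, so it is a cross-check rather than the solution shown in the video. Molecular weight is left out because it needs a table of nucleotide masses; Biopython provides a molecular_weight function in Bio.SeqUtils for that.

```python
dna = "ATGCCGTAACGTTAGCCATGGCTAGCTAAC"  # hypothetical 30-base sequence

# GC percentage = (number of G + number of C) / total bases * 100
gc_bases = dna.count("G") + dna.count("C")
gc_percent = 100.0 * gc_bases / len(dna)
print(len(dna), gc_bases, gc_percent)
```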
10:23 This video summarizes the Spoken Tutorial project.
10:26 If you do not have good bandwidth, you can download and watch it.
10:30 We conduct workshops and give certificates.
10:32 Please contact us.
10:35 Spoken-Tutorial project is supported by the National Mission on Education through ICT, MHRD, Government of India.
10:43 This is Snehalatha from IIT Bombay, signing off. Thank you for joining.

Contributors and Content Editors

PoojaMoolya, Pratik kamble, Priyacst, Sandhya.np14