Difference between revisions of "Biopython/C2/Manipulating-Sequences/English-timed"
From Script | Spoken-Tutorial
Sandhya.np14 (Talk | contribs) |
Revision as of 17:19, 2 August 2016
|
|
---|---|
00:01 | Welcome to this tutorial on Manipulating Sequences. |
00:06 | In this tutorial, we will use Biopython tools:* To generate a random DNA sequence |
00:13 | * Slice a DNA sequence at specified locations |
00:17 | * Join two sequences together to form a new sequence, that is, to concatenate |
00:22 | * Find the length of the sequence |
00:26 | * Count the number of individual bases or part of the string |
00:31 | * Find a particular base or part of the string. |
00:35 | * Convert a sequence object to a mutable sequence object. |
00:40 | To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics |
00:47 | and basic Python programming. |
00:51 | If not, refer to the Python tutorials at the given link. |
00:56 | To record this tutorial, I am using: * Ubuntu OS version 14.10 |
01:03 | * Python version 2.7.8 |
01:07 | * IPython interpreter version 2.3.0 |
01:12 | * Biopython version 1.64. |
01:16 | Let me open the terminal and start ipython interpreter. |
01:21 | Press Ctrl, Alt and t keys simultaneously. |
01:26 | At the prompt, type: ipython and press Enter. |
01:31 | The IPython prompt appears on the screen. |
01:35 | Using Biopython, we can generate a sequence object for a random DNA sequence of any specified length. |
01:44 | Let us now generate a sequence object for a DNA sequence of 20 bases. |
01:50 | At the prompt, type: import random, press Enter. |
01:56 | Next, import Seq module from Bio package. |
02:01 | Often Seq is pronounced as seek. |
02:06 | At the prompt, type: from Bio dot Seq import Seq. Press Enter. |
02:15 | We will use Bio.Alphabet module to specify the alphabets in the DNA sequence. |
02:22 | Type: from Bio dot Alphabet import generic underscore dna. Press Enter. |
02:32 | Type the following command to create a sequence object for the random DNA sequence. |
02:38 | Store the sequence in a variable dna1. |
02:42 | Please note: in this command, use two single quotes instead of a double quote. Press Enter. |
02:50 | For the output, type: dna1. Press Enter. |
02:55 | The output shows the sequence object for the random DNA sequence. |
03:00 | If you want a new sequence, press up-arrow key to get the same command as above. Press Enter. |
03:11 | For the output, type the variable name dna1. Press Enter. |
03:17 | The output shows a new DNA sequence which is different from the first one. |
03:23 | About Sequence Objects: |
03:25 | The sequence objects usually act like normal Python strings. |
03:30 | So, follow the normal conventions as you do for Python strings. |
03:35 | In Python, we count the characters in the string starting from 0 instead of 1. |
03:41 | The first character in the sequence is position zero. |
03:45 | Back to the terminal. |
03:47 | Often you may need to work with only a part of the sequence. |
03:52 | Now let's see how to extract parts of the string and store them as sequence objects. |
03:58 | For example, we will slice the DNA sequence at two positions. |
04:04 | First, between bases 6 and 7. |
04:08 | This will extract a fragment from the beginning of the sequence to the 6th base in the sequence. |
04:15 | The second slice will be between bases 11 and 12. |
04:20 | The second fragment will be from the 12th base to the end of the sequence. |
04:26 | Type the following command, at the prompt, to extract the first fragment. |
04:31 | string1 equal to dna1 within brackets 0 colon 6. |
04:39 | string1 is the variable to store the first fragment. |
04:43 | The rest of the command follows as in normal Python. |
04:47 | Enclosed in these brackets are the start and stop positions separated by a colon. |
04:53 | The positions are inclusive of the start but exclusive of the stop position. Press Enter. |
05:01 | To view the output, type: string1, press Enter. |
05:04 | The output shows the first fragment as the sequence object. |
05:10 | To extract the second string from the sequence, press up-arrow key and edit the command as follows: |
05:17 | Change the name of the variable to string2 and positions to 11 and 20. |
05:24 | For the output, type: string2. Press Enter. |
05:30 | Now we have the 2nd fragment also as a sequence object. |
05:34 | Let us concatenate, that is, add the two strings together to form a new fragment. |
05:42 | Store the new sequence in a variable dna2. |
05:46 | Type: dna2 equal to string1 plus string2. Press Enter. |
05:53 | Please note: we cannot add sequences with incompatible alphabets. |
05:59 | That is, we cannot concatenate a DNA sequence and a protein sequence to form a new sequence. |
06:07 | The two sequences must have the same alphabet attribute. |
06:12 | To view the output, type: dna2. Press Enter. |
06:17 | The output shows a new sequence which is a combination of string1 and string2. |
06:23 | To find the length of the new sequence, we will use len function. |
06:29 | Type: len within parenthesis dna2. Press Enter. |
06:34 | The output shows that the sequence is 15 bases long. |
06:39 | We can also count the number of individual bases present in the sequence. |
06:44 | To do so, we will use count function. |
06:47 | For example, to count the number of adenines present in the sequence, type the following command: dna2 dot count within parentheses within double quotes alphabet A. |
07:02 | Press Enter. |
07:04 | The output shows the number of adenines present in the sequence dna2. |
07:10 | To find a particular base or part of the string, we will use find function. |
07:16 | Type: dna2 dot find within parenthesis within double quotes GC. Press Enter. |
07:26 | The output indicates the position of the first instance of the appearance of GC in the string. |
07:32 | Normally a sequence object cannot be edited. |
07:35 | To edit a sequence, we have to convert it to the mutable sequence object. |
07:41 | To do so, type: dna3 equal to dna2 dot to mutable open and close parenthesis. Press Enter. |
07:52 | For the output, type: dna3. Press Enter. |
07:55 | Now the sequence object can be edited. |
07:59 | Let us replace a base from the sequence. |
08:01 | For example, to replace the base at position 5 with adenine, type: dna3 within brackets 5 equal to within double quotes alphabet A. Press Enter. |
08:19 | For the output, type: dna3. Press Enter. |
08:24 | Observe the output. The cytosine at position 5 is replaced with adenine. |
08:31 | To replace a part of the string, type the following command. |
08:35 | dna3 within brackets 6 colon 10 equal to within double quotes ATGC. Press Enter. |
08:45 | For the output, type: dna3. Press Enter. |
08:52 | The output shows that the 4 bases at positions 6 to 9 are replaced with the new bases ATGC. |
09:01 | Once you have edited your sequence object, convert it back to the “read only” form. |
09:07 | Type the following: dna4 equal to dna3 dot to seq open and close parentheses. Press Enter. |
09:19 | For the output, type: dna4. Press Enter. |
09:25 | Let's summarize. |
09:27 | In this tutorial, we have learnt to: * Generate a random DNA sequence |
09:32 | * Slice a DNA sequence at specified locations |
09:36 | * Join two sequences together to form a new sequence, that is, to concatenate. |
09:43 | We have also learnt how to: * use len, count and find functions |
09:49 | * convert a sequence object to a mutable sequence object and replace a base or part of the string. |
09:57 | For the assignment, generate a random DNA sequence of 30 bases. |
10:02 | Using Biopython tools, calculate the GC percentage and molecular weight of the sequence. |
10:09 | Your completed assignment will be as follows. |
10:13 | The output shows the GC content as a percentage. |
10:18 | The output shows the molecular weight of the DNA sequence. |
10:23 | This video summarizes the Spoken Tutorial project. |
10:26 | If you do not have good bandwidth, you can download and watch it. |
10:30 | We conduct workshops and give certificates. |
10:32 | Please contact us. |
10:35 | Spoken-Tutorial project is supported by the National Mission on Education through ICT, MHRD, Government of India. |
10:43 | This is Snehalatha from IIT Bombay, signing off. Thank you for joining. |
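
The command typed at 02:32 is not shown in this transcript. The following is a minimal sketch of what such a command might look like, assuming `random.choice` is used to draw each base and the joined string is wrapped in a `Seq` object. Two assumptions are worth flagging: `Bio.Alphabet` was removed in Biopython 1.78, so this sketch leaves out the `generic_dna` argument used in the recording (Biopython 1.64), and it falls back to a plain string when Biopython is not installed.

```python
import random

try:
    from Bio.Seq import Seq  # Biopython, if it is installed
except ImportError:
    Seq = str  # assumption: a plain string stands in for a Seq object

# '' below is two single quotes (an empty string), not one double quote.
# Each of the 20 positions is drawn at random from the four DNA bases.
dna1 = Seq(''.join(random.choice('ACGT') for _ in range(20)))

print(dna1)       # a different sequence on every run
print(len(dna1))  # always 20
```

Running the assignment line again produces a new sequence, which matches the behaviour described at 03:00 to 03:17.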
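
The slicing, joining and querying steps from 04:26 to 07:26 can be reproduced on a fixed sequence so that the results are predictable. Because the transcript notes that sequence objects act like normal Python strings, this sketch uses a plain string; the example sequence is invented for illustration and is not the one shown in the video.

```python
# Hypothetical 20-base sequence, chosen only for this illustration.
dna1 = "ATGGCCATTGTAATGGGCCG"

# Slices include the start index and exclude the stop index,
# and indexing starts at 0.
string1 = dna1[0:6]    # bases 1 to 6
string2 = dna1[11:20]  # bases 12 to 20

# Concatenate the two fragments.
dna2 = string1 + string2

print(string1)           # ATGGCC
print(string2)           # AATGGGCCG
print(dna2)              # ATGGCCAATGGGCCG
print(len(dna2))         # 15
print(dna2.count("A"))   # 3 adenines
print(dna2.find("GC"))   # 3, the index of the first 'GC'
```

If the search string is absent, `find` on a Python string returns -1 rather than raising an error.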
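
The edit cycle from 07:32 to 09:19 (make the sequence mutable, change it, convert it back) can be sketched in plain Python, where strings are also read-only. This is an analogy rather than the Biopython API itself: a list plays the role of the mutable sequence object, and the example sequence is invented for illustration.

```python
# Hypothetical 15-base sequence standing in for dna2.
dna2 = "ATGGCCAATGGGCCG"

# Strings cannot be edited in place, so copy the bases into a list.
# (The tutorial does this step with dna2.tomutable().)
dna3 = list(dna2)

# Replace the single base at index 5 (a cytosine here) with adenine.
dna3[5] = "A"
print("".join(dna3))  # ATGGCAAATGGGCCG

# Replace the four bases at indices 6 to 9; index 10 is excluded.
dna3[6:10] = "ATGC"
print("".join(dna3))  # ATGGCAATGCGGCCG

# Convert back to the read-only form.
# (The tutorial does this step with dna3.toseq().)
dna4 = "".join(dna3)
print(dna4)           # ATGGCAATGCGGCCG
```

The replacement slice has the same length as the slice it overwrites, so the sequence stays 15 bases long.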
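
For the assignment at 09:57, the GC percentage can be checked by hand before turning to Biopython's own tools. The sketch below is not the assignment's reference solution; `gc_percentage` is a hypothetical helper written for this illustration, and it counts only the characters G and C. Molecular weight is left out because it depends on per-base weights that the transcript does not give.

```python
import random


def gc_percentage(sequence):
    """Return the share of G and C bases in the sequence, as a percentage."""
    text = str(sequence).upper()
    if not text:
        return 0.0
    gc_count = text.count("G") + text.count("C")
    return 100.0 * gc_count / len(text)


# Random DNA sequence of 30 bases, as the assignment asks.
dna = "".join(random.choice("ACGT") for _ in range(30))

print(dna)
print(gc_percentage(dna))
print(gc_percentage("ATGC"))  # 50.0
```

Calling `str` on the argument lets the same helper accept either a plain string or a sequence object.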