Biopython/C2/Manipulating-Sequences/English-timed
From Script | Spoken-Tutorial
Revision as of 17:25, 3 August 2016
Time | Narration |
---|---|
00:01 | Welcome to this tutorial on Manipulating Sequences. |
00:06 | In this tutorial, we will use Biopython tools to: * Generate a random DNA sequence |
00:13 | * Slice a DNA sequence at specified locations |
00:17 | * Join two sequences together to form a new sequence, that is, to concatenate |
00:22 | * Find the length of the sequence |
00:26 | * Count the occurrences of individual bases or parts of the string |
00:31 | * Find a particular base or part of the string. |
00:35 | * Convert a sequence object to a mutable sequence object. |
00:40 | To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics |
00:47 | and basic Python programming. |
00:51 | If not, refer to the Python tutorials at the given link. |
00:56 | To record this tutorial, I am using: * Ubuntu OS version 14.10 |
01:03 | * Python version 2.7.8 |
01:07 | * IPython interpreter version 2.3.0 |
01:12 | * Biopython version 1.64. |
01:16 | Let me open the terminal and start the IPython interpreter. |
01:21 | Press the Ctrl, Alt and T keys simultaneously. |
01:26 | At the prompt, type: "ipython" and press Enter. |
01:31 | The IPython prompt appears on the screen. |
01:35 | Using Biopython, we can generate a sequence object for a random DNA sequence of any specified length. |
01:44 | Let us now generate a sequence object for a DNA sequence of 20 bases. |
01:50 | At the prompt, type: "import random", press Enter. |
01:56 | Next, import the Seq module from the Bio package. |
02:01 | Seq is often pronounced as "seek". |
02:06 | At the prompt, type: from Bio dot Seq import Seq. Press Enter. |
02:15 | We will use the Bio.Alphabet module to specify the alphabet of the DNA sequence. |
02:22 | Type: from Bio dot Alphabet import generic underscore dna. Press Enter. |
02:32 | Type the following command to create a sequence object for the random DNA sequence. |
02:38 | Store the sequence in a variable dna1. |
02:42 | Please note: in this command, use two single quotes instead of a double quote. Press Enter. |
02:50 | For the output, type: dna1. Press Enter. |
02:55 | The output shows the sequence object for the random DNA sequence. |
03:00 | If you want a new sequence, press the up-arrow key to recall the same command. Press Enter. |
03:11 | For the output, type the variable name dna1. Press Enter. |
03:17 | The output shows a new DNA sequence which is different from the first one. |
03:23 | About Sequence Objects: |
03:25 | The sequence objects usually act like normal Python strings. |
03:30 | So, follow the same conventions as you do for Python strings. |
03:35 | In Python, we count the characters in the string starting from 0, instead of 1. |
03:41 | The first character in the sequence is at position zero. |
03:45 | Back to the terminal. |
03:47 | Often you may need to work with only a part of the sequence. |
03:52 | Now, let's see how to extract parts of the string and store them as sequence objects. |
03:58 | For example, we will slice the DNA sequence at two positions. |
04:04 | First, between bases 6 and 7. |
04:08 | This will extract a fragment from the beginning of the sequence to the 6th base in the sequence. |
04:15 | The second slice will be between bases 11 and 12. |
04:20 | The second fragment will be from the 12th base to the end of the sequence. |
04:26 | Type the following command, at the prompt, to extract the first fragment. |
04:31 | string1 equal to dna1 within brackets 0 colon 6. |
04:39 | string1 is the variable to store the first fragment. |
04:43 | The rest of the command follows normal Python slicing syntax. |
04:47 | Enclosed in these brackets are the start and the stop positions separated by a colon. |
04:53 | The positions are inclusive of the start but exclusive of the stop position. Press Enter. |
05:01 | To view the output, type: "string1", press Enter. |
05:04 | The output shows the first fragment as a sequence object. |
05:10 | To extract the second fragment from the sequence, press the up-arrow key and edit the command as follows: |
05:17 | Change the name of the variable to string2 and positions to 11 and 20. |
05:24 | For the output, type: "string2". Press Enter. |
05:30 | Now we have the second fragment as a sequence object too. |
05:34 | Let us concatenate, that is, add the two strings together to form a new fragment. |
05:42 | Store the new sequence in a variable dna2. |
05:46 | Type: dna2 equal to string1 plus string2. Press Enter. |
05:53 | Please note: we cannot add sequences with incompatible alphabets. |
05:59 | That is, we cannot concatenate a DNA sequence and a protein sequence to form a new sequence. |
06:07 | The two sequences must have the same alphabet attribute. |
06:12 | To view the output, type: "dna2". Press Enter. |
06:17 | The output shows a new sequence which is a combination of string1 and string2. |
06:23 | To find the length of the new sequence, we will use the len function. |
06:29 | Type: "len" within parentheses "dna2". Press Enter. |
06:34 | The output shows that the sequence is 15 bases long. |
06:39 | We can also count the number of individual bases present in the sequence. |
06:44 | To do so, we will use the count() function. |
06:47 | For example, to count the number of adenines present in the sequence, type the following command: dna2 dot count within parentheses within double quotes letter A. |
07:02 | Press Enter. |
07:04 | The output shows the number of adenines present in the sequence dna2. |
07:10 | To find a particular base or part of the string, we will use the find() function. |
07:16 | Type: dna2 dot find within parentheses within double quotes "GC". Press Enter. |
07:26 | The output indicates the position of the first occurrence of GC in the string. |
07:32 | Normally a sequence object cannot be edited. |
07:35 | To edit a sequence, we have to convert it to a mutable sequence object. |
07:41 | To do so, type: dna3 equal to dna2 dot to mutable open and close parentheses. Press Enter. |
07:52 | For the output, type: dna3. Press Enter. |
07:55 | Now the sequence object can be edited. |
07:59 | Let us replace a base in the sequence. |
08:01 | For example, to replace the base at position 5 with adenine, type: dna3 within brackets 5 equal to within double quotes letter A. Press Enter. |
08:19 | For the output, type: dna3. Press Enter. |
08:24 | Observe the output. The cytosine at position 5 is replaced with adenine. |
08:31 | To replace a part of the string, type the following command. |
08:35 | dna3 within brackets 6 colon 10 equal to within double quotes ATGC. Press Enter. |
08:45 | For the output, type: dna3. Press Enter. |
08:52 | The output shows that the 4 bases at positions 6 to 9 are replaced with the new bases ATGC. |
09:01 | Once you have edited your sequence object, convert it back to the “read only” form. |
09:07 | Type the following: dna4 equal to dna3 dot to seq open and close parentheses. Press Enter. |
09:19 | For the output, type: dna4. Press Enter. |
09:25 | Let's summarize. |
09:27 | In this tutorial, we have learnt to: * Generate a random DNA sequence |
09:32 | * Slice a DNA sequence at specified locations |
09:36 | * Join two sequences together to form a new sequence, that is, to concatenate. |
09:43 | We have also learnt how to: * use len, count and find functions |
09:49 | * convert a sequence object to a mutable sequence object and replace a base or part of the string. |
09:57 | For the assignment, generate a random DNA sequence of 30 bases. |
10:02 | Using Biopython tools, calculate the GC percentage and molecular weight of the sequence. |
10:09 | Your completed assignment will be as follows. |
10:13 | The output shows the GC content as percentage. |
10:18 | The output shows the molecular weight of the DNA sequence. |
10:23 | This video summarizes the Spoken Tutorial project. |
10:26 | If you do not have good bandwidth, you can download and watch it. |
10:30 | We conduct workshops and give certificates. |
10:32 | Please contact us. |
10:35 | Spoken-Tutorial project is supported by the National Mission on Education through ICT, MHRD, Government of India. |
10:43 | This is Snehalatha from IIT Bombay, signing off. Thank you for joining. |
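This sketch covers the random-sequence step (02:32 to 03:17). The narration does not read out the command, so it makes one assumption: each base is drawn uniformly from "ATGC" using Python's standard `random` module. The names `random_dna` and `seed` are hypothetical and exist only for illustration. In the tutorial, the resulting string is passed to `Seq` together with `generic_dna`. Recent Biopython releases no longer include the `Bio.Alphabet` module, so with those versions `Seq(...)` is called with the string alone.

```python
import random


def random_dna(length, seed=None):
    """Return a random DNA string of `length` bases drawn uniformly from A, T, G and C.

    `random_dna` and `seed` are hypothetical names used only for this sketch.
    """
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    return "".join(rng.choice("ATGC") for _ in range(length))


dna1 = random_dna(20)  # without a seed, each call gives a new sequence, as in the tutorial
print(dna1)
```

Calling `random_dna(20)` again gives a different sequence, like re-running the command with the up-arrow key. Passing a seed gives the same sequence every time.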
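This sketch covers slicing, concatenation and length (04:26 to 06:34). Sequence objects follow Python string conventions, so the same index arithmetic can be checked on a plain string. The 20-base string below is a made-up example, not the tutorial's random output. On a Biopython `Seq`, the same slices return `Seq` objects instead of strings.

```python
# Made-up 20-base example; any 20-base DNA string slices the same way
dna1 = "ATGCGTACGTTAGCCATGCA"

string1 = dna1[0:6]    # start 0 included, stop 6 excluded: bases 1 to 6
string2 = dna1[11:20]  # indices 11 to 19: bases 12 to 20
dna2 = string1 + string2  # concatenation joins the two fragments

print(string1)    # ATGCGT
print(string2)    # AGCCATGCA
print(dna2)       # ATGCGTAGCCATGCA
print(len(dna2))  # 15
```

The 6 bases from the first slice plus the 9 bases from the second give the 15-base length reported at 06:34.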
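This sketch covers the `count()` and `find()` steps (06:39 to 07:26). It again uses a plain string, because `Seq.count()` and `Seq.find()` mirror the Python string methods of the same names. The 15-base string is a made-up example.

```python
dna2 = "ATGCGTAGCCATGCA"  # made-up 15-base example

adenines = dna2.count("A")  # number of adenine bases
first_gc = dna2.find("GC")  # 0-based index of the first "GC", or -1 if absent
missing = dna2.find("TTT")  # "TTT" does not occur, so this is -1

print(adenines)  # 4
print(first_gc)  # 2
print(missing)   # -1
print("AAAA".count("AA"))  # 2, because count() skips overlapping matches
```

Keep two points in mind when reading the output. First, `count()` does not count overlapping matches, which matters for multi-base patterns. Second, `find()` returns -1 when there is no match, rather than raising an error.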
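This sketch covers the mutable-sequence steps (07:32 to 09:19). A Python string, like a `Seq` object, cannot be edited in place. This standard-library sketch therefore uses a list of characters as a stand-in for the `MutableSeq` object that `tomutable()` returns. It then joins the list back into a string, which is the analogue of `toseq()`.

In Biopython 1.64, `tomutable()` and `toseq()` do these conversions. Later releases deprecate them in favour of passing the sequence to the `MutableSeq` or `Seq` constructor, so check the documentation for your version.

```python
dna2 = "ATGCGTAGCCATGCA"  # made-up 15-base example

# Strings are immutable: assigning to an index raises TypeError
try:
    dna2[5] = "A"
except TypeError as err:
    print("cannot edit a str:", err)

dna3 = list(dna2)     # mutable stand-in for a MutableSeq
dna3[5] = "A"         # replace the single base at index 5
dna3[6:10] = "ATGC"   # replace the four bases at indices 6 to 9
dna4 = "".join(dna3)  # convert back to an immutable string

print(dna4)  # ATGCGAATGCATGCA
```

Here the slice replacement swaps 4 bases for 4 bases, so the length stays 15. Assigning a slice of a different length would grow or shrink the list.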
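This sketch covers the assignment's GC percentage (09:57 to 10:13). GC content is the number of G and C bases divided by the sequence length, multiplied by 100. Biopython provides a helper for this in `Bio.SeqUtils`: `GC()` in older releases such as 1.64, and `gc_fraction()` in newer ones, which returns a fraction rather than a percentage. The function below is a hypothetical standard-library stand-in that counts only the letters G and C.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in a DNA string.

    Hypothetical helper for this sketch: it is case-insensitive and
    counts only the letters G and C (no ambiguity codes such as S).
    """
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero for an empty sequence
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(round(gc_percent("ATGCGTAGCCATGCA"), 2))  # 53.33
```

The made-up 15-base example has 4 G and 4 C bases, so its GC content is 8 / 15 × 100, about 53.33 percent.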