Difference between revisions of "Biopython/C2/Manipulating-Sequences/English-timed"
From Script | Spoken-Tutorial
PoojaMoolya (Talk | contribs) |
Revision as of 16:00, 10 March 2017
Time | Narration |
---|---|
00:01 | Welcome to this tutorial on Manipulating Sequences. |
00:06 | In this tutorial, we will use Biopython tools to generate a random DNA sequence, |
00:13 | Slice a DNA sequence at specified locations |
00:17 | Join two sequences together to form a new sequence, that is, concatenate them |
00:22 | Find the length of the sequence |
00:26 | Count the number of individual bases or part of the string |
00:31 | Find a particular base or part of the string. |
00:35 | Convert a sequence object to a mutable sequence object. |
00:40 | To follow this tutorial, you should be familiar with undergraduate Biochemistry or Bioinformatics |
00:47 | and basic Python programming. |
00:51 | If not, refer to the Python tutorials at the given link. |
00:56 | To record this tutorial, I am using Ubuntu OS version 14.10 |
01:03 | Python version 2.7.8 |
01:07 | IPython interpreter version 2.3.0 |
01:12 | Biopython version 1.64. |
01:16 | Let me open the terminal and start ipython interpreter. |
01:21 | Press Ctrl, Alt and t keys simultaneously. |
01:26 | At the prompt, type: "ipython" and press Enter. |
01:31 | The IPython prompt appears on the screen. |
01:35 | Using Biopython, we can generate a sequence object for a random DNA sequence of any specified length. |
01:44 | Let us now generate a sequence object for a DNA sequence of 20 bases. |
01:50 | At the prompt, type: "import random", press Enter. |
01:56 | Next, import the Seq class from the Seq module of the Bio package. |
02:01 | Often Seq is pronounced as seek. |
02:06 | At the prompt, type: from Bio dot Seq import Seq. Press Enter. |
02:15 | We will use the Bio.Alphabet module to specify the alphabet of the DNA sequence. |
02:22 | Type: from Bio dot Alphabet import generic underscore dna. Press Enter. |
02:32 | Type the following command to create a sequence object for the random DNA sequence. |
02:38 | Store the sequence in a variable dna1. |
02:42 | Please note: in this command, use two single quotes instead of a double quote. Press Enter. |
02:50 | For the output, type: dna1. Press Enter. |
02:55 | The output shows the sequence object for the random DNA sequence. |
03:00 | If you want a new sequence, press up-arrow key to get the same command as above. Press Enter. |
03:11 | For the output, type the variable name dna1. Press Enter. |
03:17 | The output shows a new DNA sequence which is different from the first one. |
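The command typed on screen is not reproduced in this transcript. As a minimal sketch of one way to build such a sequence: the helper name `random_dna` is hypothetical, and the `generic_dna` alphabet argument from the tutorial's Biopython 1.64 setup is left out, on the assumption that the installed release may no longer ship `Bio.Alphabet`. If Biopython is not installed at all, the sketch falls back to a plain Python string.

```python
import random

try:
    from Bio.Seq import Seq  # sequence class from Biopython
except ImportError:
    Seq = str  # assumption: plain-string stand-in when Biopython is absent


def random_dna(length):
    """Build a sequence object from `length` randomly chosen DNA bases."""
    # '' (two single quotes) is the empty string used to join the bases.
    bases = ''.join(random.choice('ACGT') for _ in range(length))
    return Seq(bases)


dna1 = random_dna(20)
print(dna1)  # the bases vary from run to run
```

Calling `random_dna(20)` again plays the same role as pressing the up-arrow key and re-running the command in the tutorial.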
03:23 | About Sequence Objects: |
03:25 | The sequence objects usually act like normal Python strings. |
03:30 | So, follow the normal conventions as you do for Python strings. |
03:35 | In Python, we count the characters in the string starting from 0, instead of 1. |
03:41 | The first character in the sequence is position zero. |
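A small illustration of the zero-based counting described above. A plain Python string is used as a stand-in, relying on the tutorial's remark that sequence objects act like Python strings; the example sequence is hypothetical.

```python
# A fixed example sequence (hypothetical, chosen for illustration).
seq = "GATTACA"

first = seq[0]            # position 0 holds the first base
third = seq[2]            # position 2 holds the third base
last = seq[len(seq) - 1]  # the last base sits at position length - 1

print(first, third, last)  # G T A
```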
03:45 | Back to the terminal. |
03:47 | Often you may need to work with only a part of the sequence. |
03:52 | Now, let's see how to extract parts of the string and store them as sequence objects. |
03:58 | For example, we will slice the DNA sequence at two positions. |
04:04 | First, between bases 6 and 7. |
04:08 | This will extract a fragment from the beginning of the sequence to the 6th base in the sequence. |
04:15 | The second slice will be between bases 11 and 12. |
04:20 | The second fragment will be from the 12th base to the end of the sequence. |
04:26 | Type the following command, at the prompt, to extract the first fragment. |
04:31 | string1 equal to dna1 within brackets 0 colon 6. |
04:39 | string1 is the variable to store the first fragment. |
04:43 | The rest of the command follows as in normal Python. |
04:47 | Enclosed in these brackets are the start and the stop positions separated by a colon. |
04:53 | The positions are inclusive of the start but exclusive of the stop position. Press Enter. |
05:01 | To view the output, type: "string1", press Enter. |
05:04 | The output shows the first fragment as the sequence object. |
05:10 | To extract the second string from the sequence, press up-arrow key and edit the command as follows: |
05:17 | Change the name of the variable to string2 and positions to 11 and 20. |
05:24 | For the output, type: "string2". Press Enter. |
05:30 | Now we have the 2nd fragment also as a sequence object. |
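The two slices above can be sketched as follows. The 20-base sequence is a hypothetical fixed example so that the result is reproducible, and a plain string stands in for the sequence object; the bracket expressions are the ones given in the narration.

```python
dna1 = "ATGCCGTAAGCTTACGGATC"  # hypothetical 20-base sequence

# Start is inclusive, stop is exclusive.
string1 = dna1[0:6]    # positions 0-5: the first six bases
string2 = dna1[11:20]  # positions 11-19: the 12th base to the end

print(string1)  # ATGCCG
print(string2)  # TTACGGATC
```

Because the stop position is excluded, `dna1[0:6]` returns six bases and `dna1[11:20]` returns nine.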
05:34 | Let us concatenate, that is, add the two strings together to form a new fragment. |
05:42 | Store the new sequence in a variable dna2. |
05:46 | Type: dna2 equal to string1 plus string2. Press Enter. |
05:53 | Please note: we cannot add sequences with incompatible alphabets. |
05:59 | That is, we cannot concatenate a DNA sequence and a protein sequence to form a new sequence. |
06:07 | The two sequences must have the same alphabet attribute. |
06:12 | To view the output, type: "dna2". Press Enter. |
06:17 | The output shows a new sequence which is a combination of string1 and string2. |
06:23 | To find the length of the new sequence, we will use len function. |
06:29 | Type: "len" within parentheses "dna2". Press Enter. |
06:34 | The output shows that the sequence is 15 bases long. |
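A sketch of the concatenation and length check above, again with plain strings as stand-ins and hypothetical fragment values. It also shows where the length of 15 comes from: six bases from the first fragment plus nine from the second.

```python
string1 = "ATGCCG"     # hypothetical first fragment (6 bases)
string2 = "TTACGGATC"  # hypothetical second fragment (9 bases)

dna2 = string1 + string2  # concatenation keeps the order of the operands

print(dna2)       # ATGCCGTTACGGATC
print(len(dna2))  # 15
```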
06:39 | We can also count the number of individual bases present in the sequence. |
06:44 | To do so, we will use count() function. |
06:47 | For example, to count the number of adenines present in the sequence, type the following command: dna2 dot count within parentheses within double quotes alphabet A. |
07:02 | Press Enter. |
07:04 | The output shows the number of adenines present in the sequence dna2. |
07:10 | To find a particular base or part of the string, we will use find() function. |
07:16 | Type: dna2 dot find within parentheses within double quotes GC. Press Enter. |
07:26 | The output indicates the position of the first instance of the appearance of GC in the string. |
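The counting and searching steps above can be sketched on a plain string stand-in; the sequence value is hypothetical. The last line relies on standard Python string behaviour, where `find` returns -1 if the pattern is absent.

```python
dna2 = "ATGCCGTTACGGATC"  # hypothetical 15-base sequence

a_count = dna2.count("A")       # number of A bases in the sequence
gc_position = dna2.find("GC")   # position of the first occurrence of GC
missing = dna2.find("AAA")      # a pattern that does not occur

print(a_count)      # 3
print(gc_position)  # 2
print(missing)      # -1
```

The reported position follows the zero-based convention, so 2 means the match starts at the third base.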
07:32 | Normally a sequence object cannot be edited. |
07:35 | To edit a sequence, we have to convert it to the mutable sequence object. |
07:41 | To do so, type: dna3 equal to dna2 dot tomutable open and close parentheses. Press Enter. |
07:52 | For the output, type: dna3. Press Enter. |
07:55 | Now the sequence object can be edited. |
07:59 | Let us replace a base from the sequence. |
08:01 | For example, to replace the base at position 5 with adenine, type: dna3 within brackets 5 equal to within double quotes alphabet A. Press Enter. |
08:19 | For the output, type: dna3. Press Enter. |
08:24 | Observe the output. The cytosine at position 5 is replaced with adenine. |
08:31 | To replace a part of the string, type the following command. |
08:35 | dna3 within brackets 6 colon 10 equal to within double quotes ATGC. Press Enter. |
08:45 | For the output, type: dna3. Press Enter. |
08:52 | The output shows that the 4 bases from position 6 to 9 are replaced with the new bases ATGC. |
09:01 | Once you have edited your sequence object, convert it back to the “read only” form. |
09:07 | Type the following: dna4 equal to dna3 dot toseq open and close parentheses. Press Enter. |
09:19 | For the output, type: dna4. Press Enter. |
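The editing round trip above can be illustrated without Biopython. In this sketch a Python list stands in for the mutable sequence object: `list(...)` plays the role of the tutorial's conversion to a mutable object, and `"".join(...)` plays the role of the conversion back to the read-only form. This is an analogy chosen so the example runs on its own, not the Biopython API; the sequence value is hypothetical.

```python
dna2 = "ATGCCGTTACGGATC"  # hypothetical read-only sequence (a str cannot be edited)

dna3 = list(dna2)         # editable stand-in for the mutable sequence

dna3[5] = "A"             # replace the single base at position 5
dna3[6:10] = "ATGC"       # replace the four bases at positions 6-9

dna4 = "".join(dna3)      # back to a read-only string

print(dna4)               # ATGCCAATGCGGATC
```

The slice `6:10` covers positions 6 to 9, so replacing it with four bases leaves the length unchanged at 15.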
09:25 | Let's summarize. |
09:27 | In this tutorial, we have learnt to generate a random DNA sequence, |
09:32 | Slice a DNA sequence at specified locations |
09:36 | Join two sequences together to form a new sequence, that is, to concatenate. |
09:43 | We have also learnt how to use the len, count and find functions, |
09:49 | convert a sequence object to a mutable sequence object and replace a base or part of the string. |
09:57 | For the assignment, generate a random DNA sequence of 30 bases. |
10:02 | Using Biopython tools, calculate the GC percentage and molecular weight of the sequence. |
10:09 | Your completed assignment will be as follows. |
10:13 | The output shows the GC content as percentage. |
10:18 | The output shows the molecular weight of the DNA sequence. |
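As a cross-check for the GC part of the assignment, the percentage can be computed by hand. This is only a sketch of the arithmetic in plain Python: the helper name `gc_percentage` is hypothetical, it is not the Biopython tool the assignment asks for, and molecular weight is not covered here.

```python
import random


def gc_percentage(sequence):
    """Return the share of G and C bases in `sequence`, as a percentage."""
    text = str(sequence).upper()
    if not text:
        return 0.0  # avoid dividing by zero for an empty sequence
    gc_count = text.count("G") + text.count("C")
    return 100.0 * gc_count / len(text)


# A random 30-base sequence, as the assignment requests.
dna = ''.join(random.choice('ACGT') for _ in range(30))
print(dna)
print(gc_percentage(dna))  # the value varies from run to run
```

For a fixed input such as `"ATGC"`, the function returns 50.0, since two of the four bases are G or C.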
10:23 | This video summarizes the Spoken Tutorial project. |
10:26 | If you do not have good bandwidth, you can download and watch it. |
10:30 | We conduct workshops and give certificates. |
10:32 | Please contact us. |
10:35 | Spoken-Tutorial project is supported by the National Mission on Education through ICT, MHRD, Government of India. |
10:43 | This is Snehalatha from IIT Bombay, signing off. Thank you for joining. |