Python for Biologists/C2/Manipulating-Strings/English
Title of script: Manipulating strings
Author: Snehalatha and Trupti
Keywords: Python, DNA sequences as strings, Protein sequences as strings,
Visual Cue | Narration |
Slide 1
Title Slide |
Hello everyone.
Welcome to this tutorial on Manipulating Strings in the Python for Biologists series. |
Slide 2
Learning Objectives |
In this tutorial, we will learn more Python functions to manipulate strings.
|
Slide 3
System Requirements
|
To record this tutorial, I am using
|
Slide 4
Prerequisite |
To practice this tutorial you should be familiar with,
You can also refer to the Spoken Tutorials on Python for a better understanding of this tutorial. These are available at www.spoken-tutorial.org. |
Slide 5
|
We have already learnt a few functions and methods like:
|
Open the terminal by pressing Ctrl, Alt and T | Let's switch to the terminal and learn a few more functions and methods.
|
Cursor on the terminal.
|
At the prompt, type ipython3 and press Enter.
|
Cursor on the terminal. | Let's begin with concatenation.
It means joining two strings using the plus (+) symbol. |
Type my_DNA = "ATGC" + "GCAT"
Press Enter
|
At the prompt,
type the variable name my_DNA, equal to, in double quotes ATGC, plus, again in double quotes GCAT, and press Enter. |
Type print(my_DNA) | Now let us print the output.
Type print(my_DNA) |
Highlight ATGCGCAT | Note that the output shows the two strings joined into one. |
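For readers following along outside IPython, here is a minimal script version of the concatenation step above. It assumes a plain Python 3 interpreter, so print() is used to display the result.

```python
# Concatenation: the + operator joins two strings end to end
my_DNA = "ATGC" + "GCAT"

# The joined string is as long as both parts together
print(my_DNA)       # ATGCGCAT
print(len(my_DNA))  # 8
```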
Type,
dna1 = "ATGC"
dna2 = "GCAT"
|
We can also concatenate two strings stored in different variables.
Type dna1 = "ATGC" and press Enter, then type dna2 = "GCAT" and press Enter.
Now we will concatenate the two strings using the plus symbol. |
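Type,
dna3 = dna1 + dna2
print(dna3)
|
Let's store the new string in a variable called dna3.
Type dna3 = dna1 + dna2 and press Enter. Note that the variable names are typed without quotes.
Then type print(dna3) and press Enter. |
Highlight
"ATGCGCAT" |
We get the output "ATGCGCAT".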
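The sketch below repeats the variable-based concatenation and also shows why the variable names must not be quoted. It assumes dna1 holds "ATGC", which is inferred from the output ATGCGCAT; quoted_names is a hypothetical variable used only for illustration.

```python
# Assumption: dna1 = "ATGC", inferred from the final output ATGCGCAT
dna1 = "ATGC"
dna2 = "GCAT"

# Using the variables joins their contents
dna3 = dna1 + dna2
print(dna3)  # ATGCGCAT

# Quoting the names creates new string literals instead of using the variables
quoted_names = "dna1" + "dna2"
print(quoted_names)  # dna1dna2
```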
|
Another very useful method is replace.
It replaces a character in a string with another character. | |
Type my_DNA = "ATGCGCAT" | For example, let's store a DNA sequence in a variable.
|
To replace all the Thymine (T) in the string with Uracil (U), i.e., to convert the DNA sequence to an RNA sequence: | |
Type
my_DNA.replace("T", "U") Press Enter |
Type
the variable name my_DNA, dot replace, within brackets in quotes T, comma, within quotes U, and press Enter. |
Highlight AUGCGCAU | The output shows the string with all the Thymine replaced with Uracil. |
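A minimal script version of the DNA-to-RNA conversion above. my_RNA is a hypothetical variable name; the sketch also shows that replace returns a new string and leaves my_DNA itself unchanged, because Python strings cannot be modified in place.

```python
my_DNA = "ATGCGCAT"

# replace returns a new string with every "T" changed to "U"
my_RNA = my_DNA.replace("T", "U")

print(my_RNA)  # AUGCGCAU
print(my_DNA)  # ATGCGCAT, the original sequence is unchanged
```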
We can also replace more than one character in a string. | |
Press up arrow to get variable my_DNA = "ATGCGCAT" | Let's demonstrate this:
|
We will replace a part of the string with a new set of characters. | |
Type my_DNA.replace("GCGC", "CATG")
Press Enter |
Type
my_DNA, dot replace, within brackets in quotes GCGC, comma, within quotes CATG, and press Enter. |
Highlight output
ATCATGAT |
The output shows that the part of the string GCGC has been replaced by CATG. |
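The same replace method works on a run of several characters. In this sketch, new_DNA is a hypothetical variable name; the second call illustrates that if the pattern does not occur, the string comes back unchanged.

```python
my_DNA = "ATGCGCAT"

# Replace the four-letter block "GCGC" with "CATG"
new_DNA = my_DNA.replace("GCGC", "CATG")
print(new_DNA)  # ATCATGAT

# A pattern that is not present leaves the sequence as it was
print(my_DNA.replace("TTTT", "AAAA"))  # ATGCGCAT
```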
We will now learn how to find the position of a character in a string. | |
Let's use a protein sequence as an example. | |
Type,
my_protein = "alspadkanl" |
At the prompt, type my_protein, equal to, in double quotes alspadkanl, and press Enter.
|
To find the position of Proline, which is represented by the letter p,
we will use the find method. | |
Type
my_protein.find("p") Press Enter |
Type the name of the variable, i.e., my_protein, dot find, within brackets in quotes p, and press Enter.
|
Highlight 3 | Output shows number 3. |
In Python,
we count the characters in a string starting from 0 instead of 1. | |
Highlight a | Position zero is the first character in the string, i.e., Alanine. |
Highlight p | Position three is therefore Proline. |
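To check the zero-based counting, the sketch below looks up the position returned by find and then reads the character at that position with square brackets. The name position is a hypothetical variable used only for this illustration.

```python
my_protein = "alspadkanl"

# find returns the zero-based index of "p" (Proline)
position = my_protein.find("p")
print(position)              # 3

# Reading the character at that index gives back "p"
print(my_protein[position])  # p

# Index 0 is the first character, Alanine
print(my_protein[0])         # a
```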
Type
my_protein.find("d") Press Enter |
Let's practice with a couple more examples.
Type my_protein.find("d") for Aspartic acid and press Enter. |
Highlight 5 | Output shows number 5, which is the 6th position in the string. |
Type my_protein.find("a")
Press Enter |
To find Alanine, type my_protein.find("a")
and press Enter. |
Highlight 0 | The output shows 0, which is the first position in the string.
Alanine occurs more than once in this sequence, but find returns only the position of its first occurrence. |
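A short sketch of the find calls above. The last line goes slightly beyond the narration: Python's str.find also accepts an optional start index, so searching from index 1 skips the first Alanine and reports the next one.

```python
my_protein = "alspadkanl"

print(my_protein.find("d"))     # 5, Aspartic acid is the sixth character
print(my_protein.find("a"))     # 0, only the first occurrence is reported

# Optional start index: begin the search at position 1
print(my_protein.find("a", 1))  # 4, the next Alanine
```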
Assignment
slide 6 |
What will be the output if we try to find an amino acid that is not present in the string?
|
Press up arrow key | Back to the terminal again.
Press up arrow key to get the command my_protein = "alspadkanl" |
Often we may need to work with only a part of the string.
This process is called extracting a substring. | |
Highlight spad | As an example, we will extract spad from the above sequence. |
Type my_protein[2:6]
Press Enter
|
To extract a part of the string stored in a variable:
Type my_protein[2:6]. Enclosed in the square brackets are the start and stop positions, separated by a colon. The start position is included, but the stop position is excluded.
Press Enter |
Highlight s and d | The output shows spad, which means:
The expression my_protein[2:6] extracts the portion of the string that starts at position 2, the third character 's', and stops just before position 6, the seventh character 'k'. So the last character extracted is 'd', the sixth character. |
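A minimal sketch of the slice above, with fragment as a hypothetical variable name. It confirms that the character at the stop position is not included and that the slice length equals stop minus start.

```python
my_protein = "alspadkanl"

# Start position 2 is included, stop position 6 is excluded
fragment = my_protein[2:6]
print(fragment)       # spad

# The character at the stop position, 'k', is not part of the slice
print(my_protein[6])  # k

# Length of the slice is stop - start
print(len(fragment))  # 4
```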
Type
my_protein[0:3] Press Enter my_protein[5:7] Press Enter |
We will demonstrate with a few more examples.
Type my_protein[0:3] and press Enter, then type my_protein[5:7] and press Enter. |
Observe the output. | |
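The two extra slices can be checked in one small loop. This is only a sketch; the loop and the names start, stop and piece are illustrative and not part of the tutorial's commands.

```python
my_protein = "alspadkanl"

# Each pair is (start, stop); the stop position is excluded
for start, stop in [(0, 3), (5, 7)]:
    piece = my_protein[start:stop]
    # Print the slice and its length, which is always stop - start
    print(start, stop, piece, len(piece))

# Output:
# 0 3 als 3
# 5 7 dk 2
```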
Slide 7
Summary |
Let's summarize what we have learnt in this tutorial
|
Assignment
slide 8 |
As an assignment
|
Slide 9
About Spoken Tutorial Project
|
The video available at the following link summarizes the Spoken Tutorial project. Please watch it. |
Slide 10
About Spoken Tutorial workshops |
The Spoken Tutorial Project Team conducts workshops and gives certificates to those who pass an online test.
For more details, please write to us. |
Slide 11
Acknowledgement |
Spoken Tutorial Project is supported by the NMEICT, MHRD, Government of India.
More information on this Mission is available at this link. |
The script is contributed by Snehalatha and Trupti Kini.
And this is Trupti Kini from IIT Bombay signing off. Thank you for joining. |