Python for Biologists/C2/Manipulating-Strings/English


Title of script: Manipulating strings

Author: Snehalatha and Trupti

Keywords: Python, DNA sequences as strings, Protein sequences as strings

Visual Cue Narration
Slide 1

Title Slide

Hello everyone.

Welcome to this tutorial on Manipulating Strings in the Python for Biologists series.

Slide 2

Learning Objectives

In this tutorial, we will learn more Python functions to manipulate strings.

  • Concatenation (adding two strings).
  • Replacing a character in a string.
  • Finding the position of a character in a string.
  • Extracting part of a string.

Slide 3

System Requirements

To record this tutorial, I am using
  • Ubuntu OS version 12.04
  • Python 3.2.3
  • IPython 0.12.1

Slide 4


To practice this tutorial, you should be familiar with:
  • Basic biochemistry

You can also refer to Spoken Tutorials on Python for better understanding of this tutorial.

These are available at

Slide 5

Python Functions and Methods

We have already learnt a few functions and methods like:
  • len(): to find the length of the string
  • count(): to count occurrences in a string
  • lower() and upper(): to change the case, etc.
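As a quick recap, the functions and methods listed above can be tried on a short DNA string. This is only a minimal sketch; the sequence used here is an illustrative assumption and is not taken from the earlier tutorials.

```python
# Illustrative sequence (an assumption for this recap)
my_DNA = "ATGCGCAT"

print(len(my_DNA))        # number of characters in the string
print(my_DNA.count("G"))  # how many times "G" occurs
print(my_DNA.lower())     # the same sequence in lower case
print("atgc".upper())     # a lower-case string converted to upper case
```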

Open the terminal by pressing Ctrl, Alt and T Let's switch to the terminal and learn a few more functions and methods.

Press Ctrl, Alt and T keys simultaneously

Cursor on the terminal.

At the prompt type ipython3 and press Enter.

IPython opens with a prompt.

Cursor on the terminal. Let's begin with concatenation:

It means joining two strings using the plus (+) symbol.


my_DNA = "ATGC" + "GCAT"

Press Enter

At the prompt:

Type the variable name as,

my_DNA equal to, in double quotes ATGC, plus, again in double quotes GCAT

press Enter

Type print(my_DNA) Now let us print the output.

Type print(my_DNA)

Highlight ATGCGCAT Note that the output shows a combination of two strings.
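The concatenation step can be sketched as follows. Note that Python expects plain straight quotes around a string; curly quotes copied from a formatted document would cause a syntax error.

```python
# Join two string literals with the plus (+) operator
my_DNA = "ATGC" + "GCAT"

print(my_DNA)  # prints ATGCGCAT
```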

dna1 = "ATGC"

dna2 = "GCAT"

We can also concatenate two strings stored in different variables.

dna1 = "ATGC" press Enter

dna2 = "GCAT" press Enter

Concatenate the two strings using the plus symbol.


dna3 = dna1 + dna2

Let's store the new string in a variable called dna3.


dna3 = dna1 + dna2 press Enter



Displaying dna3 gives the output "ATGCGCAT".

Note that the two strings are combined to form a new string.
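A minimal sketch of concatenating strings held in variables is given here. The variable names must be written without quotes; the last two lines show what happens if they are quoted by mistake. The name quoted is hypothetical and is used only for this illustration.

```python
dna1 = "ATGC"
dna2 = "GCAT"

# Variable names are written without quotes, so their contents are joined
dna3 = dna1 + dna2
print(dna3)  # prints ATGCGCAT

# With quotes, Python joins the literal text "dna1" and "dna2" instead
quoted = "dna1" + "dna2"
print(quoted)  # prints dna1dna2
```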

Another very useful method, replace, substitutes one character in a string with another.

Type my_DNA = "ATGCGCAT" For example, let's store a DNA sequence in a variable.



We will replace all the Thymine in the string with Uracil, i.e., convert the DNA sequence to an RNA sequence.

my_DNA.replace("T" , "U")

Press Enter


Type the variable name, my_DNA, followed by dot replace.

Within brackets, in quotes T, space comma space, within quotes U.

Press Enter

Highlight AUGCGCAU The output shows the string with all the Thymine replaced with Uracil.
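The DNA to RNA conversion can be sketched as below. The name my_RNA is an assumption added for illustration. Since replace returns a new string, the original sequence stored in my_DNA is not changed.

```python
my_DNA = "ATGCGCAT"

# replace() returns a new string with every "T" changed to "U"
my_RNA = my_DNA.replace("T", "U")

print(my_RNA)  # prints AUGCGCAU
print(my_DNA)  # prints ATGCGCAT, the original string is unchanged
```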
We can also replace more than one character in a string.
Press up arrow to get variable my_DNA = "ATGCGCAT" Let's demonstrate this:

Press up arrow to get variable my_DNA = "ATGCGCAT"

We will replace a part of the string with a new set of characters.

my_DNA.replace('GCGC' , 'CATG')

Press Enter





within brackets in quotes GCGC space comma space within quotes CATG

Press Enter

Highlight output


The output, ATCATGAT, shows that the matching part of the string is replaced by the new one.
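Replacing a group of characters works in the same way as replacing a single character. In this short sketch the names new_DNA and unchanged are hypothetical; the second call shows that a pattern which does not occur in the string leaves it as it is.

```python
my_DNA = "ATGCGCAT"

# Replace the four-character fragment GCGC with CATG
new_DNA = my_DNA.replace("GCGC", "CATG")
print(new_DNA)  # prints ATCATGAT

# A fragment that is absent from the string changes nothing
unchanged = my_DNA.replace("TTTT", "AAAA")
print(unchanged)  # prints ATGCGCAT
```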
We will now learn how to find the position of a character in a string.
Let's make use of a protein sequence as an example.

my_protein = "alspadkanl"

At the prompt type

my_protein = "alspadkanl"

To find the position of Proline, which is represented by the letter p:

We will make use of method find.



Press Enter

Type the name of the variable, i.e.,

my_protein followed by dot find, in brackets, in single quotes p.

Press Enter

Highlight 3 Output shows number 3.
In Python,

we count the characters in a string starting from 0 instead of 1.

Highlight a Position zero is the first character in the string, i.e., Alanine.
Highlight p Position three is therefore Proline.
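The counting from zero can be checked with a short sketch like the one below. The name position and the loop that lists every character with its position are additions for illustration only.

```python
my_protein = "alspadkanl"

# find() returns the position of the character, counting from 0
position = my_protein.find("p")
print(position)  # prints 3

print(my_protein[0])         # prints a, the character at position 0
print(my_protein[position])  # prints p, the character at position 3

# List every character in the sequence together with its position
for index, residue in enumerate(my_protein):
    print(index, residue)
```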


Press Enter

Let's practice with a couple more examples.


my_protein.find("d") for Aspartic acid.

Press Enter

Highlight 5 Output shows number 5, which is the 6th position in the string.

my_protein.find("a") for Alanine.

Press Enter

To find Alanine, type

my_protein.find("a") for Alanine.

Press Enter

Highlight 0 Output shows 0, which is the first position in the string.

Alanine occurs more than once in this sequence; find returns only the position of its first occurrence.
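The sketch below illustrates that find reports only the first occurrence. To look for later occurrences, it uses the optional second argument of find, which gives the position at which the search begins. The name positions and the list built with enumerate are assumptions added for illustration.

```python
my_protein = "alspadkanl"

print(my_protein.find("d"))  # prints 5
print(my_protein.find("a"))  # prints 0, the first Alanine

# Start the search at a later position to find the next Alanine
print(my_protein.find("a", 1))  # prints 4
print(my_protein.find("a", 5))  # prints 7

# Collect the positions of every Alanine in the sequence
positions = [index for index, residue in enumerate(my_protein) if residue == "a"]
print(positions)  # prints [0, 4, 7]
```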


slide 6

What will be the output if we try to find an amino acid that is not present in the string?

Make note of your observations.

Press up arrow key Back to the terminal again.

Press up arrow key to get the command

my_protein = "alspadkanl"

Often we may need only a part of the string to work with.

This process is called extracting a substring.

Highlight spad As an example we will try to extract spad from the above sequence.

my_protein[2:6]

Press Enter

To extract a part of the string stored in a variable:

Type my_protein[2:6]

Enclosed in these brackets are start and stop positions separated by a colon.

The start position is inclusive,

but the stop position is exclusive.



Press Enter

Highlight s and d Output shows spad, which means:

The expression my_protein[2:6] extracts the portion of the string:

starting at the third character, i.e., 's' (position 2), and ending before the seventh character (position 6). The last character extracted is therefore 'd', which is the sixth character.
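The start and stop rule can be verified with a minimal sketch such as this one. The name fragment is hypothetical. The length of the extracted part equals stop minus start, here 6 minus 2.

```python
my_protein = "alspadkanl"

# Start position 2 is included, stop position 6 is excluded
fragment = my_protein[2:6]

print(fragment)       # prints spad
print(len(fragment))  # prints 4, i.e. stop minus start

# First and last characters of the fragment are at positions 2 and 5
print(my_protein[2], my_protein[5])  # prints s d
```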



Press Enter


Press Enter

We will demonstrate with a few more examples.



Press Enter


Press Enter

Observe the output.
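A few further slices that can be tried at the prompt are sketched below. These particular start and stop values are chosen here for illustration and are not necessarily the ones used in the recording. When the start is left out, the slice begins at position 0; when the stop is left out, it runs to the end of the string.

```python
my_protein = "alspadkanl"

print(my_protein[0:3])  # prints als
print(my_protein[4:8])  # prints adka

# Leaving out the start begins the slice at position 0
print(my_protein[:4])   # prints alsp

# Leaving out the stop continues to the end of the string
print(my_protein[6:])   # prints kanl
```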
Slide 7


Let's summarize what we have learnt in this tutorial
  • Concatenation (adding two strings).
  • Replacing a character in a string.
  • Finding the position of a character in a string.
  • Extracting part of a string.


slide 8

As an assignment

Find the position of ash in the given protein sequence.


Extract a fragment hashd from the sequence.

Slide 9

About Spoken Tutorial Project

The video available at the following link summarizes the Spoken Tutorial project. Please watch it.
Slide 10

About Spoken Tutorial workshops

The Spoken Tutorial Project Team conducts workshops and gives certificates to those who pass an online test.

For more details, please write to us.

Slide 11


These 3 slides remain as before, on LaTeX. Only the narration will now be as given here.

Spoken Tutorial Project is supported by the NMEICT, MHRD, Government of India.

More information on this Mission is available at this link.

The script is contributed by Snehalatha and Trupti Kini.

And this is Trupti Kini from IIT Bombay signing off. Thank you for joining.

