Python for Biologists/C2/Manipulating-Strings/English

Revision as of 10:51, 7 August 2014 by Trupti (Talk | contribs)


Title of script: Manipulating strings

Author: Snehalatha and Trupti

Keywords: Python, DNA sequences as strings, Protein sequences as strings


Visual Cue Narration
Slide 1

Title Slide

Hello everyone.

Welcome to this tutorial on Manipulating Strings in the Python for Biologists series.

Slide 2

Learning Objectives

In this tutorial, we will learn more Python functions to manipulate strings.


  • Concatenation (joining two strings).
  • Replacing a character in a string.
  • Finding the position of a character in a string.
  • Extracting a part of a string.


Slide 3

System Requirements


To record this tutorial, I am using
  • Ubuntu OS version 12.04
  • Python 3.2.3
  • IPython 0.12.1


Slide 4

Prerequisite

To practice this tutorial, you should be familiar with:
  • Basic biochemistry

You can also refer to Spoken Tutorials on Python for better understanding of this tutorial.

These are available at www.spoken-tutorial.org

Slide 5


Python Functions and Methods

We have already learnt a few functions and methods like:
  • len(): to find the length of a string
  • count(): to count occurrences in a string
  • lower() and upper(): to change the case, etc.
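
As a quick refresher, the functions and methods listed above could be used as in the following minimal sketch. The sequence and the variable names are illustrative assumptions and are not part of the recorded session.

```python
# Illustrative sequence; not taken from the recorded session.
seq = "ATGCGCAT"

length = len(seq)           # number of characters in the string
g_count = seq.count("G")    # how many times "G" occurs
lowered = seq.lower()       # lowercase copy of the sequence
restored = lowered.upper()  # uppercase copy of the lowercase string

print(length, g_count, lowered, restored)
```

Note that len() is a function that takes the string as its argument, while count(), lower() and upper() are methods called on the string with a dot.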


Open the terminal by pressing Ctrl, Alt and T

Let's switch to the terminal and learn a few more functions and methods.


Press Ctrl, Alt and T keys simultaneously

Cursor on the terminal.


At the prompt, type ipython3 and press Enter.


IPython opens with a prompt.

Cursor on the terminal.

Let's begin with concatenation:

It means joining two different strings using the plus (+) symbol.

Type


my_DNA = "ATGC" + "GCAT"

Press Enter


At the prompt:

Type the variable name my_DNA, equal to, in double quotes ATGC, plus, again in double quotes GCAT

press Enter

Type print(my_DNA)

Now let us print the output.

Type print(my_DNA)

Highlight ATGCGCAT

Note that the output shows a combination of the two strings.
Type,


dna1 = "ATGC"

dna2 = "GCAT"


We can also concatenate two different strings stored in different variables.


dna1 = "ATGC" press Enter

dna2 = "GCAT" press Enter

Concatenate the two strings using the plus symbol.

Type,


dna3 = dna1 + dna2


Let's store the new string in a variable called dna3.


Type

dna3 = dna1 + dna2 press Enter

Highlight

"ATGCGCAT"

We get an output "ATGCGCAT"


Note that the two strings are combined to form a new string.
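
The concatenation steps above can be sketched as follows. The variable name literal is a hypothetical addition, used only to show what happens when the variable names are placed inside quotes.

```python
dna1 = "ATGC"
dna2 = "GCAT"

# Variable names are written without quotes, so their contents are joined.
dna3 = dna1 + dna2
print(dna3)

# Hypothetical contrast: quoting the names joins the literal text instead.
literal = "dna1" + "dna2"
print(literal)
```

The first print shows ATGCGCAT, while the second shows dna1dna2, because anything inside quotes is treated as text rather than as a variable name.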

Another very useful and frequently used method is replace.

It replaces a character in a string with another.

Type my_DNA = "ATGCGCAT"

For example, let's store a DNA sequence in a variable.


Type,


my_DNA = "ATGCGCAT"

To replace all the Thymine in the string with Uracil, i.e., to convert the DNA sequence to an RNA sequence:
Type

my_DNA.replace("T", "U")

Press Enter

Type

the variable name my_DNA, dot, replace, and within brackets, in quotes T, comma, in quotes U

Press Enter

Highlight AUGCGCAU

The output shows the string with all the Thymine replaced with Uracil.

We can also replace more than one character in a string.

Press up arrow to get variable my_DNA = "ATGCGCAT"

Let's demonstrate this:


Press the up arrow to get the variable my_DNA = "ATGCGCAT"

We will replace a part of the string with a new set of characters.
Type


my_DNA.replace('GCGC' , 'CATG')

Press Enter

Type


my_DNA

dot

replace

within brackets in quotes GCGC space comma space within quotes CATG

Press Enter

Highlight output

ATCATGAT

The output shows that this part of the string has been replaced by the new one.
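
The two replacements above can be sketched together as follows. The names my_RNA and edited are assumptions introduced here to hold the results; the session itself only displays them.

```python
my_DNA = "ATGCGCAT"

# replace() returns a new string; here every T becomes U.
my_RNA = my_DNA.replace("T", "U")

# A run of several characters can be replaced in the same way.
edited = my_DNA.replace("GCGC", "CATG")

print(my_DNA)  # the original sequence is unchanged
print(my_RNA)
print(edited)
```

Because Python strings cannot be modified in place, my_DNA still holds ATGCGCAT after both calls. To keep a result, it has to be assigned to a variable as shown.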
We will now learn how to find the position of a character in a string.
Let's make use of a protein sequence as an example.
Type,

my_protein = "alspadkanl"

At the prompt type


my_protein = "alspadkanl"

To find the position of Proline, which is represented by the letter p:

We will make use of the find method.

Type

my_protein.find("p")

Press Enter

Type the name of the variable, i.e.,


my_protein followed by dot find, and in brackets, in quotes, p.


Press Enter

Highlight 3

Output shows the number 3.

In Python, we count the characters in a string starting from 0 instead of 1.

Highlight a

Position zero is the first character in the string, i.e., Alanine.

Highlight p

Position three is therefore Proline.
Type

my_protein.find("d")

Press Enter

Let's practice with a couple more examples.

Type

my_protein.find("d") for Aspartic acid.

Press Enter

Highlight 5

Output shows the number 5, which is the 6th position in the string.
Type


my_protein.find("a") for Alanine.

Press Enter

To find Alanine, type


my_protein.find("a")

Press Enter

Highlight 0

Output shows 0, which is the first position in the string.

Alanine occurs more than once in this sequence; find reports the position at which it occurs first.
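
The behaviour of find described above can be sketched as follows. The optional second argument of find (the position at which to start searching) is not covered in the session; it is shown here only as an illustration, and the variable names are assumptions.

```python
my_protein = "alspadkanl"

first_a = my_protein.find("a")      # position of the first Alanine
next_a = my_protein.find("a", 1)    # search again, starting from position 1
last_a = my_protein.find("a", 5)    # search again, starting from position 5
pad_pos = my_protein.find("pad")    # find also accepts more than one character

print(first_a, next_a, last_a, pad_pos)
```

This prints 0 4 7 3: the three Alanine residues sit at positions 0, 4 and 7, and the fragment pad begins at position 3, which is the position of p.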

Assignment

Slide 6

What will be the output if we try to find an amino acid that is not present in the string?


Make note of your observations.

Press up arrow key

Back to the terminal again.

Press the up arrow key to get the command

my_protein = "alspadkanl"

Often we may need only a part of the string to work with.

This process is called extracting a substring.

Highlight spad

As an example, we will try to extract spad from the above sequence.

Type


my_protein[2:6]

Press Enter


To extract a part of the string stored in a variable:

Type my_protein[2:6]

Enclosed in these brackets are start and stop positions separated by a colon.

The positions are inclusive at the start, but exclusive at the stop.


Type


my_protein[2:6]

Press Enter

Highlight s and d

Output shows spad, which means:

The expression my_protein[2:6] extracts the portion of the string that starts at the third character, i.e., 's', and ends before the 7th character, i.e., at 'd', which is the 6th character.
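
The start and stop rule described above can be sketched as follows. Combining find with slicing is a hypothetical variation that is not part of the session, and the names fragment, start and stop are assumptions.

```python
my_protein = "alspadkanl"

# Positions 2, 3, 4 and 5 are taken; position 6 is left out.
fragment = my_protein[2:6]
print(fragment)
print(len(fragment))  # equals stop - start, i.e. 6 - 2

# Hypothetical variation: locate a fragment first, then slice it out.
start = my_protein.find("spad")
stop = start + len("spad")
print(my_protein[start:stop])
```

Because the stop position is exclusive, the length of a slice is simply the stop position minus the start position.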

Type

my_protein[0:3]

Press Enter

my_protein[5:7]

Press Enter

We will demonstrate with a few more examples.

Type

my_protein[0:3]

Press Enter

my_protein[5:7]

Press Enter

Observe the output.
Slide 7

Summary

Let's summarize what we have learnt in this tutorial:
  • Concatenation (joining two strings).
  • Replacing a character in a string.
  • Finding the position of a character in a string.
  • Extracting a part of a string.


Assignment

Slide 8

As an assignment:


Find the position of ash in the given protein sequence.


aksdhashdaks


Extract the fragment hashd from the sequence.

Slide 9

About Spoken Tutorial Project


The video available at the following link summarizes the Spoken Tutorial project. Please watch it.
Slide 10

About Spoken Tutorial workshops

The Spoken Tutorial Project Team conducts workshops and gives certificates to those who pass an online test.

For more details, please write to us.

Slide 11

Acknowledgement

Spoken Tutorial Project is supported by the NMEICT, MHRD, Government of India.

More information on this Mission is available at this link.

The script is contributed by Snehalatha and Trupti Kini.

And this is Trupti Kini from IIT Bombay signing off. Thank you for joining.

Contributors and Content Editors

Trupti