<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="https://script.spoken-tutorial.org/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>https://script.spoken-tutorial.org/index.php?action=history&amp;feed=atom&amp;title=Biopython%2FC2%2FParsing-Data%2FEnglish</id>
		<title>Biopython/C2/Parsing-Data/English - Revision history</title>
		<link rel="self" type="application/atom+xml" href="https://script.spoken-tutorial.org/index.php?action=history&amp;feed=atom&amp;title=Biopython%2FC2%2FParsing-Data%2FEnglish"/>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php?title=Biopython/C2/Parsing-Data/English&amp;action=history"/>
		<updated>2026-05-06T06:28:05Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.17</generator>

	<entry>
		<id>https://script.spoken-tutorial.org/index.php?title=Biopython/C2/Parsing-Data/English&amp;diff=23443&amp;oldid=prev</id>
		<title>Snehalathak: Created page with &quot; {| style=&quot;border-spacing:0;&quot; ! &lt;center&gt;Visual Cue&lt;/center&gt; ! &lt;center&gt;Narration&lt;/center&gt;  |- | style=&quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;...&quot;</title>
		<link rel="alternate" type="text/html" href="https://script.spoken-tutorial.org/index.php?title=Biopython/C2/Parsing-Data/English&amp;diff=23443&amp;oldid=prev"/>
				<updated>2015-09-02T05:19:40Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with &amp;quot; {| style=&amp;quot;border-spacing:0;&amp;quot; ! &amp;lt;center&amp;gt;Visual Cue&amp;lt;/center&amp;gt; ! &amp;lt;center&amp;gt;Narration&amp;lt;/center&amp;gt;  |- | style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
{| style=&amp;quot;border-spacing:0;&amp;quot;&lt;br /&gt;
! &amp;lt;center&amp;gt;Visual Cue&amp;lt;/center&amp;gt;&lt;br /&gt;
! &amp;lt;center&amp;gt;Narration&amp;lt;/center&amp;gt;&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 1'''&lt;br /&gt;
&lt;br /&gt;
'''Title Slide'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Hello everyone.&lt;br /&gt;
&lt;br /&gt;
Welcome to this tutorial on '''Parsing Data.'''&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 2'''&lt;br /&gt;
&lt;br /&gt;
'''Learning Objectives'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| In this tutorial, we will learn to,&lt;br /&gt;
&lt;br /&gt;
* Download '''FASTA''' and '''GenBank''' files from '''NCBI''' database website.&amp;lt;br/&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* And '''Parse''' data files using functions in '''Sequence Input/Output''' module.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 3'''&lt;br /&gt;
&lt;br /&gt;
'''Pre-requisites'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| To follow this tutorial you should be familiar with,&lt;br /&gt;
&lt;br /&gt;
* Undergraduate Biochemistry or Bioinformatics&amp;lt;br/&amp;gt; &lt;br /&gt;
&lt;br /&gt;
* And basic '''Python''' programming &lt;br /&gt;
&lt;br /&gt;
Refer to the '''Python''' tutorials at the given link.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 4'''&lt;br /&gt;
&lt;br /&gt;
'''System Requirement'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| To record this tutorial I am using&lt;br /&gt;
&lt;br /&gt;
* '''Ubuntu''' OS version 14.10&lt;br /&gt;
* '''Python''' version 2.7.8&lt;br /&gt;
* '''IPython interpreter''' version 2.3.0&lt;br /&gt;
* '''Biopython''' 1.64&lt;br /&gt;
* And '''Mozilla Firefox '''browser 35.0&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 5'''&lt;br /&gt;
&lt;br /&gt;
'''Data files'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Scientific data in biology is generally stored in text files such as '''FASTA''', '''GenBank''', '''EMBL''', '''Swiss-Prot''' etc&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Data files can be downloaded from the database websites.&lt;br /&gt;
&lt;br /&gt;
Open the website link given below in any web browser.&lt;br /&gt;
&lt;br /&gt;
'''http://www.ncbi.nlm.nih.gov/nucleotide'''&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the web-page.&lt;br /&gt;
&lt;br /&gt;
Download FASTA files&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| A web-page opens.&lt;br /&gt;
&lt;br /&gt;
Let us download '''FASTA''' and '''GenBank''' files for the human '''insulin''' gene.&lt;br /&gt;
&lt;br /&gt;
In the search box, type '''human insulin'''.&lt;br /&gt;
&lt;br /&gt;
Click on the search button.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Scroll down the page.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on check box.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The web-page shows many files for the human insulin gene.&lt;br /&gt;
&lt;br /&gt;
For demonstration, I will select 4 files with the name “'''Homo sapiens Insulin mRNA'''”.&lt;br /&gt;
&lt;br /&gt;
I will choose files that have less than 500 base pairs.&lt;br /&gt;
&lt;br /&gt;
Click on the check box to select the file to download.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Bring the cursor to the “Send to” option. (Located on the top right hand corner.)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on the selection button. &lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Bring the cursor to the “'''Send to'''” option, located at the top right corner of the page.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Click on the small selection button with a down arrow present next to the “'''Send to'''” button.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Under the heading “Choose file destination” Click on “File” option.&lt;br /&gt;
&lt;br /&gt;
Click on “format” drop down list box.&lt;br /&gt;
&lt;br /&gt;
Choose “fasta” and click on “Create file” option. &lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Under the heading “'''Choose destination'''” Click on “'''File'''” option.&lt;br /&gt;
&lt;br /&gt;
You can save this file in any file format listed under “'''format'''” drop down list box.&lt;br /&gt;
&lt;br /&gt;
Choose “'''FASTA'''” from the given options.&lt;br /&gt;
&lt;br /&gt;
Then click on “'''Create file'''” option. &lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| A dialog box appears on screen. Click on “Save file” option.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| A dialog box appears on the screen. &lt;br /&gt;
&lt;br /&gt;
Select “'''Open with'''” and click on '''OK'''.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the text editor.&lt;br /&gt;
&lt;br /&gt;
Cursor on the first line.&lt;br /&gt;
&lt;br /&gt;
Scroll down.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The file opens in a text editor.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The file shows 4 records, since we had selected four files to download.&lt;br /&gt;
&lt;br /&gt;
The first line in each record is an '''identifier''' line.&lt;br /&gt;
&lt;br /&gt;
It starts with a “greater than” (&amp;gt;) symbol.&lt;br /&gt;
&lt;br /&gt;
This is followed by a '''sequence'''.&lt;br /&gt;
&lt;br /&gt;
Save the file in your home folder as “'''sequence.fasta'''”.&lt;br /&gt;
&lt;br /&gt;
Close the text editor.&lt;br /&gt;
&lt;br /&gt;
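The record layout described above can be sketched in plain '''Python'''. This is only an illustration of the '''FASTA''' layout, not how '''Biopython''' reads files; the function name read_fasta and the sample records are hypothetical.&lt;br /&gt;

```python
# Minimal pure-Python sketch of the FASTA layout (not Biopython's implementation).
# The sample records below are made up for illustration.
SAMPLE = """>record1 hypothetical example one
ATGGCC
CTGTGG
>record2 hypothetical example two
ATGCGT
"""


def read_fasta(text):
    """Return a list of (identifier, sequence) pairs from FASTA-formatted text."""
    records = []
    identifier = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new identifier line closes the previous record, if any.
            if identifier is not None:
                records.append((identifier, "".join(chunks)))
            parts = line[1:].split()
            identifier = parts[0] if parts else ""
            chunks = []
        else:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if identifier is not None:
        records.append((identifier, "".join(chunks)))
    return records


for identifier, sequence in read_fasta(SAMPLE):
    print(identifier, sequence, len(sequence))
```

Each record yields its identifier, its sequence and the sequence length.&lt;br /&gt;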
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the web-page.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Follow the same steps as above to download the same files in '''GenBank''' format.&lt;br /&gt;
&lt;br /&gt;
Select the file format as '''GenBank'''.&lt;br /&gt;
&lt;br /&gt;
Click on '''Create file''' and open the file with a text editor.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Scroll down.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Notice that the sequence file in '''GenBank''' format has more features than a '''FASTA''' file.&lt;br /&gt;
&lt;br /&gt;
Save the file as '''sequence.gb''' in your home folder.&lt;br /&gt;
&lt;br /&gt;
Close the text editor.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Click on the check boxes.&lt;br /&gt;
&lt;br /&gt;
Select '''Human insulin gene complete cds,''' &lt;br /&gt;
&lt;br /&gt;
click on the check box. &lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| For demonstration purposes, we need a '''FASTA''' file with a single record.&lt;br /&gt;
&lt;br /&gt;
For this, clear the earlier selection by again clicking on the check boxes.&lt;br /&gt;
&lt;br /&gt;
Now select the file “'''Human insulin gene complete cds'''”.&lt;br /&gt;
&lt;br /&gt;
Click on the check box.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Save the file as '''insulin.fasta.'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Follow the same steps shown earlier to save the file in the home folder.&lt;br /&gt;
&lt;br /&gt;
Save the file as '''insulin.fasta'''.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the text editor.&lt;br /&gt;
&lt;br /&gt;
Close the text editor.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Biological data stored in these files can be extracted and modified using '''Biopython''' libraries.&lt;br /&gt;
&lt;br /&gt;
Close the text editor.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 6'''&lt;br /&gt;
&lt;br /&gt;
'''Parsing'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Extracting data from data files is called '''Parsing'''.&lt;br /&gt;
&lt;br /&gt;
Most file formats can be parsed using functions available in the '''SeqIO''' module.&lt;br /&gt;
&lt;br /&gt;
The most commonly used functions of the '''SeqIO''' module are '''parse''', '''read''', '''write''' and '''convert'''.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide number 6'''&lt;br /&gt;
&lt;br /&gt;
Open the terminal using ctrl, alt and t keys.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Open the terminal by pressing '''ctrl''', '''alt''' and '''t''' keys simultaneously.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Type '''ipython''' at the prompt.&lt;br /&gt;
&lt;br /&gt;
$ ipython&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Start '''Ipython''' by typing '''ipython''' at the prompt.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Press enter.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the terminal.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Next, import the '''SeqIO''' module from the '''Bio''' package.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Type,&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; '''from Bio import SeqIO'''&lt;br /&gt;
&lt;br /&gt;
Press enter&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| At the prompt type,&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''from Bio import SeqIO'''&lt;br /&gt;
&lt;br /&gt;
Press enter&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| (Open the file in text editor and scroll down) &lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| We will start with the most important function “'''parse'''”.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For demonstration, I will use a '''FASTA''' file that has many records.&lt;br /&gt;
&lt;br /&gt;
Which we had downloaded earlier from the database.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Type,&lt;br /&gt;
&lt;br /&gt;
'''for seq_record in SeqIO.parse(&amp;quot;sequence.fasta&amp;quot;, &amp;quot;fasta&amp;quot;):'''&lt;br /&gt;
&lt;br /&gt;
'''print(seq_record.id)'''&lt;br /&gt;
&lt;br /&gt;
'''print(repr(seq_record.seq))'''&lt;br /&gt;
&lt;br /&gt;
'''print(len(seq_record))'''&lt;br /&gt;
&lt;br /&gt;
Highlight all the lines.&lt;br /&gt;
&lt;br /&gt;
Press enter key twice to get the output.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| For simple '''FASTA''' parsing, type the following at the prompt.&lt;br /&gt;
&lt;br /&gt;
Here we are using the '''parse''' function to read the contents of the '''sequence.fasta''' file.&lt;br /&gt;
&lt;br /&gt;
For the output, we print the '''record id''', the sequence present in the record and also the length of the sequence.&lt;br /&gt;
&lt;br /&gt;
Also notice that the '''parse''' function reads sequence data as '''Sequence record objects'''. &lt;br /&gt;
&lt;br /&gt;
It is generally used with a '''for''' loop.&lt;br /&gt;
&lt;br /&gt;
Here it takes two '''arguments'''; the first one is the name of the file to read the data from.&lt;br /&gt;
&lt;br /&gt;
The second specifies the file format.&lt;br /&gt;
&lt;br /&gt;
Press enter key twice to get the output.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Highlight the first line.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The output shows the '''identifier line''', followed by the sequence contained in the file.&lt;br /&gt;
&lt;br /&gt;
It also shows the length of the sequence for all the records in the file.&lt;br /&gt;
&lt;br /&gt;
Notice that the '''FASTA''' format does not specify the alphabet. &lt;br /&gt;
&lt;br /&gt;
So, the output does not specify it as a '''DNA sequence'''.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Type,&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
for seq_record in SeqIO.parse(&amp;quot;sequence.gb&amp;quot;, &amp;quot;genbank&amp;quot;):&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The same steps can be repeated for parsing '''GenBank''' file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For demonstration, we will use the '''GenBank''' file which we downloaded earlier from the database.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Press up arrow key to get the lines of code which we had used earlier.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Change the file name to '''sequence.gb '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Change the file format to '''genbank.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The rest of the code remains the same.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| from Bio import SeqIO &lt;br /&gt;
&lt;br /&gt;
for seq_record in SeqIO.parse(&amp;quot;sequence.gb&amp;quot;, &amp;quot;genbank&amp;quot;): &lt;br /&gt;
&lt;br /&gt;
print(seq_record.id) &lt;br /&gt;
&lt;br /&gt;
print(repr(seq_record.seq)) &lt;br /&gt;
&lt;br /&gt;
'''print(len(seq_record))'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Press enter key twice to get the output.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here too the output shows the '''record id''', sequence and the length of the sequence for all the records in the file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Notice that the '''GenBank''' format specifies the sequence as DNA sequence. &lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the terminal.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Similarly, '''Swiss-Prot''' and '''EMBL''' files can be parsed using the same code as above.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the terminal.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| If your file contains a single record then type the following lines for '''parsing'''.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Type,'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''&amp;gt;&amp;gt;&amp;gt; from Bio import SeqIO '''&lt;br /&gt;
&lt;br /&gt;
'''&amp;gt;&amp;gt;&amp;gt; record = SeqIO.read(&amp;quot;insulin.fasta&amp;quot;, &amp;quot;fasta&amp;quot;) '''&lt;br /&gt;
&lt;br /&gt;
'''&amp;gt;&amp;gt;&amp;gt; record'''&lt;br /&gt;
&lt;br /&gt;
'''Press enter '''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Here we will use the previously saved '''FASTA''' file with a single record, that is '''insulin.fasta''', as an example.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Notice that we have used the '''read''' function instead of the '''parse''' function.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Press enter. &lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the terminal.&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The output shows the contents of the file '''insulin.fasta'''.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It shows the sequence as a sequence record object.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It also shows other attributes such as '''GI''', '''accession number''' and '''description'''.&lt;br /&gt;
&lt;br /&gt;
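The difference between '''read''' and '''parse''' can be sketched in plain '''Python'''. '''SeqIO.read''' expects exactly one record and raises a ValueError otherwise. The helper below mimics only that check on an already loaded list; the name read_single and the sample record are hypothetical.&lt;br /&gt;

```python
def read_single(records):
    """Return the only record in the list, or raise ValueError otherwise."""
    if len(records) != 1:
        raise ValueError("expected exactly one record, found %d" % len(records))
    return records[0]


# A made-up single record, stored as an (identifier, sequence) pair.
single = read_single([("hypothetical_id", "ATGC")])
print(single)
```

For a file with many records, such as '''sequence.fasta''', the '''parse''' function with a '''for''' loop is the better fit.&lt;br /&gt;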
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| At the prompt type,&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; record.seq &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
press enter&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| We can also view the individual attributes of this record as follows.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
At the prompt type, '''record dot seq '''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
press enter&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| Cursor on the terminal'''.'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The output shows the sequence present in the file.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| type,&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; record.id&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
press enter&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| To view the identifiers for this record, type, '''record dot id.'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
press enter&lt;br /&gt;
&lt;br /&gt;
The output shows the GI number and accession number etc.&lt;br /&gt;
&lt;br /&gt;
You can use the functions described above to parse the data files of your choice.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 9'''&lt;br /&gt;
&lt;br /&gt;
Summary&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Now let's summarize.&lt;br /&gt;
&lt;br /&gt;
In this tutorial we have learnt to,&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download '''FASTA''' and '''GenBank''' files from '''NCBI''' database website. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And use the '''parse''' and '''read''' functions from the '''SeqIO''' module to extract data such as record ids, descriptions and sequences from '''FASTA''' and '''GenBank''' files.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 10'''&lt;br /&gt;
&lt;br /&gt;
'''Assignment'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Now for the assignment,&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download '''FASTA''' files for nucleotide sequences of your choice from the '''NCBI''' database.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Convert the sequences in the file to their reverse complements.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Type at the prompt,'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''&amp;gt;&amp;gt;&amp;gt; from Bio import SeqIO '''&lt;br /&gt;
&lt;br /&gt;
'''&amp;gt;&amp;gt;&amp;gt; for record in SeqIO.parse(&amp;quot;sequence.fasta&amp;quot;, &amp;quot;fasta&amp;quot;): '''&lt;br /&gt;
&lt;br /&gt;
'''...     print(record.id) '''&lt;br /&gt;
&lt;br /&gt;
'''...     print(record.seq.reverse_complement())'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| Your completed assignment should have the following lines of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Use '''parse''' function to load nucleotide sequences from the '''FASTA''' file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Next, print the reverse complements using the '''Seq''' object’s built-in '''reverse_complement''' method.&lt;br /&gt;
&lt;br /&gt;
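To check the assignment output by hand, the reverse complement can be sketched in plain '''Python'''. This is a stand-in for illustration, not the '''Biopython''' method; it assumes an unambiguous DNA string containing only A, T, G and C, and the sample sequence is made up.&lt;br /&gt;

```python
# Base-pairing table for unambiguous DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the reverse complement of an unambiguous DNA string."""
    # Read the sequence backwards and swap each base for its partner.
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


print(reverse_complement("ATGGCC"))  # prints GGCCAT
```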
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 11'''&lt;br /&gt;
&lt;br /&gt;
'''Acknowledgement '''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The video at the following link summarizes the Spoken Tutorial project.&lt;br /&gt;
&lt;br /&gt;
Please download and watch it.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide Number 12'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The Spoken Tutorial Project Team conducts workshops and gives certificates to those who pass an online test. &lt;br /&gt;
&lt;br /&gt;
For more details, please write to us.&lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| '''Slide number 13'''&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.&lt;br /&gt;
&lt;br /&gt;
More information on this Mission is available at the link shown. &lt;br /&gt;
&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:none;padding:0.097cm;&amp;quot;| &lt;br /&gt;
| style=&amp;quot;background-color:#ffffff;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.097cm;&amp;quot;| This is Snehalatha from IIT Bombay signing off. Thank you for joining. &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Snehalathak</name></author>	</entry>

	</feed>