Difference between revisions of "Biopython/C2/Parsing-Data/Khasi"

(Created page with "{| Border=1 ! <center>Time</center> ! <center>Narration</center> |- | 00:01 | Hello everyone.Welcome to this tutorial on '''Parsing Data.''' (Khublei baroh. Ngi pdiang sngewb...")
 
 
Line 5: Line 5:
 
|-
 
|-
 
| 00:01
 
| 00:01
| Hello everyone. Welcome to this tutorial on '''Parsing Data.'''
+
| Khublei baroh. Ngi pdiang sngewbha ia phi sha kane ka tutorial halor ka '''Parsing Data.'''
(Khublei baroh. Ngi pdiang sngewbha ia phi sha kane ka tutorial shaphang ka '''Parsing Data.'''
+
  
 
|-
 
|-
 
| 00:06
 
| 00:06
| In this tutorial, we will learn to download '''FASTA''' and '''GenBank''' files from '''NCBI''' database website.
+
| Ha kane ka tutorial, ngin pule kumno ban download ia ki '''FASTA''' bad '''GenBank''' files na ka '''NCBI''' database website.  
(Ha kane ka tutorial, ngin sa nang  kumno ban download ia ki '''FASTA''' bad '''GenBank''' files na ka database website.  
+
  
 
|-
 
|-
 
| 00:14
 
| 00:14
| And, '''Parse''' data files using '''function'''s in '''Sequence Input/Output''' module.
+
| Bad ban '''Parse''' ia ki data files da kaba pyndonkam ia ki '''function'''s  ha ka  '''Sequence Input/Output''' module.
(Bad ban '''Parse''' ia ki data files da kaba pyndonkam ia ki '''function'''s  ha ka  '''Sequence Input/Output''' module.
+
  
 
|-
 
|-
 
| 00:19
 
| 00:19
| To follow this tutorial, you should be familiar with undergraduate biochemistry or bioinformatics
+
| Ban bud ia kane ka tutorial, phi dei ban tip bha ia ka undergraduate biochemistry lane bioinformatics.  
(Ban sngewthuh bha  ia kane ka tutorial, phi dei ban long kiba shemphang ha  ka undergraduate biochemistry lane bioinformatics.  
+
  
 
|-
 
|-
 
| 00:26
 
| 00:26
| and basic '''Python''' programming.
+
| bad ka basic '''Python''' programming.
(bad ka basic '''Python''' programming.
+
  
 
|-
 
|-
 
| 00:30
 
| 00:30
| Refer to the '''Python''' tutorials at the given link.
+
| Peit ia ka '''Python''' tutorials ha ka link ba la ai.
(Peit ia ka '''Python''' tutorials na  ka link ba ai hapoh.  
+
  
 
|-
 
|-
 
| 00:34
 
| 00:34
| To record this tutorial, I am using: * '''Ubuntu OS''' version 14.10
+
| Ban record ia kane ka tutorial, nga pyndonkam da ka : * '''Ubuntu OS''' version 14.10
(Ban record ia kane ka tutorial, nga pyndonkam da ka : * '''Ubuntu OS''' version 14.10
+
  
 
|-
 
|-
 
| 00:40
 
| 00:40
 
| '''Python''' version 2.7.8
 
| '''Python''' version 2.7.8
('''Python''' version 2.7.8
 
  
 
|-
 
|-
 
| 00:44
 
| 00:44
| '''IPython interpreter''' version 2.3.0
+
| '''IPython interpreter''' version 2.3.0
('''IPython interpreter''' version 2.3.0
+
  
 
|-
 
|-
 
| 00:48
 
| 00:48
|'''Biopython''' version 1.64 and * '''Mozilla Firefox '''browser 35.0.
+
| '''Biopython''' version 1.64 bad * '''Mozilla Firefox '''browser 35.0.
('''Biopython''' version 1.64 bad * '''Mozilla Firefox '''browser 35.0.
+
  
 
|-
 
|-
 
| 00:56
 
| 00:56
| Scientific data in biology is generally stored in text files such as '''FASTA''', '''GenBank''', '''EMBL''', '''Swiss-Prot''' etc.
+
| Ki scientific data jong ka biology ju store barabor ia ki ha ka text file kum '''FASTA''', '''GenBank''', '''EMBL''', '''Swiss-Prot''' kumta ter ter.
(Ki scientific data jong ka biology ju store barabor ia ki ha ka text file kum '''FASTA''', '''GenBank''', '''EMBL''', '''Swiss-Prot''' kumta ter ter.
+
  
 
|-
 
|-
 
| 01:07
 
| 01:07
|Data files can be downloaded from the database websites.
+
| Ia ki data files lah ban download na ka database websites.
(Ia ki data files lah ban download na ka database websites.
+
  
 
|-
 
|-
 
| 01:12
 
| 01:12
|Open the website link given below, in any web browser.
+
| Plie ia ka website link ba lah ai harum, da uno uno u web browser.
(Plie ia ka website link ba lah ai harum, da uno uno u web browser.
+
  
 
|-
 
|-
 
| 01:17
 
| 01:17
| A web-page opens.
+
| Ka web-page ka plie.
(Ka web-page kan sa  plie.
+
  
 
|-
 
|-
 
| 01:19
 
| 01:19
|Let us download '''FASTA''' and '''GenBank''' files for human '''insulin gene'''.
+
| To ngin download iaka '''FASTA''' bad '''GenBank''' files na ka bynta ka human '''insulin gene'''.
(To ngin ia download '''FASTA''' bad '''GenBank''' files na ka bynta ka human '''insulin gene'''.
+
  
 
|-
 
|-
 
| 01:25
 
| 01:25
|In the search box, type: "human insulin", click on '''Search''' button.
+
| Ha ka search box, type: "human insulin", click ha '''Search''' button.
(Ha ka search box, type: "human insulin", click ha '''Search''' button.
+
  
 
|-
 
|-
 
| 01:31
 
| 01:31
| The web-page shows many files for human '''insulin gene'''.
+
| Ka web-page ka pyni shibun tylli ki files na ka bynta ka human '''insulin gene'''.
(Ka web-page kan pyni bun  ki files na ka bynta ka human '''insulin gene'''.
+
  
 
|-
 
|-
 
| 01:35
 
| 01:35
|For demonstration, I will select 4 files with the name “Homo sapiens Insulin mRNA”.
+
| Ban pyni nuksa, ngan jied 4 tylli ki files kiba kyrteng “Homo sapiens Insulin mRNA”.
(Ban pyni nuksa, ngan jied 4 tylli ki files kiba kyrteng “Homo sapiens Insulin mRNA”.
+
  
 
|-
 
|-
 
| 01:43
 
| 01:43
|I will choose files that have less than 500 '''base''' pairs.
+
| Ngan jied ia ki files ba duna ia ka 500 '''base''' pairs.
(Ngan jied ia ki files ba duna ia ka 500 '''base''' pairs.
+
  
 
|-
 
|-
 
| 01:48
 
| 01:48
|Click on the check-box to select the file, to download.
+
| Click ha ka check-box ban jied ia ka file ban download.  
(Click ha ka check-box ban select  ia ka file ban download.  
+
  
 
|-
 
|-
 
| 01:56
 
| 01:56
| Bring the cursor to the “'''Send to'''” option, located at the top right corner of the page.
+
| Wanrah ia u cursor sha “'''Send to'''” option, kaba don ha kyndong khlieh ka mon jong ka page.
(Wanrah ia u cursor sha “'''Send to'''” option, kaba don ha jrong duh sha ka liang  ka mon jong ka page.
+
  
 
|-
 
|-
 
| 02:02
 
| 02:02
|Click on the small selection button with a down arrow, present next to the “'''Send to'''” button.
+
| Click ha i selection button barit ba don u khnam ba kdew shapoh, ba don hajan ka “'''Send to'''” button.
(Click ha i selection button barit ba don u khnam ba kdew shapoh, ba don hajan ka “'''Send to'''” button.
+
  
 
|-
 
|-
 
| 02:09
 
| 02:09
| Under the heading “'''Choose destination'''”, click on '''File''' option.
+
| Hapoh ka heading “'''Choose destination'''”, click ha '''File''' option.
(Hapoh ka heading “'''Choose destination'''”, click ha '''File''' option.
+
  
 
|-
 
|-
 
| 02:13
 
| 02:13
|You can '''save''' this file in any file format, listed under '''format''' drop-down list box.
+
| Phi lah ban '''save''' ia kane ka file ha kano kano ka format, kiba don hapoh '''format''' drop-down list box.
(Phi lah ban '''save''' ia kane ka file ha kano kano ka format, kiba don hapoh '''format''' drop-down list box.
+
  
 
|-
 
|-
 
| 02:21
 
| 02:21
|Choose '''FASTA''' from the given options.
+
| Jied '''FASTA''' na ki options ba ai hapoh.
(Jied '''FASTA''' na ki options ba ai hapoh.
+
 
 
|-
 
|-
 
| 02:25
 
| 02:25
|Then click on '''Create file''' option.
+
| Nangta sa click ha '''Create file''' option.
(Nangta sa click ha '''Create file''' option.
+
  
 
|-
 
|-
 
| 02:29
 
| 02:29
| A dialog-box appears on the screen.
+
| Ka dialog-box kan sa mih ha ka screen.  
(Ka dialog-box kan sa mih ha ka screen.  
+
  
 
|-
 
|-
 
|02:32  
 
|02:32  
|Select '''Open with''', click on '''OK.'''
+
| Jied '''Open with''', click ha '''OK.'''
(Jied '''Open with''', click ha '''OK.'''
+
  
 
|-
 
|-
 
| 02:36
 
| 02:36
| A file opens in a '''text editor'''.
+
| Ka file ka plie ha ka '''text editor'''.
(Ka file kan sa mih  ha ka '''text editor'''.
+
  
 
|-
 
|-
 
| 02:39
 
| 02:39
|The file shows 4 records, since we had selected four files to download.
+
| Kane ka file ka pyni 4 tylli ki records, namar ngi la jied 4 tylli ki files ban download.  
(Kane ka file ka pyni 4 tylli ki records, namar ngi la jied ban plie  4 tylli ki files ban download.  
+
  
 
|-
 
|-
 
| 02:46
 
| 02:46
|The first line in each record is an '''identifier''' line.
+
| U line banyngkong ha kawei pa kawei ka record u dei u '''identifier''' line.
(U line banyngkong ha kawei pa kawei ka record u dei u '''identifier''' line.
+
  
 
|-
 
|-
 
| 02:50
 
| 02:50
|It starts with a “greater than (>)” symbol.
+
| U sdang da u “greater than (>)” symbol.
(Un  sdang da u “greater than (>)” symbol.
+
  
 
|-
 
|-
 
| 02:53
 
| 02:53
|This is followed by a '''sequence'''.
+
| Nangta la pynbud da u '''sequence'''.
(Nangta sa bud sa  da u '''sequence'''.
+
  
 
|-
 
|-
 
| 02:56
 
| 02:56
|'''Save''' the file in your '''home''' folder as “sequence.fasta”.
+
| '''Save''' ia ka file ha '''home''' folder jong phi kum ka “sequence.fasta”.
('''Save''' ia ka file ha '''home''' folder kum “sequence.fasta”.
+
  
 
|-
 
|-
 
| 03:01
 
| 03:01
|Close the text editor.
+
| Khang ia ka text editor.  
(Khang ia ka text editor.  
+
  
 
|-
 
|-
 
| 03:03
 
| 03:03
| Follow the same steps as above, to download the files in '''GenBank''' format
+
| Pynbud ki juh ki syn jam kum haneng ban download ia ki files ha '''GenBank''' format
(Leh kumjuh ka rukom  kum haneng ban download ia ki files ha '''GenBank''' format
+
 
+
  
 
|-
 
|-
 
| 03:08
 
| 03:08
|for the same files selected earlier.
+
| na ka bynta ki files ba la jied hashwa.  
(na ka bynta ki files ba jied nyngkong .  
+
  
 
|-
 
|-
 
| 03:12
 
| 03:12
|Select the '''file format''' as '''GenBank.'''
+
| Jied ia ka '''file format''' kum '''GenBank.'''
(Select  ia ka '''file format''' kum '''GenBank.'''
+
  
 
|-
 
|-
 
| 03:16
 
| 03:16
|Create a file. Open with a text editor.
+
| Shna ia ka file. Plie da u text editor.  
(Shna ia ka file. Plie da u text editor.  
+
  
 
|-
 
|-
 
| 03:21
 
| 03:21
| Notice that the sequence file in '''GenBank''' format has more features than a '''FASTA''' file.
+
| Peit thuh ba ka sequence file ha '''GenBank''' format ka kham bun features ban ia ka '''FASTA''' file.
(Ha khmih  ba ka sequence file ha '''GenBank''' format ka kham bun features ban ia ka '''FASTA''' file.
+
  
 
|-
 
|-
 
| 03:27
 
| 03:27
|'''Save''' the file as "sequence.gb" in your '''home''' folder. Close the text editor.
+
| '''Save''' ia ka file kum "sequence.gb" ha ka '''home''' folder. Khang noh u text editor.
('''Save''' ia ka file kum "sequence.gb" ha ka '''home''' folder. Khang noh u text editor.
+
  
 
|-
 
|-
 
| 03:34
 
| 03:34
| For demonstration purpose, we need a FASTA file with a single '''record'''.
+
| Ban peit nuksa, ngi donkam ia ka FASTA file ba don '''record''' tang iwei.  
(Ban peit nuksa, ngi donkam ia ka FASTA file ba don '''record''' tang iwei.  
+
  
 
|-
 
|-
 
| 03:39
 
| 03:39
|For this, clear the earlier selection by again clicking on the check boxes.
+
| Na ka bynta kane, pynkhuid ia ki jingjied ba hashwa da kaba click biang ha ki check box.
(Na ka bynta kane, pynkhuid ia ki selection ba nyngkong  da kaba click biang ha ki check boxes.
+
  
 
|-
 
|-
 
| 03:48
 
| 03:48
|Now, select the file “'''Human insulin gene complete cds'''”.
+
| Mynta, jied ia ka file “'''Human insulin gene complete cds'''”.
(Mynta, select  ia ka file “'''Human insulin gene complete cds'''”.
+
 
+
  
 
|-
 
|-
 
| 03:54
 
| 03:54
|Click on the check-box.
+
| Click ha ka check-box.
(Click ha ka check-box.
+
  
 
|-
 
|-
 
| 03:57
 
| 03:57
| And follow the same steps shown earlier to '''save''' the file in the '''home''' folder.
+
| Bad sa bud ia ki rukom kumba la pyni mynne ban '''save''' ia ka file ha ka '''home''' folder.
(Bad sa bud ia ki rukom kumba la pyni nyngkong  ban '''save''' ia ka file ha ka '''home''' folder.
+
  
 
|-
 
|-
 
| 04:01
 
| 04:01
|'''Save''' the file as "insulin.fasta".
+
| '''Save''' ia ka file kum "insulin.fasta".
('''Save''' ia ka file kum "insulin.fasta".
+
  
 
|-
 
|-
 
| 04:08
 
| 04:08
| Biological data stored in these files can be extracted and modified using '''Biopython''' libraries.
+
| Ia ki Biological data ba lah store ha kine ki files lah ban sei bad pynkylla da kaba pyndonkam da ka '''Biopython''' libraries.  
(Ki  Biological data ba lah store ha kine ki files lah ban sei bad pynkylla da kaba pyndonkam da ka '''Biopython''' libraries.  
+
  
 
|-
 
|-
 
| 04:16
 
| 04:16
|Close the text-editor.
+
| Khang ia u text-editor.  
(Khang ia u text-editor.  
+
  
 
|-
 
|-
 
| 04:19
 
| 04:19
| Extracting data from data files is called '''parsing'''.
+
| Ban sei ia ki data na data files la khot '''Parsing'''.
(Ban sei ia ki data na data files ki ju  khot '''Parsing'''.
+
  
 
|-
 
|-
 
| 04:23
 
| 04:23
|Most file formats can be parsed using '''function'''s available in '''SeqIO''' module.
+
| Jan ia baroh ki file formats lah ban parsed da kaba pyndonkam '''function'''s kiba don ha '''SeqIO''' module.
(Bun ia kum kine jait  file formats lah ban parsed da kaba pyndonkam '''function'''s kiba don ha '''SeqIO''' module.
+
  
 
|-
 
|-
 
| 04:30
 
| 04:30
|Most commonly used functions of '''SeqIO''' module are: '''parse, read, write''' and '''convert'''.
+
| Ki function jong ka '''SeqIO''' module kiba ju kham pyndonkam bha baroh ki long: '''parse, read, write''' bad '''convert'''.
(Kiba kham paw ki functions  jong ka '''SeqIO''' module dei  : '''parse, read, write''' bad '''convert'''.
+
  
 
|-
 
|-
 
| 04:38
 
| 04:38
| Open the terminal by pressing '''Ctrl, Alt''' and '''t''' keys simultaneously.
+
| Plie ia ka terminal da kaba nion lang ia '''Ctrl, Alt''' bad '''t''' keys.  
(Plie ia ka terminal da kaba nion sah  ia '''Ctrl, Alt''' bad '''t''' keys.  
+
  
 
|-
 
|-
 
| 04:44
 
| 04:44
| Start '''Ipython''' by typing "ipython" at the prompt. Press '''Enter'''.
+
| Plie ia ka '''Ipython''' da kaba type "ipython" ha ka prompt. Nion '''Enter'''.
(Plie ia ka '''Ipython''' da kaba type "ipython" ha ka prompt. Nion '''Enter'''.
+
  
 
|-
 
|-
 
| 04:51
 
| 04:51
| Next, '''import''' "SeqIO" module from '''Bio''' package.
+
| Nangta, '''import''' "SeqIO" module na '''Bio''' package.
(Nangta, '''import''' "SeqIO" module na '''Bio''' package.
+
  
 
|-
 
|-
 
| 04:56
 
| 04:56
| At the prompt, type: '''from Bio import SeqIO'''. Press '''Enter'''.
+
| Ha ka prompt, type: '''from Bio import SeqIO'''. Nion '''Enter'''.
(Ha ka prompt, type: '''from Bio import SeqIO'''. Nion '''Enter'''.
+
  
 
|-
 
|-
 
| 05:04
 
| 05:04
| We will start with the most important function “'''parse'''”.
+
| Ngin ia sdang da ka function kaba kham donkam ka “'''parse'''”.
(Ngin ia sdang da ka function kaba kham donkam ka “'''parse'''”.
+
  
 
|-
 
|-
 
|05:07
 
|05:07
|For demonstration, I will use a '''FASTA''' file that has many '''record'''s which we had downloaded earlier from the database.
+
| Na ka bynta ka nuksa, ngan pyndonkam ka '''FASTA''' file ka ba don bun '''record'''s kaba ngi download hashwa na ka database.  
(Na ka bynta ka nuksa, ngan pyndonkam ka '''FASTA''' file ka ba don bun '''record'''s kaba ngi download shen  na ka database.  
+
  
 
|-
 
|-
 
| 05:17
 
| 05:17
| For simple '''FASTA''' parsing, type the following at the prompt.
+
| Na ba bynta ka '''FASTA''' parsing ba kham suk, type kumne harum ha ka prompt.  
(Na ba bynta ka '''FASTA''' parsing ba kham suk, type kumne harum ha ka prompt.  
+
  
 
|-
 
|-
 
| 05:22
 
| 05:22
|Here, we are using the '''parse''' function to read the contents of the '''sequence.fasta''' file.
+
| Hangne, ngi pyndonkam ia ka '''parse''' function ban read ia ki contents jong ka '''sequence.fasta''' file.
(Hangne, ngi pyndonkam ia ka '''parse''' function ban read ia ki contents jong ka '''sequence.fasta''' file.
+
  
 
|-
 
|-
 
| 05:30
 
| 05:30
|For the output, print '''record id''', sequence present in the record and also the length of the sequence.
+
| Na ka bynta ka output, print '''record id''', sequence kaba don ha ka record bad ruh ia ka jingjrong jong ka sequence.  
(Na ka bynta ka output, print '''record id''', sequence kaba don ha ka record bad ruh ia ka jingjrong jong ka sequence.  
+
  
 
|-
 
|-
 
| 05:41
 
| 05:41
|Also notice that the '''parse''' function is used to read sequence data as '''Sequence record objects'''.
+
| Peit bha ruh ba ia ka '''parse''' function la pyndonkam ban read sequence data kum '''Sequence record objects'''.
(Peit bha ruh ba ia ka '''parse''' function shait  pyndonkam ban read sequence data kum '''Sequence record objects'''.
+
  
 
|-
 
|-
 
| 05:48
 
| 05:48
|It is generally used with a '''for''' loop.
+
| La ju pyndonkam barabor bad ka '''for''' loop.  
(Shait  pyndonkam barabor bad ka '''for''' loop.  
+
  
 
|-
 
|-
 
| 05:52
 
| 05:52
|It can accept two '''arguments''', the first one is the file name to read the data.
+
| Ka pdiang ar tylli ki '''arguments''', kaba nyngkong dei ka file name ban read ia ka data.  
(Ka lah ban shim  ar tylli ki '''arguments''', kaba nyngkong dei ka file name ban read ia ka data.  
+
  
 
|-
 
|-
 
| 05:59
 
| 05:59
|The second specifies the file format.
+
| Ka ba ar ka batai shai ia ka format jong ka file.  
(Ka ba ar pat ka pyntikna  ia ka format jong ka file.  
+
  
 
|-
 
|-
 
| 06:02
 
| 06:02
|Press '''Enter''' key twice to get the output.
+
| Nion ia u '''Enter''' key ar sien ban ioh ia ka output.  
(Nion ia u '''Enter''' key ar sien ban ioh ia ka output.  
+
  
 
|-
 
|-
 
| 06:07
 
| 06:07
| The output shows the ''' identifier line, '''followed by the sequence contained in the file, also the length of the sequence for all the records in the file.
+
| Ka output ka pyni ia u '''identifier line''', nangta bud sa u sequence uba don ha ka file, bad ruh ka jingjrong jong ka sequence na ka bynta baroh ki records ha ka file.  
(Ka output ka pyni ia u '''identifier line''', nangta bud sa u sequence uba don ha ka file, bad ruh ka jingjrong jong ka sequence na ka bynta baroh ki records ha ka file.  
+
  
 
|-
 
|-
 
| 06:21
 
| 06:21
|Notice that the '''FASTA''' format does not specify the alphabet.
+
| Peit thuh ba ka '''FASTA''' format kam batai shai ia u alphabet.  
(Phin shem  ba ka '''FASTA''' format kan nym pyntikna  ia u alphabet.  
+
  
 
|-
 
|-
 
| 06:26
 
| 06:26
|So, the output does not specify it as a '''DNA sequence'''.
+
| Te, ka output kan nym pyni ia ka kum ka '''DNA sequence'''.
(Te, ka output kan nym pyni ia ka kum ka '''DNA sequence'''.
+
  
 
|-
 
|-
 
| 06:31
 
| 06:31
| The same steps can be repeated for parsing '''GenBank''' file.
+
| Ki juh ki synjam lah ban leh biang ban parse iaka '''GenBank''' file.
(Ki juh ki rukom lah ban bud  ban parsing  iaka '''GenBank''' file.
+
  
 
|-
 
|-
 
| 06:36
 
| 06:36
|For demonstration, we will use the '''GenBank''' file which we downloaded earlier from the database.
+
| Ban pyni nuksa ngin pyndonkam ia ka '''GenBank''' file kaba ngi lah download mynne na ka database.  
(Ban pyni nuksa ngin pyndonkam ia ka '''GenBank''' file kaba ngi lah download mynne na ka database.  
+
  
 
|-
 
|-
 
| 06:43
 
| 06:43
|Press up-arrow key to get the lines of code which we had used earlier.
+
| Nion ia u up-arrow key ban ioh ia ki lines jong ki code kiba ngi lah pyndonkam hashwa.
(Nion ia u up-arrow key ban ioh ia ki lines jong ki code kiba ngi lah pyndonkam nyngkong .
+
  
 
|-
 
|-
 
| 06:49
 
| 06:49
|Change the file name to '''sequence.gb '''.
+
| Pynkylla ia ka kyrteng jong ka file sha ka '''sequence.gb '''.
(Pynkylla ia ka kyrteng jong ka file sha ka '''sequence.gb '''.
+
  
 
|-
 
|-
 
| 06:53
 
| 06:53
|Change the file format to '''genbank.'''
+
| Pynkylla ia ka file format sha ka '''genbank.'''
(Pynkylla ia ka file format sha ka '''genbank.'''
+
  
 
|-
 
|-
 
| 06:56
 
| 06:56
|The rest of the code remains same.
+
| Ki code ba sah kin neh kumjuh.  
(Ki code ba sah kin neh kumjuh.  
+
  
 
|-
 
|-
 
| 06:58
 
| 06:58
| Press '''Enter''' key twice to get the output.
+
| Nion ia u '''Enter''' key arsien ban ioh ia ka output.  
(Nion ia u '''Enter''' key arsien ban ioh ia ka output.  
+
  
 
|-
 
|-
 
| 07:03
 
| 07:03
|Here too the output shows the '''record id''', '''sequence''' and the length of the sequence for all the records in the file.
+
| Hangne ruh ka output ka pyni ia ka '''record id''', '''sequence''' bad ka jingjrong jong ka sequence na ka bynta baroh ki records ha ka file.  
(Hangne ruh ka output ka pyni ia ka '''record id''', '''sequence''' bad ka jingjrong jong ka sequence na ka bynta baroh ki records ha ka file.  
+
  
 
|-
 
|-
 
| 07:12
 
| 07:12
|Notice that the '''GenBank''' format specifies the sequence as DNA sequence.
+
| Peit thuh ba ka '''GenBank''' format ka pyntikna ia ka sequence kum ka DNA sequence.
(Sa khmih  ba ka '''GenBank''' format ka pyntikna ia ka sequence kum ka DNA sequence.
+
  
 
|-
 
|-
 
| 07:19
 
| 07:19
| Similarly, '''Swiss-prot''' and '''EMBL''' files can be parsed using the same code as above.
+
| Kumjuh ruh, ia ki '''Swiss-prot''' bad ki file '''EMBL''' lah ban parse da kaba pyndonkam ia u juh u code kum haneng.  
(Kumjuh ruh, ia ki '''Swiss-prot''' bad '''EMBL''' files  lah ban parse da kaba pyndonkam ia u juh u code kum haneng.  
+
 
+
  
 
|-
 
|-
 
| 07:27
 
| 07:27
| If your file contains a single record then type the following lines for '''parsing'''.
+
| Lada ka file jong phi ka don uwei u record phi hap type ia ki line harum ban leh '''parsing'''.  
(Lada ka file jong phi ka don uwei u record phi hap type ia ki line harum ban leh '''parsing'''.  
+
  
 
|-
 
|-
 
| 07:34
 
| 07:34
| Here, we will use the previously saved '''FASTA''' file with a single record, that is, '''insulin.fasta '''as an example.
+
| Hangne, ngin pyndonkam iaka  '''FASTA''' file ba lah save mynne, ba tang uwei u record, uta u dei '''insulin.fasta ''' kum ka nuksa.  
(Hangne, ngin pyndonkam ia '''FASTA''' file ba lah save mynne, bad  uwei u record, uta u dei '''insulin.fasta ''' kum ka nuksa.  
+
  
 
|-
 
|-
 
| 07:43
 
| 07:43
|Notice that we have used '''read''' function instead of '''parse''' function. Press '''Enter'''.
+
| Peit thuh ba ngi lah dep pyndonkam ia ka '''read''' function ha jaka jong '''parse''' function. Nion '''Enter'''.
(Phin sa shem  ba ngi lah dep pyndonkam ia ka '''read''' function ha jaka jong '''parse''' function. Nion '''Enter'''.
+
  
 
|-
 
|-
 
| 07:50
 
| 07:50
| The output shows the contents for the file '''insulin.fasta'''.
+
| Ka output ka pyni ki contents na ka bynta ka file '''insulin.fasta'''.
(Ka output ka pyni ki contents na ka bynta ka file '''insulin.fasta'''.
+
  
 
|-
 
|-
 
| 07:55
 
| 07:55
|It shows the sequence as '''sequence record object'''.
+
| Ka pyni ia ka sequence kum '''sequence record object'''.
(Ka pyni ia ka sequence kum '''sequence record object'''.
+
  
 
|-
 
|-
 
| 07:59
 
| 07:59
|And other attributes such as '''GI, accession number '''and '''description'''.
+
| Bad kiwei ki jinglong kum '''GI, accession number '''bad '''description'''.
(Bad kiwei ki attributes  kum '''GI, accession number '''bad '''description'''.
+
  
 
|-
 
|-
 
| 08:06
 
| 08:06
| We can also view the individual attributes of this record as follows.
+
| Ngi lah ruh ban peit ia ki jinglong ba shimet jong kane ka record kumne harum.
(Ngi lah ruh ban peit ia ki individual attributes  jong kane ka record kumne harum.
+
  
 
|-
 
|-
 
| 08:11
 
| 08:11
|At the prompt, type: '''record dot seq'''. Press '''Enter'''.
+
| Ha ka prompt, type: '''record dot seq'''. Nion '''Enter'''.
(Ha ka prompt, type: '''record dot seq'''. Nion '''Enter'''.
+
  
 
|-
 
|-
 
| 08:18
 
| 08:18
| The output shows the sequence present in the file.
+
| Ka output ka pyni ia ka sequence ba don ha ka file.  
(Ka output ka pyni ia ka sequence ba don ha ka file.  
+
  
 
|-
 
|-
 
| 08:22
 
| 08:22
| To view the identifiers for this record, type: '''record dot id.''' Press '''Enter'''.
+
| Ban peit ia ki identifiers jong kane ka record, type: '''record dot id.''' Nion '''Enter'''.
(Ban peit ia ki identifiers jong kane ka record, type: '''record dot id.''' Nion '''Enter'''.
+
  
 
|-
 
|-
 
| 08:29
 
| 08:29
|The output shows the '''GI''' number and accession number etc.
+
| Ka output ka pyni ia u '''GI''' number bad accession number bad kumta ter ter.  
(Ka output ka pyni ia u '''GI''' number bad accession number bad kumta ter ter.  
+
  
 
|-
 
|-
 
| 08:34
 
| 08:34
|You can use the function described above to '''parse''' the data files of your choice.
+
| Phi lah ban pyndonkam ia u function ba lah ong haneng ban '''parse''' ia ki data files kiba phi kwah.  
(Phi lah ban pyndonkam ia u function ba lah ong haneng ban '''parse''' ia ki data files ha ka rukom kaba ngi mon.
+
  
 
|-
 
|-
 
| 08:40
 
| 08:40
| Now, let's summarize.
+
| Mynta to ngin batai lyngkot.
(To ngin ia khmih ia kiba ngi lah kdew haneng )
+
  
 
|-
 
|-
 
| 08:42
 
| 08:42
|In this tutorial, we have learnt: to download '''FASTA''' and '''GenBank''' files from the '''NCBI''' database website and use the '''parse''' and '''read''' functions from the '''SeqIO''' module
+
| Ha kane ka tutorial, ngi lah pule ban: download'''FASTA''' bad '''GenBank''' files na ka '''NCBI''' database website bad pyndonkam iaka  '''parse''' bad '''read''' functions na ka '''SeqIO''' module.
(Ha kane ka jingbatai  (tutorial), ngi lah nang  ban: download'''FASTA''' bad '''GenBank''' files na  
+
ka '''NCBI''' database website bad pyndonkam ia '''parse''' bad '''read''' functions na ka '''SeqIO''' module.
+
  
 
|-
 
|-
 
| 08:55
 
| 08:55
| to extract data such as '''record id'''s, description and sequences from '''FASTA''' and '''GenBank''' files.
+
| Ban sei ia ki data kum ki  '''record id'''s, description bad sequences na '''FASTA''' bad '''GenBank''' files.
(Ban sei ia ki data kum ki  '''record id'''s, description bad sequences na '''FASTA''' bad '''GenBank''' files.
+
  
 
|-
 
|-
 
| 09:03
 
| 09:03
| Now, for the assignment-
+
| Mynta, na ka bynta ka assignment-
(Mynta, na ka bynta ka assignment-
+
  
 
|-
 
|-
 
| 09:06
 
| 09:06
|Download '''FASTA''' files for nucleotide sequence of your choice from '''NCBI''' database.
+
| Download ia ki '''FASTA''' files na ka bynta ka nucleotide sequence haka jingjied jong phi na '''NCBI''' database.
(Download ia ki '''FASTA''' files na ka bynta ka nucleotide sequence kiba phi sngewiahap  na '''NCBI''' database.
+
  
 
|-
 
|-
 
| 09:13
 
| 09:13
|Convert the file of sequences to their '''reverse complement'''s.
+
| Pynkylla ia ki file jong ki sequences sha ki '''reverse complement'''s jong ki.
(Pynkylla ia ki file jong ki sequences sha '''reverse complement'''s jong ki.
+
  
 
|-
 
|-
 
| 09:17
 
| 09:17
| Your completed assignment should have the following lines of code.
+
| Ka assignment ba lah dep jong phi ka dei ban don ki lines of code kumne harum.
(Ka assignment ba lah dep jong phi ka dei ban don ki lines of code kumne harum.
+
  
 
|-
 
|-
 
| 09:22
 
| 09:22
|Use '''parse''' function to '''load''' nucleotide sequences from the '''FASTA''' file.
+
| Pyndonkam ka '''parse''' function ban '''load''' ia nucleotide sequences na ka '''FASTA''' file.
(Pyndonkam  '''parse''' function ban '''load''' ia nucleotide sequences na ka '''FASTA''' file.
+
  
 
|-
 
|-
 
| 09:28
 
| 09:28
|Next, print reverse complements using the Sequence object’s built in '''reverse complement''' method.
+
| Nangta, print ia ka reverse complements da kaba pyndonkam ia ka Sequence object’s built in '''reverse complement''' method.
(Nangta, print ia ka reverse complements da kaba pyndonkam ia ka Sequence object’s built in '''reverse complement''' method.
+
  
 
|-
 
|-
 
| 09:37
 
| 09:37
| Video at the following link summarizes the spoken-tutorial project.
+
| Ka video ha ka link harum ka batai lyngkot ia kane ka spoken-tutorial project.
(Ka video ha ka link hapoh kan pynsngewthuh shuh shuh  ia kane ka spoken-tutorial project.
+
 
 
|-
 
|-
 
| 09:42
 
| 09:42
|Please download and watch it.
+
| Sngewbha download bad peit ia ka.  
(Sngewbha download bad peit ia ka.  
+
  
 
|-
 
|-
 
| 09:44
 
| 09:44
| The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an on-line test.
+
| Ka Spoken Tutorial Project team ka ju pynlong ia ki workshops bad ai ruh ia ki certificates sha kito ba pass ia ka on-line test.
(Ka Spoken Tutorial Project team ka ju pynlong ia ki workshops bad ai ruh ia ki certificates sha kito ba pass ia ka on-line test.
+
  
 
|-
 
|-
 
| 09:51
 
| 09:51
|For more details, please write to us.
+
| Ban tip kham bniah, sngewbha thoh sha ngi.
(Na ka bynta ka jingtip ba  kham bniah, sngewbha thoh sha ngi.
+
  
 
|-
 
|-
 
| 09:55
 
| 09:55
| The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
+
| Ia ka Spoken Tutorial Project la bei tyngka da ka NMEICT, MHRD, Sorkar India.
(Ia ka Spoken Tutorial Project la bei tyngka da ka NMEICT, MHRD, Government of India.
+
  
 
|-
 
|-
 
| 10:01
 
| 10:01
|More information on this mission is available at the link shown.
+
| Khambun ki jingtip halor kane ka mission ka don ka ha link harum.
(Ki jingtip ba kham pura ia  kane ka mission lah ban ioh na ka link harum.
+
 
+
  
 
|-
 
|-
 
| 10:06
 
| 10:06
|This is Snehalatha from '''IIT Bombay''', signing off. Thank you for joining.
+
|ïa kane ka script la pynkylla sha ka Ktien Khasi da u Yuwanki Kharlukhi na Shillong,bad ma nga u Hezekiah Lyngdoh ngan pynkut noh. khublei ba phi la ïasnoh lang.  
(Nga dei ka Snehalatha na '''IIT Bombay''', signing off. Khublei shibun )
+
 
+
 
|}
 
|}

Latest revision as of 15:18, 31 May 2018

Time
Narration
00:01 Hello everyone. Welcome to this tutorial on Parsing Data.
00:06 In this tutorial, we will learn to download FASTA and GenBank files from the NCBI database website.
00:14 And to parse data files using functions in the Sequence Input/Output module.
00:19 To follow this tutorial, you should be familiar with undergraduate biochemistry or bioinformatics
00:26 and basic Python programming.
00:30 Refer to the Python tutorials at the given link.
00:34 To record this tutorial, I am using Ubuntu OS version 14.10,
00:40 Python version 2.7.8,
00:44 IPython interpreter version 2.3.0,
00:48 Biopython version 1.64 and Mozilla Firefox browser 35.0.
00:56 Scientific data in biology is generally stored in text files such as FASTA, GenBank, EMBL, Swiss-Prot etc.
01:07 Data files can be downloaded from the database websites.
01:12 Open the website link given below in any web browser.
01:17 A web page opens.
01:19 Let us download FASTA and GenBank files for the human insulin gene.
01:25 In the search box, type "human insulin" and click on the Search button.
01:31 The web page shows many files for the human insulin gene.
01:35 For demonstration, I will select 4 files with the name “Homo sapiens Insulin mRNA”.
01:43 I will choose files that have fewer than 500 base pairs.
01:48 Click on the check-box to select a file for download.
01:56 Bring the cursor to the “Send to” option, located at the top right corner of the page.
02:02 Click on the small selection button with a down arrow, next to the “Send to” button.
02:09 Under the heading “Choose destination”, click on the File option.
02:13 You can save this file in any of the file formats listed in the format drop-down list box.
02:21 Choose FASTA from the given options.
02:25 Then click on the Create file option.
02:29 A dialog box appears on the screen.
02:32 Select Open with and click on OK.
02:36 A file opens in a text editor.
02:39 The file shows 4 records, since we selected four files to download.
02:46 The first line in each record is an identifier line.
02:50 It starts with a “greater than” (>) symbol.
02:53 This is followed by a sequence.
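The record layout described at 02:46 to 02:53 (an identifier line starting with ">" followed by sequence lines) can be illustrated with a small plain-Python sketch. This is not how Biopython reads FASTA files; the function name `split_fasta` and the two demo records are hypothetical and only serve to show the structure.

```python
def split_fasta(text):
    """Return a list of (identifier_line, sequence) pairs from FASTA text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new identifier line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical file contents, not real NCBI data.
example = """>demo1 hypothetical record one
ATGGCC
CTGTGG
>demo2 hypothetical record two
ATGCCC
"""

records = split_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

In practice the SeqIO module introduced later in the tutorial does this work and returns richer record objects.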
02:56 Save the file in your home folder as "sequence.fasta".
03:01 Close the text editor.
03:03 Follow the same steps as above to download the files in GenBank format
03:08 for the files selected earlier.
03:12 Choose GenBank as the file format.
03:16 Create the file. Open it with a text editor.
03:21 Observe that the sequence file in GenBank format has more features than the FASTA file.
03:27 Save the file as "sequence.gb" in the home folder. Close the text editor.
03:34 For demonstration, we need a FASTA file with a single record.
03:39 For this, clear the previous selections by clicking on the check-boxes again.
03:48 Now, select the file “Human insulin gene complete cds”.
03:54 Click on the check-box.
03:57 Then follow the same steps as shown earlier to save the file in the home folder.
04:01 Save the file as "insulin.fasta".
04:08 The biological data stored in these files can be extracted and modified using the Biopython libraries.
04:16 Close the text editor.
04:19 Extracting data from data files is called Parsing.
04:23 Almost all file formats can be parsed using functions available in the SeqIO module.
04:30 The most commonly used functions of the SeqIO module are: parse, read, write and convert.
04:38 Open the terminal by pressing the Ctrl, Alt and T keys simultaneously.
04:44 Start IPython by typing "ipython" at the prompt. Press Enter.
04:51 Next, import the "SeqIO" module from the Bio package.
04:56 At the prompt, type: from Bio import SeqIO. Press Enter.
05:04 We will begin with the most important function, “parse”.
05:07 For demonstration, I will use the FASTA file with multiple records that we downloaded earlier from the database.
05:17 For simple FASTA parsing, type the following at the prompt.
05:22 Here, we use the parse function to read the contents of the sequence.fasta file.
05:30 For the output, we print the record id, the sequence in the record, and the length of the sequence.
05:41 Note also that the parse function reads sequence data as Sequence record objects.
05:48 It is generally used with a for loop.
05:52 It accepts two arguments; the first is the name of the file to read the data from.
05:59 The second specifies the format of the file.
06:02 Press the Enter key twice to get the output.
06:07 The output shows the identifier line, followed by the sequence in the file, and the length of the sequence, for all the records in the file.
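The code typed at the prompt is not reproduced in this transcript. As a minimal sketch of what the narration describes, the loop below passes a source and a format to SeqIO.parse and prints each record's id, sequence and length. It assumes Biopython is installed; the helper name summarize and the two records are invented, and an in-memory handle stands in for the "sequence.fasta" file so the example runs on its own.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA content standing in for the downloaded sequence.fasta.
fasta_text = """>demo1 hypothetical record one
ATGGCCCTGTGG
>demo2 hypothetical record two
ATGCGC
"""


def summarize(handle, file_format):
    """Collect (id, sequence, length) for every record parse() yields."""
    rows = []
    # parse() takes the file (name or handle) and the format name,
    # and yields one sequence record object per record.
    for record in SeqIO.parse(handle, file_format):
        rows.append((record.id, str(record.seq), len(record)))
    return rows


rows = summarize(StringIO(fasta_text), "fasta")
for record_id, sequence, length in rows:
    print(record_id, sequence, length)
```

With the real download, the first argument would be the file name, for example SeqIO.parse("sequence.fasta", "fasta").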
06:21 Observe that the FASTA format does not specify the alphabet.
06:26 So, the output does not identify it as a DNA sequence.
06:31 The same steps can be repeated to parse the GenBank file.
06:36 For demonstration, we will use the GenBank file that we downloaded earlier from the database.
06:43 Press the up-arrow key to recall the lines of code we used earlier.
06:49 Change the file name to sequence.gb.
06:53 Change the file format to genbank.
06:56 The rest of the code remains the same.
06:58 Press the Enter key twice to get the output.
07:03 Here too, the output shows the record id, the sequence and the length of the sequence for all the records in the file.
07:12 Observe that the GenBank format identifies the sequence as a DNA sequence.
07:19 Similarly, Swiss-Prot and EMBL files can be parsed using the same code as above.
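Since only the file name and the format name change between formats, the loop can be wrapped in a small helper, sketched here under the assumption that Biopython is installed. The function name summarize_file is invented; "fasta", "genbank", "embl" and "swiss" are format names that SeqIO recognizes. Note that in Biopython 1.78 and later the alphabet attribute no longer exists, so the DNA label mentioned above may not appear in the output; GenBank records instead carry a molecule_type entry in record.annotations.

```python
import os

from Bio import SeqIO


def summarize_file(path, file_format):
    """Return (id, length) for every record in the given file."""
    return [(record.id, len(record.seq))
            for record in SeqIO.parse(path, file_format)]


# File names used in this tutorial; each is read only if it is present
# in the current directory, so the sketch also runs without them.
for path, file_format in [("sequence.fasta", "fasta"),
                          ("sequence.gb", "genbank")]:
    if os.path.exists(path):
        for record_id, length in summarize_file(path, file_format):
            print(file_format, record_id, length)
```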
07:27 If your file has a single record, type the following lines for parsing.
07:34 Here, we will use the FASTA file saved earlier, which has a single record, that is, insulin.fasta, as an example.
07:43 Notice that we have used the read function instead of the parse function. Press Enter.
07:50 The output shows the contents of the file insulin.fasta.
07:55 It shows the sequence as a sequence record object.
07:59 It also shows other attributes such as GI, accession number and description.
08:06 We can also view the individual attributes of this record as follows.
08:11 At the prompt, type: record dot seq. Press Enter.
08:18 The output shows the sequence in the file.
08:22 To view the identifiers of this record, type: record dot id. Press Enter.
08:29 The output shows the GI number, the accession number and so on.
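A minimal sketch of the single-record case, assuming Biopython is installed. The record content is invented and an in-memory handle stands in for "insulin.fasta". It also shows that read raises a ValueError when the source holds more than one record, which is why parse is used for multi-record files.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical single-record FASTA content standing in for insulin.fasta.
single_record = """>demo_insulin hypothetical single record
ATGGCCCTGTGGATGCGC
"""

# read() returns exactly one sequence record object.
record = SeqIO.read(StringIO(single_record), "fasta")
print(record.id)           # first word of the identifier line
print(record.description)  # the whole identifier line without ">"
print(record.seq)          # the sequence itself

# read() refuses input that holds more than one record.
two_records = single_record + ">demo_extra second hypothetical record\nATGC\n"
try:
    SeqIO.read(StringIO(two_records), "fasta")
    read_failed = False
except ValueError:
    read_failed = True
print(read_failed)
```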
08:34 You can use the functions described above to parse data files of your choice.
08:40 Now let us summarize.
08:42 In this tutorial, we have learnt to: download FASTA and GenBank files from the NCBI database website, and use the parse and read functions from the SeqIO module
08:55 to extract data such as record ids, descriptions and sequences from FASTA and GenBank files.
09:03 Now, for the assignment:
09:06 Download FASTA files for a nucleotide sequence of your choice from the NCBI database.
09:13 Convert the sequences in the file to their reverse complements.
09:17 Your completed assignment should have the following lines of code.
09:22 Use the parse function to load the nucleotide sequences from the FASTA file.
09:28 Next, print the reverse complements using the Sequence object’s built-in reverse complement method.
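The assignment code itself is not included in this transcript. One possible sketch of the two steps just described, assuming Biopython is installed, is shown below. The sequences are invented and an in-memory handle replaces the downloaded FASTA file.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical nucleotide records standing in for a downloaded FASTA file.
fasta_text = """>demo1 hypothetical record one
ATGC
>demo2 hypothetical record two
AAGT
"""

reverse_complements = {}
# Step 1: load the nucleotide sequences with parse().
for record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    # Step 2: the Seq object's built-in reverse_complement() method
    # complements each base and reverses the order.
    reverse_complements[record.id] = str(record.seq.reverse_complement())
    print(record.id, reverse_complements[record.id])
```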
09:37 The video at the link below summarizes the Spoken Tutorial project.
09:42 Please download and watch it.
09:44 The Spoken Tutorial Project team conducts workshops and gives certificates to those who pass an online test.
09:51 For more details, please write to us.
09:55 The Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
10:01 More information on this mission is available at the link below.
10:06 This script has been translated into Khasi by Yuwanki Kharlukhi from Shillong, and this is Hezekiah Lyngdoh signing off. Thank you for joining.

Contributors and Content Editors

Hezekiah2016