Difference between revisions of "Python-3.4.3/C2/Parsing-data/English"

From Script | Spoken-Tutorial
 
(One intermediate revision by one other user not shown)
Line 7: Line 7:
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Show Slide  
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Show Slide  
  
 
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Welcome to the spoken tutorial on '''Parsing data'''.  
 
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Welcome to the spoken tutorial on '''Parsing-data'''.  
+
  
 
|-
 
|-
Line 20: Line 18:
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| In this tutorial, we will learn to-  
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| In this tutorial, we will learn to-  
  
 
+
* Split a''' string '''using a''' delimiter. '''
* '''Split '''a''' string '''using''' '''a''' delimiter. '''
+
 
* Remove the leading, trailing and all '''whitespaces''' in a '''string''' and
 
* Remove the leading, trailing and all '''whitespaces''' in a '''string''' and
* Convert between different built-in '''datatypes'''  
+
* Convert between different '''built-in datatypes'''  
 
+
 
+
  
 
|-
 
|-
Line 32: Line 27:
 
System Specifications  
 
System Specifications  
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| To record this tutorial, I am using  
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| To record this tutorial, I am using  
 
  
 
* '''Ubuntu Linux 16.04''' operating system
 
* '''Ubuntu Linux 16.04''' operating system
 
* '''Python 3.4.3 '''and
 
* '''Python 3.4.3 '''and
 
* '''IPython 5.1.0'''
 
* '''IPython 5.1.0'''
 
 
  
 
|-
 
|-
Line 56: Line 48:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| First, let us understand, what is meant by '''parsing''' '''data'''.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| First, let us understand what is meant by '''parsing data'''.
  
  
 
* '''Parsing''' the '''data''' is reading data in text form.
 
* '''Parsing''' the '''data''' is reading data in text form.
 
* It is converted into a form which can be used for computations.  
 
* It is converted into a form which can be used for computations.  
 
 
  
 
|-
 
|-
Line 71: Line 61:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next we will learn about '''split()''' function.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next we will learn about '''split() function.'''
  
  
'''split()''' function breaks up a larger string into smaller strings using a defined separator.
+
'''split() function''' breaks up a larger '''string''' into smaller '''strings''' using a defined '''separator'''.
  
  
If no argument is specified, then '''whitespace''' is used as default separator.
+
If no '''argument''' is specified, then '''whitespace''' is used as the default '''separator'''.
  
  
'''Syntax '''is''': str dot split '''''inside parentheses '''''argument'''
+
Syntax is''': str dot split''' inside '''parentheses argument'''
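The two forms of '''split()''' described here can be sketched as follows. The sample strings are made up for illustration and are not the ones typed in the tutorial.

```python
# Hypothetical sample string with several spaces between two of the words.
sentence = "Welcome to    Python tutorials"

# No argument: any run of whitespace acts as one separator.
tokens = sentence.split()
print(tokens)

# With an argument: the string is split at every occurrence of the separator.
# The record below is invented sample data.
record = "A;015163;STUDENT ONE"
fields = record.split(";")
print(fields)
```

In both cases the result is a list of strings.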
  
 
|-
 
|-
Line 86: Line 76:
  
 
split() function
 
split() function
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| The '''split''' function parses a string and returns an array of '''tokens'''.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| The '''split function''' '''parses''' a '''string''' and returns a '''list''' of '''tokens'''.
  
  
This is called '''string''' '''tokenizing'''.
+
This is called '''string tokenizing'''.
  
 
|-
 
|-
Line 97: Line 87:
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''ipython3'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''ipython3'''
| style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| Type, '''ipython3 '''and press '''Enter'''.  
+
| style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| Type '''ipython3 '''and press '''Enter'''.  
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| '''%pylab '''and press '''Enter.'''
 
| style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| '''%pylab '''and press '''Enter.'''
| style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| Let us initialize the '''pylab''' package.
+
| style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| Let us initialize the '''pylab package'''.
  
  
Type, '''percentage sign pylab '''and press''' Enter.'''
+
Type '''percentage sign pylab '''and press''' Enter.'''
  
 
|-
 
|-
Line 116: Line 106:
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| From here onwards, please remember to press the '''Enter''' key after typing every command on the '''terminal'''.  
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| From here onwards, please remember to press the '''Enter''' key after typing every command on the '''terminal'''.  
  
Let us define a variable '''str1''' as '''string '''data type.
+
 
 +
Let us define a variable '''str1''' as '''string data type.'''
  
  
Line 122: Line 113:
  
  
We can have any number of '''whitespaces '''between '''to '''and '''Python tutorials. '''But all the '''spaces''' are treated as one space.  
+
We can have any number of '''whitespaces '''between '''to '''and '''Python tutorials'''.
 +
 
 +
But all the '''spaces''' are treated as one space.  
  
 
|-
 
|-
Line 129: Line 122:
  
 
Highlight output
 
Highlight output
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now, we are going to '''split''' this string on '''whitespace'''.  
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now, we are going to '''split''' this '''string''' on '''whitespace'''.  
  
  
Type, '''str1 '''''dot '''''split '''''open and close parentheses.''
+
Type, '''str1 dot split '''open and close '''parentheses.'''
  
  
''As you can see, we get a '''list''' of '''strings.'''''
+
As we can see, we get a '''list''' of '''strings.'''
  
 
|-
 
|-
Line 144: Line 137:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Let us take another example for '''split()''' function with argument.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Let us take another example for '''split() function''' with '''argument'''.
  
  
Line 151: Line 144:
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''x.split(';')'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''x.split(';')'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''x '''''dot '''''split'' '''inside parentheses inside single quotes '''''semicolon.'''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''x dot split '''inside parentheses inside single quotes '''semicolon.'''
  
 
|-
 
|-
Line 168: Line 161:
  
  
Split '''x''' using space as argument.  
+
Split '''x''' using '''space''' as '''argument'''.  
  
  
Is it same as splitting without an argument?  
+
Is it the same as splitting without an '''argument'''? 
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal for the solution.  
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the '''terminal''' for the solution.  
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' b <nowiki>= </nowiki>x.split()'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' b <nowiki>= </nowiki>x.split()'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''b '''''is equal to '''''x '''''dot '''''split '''''open and close parentheses.''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b '''is equal to '''x dot split '''open and close '''parentheses'''.
 
+
 
+
 
+
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c <nowiki>= </nowiki>x.split(' ')'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c <nowiki>= </nowiki>x.split(' ')'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c '''''is equal to '''''x '''''dot '''''split '''''open and close parentheses and inside single quotes '''space'''.''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type ''' c '''is equal to '''x dot split '''inside  parentheses and inside single quotes '''space'''.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''b '''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''b '''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''b '''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b '''
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c '''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c '''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c '''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type''' c '''
  
 
|-
 
|-
Line 202: Line 192:
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Show slide:  
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Show slide:  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Splitting the string without argument will split the string separated by any number of spaces.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Splitting the '''string''' without an '''argument''' will split it on any run of consecutive '''whitespaces'''.
  
  
And giving space as argument will split the sentence specifically on single whitespace'''.'''
+
And giving a '''space''' as the '''argument''' will split the sentence at every single '''space''' character.
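A minimal sketch of this difference, using a made-up string with two consecutive spaces (not the string from the tutorial):

```python
# Hypothetical string: two spaces between "to" and "Python".
text = "to  Python"

b = text.split()     # splits on any run of whitespace
c = text.split(" ")  # splits at every single space

print(b)
print(c)
print(b == c)
```

Here '''c''' contains an empty string between the two words, because the second space is treated as a separator of its own.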
  
 
|-
 
|-
Line 213: Line 203:
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b= str1.split()'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b= str1.split()'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now, we will split this string without argument.  
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now, we will split this '''string''' without '''argument'''.  
  
  
Type, '''b'' '''is equal to '''''str1 '''''dot '''''split '''''open and close parentheses.''
+
Type '''b''' is equal to '''str1 dot split '''open and close '''parentheses'''.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c=str1.split(' ')'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c=str1.split(' ')'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c '''''is equal to '''''str1 '''''dot '''''split '''''open and close parentheses and inside single quotes '''space'''.''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type ''' c '''is equal to '''str1 dot split '''inside '''parentheses''' and inside single quotes '''space'''.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''b '''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b '''
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c '''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type''' c '''
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight the output
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight the output
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| As you can see, here '''b''' is not equal to '''c''' since '''c''' has '''whitespaces '''as entries whereas '''b''' has only words'''.'''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| As you can see, here '''b''' is not equal to '''c''' since '''c''' has empty '''strings''' as entries where consecutive '''spaces''' occur, whereas '''b''' has only words'''.'''
  
  
Line 241: Line 231:
  
 
'''strip() function'''
 
'''strip() function'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next we will learn about '''strip''' method.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next we will learn about '''strip method.'''
  
  
'''strip''' function removes all leading and trailing '''whitespaces''' in a '''string'''.
+
'''strip function''' removes all leading and trailing '''whitespaces''' in a '''string'''.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''unstripped = " Hello world " '''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''unstripped = " Hello world " '''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Let us define a string by typing
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Let us define a '''string''' by typing
  
'''unstripped '''''is equal to inside double quotes space '''''Hello world '''''space''
+
'''unstripped '''is equal to inside double quotes '''space Hello world space'''
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''unstripped.strip()'''  
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''unstripped.strip()'''  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now to remove the whitespace,
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now to remove the '''whitespace''', type, '''unstripped dot strip''' open and close '''parentheses.'''
 
+
 
+
Type, '''unstripped '''''dot '''''strip '''''open and close parentheses.''
+
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight output
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight output
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| We can see that '''strip''' removes all the '''white spaces''' in the beginning and at the end of the string.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| We can see that '''strip''' removes all the '''whitespaces''' in the beginning and at the end of the '''string'''.
  
After splitting and stripping we get a list of strings with leading and trailing spaces stripped off.
+
After '''splitting''' and '''stripping''' we get a '''list''' of '''strings''' with leading and trailing '''spaces''' stripped off.
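Splitting followed by stripping can be sketched as below. The variable '''unstripped''' follows the narration; the record string and the names '''record''' and '''fields''' are assumptions made for illustration.

```python
# The string from the narration, with leading and trailing spaces.
unstripped = " Hello world "
stripped = unstripped.strip()
print(repr(stripped))

# Hypothetical record with stray spaces around each field.
record = " A ; 015163 ; STUDENT ONE "

# Split on the semicolon, then strip each piece.
fields = [piece.strip() for piece in record.split(";")]
print(fields)
```

Note that '''strip()''' leaves the space inside "Hello world" untouched; only the ends of the string are cleaned.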
  
 
<nowiki><<PAUSE>></nowiki>
 
<nowiki><<PAUSE>></nowiki>
Line 271: Line 258:
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now we shall look at converting '''strings''' into '''floats''' and '''integers'''.
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now we shall look at converting '''strings''' into '''floats''' and '''integers'''.
  
Type, '''mark '''''underscore''''' str '''''is equal to inside double quotes''''' 1.25 '''
+
Type, '''mark ''underscore'' str '''is equal to inside double quotes '''1.25 '''
  
  
Note that 1.25 is a '''string''' and not a '''float''' as it is within double quotes.
+
Note that '''1.25''' is a '''string''' and not a '''float''' as it is within double quotes.
  
 
|-
 
|-
Line 281: Line 268:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''mark '''''is equal to''''' float '''''inside parentheses '''''mark '''''underscore '''''str'''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''mark '''is equal to''' float '''inside '''parentheses mark underscore str'''.
  
  
Line 288: Line 275:
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type(mark_str)'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type(mark_str)'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''type '''''inside parentheses '''''mark '''''underscore '''''str'''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type '''inside parentheses '''mark underscore str'''.
 
+
 
+
This tells you the datatype of '''mark_str''' i.e. '''string.'''
+
 
+
  
  
 +
This tells you the '''datatype''' of '''mark_str''' i.e. '''string.'''
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type(mark)'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type(mark)'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''type '''''inside parentheses '''''mark '''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type '''inside parentheses '''mark '''.
  
This shows '''mark''' is a '''float''' datatype.
+
This shows '''mark''' is a '''float datatype.'''
  
 
|-
 
|-
Line 307: Line 291:
  
  
Now we can perform '''mathematical''' '''operations''' on them.  
+
Now we can perform '''mathematical operations''' on them.  
  
 
|-
 
|-
Line 319: Line 303:
  
  
What happens if you type, '''int''' ''inside parentheses inside double quotes'' 1.25 in the terminal?
+
What happens if you type, '''int''' inside parentheses inside double quotes '''1.25''' in the '''terminal'''?
  
 
|-
 
|-
Line 330: Line 314:
  
 
Highlight '''ValueError'''
 
Highlight '''ValueError'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''int '''''inside parentheses inside double quotes '''''1.25'''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''int '''inside parentheses inside double quotes '''1.25'''
  
  
Line 343: Line 327:
  
  
Type, '''dcml '''''underscore '''''str '''''is equal to inside double quotes''''' 1.25.'''
+
Type '''dcml underscore str '''is equal to inside double quotes''' 1.25.'''
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt = float(dcml_str)'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt = float(dcml_str)'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''flt '''''is equal to''''' float '''''inside parentheses '''''dcml''' ''underscore''''' str.'''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt '''is equal to''' float '''inside parentheses '''dcml underscore str.'''
  
  
Line 354: Line 338:
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''flt '''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt '''
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''number = int(flt)'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''number = int(flt)'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''number '''''is equal to''''' int '''''inside parentheses '''''flt '''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''number '''is equal to''' int '''inside parentheses '''flt '''
  
  
Line 368: Line 352:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''number'''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''number'''
  
we got the output as '''integer'''.
+
We got the output as '''integer'''.
  
 
This is how we should convert '''strings''' into '''floats''' and '''integers'''.
 
This is how we should convert '''strings''' into '''floats''' and '''integers'''.
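The conversion steps above can be sketched in one place. The flag '''int_from_str_failed''' is a hypothetical name added only to record that the direct conversion raises an error.

```python
dcml_str = "1.25"       # a string, because it is inside quotes

flt = float(dcml_str)   # string -> float works directly

# Converting the same string straight to int raises ValueError,
# because "1.25" is not a valid integer literal.
try:
    int(dcml_str)
    int_from_str_failed = False
except ValueError:
    int_from_str_failed = True

number = int(flt)       # float -> int drops the fractional part

print(type(dcml_str), type(flt), type(number))
print(flt, number, int_from_str_failed)
```

Going through '''float''' first is what makes the conversion to '''int''' possible for a string that holds a decimal number.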
Line 381: Line 365:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next, we will use a data file''' '''to''' parse '''the''' data.'''
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next, we will use a data file to''' parse '''the''' data.'''
  
  
Let me open the file '''student '''''underscore''''' record.txt''' in text editor.
+
Let me open the file '''student underscore record.txt''' in a text editor.
  
 
|-
 
|-
Line 391: Line 375:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| A file '''student '''''underscore''''' record.txt '''is available in the '''Code files''' link of this tutorial.  
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| A file '''student underscore record.txt '''is available in the '''Code files''' link of this tutorial.  
 
+
 
+
Please download it in your '''Home''' directory and use it.
+
 
+
  
  
 +
Please download it to your '''Home directory''' and use it.
  
 
|-
 
|-
Line 404: Line 385:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| We will first '''read''' the '''file''' line by line and '''parse''' each record in this file.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| We will first '''read''' the '''file''' line by line and '''parse''' each '''record''' in this '''file'''.
  
It contains records of students and their marks in the '''State Secondary Board Examination'''.  
+
It contains '''records''' of students and their marks in the '''State Secondary Board Examination'''.  
  
  
Line 425: Line 406:
  
  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Each line in the '''file''' is a set of fields separated by '''semicolons'''.  
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Each line in the '''file''' is a set of '''fields''' separated by '''semicolons'''.  
  
  
Line 431: Line 412:
  
  
The following are the fields in any given line.  
+
The following are the '''fields''' in any given line.  
  
* Region Code  
+
* '''Region Code'''
* Roll Number
+
* '''Roll Number'''
* Name
+
* '''Name'''
 
* Marks of 5 subjects
 
* Marks of 5 subjects
 
* Total marks
 
* Total marks
 
 
  
 
|-
 
|-
Line 461: Line 440:
  
  
The '''for loop''' will process the student record and split the fields of each record.
+
The '''for loop''' will process the student '''record''' and split the fields of each '''record'''.
  
 
|-
 
|-
Line 473: Line 452:
  
 
|-
 
|-
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight''' '''the code for this narration.  
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight the code for this narration.  
  
  
 
'''if region_code == "A": math_marks_A.append(math_mark)'''
 
'''if region_code == "A": math_marks_A.append(math_mark)'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Then it is appended and stored as a list in a variable '''math '''''underscore '''''marks '''''underscore '''''A '''for region code '''A'''.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Then it is appended and stored as a '''list''' in a variable '''math underscore marks underscore A '''for region code '''A'''.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Save python file as''' marks.py'''
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Save python file as''' marks.py'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Save the file as''' marks.py '''in the home directory.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Save the file as''' marks.py '''in the '''Home directory'''.
  
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to terminal
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to terminal
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the '''terminal'''.
  
 
|-
 
|-
Line 499: Line 478:
  
  
Now we have all the math marks for region '''A''' in the list '''math '''''underscore '''''marks '''''underscore '''''A'''.
+
Now we have all the math marks for region '''A''' in the list '''math underscore marks underscore A'''.
  
 
|-
 
|-
Line 510: Line 489:
  
 
Highlight '''len(math_marks_A) '''
 
Highlight '''len(math_marks_A) '''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Add the below lines to calculate the mean of math marks for region '''A'''.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Add the below lines to calculate the '''mean''' of math marks for region '''A'''.
  
  
Line 524: Line 503:
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to terminal
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to terminal
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the '''terminal'''.
  
 
|-
 
|-
Line 532: Line 511:
 
|-
 
|-
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight output
 
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight output
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Hence we get our final '''output'''.  
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Hence we got our final '''output'''.  
  
  
Here the mean value for region '''A''' is calculated roughly for 1 lakh 67 thousand records.
+
Here the '''mean''' value for region '''A''' is calculated roughly for 1 lakh 80 thousand '''records'''.
  
  
Line 557: Line 536:
 
# '''Tokenize''' a '''string'''  
 
# '''Tokenize''' a '''string'''  
 
# '''Split''' a '''string''' separated by '''delimiters''' with '''split()''' '''function'''
 
# '''Split''' a '''string''' separated by '''delimiters''' with '''split()''' '''function'''
 
 
  
 
|-
 
|-
Line 564: Line 541:
  
 
Summary slide  
 
Summary slide  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| # Remove '''whitespaces''' using the '''strip() '''function.
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"|  
 +
# Remove '''whitespaces''' using the '''strip() function.'''
 
# Convert '''datatypes''' of numbers from one type to another
 
# Convert '''datatypes''' of numbers from one type to another
 
# '''Parse''' input '''data''' and perform computations on it.  
 
# '''Parse''' input '''data''' and perform computations on it.  
 
 
  
 
|-
 
|-
Line 574: Line 550:
  
 
Evaluation  
 
Evaluation  
 
  
  
Line 580: Line 555:
  
  
# How do you split the string “Guido;Rossum;Python" to get the words.
+
# How do you split the string “Guido;Rossum;Python" to get the words?
 
+
  
  
Line 588: Line 562:
  
 
Evaluation  
 
Evaluation  
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| 2. What does int("20.0") produce  
+
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| 2. What does int inside parentheses inside double quotes 20.0 produce?
  
 
|-
 
|-
Line 596: Line 570:
 
Solutions  
 
Solutions  
  
 +
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| And the answers-
  
 
+
# '''line.split''' inside '''parentheses''' inside single quotes '''semicolon'''
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| And the answers,''' '''
+
# '''int''' inside '''parentheses''' inside double quotes '''20.0''' will give an error, because a '''string''' representing a decimal number cannot be converted directly into an '''integer'''. 
 
+
# line.split(',')
+
# int("20.0") will give an error, because converting a string directly into integer is not possible.  
+
 
+
 
+
  
 
|-
 
|-

Latest revision as of 20:36, 6 May 2018

Visual Cue
Narration
Show Slide Welcome to the spoken tutorial on Parsing data.
Show Slide

Objectives


In this tutorial, we will learn to-
  • Split a string using a delimiter.
  • Remove the leading, trailing and all whitespaces in a string and
  • Convert between different built-in datatypes
Show slide

System Specifications

To record this tutorial, I am using
  • Ubuntu Linux 16.04 operating system
  • Python 3.4.3 and
  • IPython 5.1.0
Show Slide

Prerequisite slide

To practice this tutorial, you should know how to use lists.


If not, see the relevant Python tutorials on this website.

Show Slide

Parsing Data


First, let us understand, what is meant by parsing data.


  • Parsing the data is reading data in text form.
  • It is converted into a form which can be used for computations.
Show Slide

split() function


Next we will learn about split() function.


split() function breaks up a larger string into smaller strings using a defined separator.


If no argument is specified, then whitespace is used as default separator.


Syntax is: str dot split inside parentheses argument

Show Slide

split() function

The split function parses a string and returns a list of tokens.


This is called string tokenizing.
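
As a minimal sketch of the syntax described above, the following uses two illustrative strings that are not part of the tutorial's own session. It shows split() with no argument and with a separator argument.

```python
# Illustrative strings; these are assumptions, not taken from the tutorial.
sentence = "parse this line"

# With no argument, split() uses whitespace as the separator.
words = sentence.split()
print(words)  # ['parse', 'this', 'line']

# With an argument, split() uses that string as the separator.
parts = "a,b,c".split(",")
print(parts)  # ['a', 'b', 'c']
```

In both cases the result is a Python list whose elements are the tokens.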

Press Ctrl+Alt+T keys Let us first open the terminal by pressing Ctrl+Alt+T keys simultaneously.
Type ipython3 Type ipython3 and press Enter.
%pylab and press Enter. Let us initialize the pylab package.


Type percentage sign pylab and press Enter.

str1 = "Welcome to Python tutorials"


Highlight whitespaces


From here onwards, please remember to press the Enter key after typing every command on the terminal.


Let us define a variable str1 as string data type.


Type, str1 is equal to inside double quotes Welcome to insert some whitespaces, then Python tutorials


We can have any number of whitespaces between to and Python tutorials.

But all the spaces are treated as one space.

str1.split()


Highlight output

Now, we are going to split this string on whitespace.


Type, str1 dot split open and close parentheses.


As we can see, we get a list of strings.

Type

x = "08-26-2009;08-27-2009;08-29-2009"


Let us take another example for split() function with argument.


Type as shown.

Type x.split(';') Type, x dot split inside parentheses inside single quotes semicolon.
Point to the output We get a list of three strings, one for each date that was separated by a semicolon.
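
The same step can be tried as a small script. The variable name dates is an assumption added here only to hold the result.

```python
x = "08-26-2009;08-27-2009;08-29-2009"

# Split on the semicolon delimiter; the semicolons themselves are discarded.
dates = x.split(';')
print(dates)       # ['08-26-2009', '08-27-2009', '08-29-2009']
print(len(dates))  # 3
```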
Show Slide


Exercise 1

Pause the video.


Try this exercise and then resume the video.


Split x using space as argument.


Is it same as splitting without an argument?

Switch to the terminal Switch to the terminal for the solution.
Type, b = x.split() Type b is equal to x dot split open and close parentheses.
Type, c = x.split(' ') Type c is equal to x dot split inside parentheses and inside single quotes space.
Type, b Type b
Type, c Type c
Highlight the output We can see that, for this string, splitting without an argument gives the same result as giving space as the argument.
Show slide: Splitting the string without argument will split the string separated by any number of spaces.


And giving space as argument will split the sentence specifically on single whitespace.

Type str1 Let us recall the variable str1.
Type b= str1.split() Now, we will split this string without argument.


Type b is equal to str1 dot split open and close parentheses.

Type c=str1.split(' ') Type c is equal to str1 dot split inside parentheses and inside single quotes space.
Type b Type b
Type c Type c
Highlight the output As you can see, here b is not equal to c since c has empty strings as entries, whereas b has only words.
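
A short sketch of this comparison is given below. The tutorial does not say how many spaces were typed between "to" and "Python", so the count of four used here is an assumption.

```python
# Build str1 with four spaces between "to" and "Python" (the count is assumed).
str1 = "Welcome to" + " " * 4 + "Python tutorials"

b = str1.split()     # any run of whitespace counts as one separator
c = str1.split(' ')  # every single space is a separator

print(b)       # ['Welcome', 'to', 'Python', 'tutorials']
print(c)       # ['Welcome', 'to', '', '', '', 'Python', 'tutorials']
print(b == c)  # False
```

The extra entries in c are empty strings, one for each gap between two adjacent spaces.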


<<PAUSE>>

show slide

strip() function

Next we will learn about strip method.


strip function removes all leading and trailing whitespaces in a string.

Type unstripped = " Hello world " Let us define a string by typing

unstripped is equal to inside double quotes space Hello world space

Type unstripped.strip() Now to remove the whitespace, type, unstripped dot strip open and close parentheses.
Highlight output We can see that strip removes all the whitespaces in the beginning and at the end of the string.

After splitting and stripping we get a list of strings with leading and trailing spaces stripped off.
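
The following sketch combines splitting and stripping. The string record and the names stripped and fields are illustrative assumptions, not part of the tutorial's session.

```python
unstripped = " Hello world "

# strip() removes whitespace only at the two ends; the inner space stays.
stripped = unstripped.strip()
print(repr(stripped))  # 'Hello world'

# Illustrative record with padding around each field (an assumption).
record = " A ; 015163 ; JOSEPH RAJ S "
fields = [field.strip() for field in record.split(";")]
print(fields)  # ['A', '015163', 'JOSEPH RAJ S']
```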

<<PAUSE>>

Type mark_str = "1.25" Now we shall look at converting strings into floats and integers.

Type, mark underscore str is equal to inside double quotes 1.25


Note that 1.25 is a string and not a float as it is within double quotes.

Type mark = float(mark_str)


Type, mark is equal to float inside parentheses mark underscore str.


Here we are converting string to float.

Type type(mark_str) Type type inside parentheses mark underscore str.


This tells you the datatype of mark_str i.e. string.

Type type(mark) Type type inside parentheses mark .

This shows mark is a float datatype.

Highlight the output We can see that string is converted to float.


Now we can perform mathematical operations on them.
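
The conversion steps above can be collected into one small script. The final addition is an illustrative operation added here to show that arithmetic works on the converted value.

```python
mark_str = "1.25"        # a string, because it is inside quotes
mark = float(mark_str)   # convert the string to a float

print(type(mark_str))    # <class 'str'>
print(type(mark))        # <class 'float'>

# Arithmetic is possible on the float, not on the original string.
print(mark + 1)          # 2.25
```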

Show Slide

Exercise 2


Pause the video. Try this exercise and then resume the video.


What happens if you type, int inside parentheses inside double quotes 1.25 in the terminal?

Switch to terminal Switch to the terminal for the solution.
Type int("1.25")


Highlight ValueError

Type, int inside parentheses inside double quotes 1.25


We can see a ValueError.


We cannot directly convert a string that contains a decimal point into an integer.

Type dcml_str = "1.25" Let us see the correct solution for this.


Type dcml underscore str is equal to inside double quotes 1.25.

Type flt = float(dcml_str) Type flt is equal to float inside parentheses dcml underscore str.


Here we are converting the string into float as we cannot directly convert it into integer.

Type flt Type flt
Type number = int(flt) Type, number is equal to int inside parentheses flt


We are now converting float into integer.

Type number


Type number

We got the output as integer.

This is how we should convert strings into floats and integers.
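
A minimal sketch of the whole exercise is shown below. The try and except block and the last line with "20" are additions made here for illustration; they are not typed in the tutorial.

```python
dcml_str = "1.25"

# int() rejects a string that contains a decimal point.
try:
    int(dcml_str)
except ValueError as error:
    print("ValueError:", error)

# Convert to float first, then to int. int() drops the fractional part.
flt = float(dcml_str)
number = int(flt)
print(number)     # 1

# A string holding a whole number can be converted directly.
print(int("20"))  # 20
```

Note that int() truncates rather than rounds, so 1.25 becomes 1.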

<<PAUSE>>

Open the file text editor.


Next, we will use a data file to parse the data.


Let me open the file student underscore record.txt in text editor.

Show text: student_record.txt is available in the Code files link.


A file student underscore record.txt is available in the Code files link of this tutorial.


Please download it in your Home directory and use it.

Scroll down and show the records


We will first read the file line by line and parse each record in this file.

It contains records of students and their marks in the State Secondary Board Examination.


It has 1 lakh 80 thousand lines of record.


We are going to read it and process this data.

Highlight A;015163;JOSEPH RAJ S;083;042;47;00;72;244


  • Highlight 'A'
  • Highlight 015163
  • Highlight JOSEPH RAJ S
  • Highlight 083;042;47;00;72
  • Highlight 244


Each line in the file is a set of fields separated by semicolons.


Consider a sample record from this file.


The following are the fields in any given line.

  • Region Code
  • Roll Number
  • Name
  • Marks of 5 subjects
  • Total marks
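
As a rough sketch, the sample record highlighted above can be split into these fields as follows. The variable names and the use of int for the marks are assumptions made for illustration.

```python
record = "A;015163;JOSEPH RAJ S;083;042;47;00;72;244"

fields = record.split(";")
region_code = fields[0]
roll_number = fields[1]   # kept as a string so the leading zero is preserved
name = fields[2]
subject_marks = [int(m) for m in fields[3:8]]  # marks of 5 subjects
total = int(fields[8])

print(region_code, roll_number, name)  # A 015163 JOSEPH RAJ S
print(subject_marks)                   # [83, 42, 47, 0, 72]
print(sum(subject_marks) == total)     # True
```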
Open text editor Open a new text editor.
Copy paste the code from text editor Type the code as shown.
Highlight

for line in open("student_record.txt"):

fields = line.split(";")

Let me explain this program.


We have already learnt about the for loop in an earlier tutorial.


The for loop will process the student record and split the fields of each record.

Highlight

math_mark = float(math_mark_str)


The math marks are then converted to float.
Highlight the code for this narration.


if region_code == "A": math_marks_A.append(math_mark)

Then it is appended and stored as a list in a variable math underscore marks underscore A for region code A.
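
The highlighted fragments can be put together as in the sketch below. It is not the tutorial's exact marks.py. Which field holds the math marks is not stated in the narration, so MATH_FIELD_INDEX is an assumption. An in-memory sample stands in for student_record.txt; only its first record comes from the tutorial, the other two are made up.

```python
import io

# Stand-in for open("student_record.txt"). Only the first record is from
# the tutorial; the other two are made up for illustration.
sample_file = io.StringIO(
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244\n"
    "B;015164;STUDENT TWO;050;060;70;80;90;350\n"
    "A;015165;STUDENT THREE;040;050;53;60;70;273\n"
)

MATH_FIELD_INDEX = 5  # assumption: position of the math marks in a record

math_marks_A = []
for line in sample_file:
    fields = line.split(";")
    region_code = fields[0].strip()
    math_mark_str = fields[MATH_FIELD_INDEX].strip()
    math_mark = float(math_mark_str)
    if region_code == "A":
        math_marks_A.append(math_mark)

print(math_marks_A)  # [47.0, 53.0]
```

To run this on the real data, replace sample_file with open("student_record.txt").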
Save python file as marks.py Save the file as marks.py in the Home directory.
Switch to terminal Switch to the terminal.
Type, %run marks.py Execute the file with percentage sign run space marks.py.
Switch to editor


Highlight math_marks_A

Switch back to the editor.


Now we have all the math marks for region A in the list math underscore marks underscore A.

Add in the marks.py file

math_marks_mean = sum(math_marks_A) / len(math_marks_A)


print (math_marks_mean)

Highlight len(math_marks_A)

Add the below lines to calculate the mean of math marks for region A.


For this, we just have to sum the math marks and divide by the length.


Note that the length will give the number of students in region ‘A’.
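
A small sketch of the mean calculation follows. The list values are illustrative, and the check for an empty list is an addition made here to avoid dividing by zero when no record belongs to region A.

```python
# Illustrative marks; in marks.py this list is filled by the for loop.
math_marks_A = [47.0, 53.0, 80.0]

if math_marks_A:
    # Mean = sum of the marks divided by the number of students.
    math_marks_mean = sum(math_marks_A) / len(math_marks_A)
    print(math_marks_mean)  # 60.0
else:
    print("No records found for region A")
```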

Press ctrl + s Let us save the file.
Switch to terminal Switch to the terminal.
Type, %run marks.py Execute the file again with percentage sign run space marks.py.
Highlight output Hence we got our final output.


Here the mean value for region A is calculated roughly for 1 lakh 80 thousand records.


This is how we read and split a large amount of data and perform computations on it.


<<PAUSE>>

Show Slide

Summary slide


This brings us to the end of this tutorial.


In this tutorial, we learnt to,

  1. Tokenize a string
  2. Split a string separated by delimiters with split() function
Show Slide

Summary slide

  1. Remove whitespaces using the strip() function.
  2. Convert datatypes of numbers from one type to another
  3. Parse input data and perform computations on it.
Show Slide

Evaluation


Here are some self assessment questions for you to solve


  1. How do you split the string "Guido;Rossum;Python" to get the words?


Show Slide

Evaluation

2. What does int inside parentheses inside double quotes 20.0 produce?
Show Slide


Solutions

And the answers-
  1. line.split inside parentheses inside single quotes semicolon
  2. int inside parentheses inside double quotes 20.0 will give an error, because a string representing a decimal number cannot be converted directly into an integer.
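
Both answers can be checked with a quick sketch. The string in question 1 uses semicolons as separators, so the split is done on a semicolon. The variable names and the final float step are assumptions added for illustration.

```python
line = "Guido;Rossum;Python"

# Question 1: split on the semicolon delimiter.
words = line.split(';')
print(words)  # ['Guido', 'Rossum', 'Python']

# Question 2: int("20.0") raises a ValueError.
try:
    int("20.0")
    raised_error = False
except ValueError:
    raised_error = True
print(raised_error)  # True

# Going through float first gives the integer value.
print(int(float("20.0")))  # 20
```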
Show Slide

Forum

Please post your timed queries in this forum.
Show Slide

Fossee Forum

Please post your general queries on Python in this forum.
Show Slide

Textbook Companion

FOSSEE team coordinates the TBC project.
Show Slide

Acknowledgment

http://spoken-tutorial.org

Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.


For more details, visit this website.

Show Slide

Thank You

This is Priya from IIT Bombay signing off.

Thanks for watching.

Contributors and Content Editors

Nancyvarkey, Nirmala Venkat, Priyacst