Difference between revisions of "Python-3.4.3/C2/Parsing-data/English"
Nancyvarkey (Talk | contribs) |
Line 7: | Line 7: | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Show Slide | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Show Slide | ||
− | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Welcome to the spoken tutorial on '''Parsing data'''. | |
− | + | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Welcome to the spoken tutorial on '''Parsing | + | |
|- | |- | ||
Line 20: | Line 18: | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| In this tutorial, we will learn to- | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| In this tutorial, we will learn to- | ||
− | + | * Split a '''string''' using a '''delimiter'''. |
− | * | + | |
* Remove the leading, trailing and all '''whitespaces''' in a '''string''' and | * Remove the leading, trailing and all '''whitespaces''' in a '''string''' and | ||
− | * Convert between different | + | * Convert between different '''built-in datatypes''' |
− | + | ||
− | + | ||
|- | |- | ||
Line 32: | Line 27: | ||
System Specifications | System Specifications | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| To record this tutorial, I am using | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| To record this tutorial, I am using | ||
− | |||
* '''Ubuntu Linux 16.04''' operating system | * '''Ubuntu Linux 16.04''' operating system | ||
* '''Python 3.4.3 '''and | * '''Python 3.4.3 '''and | ||
* '''IPython 5.1.0''' | * '''IPython 5.1.0''' | ||
− | |||
− | |||
|- | |- | ||
Line 56: | Line 48: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| First, let us understand, what is meant by '''parsing | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| First, let us understand what is meant by '''parsing data'''. |
* '''Parsing''' the '''data''' is reading data in text form. | * '''Parsing''' the '''data''' is reading data in text form. | ||
* It is converted into a form which can be used for computations. | * It is converted into a form which can be used for computations. | ||
− | |||
− | |||
|- | |- | ||
Line 71: | Line 61: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next we will learn about '''split()''' | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next we will learn about '''split() function.''' |
− | '''split()''' | + | '''split() function''' breaks up a larger '''string''' into smaller '''strings''' using a defined '''separator'''. |
− | If no argument is specified, then '''whitespace''' is used as default separator. | + | If no '''argument''' is specified, then '''whitespace''' is used as default '''separator'''. |
− | + | Syntax is: '''str dot split''' inside parentheses '''argument''' |
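The behaviour described above can be tried with a minimal sketch. The sample strings and the variable names '''sentence''' and '''line''' are made up for illustration and are not part of the tutorial.

```python
# Hypothetical sample strings, chosen only to illustrate split().
sentence = "Welcome to    Python tutorials"   # several spaces after "to"
line = "one;two;three"

# No argument: any run of whitespace acts as a single separator.
words = sentence.split()

# With an argument: the string is cut at every occurrence of ';'.
fields = line.split(";")

print(words)
print(fields)
```

In both cases the result is a '''list''' of '''strings'''.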
|- | |- | ||
Line 86: | Line 76: | ||
split() function | split() function | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| The '''split''' | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| The '''split function''' '''parses''' a '''string''' and returns a '''list''' of '''tokens'''. |
− | This is called '''string | + | This is called '''string tokenizing'''. |
|- | |- | ||
Line 97: | Line 87: | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''ipython3''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''ipython3''' | ||
− | | style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| Type '''ipython3 '''and press '''Enter'''. |
|- | |- | ||
| style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| '''%pylab '''and press '''Enter.''' | | style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| '''%pylab '''and press '''Enter.''' | ||
− | | style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| Let us initialize the '''pylab''' | + | | style="background-color:#ffffff;border:1pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.014cm;padding-right:0.191cm;"| Let us initialize the '''pylab package'''. |
− | Type | + | Type '''percentage sign pylab '''and press''' Enter.''' |
|- | |- | ||
Line 116: | Line 106: | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| From here onwards, please remember to press the '''Enter''' key after typing every command on the '''terminal'''. | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| From here onwards, please remember to press the '''Enter''' key after typing every command on the '''terminal'''. | ||
− | Let us define a variable '''str1''' as '''string | + | |
+ | Let us define a variable '''str1''' as a '''string''' data type. | ||
Line 122: | Line 113: | ||
− | We can have any number of '''whitespaces '''between '''to '''and '''Python tutorials. | + | We can have any number of '''whitespaces '''between '''to '''and '''Python tutorials'''. |
But all the '''spaces''' are treated as one space. | But all the '''spaces''' are treated as one space. | ||
Line 131: | Line 122: | ||
Highlight output | Highlight output | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now, we are going to '''split''' this string on '''whitespace'''. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now, we are going to '''split''' this '''string''' on '''whitespace'''. |
− | Type, '''str1 | + | Type, '''str1 dot split '''open and close '''parentheses.''' |
− | + | As we can see, we get a '''list''' of '''strings.''' | |
|- | |- | ||
Line 146: | Line 137: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Let us take another example for '''split()''' | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Let us take another example of the '''split() function''' with an '''argument'''. |
Line 153: | Line 144: | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''x.split(';')''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''x.split(';')''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''x | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''x dot split''' inside parentheses inside single quotes '''semicolon'''. |
|- | |- | ||
Line 170: | Line 161: | ||
− | Split '''x''' using space as argument. | + | Split '''x''' using '''space''' as '''argument'''. |
− | Is it same as splitting without an argument? | + | Is it the same as splitting without an '''argument'''? |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal for the solution. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the '''terminal''' for the solution. |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' b <nowiki>= </nowiki>x.split()''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' b <nowiki>= </nowiki>x.split()''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b '''is equal to '''x dot split '''open and close '''parentheses'''. |
− | + | ||
− | + | ||
− | + | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c <nowiki>= </nowiki>x.split(' ')''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c <nowiki>= </nowiki>x.split(' ')''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type ''' c '''is equal to '''x dot split '''inside parentheses and inside single quotes '''space'''. |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''b ''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''b ''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b ''' |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c ''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type,''' c ''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type''' c ''' |
|- | |- | ||
Line 204: | Line 192: | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Show slide: | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Show slide: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Splitting the string without argument will split the string separated by any number of spaces. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Splitting the '''string''' without an '''argument''' will split it wherever it is separated by any number of '''spaces'''. |
− | And giving space as argument will split the sentence specifically on single | + | And giving a '''space''' as the '''argument''' will split the sentence specifically on every single '''whitespace'''. |
|- | |- | ||
Line 215: | Line 203: | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b= str1.split()''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b= str1.split()''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now, we will split this string without argument. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now, we will split this '''string''' without an '''argument'''. |
− | Type | + | Type '''b''' is equal to '''str1 dot split '''open and close '''parentheses'''. |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c=str1.split(' ')''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c=str1.split(' ')''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c''' is equal to '''str1 dot split''' inside '''parentheses''' and inside single quotes '''space'''. |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''b ''' |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''c''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type''' c ''' |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight the output | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight the output | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| As you can see, here '''b''' is not equal to '''c''' since '''c''' has '''whitespaces '''as entries whereas '''b''' has only words'''.''' | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| As you can see, here '''b''' is not equal to '''c''' since '''c''' has empty '''strings''' as entries wherever '''spaces''' are repeated, whereas '''b''' has only words. |
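The difference between the two calls can be checked with a short sketch. The string '''text''' below is an assumed stand-in for '''str1''', whose exact value is not shown in this part of the script.

```python
# Assumed stand-in for str1: three spaces between "to" and "Python".
text = "Welcome to   Python"

b = text.split()      # splits on runs of whitespace
c = text.split(" ")   # splits on every single space

print(b)
print(c)
print(b == c)
```

Each extra space produces an empty '''string''' in '''c''', which is why the two '''lists''' differ.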
Line 243: | Line 231: | ||
'''strip() function''' | '''strip() function''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next we will learn about '''strip''' | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next we will learn about '''strip method.''' |
− | '''strip''' | + | '''strip function''' removes all leading and trailing '''whitespaces''' in a '''string'''. |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''unstripped = " Hello world " ''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''unstripped = " Hello world " ''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Let us define a string by typing | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Let us define a '''string''' by typing |
− | '''unstripped | + | '''unstripped '''is equal to inside double quotes '''space Hello world space''' |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''unstripped.strip()''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''unstripped.strip()''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now to remove the | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now to remove the '''whitespace''', type, '''unstripped dot strip''' open and close '''parentheses.''' |
− | + | ||
− | + | ||
− | + | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight output | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight output | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| We can see that '''strip''' removes all the ''' | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| We can see that '''strip''' removes all the '''whitespaces''' in the beginning and at the end of the '''string'''. |
− | After splitting and stripping we get a list of strings with leading and trailing spaces stripped off. | + | After '''splitting''' and '''stripping''' we get a '''list''' of '''strings''' with leading and trailing '''spaces''' stripped off. |
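A minimal sketch of stripping, alone and combined with splitting. The variable '''raw''' and its contents are made up for illustration.

```python
# The string from the narration, with leading and trailing spaces.
unstripped = "  Hello world  "
stripped = unstripped.strip()
print(repr(stripped))

# Hypothetical record with untidy spacing around a ';' separator.
raw = " a ; b ;c "

# Split first, then strip each piece.
cleaned = [part.strip() for part in raw.split(";")]
print(cleaned)
```

Note that '''strip''' leaves the '''space''' inside "Hello world" untouched; only the ends are trimmed.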
<nowiki><<PAUSE>></nowiki> | <nowiki><<PAUSE>></nowiki> | ||
Line 273: | Line 258: | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now we shall look at converting '''strings''' into '''floats''' and '''integers'''. | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Now we shall look at converting '''strings''' into '''floats''' and '''integers'''. | ||
− | Type, '''mark | + | Type, '''mark ''underscore'' str '''is equal to inside double quotes '''1.25 ''' |
− | Note that 1.25 is a '''string''' and not a '''float''' as it is within double quotes. | + | Note that '''1.25''' is a '''string''' and not a '''float''' as it is within double quotes. |
|- | |- | ||
Line 283: | Line 268: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''mark | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''mark '''is equal to''' float '''inside '''parentheses mark underscore str'''. |
Line 290: | Line 275: | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type(mark_str)''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type(mark_str)''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type '''inside parentheses '''mark underscore str'''. |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
+ | This tells you the '''datatype''' of '''mark_str''' i.e. '''string.''' | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type(mark)''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type(mark)''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''type '''inside parentheses '''mark '''. |
− | This shows '''mark''' is a '''float''' | + | This shows '''mark''' is a '''float datatype.''' |
|- | |- | ||
Line 309: | Line 291: | ||
− | Now we can perform '''mathematical | + | Now we can perform '''mathematical operations''' on them. |
|- | |- | ||
Line 321: | Line 303: | ||
− | What happens if you type, '''int | + | What happens if you type, '''int''' inside parentheses inside double quotes '''1.25''' in the '''terminal'''? |
|- | |- | ||
Line 332: | Line 314: | ||
Highlight '''ValueError''' | Highlight '''ValueError''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''int | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''int '''inside parentheses inside double quotes '''1.25''' |
Line 345: | Line 327: | ||
− | Type | + | Type '''dcml underscore str''' is equal to inside double quotes '''1.25'''. |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt = float(dcml_str)''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt = float(dcml_str)''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt '''is equal to''' float '''inside parentheses '''dcml underscore str.''' |
Line 356: | Line 338: | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''flt ''' |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''number = int(flt)''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''number = int(flt)''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''number | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type, '''number '''is equal to''' int '''inside parentheses '''flt ''' |
Line 370: | Line 352: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Type '''number''' |
− | + | We get the output as an '''integer'''. |
This is how we should convert '''strings''' into '''floats''' and '''integers'''. | This is how we should convert '''strings''' into '''floats''' and '''integers'''. | ||
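The conversion steps above can be put together in one sketch. The flag '''raised''' is a helper name introduced here only to record that the error occurred.

```python
dcml_str = "1.25"          # a string, because it is inside quotes

flt = float(dcml_str)      # string -> float
print(type(dcml_str), type(flt))

# Converting the string directly to int fails with ValueError.
raised = False
try:
    int(dcml_str)
except ValueError:
    raised = True
print(raised)

# Going through float first works; int() drops the fractional part.
number = int(flt)
print(number)
```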
Line 383: | Line 365: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next, we will use a data file | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Next, we will use a data file to''' parse '''the''' data.''' |
− | Let me open the file '''student | + | Let me open the file '''student underscore record.txt''' in a text editor. |
|- | |- | ||
Line 393: | Line 375: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| A file '''student | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| A file '''student underscore record.txt '''is available in the '''Code files''' link of this tutorial. |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
+ | Please download it to your '''Home directory''' and use it. | ||
|- | |- | ||
Line 406: | Line 385: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| We will first '''read''' the '''file''' line by line and '''parse''' each record in this file. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| We will first '''read''' the '''file''' line by line and '''parse''' each '''record''' in this '''file'''. |
− | It contains records of students and their marks in the '''State Secondary Board Examination'''. | + | It contains '''records''' of students and their marks in the '''State Secondary Board Examination'''. |
Line 427: | Line 406: | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Each line in the '''file''' is a set of fields separated by '''semicolons'''. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Each line in the '''file''' is a set of '''fields''' separated by '''semicolons'''. |
Line 433: | Line 412: | ||
− | The following are the fields in any given line. | + | The following are the '''fields''' in any given line. |
− | * Region Code | + | * '''Region Code''' |
− | * Roll Number | + | * '''Roll Number''' |
− | * Name | + | * '''Name''' |
* Marks of 5 subjects | * Marks of 5 subjects | ||
* Total marks | * Total marks | ||
− | |||
− | |||
|- | |- | ||
Line 463: | Line 440: | ||
− | The '''for loop''' will process the student record and split the fields of each record. | + | The '''for loop''' will process the student '''record''' and split the fields of each '''record'''. |
|- | |- | ||
Line 475: | Line 452: | ||
|- | |- | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Highlight the code for this narration. |
'''if region_code == "A": math_marks_A.append(math_mark)''' | '''if region_code == "A": math_marks_A.append(math_mark)''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Then it is appended and stored as a list in a variable '''math | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Then it is appended and stored as a '''list''' in a variable '''math underscore marks underscore A '''for region code '''A'''. |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Save python file as''' marks.py''' | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Save python file as''' marks.py''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Save the file as''' marks.py '''in the | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Save the file as''' marks.py '''in the '''Home directory'''. |
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to terminal | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to terminal | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the '''terminal'''. |
|- | |- | ||
Line 501: | Line 478: | ||
− | Now we have all the math marks for region '''A''' in the list '''math | + | Now we have all the math marks for region '''A''' in the list '''math underscore marks underscore A'''. |
|- | |- | ||
Line 512: | Line 489: | ||
Highlight '''len(math_marks_A) ''' | Highlight '''len(math_marks_A) ''' | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Add the below lines to calculate the mean of math marks for region '''A'''. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Add the below lines to calculate the '''mean''' of math marks for region '''A'''. |
Line 526: | Line 503: | ||
|- | |- | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to terminal | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to terminal | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the terminal. | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| Switch to the '''terminal'''. |
|- | |- | ||
Line 537: | Line 514: | ||
− | Here the mean value for region '''A''' is calculated roughly for 1 lakh | + | Here the '''mean''' value for region '''A''' is calculated roughly for 1 lakh 80 thousand '''records'''. |
Line 559: | Line 536: | ||
# '''Tokenize''' a '''string''' | # '''Tokenize''' a '''string''' | ||
# '''Split''' a '''string''' separated by '''delimiters''' with '''split()''' '''function''' | # '''Split''' a '''string''' separated by '''delimiters''' with '''split()''' '''function''' | ||
− | |||
− | |||
|- | |- | ||
Line 567: | Line 542: | ||
Summary slide | Summary slide | ||
| style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| | ||
− | # Remove '''whitespaces''' using the '''strip() ''' | + | # Remove '''whitespaces''' using the '''strip() function.''' |
# Convert '''datatypes''' of numbers from one type to another | # Convert '''datatypes''' of numbers from one type to another | ||
# '''Parse''' input '''data''' and perform computations on it. | # '''Parse''' input '''data''' and perform computations on it. | ||
− | |||
− | |||
|- | |- | ||
Line 582: | Line 555: | ||
− | # How do you split the string “Guido;Rossum;Python" to get the words | + | # How do you split the string “Guido;Rossum;Python" to get the words? |
Line 589: | Line 562: | ||
Evaluation | Evaluation | ||
− | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| 2. What does int inside paranthesis inside double quotes 20.0 produce | + | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| 2. What does int inside paranthesis inside double quotes 20.0 produce? |
|- | |- | ||
Line 597: | Line 570: | ||
Solutions | Solutions | ||
+ | | style="background-color:#ffffff;border:0.5pt solid #000001;padding-top:0cm;padding-bottom:0cm;padding-left:0.088cm;padding-right:0.191cm;"| And the answers- | ||
− | + | # '''line.split''' inside '''parentheses''' inside single quotes '''semicolon''' |
− | + | # '''int''' inside '''parentheses''' inside double quotes '''20.0''' will give an error, because a '''string''' containing a decimal point cannot be converted directly into an '''integer'''. |
− | + | ||
− | + | ||
− | # int inside | + | |
− | + | ||
− | + | ||
|- | |- |
Latest revision as of 20:36, 6 May 2018
|
|
Show Slide | Welcome to the spoken tutorial on Parsing data. |
Show Slide
Objectives
|
In this tutorial, we will learn to-
|
Show slide
System Specifications |
To record this tutorial, I am using
|
Show Slide
Prerequisite slide |
To practice this tutorial, you should know how to use lists.
|
Show Slide
Parsing Data
|
First, let us understand what is meant by parsing data.
|
Show Slide
split() function
|
Next we will learn about split() function.
|
Show Slide
split() function |
The split function parses a string and returns a list of tokens.
|
Press Ctrl+Alt+T keys | Let us first open the terminal by pressing Ctrl+Alt+T keys simultaneously. |
Type ipython3 | Type ipython3 and press Enter. |
%pylab and press Enter. | Let us initialize the pylab package.
|
str1 = "Welcome to Python tutorials"
|
From here onwards, please remember to press the Enter key after typing every command on the terminal.
But all the spaces are treated as one space. |
str1.split()
|
Now, we are going to split this string on whitespace.
|
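As a minimal sketch of what the interpreter returns for this step: the string below is a stand-in with extra spaces added for illustration (the repeated spaces are an assumption, not the exact string typed in the video).

```python
# Illustrative string; the repeated spaces are an assumption for this sketch
str1 = "Welcome to  Python   tutorials"

# With no argument, split() breaks the string on any run of whitespace
words = str1.split()
print(words)  # ['Welcome', 'to', 'Python', 'tutorials']
```

Because runs of spaces are treated as one separator, no empty strings appear in the result.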
Type
x = "08-26-2009;08-27-2009;08-29-2009"
|
Let us take another example for split() function with argument.
|
Type x.split(';') | Type, x dot split inside parentheses inside single quotes semicolon. |
Point to the output | We get a list of strings separated by comma. |
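For reference, the same call can be run as a plain Python script. The variable name dates is introduced here only for illustration.

```python
x = "08-26-2009;08-27-2009;08-29-2009"

# Passing ';' tells split() to use the semicolon as the delimiter
dates = x.split(';')
print(dates)  # ['08-26-2009', '08-27-2009', '08-29-2009']
```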
Show Slide
|
Pause the video.
|
Switch to the terminal | Switch to the terminal for the solution. |
Type, b = x.split() | Type b is equal to x dot split open and close parentheses. |
Type, c = x.split(' ') | Type c is equal to x dot split inside parentheses and inside single quotes space. |
Type, b | Type b |
Type, c | Type c |
Highlight the output | We can see that, for this string, splitting without an argument gives the same result as giving a space as the argument. |
Show slide: | Splitting the string without an argument splits it wherever there is a run of any number of whitespace characters.
|
Type str1 | Let us recall the variable str1. |
Type b= str1.split() | Now, we will split this string without argument.
|
Type c=str1.split(' ') | Type c is equal to str1 dot split inside parentheses and inside single quotes space. |
Type b | Type b |
Type c | Type c |
Highlight the output | As you can see, here b is not equal to c since c has empty strings as entries, whereas b has only words.
|
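The difference between the two calls can be seen in a short sketch. The string here is a stand-in with one doubled space, which is an assumption made so that the effect is visible.

```python
# Illustrative string with two spaces between "to" and "Python" (an assumption)
str1 = "Welcome to  Python tutorials"

b = str1.split()     # splits on runs of whitespace
c = str1.split(' ')  # splits on every single space

print(b)  # ['Welcome', 'to', 'Python', 'tutorials']
print(c)  # ['Welcome', 'to', '', 'Python', 'tutorials']
print(b == c)  # False
```

Each extra space in the input produces one empty string in c, which is why the two lists differ.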
show slide
strip() function |
Next we will learn about strip method.
|
Type unstripped = " Hello world " | Let us define a string by typing
unstripped is equal to inside double quotes space Hello world space |
Type unstripped.strip() | Now to remove the whitespace, type, unstripped dot strip open and close parentheses. |
Highlight output | We can see that strip removes all the whitespaces in the beginning and at the end of the string.
After splitting and stripping we get a list of strings with leading and trailing spaces stripped off. <<PAUSE>> |
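A minimal sketch of stripping on its own and of splitting followed by stripping. The record string and the names record and fields are hypothetical, chosen only to show the pattern.

```python
unstripped = " Hello world "
# strip() removes leading and trailing whitespace, not the space inside
print(repr(unstripped.strip()))  # 'Hello world'

# Hypothetical record with padding around each field
record = " A ; 015163 ; JOSEPH RAJ S "
fields = [field.strip() for field in record.split(';')]
print(fields)  # ['A', '015163', 'JOSEPH RAJ S']
```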
Type mark_str = "1.25" | Now we shall look at converting strings into floats and integers.
Type, mark underscore str is equal to inside double quotes 1.25
|
Type mark = float(mark_str)
|
Type, mark is equal to float inside parentheses mark underscore str.
|
Type type(mark_str) | Type type inside parentheses mark underscore str.
|
Type type(mark) | Type type inside parentheses mark .
This shows mark is a float datatype. |
Highlight the output | We can see that string is converted to float.
|
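The conversion steps above can be sketched as a plain Python script:

```python
mark_str = "1.25"
mark = float(mark_str)  # convert the string to a float

print(type(mark_str))  # <class 'str'>
print(type(mark))      # <class 'float'>
```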
Show Slide
Exercise 2
|
Pause the video. Try this exercise and then resume the video.
|
Switch to terminal | Switch to the terminal for the solution. |
Type int("1.25")
|
Type, int inside parentheses inside double quotes 1.25
|
Type dcml_str = "1.25" | Let us see the correct solution for this.
|
Type flt = float(dcml_str) | Type flt is equal to float inside parentheses dcml underscore str.
|
Type flt | Type flt |
Type number = int(flt) | Type, number is equal to int inside parentheses flt
|
Type number
|
Type number
We got the output as integer. This is how we should convert strings into floats and integers. <<PAUSE>> |
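A short sketch of this exercise as a script. The try/except wrapper is added here only so that the failing call does not stop the script; in the interpreter the error is simply displayed.

```python
dcml_str = "1.25"

# int() cannot parse a string that contains a decimal point
try:
    int(dcml_str)
except ValueError as err:
    print("ValueError:", err)

# Convert to float first, then to int (int() drops the fractional part)
flt = float(dcml_str)
number = int(flt)
print(number)  # 1
```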
Open the file text editor.
|
Next, we will use a data file to parse the data.
|
Show text: student_record.txt is available in the Code files link.
|
A file student underscore record.txt is available in the Code files link of this tutorial.
|
Scroll down and show the records
|
We will first read the file line by line and parse each record in this file.
It contains records of students and their marks in the State Secondary Board Examination.
|
Highlight A;015163;JOSEPH RAJ S;083;042;47;00;72;244
|
Each line in the file is a set of fields separated by semicolons.
|
Open text editor | Open a new text editor. |
Copy paste the code from text editor | Type the code as shown. |
Highlight
for line in open("student_record.txt"): fields = line.split(";") |
Let me explain this program.
|
Highlight
math_mark = float(math_mark_str)
|
The math marks are then converted to float. |
Highlight the code for this narration.
|
Then it is appended and stored as a list in a variable math underscore marks underscore A for region code A. |
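The loop described above can be sketched as follows. To keep the sketch self-contained it iterates over an in-memory list instead of opening student_record.txt. Only the first record comes from this tutorial; the other two are made up. The position of the math mark (field index 5) is an assumption, since the full program is not reproduced here.

```python
# First record is from the tutorial; the other two are hypothetical
lines = [
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244\n",
    "B;015164;HYPOTHETICAL NAME;070;055;60;00;65;250\n",
    "A;015165;ANOTHER NAME;060;050;53;00;70;233\n",
]

MATH_FIELD = 5  # assumed index of the math mark within a record

math_marks_A = []
for line in lines:
    fields = line.split(";")           # split the record into fields
    region_code = fields[0].strip()    # remove any surrounding whitespace
    math_mark = float(fields[MATH_FIELD])
    if region_code == "A":
        math_marks_A.append(math_mark)

print(math_marks_A)  # [47.0, 53.0]
```

With the real data file, the list would be replaced by open("student_record.txt") as in the highlighted code.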
Save python file as marks.py | Save the file as marks.py in the Home directory. |
Switch to terminal | Switch to the terminal. |
Type, %run marks.py | Execute the file with percentage sign run space marks.py. |
Switch to editor
|
Switch back to the editor.
|
Add in the marks.py file
math_marks_mean = sum(math_marks_A) / len(math_marks_A)
Highlight len(math_marks_A) |
Add the below lines to calculate the mean of math marks for region A.
|
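A minimal sketch of the mean calculation, using a small hypothetical list in place of the marks collected from the file. The check for an empty list is an addition in this sketch, to avoid dividing by zero when no record matches the region.

```python
# Hypothetical marks standing in for the values parsed from the file
math_marks_A = [47.0, 53.0, 62.0]

if math_marks_A:
    # mean = total of the marks divided by the number of marks
    math_marks_mean = sum(math_marks_A) / len(math_marks_A)
    print(math_marks_mean)  # 54.0
else:
    print("No records found for region A")
```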
Press ctrl + s | Let us save the file. |
Switch to terminal | Switch to the terminal. |
Type, %run marks.py | Execute the file again with percentage sign run space marks.py. |
Highlight output | Hence we got our final output.
|
Show Slide
Summary slide
|
This brings us to the end of this tutorial.
|
Show Slide
Summary slide |
|
Show Slide
Evaluation
|
Here are some self-assessment questions for you to solve.
|
Show Slide
Evaluation |
2. What does int inside parentheses inside double quotes 20.0 produce? |
Show Slide
|
And the answers-
|
Show Slide
Forum |
Please post your timed queries in this forum. |
Show Slide
Fossee Forum |
Please post your general queries on Python in this forum. |
Show Slide
Textbook Companion |
FOSSEE team coordinates the TBC project. |
Show Slide
Acknowledgment |
Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.
|
Show Slide
Thank You |
This is Priya from IIT Bombay signing off.
Thanks for watching. |