Python-3.4.3/C2/Parsing-data/English
|
|
Show Slide | Welcome to the spoken tutorial on Parsing data. |
Show Slide
Objectives
|
In this tutorial, we will learn to-
|
Show slide
System Specifications |
To record this tutorial, I am using
|
Show Slide
Prerequisite slide |
To practice this tutorial, you should know how to use lists.
|
Show Slide
Parsing Data
|
First, let us understand what is meant by parsing data.
|
Show Slide
split() function
|
Next, we will learn about the split() function.
|
Show Slide
split() function |
The split() function parses a string and returns a list of tokens.
|
Press Ctrl+Alt+T keys | Let us first open the terminal by pressing Ctrl+Alt+T keys simultaneously. |
Type ipython3 | Type ipython3 and press Enter. |
Type %pylab and press Enter. | Let us initialize the pylab package.
|
str1 = "Welcome to Python tutorials"
|
From here onwards, please remember to press the Enter key after typing every command on the terminal.
But all the spaces are treated as one space. |
str1.split()
|
Now, we are going to split this string on whitespace.
|
Type
x = "08-26-2009;08-27-2009;08-29-2009"
|
Let us take another example of the split() function, this time with an argument.
|
Type x.split(';') | Type, x dot split inside parentheses inside single quotes semicolon. |
Point to the output | We get a list of strings, one for each date between the semicolons. |
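The two calls seen so far can be tried together. This is only a small sketch using the same strings as in the tutorial; the variable names words and dates are chosen here for illustration.

```python
# Splitting with no argument and with an explicit separator.
str1 = "Welcome to Python tutorials"
words = str1.split()      # no argument: split on runs of whitespace

x = "08-26-2009;08-27-2009;08-29-2009"
dates = x.split(";")      # split at every semicolon

print(words)
print(dates)
```

In both cases the result is a list of strings, and the separator itself does not appear in any element.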
Show Slide
|
Pause the video.
|
Switch to the terminal | Switch to the terminal for the solution. |
Type, b = x.split() | Type b is equal to x dot split open and close parentheses. |
Type, c = x.split(' ') | Type c is equal to x dot split inside parentheses and inside single quotes space. |
Type, b | Type b |
Type, c | Type c |
Highlight the output | We can see that, for this string, splitting without an argument gives the same result as giving a space as the argument. |
Show slide: | Splitting a string without an argument splits it on any run of one or more whitespace characters.
|
Type str1 | Let us recall the variable str1. |
Type b= str1.split() | Now, we will split this string without an argument.
|
Type c=str1.split(' ') | Type c is equal to str1 dot split inside parentheses and inside single quotes space. |
Type b | Type b |
Type c | Type c |
Highlight the output | As you can see, here b is not equal to c, since c has empty strings as entries wherever the string contains consecutive spaces, whereas b has only words.
|
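The difference can be seen with a short sketch. The string below is a hypothetical one with three spaces between "to" and "Python", chosen only to make the effect visible; it is not necessarily the exact string used in the recording.

```python
# Hypothetical string: three spaces between "to" and "Python".
spaced = "Welcome to   Python tutorials"

b = spaced.split()       # runs of whitespace count as a single separator
c = spaced.split(" ")    # every single space is a separator

print(b)       # ['Welcome', 'to', 'Python', 'tutorials']
print(c)       # ['Welcome', 'to', '', '', 'Python', 'tutorials']
print(b == c)  # False
```

Each extra space produces one empty string in c, which is why the two lists differ.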
show slide
strip() function |
Next, we will learn about the strip() method.
|
Type unstripped = " Hello world " | Let us define a string by typing
unstripped is equal to inside double quotes space Hello world space |
Type unstripped.strip() | Now to remove the whitespace, type, unstripped dot strip open and close parentheses. |
Highlight output | We can see that strip removes all the whitespace at the beginning and at the end of the string.
After splitting and stripping we get a list of strings with leading and trailing spaces stripped off. <<PAUSE>> |
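Splitting and stripping can be combined as sketched below. The record string is a hypothetical one with padding added around each field, and the names record and stripped_fields are used only for this illustration.

```python
# Hypothetical record with extra spaces around each field.
record = " A ; 015163 ; JOSEPH RAJ S "

fields = record.split(";")                             # split on semicolons
stripped_fields = [field.strip() for field in fields]  # trim both ends of each field

print(stripped_fields)
```

Note that strip() only touches the two ends of each string, so the spaces inside the name are kept.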
Type mark_str = "1.25" | Now we shall look at converting strings into floats and integers.
Type, mark underscore str is equal to inside double quotes 1.25
|
Type mark = float(mark_str)
|
Type, mark is equal to float inside parentheses mark underscore str.
|
Type type(mark_str) | Type type inside parentheses mark underscore str.
|
Type type(mark) | Type type inside parentheses mark .
This shows mark is a float datatype. |
Highlight the output | We can see that the string is converted to a float.
|
Show Slide
Exercise 2
|
Pause the video. Try this exercise and then resume the video.
|
Switch to terminal | Switch to the terminal for the solution. |
Type int("1.25")
|
Type, int inside parentheses inside double quotes 1.25
|
Type dcml_str = "1.25" | Let us see the correct solution for this.
|
Type flt = float(dcml_str) | Type flt is equal to float inside parentheses dcml underscore str.
|
Type flt | Type flt |
Type number = int(flt) | Type, number is equal to int inside parentheses flt
|
Type number
|
Type number
We got the output as an integer. This is how we should convert strings into floats and integers. <<PAUSE>> |
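The two attempts can be put side by side in a short sketch. It assumes the same string as above; the try and except handling is added here only so that the failed conversion does not stop the script.

```python
dcml_str = "1.25"

try:
    # int() does not accept a string that contains a decimal point.
    number = int(dcml_str)
except ValueError as err:
    print("int() failed:", err)
    # Convert to float first, then to int; int() drops the fractional part.
    number = int(float(dcml_str))

print(number, type(number))
```

Here int() truncates the float towards zero, so 1.25 becomes 1.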
Open the file text editor.
|
Next, we will use a data file to parse the data.
|
Show text: student_record.txt is available in the Code files link.
|
A file student underscore record.txt is available in the Code files link of this tutorial.
|
Scroll down and show the records
|
We will first read the file line by line and parse each record in this file.
It contains records of students and their marks in the State Secondary Board Examination.
|
Highlight A;015163;JOSEPH RAJ S;083;042;47;00;72;244
|
Each line in the file is a set of fields separated by semicolons.
|
Open text editor | Open a new text editor. |
Copy paste the code from text editor | Type the code as shown. |
Highlight
for line in open("student_record.txt"):
    fields = line.split(";") |
Let me explain this program.
|
Highlight
math_mark = float(math_mark_str)
|
The math marks are then converted to float. |
Highlight the code for this narration.
|
Then it is appended and stored as a list in a variable math underscore marks underscore A for region code A. |
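The steps described above can be sketched as follows. This is not necessarily the exact code in marks.py. To keep the example runnable without the data file, the lines are held in a list; the second record is made up, and the position of the math mark (the sixth field, index 5) is an assumption.

```python
# Sketch of the parsing loop. In marks.py the lines come from
# open("student_record.txt"); a list is used here so the example runs alone.
sample_lines = [
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244\n",       # record shown in the tutorial
    "B;000001;HYPOTHETICAL NAME;050;060;65;70;80;325\n",  # made-up record
]

MATH_FIELD = 5  # assumption: the sixth field holds the math mark

math_marks_A = []
for line in sample_lines:
    fields = line.split(";")                    # one string per field
    region_code = fields[0].strip()             # first field is the region code
    math_mark_str = fields[MATH_FIELD].strip()  # math mark, still a string
    math_mark = float(math_mark_str)            # convert the mark to float
    if region_code == "A":
        math_marks_A.append(math_mark)          # keep only region A

print(math_marks_A)
```

Replacing sample_lines with open("student_record.txt") gives the file-based version.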
Save python file as marks.py | Save the file as marks.py in the Home directory. |
Switch to terminal | Switch to the terminal. |
Type, %run marks.py | Execute the file with percentage sign run space marks.py. |
Switch to editor
|
Switch back to the editor.
|
Add in the marks.py file
math_marks_mean = sum(math_marks_A) / len(math_marks_A)
Highlight len(math_marks_A) |
Add this line to calculate the mean of the math marks for region A.
|
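The mean calculation can be tried on its own with a small sketch. The marks below are hypothetical values, and the check for an empty list is an addition made here to avoid dividing by zero; it is not part of the tutorial's code.

```python
math_marks_A = [47.0, 65.0, 80.0]  # hypothetical marks for region A

if math_marks_A:
    # Mean = total of the marks divided by the number of marks.
    math_marks_mean = sum(math_marks_A) / len(math_marks_A)
else:
    math_marks_mean = None  # no records found for this region

print(math_marks_mean)
```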
Press ctrl + s | Let us save the file. |
Switch to terminal | Switch to the terminal. |
Type, %run marks.py | Execute the file again with percentage sign run space marks.py. |
Highlight output | Hence we got our final output.
|
Show Slide
Summary slide
|
This brings us to the end of this tutorial.
|
Show Slide
Summary slide |
|
Show Slide
Evaluation
|
Here are some self-assessment questions for you to solve.
|
Show Slide
Evaluation |
2. What does int inside parentheses inside double quotes 20.0 produce? |
Show Slide
|
And the answers-
|
Show Slide
Forum |
Please post your timed queries in this forum. |
Show Slide
Fossee Forum |
Please post your general queries on Python in this forum. |
Show Slide
Textbook Companion |
FOSSEE team coordinates the TBC project. |
Show Slide
Acknowledgment |
Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.
|
Show Slide
Thank You |
This is Priya from IIT Bombay signing off.
Thanks for watching. |