Python-3.4.3/C3/Statistics/English-timed

Revision as of 15:28, 31 May 2019 by PoojaMoolya

Time
Narration
00:01 Hello friends. Welcome to the tutorial on "Statistics" using Python.
00:07 At the end of this tutorial, you will be able to do statistical operations in Python:
00:14 sum a set of numbers, and find their mean, median and standard deviation.
00:22 To record this tutorial, I am using the Ubuntu Linux 16.04 operating system,
00:29 Python 3.4.3 and IPython 5.1.0.
00:36 To practise this tutorial, you should know how to load data from files,
00:42 use lists and access parts of arrays.
00:47 If not, see the pre-requisite Python tutorials on this website.
00:53 For this tutorial, we will use the data file student_record.txt which we used in the earlier tutorial.
01:03 You can also find this file in the Code Files link of this tutorial.
01:08 Please download it to your Home directory and use it from there.
01:12 We will apply mathematical and logical operations to the array-structured data in this file.

For this, we need to install NumPy.

01:22 NumPy stands for Numerical Python.
01:26 It is a library consisting of pre-compiled functions for mathematical and numerical routines.
01:33 NumPy has to be installed separately.
01:37 Let us first open the Terminal by pressing Ctrl+Alt+T keys simultaneously.
01:45 Let us first install pip for Python 3.

The pip command is used to install Python libraries.

01:53 Type, sudo apt-get install python3 hyphen pip and press Enter.
02:03 You need root access for the installation, since it asks for the admin password.
02:15 Next, we need to install the NumPy library, as we will be using it throughout the tutorial.
02:24 Type, sudo pip3 install numpy is equal to is equal to 1.13.3 and press Enter.
02:38 The installation has completed successfully. The terminal prompt returns without any error.
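As an optional sanity check (a sketch, not a step shown in the video), the following lines confirm that Python 3 can import NumPy and print the installed version. It should report whichever version pip installed, for example 1.13.3 if the command above was used.

```python
import numpy as np

# If this import succeeds, NumPy is installed for this Python interpreter.
print(np.__version__)
```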
02:47 Next, we will learn about the loadtxt() function.
02:52 To get the data as an array, we use the loadtxt() function.
02:58 To use the loadtxt() function, we first need to import the NumPy library.
03:04 Switch back to the terminal. Now, type ipython3 and press Enter.
03:12 Type import numpy as np and press Enter.

Here, np is an alias for numpy. Any valid name can be used, but np is the common convention.

03:24 Let us load the data from the file student_record.txt as an array.
03:32 Type, L is equal to np dot loadtxt inside parentheses inside quotes student_record.txt comma usecols is equal to inside parentheses 3 comma 4 comma 5 comma 6 comma 7 comma delimiter is equal to inside quotes semicolon and press Enter.
04:04 Type L and press Enter.
04:07 We get the output in the form of an array.
04:11 loadtxt loads data from an external file.
04:16 delimiter specifies the character that separates the fields of data.

usecols specifies the columns to be used.

04:27 loadtxt is a function, and delimiter and usecols are its keyword arguments.
04:33 So the columns with indices 3, 4, 5, 6 and 7 (counting from 0) of student_record.txt are loaded here.
04:42 The column numbers are separated by commas because usecols takes a sequence, here a tuple.
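The loadtxt call can be tried as a self-contained script. student_record.txt is not reproduced here, so this sketch uses io.StringIO with two made-up rows: three text fields followed by five marks, separated by semicolons. That layout is an assumption about the real file. loadtxt accepts a file-like object as well as a file name.

```python
import io
import numpy as np

# Hypothetical stand-in for student_record.txt: three text fields, then five marks.
sample = io.StringIO(
    "R001;Asha;Delhi;70;80;90;60;50\n"
    "R002;Ravi;Pune;60;70;80;90;100\n"
)

# usecols keeps only the 0-based columns 3 to 7; delimiter is the field separator.
# The unused text columns are never converted, so they do not cause errors.
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")
print(L)
```

By default, loadtxt converts the selected columns to floating-point numbers.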
04:49 As we can see, L is an array. We can get its dimensions using the shape attribute.
04:58 Type, L dot shape and press Enter.
05:04 We get a tuple giving the number of rows and the number of columns, respectively.
05:11 In this example, the array L has 185,667 (one lakh eighty-five thousand six hundred and sixty-seven) rows and 5 columns.
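The shape tuple can be shown on a small made-up array (3 students, 5 subjects) instead of the full data file. For a two-dimensional array, the tuple can be unpacked directly into a row count and a column count.

```python
import numpy as np

# Hypothetical 3 x 5 array of marks: rows are students, columns are subjects.
L = np.array([
    [70, 80, 90, 60, 50],
    [60, 70, 80, 90, 100],
    [55, 65, 75, 85, 95],
])

rows, cols = L.shape  # (number of rows, number of columns)
print(L.shape)        # (3, 5)
print(rows, cols)     # 3 5
```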
05:22 Let us switch back to the student_record.txt file.
05:28 Let us start applying statistical operations to this data.

How do we find the sum of the marks in all subjects for the first student?

05:39 Switch back to the terminal.

To access the first row in an array, we will type L inside square brackets 0 and press Enter.

05:54 Now, to sum this row, type totalmarks is equal to sum inside parentheses L inside square brackets 0 and press Enter.
06:09 Type totalmarks and press Enter.

We get the sum of the marks in all subjects for the first student.
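The sketch below uses two made-up rows of marks. It compares Python's built-in sum, as used in the narration, with np.sum, which gives the same result. For large arrays, np.sum (or L[0].sum()) is generally faster because the loop runs in compiled code rather than element by element in Python.

```python
import numpy as np

# Hypothetical marks: each row is a student, each column a subject.
L = np.array([
    [70.0, 80.0, 90.0, 60.0, 50.0],
    [60.0, 70.0, 80.0, 90.0, 100.0],
])

first_student = L[0]             # first row: all subjects for the first student
totalmarks = sum(first_student)  # Python's built-in sum, as in the narration
print(totalmarks)                # 350.0
print(np.sum(first_student))     # NumPy's sum gives the same total: 350.0
```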

06:19 Now, to get the mean, we can divide totalmarks by the length of the row, that is, by the number of subjects.
06:26 Type, totalmarks divided by len inside parentheses L inside square brackets 0 and press Enter.
06:40 Or, more simply, use NumPy's mean function. Type np dot mean inside parentheses L inside square brackets 0 and press Enter.
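Both ways of computing the mean should agree. This sketch checks that on one made-up row of marks: once as total divided by length, and once with np.mean.

```python
import numpy as np

# Hypothetical marks for two students in five subjects.
L = np.array([
    [70.0, 80.0, 90.0, 60.0, 50.0],
    [60.0, 70.0, 80.0, 90.0, 100.0],
])

totalmarks = sum(L[0])
manual_mean = totalmarks / len(L[0])  # total divided by the number of subjects
numpy_mean = np.mean(L[0])            # NumPy computes the same average directly

print(manual_mean, numpy_mean)        # 70.0 70.0
```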
06:55 But our data set is very large.

Calculating the mean for each student one by one would be time consuming.

07:04 Is there a way to reduce the work?

To find out, let us look at the documentation of mean.

07:12 Type, np dot mean question mark and press Enter. Read the text for more information.
07:23 Type q to exit the documentation.
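The trailing question mark is an IPython feature. In a plain Python interpreter or script, the built-in help() shows the same docstring. A minimal sketch:

```python
import numpy as np

# Plain-Python equivalent of typing "np.mean?" in IPython.
help(np.mean)

# The docstring is also available as a string, for example to search it.
print("axis" in np.mean.__doc__)  # True: the docstring documents the axis parameter
```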
07:28 In the above example, L is a two-dimensional array, like a matrix.
07:35 We can calculate the mean along either axis of the array.
07:41 Axis 0 refers to the rows and axis 1 to the columns. Taking the mean along axis 0 gives one result per column, and along axis 1 one result per row.
07:48 To calculate the mean across all the columns of each row, that is, one mean per student, we pass 1 as the extra axis parameter.
07:57 Switch back to the terminal.
08:00 Let us calculate the mean of the marks scored by all the students in each subject.
08:07 Type np dot mean inside parentheses L comma 0 and press Enter.
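On a small made-up array, the effect of the axis argument is easy to see. Axis 0 gives one mean per subject (column), and axis 1 gives one mean per student (row). The second argument can be passed positionally, as in the narration, or by the keyword axis.

```python
import numpy as np

# Hypothetical marks: rows are students, columns are subjects.
L = np.array([
    [70.0, 80.0, 90.0, 60.0, 50.0],
    [60.0, 70.0, 80.0, 90.0, 100.0],
])

per_subject = np.mean(L, 0)       # down each column: one value per subject
per_student = np.mean(L, axis=1)  # across each row: one value per student

print(per_subject)  # [65. 75. 85. 75. 75.]
print(per_student)  # [70. 80.]
```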
08:18 Next, we will calculate the median of English marks for all the students.
08:25 Type L inside square brackets colon comma 0 and press Enter.
08:35 Note that colon comma 0 selects the first column of the array, that is, the English marks.
08:45 To get the median we will simply use the function median.
08:51 Type np dot median inside parentheses L inside square brackets colon comma 0 and press Enter.
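The sketch below applies column slicing and the median to a small made-up array. Column 0 stands in for the English marks. With an odd number of values, the median is simply the middle value after sorting.

```python
import numpy as np

# Hypothetical marks; column 0 plays the role of the English marks.
L = np.array([
    [70.0, 80.0, 90.0],
    [60.0, 70.0, 80.0],
    [85.0, 65.0, 75.0],
])

english = L[:, 0]          # ":" selects every row, "0" selects the first column
print(english)             # [70. 60. 85.]
print(np.median(english))  # 70.0, the middle of the sorted values 60, 70, 85
```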

09:04 For all the subjects at once, we can calculate the median across all rows, that is, along axis 0, with the same median function.
09:13 Type np dot median inside parentheses L comma 0 and press Enter.
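This sketch computes the median of every subject at once on four made-up rows. With an even number of rows, np.median returns the average of the two middle values, so the results need not be marks that actually appear in the data.

```python
import numpy as np

# Hypothetical marks for four students (rows) in three subjects (columns).
L = np.array([
    [70.0, 80.0, 90.0],
    [60.0, 70.0, 80.0],
    [85.0, 65.0, 75.0],
    [50.0, 95.0, 55.0],
])

medians = np.median(L, 0)  # axis 0: one median per column, i.e. per subject
print(medians)             # 65, 75 and 77.5 (averages of the two middle values)
```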

09:24 Similarly, to calculate the standard deviation, we use the function std.
09:31 The standard deviation of the English marks can be found by typing np dot std inside parentheses L inside square brackets colon comma 0 and pressing Enter.
09:50 And for every subject, computed across all rows, type np dot std inside parentheses L comma 0 and press Enter.
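By default, np.std computes the population standard deviation, dividing by n (ddof=0). Passing ddof=1 gives the sample standard deviation, which divides by n - 1. The sketch below, on made-up marks, shows both, along with the per-subject form from the narration.

```python
import numpy as np

# Hypothetical marks: three students, two subjects (column 0 = English).
L = np.array([
    [70.0, 80.0],
    [60.0, 70.0],
    [80.0, 90.0],
])

english = L[:, 0]
print(np.std(english))          # population std (ddof=0, the default), about 8.165
print(np.std(english, ddof=1))  # sample std, divides by n - 1: 10.0
print(np.std(L, 0))             # one population std per subject (column)
```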
10:03 Pause the video here, try out the following exercise and resume the video.
10:09 Refer to the file football.txt, which is available in the Code Files link of this tutorial.
10:18 Download and save the file in the present working directory.
10:23 Currently the present working directory is the Home directory.
10:28 In football.txt, the first column is the player name,
10:34 the second is goals at home and the third is goals away.
10:42 Find the total goals for each player,

the mean of home goals and away goals,

10:50 and the standard deviation of home goals and away goals.
10:55 Switch to the terminal.
10:58 The solution is as follows. First, type L is equal to np dot loadtxt inside parentheses inside quotes football.txt comma usecols is equal to inside parentheses 1 comma 2 comma delimiter is equal to inside quotes comma. Press Enter.
11:31 For the total goals of each player, type np dot sum inside parentheses L comma 1 and press Enter.
11:39 For the mean of home goals and away goals, type np dot mean inside parentheses L comma 0 and press Enter.
11:50 For the standard deviation of home goals and away goals, type np dot std inside parentheses L comma 0 and press Enter.
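The whole exercise can be run as one script. football.txt is not reproduced here, so this sketch uses io.StringIO with three made-up players, assuming the same comma-separated layout: name, home goals, away goals.

```python
import io
import numpy as np

# Hypothetical stand-in for football.txt: player name, goals at home, goals away.
sample = io.StringIO(
    "Player A,10,6\n"
    "Player B,4,8\n"
    "Player C,7,7\n"
)

# Load only the numeric columns 1 and 2 (0-based), splitting on commas.
L = np.loadtxt(sample, usecols=(1, 2), delimiter=",")

total_goals = np.sum(L, 1)  # axis 1: home + away goals for each player
mean_goals = np.mean(L, 0)  # axis 0: mean home goals, mean away goals
std_goals = np.std(L, 0)    # axis 0: std of home goals, std of away goals

print(total_goals)  # [16. 12. 14.]
print(mean_goals)   # [7. 7.]
print(std_goals)
```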
11:59 This brings us to the end of the tutorial.

In this tutorial, we have learnt to perform standard statistical operations in Python, such as sum, mean, median and standard deviation.

12:18 Here are some self-assessment questions for you to solve.
12:23 First, given a two-dimensional list as shown, how do you calculate the mean of each row?
12:32 Second, calculate the median of the given list.
12:37 Third, a file has 6 columns, but we want to load data only from columns 2, 3, 4 and 5.

How do we specify that?

12:51 And here are the answers.

First, to get the mean of each row, we pass 1 as the second parameter to the function mean:

13:02 np.mean inside parentheses two_dimensional_list comma 1
13:11 Second, we use the function median to calculate the median of the list:

np.median inside parentheses student_marks

13:24 Third, to specify particular columns of a file, we use the parameter usecols is equal to inside parentheses 2, 3, 4, 5.
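The three answers can be checked in one short script. The list and file contents on the slides are not reproduced here, so two_dimensional_list, student_marks and the six-column sample below are made-up stand-ins. Note that np.mean and np.median accept plain Python lists as well as arrays.

```python
import io
import numpy as np

# Hypothetical two-dimensional list (the one on the slide is not reproduced here).
two_dimensional_list = [[1, 2, 3], [4, 5, 6]]
row_means = np.mean(two_dimensional_list, 1)  # axis 1: mean of each row
print(row_means)                              # [2. 5.]

# Hypothetical list of marks.
student_marks = [50, 80, 65, 90, 70]
print(np.median(student_marks))               # 70.0

# Hypothetical six-column file; keep only the 0-based columns 2, 3, 4 and 5.
sample = io.StringIO("a,b,1,2,3,4\nc,d,5,6,7,8\n")
data = np.loadtxt(sample, usecols=(2, 3, 4, 5), delimiter=",")
print(data.shape)                             # (2, 4)
```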
13:39 Please post your timed queries in this forum.
13:43 Please post your general queries on Python in this forum.
13:48 The FOSSEE team coordinates the TBC project.
13:53 Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.

For more details, visit this website.

14:05 That's it for this tutorial.

This is Trupti Kini from IIT Bombay signing off. Thank you.

Contributors and Content Editors

PoojaMoolya