Python3.4.3/C3/Statistics/Englishtimed


00:01  Hello Friends. Welcome to the tutorial on "Statistics" using Python.
00:07  At the end of this tutorial, you will be able to do statistical operations in Python:
00:14  sum a set of numbers and find their mean, median and standard deviation.
00:22  To record this tutorial, I am using Ubuntu Linux 16.04 operating system 
00:29  Python 3.4.3 and IPython 5.1.0 
00:36  To practise this tutorial, you should know how to load data from files,
00:42  use lists and access parts of arrays.
00:47  If not, see the prerequisite Python tutorials on this website. 
00:53  For this tutorial, we will use the data file student_record.txt which we used in the earlier tutorial. 
01:03  You can also find this file in the Code Files link of this tutorial. 
01:08  Please download it to the Home directory and use it.
01:12  We will perform mathematical and logical operations on the data in this file, loaded as an array.
For this, we need to install NumPy.
01:22  NumPy stands for Numerical Python.
01:26  It is a library consisting of precompiled functions for mathematical and numerical routines. 
01:33  NumPy has to be installed separately. 
01:37  Let us first open the Terminal by pressing Ctrl+Alt+T keys simultaneously. 
01:45  Let us install the latest pip.
The pip command is used to install Python libraries.
01:53  Type, sudo apt hyphen get install python3 hyphen pip and press Enter.
02:03  You need root access for the installation, so it asks for the admin password.
02:15  Next, we need to install the numpy library, as we will be using it throughout the tutorial.
02:24  Type, sudo pip3 install numpy is equal to is equal to 1.13.3 and press Enter. 
02:38  The installation is completed successfully. We can see the terminal prompt without any error. 
02:47  Next we will learn about loadtxt() function. 
02:52  To get the data as an array, we use the loadtxt() function. 
02:58  To use the loadtxt() function, we need to import the numpy library first.
03:04  Switch back to the terminal. Now, type ipython3 and press Enter.
03:12  Type import numpy as np and press Enter.
Here np is an alias for numpy; it can be any name.
03:24  Let us load the data from the file student_record.txt as an array. 
03:32  Type, L is equal to np dot loadtxt inside parentheses inside quotes student_record.txt comma usecols is equal to inside parentheses 3 comma 4 comma 5 comma 6 comma 7 comma delimiter is equal to inside quotes semicolon and press Enter.
04:04  Type L and press Enter. 
04:07  We get the output in the form of an array. 
04:11  loadtxt loads data from an external file. 
04:16  delimiter specifies the character that separates the fields of data. usecols specifies the columns to be used.
04:27  loadtxt is a function; delimiter and usecols are its keyword arguments.
04:33  So columns 3, 4, 5, 6 and 7 from student_record.txt are loaded here. Note that column numbering starts from 0.
04:42  The commas between the column numbers are needed because usecols takes a sequence.
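The loadtxt call described above can be sketched as follows. This is only an illustration: the three records below are made-up sample data in an assumed semicolon-separated layout, read from memory with io.StringIO instead of the real student_record.txt.

```python
import io
import numpy as np

# Hypothetical sample records (not the real student_record.txt).
# Assumed layout: fields separated by ';', marks in columns 3 to 7,
# counting columns from 0.
sample = io.StringIO(
    "A;015162;FIRST STUDENT;81;72;41;94;49;337\n"
    "A;015163;SECOND STUDENT;83;42;47;62;72;306\n"
    "B;015164;THIRD STUDENT;29;56;37;48;62;232\n"
)

# usecols picks the five mark columns; delimiter tells loadtxt how to split a line.
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")
print(L)
```

With the real file, the first argument would be the file name 'student_record.txt' in quotes, as typed in the tutorial.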
04:49  As we can see, L is an array. We can get the shape of this array using the shape attribute.
04:58  Type, L dot shape and press Enter.
05:04  We get a tuple giving the number of rows and columns respectively.
05:11  In this example, the array L has one lakh eighty five thousand six hundred and sixty seven (185,667) rows and 5 columns.
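As a small sketch of how shape reads, here is a stand-in array of zeros with 4 rows and 5 columns. The size is an assumption chosen for illustration; the real array is far larger.

```python
import numpy as np

# Stand-in array: 4 rows (students) and 5 columns (subjects).
L = np.zeros((4, 5))

# shape is an attribute, not a function, so there are no parentheses.
rows, columns = L.shape
print(L.shape)  # (4, 5)
```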
05:22  Let us switch back to the student_record.txt file. 
05:28  Let us start applying statistical operations on this data.
How do you find the sum of marks of all subjects for the first student? 
05:39  Switch back to the terminal.
To access the first row in an array, we will type L inside square brackets 0 and press Enter. 
05:54  Now to sum this, type, totalmarks is equal to sum inside parentheses L inside square brackets 0 and press Enter.
06:09  Type totalmarks and press Enter.
We get the sum of the marks of all subjects for the first student.
06:19  Now, to get the mean, we can divide totalmarks by the length of the row.
06:26  Type, totalmarks divided by len inside parentheses L inside square brackets 0 and press Enter. 
06:40  Or simply use the function mean. Type np dot mean inside parentheses L inside square brackets 0 and press Enter. 
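The two ways of getting the mean for the first student can be compared side by side. The marks below are made-up values used only for illustration.

```python
import numpy as np

# Hypothetical marks: one row per student, one column per subject.
L = np.array([[81.0, 72.0, 41.0, 94.0, 49.0],
              [83.0, 42.0, 47.0, 62.0, 72.0]])

totalmarks = sum(L[0])                 # sum of the first row
mean_by_hand = totalmarks / len(L[0])  # divide by the number of subjects
mean_np = np.mean(L[0])                # the same value using np.mean

print(totalmarks, mean_by_hand, mean_np)
```

Both approaches give the same result; np.mean just saves the division.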
06:55  But we have such a large data set.
And calculating the mean for each student one by one is time consuming. 
07:04  Is there a way to reduce the work?
For this, we will look into the documentation of mean. 
07:12  Type, np dot mean question mark and press Enter. Read the text for more information.
07:23  Type q to exit the documentation. 
07:28  In the above example, L is a two-dimensional array, like a matrix.
07:35  We can calculate the mean along each axis of the array.
07:41  Axis 0 runs down the rows and axis 1 runs across the columns.
07:48  To calculate the mean across all columns, that is, one mean per row, we pass the extra parameter 1 for the axis. Passing 0 gives one mean per column.
07:57  Switch back to the terminal. 
08:00  Let us calculate the mean of the marks scored by all the students for each subject.
08:07  Type np dot mean inside parentheses L comma 0 and press Enter. 
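The effect of the axis parameter is easier to see on a tiny array. The marks below are made-up values for two students and three subjects.

```python
import numpy as np

# Hypothetical marks: 2 students (rows) and 3 subjects (columns).
L = np.array([[80.0, 70.0, 60.0],
              [60.0, 50.0, 40.0]])

per_subject = np.mean(L, 0)  # collapses the rows: one mean per column
per_student = np.mean(L, 1)  # collapses the columns: one mean per row

print(per_subject)  # [70. 60. 50.]
print(per_student)  # [70. 50.]
```

The result for axis 0 has as many entries as there are columns, and the result for axis 1 has as many entries as there are rows.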
08:18  Next, we will calculate the median of English marks for all the students. 
08:25  Type L inside square brackets colon comma 0 and press Enter. 
08:35  Note that colon comma zero selects the first column of the array, that is, the English marks.
08:45  To get the median we will simply use the function median. 
08:51  Type np dot median inside parentheses L inside square brackets colon comma 0
Press Enter. 
09:04  For all the subjects, we can calculate the median across all rows using the median function as shown here.
09:13  Type np dot median inside parentheses L comma 0
Press Enter. 
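Both median calls can be tried on a small array. The marks below are made-up values, with the first column standing in for English.

```python
import numpy as np

# Hypothetical marks: 3 students (rows) and 2 subjects (columns).
L = np.array([[80.0, 70.0],
              [60.0, 50.0],
              [40.0, 90.0]])

english = L[:, 0]                    # every row, first column
english_median = np.median(english)  # median of one subject
all_medians = np.median(L, 0)        # median of every subject at once

print(english_median)  # 60.0
print(all_medians)     # [60. 70.]
```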
09:24  Similarly, to calculate the standard deviation, we will use the function std.
09:31  Standard deviation for English subject can be found by typing np dot std inside parentheses L inside square brackets colon comma 0. Press Enter.
09:50  And for all the subjects, across all rows, we type np dot std inside parentheses L comma 0 and press Enter.
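Here is a small sketch of std on made-up values chosen so that the answers are round numbers. Note that np.std divides by the number of values by default (the population standard deviation), not by one less than that.

```python
import numpy as np

# Hypothetical marks: 8 students (rows) and 2 subjects (columns).
L = np.array([[2.0, 1.0],
              [4.0, 1.0],
              [4.0, 1.0],
              [4.0, 1.0],
              [5.0, 3.0],
              [5.0, 3.0],
              [7.0, 3.0],
              [9.0, 3.0]])

first_subject_std = np.std(L[:, 0])  # standard deviation of the first column
all_std = np.std(L, 0)               # standard deviation of every column

print(first_subject_std)  # 2.0
print(all_std)            # [2. 1.]
```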
10:03  Pause the video here, try out the following exercise and resume the video. 
10:09  Refer to the file football.txt, that is available in the Code Files link of this tutorial. 
10:18  Download and save the file in the present working directory. 
10:23  Currently the present working directory is the Home directory. 
10:28  In football.txt, the first column is the player name,
10:34  the second is goals at home and the third is goals away.
10:42  Find the total goals for each player,
the mean of home goals and away goals
10:50  and the standard deviation of home goals and away goals.
10:55  Switch to the terminal. 
10:58  The solution is, first, type, L is equal to np dot loadtxt inside parentheses inside quotes football.txt comma usecols is equal to inside parentheses 1 comma 2 comma delimiter is equal to inside quotes comma. Press Enter. 
11:31  Then, for the total goals of each player, type np dot sum inside parentheses L comma 1 and press Enter.
11:39  For the second, type np dot mean inside parentheses L comma 0 and press Enter.
11:50  For the third, type np dot std inside parentheses L comma 0 and press Enter.
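The whole solution can be run end to end as below. The three players and their goals are made-up sample data read from memory; the real football.txt is assumed to have the same comma-separated layout.

```python
import io
import numpy as np

# Hypothetical sample in the assumed layout of football.txt:
# player name, goals at home, goals away.
sample = io.StringIO(
    "Player A,3,2\n"
    "Player B,5,1\n"
    "Player C,1,3\n"
)

# Skip column 0 (names) and load only the two goal columns.
L = np.loadtxt(sample, usecols=(1, 2), delimiter=",")

totals = np.sum(L, 1)  # total goals for each player (one value per row)
means = np.mean(L, 0)  # mean of home goals and of away goals
stds = np.std(L, 0)    # standard deviation of home goals and of away goals

print(totals)  # [5. 6. 4.]
print(means)   # [3. 2.]
print(stds)
```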
11:59  This brings us to the end of the tutorial.
In this tutorial, we have learnt to do the standard statistical operations like: sum, mean, median and standard deviation in Python. 
12:18  Here are some self assessment questions for you to solve. 
12:23  First. Given a two-dimensional list as shown, how do you calculate the mean of each row?
12:32  Second. Calculate the median of the given list. 
12:37  Third. There is a file with 6 columns. But we want to load text only from columns 2,3,4,5.
How do we specify that? 
12:51  And the answers.
First. To get the mean of each row, we just pass 1 as the second parameter to the function mean:
13:02  np.mean inside parentheses two_dimensional_list comma 1
13:11  Second. We use the function median to calculate the median of the list:
np.median inside parentheses student_marks
13:24  Third. To specify particular columns of a file, we use the parameter usecols is equal to inside parentheses 2 comma 3 comma 4 comma 5
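The three answers can be checked with a short sketch. The lists shown on the slides are not part of this transcript, so the values of two_dimensional_list and student_marks and the six-column data below are made-up stand-ins.

```python
import io
import numpy as np

# Hypothetical stand-ins for the lists shown on the slides.
two_dimensional_list = [[1, 2, 3], [4, 5, 6]]
student_marks = [74, 48, 56, 96, 67]

row_means = np.mean(two_dimensional_list, 1)  # one mean per row
marks_median = np.median(student_marks)       # middle value after sorting

# Hypothetical file with 6 whitespace-separated columns, numbered 0 to 5.
six_columns = io.StringIO("0 1 2 3 4 5\n10 11 12 13 14 15\n")
data = np.loadtxt(six_columns, usecols=(2, 3, 4, 5))

print(row_means)     # [2. 5.]
print(marks_median)  # 67.0
print(data)
```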
13:39  Please post your timed queries in this forum. 
13:43  Please post your general queries on Python in this forum. 
13:48  FOSSEE team coordinates the TBC project. 
13:53  Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.
For more details, visit this website. 
14:05  That's it for the tutorial.
This is Trupti Kini from IIT Bombay signing off. Thank you. 