Python-3.4.3/C3/Statistics/English-timed
00:01 | Hello Friends. Welcome to the tutorial on "Statistics" using Python. |
00:07 | At the end of this tutorial, you will be able to - Do statistical operations in Python |
00:14 | Sum a set of numbers and find their mean, median and standard deviation |
00:22 | To record this tutorial, I am using Ubuntu Linux 16.04 operating system |
00:29 | Python 3.4.3 and IPython 5.1.0 |
00:36 | To practise this tutorial, you should know how to load data from files |
00:42 | use Lists and access parts of Arrays |
00:47 | If not, see the pre-requisite Python tutorials on this website. |
00:53 | For this tutorial, we will use the data file student_record.txt which we used in the earlier tutorial. |
01:03 | You can also find this file in the Code Files link of this tutorial. |
01:08 | Please download it to the Home directory and use it. |
01:12 | We will use mathematical and logical operations on this array structured file.
For this, we need to install NumPy. |
01:22 | NumPy stands for Numerical Python. |
01:26 | It is a library consisting of pre-compiled functions for mathematical and numerical routines. |
01:33 | NumPy has to be installed separately. |
01:37 | Let us first open the Terminal by pressing Ctrl+Alt+T keys simultaneously. |
01:45 | Let us install the latest pip.
The pip command is used to install Python libraries. |
01:53 | Type, sudo apt-get install python3 hyphen pip and press Enter. |
02:03 | You need root access for the installation, as it asks for the admin password. |
02:15 | Next, we need to install the numpy library, as we will be using it throughout the tutorial. |
02:24 | Type, sudo pip3 install numpy is equal to is equal to 1.13.3 and press Enter. |
02:38 | The installation is completed successfully. We can see the terminal prompt without any error. |
02:47 | Next we will learn about loadtxt() function. |
02:52 | To get the data as an array, we use the loadtxt() function. |
02:58 | For loadtxt() function, we need to import numpy library first. |
03:04 | Switch back to the terminal. Now, type ipython3 and press Enter. |
03:12 | Type import numpy as np and press Enter.
Here, np is an alias for numpy. The alias can be any name. |
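As a quick check that the installation and the import worked, something like the following can be typed at the prompt. This is only a sketch; the version printed depends on what is installed on your machine.

```python
import numpy as np  # np is just a conventional alias; any name would work

# Print the installed NumPy version (for example, the one pinned with pip3).
print(np.__version__)

# A tiny array operation to confirm the library is usable.
values = np.array([1, 2, 3])
print(values.sum())
```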
03:24 | Let us load the data from the file student_record.txt as an array. |
03:32 | Type, L is equal to np dot loadtxt inside parentheses inside quotes student_record.txt comma usecols is equal to inside parentheses 3 comma 4 comma 5 comma 6 comma 7 comma delimiter is equal to inside quotes semicolon and Press Enter. |
04:04 | Type L and press Enter. |
04:07 | We get the output in the form of an array. |
04:11 | loadtxt loads data from an external file. |
04:16 | delimiter specifies the character by which the fields of data are separated. usecols specifies the columns to be used. |
04:27 | loadtxt is a function. delimiter and usecols are its keyword arguments. |
04:33 | So columns 3,4,5,6,7 from student_record.txt are loaded here. |
04:42 | The 'comma' between column numbers is added because usecols is a sequence. |
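The loadtxt call described above can be sketched as follows. The contents of student_record.txt are not reproduced in this tutorial, so the three rows below are hypothetical stand-ins, and an in-memory text buffer is used in place of the real file.

```python
import io
import numpy as np

# Hypothetical rows standing in for student_record.txt (assumed layout:
# three descriptive fields followed by five marks, separated by semicolons).
sample = io.StringIO(
    "A;0001;STUDENT ONE;53;78;62;44;81\n"
    "A;0002;STUDENT TWO;67;59;71;38;90\n"
    "B;0003;STUDENT THREE;40;82;55;61;73\n"
)

# Columns are counted from 0, so usecols=(3, 4, 5, 6, 7) selects the marks.
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")
print(L)
```

With the real file, the buffer would be replaced by the file name in quotes, as typed in the tutorial.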
04:49 | As we can see, L is an array. We can get the shape of this array using shape. |
04:58 | Type, L dot shape and press Enter. |
05:04 | We get a tuple giving the number of rows and columns respectively. |
05:11 | In this example, the array L has one lakh eighty five thousand six hundred and sixty seven rows and 5 columns. |
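To illustrate what shape returns, here is a small sketch with a hypothetical 3 by 5 array of marks. The values are made up for illustration and are far smaller than the real data set.

```python
import numpy as np

# Hypothetical marks: 3 students (rows) and 5 subjects (columns).
marks = np.array([
    [53, 78, 62, 44, 81],
    [67, 59, 71, 38, 90],
    [40, 82, 55, 61, 73],
], dtype=float)

# shape is an attribute, not a function, so there are no parentheses.
rows, cols = marks.shape
print(marks.shape)
print(rows, cols)
```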
05:22 | Let us switch back to the student_record.txt file. |
05:28 | Let us start applying statistical operations on this data.
How do you find the sum of marks of all subjects for the first student? |
05:39 | Switch back to the terminal.
To access the first row in an array, we will type L inside square brackets 0 and press Enter. |
05:54 | Now to sum this, type, totalmarks is equal to sum inside parentheses L inside square brackets 0 and Press Enter. |
06:09 | Type totalmarks and press Enter.
We get the sum of the marks of all subjects for the first student. |
06:19 | Now, to get the mean, we can divide totalmarks by the length of the first row. |
06:26 | Type, totalmarks divided by len inside parentheses L inside square brackets 0 and press Enter. |
06:40 | Or simply use the function mean. Type np dot mean inside parentheses L inside square brackets 0 and press Enter. |
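The two ways of getting the mean for one student can be compared side by side. This is a minimal sketch that uses a hypothetical first row in place of L[0].

```python
import numpy as np

# Hypothetical marks of the first student in five subjects.
first_row = np.array([53.0, 78.0, 62.0, 44.0, 81.0])

# Sum of the marks, then the mean computed by hand.
totalmarks = sum(first_row)
manual_mean = totalmarks / len(first_row)

# The same mean using the NumPy function.
numpy_mean = np.mean(first_row)

print(totalmarks, manual_mean, numpy_mean)
```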
06:55 | But we have such a large data set.
And calculating the mean for each student one by one is time consuming. |
07:04 | Is there a way to reduce the work?
For this, we will look into the documentation of mean. |
07:12 | Type, np dot mean question mark and press Enter. Read the text for more information. |
07:23 | Type q to exit the documentation. |
07:28 | In the above example, L is a two-dimensional array, like a matrix. |
07:35 | We can calculate the mean along each axis of the array. |
07:41 | The axis of rows is referred to by 0 and that of columns by 1. |
07:48 | To calculate the mean across all columns, that is, one mean per row, we have to pass the extra parameter 1 for the axis. |
07:57 | Switch back to the terminal. |
08:00 | Let us calculate the mean of the marks scored by all the students for each subject. |
08:07 | Type np dot mean inside parentheses L comma 0 and press Enter. |
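The effect of the axis parameter is easier to see on a small array. In this sketch the marks are hypothetical, with students as rows and subjects as columns.

```python
import numpy as np

# Hypothetical marks: 3 students (rows) and 5 subjects (columns).
marks = np.array([
    [53, 78, 62, 44, 81],
    [67, 59, 71, 38, 90],
    [40, 82, 55, 61, 73],
], dtype=float)

# Axis 0 runs down the rows: one mean per subject (5 values).
per_subject = np.mean(marks, 0)

# Axis 1 runs across the columns: one mean per student (3 values).
per_student = np.mean(marks, 1)

print(per_subject)
print(per_student)
```

Writing np.mean(marks, axis=0) is equivalent to np.mean(marks, 0) and may be easier to read.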
08:18 | Next, we will calculate the median of English marks for all the students. |
08:25 | Type L inside square brackets colon comma 0 and press Enter. |
08:35 | Note that colon comma zero displays the first column of the array, that is, the English marks. |
08:45 | To get the median, we will simply use the function median. |
08:51 | Type np dot median inside parentheses L inside square brackets colon comma 0
Press Enter. |
09:04 | For all the subjects, we can calculate the median across all rows using the median function as shown here. |
09:13 | Type np dot median inside parentheses L comma 0
Press Enter. |
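Both median calls can be tried on a small array. This sketch assumes hypothetical marks in which the first column plays the role of the English marks.

```python
import numpy as np

# Hypothetical marks: 3 students (rows) and 5 subjects (columns).
marks = np.array([
    [53, 78, 62, 44, 81],
    [67, 59, 71, 38, 90],
    [40, 82, 55, 61, 73],
], dtype=float)

# marks[:, 0] selects every row of column 0 (the first subject).
english = marks[:, 0]
english_median = np.median(english)

# Passing 0 as the axis gives one median per subject.
all_medians = np.median(marks, 0)

print(english_median)
print(all_medians)
```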
09:24 | Similarly, to calculate the standard deviation, we will use the function std. |
09:31 | The standard deviation for the English subject can be found by typing np dot std inside parentheses L inside square brackets colon comma 0. Press Enter. |
09:50 | And for all the subjects, across all rows, type np dot std inside parentheses L comma 0 and press Enter. |
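It may help to see what std computes. By default, np.std gives the population standard deviation, which divides by the number of values. The sketch below checks this by hand on hypothetical marks and also shows the ddof parameter, which the tutorial does not use, for the sample standard deviation.

```python
import numpy as np

# Hypothetical English marks of three students.
english = np.array([53.0, 67.0, 40.0])

# Population standard deviation computed step by step.
mean = english.mean()
manual_std = np.sqrt(np.mean((english - mean) ** 2))

# np.std uses the same formula by default (ddof=0).
default_std = np.std(english)

# ddof=1 divides by (n - 1) instead, giving the sample standard deviation.
sample_std = np.std(english, ddof=1)

print(manual_std, default_std, sample_std)
```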
10:03 | Pause the video here, try out the following exercise and resume the video. |
10:09 | Refer to the file football.txt, that is available in the Code Files link of this tutorial. |
10:18 | Download and save the file in the present working directory. |
10:23 | Currently the present working directory is the Home directory. |
10:28 | In football.txt, the first column is the player name, |
10:34 | the second is goals at home and the third is goals away. |
10:42 | Find the total goals for each player
Mean of home and away goals |
10:50 | Standard deviation of home and away goals |
10:55 | Switch to the terminal. |
10:58 | The solution is, first, type, L is equal to np dot loadtxt inside parentheses inside quotes football.txt comma usecols is equal to inside parentheses 1 comma 2 comma delimiter is equal to inside quotes comma. Press Enter. |
11:31 | For the first question, type np dot sum inside parentheses L comma 1 and press Enter. |
11:39 | The answer for the second, np dot mean inside parentheses L comma 0 and press Enter. |
11:50 | Third, np dot std inside parentheses L comma 0 and press Enter. |
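The three answers can be tried together. The contents of football.txt are not shown in this tutorial, so the player names and goal counts below are hypothetical, and an in-memory buffer stands in for the file.

```python
import io
import numpy as np

# Hypothetical stand-in for football.txt: name, goals at home, goals away.
sample = io.StringIO(
    "Player1,10,5\n"
    "Player2,8,7\n"
    "Player3,6,6\n"
)

# Skip column 0 (the names) and load only the two goal columns.
goals = np.loadtxt(sample, usecols=(1, 2), delimiter=",")

total_per_player = np.sum(goals, 1)  # home + away for each player
mean_home_away = np.mean(goals, 0)   # mean of home goals, mean of away goals
std_home_away = np.std(goals, 0)     # standard deviation of each column

print(total_per_player)
print(mean_home_away)
print(std_home_away)
```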
11:59 | This brings us to the end of the tutorial.
In this tutorial, we have learnt to do standard statistical operations like sum, mean, median and standard deviation in Python. |
12:18 | Here are some self assessment questions for you to solve. |
12:23 | Given a two dimensional list as shown, how do you calculate the mean of each row? |
12:32 | Second. Calculate the median of the given list. |
12:37 | Third. There is a file with 6 columns. But we want to load text only from columns 2,3,4,5.
How do we specify that? |
12:51 | And the answers,
To get the mean of each row, we just pass 1 as the second parameter to the function mean |
13:02 | np.mean inside parentheses two_dimensional_list comma 1 |
13:11 | We use the function median to calculate the median of the list
np.median inside parentheses student_marks |
13:24 | Third, to specify the particular columns of a file, we use the parameter usecols is equal to inside parentheses 2, 3, 4, 5 |
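The three answers can be checked with a short sketch. The list values and the six-column data below are hypothetical, since the lists shown on the slide are not part of this transcript.

```python
import io
import numpy as np

# Hypothetical values for the lists named in the answers.
two_dimensional_list = [[1, 2, 3], [4, 5, 6]]
student_marks = [10, 30, 20, 40]

# First: mean of each row, by passing 1 as the axis.
row_means = np.mean(two_dimensional_list, 1)

# Second: median of the list.
marks_median = np.median(student_marks)

# Third: load only columns 2, 3, 4, 5 from six-column data.
# Columns are counted from 0; the default delimiter is whitespace.
six_columns = io.StringIO("1 2 3 4 5 6\n7 8 9 10 11 12\n")
selected = np.loadtxt(six_columns, usecols=(2, 3, 4, 5))

print(row_means)
print(marks_median)
print(selected)
```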
13:39 | Please post your timed queries in this forum. |
13:43 | Please post your general queries on Python in this forum. |
13:48 | FOSSEE team coordinates the TBC project. |
13:53 | Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.
For more details, visit this website. |
14:05 | That's it for the tutorial.
This is Trupti Kini from IIT Bombay signing off. Thank you. |