Python/C3/Statistics/English-timed
Time | Narration |
00:00 | Hello friends and welcome to the tutorial on 'Statistics' using Python. |
00:06 | At the end of this tutorial, you will be able to: do statistical operations in Python, sum a set of numbers, and find their mean, median and standard deviation. |
00:17 | Before beginning this tutorial, we suggest that you complete the tutorials on |
00:21 | "Loading Data from files", "Getting started with Lists" and "Accessing Pieces of Arrays". |
00:29 | Now, type in terminal ipython space hyphen pylab. |
00:38 | For this tutorial, we will use the data file at the path slash home slash fossee slash sslc2 dot txt. |
00:47 | It contains records of students and their performance in one of the State Secondary Board Examinations. |
00:53 | It has 180,000 lines of records. |
00:57 | We are going to read it and process this data. |
01:02 | We can see the content of the file by double clicking on it. |
01:06 | It might take some time to open since it is quite a large file. |
01:11 | Please don't edit the data since it has a particular structure. |
01:15 | To check the contents of the file, we use the cat command. |
01:18 | So type cat space slash home slash fossee slash sslc2 dot txt and hit Enter. |
01:31 | Each line in the file is a set of 11 fields separated by semicolons. |
01:38 | Consider a sample line from this file. |
01:43 | A semicolon 015163 semicolon JOSEPH RAJ S semicolon 083 semicolon 042 semicolon 47 semicolon 00 semicolon 72 semicolon 244 and three semicolons in a row. |
02:11 | The following are the fields in any given line. |
02:16 | Region Code, which is 'A'; Roll Number 015163; Name JOSEPH RAJ S; marks of 5 subjects: English 083, Hindi 042, Maths 47, Science 00, Social Science 72; and Total marks 244. |
02:42 | Let's load this data as an array and then run various functions on it. |
02:48 | To get the data as an array, we use the loadtxt command |
02:53 | So type in the terminal L is equal to loadtxt within brackets, in single quotes, slash home slash fossee slash sslc2 dot txt comma usecols is equal to within brackets 3,4,5,6,7 comma delimiter is equal to within single quotes semicolon and hit Enter. |
03:45 | We get our output in the form of an array from the loadtxt function. |
03:57 | Now we got an error. We have to type loadtxt before the brackets. |
04:09 | delimiter specifies the character that separates the fields of data, and usecols specifies the columns to be used. |
04:19 | So usecols equal to, within brackets, 3,4,5,6,7 loads those columns. Columns are counted from 0, so these are the marks of the five subjects. |
04:26 | The commas are added because usecols is a sequence. |
04:31 | As we can see L is an array. |
04:35 | To get the shape of this array, we type L dot shape in the terminal and hit Enter. |
04:43 | We get a tuple stating the number of rows and columns respectively. |
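The loading step can be sketched outside the pylab session as follows. This is a minimal sketch, not the exact session shown in the video: it assumes NumPy is imported as np, and it reads from an in-memory stand-in rather than the real file. The first line is the sample record quoted in the narration; the other two rows are made up purely for illustration.

```python
import io
import numpy as np

# Stand-in for /home/fossee/sslc2.txt. The first line is the sample record
# from the tutorial; the next two rows are hypothetical illustration data.
sample = io.StringIO(
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;\n"
    "A;015164;HYPOTHETICAL ONE;070;065;58;61;80;334;;;\n"
    "B;015165;HYPOTHETICAL TWO;055;049;90;77;68;339;;;\n"
)

# usecols counts fields from 0, so 3..7 are the five subject marks.
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")

print(L.shape)  # (3, 5): three students, five subjects
print(L[0])     # marks of the first student
```

With the real file, the same call would take the file path in place of the in-memory object, and the first number in the shape would be the number of records.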
04:50 | Let's start applying statistical operations on these. |
04:55 | We will start with the most basic, summing. |
04:59 | How do you find the sum of marks of all subjects for the first student? |
05:04 | As we know from accessing pieces of arrays, to access the first row we type in the terminal L square brackets 0 comma colon. |
05:19 | Now to sum this, we type total marks is equal to sum within brackets L within square brackets 0 comma colon and hit Enter. Then type total marks and hit Enter again. |
05:47 | Now to get the mean we can divide the total marks by the length. |
05:52 | So type total marks slash len within brackets L in square brackets 0 comma colon. |
06:10 | Or simply use the function mean. |
06:13 | For that type mean within brackets L and in square brackets 0 comma colon and hit Enter. |
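The three commands for the first student can be written out as below. This is a sketch that assumes NumPy imported as np; in a pylab session the bare names sum and mean are expected to refer to the same NumPy functions. The name total_marks and the one-row array are illustrative, with the marks taken from the sample record quoted earlier.

```python
import numpy as np

# One-row stand-in for L, using the marks from the tutorial's sample record.
L = np.array([[83, 42, 47, 0, 72]], dtype=float)

first = L[0, :]                  # first row: all subjects of the first student
total_marks = np.sum(first)      # sum of the five marks
print(total_marks)               # 244.0
print(total_marks / len(first))  # mean by hand: 48.8
print(np.mean(first))            # same value from the mean function
```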
06:31 | But we have such a large data set that calculating the mean for each student one by one is impractical. |
06:38 | Is there a way to reduce the work? |
06:40 | For this we will look into the documentation of mean. |
06:42 | So for that type mean question mark in the terminal. |
06:49 | As we know L is a two dimensional array. |
06:52 | We can calculate the mean across each of the axes of the array. |
06:57 | The axis of rows is referred to by the number 0 and the axis of columns by 1. |
07:02 | So to calculate the mean across all columns, which gives one mean per student, we pass the extra parameter 1 for the axis. |
07:07 | So type mean within brackets L comma 1 and hit Enter. |
07:17 | L here, is a two dimensional array. |
07:20 | Similarly, the average marks scored by all the students in each subject can be calculated using mean within brackets L comma 0. |
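The effect of the axis parameter is easiest to see on a small array. The sketch below assumes NumPy imported as np and uses a three-student stand-in for L: the first row is the tutorial's sample record, the other two rows are hypothetical.

```python
import numpy as np

# Rows are students, columns are subjects.
# First row: sample record from the tutorial. Other rows: made-up data.
L = np.array([
    [83, 42, 47, 0, 72],
    [70, 65, 58, 61, 80],
    [55, 49, 90, 77, 68],
], dtype=float)

per_student = np.mean(L, 1)  # collapse the columns: one mean per row
per_subject = np.mean(L, 0)  # collapse the rows: one mean per column

print(per_student.shape)  # (3,)
print(per_subject.shape)  # (5,)
print(per_student)
print(per_subject)
```

A quick way to remember it: the axis number you pass is the one that disappears from the shape of the result.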
07:36 | Next, let us calculate the median of English marks for all the students. |
07:41 | We can access the English marks of all students by typing L in square brackets colon comma zero and hitting Enter. |
07:53 | To get the median we will simply use the function median. |
07:57 | So type median within brackets L square brackets colon comma 0. |
08:17 | For all the subjects, we can use the same syntax as mean and calculate the median across all rows. |
08:25 | So type median in brackets L comma 0 and hit Enter. |
08:35 | Similarly, to calculate the standard deviation for English we will use the function std. |
08:41 | So type std, in brackets L and in square brackets colon comma 0 and hit Enter. |
08:57 | And across all rows, we type std within brackets L comma 0. |
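The median and std calls follow the same pattern as mean. The sketch below assumes NumPy imported as np and a small stand-in for L, in which only the first row comes from the tutorial and the rest is hypothetical. Note that NumPy's std divides by the number of values by default (the population standard deviation), not by one less.

```python
import numpy as np

# First row: sample record from the tutorial. Other rows: made-up data.
L = np.array([
    [83, 42, 47, 0, 72],
    [70, 65, 58, 61, 80],
    [55, 49, 90, 77, 68],
], dtype=float)

english = L[:, 0]                   # first column: English marks of all students

english_median = np.median(english)
all_medians = np.median(L, 0)       # one median per subject

english_std = np.std(english)       # population standard deviation (ddof=0)
all_stds = np.std(L, 0)             # one standard deviation per subject

print(english_median)  # 70.0
print(all_medians)
print(english_std)
print(all_stds)
```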
09:08 | Pause the video here, try out the following exercise and resume the video. |
09:13 | In the given file football dot txt at path slash home slash fossee slash football dot txt, one column is the player name, the second is goals at home and the third is goals away. |
09:28 | Find the total goals for each player |
09:33 | Mean of home and away goals |
09:37 | Standard deviation of home and away goals |
09:46 | This is the required data. |
09:49 | For that open the football dot txt file. |
09:54 | The solution is on your screen. |
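One possible approach to this exercise is sketched below. It rests on assumptions: that the file is comma-separated, and that the columns are in the order described (name, home goals, away goals). The player names and numbers are invented stand-ins, not the contents of football.txt, and NumPy is imported as np.

```python
import io
import numpy as np

# Hypothetical stand-in for /home/fossee/football.txt.
# The comma delimiter and all values here are assumptions.
data = io.StringIO(
    "Player A,10,8\n"
    "Player B,15,1\n"
    "Player C,2,6\n"
)

# Skip the name column; load home goals (1) and away goals (2).
goals = np.loadtxt(data, usecols=(1, 2), delimiter=",")

total_per_player = np.sum(goals, 1)  # home + away for each player
mean_home_away = np.mean(goals, 0)   # mean of home goals, mean of away goals
std_home_away = np.std(goals, 0)     # standard deviation of each column

print(total_per_player)
print(mean_home_away)
print(std_home_away)
```

If the real file uses a different separator, only the delimiter argument should need to change.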
10:00 | This brings us to the end of the tutorial. |
10:03 | In this tutorial, we have learnt to: |
10:07 | Do the standard statistical operations sum, mean, median and standard deviation in Python. |
10:14 | Combine text loading and the statistical operation to solve real world problems. |
10:24 | Here are some self assessment questions for you to solve. |
10:27 | Given a two dimensional list, two_dimensional_list is equal to, within square brackets, [3,5,8,2,1] comma [4,3,6,2,1], how do we calculate the mean of each row? |
10:49 | Calculate the median of the given list: student_marks is equal to, within square brackets, 74,78,56,87,91,82. |
11:03 | And the third question: suppose there is a file with 6 columns, but we wish to load the text only in columns 2,3,4,5. How do we specify that? |
11:16 | And the answers, |
11:20 | To get the mean of each row, we just pass 1 as the second parameter to the function mean. |
11:29 | So we have to type mean within brackets two_dimensional_list comma 1 |
11:37 | We use the function median to calculate the median of the list |
11:42 | by typing median within brackets student_marks. |
11:47 | And the final one: to specify particular columns of a file, we use the parameter usecols is equal to, within brackets, 2,3,4,5. |
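The three answers can be checked with a short script. This is a sketch assuming NumPy imported as np. The six-column file is a hypothetical in-memory example, and since usecols counts from 0, the values 2,3,4,5 pick the third to sixth columns.

```python
import io
import numpy as np

# Question 1: mean of each row of a two dimensional list.
two_dimensional_list = [[3, 5, 8, 2, 1], [4, 3, 6, 2, 1]]
row_means = np.mean(two_dimensional_list, 1)
print(row_means)

# Question 2: median of a list of marks.
student_marks = [74, 78, 56, 87, 91, 82]
marks_median = np.median(student_marks)
print(marks_median)  # 80.0

# Question 3: load only some columns of a 6-column file.
# The file contents below are made up for illustration.
six_columns = io.StringIO("1 2 3 4 5 6\n7 8 9 10 11 12\n")
selected = np.loadtxt(six_columns, usecols=(2, 3, 4, 5))
print(selected.shape)  # (2, 4)
```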
12:01 | Hope you have enjoyed this tutorial and found it useful. |
12:05 | Thank you! |