Python/C3/Statistics/English-timed

From Script | Spoken-Tutorial
Time Narration
00:00 Hello friends and welcome to the tutorial on 'Statistics' using Python.
00:06 At the end of this tutorial, you will be able to:

* Do statistical operations in Python
* Sum a set of numbers
* Find their mean, median and standard deviation

00:17 Before beginning this tutorial, we suggest that you complete the tutorials on
00:21 "Loading Data from files" "Getting started with Lists" and "Accessing Pieces of Arrays".
00:29 Now, type in terminal ipython space hyphen pylab.
00:38 For this tutorial, we will use the data file at the path slash home slash fossee slash sslc2 dot txt.
00:47 It contains records of students and their performance in one of the State Secondary Board Examinations.
00:53 It has 180,000 lines of records.
00:57 We are going to read it and process this data.
01:02 We can see the content of the file by double clicking on it.
01:06 It might take some time to open since it is quite a large file.
01:11 Please don't edit the data since it has a particular structure.
01:15 To check the contents of the file, we use the cat command.
01:18 So type cat space slash home slash fossee slash sslc2 dot txt and hit Enter.
01:31 Each line in the file is a set of 11 fields separated by semi-colons.
01:38 Consider a sample line from this file.
01:43 A semicolon 015163 semicolon JOSEPH RAJ S semicolon 083 semicolon 042 semicolon 47 semicolon 00 semicolon 72 semicolon 244 and three semicolons in a row.
02:11 The following are the fields in any given line.
02:16 Region Code, which is 'A'
* Roll Number 015163
* Name JOSEPH RAJ S
* Marks of 5 subjects:
** English 083
** Hindi 042
** Maths 47
** Science 00
** Social Science 72
* Total marks 244
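The field layout described above can be checked with plain Python. This is a minimal sketch that only uses the sample line quoted in the narration; the variable names are chosen for illustration and are not part of the tutorial.

```python
# Sample record quoted in the tutorial; fields are separated by semicolons.
line = "A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;"

fields = line.split(";")

# The first three fields identify the student.
region, roll_number, name = fields[0], fields[1], fields[2]

# Fields 3 to 7 (counting from 0) hold the marks of the five subjects.
marks = [int(m) for m in fields[3:8]]

# Field 8 holds the total.
total = int(fields[8])

print(region, roll_number, name)
print(marks, total)
```

The five subject marks in this line add up to the total field, which is a quick way to confirm that the columns have been read in the right order.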

02:42 Let's load this data as an array and then run various functions on it.
02:48 To get the data as an array, we use the loadtxt command.
02:53 So type in the terminal L is equal to loadtxt within brackets, in single quotes, slash home slash fossee slash sslc2 dot txt comma usecols is equal to within brackets 3,4,5,6,7 comma delimiter is equal to within single quotes semicolon, close the brackets and hit Enter.
03:45 The loadtxt function gives us our output in the form of an array.
03:57 Now we got an error. We have to type loadtxt before the brackets.
04:09 delimiter specifies the character that separates the fields of data; usecols specifies the columns to be used.
04:19 So usecols is equal to within brackets 3,4,5,6,7 loads those columns.
04:26 The 'comma' is added because usecols is a sequence.
04:31 As we can see L is an array.
04:35 We can get the shape of this array by typing L dot shape in the terminal and hitting Enter.
04:43 We get a tuple stating the number of rows and columns respectively.
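The loading step can be sketched as follows. Since the real file at /home/fossee/sslc2.txt may not be at hand, this example reads from an in-memory stand-in: the first row is the sample line from the narration, and the other two rows (names, roll numbers and marks) are made up for illustration. Inside ipython with pylab, loadtxt is available directly; here it is taken from NumPy.

```python
from io import StringIO

import numpy as np

# Hypothetical stand-in for sslc2.txt: row 1 is the sample line from the
# tutorial, rows 2 and 3 are invented for this sketch.
sample = StringIO(
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;\n"
    "A;015164;STUDENT TWO;060;055;70;65;50;300;;;\n"
    "A;015165;STUDENT THREE;090;080;85;75;70;400;;;\n"
)

# usecols picks the five subject columns (columns are counted from 0);
# delimiter tells loadtxt that the fields are separated by semicolons.
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")

print(L)
print(L.shape)  # a tuple: (number of rows, number of columns)
```

With the real file, the first argument would be the path as a string, as in the narration, and the number of rows in the shape would match the number of records.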
04:50 Let's start applying statistical operations on these.
04:55 We will start with the most basic, summing.
04:59 How do you find the sum of marks of all subjects for the first student?
05:04 As we know from our knowledge of accessing pieces of arrays, to access the first row, we type in the terminal L square brackets 0 comma colon.
05:19 Now to sum this, we can say total marks is equal to sum within brackets L within square brackets 0 comma colon and hit Enter. Then type total marks and hit Enter again.
05:47 Now to get the mean we can divide the total marks by the length.
05:52 So type total marks slash len within brackets L in square brackets 0 comma colon.
06:10 Or simply use the function mean.
06:13 For that type mean within brackets L and in square brackets 0 comma colon and hit Enter.
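The sum and mean of the first row can be sketched like this. The small array stands in for L: its first row is the sample student from the narration and the other rows are invented. The spelling totalmarks is an assumption, since the narration only says "total marks".

```python
import numpy as np

# Stand-in for L: row 0 is the sample student from the tutorial,
# rows 1 and 2 are hypothetical.
L = np.array([
    [83, 42, 47, 0, 72],
    [60, 55, 70, 65, 50],
    [90, 80, 85, 75, 70],
], dtype=float)

first = L[0, :]                  # marks of the first student

totalmarks = np.sum(first)       # sum of the five subjects
print(totalmarks)

print(totalmarks / len(first))   # mean computed by hand
print(np.mean(first))            # mean computed by the function
```

Both ways of computing the mean give the same value; the function is simply shorter.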
06:31 But we have such a large data set that calculating the mean for each student one by one is impractical.
06:38 Is there a way to reduce the work?
06:40 For this we will look into the documentation of mean.
06:42 So type mean question mark in the terminal.
06:49 As we know L is a two dimensional array.
06:52 We can calculate the mean across each of the axes of the array.
06:57 The axis of rows is referred to by the number 0 and that of columns by 1.
07:02 So to calculate the mean across all columns, we pass an extra parameter, 1, for the axis.
07:07 So type mean within brackets L comma 1 and hit Enter.
07:17 Here, L is a two-dimensional array.
07:20 Similarly, the average marks scored by all the students for each subject can be calculated using mean within brackets L comma 0.
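The effect of the axis parameter is easiest to see on a small array. In this sketch the array is a stand-in for L (first row from the narration, other rows hypothetical), and the names per_student and per_subject are chosen only for illustration.

```python
import numpy as np

# Stand-in for L: one row per student, one column per subject.
L = np.array([
    [83, 42, 47, 0, 72],
    [60, 55, 70, 65, 50],
    [90, 80, 85, 75, 70],
], dtype=float)

# Axis 1: average over the columns, giving one mean per student (row).
per_student = np.mean(L, 1)
print(per_student)

# Axis 0: average over the rows, giving one mean per subject (column).
per_subject = np.mean(L, 0)
print(per_subject)
```

The result for axis 1 has as many entries as there are students, and the result for axis 0 has as many entries as there are subjects.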
07:36 Next, let us calculate the median of English marks for all the students.
07:41 We can access the English marks of all students by typing L in square brackets colon comma zero and hitting Enter.
07:53 To get the median we will simply use the function median.
07:57 So type median within brackets L square brackets colon comma 0 .
08:17 For all the subjects, we can use the same syntax as mean and calculate the median across all rows using median.
08:25 So type median in brackets L comma 0 and hit Enter.
08:35 Similarly, to calculate the standard deviation for English, we will use the function std.
08:41 So type std within brackets L and in square brackets colon comma 0 and hit Enter.
08:57 For all rows, we type std within brackets L comma 0.
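The median and standard deviation calls follow the same pattern as mean. This sketch again uses a small stand-in for L (first row from the narration, other rows hypothetical); the result names are illustrative only.

```python
import numpy as np

# Stand-in for L: column 0 holds the English marks.
L = np.array([
    [83, 42, 47, 0, 72],
    [60, 55, 70, 65, 50],
    [90, 80, 85, 75, 70],
], dtype=float)

english = L[:, 0]                     # English marks of all students

english_median = np.median(english)   # median of one subject
subject_medians = np.median(L, 0)     # median of every subject

english_std = np.std(english)         # standard deviation of one subject
subject_stds = np.std(L, 0)           # standard deviation of every subject

print(english_median, subject_medians)
print(english_std, subject_stds)
```

Note that NumPy's std divides by the number of values by default (the population standard deviation). If the sample standard deviation is wanted, the ddof=1 argument can be passed.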
09:08 Pause the video here, try out the following exercise and resume the video.
09:13 In the given file football dot txt at the path slash home slash fossee slash football dot txt, one column is the player name, the second is goals at home and the third is goals away.
09:28 Find the total goals for each player
09:33 Mean of home and away goals
09:37 Standard deviation of home and away goals
09:46 This is the required data.
09:49 For that open the football dot txt file.
09:54 The solution is on your screen.
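One possible approach to the exercise is sketched below. The transcript does not show the contents of football.txt, so the player names, the goal counts and the comma delimiter here are all assumptions made for illustration; the real file may be laid out differently. The sketch also reads "mean of home and away goals" as one mean per column.

```python
from io import StringIO

import numpy as np

# Hypothetical stand-in for football.txt: player name, goals at home,
# goals away. The comma delimiter is an assumption.
sample = StringIO(
    "Player A,12,9\n"
    "Player B,7,11\n"
    "Player C,15,5\n"
)

# Skip the name column and load only the two goal columns.
goals = np.loadtxt(sample, usecols=(1, 2), delimiter=",")

totals = np.sum(goals, 1)   # home + away goals for each player
means = np.mean(goals, 0)   # mean of home goals and of away goals
stds = np.std(goals, 0)     # standard deviation of home and of away goals

print(totals)
print(means)
print(stds)
```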
10:00 This brings us to the end of the tutorial.
10:03 In this tutorial, we have learnt to:
10:07 Do the standard statistical operations sum, mean, median and standard deviation in Python.
10:14 Combine text loading and the statistical operation to solve real world problems.
10:24 Here are some self assessment questions for you to solve.
10:27 Given a two dimensional list, two_dimensional_list is equal to, within square brackets, [3,5,8,2,1] comma [4,3,6,2,1], how do we calculate the mean of each row?
10:49 Calculate the median of the given list: student_marks is equal to within square brackets 74,78,56,87,91,82.
11:03 And the third question is: suppose there is a file with 6 columns but we wish to load data only from columns 2,3,4,5. How do we specify that?
11:16 And the answers,
11:20 To get the mean of each row, we just pass 1 as the second parameter to the function mean.
11:29 So we have to type mean within brackets two_dimensional_list comma 1
11:37 We use the function median to calculate the median of the list
11:42 by typing median within brackets student_marks.
11:47 And the final one: to specify the particular columns of a file, we use the parameter usecols is equal to within brackets 2,3,4,5.
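The three answers can be tried out as follows. The two lists come from the questions; the six-column text used for the third answer is a made-up, whitespace-separated stand-in, since the questions do not name a real file.

```python
from io import StringIO

import numpy as np

# Answer 1: mean of each row, using 1 as the second parameter.
two_dimensional_list = [[3, 5, 8, 2, 1], [4, 3, 6, 2, 1]]
row_means = np.mean(two_dimensional_list, 1)
print(row_means)

# Answer 2: median of the list.
student_marks = [74, 78, 56, 87, 91, 82]
marks_median = np.median(student_marks)
print(marks_median)

# Answer 3: load only columns 2,3,4,5 of a six-column file.
# Hypothetical data; loadtxt counts columns from 0.
sample = StringIO(
    "10 11 12 13 14 15\n"
    "20 21 22 23 24 25\n"
)
selected = np.loadtxt(sample, usecols=(2, 3, 4, 5))
print(selected)
```

Because usecols counts from 0, the value (2, 3, 4, 5) skips the first two columns of each line and keeps the remaining four.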
12:01 Hope you have enjoyed this tutorial and found it useful.
12:05 Thank you!

Contributors and Content Editors

Gaurav, Minal, PoojaMoolya, Sneha