Python/C3/Statistics/English-timed
From Script | Spoken-Tutorial
Timing | Narration |
---|---|
0:00 | Hello friends and welcome to the tutorial on 'Statistics' using Python. |
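0:06 | At the end of this tutorial, you will be able to do the standard statistical operations (sum, mean, median and standard deviation) in Python and combine text loading with these operations to solve real-world problems. |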
0:17 | Before beginning this tutorial, we suggest that you complete the tutorials on |
0:21 | "Loading Data from files", "Getting started with Lists" and "Accessing Pieces of Arrays". |
0:29 | Now, in the terminal, type ipython space hyphen pylab. |
0:38 | For this tutorial, we will use the data file at the path slash home slash fossee slash sslc2 dot txt. |
0:47 | It contains records of students and their performance in one of the State Secondary Board Examinations. |
0:53 | It has 180,000 lines of records. |
0:57 | We are going to read it and process this data. |
1:02 | We can see the contents of the file by double-clicking on it. |
1:06 | It might take some time to open since it is quite a large file. |
1:11 | Please don't edit the data since it has a particular structure. |
1:15 | To check the contents of the file, we use the cat command. |
1:18 | So type cat space slash home slash fossee slash sslc2 dot txt. Hit enter. |
1:31 | Each line in the file is a set of 11 fields separated by semicolons. |
1:38 | Consider a sample line from this file. |
1:43 | A semicolon 015163 semicolon JOSEPH RAJ S semicolon 083 semicolon 042 semicolon 47 semicolon 00 semicolon 72 semicolon 244 and three semicolons in a row. |
2:11 | The following are the fields in any given line. |
2:16 | Region Code, which is 'A'; Roll Number, 015163; Name, JOSEPH RAJ S; Marks of 5 subjects: English 083, Hindi 042, Maths 47, Science 00 and Social Science 72; and Total marks, 244. |
2:42 | Let's load this data as an array and then run various functions on it. |
2:48 | To get the data as an array, we use the loadtxt command. |
2:53 | So type on the terminal L is equal to loadtxt within brackets, within single quotes slash home slash fossee slash sslc2 dot txt comma usecols is equal to within brackets 3,4,5,6,7 comma delimiter is equal to within single quotes semicolon, close the brackets and hit Enter. |
3:45 | We get our output in the form of an array dot loadtxt function. |
3:57 | Now we have an error. |
3:58 | We have to type loadtxt before the brackets. |
4:09 | delimiter specifies the character that separates the fields of data; usecols specifies the columns to be used. |
4:19 | So usecols is equal to within brackets 3,4,5,6,7 loads those columns. |
4:26 | The 'comma' is added because usecols is a sequence. |
4:31 | As we can see L is an array. |
4:35 | To get the shape of this array, we type L dot shape in the terminal and hit Enter. |
4:43 | We get a tuple stating the number of rows and columns respectively. |
4:50 | Let's start applying statistical operations on these. |
4:55 | We will start with the most basic, summing. |
4:59 | How do you find the sum of marks of all subjects for the first student? |
5:04 | As we know from our knowledge of accessing pieces of arrays, to access the first row, we type in the terminal L square brackets 0 comma colon. |
5:19 | Now to sum this we can say totalmarks is equal to sum within brackets L within square brackets 0 comma colon. Hit Enter. Then type totalmarks and hit Enter again. |
5:47 | Now to get the mean we can divide the totalmarks by the length. |
5:52 | So type totalmarks slash len within brackets L in square brackets 0 comma colon. |
6:10 | Or simply use the function mean. |
6:13 | For that type mean within brackets L and in square brackets 0 comma colon and hit Enter. |
6:31 | But we have such a large data set that calculating the mean for each student one by one is impractical. |
6:38 | Is there a way to reduce the work? |
6:40 | For this, we will look into the documentation of mean. |
6:42 | So type mean question mark in the terminal. |
6:49 | As we know L is a two dimensional array. |
6:52 | We can calculate the mean across each axis of the array. |
6:57 | The axis of rows is referred to by the number 0 and that of columns by 1. |
7:02 | So to calculate the mean across all columns, that is, the mean for each student, we pass an extra parameter 1 for the axis. |
7:07 | So type mean within brackets L comma 1 and hit Enter. |
7:17 | L here, is a two dimensional array. |
7:20 | Similarly, the average marks scored by all the students in each subject can be calculated using mean within brackets L comma 0. |
7:36 | Next, let us calculate the median of English marks for all the students. |
7:41 | We can access English marks of all students using L in square brackets colon comma zero and hit Enter. |
7:53 | To get the median we will simply use the function median. |
7:57 | So type median within brackets L square brackets colon comma 0. |
8:17 | For all the subjects, we can use the same syntax as mean and calculate the median across all rows using median. |
8:25 | So type median in brackets L comma 0 and hit Enter. |
8:35 | Similarly, to calculate the standard deviation for English, we will use the function std. |
8:41 | So type std, in brackets L and in square brackets colon comma 0 and hit Enter. |
8:57 | And for all rows, we type std within brackets L comma 0. |
9:08 | Pause the video here, try out the following exercise and resume the video. |
9:13 | In the given file football dot txt at the path slash home slash fossee slash football dot txt, the first column is the player name, the second is goals at home and the third is goals away. |
9:28 | 1. Find the total goals for each player. |
9:33 | 2. Find the mean of home and away goals. |
9:37 | 3. Find the standard deviation of home and away goals. |
9:46 | This is the required data. |
9:49 | For that open the football dot txt file. |
9:54 | The solution is on your screen. |
10:00 | This brings us to the end of the tutorial. |
10:03 | In this tutorial, we have learnt to, |
10:07 | 1. Do the standard statistical operations sum, mean, median and standard deviation in Python. |
10:14 | 2. Combine text loading and the statistical operation to solve real world problems. |
10:24 | Here are some self assessment questions for you to solve |
10:27 | 1. Given a two dimensional list, two_dimensional_list is equal to [[3,5,8,2,1],[4,3,6,2,1]], how do we calculate the mean of each row? |
10:49 | 2. Calculate the median of the given list, student_marks is equal to within square brackets 74,78,56,87,91,82. |
11:03 | And the third question is: Suppose there is a file with 6 columns but we wish to load data only from columns 2,3,4,5. How do we specify that? |
11:16 | And the answers, |
11:20 | 1. To get the mean of each row, we just pass 1 as the second parameter to the function mean. |
11:29 | So we have to type mean within brackets two_dimensional_list comma 1 |
11:37 | 2. We use the function median to calculate the median of the list |
11:42 | by typing median within brackets student_marks. |
11:47 | And the final one: To specify the particular columns of a file, we use the parameter usecols is equal to within brackets 2,3,4,5. |
12:01 | Hope you have enjoyed this tutorial and found it useful. |
12:05 | Thank you! |
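
As a recap, the commands narrated in this tutorial can be sketched as a plain Python script. This is only a sketch under a few assumptions: the narration runs inside ipython -pylab, where loadtxt, sum, mean, median and std are available directly, while here they are called through an explicit numpy import. The file slash home slash fossee slash sslc2 dot txt is replaced by a small in-memory stand-in, whose first record is the sample line from the narration and whose other two records are made up for illustration. The last part works through the first two self-assessment questions.

```python
import io
import numpy as np

# Stand-in for /home/fossee/sslc2.txt. The first record is the sample line
# from the narration; the other two are made-up records in the same layout.
data_file = io.StringIO(
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;\n"
    "A;015164;HYPOTHETICAL STUDENT ONE;060;055;70;65;50;300;;;\n"
    "B;015165;HYPOTHETICAL STUDENT TWO;090;080;85;95;75;425;;;\n"
)

# Load only columns 3 to 7 (the five subject marks), splitting on ';'.
L = np.loadtxt(data_file, usecols=(3, 4, 5, 6, 7), delimiter=";")
print(L.shape)  # (3, 5): three students, five subjects

# Sum and mean of the first student's marks.
totalmarks = np.sum(L[0, :])
first_mean = totalmarks / len(L[0, :])
print(totalmarks, first_mean, np.mean(L[0, :]))

# Axis 1 collapses the columns: one mean per student.
student_means = np.mean(L, 1)
# Axis 0 collapses the rows: one value per subject.
subject_means = np.mean(L, 0)
subject_medians = np.median(L, 0)
subject_stds = np.std(L, 0)

# English marks alone are the first loaded column.
english_median = np.median(L[:, 0])
english_std = np.std(L[:, 0])
print(student_means, subject_means, subject_medians, subject_stds)
print(english_median, english_std)

# Self-assessment questions 1 and 2.
two_dimensional_list = [[3, 5, 8, 2, 1], [4, 3, 6, 2, 1]]
student_marks = [74, 78, 56, 87, 91, 82]
row_means = np.mean(two_dimensional_list, 1)
marks_median = np.median(student_marks)
print(row_means, marks_median)
```

With the real data file, the path string can be passed to loadtxt in place of data_file, and the remaining lines should work unchanged.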