Python/C3/Statistics/English-timed
From Script | Spoken-Tutorial
Latest revision as of 11:42, 27 March 2017
Time | Narration |
00:00 | Hello friends and welcome to the tutorial on 'Statistics' using Python. |
00:06 | At the end of this tutorial, you will be able to:
Do statistical operations in Python, sum a set of numbers, and find their mean, median and standard deviation. |
00:17 | Before beginning this tutorial, we suggest that you complete the tutorials on |
00:21 | "Loading Data from files", "Getting started with Lists" and "Accessing Pieces of Arrays". |
00:29 | Now, in the terminal, type ipython space hyphen pylab. |
00:38 | For this tutorial, we will use the data file at the path slash home slash fossee slash sslc2 dot txt. |
00:47 | It contains records of students and their performance in one of the State Secondary Board Examinations. |
00:53 | It has 180,000 lines of records. |
00:57 | We are going to read this file and process its data. |
01:02 | We can see the contents of the file by double-clicking on it. |
01:06 | It might take some time to open since it is quite a large file. |
01:11 | Please don't edit the data since it has a particular structure. |
01:15 | To check the contents of the file, we use the cat command. |
01:18 | So type cat space slash home slash fossee slash sslc2 dot txt and hit Enter. |
01:31 | Each line in the file is a set of 11 fields separated by semicolons. |
01:38 | Consider a sample line from this file. |
01:43 | A semicolon 015163 semicolon JOSEPH RAJ S semicolon 083 semicolon 042 semicolon 47 semicolon 00 semicolon 72 semicolon 244 and three semicolons in a row. |
02:11 | The following are the fields in any given line. |
02:16 | Region Code, which is 'A'; Roll Number, 015163; Name, JOSEPH RAJ S; Marks of 5 subjects: English 083, Hindi 042, Maths 47, Science 00, Social Science 72; and Total marks, 244. |
02:42 | Let's load this data as an array and then run various functions on it. |
02:48 | To get the data as an array, we use the loadtxt command. |
02:53 | So type in the terminal L is equal to loadtxt, within brackets, in single quotes slash home slash fossee slash sslc2 dot txt, comma usecols is equal to within brackets 3,4,5,6,7 comma delimiter is equal to within single quotes semicolon, close the brackets and hit Enter. |
03:45 | The loadtxt function gives us our output in the form of an array. |
03:57 | Now we got an error. We have to type loadtxt before the brackets. |
04:09 | delimiter specifies the character by which the fields of data are separated; usecols specifies the columns to be used. |
04:19 | So usecols is equal to within brackets 3,4,5,6,7 loads those columns. |
04:26 | The 'comma' is added because usecols is a sequence. |
04:31 | As we can see, L is an array. |
04:35 | To get the shape of this array, type L dot shape in the terminal and hit Enter. |
04:43 | We get a tuple stating the number of rows and the number of columns respectively. |
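The loading step narrated above can be sketched as follows. This is a minimal illustration, not the tutorial's own session: it assumes plain NumPy (`import numpy as np`) rather than the pylab shell, and it reads from an in-memory stand-in for sslc2.txt. The first record is the sample line read out in the narration; the other two records are invented for illustration.

```python
import io

import numpy as np

# Hypothetical stand-in for /home/fossee/sslc2.txt.
# Row 1 is the sample line from the narration; rows 2 and 3 are made up.
sample = io.StringIO(
    "A;015163;JOSEPH RAJ S;083;042;47;00;72;244;;;\n"
    "A;015164;STUDENT TWO;050;060;70;80;90;350;;;\n"
    "B;015165;STUDENT THREE;035;040;45;50;55;225;;;\n"
)

# Columns are counted from 0, so 3..7 are the five subject marks.
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")

print(L.shape)   # (rows, columns)
print(L[0, :])   # marks of the first student
```

With the real file, the in-memory object would be replaced by the path string, as in the narration.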
04:50 | Let's start applying statistical operations on this data. |
04:55 | We will start with the most basic, summing. |
04:59 | How do you find the sum of marks of all subjects for the first student? |
05:04 | As we know from our knowledge of accessing pieces of arrays, to access the first row, we type L square brackets 0 comma colon in the terminal. |
05:19 | Now to sum this, we type total marks is equal to sum within brackets L within square brackets 0 comma colon and hit Enter. Then type total marks and hit Enter again. |
05:47 | Now to get the mean, we can divide the total marks by the length. |
05:52 | So type total marks slash len within brackets L in square brackets 0 comma colon. |
06:10 | Or simply use the function mean. |
06:13 | For that, type mean within brackets L, in square brackets 0 comma colon, and hit Enter. |
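The sum and the two ways of getting the mean for one student can be sketched like this. It is a minimal sketch assuming plain NumPy instead of the pylab shell; the variable names `first_row`, `mean_by_hand` and `mean_builtin` are illustrative, and the marks are those of the sample line from the narration.

```python
import numpy as np

# Marks of the first student, taken from the sample line in the narration.
first_row = np.array([83.0, 42.0, 47.0, 0.0, 72.0])

# Sum of all five subjects.
total_marks = np.sum(first_row)

# Mean computed by hand: total divided by the number of subjects.
mean_by_hand = total_marks / len(first_row)

# Mean computed with the mean function.
mean_builtin = np.mean(first_row)

print(total_marks, mean_by_hand, mean_builtin)
```

The total agrees with the Total marks field of the sample line, 244.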
06:31 | But we have such a large data set that calculating the mean for each student one by one is impractical. |
06:38 | Is there a way to reduce the work? |
06:40 | For this, we will look into the documentation of mean. |
06:42 | So for that type mean question mark in the terminal. |
06:49 | As we know L is a two dimensional array. |
06:52 | We can calculate the mean along each axis of the array. |
06:57 | The axis of rows is referred to by the number 0 and the axis of columns by 1. |
07:02 | So to calculate the mean across all columns, that is, for each student's row, we pass an extra parameter 1 for the axis. |
07:07 | So type mean within brackets L comma 1 and hit Enter. |
07:17 | L, here, is a two-dimensional array. |
07:20 | Similarly, the average marks scored by all the students in each subject can be calculated using mean within brackets L comma 0. |
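The effect of the axis parameter can be seen on a small array. This is a sketch assuming plain NumPy; the two rows are hypothetical marks (the first matches the sample line from the narration), and the names `per_student` and `per_subject` are illustrative.

```python
import numpy as np

# Two students (rows) by five subjects (columns); values are illustrative.
L = np.array([
    [83.0, 42.0, 47.0, 0.0, 72.0],
    [50.0, 60.0, 70.0, 80.0, 90.0],
])

# Axis 1 runs across the columns: one mean per student (per row).
per_student = np.mean(L, 1)

# Axis 0 runs down the rows: one mean per subject (per column).
per_subject = np.mean(L, 0)

print(per_student.shape, per_subject.shape)
print(per_student)
print(per_subject)
```

The result has one entry per row when the axis is 1 and one entry per column when the axis is 0, which is why the same call covers all 180,000 records at once.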
07:36 | Next, let us calculate the median of English marks for all the students. |
07:41 | We can access the English marks of all students by typing L in square brackets colon comma zero and hitting Enter. |
07:53 | To get the median, we will simply use the function median. |
07:57 | So type median within brackets L square brackets colon comma 0 . |
08:17 | For all the subjects, we can use the same syntax as for mean and calculate the median across all rows. |
08:25 | So type median in brackets L comma 0 and hit Enter. |
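The two median calls can be sketched on a small array. This assumes plain NumPy rather than the pylab shell; the three rows are hypothetical marks and the names `english_median` and `subject_medians` are illustrative.

```python
import numpy as np

# Three students by five subjects; English is column 0. Values are illustrative.
L = np.array([
    [83.0, 42.0, 47.0, 0.0, 72.0],
    [50.0, 60.0, 70.0, 80.0, 90.0],
    [35.0, 40.0, 45.0, 50.0, 55.0],
])

# Median of the English marks of all students.
english_median = np.median(L[:, 0])

# Median of every subject at once, taken down the rows (axis 0).
subject_medians = np.median(L, 0)

print(english_median)
print(subject_medians)
```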
08:35 | Similarly, to calculate the standard deviation for English, we will use the function std. |
08:41 | So type std, in brackets L and in square brackets colon comma 0, and hit Enter. |
08:57 | And for all rows, we type std within brackets L comma 0. |
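The std calls follow the same pattern. The sketch below assumes plain NumPy and hypothetical marks; it also checks the result against the formula written out by hand. Note that NumPy's std divides by the number of values by default (the population standard deviation), not by one less.

```python
import numpy as np

# Three students by five subjects; English is column 0. Values are illustrative.
L = np.array([
    [83.0, 42.0, 47.0, 0.0, 72.0],
    [50.0, 60.0, 70.0, 80.0, 90.0],
    [35.0, 40.0, 45.0, 50.0, 55.0],
])

english = L[:, 0]

# Standard deviation of the English marks.
english_std = np.std(english)

# The same quantity by hand: root of the mean squared deviation.
by_hand = np.sqrt(np.mean((english - np.mean(english)) ** 2))

# Standard deviation of every subject, taken down the rows (axis 0).
subject_stds = np.std(L, 0)

print(english_std, by_hand)
print(subject_stds)
```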
09:08 | Pause the video here, try out the following exercise and resume the video. |
09:13 | In the given file football dot txt, at the path slash home slash fossee slash football dot txt, one column is the player name, the second is goals at home and the third is goals away. |
09:28 | Find the total goals for each player |
09:33 | Mean of home and away goals |
09:37 | Standard deviation of home and away goals |
09:46 | This is the required data. |
09:49 | For that open the football dot txt file. |
09:54 | The solution is on your screen. |
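One possible way to approach the exercise is sketched below. It is not the solution shown on screen in the video: the layout of football.txt is not given in the narration, so the sketch assumes a comma-separated file with the columns name, home goals, away goals, and uses invented players and scores held in memory.

```python
import io

import numpy as np

# Hypothetical stand-in for /home/fossee/football.txt.
# Assumed layout: name,home goals,away goals (comma-separated).
football = io.StringIO(
    "Player A,10,6\n"
    "Player B,4,8\n"
    "Player C,7,7\n"
)

# Load only the two numeric columns; column 0 holds the names.
goals = np.loadtxt(football, usecols=(1, 2), delimiter=",")

# Total goals for each player: sum across the columns (axis 1).
total_goals = np.sum(goals, 1)

# Mean and standard deviation of home and away goals: down the rows (axis 0).
mean_goals = np.mean(goals, 0)
std_goals = np.std(goals, 0)

print(total_goals)
print(mean_goals)
print(std_goals)
```

If the real file uses a different separator, only the delimiter argument would need to change.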
10:00 | This brings us to the end of the tutorial. |
10:03 | In this tutorial, we have learnt to: |
10:07 | Do the standard statistical operations sum, mean, median and standard deviation in Python. |
10:14 | Combine text loading and statistical operations to solve real-world problems. |
10:24 | Here are some self-assessment questions for you to solve. |
10:27 | Given a two-dimensional list, two_dimensional_list is equal to, within square brackets, [3,5,8,2,1] comma [4,3,6,2,1], how do we calculate the mean of each row? |
10:49 | Calculate the median of the given list: student_marks is equal to within square brackets 74,78,56,87,91,82. |
11:03 | And the third question is: suppose there is a file with 6 columns, but we wish to load text only in columns 2,3,4,5. How do we specify that? |
11:16 | And the answers: |
11:20 | To get the mean of each row, we just pass 1 as the second parameter to the function mean. |
11:29 | So we have to type mean within brackets two_dimensional_list comma 1. |
11:37 | We use the function median to calculate the median of the list |
11:42 | by typing median within brackets student_marks. |
11:47 | And the final one: to specify particular columns of a file, we use the parameter usecols is equal to within brackets 2,3,4,5. |
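The three answers can be tried out as follows. This is a sketch assuming plain NumPy; the lists are the ones given in the questions, while the six-column file contents are invented and held in memory for illustration.

```python
import io

import numpy as np

# Question 1: mean of each row of a two-dimensional list.
two_dimensional_list = [[3, 5, 8, 2, 1], [4, 3, 6, 2, 1]]
row_means = np.mean(two_dimensional_list, 1)

# Question 2: median of a list of marks.
student_marks = [74, 78, 56, 87, 91, 82]
marks_median = np.median(student_marks)

# Question 3: load only columns 2,3,4,5 of a six-column file.
# The file contents here are hypothetical.
six_columns = io.StringIO(
    "1;2;3;4;5;6\n"
    "7;8;9;10;11;12\n"
)
selected = np.loadtxt(six_columns, usecols=(2, 3, 4, 5), delimiter=";")

print(row_means)
print(marks_median)
print(selected)
```

Column numbers in usecols start at 0, so columns 2,3,4,5 are the third through sixth fields of each line.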
12:01 | Hope you have enjoyed this tutorial and found it useful. |
12:05 | Thank you! |