Python-3.4.3/C2/Statistics/English
|
|
Show Slide | Hello Friends. Welcome to the tutorial on "Statistics" using Python. |
Show Slide
Objectives
|
At the end of this tutorial, you will be able to -
|
Show Slide
System Specifications |
To record this tutorial, I am using
|
Show Slide:
Pre-requisites
|
To practise this tutorial, you should know how to -
|
[File Browser]
open and Show the file student_record.txt
|
For this tutorial, we will use the data file student_record.txt which we used in the earlier tutorial.
|
[File Browser]
Show the file student_record.txt |
We will perform mathematical and logical operations on the array-structured data in this file.
|
Numpy(Numerical Python)
slide:
|
NumPy stands for Numerical Python.
|
Open terminal by pressing Ctrl+Alt+T keys simultaneously | Let us first open the Terminal by pressing Ctrl+Alt+T keys simultaneously. |
[Terminal] Install latest pip
type sudo apt-get install python3-pip |
Let us install the latest pip.
|
Install numpy
type sudo pip3 install numpy==1.13.3 |
Next, we need to install the numpy library, as we will use it throughout the tutorial.
|
Highlight prompt after installation | The installation is completed successfully.
|
Slide:loadtxt()
|
Next, we will learn about the loadtxt() function.
To use the loadtxt() function, we first need to import the numpy library. |
[Terminal] type ipython3 | Switch back to the terminal.
Now, type ipython3 and press Enter. |
[IPython Terminal]
Type import numpy as np |
Type, import numpy as np and press Enter.
Here, np is an alias for numpy; any other name can be used instead. |
Type
L=np.loadtxt('student_record.txt', usecols=(3,4,5,6,7), delimiter=';')
|
Let us load the data from the file student_record.txt as an array.
|
Highlight the output | We get the output in the form of an array. |
Highlight command one by one | loadtxt loads data from an external file.
|
Highlight command one by one | So columns 3,4,5,6,7 from student_record.txt are loaded here.
|
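The loadtxt call described above can be tried without the real data file. This is a minimal sketch: the three records below are hypothetical stand-ins for student_record.txt, and the assumption is that fields are separated by semicolons with the marks in columns 3 to 7, counting from 0.

```python
import io
import numpy as np

# Hypothetical stand-in for student_record.txt (made-up records).
# Fields are separated by ';' and the marks sit in columns 3 to 7.
sample = io.StringIO(
    "A;015163;STUDENT ONE;53;36;28;16;44\n"
    "B;015164;STUDENT TWO;58;37;36;23;44\n"
    "C;015165;STUDENT THREE;72;56;61;45;68\n"
)

# usecols selects the mark columns; delimiter says how fields are split.
marks = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")

print(marks)
print(marks.shape)  # tuple of (number of rows, number of columns)
```

Only the columns named in usecols are converted to numbers, so the text fields at the start of each record are skipped.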
[IPython Terminal]
Type L.shape |
As we can see L is an array.
|
Type L.shape | Type, L dot shape and press Enter. |
[IPython Terminal]
|
We get a tuple giving the number of rows and columns respectively.
|
Let us switch back to the student_record.txt file. | |
Highlight record | Let us start applying statistical operations on these.
|
[IPython Terminal]
Type L[0] |
Switch back to the terminal.
|
[IPython Terminal]
Type totalmarks=sum(L[0]) |
Now to sum this, type,
totalmarks is equal to sum inside parentheses L inside square brackets 0
|
Type totalmarks
Highlight 177.0 |
Type, totalmarks and press Enter.
|
[IPython Terminal]
Type totalmarks/len(L[0]) Highlight 35.399999999999999 |
Now to get the mean we can divide the totalmarks by the length of the array.
|
[IPython Terminal]
Type np.mean(L[0]) |
Or simply use the function mean.
Type np dot mean inside parentheses L inside square brackets 0 and press Enter. |
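The two ways of getting the mean described above can be compared side by side. This is a small sketch; the marks in row are hypothetical values chosen for illustration, not read from student_record.txt.

```python
import numpy as np

# Hypothetical marks of one student in five subjects.
row = np.array([53.0, 36.0, 28.0, 16.0, 44.0])

total = sum(row)                 # add up the marks
manual_mean = total / len(row)   # divide by the number of subjects
library_mean = np.mean(row)      # numpy computes the same value directly

print(total, manual_mean, library_mean)
```

Both approaches give the same result; np.mean is simply shorter to type.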
[IPython Terminal]
Type np.mean? |
But we have such a large data set.
Read the text for more information. |
Type q and press enter | Type q to exit the documentation. |
show slide
Two-Dimensional array |
In the above example, L is a two-dimensional array, like a matrix.
|
[IPython Terminal]
Type np.mean(L,0) |
Switch back to the terminal.
|
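To see what the second argument 0 does in np.mean(L,0), here is a minimal sketch on a small hypothetical array, assuming each row is a student and each column is a subject.

```python
import numpy as np

# Hypothetical marks: each row is a student, each column is a subject.
marks = np.array([
    [50.0, 40.0, 30.0],
    [70.0, 60.0, 50.0],
])

# Passing 0 averages down the rows, giving one mean per column (subject).
subject_means = np.mean(marks, 0)

print(subject_means)
```

The result has one entry per subject, not one entry per student.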
[IPython Terminal]
Type L[:,0] Highlight output array([ 53., 58., 72., ..., 49., 33., 17.]) |
Next, we will calculate the median of English marks for all the students.
|
[IPython Terminal]
Type np.median(L[:,0]) |
To get the median we will simply use the function median.
Type np dot median inside parentheses L inside square brackets colon comma 0
|
[IPython Terminal]
Type np.median(L,0) |
For all the subjects, we can calculate the median across all rows using the median function as shown here.
|
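The column slice and the two median calls can be sketched on a small hypothetical array, again assuming rows are students and columns are subjects.

```python
import numpy as np

# Hypothetical marks: three students, two subjects.
marks = np.array([
    [53.0, 36.0],
    [58.0, 37.0],
    [72.0, 56.0],
])

first_subject = marks[:, 0]               # every row, column 0 only
median_first = np.median(first_subject)   # median of one subject
median_all = np.median(marks, 0)          # median of each subject

print(first_subject)
print(median_first)
print(median_all)
```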
[IPython Terminal]
Type np.std(L[:,0]) |
Similarly, to calculate the standard deviation, we will use the function std.
|
[IPython Terminal]
Type np.std(L,0) |
And for all rows, we do, np dot std inside parentheses L comma 0 and press Enter. |
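It can help to check np.std against a calculation done by hand. In this sketch the array is hypothetical. By default np.std divides by the number of values, which is the population standard deviation.

```python
import numpy as np

# Hypothetical marks: three students, two subjects.
marks = np.array([
    [2.0, 10.0],
    [4.0, 10.0],
    [6.0, 10.0],
])

# Standard deviation of each column (subject).
std_all = np.std(marks, 0)

# Manual check for the first column: square root of the mean squared deviation.
column = marks[:, 0]
manual = np.sqrt(np.mean((column - np.mean(column)) ** 2))

print(std_all)
print(manual)
```

The second column holds identical values, so its standard deviation is zero.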
Pause the video here, try out the following exercise and resume the video. | |
Show Slide
Exercise 1 |
Refer to the file football.txt, that is available in the code files link of this tutorial.
|
highlight | In football.txt,
|
Show Slide
Exercise 1 |
# Find the total goals for each player
|
[IPython Terminal]
Type L=np.loadtxt('football.txt',usecols=(1,2), delimiter=',')
|
Switch to the terminal.
L is equal to np dot loadtxt inside parentheses inside quotes football.txt comma usecols is equal to inside parentheses 1 comma 2 comma delimiter is equal to inside quotes comma.
|
[IPython Terminal]
Type np.mean(L,0) |
Answer for the second, np dot mean inside parentheses L comma 0 and press enter. |
[IPython Terminal]
Type np.std(L,0) |
Third, np dot std inside parentheses L comma 0 and press enter. |
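The exercise can be rehearsed without football.txt. This is only a sketch under stated assumptions: the records are hypothetical, columns 1 and 2 are assumed to hold two goal counts per player, and using np.sum with 1 as the second argument for the per-player total is one possible approach, not necessarily the one intended by the tutorial.

```python
import io
import numpy as np

# Hypothetical stand-in for football.txt: a name, then two goal counts.
sample = io.StringIO(
    "Player A,10,5\n"
    "Player B,3,7\n"
    "Player C,8,2\n"
)

goals = np.loadtxt(sample, usecols=(1, 2), delimiter=",")

totals = np.sum(goals, 1)          # total goals for each player (sum along each row)
column_means = np.mean(goals, 0)   # mean of each goal column
column_stds = np.std(goals, 0)     # standard deviation of each goal column

print(totals)
print(column_means)
print(column_stds)
```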
Show Slide
Summary
|
This brings us to the end of the tutorial.
We covered sum, mean, median and standard deviation in Python. |
Show Slide
Assignment
|
Here are some self-assessment questions for you to solve.
|
Show Slide
Assignment |
How do we specify that? |
Show Slide
|
And the answers,
1. To get the mean of each row, we just pass 1 as the second parameter to the function mean.
np.mean inside parentheses two_dimensional_list comma 1
2. We use the function median to calculate the median of the list.
np.median inside parentheses student_marks
3. To specify the particular columns of a file, we use the parameter usecols is equal to inside parentheses 2, 3, 4, 5 |
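The first two answers can be checked with a short sketch. The contents of two_dimensional_list and student_marks below are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical data for the two answers.
two_dimensional_list = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
student_marks = [40.0, 60.0, 50.0, 70.0]

row_means = np.mean(two_dimensional_list, 1)   # passing 1 gives one mean per row
marks_median = np.median(student_marks)        # middle value of the sorted marks

print(row_means)
print(marks_median)
```

With an even number of marks, np.median returns the average of the two middle values.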
Show Slide Forum | Please post your timed queries in this forum. |
Show Slide
Fossee Forum |
Please post your general queries on Python in this forum. |
Show Slide Textbook Companion | FOSSEE team coordinates the TBC project. |
Show Slide
Acknowledgment http://spoken-tutorial.org |
Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.
For more details, visit this website. |
Previous slide | That's it for the tutorial.
|