Python3.4.3/C2/Statistics/English


Show Slide  Hello Friends. Welcome to the tutorial on "Statistics" using Python.
Show Slide
Objectives

At the end of this tutorial, you will be able to 

Show Slide
System Specifications 
To record this tutorial, I am using

Show Slide:
Prerequisites

To practise this tutorial, you should know how to 

[File Browser]
Open and show the file student_record.txt

For this tutorial, we will use the data file student_record.txt which we used in the earlier tutorial.

[File Browser]
Show the file student_record.txt 
We will use mathematical and logical operations on this array structured file.

Numpy(Numerical Python)
slide:

NumPy stands for Numerical Python.

Open terminal by pressing Ctrl+Alt+T keys simultaneously  Let us first open the Terminal by pressing Ctrl+Alt+T keys simultaneously. 
[Terminal] Install latest pip
type sudo apt-get install python3-pip 
Let us install the latest pip.

Install numpy
type sudo pip3 install numpy==1.13.3 
Next, we need to install the numpy library, as we will be using it throughout the tutorial.

Highlight prompt after installation  The installation is completed successfully.

Slide:loadtxt()

Next we will learn about loadtxt() function.
For loadtxt() function, we need to import numpy library first. 
[Terminal] type ipython3  Switch back to the terminal.
Now, type ipython3 and press Enter. 
[IPython Terminal]
Type import numpy as np 
Type import numpy as np and press Enter.
Here, np is an alias for numpy. The alias can be any name.
Type
L=np.loadtxt('student_record.txt', usecols=(3,4,5,6,7), delimiter=';')

Let us load the data from the file student_record.txt as an array.

Highlight the output  We get the output in the form of an array. 
Highlight command one by one  loadtxt loads data from an external file.

Highlight command one by one  So columns 3,4,5,6,7 from student_record.txt are loaded here.
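The loadtxt call can also be tried without the data file. The following is a minimal sketch that assumes a semicolon-separated layout with the marks in columns 3 to 7. The two sample records are invented for illustration and are not taken from student_record.txt.

```python
import io
import numpy as np

# Hypothetical records: three text fields followed by five marks columns.
# These rows are made up; the real student_record.txt may differ.
sample = io.StringIO(
    "A;001;AAA;53;36;28;16;44\n"
    "B;002;BBB;58;37;42;35;40\n"
)

# usecols picks columns by zero-based index; delimiter splits each line on ';'.
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=';')
print(L)
```

Only the columns named in usecols are converted to numbers, so the text fields in the other columns do not cause an error.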

[IPython Terminal]
Type L.shape 
As we can see L is an array.

Type L.shape  Type, L dot shape and press Enter. 
[IPython Terminal]

We get a tuple giving the number of rows and columns, respectively.
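As a small illustration of the shape attribute, here is a sketch using a made-up array in place of L. The values and the array size are assumptions, not the contents of the data file.

```python
import numpy as np

# A small made-up array standing in for L: 3 students, 5 subjects.
L = np.array([[53., 36., 28., 16., 44.],
              [58., 37., 42., 35., 40.],
              [72., 56., 71., 67., 61.]])

# shape is a (rows, columns) tuple, so it can be unpacked directly.
rows, cols = L.shape
print(rows, cols)
```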

Let us switch back to the student_record.txt file.  
Highlight record  Let us start applying statistical operations on these.

[IPython Terminal]
Type L[0] 
Switch back to the terminal.

[IPython Terminal]
Type totalmarks=sum(L[0]) 
Now to sum this, type,
totalmarks is equal to sum inside parentheses L inside square brackets 0

Type totalmarks
Highlight 177.0 
Type totalmarks and press Enter.

[IPython Terminal]
Type totalmarks/len(L[0]) Highlight 35.399999999999999 
Now to get the mean we can divide the totalmarks by the length of the array.

[IPython Terminal]
Type np.mean(L[0]) 
Or simply use the function mean.
Type np dot mean inside parentheses L inside square brackets 0 and press Enter. 
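The two ways of finding the mean can be compared side by side. This is a minimal sketch using one made-up row of marks. The variable names manual_mean and library_mean are introduced here only for illustration.

```python
import numpy as np

# Made-up marks for one student in five subjects (hypothetical values).
marks = np.array([53., 36., 28., 16., 44.])

totalmarks = sum(marks)                 # built-in sum works on a 1-D array
manual_mean = totalmarks / len(marks)   # total divided by the number of marks
library_mean = np.mean(marks)           # the same result from numpy

# Compare with a tolerance, since floating-point results may differ slightly.
print(np.isclose(manual_mean, library_mean))
```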
[IPython Terminal]
Type np.mean? 
But we have such a large data set.
Read the text for more information. 
Type q and press enter  Type q to exit the documentation. 
show slide
Two-Dimensional array 
In the above example, L is a two-dimensional array, like a matrix.

[IPython Terminal]
Type np.mean(L,0) 
Switch back to the terminal.
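The second argument to np.mean is the axis. The following sketch uses invented marks to show that axis 0 gives one mean per column, that is, one mean per subject across all students.

```python
import numpy as np

# Invented marks: each row is a student, each column a subject.
L = np.array([[50., 40., 30.],
              [60., 50., 40.],
              [70., 60., 50.]])

# Axis 0 averages down the rows, giving one value per column.
subject_means = np.mean(L, 0)
print(subject_means)
```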

[IPython Terminal]
Type L[:,0] Highlight output array([ 53., 58., 72., ..., 49., 33., 17.]) 
Next, we will calculate the median of English marks for all the students.

[IPython Terminal]
Type np.median(L[:,0]) 
To get the median we will simply use the function median.
Type np dot median inside parentheses L inside square brackets colon comma 0

[IPython Terminal]
Type np.median(L,0) 
For all the subjects, we can calculate the median across all rows using the median function, as shown here.
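Both median calls can be tried on a small array. This is a sketch with made-up marks for four students in three subjects. The name first_subject is used here only for illustration.

```python
import numpy as np

# Made-up marks: 4 students (rows), 3 subjects (columns).
L = np.array([[53., 36., 28.],
              [58., 37., 42.],
              [72., 56., 71.],
              [49., 30., 20.]])

first_subject = L[:, 0]                # every row, column 0
one_median = np.median(first_subject)  # median of a single subject
all_medians = np.median(L, 0)          # one median per subject

print(one_median)
print(all_medians)
```

With an even number of values, the median is the average of the two middle values after sorting.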

[IPython Terminal]
Type np.std(L[:,0]) 
Similarly, to calculate the standard deviation, we will use the function std.

[IPython Terminal]
Type np.std(L,0) 
And for all rows, we do, np dot std inside parentheses L comma 0 and press Enter. 
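By default, np.std computes the population standard deviation, dividing by the number of values. The sketch below uses made-up numbers to show this. The ddof=1 call is included only for comparison and is not part of this tutorial.

```python
import numpy as np

# Made-up data: column 0 varies, column 1 is constant.
L = np.array([[2., 10.], [4., 10.], [4., 10.], [4., 10.],
              [5., 10.], [5., 10.], [7., 10.], [9., 10.]])

population_std = np.std(L[:, 0])       # divides by n (the default, ddof=0)
sample_std = np.std(L[:, 0], ddof=1)   # divides by n - 1
per_column_std = np.std(L, 0)          # one standard deviation per column

print(population_std, sample_std)
print(per_column_std)
```

A constant column has a standard deviation of zero, which makes a quick sanity check.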
Pause the video here, try out the following exercise and resume the video.  
Show Slide
Exercise 1 
Refer to the file football.txt, that is available in the Code Files link of this tutorial.

highlight  In football.txt,

Show Slide
Exercise 1 
# Find the total goals for each player

[IPython Terminal]
Type L=np.loadtxt('football.txt',usecols=(1,2), delimiter=',')

Switch to the terminal.
L is equal to np dot loadtxt inside parentheses inside quotes football.txt comma usecols is equal to inside parentheses 1 comma 2 comma delimiter is equal to inside quotes comma.

[IPython Terminal]
Type np.mean(L,0) 
Answer for the second, np dot mean inside parentheses L comma 0 and press Enter. 
[IPython Terminal]
Type np.std(L,0) 
Third, np dot std inside parentheses L comma 0 and press Enter. 
Show Slide
Summary

This brings us to the end of the tutorial.
In this tutorial, we have learnt to find the sum, mean, median and standard deviation in Python. 
Show Slide
Assignment

Here are some self assessment questions for you to solve.

Show Slide
Assignment 
How do we specify that? 
Show Slide

And the answers,
1. To get the mean of each row, we just pass 1 as the second parameter to the function mean: np.mean inside parentheses two_dimensional_list comma 1
2. We use the function median to calculate the median of the list: np.median inside parentheses student_marks
3. To specify the particular columns of a file, we use the parameter usecols is equal to inside parentheses 2, 3, 4, 5 
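The first answer can be checked with a small example. This is a sketch with made-up values. The name two_dimensional_list follows the answer, while row_means and col_means are introduced only for illustration.

```python
import numpy as np

# A made-up 2 x 3 nested list; np.mean accepts nested lists as well as arrays.
two_dimensional_list = [[1., 2., 3.],
                        [4., 5., 6.]]

row_means = np.mean(two_dimensional_list, 1)  # axis 1: one mean per row
col_means = np.mean(two_dimensional_list, 0)  # axis 0: one mean per column

print(row_means)
print(col_means)
```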
Show Slide Forum  Please post your timed queries in this forum. 
Show Slide
Fossee Forum 
Please post your general queries on Python in this forum. 
Show Slide Textbook Companion  FOSSEE team coordinates the TBC project. 
Show Slide
Acknowledgment http://spokentutorial.org 
Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.
For more details, visit this website. 
Previous slide  That's it for the tutorial.
