Difference between revisions of "PhET-Simulations-for-Mathematics/C3/Least-Squares-Regression/English"

From Script | Spoken-Tutorial
(Created page with "'''Title of the script''': '''Least Squares Regression''' '''Author: Shraddha Kodavade '''Keywords''': Phet simulation, Least-Squares Regression, best-fit line, residuals, s...")
 
Line 54: Line 54:
  
 
'''https://phet.colorado.edu/en/simulations/least-squares-regression'''
 
'''https://phet.colorado.edu/en/simulations/least-squares-regression'''
|| Please use the given link to download the P'''hET simulation'''.
+
|| Please use the given link to download the '''PhET simulation'''.
  
 
|-
 
|-
Line 62: Line 62:
 
|| '''Least Squares Regression'''
 
|| '''Least Squares Regression'''
  
A '''best fit line''' that reduces error by minimising the distance of the residuals.
+
A '''best fit''' line that reduces error by minimising the distance of the residuals.
  
 
|-
 
|-
Line 68: Line 68:
  
 
'''PhET Simulations'''
 
'''PhET Simulations'''
|| In this tutorial, we will use '''Least-Squares Regression PhET Simulation.'''
+
|| In this tutorial, we will use '''Least-Squares Regression PhET Simulation'''.
  
 
|-
 
|-
Line 80: Line 80:
 
|-
 
|-
 
|| '''Cursor''' on the '''interface'''.
 
|| '''Cursor''' on the '''interface'''.
|| This is the interface of '''Least-Squares Regression. '''
+
|| This is the interface of '''Least-Squares Regression'''.
  
 
|-
 
|-
Line 86: Line 86:
 
|| In the middle of the screen we can see a white plotting region.  
 
|| In the middle of the screen we can see a white plotting region.  
  
This is the '''Cartesian '''system.
+
This is the '''Cartesian system '''.
  
 
|-
 
|-
Line 110: Line 110:
 
|| Drag and place a point at (15, 15).
 
|| Drag and place a point at (15, 15).
  
Next drag and place another point at (5,5).
+
Next drag and place another point at (5, 5).
  
 
|-
 
|-
Line 120: Line 120:
 
|| Two panels can be seen on the left hand side.  
 
|| Two panels can be seen on the left hand side.  
  
Click the green plus sign to show the '''Best-Fit Line '''panel.
+
Click the green plus sign to show the '''Best-Fit Line ''' panel.
  
Click the green plus sign to show the '''Correlation Coefficient '''panel.
+
Click the green plus sign to show the '''Correlation Coefficient ''' panel.
  
 
|-
 
|-
Line 154: Line 154:
 
|| Click on the '''Squared Residuals''' check box.  
 
|| Click on the '''Squared Residuals''' check box.  
  
No '''squared residuals '''are observed.
+
No '''squared residuals ''' are observed.
  
 
|-
 
|-
Line 160: Line 160:
  
 
Point to the value.
 
Point to the value.
|| '''Correlation coefficient '''shows the degree of association between two variables.  
+
|| '''Correlation coefficient ''' shows the degree of association between two variables.  
  
 
It lies between -1 to +1.  
 
It lies between -1 to +1.  
Line 173: Line 173:
  
 
|-
 
|-
|| Move the point from''' (5,5) '''to''' (5,15). '''
+
|| Move the point from (5,5) to (5,15).
  
Point to '''-1.'''
+
Point to -1.
|| Move the point from''' '''(5, 5) to (5, 15).  
+
|| Move the point from (5, 5) to (5, 15).  
  
 
The value changes to -1.''' '''
 
The value changes to -1.''' '''
Line 185: Line 185:
  
 
|-
 
|-
|| Place the third point at '''(10,3).'''
+
|| Place the third point at (10,3).
 
|| Place a third point at (10, 3).  
 
|| Place a third point at (10, 3).  
  
The '''correlation coefficient '''comes close to 0.  
+
The '''correlation coefficient ''' comes close to 0.  
  
 
This means there is no association between the three points.
 
This means there is no association between the three points.
Line 216: Line 216:
 
|-
 
|-
 
|| Point to the equation '''y = ax + b'''.
 
|| Point to the equation '''y = ax + b'''.
|| The equation of line is in form '''y=ax+b'''
+
|| The equation of line is in form '''y=ax + b'''
  
 
|-
 
|-
|| Point to''' a'''.
+
|| Point to ''' a'''.
  
Point to''' b'''.
+
Point to ''' b'''.
 
|| Here '''a''' stands for slope.
 
|| Here '''a''' stands for slope.
  
Line 252: Line 252:
 
|| Click on the '''Reset''' option.  
 
|| Click on the '''Reset''' option.  
  
Click the green plus sign to show the '''Best-Fit Line '''panel.
+
Click the green plus sign to show the '''Best-Fit Line ''' panel.
  
Click the green plus sign to show the '''Correlation Coefficient '''panel.
+
Click the green plus sign to show the '''Correlation Coefficient ''' panel.
  
 
|-
 
|-
Line 271: Line 271:
  
 
Click the '''Internet users vs. Time '''option.
 
Click the '''Internet users vs. Time '''option.
|| Let us explore the '''Internet users vs. Time '''option.
+
|| Let us explore the '''Internet users vs. Time ''' option.
  
 
We can see an exponential shaped arrangement of data points.
 
We can see an exponential shaped arrangement of data points.
Line 293: Line 293:
 
This data is from the '''World Bank'''.  
 
This data is from the '''World Bank'''.  
  
It denotes the global internet users from 1990 to 2010. Click the cross button.
+
It denotes the global Internet users from 1990 to 2010. Click the cross button.
  
 
|-
 
|-
Line 299: Line 299:
 
|| The correlation coefficient is +0.94.
 
|| The correlation coefficient is +0.94.
  
It means that as the years increase, there is a surge in internet users.
+
It means that as the years increase, there is a surge in Internet users.
  
 
|-
 
|-
Line 317: Line 317:
  
 
Point to the vertical lines.
 
Point to the vertical lines.
|| Click the''' Residuals '''check box.
+
|| Click the ''' Residuals ''' check box.
  
 
Vertical lines appear.
 
Vertical lines appear.
Line 344: Line 344:
 
|| Click the '''Reset''' button.
 
|| Click the '''Reset''' button.
  
Click the green plus sign to show the '''Best-Fit Line '''panel.
+
Click the green plus sign to show the '''Best-Fit Line ''' panel.
  
Click the green plus sign to show the '''Correlation Coefficient '''panel.
+
Click the green plus sign to show the '''Correlation Coefficient ''' panel.
 
|-
 
|-
 
|| Click the '''Life Expectancy vs. TVs'''
 
|| Click the '''Life Expectancy vs. TVs'''
Line 392: Line 392:
 
The majority life expectancy is 70 to 80 years.  
 
The majority life expectancy is 70 to 80 years.  
  
The extreme points are outliers over here.  
+
The extreme points are '''outliers''' over here.  
 
|-
 
|-
 
|| Click the''' Best-Fit Line''' check box.  
 
|| Click the''' Best-Fit Line''' check box.  
Line 403: Line 403:
 
A red line appears.  
 
A red line appears.  
  
Its equation is y=-0.04x+69.6.  
+
Its equation is y=-0.04x+ 69.6.  
 
|-
 
|-
 
|| Click the '''Residuals '''checkbox.
 
|| Click the '''Residuals '''checkbox.
Line 419: Line 419:
 
|| Click the '''Reset''' option.
 
|| Click the '''Reset''' option.
  
Click the '''Best Fit''' and''' Correlation coefficient.'''
+
Click the '''Best Fit''' and ''' Correlation coefficient'''.
 
|-
 
|-
 
|| Click the '''Temperature (F) vs. Longitude'''.
 
|| Click the '''Temperature (F) vs. Longitude'''.
Line 434: Line 434:
 
|| The x axis denotes the '''Longitude'''.
 
|| The x axis denotes the '''Longitude'''.
  
The y axis denotes Average January Temperature in''' Fahrenheit(F)'''.
+
The y axis denotes Average January Temperature in ''' Fahrenheit(F)'''.
 
|-
 
|-
 
|| Click the question mark button.
 
|| Click the question mark button.
Line 465: Line 465:
 
A red line appears.  
 
A red line appears.  
  
Its equation is y=-0.02x+24.6.  
+
Its equation is y= 0.02x+24.6.  
 
|-
 
|-
 
|| Click the Residuals check box..
 
|| Click the Residuals check box..

Revision as of 13:26, 19 May 2023

Title of the script: Least Squares Regression

Author: Shraddha Kodavade

Keywords: PhET simulation, Least-Squares Regression, best-fit line, residuals, slope, intercept, squared residuals, spoken tutorial, video tutorial.


Visual Cue Narration
Slide Number 1

Title Slide

Welcome to this spoken tutorial on Least Squares Regression.
Slide Number 2

Learning Objectives

In this tutorial, we will learn about:
  • Linear Regression
  • Correlation coefficient
  • Residuals and best fit line
  • Sum of squared residuals
Slide Number 3

System Requirements

This tutorial is recorded using:

Windows 10 64-bit operating system

Chrome version 101.0.49

Slide Number 4

Pre-requisites

https://spoken-tutorial.org

To follow this tutorial the learner should be familiar with topics in basic mathematics.

Please use the link below to access the tutorials on PhET Simulations.

Slide Number 5

Link for PhET Simulations

https://phet.colorado.edu/en/simulations/least-squares-regression

Please use the given link to download the PhET simulation.
Slide Number 6

Least Squares Regression

Least Squares Regression

Least squares regression finds the best fit line, that is, the line that minimises the sum of the squared residuals.
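
In symbols, for data points (x_i, y_i) and a line y = ax + b, the residual of each point and the quantity that least squares regression minimises can be written as below. The symbols r_i and S are notation assumed here for illustration; they do not appear in the simulation.

```latex
r_i = y_i - (a x_i + b), \qquad S(a, b) = \sum_{i=1}^{n} r_i^{2}
```

The best fit line is the pair (a, b) for which S(a, b) is smallest.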

Slide Number 7

PhET Simulations

In this tutorial, we will use Least-Squares Regression PhET Simulation.
Show the Downloads folder. I have already downloaded the Least-Squares Regression simulation to my Downloads folder.
Cursor on the interface. Let us begin.
Cursor on the interface. This is the interface of Least-Squares Regression.
Point to the white region In the middle of the screen we can see a white plotting region.

This is the Cartesian coordinate system.

Point to the left bottom side.

Point to the data points bucket.

A data points bucket is seen at the bottom left.

It consists of orange points.

The data points can be pulled out or put in the bucket.

Check the grid checkbox. Let us check the grid checkbox to show the grid.

This will help to place the points accurately.

Place one point at (15,15)

Place one point at (5,5)

Drag and place a point at (15, 15).

Next drag and place another point at (5, 5).

Point to the left hand side.

Click the green plus sign to show the Best-Fit Line panel.

Click the green plus sign to show the Correlation Coefficient panel.

Two panels can be seen on the left hand side.

Click the green plus sign to show the Best-Fit Line panel.

Click the green plus sign to show the Correlation Coefficient panel.

Click the Best-Fit Line checkbox. Point to the line.

Point to the equation.

Click the Best-Fit Line checkbox.

A line passing through the two points is seen on the screen.

Note the equation written in the white box.

Point to the two points. A necessary condition to plot a line in 2-D space is the presence of two points.
Click the Residuals.

Point to the line.

Click on the Residuals check box.

No residuals are observed.

This is because the line passes exactly through both points, so each residual is zero.

Click the Squared Residuals.

Point to the line.

Click on the Squared Residuals check box.

No squared residuals are observed.

Point to Correlation Coefficient.

Point to the value.

Correlation coefficient shows the degree of association between two variables.

It lies between -1 and +1.

Point to 1. Here the value is +1.

This means it has a positive relation.

This indicates that movement in both variables is in the same direction.
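
The value of +1 for the two points (15, 15) and (5, 5) can also be checked outside the simulation. The following is a minimal sketch in Python of the usual Pearson formula; the function name pearson_r is an assumption made for illustration and is not part of the simulation.

```python
import math

def pearson_r(points):
    """Return the Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# The two points placed on the grid in this tutorial.
r_two_points = pearson_r([(15, 15), (5, 5)])
print(r_two_points)
```

Both variables increase together here, so the sketch gives a coefficient of +1.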

Move the point from (5,5) to (5,15).

Point to -1.

Move the point from (5, 5) to (5, 15).

The value changes to -1.

This means it has a negative relation.

It means that movement in both variables is in the opposite direction.

Place the third point at (10,3). Place a third point at (10, 3).

The correlation coefficient comes close to 0.

This means there is no linear association between the two variables for these three points.
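
The value close to 0 can be checked by hand for the exact grid positions (5, 15), (15, 15) and (10, 3). The mean of x is 10 and the mean of y is 11, so the numerator of the correlation coefficient works out as shown below. This is a worked sketch that assumes the points sit exactly on the stated coordinates; points dragged in the simulation may be slightly off.

```latex
\sum_{i=1}^{3} (x_i - \bar{x})(y_i - \bar{y})
  = (5 - 10)(15 - 11) + (15 - 10)(15 - 11) + (10 - 10)(3 - 11)
  = -20 + 20 + 0 = 0
```

Since the numerator is 0, the correlation coefficient is 0 for these positions.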

Point to the residuals. Here we can see three red vertical lines.

The vertical distance from each data point to the best fit line shows the deviation of that point from the line.

These are called residuals.

The goal of the best fit line is to keep these error lines as short as possible overall.

Point to the square boxes. Three square boxes can be seen on the screen.

They denote the square of the residuals.

The best fit line is obtained by minimising the sum of these squares.
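
The residuals and squared residuals for the three points on the screen can be reproduced with a short calculation. The following is a minimal sketch in Python using the standard closed-form least squares formulas; the function name best_fit is an assumption made for illustration, and the points are taken at their exact grid coordinates.

```python
def best_fit(points):
    """Return slope a and intercept b of the least squares line y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

points = [(15, 15), (5, 15), (10, 3)]
a, b = best_fit(points)

# Residual = observed y minus the y predicted by the line.
residuals = [y - (a * x + b) for x, y in points]
squared_residuals = [r ** 2 for r in residuals]
sum_of_squares = sum(squared_residuals)

print(a, b)
print(residuals)
print(sum_of_squares)
```

For these coordinates the sketch gives slope 0 and intercept 11, residuals of 4, 4 and -8, and a sum of squared residuals of 96.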

Point to the top right corner The My Line option has been ticked by default.
Point to the equation y = ax + b. The equation of the line is in the form y = ax + b.
Point to a.

Point to b.

Here a stands for slope.

It is the rate of change of y with respect to x.

b stands for intercept which shows where the line intersects the y axis.

Click the Residuals and Squared Residuals check boxes. Click the Residuals and Squared Residuals check boxes.
Move the sliders for a and b.


Point to the lines.

The same concept applied on the best fit line applies here.

Move the sliders for a and b.

Observe the difference between Best Fit and My Line.
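
The difference between Best Fit and My Line can be expressed as a difference in the sum of squared residuals. The following is a minimal sketch in Python for the three points (15, 15), (5, 15) and (10, 3), whose least squares line works out to y = 0x + 11. The slider settings listed for My Line are hypothetical values chosen for illustration, not values taken from the tutorial.

```python
def sum_squared_residuals(a, b, points):
    """Sum of squared vertical distances from the points to y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)

points = [(15, 15), (5, 15), (10, 3)]

# Least squares line for these three points: slope 0, intercept 11.
best_fit_ssr = sum_squared_residuals(0.0, 11.0, points)

# Hypothetical "My Line" slider settings (a, b), for illustration only.
my_line_settings = [(1.0, 0.0), (0.5, 5.0), (0.0, 10.0)]
my_line_ssr = [sum_squared_residuals(a, b, points) for a, b in my_line_settings]

print(best_fit_ssr)
print(my_line_ssr)
```

Every My Line setting in this sketch gives a larger sum of squares than the best fit line, which is what the larger blue squares show on the screen.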

Click the Reset option.

Click the green plus sign to show the Best-Fit Line panel.

Click the green plus sign to show the Correlation Coefficient panel.

Click on the Reset option.

Click the green plus sign to show the Best-Fit Line panel.

Click the green plus sign to show the Correlation Coefficient panel.

Click the drop down list.


Point to the Custom option.

Click the Custom drop down box.

There are a total of 15 options.

We have now explored the Custom option.

Point to the curve.

Click the Internet users vs. Time option.

Let us explore the Internet users vs. Time option.

We can see an exponential-shaped arrangement of data points.

Point to the x-axis.

Point to the y axis.

The x axis denotes the Years since 1990.

The y axis denotes Total Internet users (in billions).

Click the question mark.

Point to the information in the box.

Click the cross button.

Click the question mark button.

This data is from the World Bank.

It denotes the global Internet users from 1990 to 2010. Click the cross button.

Point to Correlation Coefficient. The correlation coefficient is +0.94.

It means that as the years increase, there is a surge in Internet users.

Click the Best-Fit line checkbox.

Point to the line.

Point to the equation.

Click the Best-Fit line checkbox.

A red line appears.

Its equation is y = 0.10x - 0.38.
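
The equation of the best fit line can be used to read off a value for any year. The following is a minimal sketch in Python, where x is the number of years since 1990 and y is in billions, as on the axes of the simulation. The function name predict is an assumption made for illustration, and the results are values of the fitted line, not World Bank figures.

```python
def predict(a, b, x):
    """Value of the line y = a*x + b at the given x."""
    return a * x + b

a, b = 0.10, -0.38  # slope and intercept shown in the Best-Fit Line panel

users_2010 = predict(a, b, 20)  # 2010 is 20 years since 1990
users_1990 = predict(a, b, 0)   # 1990 is 0 years since 1990

print(f"{users_2010:.2f}")
print(f"{users_1990:.2f}")
```

The line gives about 1.62 billion for 2010 and -0.38 billion for 1990. A negative count is not meaningful, which shows that a straight line only approximates this exponential-shaped data.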

Click the Residuals check box.

Point to the vertical lines.

Click the Residuals check box.

Vertical lines appear.

This is the line that minimises the sum of the squared residuals.

Drag slider a to 0.10. Click the Residuals checkbox. Drag slider a to 0.10.

Click the Residuals checkbox.

Point to blue residuals.

Drag slider b to -0.36.

For b = 0, the blue residuals are higher than the red residuals.

For b = -0.36, the residuals reduce and the sum of squares comes close to its least value.

Click the Reset button.

Click the green plus sign to show the Best-Fit Line panel.

Click the green plus sign to show the Correlation Coefficient panel.

Click the Reset button.

Click the green plus sign to show the Best-Fit Line panel.

Click the green plus sign to show the Correlation Coefficient panel.

Click the Life Expectancy vs. TVs Let us explore another option.

Click the Life Expectancy vs. TVs option.

Point to the curve. The data points show no particular trend.

The majority of the data points are observed at the higher end of the y-axis.

Point to the x-axis.

Point to the y axis.

The x axis denotes the average number of people per TV.

The y axis denotes Life Expectancy in years.

Click the question mark button.

Click the cross button.

Click the question mark button.

This data is from The World Almanac and Book of Facts.

It denotes the life expectancy in 40 countries.

It is compared to the average number of people per TV in that country.

Click the cross button to close the information box.

Point to Correlation Coefficient. The correlation coefficient is -0.61.

Hence, the lower the average number of people per TV, the higher the life expectancy.

Point to the higher end of the y-axis.

Point to other data points.

Observe the cluster formed near the y-axis.

For values close to 0 people per TV, there is higher life expectancy.

For the majority of countries, life expectancy is 70 to 80 years.

The extreme points here are outliers.

Click the Best-Fit Line check box.

Point to the line.

Point to the equation.

Click the Best-Fit Line checkbox.

A red line appears.

Its equation is y = -0.04x + 69.6.

Click the Residuals checkbox.

Point to the cluster.

Click the Residuals checkbox.

The vertical lines appear.

Because the clustered values lie close together, not all the lines are visible.

Click the Reset option.

Click to show the Best-Fit Line and Correlation Coefficient panels.

Click the Reset option.

Click to show the Best-Fit Line and Correlation Coefficient panels.

Click the Temperature (F) vs. Longitude. Let us explore another option.

Click the Temperature in Fahrenheit (F) vs. Longitude.

Point to the points. The data points are scattered at the higher end of the x-axis.
Point to the x-axis.

Point to the y axis.

The x axis denotes the Longitude.

The y axis denotes Average January Temperature in Fahrenheit (F).

Click the question mark button.

Point to the information in the box.

Click the cross button.

Click the question mark button.

This data shows average January temperature in 50 major US cities.

They are compared with their longitudes.

Click the cross button.

Point to Correlation Coefficient. The correlation coefficient is +0.02.

This value is close to 0.

This means there is no strong linear relation between the two variables.

Click the Best-Fit line.

Point to the line.

Point to the equation.

Click the Best-Fit Line checkbox.

A red line appears.

Its equation is y = 0.02x + 24.6.

Click the Residuals check box. Click the Residuals check box.

The vertical lines appear.

The data points lie far from the best fit line, so these lines are long.

Click the Squared Residuals check box. Click the Squared Residuals check box.

The square of the errors is denoted by these squares.

Since the residuals are high, the squared residual values have also increased.

Only Narration With this we have come to the end of this tutorial.

Let us summarise.

Slide Number 8

Summary

In this tutorial, we have learnt about:
  • Linear Regression
  • Correlation coefficient
  • Residuals and best fit line
  • Sum of squared residuals
Slide Number 9

Assignment

As an assignment,

Explore the various data combinations given in this simulation.

Slide Number 10

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Slide Number 11

Spoken Tutorial workshops

The Spoken Tutorial Project team:

conducts workshops using spoken tutorials and

gives certificates on passing online tests.

For more details, please write to us.

Slide Number 12

Forum for specific questions:

Do you have questions about THIS Spoken Tutorial?

Please visit this site.

Choose the minute and second where you have the question.

Explain your question briefly.

Someone from our team will answer it.

Please post your timed queries in this forum.
Slide Number 13

Acknowledgements

The Spoken Tutorial project is funded by the Ministry of Education, Govt. of India.
Slide Number 14

Thank you

This is Shraddha Kodavade, a FOSSEE summer fellow 2022, IIT Bombay, signing off.

Thanks for joining.

Contributors and Content Editors

Madhurig