Difference between revisions of "ApplicationsofGeoGebra/C3/StatisticsusingGeoGebra/English"
(Created page with " {border=1   '''Visual Cue'''   '''Narration'''    '''Slide Number 1''' '''Title Slide'''   Welcome to this '''tutorial''' on '''Statistics using GeoGebra''' ...") 

Line 70:  Line 70:  
    
  '''Slide Number 6'''    '''Slide Number 6'''  
'''Fish Feed '''  '''Fish Feed '''  
Line 138:  Line 97:  
Let us look at some of these data.  Let us look at some of these data.  
    
−    '''Slide Number  +    '''Slide Number 7''' 
'''Fish Feed Data'''  '''Fish Feed Data'''  
Line 147:  Line 106:  
Please download the '''code file''', '''Fisherydata''', provided along with this '''tutorial'''.  Please download the '''code file''', '''Fisherydata''', provided along with this '''tutorial'''.  
    
  Show the '''GeoGebra''' window.    Show the '''GeoGebra''' window.  
Line 178:  Line 122:  
Show data in '''columns H''' and '''I'''.  Show data in '''columns H''' and '''I'''.  
−  +  Hold '''Ctrl''' key down and press '''C'''.  
  In the '''code file''', drag '''mouse''' to highlight length and weight data from '''columns H''' and '''I'''.    In the '''code file''', drag '''mouse''' to highlight length and weight data from '''columns H''' and '''I'''.  
Line 259:  Line 203:  
  In the '''display box''', click on the '''dropdown menu button'''.    In the '''display box''', click on the '''dropdown menu button'''.  
  In the '''display box''', click on the '''dropdown menu button''' to display the list of plots.    In the '''display box''', click on the '''dropdown menu button''' to display the list of plots.  
−  
−  
−  
    
  Select '''Histogram'''.    Select '''Histogram'''.  
Line 294:  Line 235:  
  Uncheck '''Set Classes Manually''' check box.    Uncheck '''Set Classes Manually''' check box.  
    
−     +    Under '''Show''', uncheck '''Histogram''' check box. 
−     +    Under '''Show''', uncheck '''Histogram''' check box to make it disappear. 
+    
+    Scroll down and check '''Frequency Polygon''' to show it.  
+    Scroll down and check '''Frequency Polygon''' to show it.  
+    
+    Check '''Cumulative''' option as the '''Frequency Type'''.  
+    Under '''Frequency Type''', check '''Cumulative''' option.  
+    
+    Point to default '''Count''' selection.  
+  
+  Point to the '''cumulative frequency count'''.  
+    The default '''Count''' selection shows the '''cumulative frequency count''' for the data.  
+    
+    Drag '''slider''', bring it back to 20.  
+    Drag the '''slider''' and note the effects on smoothness of the '''cumulative frequency count curve'''.  
+  
+  We will drag the '''slider''' back to 20.  
+    
+    Under '''Frequency Type''', uncheck '''Cumulative''' and under '''Show''', uncheck '''Frequency Polygon'''.  
+    Under '''Frequency Type''', uncheck '''Cumulative''' and under '''Show''', uncheck '''Frequency Polygon'''.  
+    
+    Under '''Show''', check '''Histogram''' and uncheck '''Frequency Polygon''' >> click on '''Options''' button again to hide the window.  
+    Under '''Show''', check '''Histogram''' and uncheck '''Frequency Polygon'''.  
+  
+  And click on '''Options''' button again to hide the window.  
    
  Click on '''Show Data''' tool >> point to data highlighted in the '''Spreadsheet'''.    Click on '''Show Data''' tool >> point to data highlighted in the '''Spreadsheet'''.  
Line 314:  Line 279:  
  The same data are graphed in two vertically placed plots.    The same data are graphed in two vertically placed plots.  
−  You can select plot types from the '''dropdown menu''' above each plot.  +  You can select plot types from the '''dropdown menu button''' above each plot. 
    
  Click on '''Show Statistics''' tool.    Click on '''Show Statistics''' tool.  
Line 325:  Line 290:  
  Drag the boundary to see it properly.    Drag the boundary to see it properly.  
  Drag the boundary to see it properly.    Drag the boundary to see it properly.  
+    
+    '''Slide Number 8'''  
+  
+  '''Box Plot'''  
+  
+  [[Image:]]  
+    '''Box Plot'''  
+  
+  '''Box plot''' is a standardized way of showing data, based on the '''five number summary'''.  
    
  Click and point to '''Median''', '''Min''', '''Max''', '''Q<sub>1</sub>''' and '''Q<sub>3'''</sub> values in the '''box plot'''.    Click and point to '''Median''', '''Min''', '''Max''', '''Q<sub>1</sub>''' and '''Q<sub>3'''</sub> values in the '''box plot'''.  
Line 342:  Line 316:  
  Close the '''Data Analysis''' window.    Close the '''Data Analysis''' window.  
    
−    '''Slide Number  +    '''Slide Number 9''' 
'''Least Squares Linear Regression (LSLR)'''  '''Least Squares Linear Regression (LSLR)'''  
Line 362:  Line 336:  
It is given by '''y = b<sub>0</sub> + b<sub>1</sub>x'''.  It is given by '''y = b<sub>0</sub> + b<sub>1</sub>x'''.  
−  
−  
'''b<sub>1</sub>''', the '''slope''', is the '''regression coefficient'''.  '''b<sub>1</sub>''', the '''slope''', is the '''regression coefficient'''.  
−  
'''Coefficient of determination R<sup> </sup>squared'''  '''Coefficient of determination R<sup> </sup>squared'''  
'''R<sup> </sup>squared''' ranges from 0 to 1.  '''R<sup> </sup>squared''' ranges from 0 to 1.  
+  
+  The closer '''R squared''' is to 1, the better is the prediction of variance in '''y''' from '''x'''.  
    
  Show length and weight data in the '''Spreadsheet''' in the '''GeoGebra'''.    Show length and weight data in the '''Spreadsheet''' in the '''GeoGebra'''.  
−     +    Let us go back to the length and weight data in the '''Spreadsheet''' view in '''GeoGebra'''. 
    
  Drag '''mouse''' to highlight all labels and data in the two '''columns'''.    Drag '''mouse''' to highlight all labels and data in the two '''columns'''.  
Line 400:  Line 373:  
    
  Point to the red line that is drawn through some points.    Point to the red line that is drawn through some points.  
−    Note the red line  +    Note the red line in the '''Scatterplot'''. 
−  +  
−  +  
−  +  
    
  Point to the equation given in red, '''y = 0.08x - 48.39'''.    Point to the equation given in red, '''y = 0.08x - 48.39'''.  
  This is the '''best fit line''' that passes as close as possible to all the points.    This is the '''best fit line''' that passes as close as possible to all the points.  
−  Its equation is given in red  +  Its equation is given in red at the bottom. 
    
  Point to '''R<sup>2'''</sup> value of 0.7722.    Point to '''R<sup>2'''</sup> value of 0.7722.  
−     +    This ''' R<sup> </sup>squared''' value indicates good fit between the model and the actual data. 
−  +  
−  +  
−  +  
−  This ''' R<sup> </sup>squared''' value indicates good fit between the model and the actual data.  +  
    
  Select other '''regression models''' to see effects on '''R<sup>2'''</sup>.    Select other '''regression models''' to see effects on '''R<sup>2'''</sup>.  
Line 422:  Line 388:  
  Point to the lower '''Residual Plot'''.    Point to the lower '''Residual Plot'''.  
  The lower plot is the '''Residual Plot'''.    The lower plot is the '''Residual Plot'''.  
−  
−  
'''Residuals '''are the differences between observed and predicted values of all points.  '''Residuals '''are the differences between observed and predicted values of all points.  
−  
−  
    
  Click on '''Switch Axes button'''.    Click on '''Switch Axes button'''.  
Line 439:  Line 401:  
Point to equation '''y= 9.91x + 684.3'''.  Point to equation '''y= 9.91x + 684.3'''.  
  Observe that the '''best fit line''' and many '''statistics''' change.    Observe that the '''best fit line''' and many '''statistics''' change.  
+  
+  Its equation is now '''y= 9.91x + 684.3'''.  
    
  Point to '''r''', '''R<sup>2</sup> '''and '''rho (ρ)'''.    Point to '''r''', '''R<sup>2</sup> '''and '''rho (ρ)'''.  
Line 456:  Line 420:  
    
  Point at the line in the '''Scatterplot'''.    Point at the line in the '''Scatterplot'''.  
−     +    To get logical predictions, we will enter '''x''' values above the '''x-intercept'''. 
−  +  
−  To get logical predictions, we will enter '''x''' values above the '''x-intercept'''.  +  
    
  In '''Symbolic Evaluation''', type in a value for '''x''' and press '''Enter'''.    In '''Symbolic Evaluation''', type in a value for '''x''' and press '''Enter'''.  
Line 486:  Line 448:  
  In the '''Data Source window''' that pops up, click '''Analyze button'''.    In the '''Data Source window''' that pops up, click '''Analyze button'''.  
    
−    Point to'''  +    Point to '''Box Plots''' in the window and to the cell numbers in each row. 
−    '''  +    '''Box Plots''' appear in the window. 
They are for length and weight data.  They are for length and weight data.  
Line 504:  Line 466:  
When the '''arrow''' appears, drag the boundary to resize the windows.  When the '''arrow''' appears, drag the boundary to resize the windows.  
−  
−  
−  
    
      
  Let us summarize.    Let us summarize.  
    
−    '''Slide Number  +    '''Slide Number 10''' 
'''Summary'''  '''Summary'''  
Line 522:  Line 481:  
'''Multiple Variable Analysis''' to calculate different statistical parameters  '''Multiple Variable Analysis''' to calculate different statistical parameters  
    
−    '''Slide Number  +    '''Slide Number 11''' 
'''Assignment'''  '''Assignment'''  
−  
−  Perform statistical analyses for weight and girth data  +  Perform statistical analyses for weight and girth data 
−  +  Is any of the oils absorbed more than the others?  
[[Image:]]  [[Image:]]  
+  
+    '''Assignment'''  
+  
+  Perform statistical analyses for weight and girth data given in this '''tutorial'''  
+  
+  Four oils were used to deep fry chips. Amount of absorbed fat was measured for 6 chips fried in 4 oils. Is any of the oils absorbed more than the others?  
    
−    '''Slide Number  +    '''Slide Number 12''' 
'''About Spoken Tutorial project'''  '''About Spoken Tutorial project'''  
Line 540:  Line 504:  
Please download and watch it.  Please download and watch it.  
    
−    '''Slide Number  +    '''Slide Number 13''' 
'''Spoken Tutorial workshops'''  '''Spoken Tutorial workshops'''  
Line 547:  Line 511:  
For more details, please write to us.  For more details, please write to us.  
    
−    '''Slide Number  +    '''Slide Number 14''' 
'''Forum for specific questions:'''  '''Forum for specific questions:'''  
Line 562:  Line 526:  
  Please post your timed queries on this forum.    Please post your timed queries on this forum.  
    
−    '''Slide Number  +    '''Slide Number 15''' 
'''Acknowledgement'''  '''Acknowledgement''' 
Revision as of 07:48, 16 January 2019
Visual Cue  Narration 
Slide Number 1
Title Slide 
Welcome to this tutorial on Statistics using GeoGebra 
Slide Number 2
Learning Objectives 
In this tutorial, we will learn how to use GeoGebra to perform:
One Variable Analysis to calculate different statistical parameters
Two Variable Regression Analysis to estimate best fit line
Multiple Variable Analysis to calculate different statistical parameters 
Slide Number 3
System Requirement 
Here I am using:
Ubuntu Linux OS version 16.04
GeoGebra 5.0.481.0d 
Slide Number 4
Prerequisites 
To follow this tutorial, you should be familiar with:
GeoGebra interface
Statistics 
Slide Number 5
Statistics
Data analysis and interpretation
Measures of central tendency
Measures of Dispersion
Comparing variability of data series
Additional material 
Statistics
Statistics deals with:
Data analysis and interpretation
Measures of central tendency
Measures of Dispersion
Comparing variability of data series
Please refer to additional material provided along with this tutorial. 
Slide Number 6
Fish Feed
A fishery is testing four feed formulations on its fish: A, B, C and D
Length (mm)
Weight (lbs)
Girth (mm) 
Fish Feed
Let us look at an example.
A fishery is testing four types of feed formulations on its fish: A, B, C and D.
Data to be collected after feeding the fish for 6 months are:
Length in millimeters
Weight in pounds
Girth in millimeters
Let us look at some of these data. 
Slide Number 7
Fish Feed Data

We will use these data for our analyses.
Please download the code file, Fisherydata, provided along with this tutorial. 
Show the GeoGebra window.  I have opened the GeoGebra interface. 
Click on View tool and select Spreadsheet.  Click on View tool and select Spreadsheet. 
Click on X at top right corner of Graphics and Algebra views.  Click on X at top right corner of Graphics and Algebra views.
This will close these views. 
In the code file, drag mouse to highlight length and weight data from columns H and I.
Show data in columns H and I. Hold Ctrl key down and press C. 
In the code file, drag mouse to highlight length and weight data from columns H and I.
These are data for fish that have been fed formulation C. Hold Control key down and press C. 
Click on Spreadsheet view in GeoGebra.  Click in the top of the Spreadsheet in GeoGebra. 
Hold Ctrl key down and press V.  Hold Control key down and press V.
This will copy and paste the highlighted data from the code file into GeoGebra. 
Place the cursor on the first column header in Spreadsheet view.
Drag and adjust column A's width. 
Place the cursor on the first column header in Spreadsheet view.
Drag and adjust column A's width. 
Right-click on column A heading of Length (mm).
Select Object Properties. Point to dialog box. 
Right-click on column A heading of Length millimetres.
Select Object Properties. A dialog box opens. 
Click on Text tab and change the name to Length (mm)-C.
Close the dialog box. Similarly, add –C to Weight (lbs). 
Click on Text tab and change the name to Length millimetres hyphen C.
Close the dialog box. Similarly, add hyphen C to Weight pounds. 
Adjust column B width.  Adjust column B width. 
Use mouse to drag and highlight first column A’s length data and label in GeoGebra.  Click on column A heading of Length millimetres C.
Drag to highlight length data in Spreadsheet view. 
Below the menubar, click on One Variable Analysis tool.
Point to Data Source popup window. Click on Analyze button. 
Below the menubar, click on One Variable Analysis tool.
A Data Source popup window appears. Click on Analyze button. 
Point to Data Analysis window and histogram.  A Data Analysis window appears.
By default, a histogram is plotted. 
Drag the boundary to see the graph properly.  Drag the boundary to see the graph properly. 
Point to length on the x-axis and frequency on the y-axis.  The length is plotted on the x-axis.
The number of fish that are of a particular length, the frequency, is plotted on the y-axis. 
Point to the display box above the graph containing the word Histogram.  Note the display box above the graph containing the word Histogram. 
In the display box, click on the dropdown menu button.  In the display box, click on the dropdown menu button to display the list of plots. 
Select Histogram.
Point to slider to the right of the display. 
We will stay with the histogram option.
To the right of the dropdown menu is a slider. 
Drag the slider from left to right to go to 20.  Drag the slider from left to right to go to 20. 
Point to rectangles between minimum and maximum values of data.  The slider changes the number of rectangles between the minimum and maximum values of data. 
Click on Options button to the right of the slider.  Click on Options button to the right of the slider. 
Under Classes, check Set Classes Manually.  Under Classes, check Set Classes Manually check box.
This displays Start and Width textboxes to the left of the Options button. 
Type 800 in the Start textbox and press Enter.
Show the value of 5 in the Width textbox. 
As all the fish are over 800 mm long, type 800 in the Start textbox and press Enter.
We will stay with the default value of 5 for rectangle width. 
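The manual class settings above can be sketched in code. This is a minimal illustration, not GeoGebra's implementation: the lengths are made-up values rather than the Fisherydata file, and the convention that each class includes its lower boundary and excludes its upper one is an assumption.

```python
from collections import Counter


def class_counts(values, start, width):
    """Count how many values fall in each class [lower, lower + width).

    Values below `start` are ignored. The half-open class convention
    is an assumption made for this sketch.
    """
    counts = Counter((v - start) // width for v in values if v >= start)
    return {start + k * width: counts[k] for k in sorted(counts)}


# Hypothetical fish lengths in mm (all over 800, as in the tutorial).
lengths_mm = [801, 803, 806, 809, 812, 812, 818]

# Start = 800 and Width = 5, matching the values typed in the textboxes.
print(class_counts(lengths_mm, start=800, width=5))
```

Each key in the result is the lower boundary of a class, and each value is the height of the corresponding rectangle in the histogram.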
Uncheck Set Classes Manually.  Uncheck Set Classes Manually check box. 
Under Show, uncheck Histogram check box.  Under Show, uncheck Histogram check box to make it disappear. 
Scroll down and check Frequency Polygon to show it.  Scroll down and check Frequency Polygon to show it. 
Check Cumulative option as the Frequency Type.  Under Frequency Type, check Cumulative option. 
Point to default Count selection.
Point to the cumulative frequency count. 
The default Count selection shows the cumulative frequency count for the data. 
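A cumulative frequency count is the running total of the class counts. The short sketch below illustrates the idea with hypothetical counts; they are not taken from the tutorial's data.

```python
from itertools import accumulate

# Hypothetical frequency counts for four consecutive classes.
class_frequencies = [2, 2, 2, 1]

# Each entry is the number of observations in that class or any earlier one.
cumulative_counts = list(accumulate(class_frequencies))
print(cumulative_counts)
```

The last cumulative value always equals the total number of observations, which is why the cumulative curve never decreases.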
Drag slider, bring it back to 20.  Drag the slider and note the effects on smoothness of the cumulative frequency count curve.
We will drag the slider back to 20. 
Under Frequency Type, uncheck Cumulative and under Show, uncheck Frequency Polygon.  Under Frequency Type, uncheck Cumulative and under Show, uncheck Frequency Polygon. 
Under Show, check Histogram and uncheck Frequency Polygon >> click on Options button again to hide the window.  Under Show, check Histogram and uncheck Frequency Polygon.
And click on Options button again to hide the window. 
Click on Show Data tool >> point to data highlighted in the Spreadsheet.  Above the Histogram textbox, click on the third Show Data tool button.
This displays all the data highlighted in the Spreadsheet. 
Drag the boundary to see the data properly.  Drag the boundary to see the data properly. 
Click on Show Data tool again to hide the list.  Click on the Show Data tool again to hide the list. 
Click on Show 2^{nd} Plot tool.  Above the Histogram textbox, click on the last Show 2^{nd} Plot tool button. 
Select histogram for top plot and box plot for bottom plot.  The same data are graphed in two vertically placed plots.
You can select plot types from the dropdown menu button above each plot. 
Click on Show Statistics tool.
Point to Statistics for both plots. 
Above the Histogram textbox, click on the second Show Statistics tool button.
Statistics for the plot appear as a panel in the middle. 
Drag the boundary to see it properly.  Drag the boundary to see it properly. 
Slide Number 8
Box Plot 
Box Plot
Box plot is a standardized way of showing data, based on the five number summary. 
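The five number summary consists of the minimum, first quartile, median, third quartile and maximum. The sketch below computes it for made-up lengths. The quartile method used here (Python's "inclusive" method) is an assumption; other conventions exist, so Q1 and Q3 may differ slightly from the values GeoGebra reports.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" interpolates between the sorted data points;
    # other quartile conventions can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical fish lengths in mm (not the tutorial's data).
lengths_mm = [805, 812, 820, 833, 841, 850, 862, 870, 885]
print(five_number_summary(lengths_mm))
```

In a box plot, the box spans Q1 to Q3, the line inside the box marks the median, and the whiskers typically extend towards the minimum and maximum.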
Click and point to Median, Min, Max, Q_{1} and Q_{3} values in the box plot.  Let us compare histogram and box plot.
In the box plot, locate the Median, Min, Max, Q_{1} and Q_{3} values. 
Click on the button next to Options button above the plot.  Above each plot, in the upper right corner, click on the button next to Options.
A dropdown menu appears with which you can copy each plot to Clipboard or export it as an image. 
Click on Show Statistics tool button to hide the statistics.  Click on Show Statistics tool button to hide the statistics. 
Close the Data Analysis window.  Close the Data Analysis window. 
Slide Number 9
Least Squares Linear Regression (LSLR)
Changing an independent variable x changes the dependent variable y.
LSLR predicts y based on x value.
LSRL (best fit line): y = b_{0} + b_{1}x
Coefficient of determination R^{2} 
Least Squares Linear Regression (LSLR)
Changing an independent variable x changes the dependent variable y.
LSLR predicts y based on x value.
Least Squares Regression Line (LSRL) is also called the best fit line.
It is given by y = b_{0} + b_{1}x.
b_{1}, the slope, is the regression coefficient.
The coefficient of determination is R squared.
R squared ranges from 0 to 1.
The closer R squared is to 1, the better is the prediction of variance in y from x. 
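The quantities named on this slide can be computed directly from their textbook definitions. The following is a minimal sketch with hypothetical data; the function name `least_squares_line` is made up for this example and is not part of GeoGebra.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1, r_squared) for the line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope: covariation of x and y divided by the variation of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx

    # The least squares line passes through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x

    # R squared: share of the variance in y explained by the line.
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return b0, b1, r_squared


# Hypothetical length (mm) and weight (lbs) pairs, not the tutorial's data.
lengths = [810, 830, 850, 870, 890]
weights = [14.0, 17.5, 18.0, 21.0, 22.5]
b0, b1, r_squared = least_squares_line(lengths, weights)
print(round(b0, 2), round(b1, 4), round(r_squared, 4))
```

Data that lie exactly on a straight line give R squared equal to 1, while scattered data give lower values.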
Show length and weight data in the Spreadsheet in the GeoGebra.  Let us go back to the length and weight data in the Spreadsheet view in GeoGebra. 
Drag mouse to highlight all labels and data in the two columns.  Drag and select all the data in both columns. 
Under One Variable Analysis, click on Two Variable Regression Analysis tool.  Under One Variable Analysis, click on Two Variable Regression Analysis tool. 
Click Analyze button in the Data Source window that pops up.  In the Data Source window that pops up, click Analyze button. 
Data Analysis window appears.  A Data Analysis window appears with two plots. 
Show both plots.  By default, the upper plot is a Scatterplot and the lower a Residual plot. 
Click on Show Statistics tool to see Statistics.  Click on Show Statistics tool to see the Statistics. 
Drag the boundary to see them properly.  Drag the boundary to see them properly. 
Below Statistics window, click on the Regression Model menu button and select Linear.  Below the Statistics window, click on the Regression Model menu button and select Linear. 
Point to the red line that is drawn through some points.  Note the red line in the Scatterplot. 
Point to the equation given in red, y = 0.08x - 48.39.  This is the best fit line that passes as close as possible to all the points.
Its equation is given in red at the bottom. 
Point to R^{2} value of 0.7722.  This R squared value indicates good fit between the model and the actual data. 
Select other regression models to see effects on R^{2}.  Select other regression models to see effects on the R squared value. 
Point to the lower Residual Plot.  The lower plot is the Residual Plot.
Residuals are the differences between observed and predicted values of all points. 
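A residual is the observed y value minus the value predicted by the line. The sketch below uses a hypothetical line and hypothetical points to show the calculation.

```python
def residuals(xs, ys, b0, b1):
    """Return observed minus predicted y for each point, given y = b0 + b1*x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical points and a hypothetical line y = 1 + 2x.
xs = [1, 2, 3]
ys = [3, 6, 7]
print(residuals(xs, ys, b0=1, b1=2))
```

A point above the line has a positive residual and a point below it has a negative one. A residual plot with no visible pattern suggests that a straight line is a reasonable model.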
Click on Switch Axes button.  Above the Statistics window, click on the last Switch Axes button. 
Point to length now plotted on y-axis and weight on x-axis.  For the scatterplot, length is now plotted along y-axis and weight along x-axis. 
Point to the best fit line and statistics.
Point to equation y= 9.91x + 684.3. 
Observe that the best fit line and many statistics change.
Its equation is now y= 9.91x + 684.3. 
Point to r, R^{2} and rho (ρ).  The only statistics that remain the same are r, R squared and rho (ρ).
Note that r and rho are greater than 0.8, indicating strong positive correlation. Weight increases as length increases for fish given feed C. The relationship is strong and well predicted by the best fit lines. 
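Switching the axes changes the slope and intercept but not the correlation, because r treats x and y symmetrically. The sketch below illustrates this with hypothetical data. It also shows that the product of the two slopes equals R squared, which is consistent with the slopes shown in this tutorial: 0.08 × 9.91 is about 0.79, close to 0.7722 once rounding of the displayed slope is allowed for.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope(xs, ys):
    """Least squares slope for predicting ys from xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data, not the tutorial's fish measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = pearson_r(xs, ys)   # original axes
r_yx = pearson_r(ys, xs)   # axes switched
slope_product = slope(xs, ys) * slope(ys, xs)
print(round(r_xy, 4), round(r_yx, 4), round(slope_product, 4))
```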
Again, click on Switch Axes button.  Again, click on Switch Axes button. 
Point to Symbolic Evaluation at the bottom.  At the bottom, in Symbolic Evaluation, you can enter a value for x to get a prediction for y. 
Point at the line in the Scatterplot.  To get logical predictions, we will enter x values above the x-intercept. 
In Symbolic Evaluation, type in a value for x and press Enter.  In Symbolic Evaluation, in the textbox for x, type 800 and press Enter. 
Point to y value appearing next to the display box.  Note that a y value appears next to the display box.
The x value was substituted in the best fit line equation to get the y value. 
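The substitution can be reproduced by hand. This sketch assumes the best fit line is y = 0.08x - 48.39, that is, with a negative intercept, which is what a positive x-intercept implies. The displayed coefficients are rounded, so the result may differ slightly from the value GeoGebra shows.

```python
# Assumed coefficients, read from the rounded equation in the scatterplot.
B0 = -48.39  # intercept (assumed negative, see the note above)
B1 = 0.08    # slope


def predict_weight(length_mm, b0=B0, b1=B1):
    """Substitute x into y = b0 + b1*x."""
    return b0 + b1 * length_mm


def x_intercept(b0=B0, b1=B1):
    """The x value at which the predicted y is zero."""
    return -b0 / b1


print(round(predict_weight(800), 2))
print(round(x_intercept(), 3))
```

Lengths below the x-intercept would give negative predicted weights, which is why only x values above it give logical predictions.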
Again, click on Show Statistics tool button.  Again, click on Show Statistics tool button. 
Close the Data Analysis window.  Close the Data Analysis window. 
Point to length and weight data in the Spreadsheet.  Let’s go back to the length and weight data in the Spreadsheet. 
Drag mouse to highlight all labels and data in the two columns.  In the Spreadsheet, select all the data in both columns. 
Under One Variable Analysis, click on Multiple Variable Analysis tool.  Under One Variable Analysis, click on Multiple Variable Analysis tool. 
Click Analyze button in the Data Source window that pops up.  In the Data Source window that pops up, click Analyze button. 
Point to Box Plots in the window and to the cell numbers in each row.  Box Plots appear in the window.
They are for length and weight data. 
Click on Show Statistics tool.
Point to Statistics for both plots. 
Above the plot, click on the second Show Statistics tool.
Statistics for both plots appear below. 
Place the cursor on the boundary between the plot and statistics.
When the arrow appears, drag the boundary to resize the windows. 
Place the cursor on the boundary between the plot and statistics.
When the arrow appears, drag the boundary to resize the windows. 
Let us summarize.  
Slide Number 10
Summary 
In this tutorial, we have learnt how to use GeoGebra to perform:
One Variable Analysis to calculate different statistical parameters Two Variable Regression Analysis to estimate best fit line Multiple Variable Analysis to calculate different statistical parameters 
Slide Number 11
Assignment
Perform statistical analyses for weight and girth data
Is any of the oils absorbed more than the others? 
Assignment
Perform statistical analyses for weight and girth data given in this tutorial.
Four oils were used to deep fry chips.
Amount of absorbed fat was measured for 6 chips fried in 4 oils.
Is any of the oils absorbed more than the others? 
Slide Number 12
About Spoken Tutorial project 
The video at the following link summarizes the Spoken Tutorial project.
Please download and watch it. 
Slide Number 13
Spoken Tutorial workshops 
The Spoken Tutorial Project team conducts workshops and gives certificates.
For more details, please write to us. 
Slide Number 14
Forum for specific questions:
Do you have questions in THIS Spoken Tutorial?
Please visit this site
Choose the minute and second where you have the question
Explain your question briefly
Someone from our team will answer them 
Please post your timed queries on this forum. 
Slide Number 15
Acknowledgement 
Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
More information on this mission is available at this link. 
This is Vidhya Iyer from IIT Bombay, signing off.
Thank you for joining. 