Applications-of-GeoGebra/C3/Statistics-using-GeoGebra/English-timed
Time | Narration |
00:01 | Welcome to this tutorial on Statistics using GeoGebra. |
00:06 | In this tutorial, we will learn how to use GeoGebra to perform: |
00:12 | One Variable Analysis to calculate different statistical parameters. |
00:18 | Two Variable Regression Analysis to estimate best fit line. |
00:23 | Multiple Variable Analysis to calculate different statistical parameters. |
00:29 | Here I am using:
Ubuntu Linux Operating System version 16.04. |
00:36 | GeoGebra 5.0.481.0 hyphen d. |
00:43 | To follow this tutorial, you should be familiar with:
GeoGebra interface, Statistics |
00:51 | Statistics deals with:
Data analysis and interpretation. |
00:56 | Measures of central tendency. |
00:59 | Measures of Dispersion. |
01:02 | Comparing variability of data series. |
01:06 | Please refer to additional material provided along with this tutorial. |
01:12 | Fish Feed |
01:14 | Let us look at an example. |
01:17 | A fishery is testing four types of feed formulations on its fish: A, B, C and D. |
01:26 | Data to be collected after feeding the fish for 6 months are:
Length in millimeters, Weight in pounds, Girth in millimeters. |
01:39 | Let us look at some of these data. |
01:42 | Fish Feed Data |
01:44 | We will use these data for our analyses. |
01:49 | Please download the code file, Fishery-data, provided along with this tutorial. |
01:57 | I have opened the GeoGebra interface. |
02:01 | Click on View tool and select Spreadsheet. |
02:07 | Click on X at top right corner of Graphics and Algebra views.
This will close these views. |
02:17 | In the code file, drag mouse to highlight length and weight data from columns H and I. |
02:26 | These are data for fish that have been fed formulation C. |
02:32 | Hold Control key down and press C. |
02:36 | Click in the top of the Spreadsheet in GeoGebra. |
02:41 | Hold Control key down and press V. |
02:45 | This will copy and paste the highlighted data from the code file into GeoGebra. |
02:52 | Place the cursor on the first column header in Spreadsheet view. |
02:58 | Drag and adjust column A's width. |
03:02 | Right-click on column A heading of Length millimetres. |
03:08 | Select Object Properties. |
03:11 | A dialog box opens. |
03:14 | Click on Text tab and change the name to Length millimetres hyphen C. |
03:23 | Close the dialog box. |
03:26 | Similarly, add hyphen C to Weight pounds. |
03:35 | Adjust column B width. |
03:38 | Click on column A heading of Length millimetres C. |
03:44 | Drag to highlight length data in Spreadsheet view. |
03:49 | Below the menubar, click on One Variable Analysis tool. |
03:55 | A Data Source popup window appears.
Click on Analyze button. |
04:02 | A Data Analysis window appears. |
04:06 | By default, a histogram is plotted. |
04:10 | Drag the boundary to see the graph properly. |
04:14 | The length is plotted on the x-axis. |
04:18 | The number of fish that are of a particular length, the frequency, is plotted on the y-axis. |
04:26 | Note the display box above the graph containing the word Histogram. |
04:32 | In the display box, click on the dropdown menu button to display the list of plots. |
04:39 | We will stay with the histogram option. |
04:43 | To the right of the dropdown menu is a slider. |
04:48 | Drag the slider from left to right to go to 20. |
04:53 | The slider changes the number of rectangles between the minimum and maximum values of data. |
05:01 | Click on Options button to the right of the slider. |
05:06 | Under Classes, check Set Classes Manually check box. |
05:12 | This displays Start and Width text-boxes to the left of the Options button. |
05:19 | As all the fish are over 800 millimeters long, type 800 in the Start text-box and press Enter. |
05:29 | We will stay with the default value of 5 for rectangle width. |
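The Start and Width settings above can be sketched outside GeoGebra as follows. This is only an illustration: the length values are made up, and the rule that each class includes its left edge but not its right edge is an assumption, not something the narration states.

```python
from collections import Counter

# Hypothetical fish lengths in mm (illustrative values, not the tutorial's data).
lengths_mm = [802, 806, 811, 813, 814, 819, 822, 823, 827, 831]

start = 800  # left edge of the first class, as typed in the Start text-box
width = 5    # class (rectangle) width, the default mentioned above

# Assumed rule: a value falls in class k when
# start + k*width <= value < start + (k+1)*width.
counts = Counter((x - start) // width for x in lengths_mm)

for k in sorted(counts):
    left = start + k * width
    print(f"[{left}, {left + width}): {counts[k]} fish")
```

Each printed line corresponds to one rectangle of the histogram; its count is the rectangle's height.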
05:35 | Uncheck Set Classes Manually check box. |
05:39 | Under Show, uncheck Histogram check box to make it disappear. |
05:45 | Scroll down and check Frequency Polygon to show it. |
05:51 | Under Frequency Type, check Cumulative option. |
05:56 | The default Count selection shows the cumulative frequency count for the data. |
06:03 | Drag the slider and note the effects on smoothness of the cumulative frequency count curve. |
06:11 | We will drag the slider back to 20. |
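The Cumulative option with the Count selection can be thought of as a running total of the class counts. A minimal sketch, assuming made-up counts for seven consecutive classes:

```python
from itertools import accumulate

# Hypothetical counts for consecutive length classes (illustrative only).
class_counts = [1, 1, 3, 1, 2, 1, 1]

# Cumulative count: number of fish up to and including each class.
cumulative = list(accumulate(class_counts))
print(cumulative)  # [1, 2, 5, 6, 8, 9, 10]
```

The last cumulative value always equals the total number of data points, which is why the curve never decreases.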
06:15 | Under Frequency Type, uncheck Cumulative and under Show, uncheck Frequency Polygon. |
06:24 | Under Show, check Histogram.
And click on Options button again to hide the window. |
06:33 | Above the Histogram text-box, click on the third Show Data tool button. |
06:40 | This displays all the data highlighted in the Spreadsheet. |
06:45 | Drag the boundary to see the data properly. |
06:49 | Click on the Show Data tool again to hide the list. |
06:55 | Above the Histogram text-box, click on the last Show 2nd Plot tool button. |
07:02 | The same data are graphed in two vertically placed plots. |
07:07 | You can select plot types from the dropdown menu button above each plot. |
07:14 | Above the Histogram text box, click on the second Show Statistics tool button. |
07:22 | Statistics for the plot appear as a panel in the middle. |
07:27 | Drag the boundary to see it properly. |
07:31 | Box Plot
A box plot is a standardized way of showing data, based on the five-number summary. |
07:41 | Let us compare histogram and box plot. |
07:45 | In the box plot, locate the Median, Min, Max, Q1 and Q3 values. |
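The five values located in the box plot can be computed by hand as a check. The sketch below uses made-up lengths and takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data. That quartile rule is an assumption; GeoGebra may use a different convention and give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical fish lengths in mm (illustrative values, not the tutorial's data).
lengths_mm = [802, 806, 811, 813, 814, 819, 822, 823, 827, 831]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]              # values below the median position
    upper = data[len(data) - half:]  # values above the median position
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

print(five_number_summary(lengths_mm))  # (802, 811, 816.5, 823, 831)
```

The box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward Min and Max.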
07:57 | Above each plot, in the upper right corner, click on the button next to Options. |
08:06 | A dropdown menu appears with which you can copy each plot to Clipboard or export it as an image. |
08:15 | Click on Show Statistics tool button to hide the statistics. |
08:21 | Close the Data Analysis window. |
08:25 | Least Squares Linear Regression (LSLR) |
08:30 | Changing an independent variable x changes the dependent variable y. |
08:36 | LSLR predicts y based on x value. |
08:41 | Least Squares Regression Line (LSRL) is also called the best fit line. |
08:49 | It is given by y = b0 + b1x. |
08:55 | b1, the slope, is the regression coefficient. |
09:00 | Coefficient of determination R squared |
09:04 | R squared ranges from 0 to 1. |
09:08 | The closer R squared is to 1, the more of the variance in y is explained by x. |
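The line y = b0 + b1x and R squared can be computed directly from the standard least squares formulas. The sketch below is an illustration with made-up length and weight values; the function name `least_squares_line` is hypothetical and is not part of GeoGebra.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1, r_squared) for the line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # slope, the regression coefficient
    b0 = mean_y - b1 * mean_x  # intercept
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot

# Hypothetical lengths (mm) and weights (pounds); not the tutorial's data.
lengths_mm = [805, 812, 820, 826, 833, 841]
weights_lb = [12.1, 12.9, 13.6, 14.4, 15.0, 15.9]

b0, b1, r2 = least_squares_line(lengths_mm, weights_lb)
print(f"y = {b0:.3f} + {b1:.4f}x, R squared = {r2:.4f}")
```

R squared is computed here as 1 minus the ratio of the residual sum of squares to the total sum of squares, so it equals 1 only when every point lies exactly on the line.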
09:16 | Let us go back to the length and weight data in the Spreadsheet view in GeoGebra. |
09:23 | Drag and select all the data in both columns. |
09:29 | Under One Variable Analysis, click on Two Variable Regression Analysis tool. |
09:36 | In the Data Source window that pops up, click Analyze button. |
09:41 | A Data Analysis window appears with two plots. |
09:46 | By default, the upper plot is a Scatterplot and the lower a Residual plot. |
09:54 | Click on Show Statistics tool to see the Statistics. |
10:00 | Drag the boundary to see them properly. |
10:04 | Below the Statistics window, click on the Regression Model menu button and select Linear. |
10:13 | Note the red line in the Scatterplot. |
10:17 | This is the best fit line: it minimizes the sum of squared vertical distances from the points to the line. |
10:23 | Its equation is given in red at the bottom. |
10:28 | This R squared value indicates good fit between the model and the actual data. |
10:36 | Select other regression models to see effects on the R squared value. |
10:43 | The lower plot is the Residual Plot. |
10:47 | Residuals are the differences between observed and predicted values of all points. |
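As a small worked case of residuals, take three made-up points whose least squares line works out by hand to y = 0 + 1.5x. The points and the line are illustrative only and have nothing to do with the fish data.

```python
# Three hypothetical points (illustrative only).
xs = [1, 2, 3]
ys = [2, 2, 5]

# Least squares line for these points, worked out by hand: y = 0 + 1.5x.
b0, b1 = 0.0, 1.5

# Residual = observed y minus the y predicted by the line.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(residuals)       # [0.5, -1.0, 0.5]
print(sum(residuals))  # 0.0
```

Residuals of a least squares line with an intercept always sum to zero, so a residual plot scatters above and below the horizontal axis.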
10:54 | Above the Statistics window, click on the last Switch Axes button. |
11:00 | For the scatterplot, length is now plotted along y axis and weight along x axis. |
11:08 | Observe that the best fit line and many statistics change. |
11:13 | Its equation is now y= 9.91x + 684.3. |
11:22 | The only statistics that remain the same are r, R squared and rho (ρ). |
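Why r and R squared survive Switch Axes while the equation changes can be checked numerically. The sketch below uses made-up data and a hypothetical helper `slope_and_r`; rho is not computed here.

```python
import math

def slope_and_r(xs, ys):
    """Return (slope of y on x, Pearson correlation r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical lengths (mm) and weights (pounds); not the tutorial's data.
lengths_mm = [805, 812, 820, 826, 833, 841]
weights_lb = [12.1, 12.9, 13.6, 14.4, 15.0, 15.9]

slope_w_on_l, r_original = slope_and_r(lengths_mm, weights_lb)
slope_l_on_w, r_switched = slope_and_r(weights_lb, lengths_mm)

print("slopes:", slope_w_on_l, slope_l_on_w)  # differ after switching
print("r:", r_original, r_switched)           # the same either way
```

The formula for r treats x and y symmetrically, whereas the slope divides by the spread of whichever variable is on the x-axis. The product of the two slopes equals r squared.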
11:31 | Note that r and rho are greater than 0.8, indicating positive correlation. |
11:39 | Weight increases as length increases for fish given feed C. |
11:46 | The relationship is strong and well predicted by the best fit lines. |
11:52 | Again, click on Switch Axes button. |
11:56 | At the bottom, in Symbolic Evaluation, you can enter a value for x to get a prediction for y. |
12:04 | To get logical predictions, we will enter x values above the x intercept. |
12:11 | In Symbolic Evaluation, in the text-box for x, type 800 and press Enter. |
12:19 | Note that a y value appears next to the display box. |
12:25 | The x value was substituted in the best fit line equation to get the y value. |
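The substitution step can be sketched with the only equation quoted in this tutorial, y = 9.91x + 684.3. Note that this is the switched-axes line, where x is weight and y is length; the equation used for x = 800 above is not quoted in the narration, so it is not reproduced here. The input weight of 15 pounds is a made-up value.

```python
# Coefficients quoted in the narration for the switched-axes line
# y = 9.91x + 684.3 (x: weight in pounds, y: length in millimetres).
SLOPE = 9.91
INTERCEPT = 684.3

def predict(x):
    """Substitute x into the best fit line equation."""
    return SLOPE * x + INTERCEPT

# The x intercept is where the predicted y becomes zero.
x_intercept = -INTERCEPT / SLOPE

weight_lb = 15.0  # hypothetical input value
print(f"x intercept: {x_intercept:.2f}")
print(f"predicted length for {weight_lb} lb: {predict(weight_lb):.2f} mm")
```

Values of x far outside the range of the collected data give predictions that the model does not support, which is one reason to stay above the x intercept.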
12:32 | Again, click on Show Statistics tool button. |
12:37 | Close the Data Analysis window. |
12:40 | Let’s go back to the length and weight data in the Spreadsheet. |
12:45 | In the Spreadsheet, select all the data in both columns. |
12:51 | Under One Variable Analysis, click on Multiple Variable Analysis tool. |
12:59 | In the Data Source window that pops up, click Analyze button. |
13:04 | Box Plots appear in the window. |
13:07 | They are for length and weight data. |
13:11 | Above the plot, click on the second Show Statistics tool. |
13:17 | Statistics for both plots appear below. |
13:21 | Place the cursor on the boundary between the plot and statistics. |
13:27 | When the arrow appears, drag the boundary to resize the windows. |
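A per-column summary of the kind shown under the box plots can be sketched with Python's standard `statistics` module. The data are made up, and the set of statistics chosen here is an assumption; the panel in GeoGebra may list more or different values.

```python
import statistics

# Hypothetical columns, named after the spreadsheet headings used above.
columns = {
    "Length mm-C": [805, 812, 820, 826, 833, 841],
    "Weight pounds-C": [12.1, 12.9, 13.6, 14.4, 15.0, 15.9],
}

summary = {}
for name, values in columns.items():
    summary[name] = {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }

for name, stats in summary.items():
    print(name, stats)
```

Putting the columns side by side makes it easy to compare the centre and spread of each data series.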
13:34 | Let us summarize. |
13:36 | In this tutorial, we have learnt how to use GeoGebra to perform: |
13:41 | One Variable Analysis to calculate different statistical parameters. |
13:47 | Two Variable Regression Analysis to estimate best fit line. |
13:52 | Multiple Variable Analysis to calculate different statistical parameters. |
13:58 | Assignment
Perform statistical analyses for weight and girth data given in this tutorial. |
14:07 | Four oils were used to deep fry chips. |
14:11 | Amount of absorbed fat was measured for 6 chips fried in 4 oils. |
14:19 | Is any of the oils absorbed more than the others? |
14:24 | The video at the following link summarizes the Spoken Tutorial project.
Please download and watch it. |
14:32 | The Spoken Tutorial Project team conducts workshops and gives certificates.
For more details, please write to us. |
14:42 | Please post your timed queries on this forum. |
14:46 | Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.
More information on this mission is available at this link. |
14:59 | This is Vidhya Iyer from IIT Bombay, signing off.
Thank you for joining. |