Applications-of-GeoGebra/C3/Statistics-using-GeoGebra/English-timed



Time Narration
00:01 Welcome to this tutorial on Statistics using GeoGebra.
00:06 In this tutorial, we will learn how to use GeoGebra to perform:
00:12 One Variable Analysis to calculate different statistical parameters.
00:18 Two Variable Regression Analysis to estimate the best fit line.
00:23 Multiple Variable Analysis to calculate different statistical parameters.
00:29 Here I am using:

Ubuntu Linux Operating System version 16.04.

00:36 GeoGebra 5.0.481.0 hyphen d.
00:43 To follow this tutorial, you should be familiar with:

GeoGebra interface, Statistics

00:51 Statistics deals with

Data analysis and interpretation.

00:56 Measures of central tendency.
00:59 Measures of Dispersion.
01:02 Comparing variability of data series.
01:06 Please refer to additional material provided along with this tutorial.
01:12 Fish Feed
01:14 Let us look at an example.
01:17 A fishery is testing four types of feed formulations on its fish: A, B, C and D.
01:26 Data to be collected after feeding the fish for 6 months are:

Length in millimeters, Weight in pounds, Girth in millimeters.

01:39 Let us look at some of these data.
01:42 Fish Feed Data
01:44 We will use these data for our analyses.
01:49 Please download the code file, Fishery-data, provided along with this tutorial.
01:57 I have opened the GeoGebra interface.
02:01 Click on View tool and select Spreadsheet.
02:07 Click on X at top right corner of Graphics and Algebra views.

This will close these views.

02:17 In the code file, drag mouse to highlight length and weight data from columns H and I.
02:26 These are data for fish that have been fed formulation C.
02:32 Hold Control key down and press C.
02:36 Click in the top of the Spreadsheet in GeoGebra.
02:41 Hold Control key down and press V.
02:45 This will copy and paste the highlighted data from the code file into GeoGebra.
02:52 Place the cursor on the first column header in Spreadsheet view.
02:58 Drag and adjust column A's width.
03:02 Right-click on column A heading of Length millimetres.
03:08 Select Object Properties.
03:11 A dialog box opens.
03:14 Click on Text tab and change the name to Length millimetres hyphen C.
03:23 Close the dialog box.
03:26 Similarly, add hyphen C to Weight pounds.
03:35 Adjust column B width.
03:38 Click on column A heading of Length millimetres C.
03:44 Drag to highlight length data in Spreadsheet view.
03:49 Below the menubar, click on One Variable Analysis tool.
03:55 A Data Source popup window appears.

Click on Analyze button.

04:02 A Data Analysis window appears.
04:06 By default, a histogram is plotted.
04:10 Drag the boundary to see the graph properly.
04:14 The length is plotted on the x-axis.
04:18 The number of fish that are of a particular length, the frequency, is plotted on the y-axis.
04:26 Note the display box above the graph containing the word Histogram.
04:32 In the display box, click on the dropdown menu button to display the list of plots.
04:39 We will stay with the histogram option.
04:43 To the right of the dropdown menu is a slider.
04:48 Drag the slider from left to right to go to 20.
04:53 The slider changes the number of rectangles between the minimum and maximum values of data.
05:01 Click on Options button to the right of the slider.
05:06 Under Classes, check Set Classes Manually check box.
05:12 This displays Start and Width text-boxes to the left of the Options button.
05:19 As all the fish are over 800 millimeters long, type 800 in the Start text-box and press Enter.
05:29 We will stay with the default value of 5 for rectangle width.
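The effect of the Start and Width settings can be sketched outside GeoGebra. The Python example below is a minimal illustration, not GeoGebra's own code. The lengths are hypothetical, class_counts is a made-up helper name, and the sketch assumes each class includes its lower limit and excludes its upper limit, which may differ from GeoGebra's convention.

```python
from collections import Counter


def class_counts(values, start, width):
    """Count values per class [start + k*width, start + (k+1)*width)."""
    # Integer division gives the index k of the class each value falls in.
    index_counts = Counter((v - start) // width for v in values)
    # Report each class by its lower limit, in ascending order.
    return {start + int(k) * width: n for k, n in sorted(index_counts.items())}


# Hypothetical lengths in millimeters, not the tutorial's data.
lengths = [803, 807, 811, 812, 814, 818, 821, 824, 826, 833]

counts = class_counts(lengths, start=800, width=5)
for lower, n in counts.items():
    print(f"{lower} to {lower + 5}: {n}")
```

Each printed row corresponds to one rectangle of the histogram; a larger width merges neighbouring classes into fewer, taller rectangles.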
05:35 Uncheck Set Classes Manually check box.
05:39 Under Show, uncheck Histogram check box to make it disappear.
05:45 Scroll down and check Frequency Polygon to show it.
05:51 Under Frequency Type, check Cumulative option.
05:56 The default Count selection shows the cumulative frequency count for the data.
06:03 Drag the slider and note the effects on smoothness of the cumulative frequency count curve.
06:11 We will drag the slider back to 20.
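A cumulative frequency count is a running total of the class counts. The short Python sketch below shows the idea with hypothetical class counts (lower class limit mapped to count); it is an illustration only and does not use the tutorial's data.

```python
from itertools import accumulate

# Hypothetical class counts: lower class limit -> number of fish in that class.
class_counts = {800: 1, 805: 1, 810: 3, 815: 1, 820: 2, 825: 1, 830: 1}

# Running total of the counts, in order of increasing class limit.
cumulative = dict(zip(class_counts, accumulate(class_counts.values())))

for lower, total in cumulative.items():
    print(f"below {lower + 5}: {total}")
```

The last running total always equals the number of data values. With fewer, wider classes the curve has fewer points, which is why it looks coarser when the slider is moved to a small value.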
06:15 Under Frequency Type, uncheck Cumulative and under Show, uncheck Frequency Polygon.
06:24 Under Show, check Histogram.

And click on Options button again to hide the window.

06:33 Above the Histogram text-box, click on the third Show Data tool button.
06:40 This displays all the data highlighted in the Spreadsheet.
06:45 Drag the boundary to see the data properly.
06:49 Click on the Show Data tool again to hide the list.
06:55 Above the Histogram text-box, click on the last Show 2nd Plot tool button.
07:02 The same data are graphed in two vertically placed plots.
07:07 You can select plot types from the dropdown menu button above each plot.
07:14 Above the Histogram text box, click on the second Show Statistics tool button.
07:22 Statistics for the plot appear as a panel in the middle.
07:27 Drag the boundary to see it properly.
07:31 Box Plot

A box plot is a standardized way of showing data, based on the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3) and maximum.

07:41 Let us compare histogram and box plot.
07:45 In the box plot, locate the Median, Min, Max, Q1 and Q3 values.
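The five values located in the box plot can also be computed directly. The sketch below uses Python's standard statistics module on hypothetical lengths. Quartile conventions vary between tools, so the Q1 and Q3 values reported by GeoGebra for the same data may differ slightly from the "inclusive" method assumed here.

```python
from statistics import median, quantiles

# Hypothetical lengths in millimeters, not the tutorial's data.
lengths = [802, 806, 810, 813, 815, 819, 822, 827, 835]

# With n=4, quantiles() returns the three cut points Q1, median and Q3.
q1, q2, q3 = quantiles(lengths, n=4, method="inclusive")

# Five-number summary in the order Min, Q1, Median, Q3, Max.
summary = (min(lengths), q1, median(lengths), q3, max(lengths))
print(summary)
```

The box spans Q1 to Q3, the line inside the box marks the median, and the whiskers reach toward Min and Max.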
07:57 Above each plot, in the upper right corner, click on the button next to Options.
08:06 A dropdown menu appears with which you can copy each plot to Clipboard or export it as an image.
08:15 Click on Show Statistics tool button to hide the statistics.
08:21 Close the Data Analysis window.
08:25 Least Squares Linear Regression (LSLR)
08:30 Changing an independent variable x changes the dependent variable y.
08:36 LSLR predicts y based on x value.
08:41 Least Squares Regression Line (LSRL) is also called the best fit line.
08:49 It is given by y = b0 + b1x.
08:55 b1, the slope, is the regression coefficient.
09:00 Coefficient of determination R squared
09:04 R squared ranges from 0 to 1.
09:08 The closer R squared is to 1, the larger the share of the variance in y that is explained by x.
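As a rough illustration of how b0, b1 and R squared are obtained, the Python sketch below applies the standard least squares formulas to a few hypothetical length and weight pairs. The function name least_squares_line and the data are assumptions made for this example; GeoGebra performs the equivalent calculation internally.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1, r_squared) for the line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope (regression coefficient)
    b0 = mean_y - b1 * mean_x    # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared


# Hypothetical data: lengths in millimeters, weights in pounds.
lengths = [800, 810, 820, 830, 840]
weights = [12, 14, 15, 14, 15]

b0, b1, r_squared = least_squares_line(lengths, weights)
print(f"y = {b0:.2f} + {b1:.3f}x, R squared = {r_squared:.2f}")
```

For a straight-line fit, R squared equals the square of the correlation coefficient r.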
09:16 Let us go back to the length and weight data in the Spreadsheet view in GeoGebra.
09:23 Drag and select all the data in both columns.
09:29 Under One Variable Analysis, click on Two Variable Regression Analysis tool.
09:36 In the Data Source window that pops up, click Analyze button.
09:41 A Data Analysis window appears with two plots.
09:46 By default, the upper plot is a Scatterplot and the lower a Residual plot.
09:54 Click on Show Statistics tool to see the Statistics.
10:00 Drag the boundary to see them properly.
10:04 Below the Statistics window, click on the Regression Model menu button and select Linear.
10:13 Note the red line in the Scatterplot.
10:17 This is the best fit line: the line that minimizes the sum of the squared vertical distances between the points and the line.
10:23 Its equation is given in red at the bottom.
10:28 This R squared value indicates a good fit between the model and the actual data.
10:36 Select other regression models to see effects on the R squared value.
10:43 The lower plot is the Residual Plot.
10:47 A residual is the difference between the observed and the predicted y value at a point.
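The residual plot can be reproduced by hand. In the Python sketch below, the data are hypothetical and the coefficients b0 and b1 are the least squares fit to exactly these points; they are not values from the tutorial.

```python
# Hypothetical data: lengths in millimeters, weights in pounds.
lengths = [800, 810, 820, 830, 840]
weights = [12, 14, 15, 14, 15]

# Least squares line for the hypothetical points above: y = b0 + b1*x.
b0, b1 = -35.2, 0.06

# Predicted weight for each length, then observed minus predicted.
predicted = [b0 + b1 * x for x in lengths]
residuals = [y - p for y, p in zip(weights, predicted)]

print([round(r, 2) for r in residuals])
```

For a least squares line with an intercept, the residuals sum to zero, so the points in the residual plot scatter above and below the horizontal axis.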
10:54 Above the Statistics window, click on the last Switch Axes button.
11:00 For the scatterplot, length is now plotted along y axis and weight along x axis.
11:08 Observe that the best fit line and many statistics change.
11:13 Its equation is now y = 9.91x + 684.3.
11:22 The only statistics that remain the same are r, R squared and rho (ρ).
11:31 Note that r and rho are greater than 0.8, indicating strong positive correlation.
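Why r stays the same when the axes are switched, while the line itself changes, can be checked numerically. The Python sketch below uses hypothetical data and a made-up helper, slope_and_r; it illustrates the arithmetic and is not GeoGebra's implementation.

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Return (slope of the least squares line of ys on xs, Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope depends on which variable is x; r treats both symmetrically.
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical data: lengths in millimeters, weights in pounds.
lengths = [800, 810, 820, 830, 840]
weights = [12, 14, 15, 14, 15]

slope_w_on_l, r_original = slope_and_r(lengths, weights)
slope_l_on_w, r_switched = slope_and_r(weights, lengths)

print(slope_w_on_l, slope_l_on_w)  # the two slopes differ
print(r_original, r_switched)      # r is the same either way
```

The product of the two slopes equals r squared, which is one way to see that R squared is also unaffected by switching axes.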
11:39 Weight increases as length increases for fish given feed C.
11:46 The relationship is strong and well predicted by the best fit lines.
11:52 Again, click on Switch Axes button.
11:56 At the bottom, in Symbolic Evaluation, you can enter a value for x to get a prediction for y.
12:04 To get logical predictions, we will enter x values above the x intercept.
12:11 In Symbolic Evaluation, in the text-box for x, type 800 and press Enter.
12:19 Note that a y value appears next to the display box.
12:25 The x value was substituted in the best fit line equation to get the y value.
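The substitution itself is simple arithmetic. The tutorial does not state the equation for the original axes, so the Python sketch below uses the switched-axes line quoted at 11:13, y = 9.91x + 684.3, where x is weight in pounds and y is length in millimeters. The input weight of 15 pounds and the function name predict_length_mm are hypothetical, chosen only for illustration.

```python
def predict_length_mm(weight_lb):
    """Substitute x into y = 9.91x + 684.3 (the switched-axes line at 11:13)."""
    return 9.91 * weight_lb + 684.3


# Hypothetical input: a fish weighing 15 pounds.
predicted_length = predict_length_mm(15)
print(round(predicted_length, 2))
```

Predictions are most trustworthy for x values inside the range of the measured data.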
12:32 Again, click on Show Statistics tool button.
12:37 Close the Data Analysis window.
12:40 Let’s go back to the length and weight data in the Spreadsheet.
12:45 In the Spreadsheet, select all the data in both columns.
12:51 Under One Variable Analysis, click on Multiple Variable Analysis tool.
12:59 In the Data Source window that pops up, click Analyze button.
13:04 Box Plots appear in the window.
13:07 They are for length and weight data.
13:11 Above the plot, click on the second Show Statistics tool.
13:17 Statistics for both plots appear below.
13:21 Place the cursor on the boundary between the plot and statistics.
13:27 When the arrow appears, drag the boundary to resize the windows.
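A per-column statistics table like the one described above can be approximated with Python's standard statistics module. The sketch below uses hypothetical columns; the column labels and the choice of statistics are assumptions for this example, and GeoGebra's panel may list a different set.

```python
from statistics import mean, median, stdev

# Hypothetical columns, not the tutorial's data.
columns = {
    "Length (mm)": [800, 810, 820, 830, 840],
    "Weight (lb)": [12, 14, 15, 14, 15],
}

# One row of summary statistics per column.
stats = {
    name: {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "s": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }
    for name, values in columns.items()
}

for name, row in stats.items():
    print(name, row)
```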
13:34 Let us summarize.
13:36 In this tutorial, we have learnt how to use GeoGebra to perform:
13:41 One Variable Analysis to calculate different statistical parameters.
13:47 Two Variable Regression Analysis to estimate the best fit line.
13:52 Multiple Variable Analysis to calculate different statistical parameters.
13:58 Assignment

Perform statistical analyses for weight and girth data given in this tutorial.

14:07 Four oils were used to deep fry chips.
14:11 Amount of absorbed fat was measured for 6 chips fried in 4 oils.
14:19 Is any of the oils absorbed more than the others?
14:24 The video at the following link summarizes the Spoken Tutorial project.

Please download and watch it.

14:32 The Spoken Tutorial Project team conducts workshops and gives certificates.

For more details, please write to us.

14:42 Please post your timed queries on this forum.
14:46 Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.

More information on this mission is available at this link.

14:59 This is Vidhya Iyer from IIT Bombay, signing off.

Thank you for joining.

Contributors and Content Editors

PoojaMoolya