Applications-of-GeoGebra/C3/Statistics-using-GeoGebra/English

From Script | Spoken-Tutorial
Revision as of 11:39, 21 January 2019 by Snehalathak (Talk | contribs)



Visual Cue Narration
Slide Number 1

Title Slide

Welcome to this tutorial on Statistics using GeoGebra.
Slide Number 2

Learning Objectives

In this tutorial, we will learn how to use GeoGebra to perform:

One Variable Analysis to calculate different statistical parameters.

Two Variable Regression Analysis to estimate best fit line.

Multiple Variable Analysis to calculate different statistical parameters.

Slide Number 3

System Requirement

Here I am using:

Ubuntu Linux Operating System version 16.04.

GeoGebra 5.0.481.0-d.

Slide Number 4

Pre-requisites

To follow this tutorial, you should be familiar with:

GeoGebra interface.

Statistics

Slide Number 5

Statistics

Data analysis and interpretation.

Measures of central tendency

Measures of Dispersion.

Comparing variability of data series

Additional material

Statistics deals with

Data analysis and interpretation.

Measures of central tendency.

Measures of Dispersion.

Comparing variability of data series.

Please refer to additional material provided along with this tutorial.

Slide Number 6

Fish Feed

A fishery is testing four feed formulations on its fish: A, B, C and D

Length (mm)

Weight (lbs)

Girth (mm)

Fish Feed

Let us look at an example.

A fishery is testing four types of feed formulations on its fish: A, B, C and D.

Data to be collected after feeding the fish for 6 months are:

Length in millimeters.

Weight in pounds.

Girth in millimeters.

Let us look at some of these data.

Slide Number 7

Fish Feed Data

Fish Feed Data

We will use these data for our analyses.

Please download the code file, Fishery-data, provided along with this tutorial.

Show the GeoGebra window. I have opened the GeoGebra interface.
Click on View tool >> select Spreadsheet. Click on View tool and select Spreadsheet.
Click on X at top right corner of Graphics, Algebra views. Click on X at top right corner of Graphics and Algebra views.

This will close these views.

In the code file, drag mouse to highlight length and weight data from columns H and I.

Show data in columns H and I.

Hold Ctrl key down and press C.

In the code file, drag mouse to highlight length and weight data from columns H and I.

These are data for fish that have been fed formulation C.

Hold Control key down and press C.

Click on Spreadsheet view in GeoGebra. Click in the top of the Spreadsheet in GeoGebra.
Hold Ctrl key down and press V. Hold Control key down and press V.

This will copy and paste the highlighted data from the code file into GeoGebra.

Place the cursor on the first column header in Spreadsheet view.

Drag and adjust column A's width.

Place the cursor on the first column header in Spreadsheet view.

Drag and adjust column A's width.

Right-click on column A heading of Length (mm).

Select Object Properties.

Point to dialog box.

Right-click on column A heading of Length millimetres.

Select Object Properties.

A dialog box opens.

Click on Text tab and change the name to Length (mm)-C.

Close the dialog box.

Similarly, add -C to Weight (lbs).

Click on Text tab and change the name to Length millimetres hyphen C.

Close the dialog box.

Similarly, add hyphen C to Weight pounds.

Adjust column B width. Adjust column B width.
Use mouse to drag and highlight first column A’s length data and label in GeoGebra. Click on column A heading of Length millimetres C.

Drag to highlight length data in Spreadsheet view.

Below the menubar, click on One Variable Analysis tool.

Point to Data Source popup window.

Click on Analyze button.

Below the menubar, click on One Variable Analysis tool.

A Data Source popup window appears.

Click on Analyze button.

Point to Data Analysis window and histogram. A Data Analysis window appears.

By default, a histogram is plotted.

Drag the boundary to see the graph properly. Drag the boundary to see the graph properly.
Point to length on the x-axis and frequency on the y-axis. The length is plotted on the x-axis.

The number of fish that are of a particular length, the frequency, is plotted on the y-axis.

Point to the display box above the graph containing the word Histogram. Note the display box above the graph containing the word Histogram.
In the display box, click on the dropdown menu button. In the display box, click on the dropdown menu button to display the list of plots.
Select Histogram.

Point to slider to the right of the display.

We will stay with the histogram option.

To the right of the dropdown menu is a slider.

Drag the slider from left to right to go to 20. Drag the slider from left to right to go to 20.
Point to rectangles between minimum and maximum values of data. The slider changes the number of rectangles between the minimum and maximum values of data.
Click on Options button to the right of the slider. Click on Options button to the right of the slider.
Under Classes, check Set Classes Manually. Under Classes, check Set Classes Manually check box.

This displays Start and Width text-boxes to the left of the Options button.

Type 800 in the Start text-box and press Enter.

Show the value of 5 in the Width text-box.

As all the fish are over 800 mm long, type 800 in the Start text-box and press Enter.

We will stay with the default value of 5 for rectangle width.
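
To see what the Start and Width settings do, here is a minimal Python sketch of the same class counting, done outside GeoGebra. The lengths are hypothetical values, not the ones in the Fishery-data file, and the choice to include the lower boundary of each class is an assumption that may differ from GeoGebra's own convention.

```python
# Hypothetical lengths in mm; NOT the values from the Fishery-data file.
lengths_mm = [802, 804, 807, 811, 813, 813, 818, 821, 824, 829]


def class_counts(values, start, width):
    """Count the values in each class [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        k = int((v - start) // width)   # index of the class that contains v
        lower = start + k * width       # lower boundary of that class
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Start = 800 and Width = 5, as typed into the text-boxes in the tutorial.
counts = class_counts(lengths_mm, start=800, width=5)
for lower, n in counts.items():
    print(f"{lower} to {lower + 5} mm: {n} fish")
```

Each printed row corresponds to one rectangle of the histogram: the class is its base on the x-axis and the count is its height on the y-axis.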

Uncheck Set Classes Manually. Uncheck Set Classes Manually check box.
Under Show, uncheck Histogram check box. Under Show, uncheck Histogram check box to make it disappear.
Scroll down and check Frequency Polygon to show it. Scroll down and check Frequency Polygon to show it.
Check Cumulative option as the Frequency Type. Under Frequency Type, check Cumulative option.
Point to default Count selection.

Point to the cumulative frequency count.

The default Count selection shows the cumulative frequency count for the data.
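
As a rough illustration of what a cumulative count means, the sketch below adds up class counts from left to right. The counts are hypothetical and are only meant to show the running total, not to reproduce the GeoGebra plot.

```python
from itertools import accumulate

# Hypothetical class counts: lower class boundary in mm -> number of fish.
class_counts = {800: 2, 805: 1, 810: 3, 815: 1, 820: 2, 825: 1}
width = 5  # class width in mm, matching the default used in the tutorial

# Running total of the counts, class by class.
cumulative = list(accumulate(class_counts.values()))

for lower, total in zip(class_counts, cumulative):
    print(f"shorter than {lower + width} mm: {total} fish")
```

The last running total equals the number of fish, which is why a cumulative frequency polygon never falls as it moves to the right.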
Drag slider, bring it back to 20. Drag the slider and note the effects on smoothness of the cumulative frequency count curve.

We will drag the slider back to 20.

Under Frequency Type, uncheck Cumulative and under Show, uncheck Frequency Polygon. Under Frequency Type, uncheck Cumulative and under Show, uncheck Frequency Polygon.
Under Show, check Histogram and uncheck Frequency Polygon >> click on Options button again to hide the window. Under Show, check Histogram.

And click on Options button again to hide the window.

Click on Show Data tool >> point to data highlighted in the Spreadsheet. Above the Histogram text-box, click on the third Show Data tool button.

This displays all the data highlighted in the Spreadsheet.

Drag the boundary to see the data properly. Drag the boundary to see the data properly.
Click on Show Data tool again to hide the list. Click on the Show Data tool again to hide the list.
Click on Show 2nd Plot tool. Above the Histogram text-box, click on the last Show 2nd Plot tool button.
Select histogram for top plot and box plot for bottom plot. The same data are graphed in two vertically placed plots.

You can select plot types from the dropdown menu button above each plot.

Click on Show Statistics tool.

Point to Statistics for both plots.

Above the Histogram text-box, click on the second Show Statistics tool button.

Statistics for the plot appear as a panel in the middle.

Drag the boundary to see it properly. Drag the boundary to see it properly.
Slide Number 8

Box Plot

Box Plot

A box plot is a standardized way of showing data, based on the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3) and maximum.

Click and point to Median, Min, Max, Q1 and Q3 values in the box plot. Let us compare histogram and box plot.

In the box plot, locate the Median, Min, Max, Q1 and Q3 values.
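
The five values located in the box plot can also be worked out by hand. The following Python sketch uses hypothetical lengths and one common quartile convention, the median of the lower and upper halves of the sorted data. Quartile conventions differ between tools, so the Q1 and Q3 values shown by GeoGebra may not match this sketch exactly.

```python
from statistics import median


def five_number_summary(values):
    """Return min, Q1, median, Q3 and max (needs at least two values).

    Q1 and Q3 are taken as the medians of the lower and upper halves of
    the sorted data; for an odd count the middle value is left out of both.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above the median position
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }


# Hypothetical lengths in mm; NOT the values from the Fishery-data file.
lengths_mm = [802, 804, 807, 811, 813, 813, 818, 821, 824, 829]
summary = five_number_summary(lengths_mm)
for name, value in summary.items():
    print(name, value)
```

In the box plot, Q1 and Q3 are the two ends of the box, the median is the line inside it, and Min and Max are the ends of the whiskers when there are no outliers.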

Click on the button next to Options button above the plot. Above each plot, in the upper right corner, click on the button next to Options.

A dropdown menu appears with which you can copy each plot to Clipboard or export it as an image.

Click on Show Statistics tool button to hide the data. Click on Show Statistics tool button to hide the data.
Close the Data Analysis window. Close the Data Analysis window.
Slide Number 9

Least Squares Linear Regression (LSLR)

Changing an independent variable x changes the dependent variable y.

LSLR predicts y based on x value.

LSRL (best fit line) y = b0 + b1x

Coefficient of determination R²

Least Squares Linear Regression (LSLR)

Changing an independent variable x changes the dependent variable y.

LSLR predicts y based on x value.

Least Squares Regression Line (LSRL) is also called the best fit line.

It is given by y = b0 + b1x.

b0 is the y-intercept and b1, the slope, is the regression coefficient.

Coefficient of determination R squared

R squared ranges from 0 to 1.

The closer R squared is to 1, the more of the variance in y is explained by x.
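
For readers who want to see where b0, b1 and R squared come from, here is a minimal Python sketch of the standard least squares formulas. The five data pairs are hypothetical and the function name least_squares_line is made up for this illustration; it is not a GeoGebra command.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1, r_squared) for the line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                       # slope (regression coefficient)
    b0 = mean_y - b1 * mean_x            # y-intercept
    r_squared = sxy ** 2 / (sxx * syy)   # coefficient of determination
    return b0, b1, r_squared


# Hypothetical data; NOT the values from the Fishery-data file.
lengths_mm = [800, 810, 820, 830, 840]
weights_lbs = [15.0, 16.5, 16.0, 18.0, 18.5]

b0, b1, r_squared = least_squares_line(lengths_mm, weights_lbs)
print(f"y = {b0:.2f} + {b1:.3f}x, R squared = {r_squared:.4f}")
```

The line always passes through the point (mean of x, mean of y), which is why b0 is computed from the two means once the slope is known.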

Show length and weight data in the Spreadsheet in the GeoGebra. Let us go back to the length and weight data in the Spreadsheet view in GeoGebra.
Drag mouse to highlight all labels and data in the two columns. Drag and select all the data in both columns.
Under One Variable Analysis, click on Two Variable Regression Analysis tool. Under One Variable Analysis, click on Two Variable Regression Analysis tool.
Click Analyze button in the Data Source window that pops up. In the Data Source window that pops up, click Analyze button.
Data Analysis window appears. A Data Analysis window appears with two plots.
Show both plots. By default, the upper plot is a Scatterplot and the lower a Residual plot.
Click on Show Statistics tool to see Statistics. Click on Show Statistics tool to see the Statistics.
Drag the boundary to see them properly. Drag the boundary to see them properly.
Below Statistics window, click on the Regression Model menu button >> select Linear. Below the Statistics window, click on the Regression Model menu button and select Linear.
Point to the red line that is drawn through some points. Note the red line in the Scatterplot.
Point to the equation given in red, y = 0.08x - 48.39. This is the best fit line; it minimizes the sum of the squared vertical distances between the points and the line.

Its equation is given in red at the bottom.

Point to R² value of 0.7722. This R squared value indicates a good fit between the model and the actual data.
Select other regression models to see effects on R². Select other regression models to see effects on the R squared value.
Point to the lower Residual Plot. The lower plot is the Residual Plot.

Residuals are the differences between observed and predicted values of all points.
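
A residual can be checked by hand for any single point. The Python sketch below uses the rounded equation displayed in the tutorial, y = 0.08x - 48.39, together with three hypothetical (length, weight) observations. Because the displayed coefficients are rounded, these residuals are only approximate.

```python
# Rounded coefficients of the best fit line displayed in the tutorial.
B0 = -48.39
B1 = 0.08


def predict(length_mm):
    """Predicted weight in pounds for a given length in millimetres."""
    return B0 + B1 * length_mm


# Hypothetical observations as (length in mm, weight in lbs) pairs.
observations = [(800, 15.2), (820, 17.6), (850, 19.1)]

# Residual = observed value minus predicted value.
residuals = [weight - predict(length) for length, weight in observations]

for (length, weight), res in zip(observations, residuals):
    print(f"length {length} mm: observed {weight}, residual {res:+.2f}")
```

A positive residual means the point lies above the line and a negative residual means it lies below the line.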

Click on Switch Axes button. Above the Statistics window, click on the last Switch Axes button.
Point to length now plotted on y-axis and weight on x-axis. For the scatterplot, length is now plotted along y-axis and weight along x-axis.
Point to the best fit line and statistics.

Point to equation y= 9.91x + 684.3.

Observe that the best fit line and many statistics change.

Its equation is now y= 9.91x + 684.3.

Point to r, R² and rho (ρ). The only statistics that remain the same are r, R squared and rho (ρ).

Note that r and rho are greater than 0.8, indicating strong positive correlation.

Weight increases as length increases for fish given feed C.

The relationship is strong and well predicted by the best fit lines.
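
Why do r and R squared stay the same when the axes are switched, while the line changes? A short Python sketch with hypothetical data shows the link: the slope of y on x multiplied by the slope of x on y equals r squared. The function name slopes_and_r is made up for this illustration.

```python
from math import sqrt


def slopes_and_r(xs, ys):
    """Return (slope of y on x, slope of x on y, Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / syy, sxy / sqrt(sxx * syy)


# Hypothetical data; NOT the values from the Fishery-data file.
lengths_mm = [800, 810, 820, 830, 840]
weights_lbs = [15.0, 16.5, 16.0, 18.0, 18.5]

b_weight_on_length, b_length_on_weight, r = slopes_and_r(lengths_mm, weights_lbs)

print("slope, weight on length:", round(b_weight_on_length, 4))
print("slope, length on weight:", round(b_length_on_weight, 4))
print("product of slopes:", round(b_weight_on_length * b_length_on_weight, 4))
print("r squared:", round(r ** 2, 4))
```

With the rounded slopes shown in the tutorial, 0.08 times 9.91 is about 0.79, which is close to the reported R squared of 0.7722; the small gap is consistent with the slope 0.08 being rounded to two decimal places.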

Click on Switch Axes button. Again, click on Switch Axes button.
Point to Symbolic Evaluation at the bottom. At the bottom, in Symbolic Evaluation, you can enter a value for x to get a prediction for y.
Point at the line in the Scatterplot. To get logical predictions, we will enter x values above the x-intercept.
In Symbolic Evaluation, type in a value for x >> press Enter. In Symbolic Evaluation, in the text-box for x, type 800 and press Enter.
Point to y value appearing next to the display box. Note that a y value appears next to the display box.

The x value was substituted in the best fit line equation to get the y value.
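
The substitution can be repeated by hand. This small Python sketch uses the rounded equation displayed earlier, y = 0.08x - 48.39, so the result may differ slightly from the value GeoGebra shows, which is based on unrounded coefficients. It also computes the x-intercept, below which the predicted weight would be negative.

```python
# Rounded coefficients of the displayed best fit line y = 0.08x - 48.39.
b0 = -48.39
b1 = 0.08

x = 800                      # length in mm typed into Symbolic Evaluation
y_pred = b0 + b1 * x         # predicted weight in pounds
x_intercept = -b0 / b1       # length at which the predicted weight is zero

print(f"predicted weight at {x} mm: {y_pred:.2f} lbs")
print(f"x-intercept: {x_intercept:.1f} mm")
```

Lengths below the x-intercept would give negative weights, which is why only x values above it give logical predictions.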

Click on Show Statistics tool button. Again, click on Show Statistics tool button.
Close the Data Analysis window. Close the Data Analysis window.
Point to length and weight data in the Spreadsheet. Let’s go back to the length and weight data in the Spreadsheet.
Drag mouse to highlight all labels and data in the two columns. In the Spreadsheet, select all the data in both columns.
Under One Variable Analysis, click on Multiple Variable Analysis tool. Under One Variable Analysis, click on Multiple Variable Analysis tool.
Click Analyze button in the Data Source window that pops up. In the Data Source window that pops up, click Analyze button.
Point to Box Plots in the window and to the cell numbers in each row. Box Plots appear in the window.

They are for length and weight data.

Click on Show Statistics tool.

Point to Statistics for both plots.

Above the plot, click on the second Show Statistics tool.

Statistics for both plots appear below.
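
As a hedged illustration of per-column statistics, the Python sketch below computes the mean and sample standard deviation of two hypothetical columns. It also computes the coefficient of variation, one common way to compare the variability of data series measured in different units. Which statistics the GeoGebra panel lists is not stated in this script, so the selection here is an assumption.

```python
from statistics import mean, stdev

# Hypothetical columns; NOT the values from the Fishery-data file.
columns = {
    "Length (mm)-C": [800, 810, 820, 830, 840],
    "Weight (lbs)-C": [15.0, 16.5, 16.0, 18.0, 18.5],
}

stats = {}
for name, values in columns.items():
    m = mean(values)
    s = stdev(values)                 # sample standard deviation (n - 1)
    stats[name] = {
        "n": len(values),
        "mean": m,
        "s": s,
        "cv_percent": 100 * s / m,    # coefficient of variation
    }

for name, row in stats.items():
    print(name, {key: round(val, 3) for key, val in row.items()})
```

Because the coefficient of variation is a ratio, it has no unit, so the spread of lengths in millimetres can be compared directly with the spread of weights in pounds.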

Place the cursor on the boundary between the plot and statistics.

When the arrow appears, drag the boundary to resize the windows.

Place the cursor on the boundary between the plot and statistics.

When the arrow appears, drag the boundary to resize the windows.

Let us summarize.
Slide Number 10

Summary

In this tutorial, we have learnt how to use GeoGebra to perform:

One Variable Analysis to calculate different statistical parameters.

Two Variable Regression Analysis to estimate best fit line.

Multiple Variable Analysis to calculate different statistical parameters.

Slide Number 11

Assignment

Perform statistical analyses for weight and girth data.

Is any of the oils absorbed more than the others?

Assignment

Perform statistical analyses for weight and girth data given in this tutorial.

Four oils were used to deep fry chips.

Amount of absorbed fat was measured for 6 chips fried in 4 oils.

Is any of the oils absorbed more than the others?

Slide Number 12

About Spoken Tutorial project

The video at the following link summarizes the Spoken Tutorial project.

Please download and watch it.

Slide Number 13

Spoken Tutorial workshops

The Spoken Tutorial Project team conducts workshops and gives certificates.

For more details, please write to us.

Slide Number 14

Forum for specific questions:

Do you have questions in THIS Spoken Tutorial?

Please visit this site

Choose the minute and second where you have the question

Explain your question briefly

Someone from our team will answer them

Please post your timed queries on this forum.
Slide Number 15

Acknowledgement

Spoken Tutorial Project is funded by NMEICT, MHRD, Government of India.

More information on this mission is available at this link.

This is Vidhya Iyer from IIT Bombay, signing off.

Thank you for joining.

Contributors and Content Editors

Madhurig, Snehalathak, Vidhya