Gnuplot/C2/Statistics-and-box-plot/English

Revision as of 13:41, 7 February 2020 by Ranipv076 (Talk | contribs)

Visual Cue Narration
Slide Number 1

Title Slide

Welcome to the tutorial on Statistics and Box Plot.
Slide Number 2

Learning Objectives

In this tutorial, we will
  • Plot string data on x-axis
  • Calculate Statistical summary for the input file
  • Draw candlestick plot
  • Draw boxplot with outliers and without outliers
Slide Number 3

Learning Objectives

and
  • Specify the position of boxplot on x-axis
Slide Number 4

System and Software Requirement

To record this tutorial, I am using
  • Ubuntu Linux version 16.04 OS
  • gnuplot version 5.2.6
  • Gedit version 3.18
Slide Number 5

Pre-requisites

To follow this tutorial,
  • Learner must be familiar with the basics of gnuplot.
  • For pre-requisite gnuplot tutorials, please visit this site.
Slide Number 6

Code Files

The files used in this tutorial are provided in the Code files link.

Please download and extract them.

Go to Desktop.

Show file icon statistics.txt on Desktop.

I have saved the input file on Desktop.
Screenshot of file opened in gedit. The input file statistics.txt consists of 3 columns.
Hover mouse over first column.

Hover mouse over second column.

Hover mouse over third column.

The first column is row number.

The second column is y data.

The third column is the x data, which has the corresponding string or names.

Press Ctrl+Alt+T. Open a terminal.
Enter the command, cd Desktop . Change the directory to Desktop.
Enter the command gnuplot . Let’s open gnuplot.
Press Ctrl+L . I will clear the screen.

Let's draw an xy plot labeling the x axis with strings.

Enter the command, set xtics rotate . Enter the commands as seen on the screen.

I will rotate the x-axis tic labels by 90 degrees.

Enter the command, set autoscale . Use the command set autoscale to set the axis range to autoscale.
Enter the command, plot "statistics.txt" using 2:xticlabels(3) . The next command plots the string x data against y to make a 2D plot.
Hover mouse on x tics strings and graph. Notice the graphical plot of x string data against the numeric y axis data.

An example of string data is a teacher plotting marks against student names.
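The marks-against-names example can be sketched as a short gnuplot script. This is a minimal sketch: the datablock name $Marks, the student names and the marks are hypothetical and are not taken from statistics.txt. The with linespoints style is added here only to make the points easier to see.

```gnuplot
# Hypothetical data, not the tutorial's statistics.txt.
# Columns: row number, marks (y data), student name (x label).
$Marks << EOD
1 72 Asha
2 85 Bala
3 64 Chitra
4 91 Dev
5 78 Esha
EOD

set xtics rotate      # rotate the tic labels so the names do not overlap
set autoscale         # let gnuplot choose both axis ranges
# Column 2 supplies the y values; column 3 supplies the x-axis labels.
plot $Marks using 2:xticlabels(3) with linespoints notitle
```

Inline datablocks such as $Marks need gnuplot 5.0 or later, which covers the version 5.2.6 used in this tutorial.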

Cursor on the graphics window. Analysis of such data often involves calculating statistical parameters.
Close the graphics window. Close the graphics window.
Go to the terminal. Go to the terminal and type a command as seen on the screen.
Type stats "statistics.txt" using 2 and press Enter. The stats command, followed by the filename and using with a column number, calculates statistics for that column.
Output is seen on the screen. A statistical summary is generated on the screen.
Scroll up the page. Let's scroll up.
Hover mouse over 61. This shows that the file has 61 data points.
Hover mouse over std dev.

Hover mouse over Sum. Hover mouse over max and min values.

The output shows the statistical summary for the input file.

Mean, standard deviation & sum of squares are seen.

Minimum values & maximum values are also seen.

The median and the quartiles are also shown on the screen.
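Besides printing the summary, stats also stores each result in a variable whose name begins with STATS_. The following is a minimal sketch, assuming statistics.txt is in the current directory; the nooutput keyword, which suppresses the printed summary, is an addition not used in the tutorial.

```gnuplot
# Analyse column 2 silently; the results are stored in STATS_ variables.
stats "statistics.txt" using 2 nooutput

print "records:        ", STATS_records
print "mean:           ", STATS_mean
print "std deviation:  ", STATS_stddev
print "minimum:        ", STATS_min
print "maximum:        ", STATS_max
print "median:         ", STATS_median
print "lower quartile: ", STATS_lo_quartile
print "upper quartile: ", STATS_up_quartile
```

These variables can be used in later commands, so the values do not have to be copied by hand from the screen.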

Hover mouse next to quartile range again. We can plot this statistical summary using a candlestick plot or a box plot.

This is useful for descriptive or informative analysis.

The height of the box can represent either the standard deviation about the mean or the quartile range.

Open gedit. We will create a data file to make a candlestick plot with this data.

Open a gedit window and enter the values as seen here.

Type #candlestick plot style and start a new line. I will make a comment on the first row.

This indicates that the data is for the candlestick plot style.

Type,

#x-position tab (mean-stddev) tab y-min tab y-max tab (mean+stddev) tab with candlesticks . Press Enter.

I will also include the data format for this plot, with tab separation.

For further information on candlestick plot, use the gnuplot help section.

Type 1, Press tab, type 300 . In the next line, enter the values for plotting.

The first column is an arbitrary x value.

The candlestick data will be plotted on this x position.

300 is the value of mean minus the standard deviation.

This defines the lower limit of the box.

Press tab, type 119, press tab, type 2965. Y minimum in the data is 119.

The fourth input is the y max value and it is 2965.

Press tab, type 1582. Enter the value of mean plus standard deviation as 1582 .
Save file in Desktop directory.

Give filename candlestick.dat.

Save the file on Desktop with the file name candlestick.dat.
Click on save. Click on the save button to save the file.
Close Gedit. I will close gedit.
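As an alternative to typing the values by hand, the same row can be generated from the stats results. This is a minimal sketch, assuming statistics.txt is in the current directory. Note that it overwrites candlestick.dat. Rounding to whole numbers with %.0f is an assumption made to resemble the values typed above.

```gnuplot
# Compute the statistics for column 2 without printing the summary.
stats "statistics.txt" using 2 nooutput

# Redirect print output to the data file (this overwrites the file).
set print "candlestick.dat"
print "#candlestick plot style"
# Row layout: x-position, mean-stddev, y-min, y-max, mean+stddev
print sprintf("1\t%.0f\t%.0f\t%.0f\t%.0f", \
      STATS_mean - STATS_stddev, STATS_min, STATS_max, \
      STATS_mean + STATS_stddev)
# Restore print output to its default destination.
set print
```

The column order matches the using 1:2:3:4:5 specification expected by the candlesticks style: x position, lower box edge, lower whisker, upper whisker, upper box edge.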
Go to gnuplot. Go back to the terminal.
Press Ctrl+L. I will also clear the screen.
Type,

set xrange [0.97:1.03] and press Enter.

Let’s also set x axis limits with set xrange command as seen.
Enter the command, plot 'candlestick.dat' using 1:2:3:4:5 with candlesticks . Make a plot with the command as seen on the screen.
Cursor on the graphics window. The candlestick plot appears on the graphics window.

Often, we also want to show outliers and use the quartile range for the box height.

Close the graphics window. Let us see how to do this.

Close the graphics window.

Enter the command, set autoscale . In gnuplot prompt, set autoscale for axis range.
Enter the command, set style data boxplot. Set the box plot style for graph as seen on the screen.

This command sets the plot style to boxplot.

Enter the command, set style boxplot outliers. The next command plots the boxplot with outliers.
Enter the command, set style fill solid 0.5 border -1 . Set a solid style color fill for the box.
Enter the command, plot 'statistics.txt' using (0):2 ls 1 notitle . Type the plot command as seen on the screen.

The boxplot will be placed at position zero on the x axis.

Graphics screen shows. In the graph, notice the outliers are also plotted.

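Outliers are data points that lie beyond the whiskers of the box plot. By default, gnuplot extends the whiskers to the most distant point within 1.5 times the quartile range from the box.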
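With gnuplot's default boxplot settings, points farther than 1.5 times the quartile range from the box are drawn as outliers. The limits can be estimated from the stats results. This is a minimal sketch, assuming statistics.txt is in the current directory; the variable names iqr, lower_fence and upper_fence are introduced here for illustration. The quartiles reported by stats may differ slightly from those used internally by the boxplot style, so treat the printed limits as approximate.

```gnuplot
stats "statistics.txt" using 2 nooutput

# Quartile range: distance between the upper and lower quartiles.
iqr = STATS_up_quartile - STATS_lo_quartile

# Default whisker limit: 1.5 times the quartile range beyond the box.
lower_fence = STATS_lo_quartile - 1.5 * iqr
upper_fence = STATS_up_quartile + 1.5 * iqr

print sprintf("Points below %.1f or above %.1f are treated as outliers", \
      lower_fence, upper_fence)
```

The factor can be changed with set style boxplot range followed by a number; the value 1.5 is the default.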

Type,

set style boxplot nooutliers and press Enter.

If the outliers are not to be plotted, do the following.

Go to the gnuplot terminal and enter the command as seen on the screen.

Type

replot and press Enter.

Replot to see the results.
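The x position of the box is the first entry in the using specification, written as (0) in the plot command above. A minimal sketch of placing boxes at other positions is given here, assuming statistics.txt is in the current directory. The positions 1 and 2, the tic labels, the x range and the box width 0.25 are illustrative choices, not part of the tutorial.

```gnuplot
set style data boxplot
set style fill solid 0.5 border -1
set xrange [0:3]
# Label the two chosen x positions.
set xtics ("position 1" 1, "position 2" 2)
# Same y data drawn twice: at x = 1 with the default box width,
# and at x = 2 with an explicit box width of 0.25.
plot 'statistics.txt' using (1):2 notitle, \
     '' using (2):2:(0.25) notitle
```

The optional third entry in the using specification sets the width of the box.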
Close graph.

Enter q to quit gnuplot.

Close the graphics window and quit gnuplot.
Slide Number 7

Summary

Now let's summarize.

In this tutorial, we

  • Plotted string data on x-axis
  • Calculated statistical summary with stats command
  • Generated candlestick plot
  • Generated box plot with and without outliers and
Slide Number 8

Summary

  • Specified the x-axis position for the box plot
Slide Number 9

Assignment 1 http://gnuplot.sourceforge.net/demo_5.2/

For the assignment activity, please do the following.

Practice box plot with and without outliers for the file boxplot.txt.

Practice and understand example boxplot styles from gnuplot website.

Slide Number 10

Assignment 2 Draw a time-activity bar chart

We will do one more assignment.
  • Draw a Time-activity graph for your daily activities
  • For this, time your activities in a day and make a time table.
  • Plot time in hours on y-axis and activities on x-axis.
Glimpse of assignment. Your assignment may look similar to this.
Slide Number 11

Spoken Tutorial Project

This video summarises the Spoken Tutorial Project.

Please download and watch it.

Slide Number 12

Spoken Tutorial workshops

The Spoken Tutorial Team
  • conducts workshops and
  • gives certificates.

For more details, please write to us.

Slide Number 13

Forum for specific questions:

Please post your timed queries in the forum.
Slide Number 14

Acknowledgement

Spoken Tutorial Project is funded by MHRD, Government of India.
This is Rani from IIT Bombay. Thank you for joining.

Contributors and Content Editors

Madhurig, Ranipv076, Snehalathak