Gnuplot/C2/Run-gnuplot-from-Perl-script/English

Revision as of 17:08, 27 May 2020 by Madhurig (Talk | contribs)

Visual Cue Narration
Slide Number 1

Title Slide

Welcome to the tutorial on Run gnuplot from Perl Script.
Slide Number 2

Learning Objectives

In this tutorial we will learn to generate,
  • Statistical summary from gnuplot for datasets using a Perl script
  • Files containing only outliers for labeling box plot
  • A gnuplot graph, with multiple box plots with labeled outliers
Slide Number 3

System and Software Requirement

To record this tutorial, I am using
  • Ubuntu Linux v 16.04 OS
  • gnuplot v 5.2.6
  • Perl v 5.22 and
  • Gedit v 3.18
Slide Number 4

Pre-requisites

https://spoken-tutorial.org

To follow this tutorial,
  • Learner must be familiar with the basics of gnuplot and Perl.
  • For prerequisite gnuplot and Perl tutorials, please visit this site.
Slide Number 5

Code Files

The files used in this tutorial are provided in the Code files.

Please download and extract the files.

Make a copy and use them while practising.

Slide Number 6

Download Link

https://www.perl.org/get.html

The download link for Perl is shown here.
Go to Desktop. I have saved the input files on the Desktop.

Let’s open stat1.txt in any text editor. I will open it in gedit.

Show screenshot of the file stat1.txt in gedit. Each file consists of 3 columns.

First column is a row number.

The second column is y-data and third column is x-data.

Close gedit text editor. Let’s close the text editor.
Press Ctrl+Alt+T . We will write a script to generate box plots using Perl script.

Open a terminal.

Type cd Desktop and press Enter. Change the directory to Desktop.
Type gedit stats.pl & and press Enter. Type gedit space stats.pl space & and press Enter.

Default extension of a Perl file is pl.

This opens the stats.pl file in gedit text editor.

You may use the text editor of your choice.

Type, #!/usr/bin/perl and press Enter. Let’s write a Perl script in the file stats.pl.

The first line is the header. It tells the system to run the script with the Perl interpreter.

It is called a shebang line.

Type,

use strict ; use warnings; use feature 'say'; and press Enter.

Enter commands as seen on the screen.

I will use the strict mode here.

use warnings makes Perl print warnings about questionable code in the program.

use feature 'say' enables the say function, which is used later in the script.

Note that the commands end with a semicolon.

Type,

my @filelist;

my $statfile;

my $label ;

my $upperlimit ;

my @dataarray ;

my @key ;

my @value ;

my @quarts ;

my $PROGRAM ;

my $GP ;

my $add;

my $linearray;

and press Enter.

I will also declare the variables that will be used later.

The role of each variable will be explained as and when they appear.

The @ sign indicates that the variable is an array.

Type

@filelist=('stat1.txt', 'stat2.txt', 'stat3.txt' ) ;

and press Enter.

Here I am defining an array for the input filelist, with three input file names.


Type,

foreach my $n (@filelist) { and press Enter.

I will add a foreach loop to generate statistical summary from gnuplot.
Type, $statfile = $n.".stat" ;

and press Enter.

Enter the commands as seen on the screen.
Cursor next to variable $statfile. Define a variable named statfile for a new file name.


This file will store the statistical data generated by gnuplot stats command.

Type,

open my $PROGRAM, '|-', 'gnuplot'

or die "Couldn't pipe to gnuplot: $!";

and press Enter.

Next, pipe to gnuplot to run the gnuplot commands.

If Perl is not able to start gnuplot, the script stops with the error message as seen.

Type,

say {$PROGRAM} "set print '$statfile' "; and press Enter.

Set the name of the output file for statistical summary as seen.
Type,

say {$PROGRAM} "stats '$n' using 2 "; close $PROGRAM; } and press Enter.

Type the stats command to generate statistical summary.


The foreach loop runs the stats command for all input files in gnuplot.

Then, close the foreach loop.
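The loop above can be summarized in a small sketch. It only builds and prints the two gnuplot commands for each input file, so it runs even where gnuplot is not installed. The helper name stats_commands is hypothetical and is not part of stats.pl; in the tutorial script the same strings are sent to the gnuplot pipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the tutorial script): returns the two
# gnuplot commands that the foreach loop sends for one input file.
sub stats_commands {
    my ($datafile) = @_;
    my $statfile = $datafile . ".stat";      # e.g. stat1.txt.stat
    return (
        "set print '$statfile'",             # file that receives the summary
        "stats '$datafile' using 2",         # summarize column 2 (y-data)
    );
}

my @filelist = ('stat1.txt', 'stat2.txt', 'stat3.txt');
my @commands;
foreach my $n (@filelist) {
    push @commands, stats_commands($n);
}

# Print the commands instead of piping them, so the sketch runs anywhere.
print "$_\n" for @commands;
```

Each input file contributes one set print command and one stats command, in that order.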

Press Ctrl+S, minimize gedit.

Go to terminal.

Save the file, minimize gedit and go back to the terminal.
Type, chmod u+x stats.pl and press Enter. Use chmod command to change the mode of the script file to executable form.
Type, ./stats.pl and press Enter. Let’s run the script as seen here.
Type ls and press Enter.

Add a note in the video: Windows users should type dir.

Type ls to list the files that are in the directory.
Hover mouse next to the newly created .stat files. Notice that the statistical summary files for all three input files are created.

The newly created files have dot stat extension.

I will clear the screen for clarity.

Type, gedit stat1.txt.stat and press Enter. I will open the statistics file, stat1.txt.stat in a text editor.

Enter the command as seen here.

Scroll down to lower and upper quartile region. Lower and upper quartile values are used to set the box height, in the boxplot.


Lower and upper quartile values are displayed here.

Go to stats.pl file.

Click on the X button to close the file.

Let’s extract these values from the file and find the box plot upper limit.

Let us close the text editor.

Go back to the script file in the text editor, to edit it.

Type,

foreach my $n (@filelist)

{

and press Enter.

Add a foreach loop to read the quartiles from the stat file and store the values.


Use one more foreach loop to extract the quartiles and calculate the box range.

Type, $statfile = $n.".stat" ;

and press Enter.

Here, the inputs are the statistical summary files that were generated.
Type,

open (Statisticsfile , $statfile) ; and press Enter.

The open command opens the generated stat file.
Type,

while (my $limit = <Statisticsfile>) {

chomp $limit ; and press Enter.

Read the statistical summary file line by line, using a while loop.

Let's delete the newline character at the end of each line with the chomp command.


Type,

@dataarray = split(/\t/,$limit); and press Enter.

Let’s split each line by the tab delimiter.

The split data will be stored in arrays with the push command.

Do not close the while loop yet.

Type,

push(@key,$dataarray[0]);

push(@value,$dataarray[1]);

} and press Enter.

Here we are creating two arrays named key and value.

The closing brace ends the while loop.
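As a minimal sketch of the split and push steps, the example below works on two made-up tab-separated lines held in memory. The line contents are assumptions for illustration and do not reproduce real gnuplot output.

```perl
use strict;
use warnings;

# Made-up tab-separated lines standing in for a statistics file.
my @lines = ("Minimum:\t2.0\n", "Maximum:\t9.5\n");

my (@key, @value);
foreach my $limit (@lines) {
    chomp $limit;                           # drop the trailing newline
    my @dataarray = split(/\t/, $limit);    # field 0 = name, field 1 = number
    push(@key,   $dataarray[0]);
    push(@value, $dataarray[1]);
}

# Show each name with the value stored at the same index.
print "$key[$_] => $value[$_]\n" for 0 .. $#key;
```

The name and the number from one line end up at the same index in the two arrays, which is what allows a value to be looked up later by its position.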
Type,

$upperlimit = $value[21] + 1.5 * ($value[21]-$value[19]) ; and press Enter.

This line calculates the upper boxplot limit used by gnuplot.

We are using the values generated in the statistical summary file for this.

Show the statistical summary file, stat1.txt.stat .


Hover mouse next to quartile range lines.

The upper quartile value is shown in line 22 of the statistical summary file.

The lower quartile value is seen in line 20 of the file.

Highlight and hover mouse next to quartile range lines. Learners must go through their own statistical summary files.

Find the relevant line numbers for the quartile values in your file.

Slide Number

Line Number in Statistical Summary File.

For example, if you are using gnuplot 5.2.3,

the values are generated in line numbers 19 and 21.

Cursor in stats.pl file Come back to stats.pl file.
Type, push(@quarts,$upperlimit);

and press Enter.

I will append the value of the upperlimit, to an array named as quarts.

The push command appends values to the end of an array.

We are interested in labeling values, which are above the box range.

Cursor on the gedit window.

highlight the lines.

$upperlimit = $value[21] + 1.5 * ($value[21]-$value[19]) ;

Array numbering starts from zero and line numbers start from one.

Hence I have used 19 and 21, instead of 20 and 22, to calculate the upper box limit.
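The arithmetic and the index shift can be checked with a short sketch. The quartile values 10 and 20 are hypothetical, chosen only to keep the numbers simple, and the array of line labels is a stand-in for the values read from the statistics file.

```perl
use strict;
use warnings;

# Hypothetical quartile values, chosen only to make the arithmetic easy.
my $lower_quartile = 10;
my $upper_quartile = 20;

# Upper box limit: upper quartile plus 1.5 times the interquartile range.
my $upperlimit = $upper_quartile + 1.5 * ($upper_quartile - $lower_quartile);
print "upper limit = $upperlimit\n";    # 20 + 1.5 * 10 = 35

# Line numbers start at 1 but array indices start at 0,
# so file line N is stored at index N - 1.
my @value = map { "line $_" } 1 .. 22;
print "index 19 holds $value[19]\n";    # line 20
print "index 21 holds $value[21]\n";    # line 22
```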

Cursor on the gedit window. Next, we will filter the outliers and write them to a file.
Type, open (Fileoutlier , $n) ; and press Enter. Here, we are opening the input datafile.
Type, while (my $add = <Fileoutlier> ) {

and press Enter.

A while loop is used to read the lines of the input data file.
Type, chomp $add ;

and press Enter.

Let’s delete the newline character at the end of the line with the chomp command.
Type, if ($add =~ /^$/ or $add =~ /^#/) { next; } and press Enter. This if statement skips blank lines and lines starting with a hash.
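A minimal sketch of this filter, applied to a few made-up lines instead of a real input file:

```perl
use strict;
use warnings;

# Made-up input: one comment line, one blank line and two data lines.
my @input = ("# row\ty\tx", "", "1\t4.2\t7", "2\t5.1\t8");

my @kept;
foreach my $add (@input) {
    if ($add =~ /^$/ or $add =~ /^#/) { next; }   # skip blank and comment lines
    push @kept, $add;
}

print scalar(@kept), " data lines kept\n";
```

Only the two data lines reach the push statement, because next jumps straight to the following iteration.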
Type,

my @linearray = split (/\t/,$add); and press Enter.

Now, split the line with tab delimiter.

I will also define an array for each line in the input file.

Type,

$label= $n.".label"; and press Enter.

Define a new file name with the suffix label, to save the outlier data.
Type,

open (OUT4,'>>', $label) ; and press Enter.

Open the file to write the values, with the open command.

Make sure to open the file in append mode, to write all the outlier data points.
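The effect of append mode can be sketched with a temporary file. The file stands in for a label file such as stat1.txt.label, and the two rows are made-up values.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for a label file (hypothetical content).
my ($tmp, $label) = tempfile();
close $tmp;

# Open, write and close once per row, as the tutorial loop does.
foreach my $row ("5\t40\t4", "9\t52\t7") {
    open(my $out, '>>', $label) or die "Cannot open $label: $!";
    print $out "$row\n";
    close $out;
}

# Read the file back to confirm that both rows were kept.
open(my $in, '<', $label) or die "Cannot read $label: $!";
my @saved = <$in>;
close $in;
unlink $label;

print scalar(@saved), " lines saved\n";
```

Because the file is opened in append mode, label files left over from an earlier run are not overwritten. Deleting them before running the script again avoids duplicate entries.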

Type,

if ($linearray[1] > $upperlimit) { and press Enter.

Open an if block, to test if the y value is greater than the box upper limit.
Type,

print OUT4 "$linearray[0] \t $linearray[1] \t $linearray[2] \n " ; } and press Enter.

If the y-value is higher, write the data to the output file with the print command.

Close the if block with a closing brace.

Cursor on the > sign in the if statement. To also print the lower outliers, a second condition has to be added to the if statement.

I will not print the outliers below the lower box limit here.
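For learners who want both kinds of outliers, one possible form of the extended condition is sketched below. The lower limit formula is an assumption that mirrors the upper limit used in the tutorial, and the quartiles and data rows are hypothetical.

```perl
use strict;
use warnings;

my ($q1, $q3) = (10, 20);                 # hypothetical quartiles
my $iqr        = $q3 - $q1;
my $upperlimit = $q3 + 1.5 * $iqr;        # 35, same formula as the tutorial
my $lowerlimit = $q1 - 1.5 * $iqr;        # -5, assumed mirror of the upper limit

# Hypothetical rows: row number, y value, x value.
my @rows = ([1, 12, 3], [2, 40, 4], [3, -9, 5], [4, 18, 6]);

my @outliers;
foreach my $linearray (@rows) {
    # Keep the row if y lies above the upper limit or below the lower limit.
    if ($linearray->[1] > $upperlimit or $linearray->[1] < $lowerlimit) {
        push @outliers, join("\t", @$linearray);
    }
}

print "$_\n" for @outliers;
```

With these values, rows 2 and 3 are kept because 40 lies above 35 and -9 lies below -5.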

Type,

} } and press Enter.

Close the while loop, and close the foreach loop.
Cursor on the gedit window. Do not forget to clear the array, before reading each new statistics file.
Go to the end of the line of, foreach my $n (@filelist)

{ $statfile = $n.".stat" ;

Go to the line before opening the statistics files as seen on the screen.
Press Enter to start a new line. Start a new line.
Type,

@dataarray = (); @key = (); @value = (); and press Enter.

Enter the commands as seen to clear the three arrays.
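The sketch below shows why the arrays have to be cleared. The two lists of numbers are made-up stand-ins for the values read from two statistics files.

```perl
use strict;
use warnings;

# Two made-up "files", each a list of values as they would be pushed.
my @files = ([1, 2, 3], [10, 20, 30]);

my @value;
my (@without_reset, @with_reset);

foreach my $file (@files) {
    push @value, @$file;
    push @without_reset, $value[1];   # index 1 still points into the first file
}

foreach my $file (@files) {
    @value = ();                      # clear before reading each file
    push @value, @$file;
    push @with_reset, $value[1];      # index 1 now belongs to the current file
}

print "without reset: @without_reset\n";
print "with reset:    @with_reset\n";
```

Without the reset, a fixed index such as 19 or 21 would keep returning the quartiles of the first file for every later file.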
Type,

{ open(GP, "| gnuplot") or die "Error while piping to Gnuplot: $! \n"; print GP << "Gnuplot-end"; and press Enter.

Now, pipe to gnuplot again to generate the box plots and outlier labels.
Type,

set term svg

set output 'boxplot-labeled.svg'

and press Enter.

I will set the terminal to svg format.


Specify a filename for the output file as seen.

Type,

set termoption font "times,5" and press Enter.

Set the font style to times and also the size.
Type,

set xtics ("stat1" 1, "stat2" 2, "stat3" 3) and press Enter.

Specify the xtics label with the file name, at 1, 2 and 3.
Type,

set style data boxplot

set xrange[0:4]

and press Enter.

Set the plot style to boxplot.

I will set x-axis range from 0 to 4.

Type,

plot 'stat1.txt' using (1):2 notitle, 'stat1.txt.label' using (1):2:3 with labels notitle offset 2,0, 'stat2.txt' using (2):2 notitle, "stat2.txt.label" using (2):2:3 with labels notitle offset 2,0, 'stat3.txt' using (3):2 notitle, 'stat3.txt.label' using (3):2:3 with labels notitle offset 3,0

and press Enter.

Issue the plot command to plot the boxplot style data for the 3 input files.

Add the labels using the newly generated files in the commands.

The three input files and the 3 outlier files are plotted here.

Hover mouse over 'stat1.txt.label' u (1):2:3 with labels notitle offset 2,0 This part of the command, adds the labels from the outlier file.


The offset gives a gap between the symbol and label.

Cursor on offset and with labels part. This increases the clarity in the output file.
Type,

Gnuplot-end

close (GP) ;

}

and press Enter.

The terminator Gnuplot-end must be on a line by itself.

Close the pipe to gnuplot and the block as seen.
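As an alternative to typing the long plot command by hand, it could be built from the file list. The sketch below is one possible way to do this and is not part of the tutorial script. It prints the gnuplot commands instead of piping them, and it assumes the same label offset of 2,0 for every file.

```perl
use strict;
use warnings;

my @filelist = ('stat1.txt', 'stat2.txt', 'stat3.txt');

# Build two plot clauses per file: the box plot and its outlier labels.
my @clauses;
my $position = 1;
foreach my $n (@filelist) {
    push @clauses, "'$n' using ($position):2 notitle";
    push @clauses, "'$n.label' using ($position):2:3 with labels notitle offset 2,0";
    $position++;
}
my $plot = "plot " . join(", ", @clauses);

# A here-document collects the gnuplot commands; the terminator
# must stand alone on its own line.
my $script = <<"Gnuplot-end";
set term svg
set output 'boxplot-labeled.svg'
set style data boxplot
set xrange[0:4]
$plot
Gnuplot-end

print $script;   # stats.pl prints such text to the gnuplot pipe instead
```

Adding a fourth input file then only requires a new entry in @filelist, plus matching changes to the xtics and xrange settings.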
Press Ctrl+S.

Go to the terminal.

Type, ./stats.pl and press Enter.

Type ls

Save the file and go back to the terminal.

Type the command as seen here to run the Perl script.

Type ls to list the files that are in the directory.

Hover mouse next to svg file on Desktop. A new svg type image file for the graph is generated.

Since the path is Desktop, the output file is generated in the Desktop directory.

Open the image file. Open the svg file and view the output.
Cursor on the image. You may modify the script to generate the graph of your choice.
Close the image file. I will close the output svg file.
Slide Number 8

Other Programming Languages

  • You may use C, Java, Python or another language of your choice to program.
  • If so, write the program accordingly.
  • You may also use a dataset of your choice.
Slide Number 6

Summary

Now let's summarize.

In this tutorial, we generated,

  • Statistical summary from gnuplot using Perl script
  • Datafiles containing only outlier points in box plot
  • Plotted multiple box plots in a graphical window
Slide Number 6

Summary

  • Specified the position of each box plot on x-axis
  • Labeled the outliers in the box plot
Slide Number 7

Assignment

For assignment activity, please do the following.
  • Three files are provided in the Code files link for the assignment.
  • Using the files, generate box plots and label the outliers in the graph.

Modify the script to incorporate the correct file names.

Show screenshot of assignment. Your assignment may look similar to this.
Slide Number 9

Spoken Tutorial Project

This video summarises the Spoken Tutorial project.

Please download and watch it.

Slide Number 9

Spoken Tutorial workshops

The Spoken Tutorial project team
  • conducts workshops and
  • gives certificates on passing online tests.

For more details, please write to us.

Slide Number 10

Forum for specific questions:

Post your timed queries in the forum.
Slide Number 11

Acknowledgement

Spoken Tutorial Project is funded by MHRD, Government of India.
This is Rani from IIT Bombay. Thank you for joining.

Contributors and Content Editors

Madhurig, Ranipv076, Snehalathak