Latest revision as of 23:27, 1 June 2019

Title of the script: Plotting Bar Charts and Scatter Plots

Author: Tushar Bajaj (TISS Mumbai) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, graphs, bar chart, labels, scatter plot, correlation, video tutorial, spoken tutorial

Visual Cue Narration
Show slide

Opening slide

Welcome to this tutorial on Plotting Bar Charts and Scatter Plots.
Show slide

Learning Objectives

In this tutorial, we will learn how to:
  • Plot bar charts
  • Plot scatter plots
  • Find the correlation coefficient between two objects.
Show slide

Pre-requisites

https://spoken-tutorial.org

To understand this tutorial, you should know,
  • Data frames in R
  • Basics of Statistics

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use
  • A data frame moviesData.csv
  • A script file barPlots.R.

Please download these files from the Code files link of this tutorial.

Point to the files in the Plots folder.

Highlight moviesData.csv and barPlots.R in the folder Plots

I have downloaded and moved these files to the Plots folder.

This folder is located in the myProject folder on my Desktop.

I have also set the Plots folder as my Working Directory.

Cursor on the interface.

Let us switch to RStudio.
Highlight barPlots.R in the Files window of RStudio

Open the script barPlots.R in RStudio.
Highlight the Source button

Run this script by clicking on the Source button.
Highlight movies in the Source window

The movies data frame opens in the Source window.
Highlight dim(movies) in the Console window

It has 600 observations of 31 variables.
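The contents of barPlots.R are not shown in this script, so the following is only a minimal sketch of how a CSV file could be read into a data frame and checked with dim. It assumes the script uses read.csv; the file written here is a tiny hypothetical stand-in for moviesData.csv, and the name movies_demo is made up for illustration.

```r
# Hypothetical stand-in for moviesData.csv: a tiny file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("title,imdb_rating,audience_score",
             "Movie A,5.5,73",
             "Movie B,7.3,81",
             "Movie C,7.6,91"),
           csv_path)

# read.csv() returns a data frame.
movies_demo <- read.csv(csv_path)

# dim() reports the number of rows (observations) and columns (variables).
dim(movies_demo)   # 3 rows and 3 columns for this toy file

# str() gives a compact overview of each column and its type.
str(movies_demo)
```

For the real file, dim would report the 600 observations of 31 variables mentioned above.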
Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

This will enable us to see the remaining objects of the movies data frame.

Highlight imdb_rating in the Source window

Now, we will learn how to draw a bar chart of the object named imdb underscore rating in movies.
Show slide

Bar Chart

  • A bar chart represents data as rectangular bars, with the length of each bar proportional to the value of the variable.
  • R uses the function barplot to create bar charts.
Let us switch to RStudio.
Highlight movies in the Source window.

For the sake of simplicity, we are considering only the first 20 observations of movies to draw a bar chart.
Highlight barPlots.R in the Source window

Click on the script barPlots.R
Cursor on the interface.

[RStudio]

moviesSub <- movies[1:20,]

In the Source window, type the following command.

Save the script and run the current line by pressing Ctrl + Enter keys simultaneously.

Drag the boundary to resize the window.

Let me resize the Source window.
Highlight moviesSub in the Environment window.

moviesSub with 20 observations is loaded in the Environment.

Now, we draw a bar chart of imdb_rating for these movies.
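As a small illustration of the row subsetting used for moviesSub, the sketch below builds a hypothetical 30-row data frame (not the real movies data) and takes its first 20 rows in two ways. The names movies_demo, movies_sub and movies_sub_alt are made up for this example.

```r
# Hypothetical toy data frame standing in for movies (the real one has 600 rows).
movies_demo <- data.frame(
  title = paste("Movie", 1:30),
  imdb_rating = round(seq(4, 9, length.out = 30), 1)
)

# Row indexing: rows 1 to 20 and, because nothing follows the comma, all columns.
movies_sub <- movies_demo[1:20, ]

# head() is another way to take the first n rows.
movies_sub_alt <- head(movies_demo, 20)

nrow(movies_sub)       # 20
nrow(movies_sub_alt)   # 20
ncol(movies_sub)       # 2, since all columns are kept
```

Leaving out the comma, as in movies_demo[1:20], would try to select columns instead of rows, which is why the trailing comma matters.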

Cursor in the Source window.

[RStudio]

barplot(moviesSub$imdb_rating,

ylab="IMDB Rating",

xlab = "Movies",

col="blue",

ylim=c(0,10),

main="Movies' IMDB Rating")

In the Source window, type the following command.
Highlight barplot in the Source window

Here, we have used the following arguments:
  • moviesSub dollar sign imdb underscore rating is the data for plotting.
  • ylab and xlab for adding labels to the respective axes.
  • col to set the color of the bars.
  • ylim to set the range of values on the Y-axis.
  • main for adding a title to the bar chart.
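The arguments listed above can be tried on a small example. This is a sketch with five hypothetical ratings rather than the moviesSub data. It also uses the value returned by barplot (the x-coordinates of the bar midpoints) to print each rating above its bar; that extra step is not part of the tutorial and is shown only as one possible way to read exact values off the chart.

```r
# Hypothetical ratings for five movies; not taken from moviesData.csv.
ratings <- c(5.5, 7.3, 7.6, 7.2, 5.1)

# Draw to a null device so the sketch runs without a display.
# In RStudio, drop this line and the dev.off() call to see the plot.
pdf(NULL)

mids <- barplot(ratings,
                ylab = "IMDB Rating",
                xlab = "Movies",
                col = "blue",
                ylim = c(0, 10),
                main = "Movies' IMDB Rating")

# barplot() returns one midpoint per bar.
length(mids)   # 5

# Place each rating just above its bar (pos = 3 means "above").
text(x = mids, y = ratings, labels = ratings, pos = 3)

invisible(dev.off())
```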
Highlight Run button in the Source window.

Run the current line.
Highlight the plot in the Plots window

The bar chart is displayed with Movies on the X-axis and their imdb_rating on the Y-axis.
Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.
Highlight the first bar in the plot

This particular movie has an IMDB rating of approximately 6.
Highlight the third bar in the plot

Similarly, this particular movie has an IMDB rating of approximately 8.

However, we do not know the names of the movies.

Cursor on the plot window.

So, we will add more arguments to the barplot function to show the names of the movies on the X-axis.
Click on the Close button.

Close this plot.
[RStudio]

barplot(moviesSub$imdb_rating,

ylab="IMDB Rating",

col="blue",

ylim=c(0,10),

main="Movies' IMDB Rating",

names.arg=moviesSub$title)

In the Source window, type the following command.
Highlight names.arg in the Source window

Here, we have used the argument names.arg and set it to title.

Remember, the title column in moviesSub contains the names of the movies.

Highlight Run button in the Source window.

Run the current line.
Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.
Highlight X-axis of the plot

Now, the names of the movies are displayed on the X-axis.

But not for all movies.

This is because the names are too long to fit next to each other.

So, we will make these names perpendicular to the X-axis.

Click on the Close button.

Close this plot.
[RStudio]

barplot(moviesSub$imdb_rating,

ylab="IMDB Rating",

col="blue",

ylim=c(0,10),

main="Movies' IMDB Rating",

names.arg=moviesSub$title,

las = 2)

In the Source window, type the following command.
Highlight las in the Source window

Here, we have used the las argument.

las equal to 2 produces labels which are at right angles to the axis.
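To compare the effect of las, the sketch below draws the same bar chart once for each of its four values. The titles and ratings are hypothetical and only serve to show how names.arg and las work together.

```r
# Hypothetical titles and ratings, for illustration only.
titles  <- c("A Short One", "A Considerably Longer Movie Title", "Another Title")
ratings <- c(6.1, 7.8, 5.4)

# Null device so the sketch runs without a display; omit in RStudio.
pdf(NULL)

# las = 0: labels parallel to the axis (the default)
# las = 1: labels always horizontal
# las = 2: labels always perpendicular to the axis
# las = 3: labels always vertical
mids_by_las <- lapply(0:3, function(style) {
  barplot(ratings,
          names.arg = titles,
          las = style,
          ylim = c(0, 10),
          main = paste("las =", style))
})

invisible(dev.off())
```

Note that las applies to both axes, so with las equal to 2 the numbers on the Y-axis are drawn horizontally while the names on the X-axis are drawn vertically.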

Highlight Run button in the Source window

Run the current line.
Highlight Files and Plots window.

In the Plots window, click on Zoom to maximize the plot.
Highlight the plot in the Plots window

Now the names of all the movies are displayed on the X-axis.

For example, Filly Brown has an IMDB rating of approximately 6.

Highlight the plot in the Plots window

However, longer names are being truncated.

We can add more arguments to the barplot function for adjusting labels.

For more information, please refer to the Additional Material section on this website.
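The Additional Material is not reproduced here, so the following is only one possible approach to the truncated labels, not necessarily the one it describes. The sketch assumes two standard tools: the cex.names argument of barplot, which scales the label text, and the mar setting of par, which enlarges the bottom margin. The data are hypothetical.

```r
# Hypothetical titles and ratings, for illustration only.
titles  <- c("A Short One", "A Considerably Longer Movie Title", "Another Title")
ratings <- c(6.1, 7.8, 5.4)

# Null device so the sketch runs without a display; omit in RStudio.
pdf(NULL)

# mar is c(bottom, left, top, right) in lines of text.
# par() returns the previous settings, so they can be restored afterwards.
old_par <- par(mar = c(12, 4, 4, 2) + 0.1)

barplot(ratings,
        names.arg = titles,
        las = 2,            # labels perpendicular to the X-axis
        cex.names = 0.8,    # draw the names at 80% of the default size
        ylim = c(0, 10),
        ylab = "IMDB Rating",
        main = "Movies' IMDB Rating")

par(old_par)   # restore the previous margins
invisible(dev.off())
```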

Click on the Close button.

Close this plot.
Highlight movies in the Source window

In the Source window, click on movies.
Highlight imdb_rating and audience_score in the Source window

Let us analyze the relation between imdb underscore rating and audience underscore score.

For this, we will draw a scatter plot with these two objects by using the plot function.

Remember, we have already learnt how to plot a single object.

Show Slide

Scatter Plot

  • A scatter plot is a graph in which the values of two variables are plotted along two axes.
  • The pattern of the resulting points reveals the correlation.
Let us switch to RStudio.
Highlight barPlots.R in the Source window

In the Source window, click on the script barPlots.R
[RStudio]

plot(x = movies$imdb_rating,

y = movies$audience_score,

main = "IMDB Rating vs Audience Score",

xlab = "IMDB Rating",

ylab = "Audience Score",

xlim = c(0,10),

ylim = c(0,100),

col = "blue")

In the Source window, type the following command.
Highlight plot function in the Source window

Here, we have kept imdb underscore rating on the X-axis and audience underscore score on the Y-axis.
Highlight xlim in the Source window

As imdb underscore rating of any movie varies between 0 and 10, we have set the range of values on the X-axis from 0 to 10.
Highlight ylim in the Source window

Similarly, we have set the range of values on the Y-axis from 0 to 100.
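Before fixing xlim and ylim by hand, it can help to check that the data actually lie within the chosen limits. The sketch below does this with range on two short hypothetical vectors (not the real movies columns) and then draws the scatter plot with the same arguments as above.

```r
# Hypothetical values, for illustration only.
imdb_rating    <- c(5.5, 7.3, 7.6, 7.2, 5.1, 8.4)
audience_score <- c(73, 81, 91, 76, 27, 89)

# range() returns the smallest and largest value of a vector.
range(imdb_rating)      # 5.1 8.4
range(audience_score)   # 27 91

# Check that every point lies within the limits we are about to set.
inside <- imdb_rating >= 0 & imdb_rating <= 10 &
  audience_score >= 0 & audience_score <= 100
all(inside)             # TRUE

# Null device so the sketch runs without a display; omit in RStudio.
pdf(NULL)

plot(x = imdb_rating,
     y = audience_score,
     main = "IMDB Rating vs Audience Score",
     xlab = "IMDB Rating",
     ylab = "Audience Score",
     xlim = c(0, 10),
     ylim = c(0, 100),
     col = "blue")

invisible(dev.off())
```

Points that fall well outside the limits would not appear in the plot, so a check like this guards against silently losing data.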
Highlight Run button in the Source window

Save the script and run the current line.
Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.
Highlight the plot in the Plots window

We can observe that movies with a higher imdb underscore rating tend to have a higher audience underscore score.
Click on the close button.

Close this plot.
Cursor on the interface.

Now we will learn how to calculate the correlation coefficient between imdb underscore rating and audience underscore score.

For this, we use the cor function.

[RStudio]

cor(movies$imdb_rating, movies$audience_score)

In the Source window, type the following command.
Highlight Run button in the Source window

Save the script and run the current line.
Highlight the output in the Console window

The correlation coefficient between imdb underscore rating and audience underscore score is evaluated as 0.865.
Highlight the output in the Console window

The value of the correlation coefficient is always between -1 and +1.

A positive value indicates that the variables are positively related.
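The sketch below illustrates these properties on hypothetical vectors, so the numbers will differ from the 0.865 obtained for the movies data. It relies on the fact that cor computes Pearson's correlation by default, which is the covariance of the two variables divided by the product of their standard deviations. The handling of missing values is an extra point not covered in the tutorial.

```r
# Hypothetical values, for illustration only.
x <- c(5.5, 7.3, 7.6, 7.2, 5.1, 8.4)
y <- c(73, 81, 91, 76, 27, 89)

# Built-in Pearson correlation (the default method of cor()).
r_builtin <- cor(x, y)

# The same quantity written out by hand.
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The two agree up to floating-point rounding.
all.equal(r_builtin, r_manual)

# A perfectly increasing linear relation gives +1, a decreasing one gives -1
# (again up to floating-point rounding).
r_up   <- cor(x, 2 * x + 1)
r_down <- cor(x, -x)

# With a missing value, cor() returns NA unless incomplete pairs are dropped.
y_missing <- y
y_missing[2] <- NA
r_na    <- cor(x, y_missing)
r_clean <- cor(x, y_missing, use = "complete.obs")
```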

Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt how to:
  • Plot bar charts
  • Plot scatter plots
  • Find the correlation coefficient between two objects
Show slide

Assignment

We now suggest an assignment.
  • Read the file moviesData.csv. Create a bar chart of critics underscore score for the first 10 movies.
  • Create a scatter plot of imdb underscore rating and imdb underscore num underscore votes to see their relation.
  • Save both the plots.
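The tutorial does not show how to save a plot, and the Plots window in RStudio offers an Export option for this. As an alternative, a plot can be saved from code by opening a file device, drawing, and closing the device. The sketch below assumes the pdf device and uses hypothetical data and a temporary file name; it is not a solution to the assignment.

```r
# Hypothetical scores, for illustration only.
scores <- c(45, 96, 91, 80, 33)

# A temporary output file; in practice this could be a name such as "myPlot.pdf".
out_file <- tempfile(fileext = ".pdf")

# Open the file device, draw the plot, then close the device to write the file.
pdf(out_file)
barplot(scores,
        ylab = "Score",
        ylim = c(0, 100),
        col = "blue",
        main = "Example Bar Chart")
invisible(dev.off())

file.exists(out_file)   # TRUE once the device has been closed
```

Other file devices such as png follow the same open, draw, close pattern.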
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
Show Slide

Thank You

The script for this tutorial was contributed by Tushar Bajaj (TISS Mumbai).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst