Revision as of 08:18, 31 May 2019

Title of the script: Plotting Bar Charts and Scatter Plots

Author: Tushar Bajaj (TISS Mumbai) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, graphs, bar chart, labels, scatter plot, correlation, video tutorial, spoken tutorial

Visual Cue Narration
Show slide

Opening slide

Welcome to this tutorial on Plotting Bar Charts and Scatter Plots.
Show slide

Learning Objectives

In this tutorial, we will learn how to:
  • Plot bar charts
  • Plot scatter plots
  • Find the correlation coefficient between two objects
Show slide

Pre-requisites

https://spoken-tutorial.org

To understand this tutorial, you should know,
  • Data frames in R
  • Basics of Statistics

If not, please locate the relevant R tutorials on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use
  • A data frame moviesData.csv
  • A script file barPlots.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight moviesData.csv and barPlots.R in the folder Plots

I have downloaded and moved these files to Plots folder.

This folder is located in myProject folder on my Desktop.

I have also set Plots folder as my Working Directory.

Let us switch to RStudio.

Highlight barPlots.R in the Files window of RStudio

Open the script barPlots.R in RStudio.

Highlight the Source button

Run this script by clicking on the Source button.

Highlight movies in the Source window

The movies data frame opens in the Source window.

Highlight dim(movies) in the Console window

It has 600 observations of 31 variables.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

This will enable us to see the remaining objects of movies data frame.
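
The same inspection can also be done from the Console. The following is a minimal sketch, assuming a small stand-in data frame; the name demo and all its values are hypothetical and are not taken from moviesData.csv.

```r
# Hypothetical stand-in for the movies data frame (values are made up).
demo <- data.frame(
  title          = c("Movie A", "Movie B", "Movie C"),
  imdb_rating    = c(6.1, 7.8, 5.4),
  audience_score = c(58, 86, 41)
)

dim(demo)      # number of observations (rows) and variables (columns)
names(demo)    # names of all the columns, without scrolling
head(demo, 2)  # the first two observations
```

Running the same three functions on movies should list its observations and variables in the same way.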

Highlight imdb_rating in the Source window

Now, we will learn how to draw a bar chart of the object named imdb underscore rating in movies.
Show slide

Bar Chart

  • A bar chart represents data in rectangular bars with length of the bar proportional to the value of the variable.
  • R uses the function barplot to create bar charts.
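
As a minimal sketch of the two points on this slide, the code below draws three bars from a hypothetical vector of ratings. The vector ratings and the labels are made up for illustration.

```r
# Hypothetical ratings, used only for illustration.
ratings <- c(6.1, 7.8, 5.4)

# barplot() draws one bar per value; the height of each bar is
# proportional to the value. It also returns the x-coordinates
# of the bar midpoints, which are stored here in mids.
mids <- barplot(ratings,
                names.arg = c("Movie A", "Movie B", "Movie C"),
                ylim = c(0, 10),
                ylab = "Rating")
```

The returned midpoints are handy when text or points need to be placed on top of the bars later.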
Let us switch to RStudio.
Highlight movies in the Source window.

For the sake of simplicity, we are considering only the first 20 observations of movies to draw a bar chart.

Highlight barPlots.R in the Source window

Click on the script barPlots.R.

[RStudio]

moviesSub <- movies[1:20,]

In the Source window, type the following command.

Save the script and run the current line by pressing Ctrl + Enter keys simultaneously.
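
The row-and-column indexing used in this command can be sketched on a smaller example. The data frame demo below is hypothetical; only the indexing pattern matches the command above.

```r
# Hypothetical data frame with five observations.
demo <- data.frame(id = 1:5,
                   rating = c(6.1, 7.8, 5.4, 8.2, 7.0))

# Rows 1 to 3 and, because nothing follows the comma, all columns.
firstThree <- demo[1:3, ]

# head() is another way to take the first n observations.
all.equal(firstThree, head(demo, 3))
```

Leaving out the comma, as in demo[1:3], would select columns instead of rows, so the trailing comma matters.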

Let me resize the Source window.
Highlight moviesSub in the Environment window.

moviesSub with 20 observations is loaded in the Environment.

Now, we draw a bar chart of imdb_rating for these movies.

[RStudio]

barplot(moviesSub$imdb_rating,

ylab="IMDB Rating",

xlab = "Movies",

col="blue",

ylim=c(0,10),

main="Movies' IMDB Rating")

In the Source window, type the following command.
Highlight barplot in the Source window

Here, we have used the following arguments:
  • moviesSub dollar sign imdb underscore rating is the data for plotting
  • ylab and xlab for adding labels to the respective axes
  • col to set the color of the bars
  • ylim to set the range of values on Y-axis
  • main for adding a title to the bar chart

Highlight Run button in the Source window.

Run the current line.

Highlight the plot in the Plots window

The bar chart is displayed with Movies on X-axis and their imdb_rating on Y-axis.

Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.

Highlight the first bar in the plot

This particular movie has an IMDB rating of approximately 6.

Highlight the third bar in the plot

Similarly, this particular movie has an IMDB rating of approximately 8.

However, we do not know the names of the movies.

So, we will add more arguments in barplot function to show the names of movies on X-axis.
Close this plot.
[RStudio]

barplot(moviesSub$imdb_rating,

ylab="IMDB Rating",

col="blue",

ylim=c(0,10),

main="Movies' IMDB Rating",

names.arg=moviesSub$title)

In the Source window, type the following command.
Highlight names.arg in the Source window

Here, we have used the argument names.arg and set it to title.

Remember, title column in moviesSub contains the names of movies.

Highlight Run button in the Source window.

Run the current line.

Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.

Highlight X-axis of the plot

Now, the names of movies are displayed on the X-axis.

But not for all movies.

This is because the names are too long to fit horizontally.

So, we will make these names perpendicular to the X-axis.

Close this plot.
[RStudio]

barplot(moviesSub$imdb_rating,

ylab="IMDB Rating",

col="blue",

ylim=c(0,10),

main="Movies' IMDB Rating",

names.arg=moviesSub$title,

las = 2)

In the Source window, type the following command.

Highlight las in the Source window

Here, we have used las argument.

las equal to 2 produces labels which are at right angles to the axis.

Highlight Run button in the Source window

Run the current line.

Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.

Highlight the plot in the Plots window

Now the names for all the movies are displayed on X-axis.

For example, Filly Brown has an IMDB rating of approximately 6.

Highlight the plot in the Plots window

However, longer names are being truncated.

We can add more arguments to barplot function for adjusting labels.

For more information, please refer to the Additional Material section on this website.
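
One possible adjustment is sketched below. It is an assumption about what such an adjustment could look like, not necessarily the approach given in the Additional Material. It enlarges the bottom margin with par and shrinks the labels with the cex.names argument of barplot. The data frame demo and its titles are hypothetical.

```r
# Hypothetical data with deliberately long titles.
demo <- data.frame(
  title = c("A Very Long Movie Title Indeed",
            "Short One",
            "Another Rather Lengthy Title"),
  imdb_rating = c(6.1, 7.8, 5.4)
)

# Enlarge the bottom margin (order: bottom, left, top, right)
# and remember the previous settings.
old_par <- par(mar = c(10, 4, 4, 2))

barplot(demo$imdb_rating,
        names.arg = demo$title,
        las = 2,          # labels at right angles to the axis
        cex.names = 0.8,  # draw the bar labels slightly smaller
        ylim = c(0, 10),
        col = "blue",
        ylab = "IMDB Rating",
        main = "Movies' IMDB Rating")

par(old_par)  # restore the previous margins
```

The margin is given in lines of text, so the value 10 may need tuning for longer or shorter titles.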

Close this plot.
Highlight movies in the Source window

In the Source window, click on movies.

Highlight imdb_rating and audience_score in the Source window

Let us analyze the relation between imdb underscore rating and audience underscore score.

For this, we will draw a scatter plot with these two objects by using plot function.

Remember, we have already learnt how to plot a single object.

Show Slide

Scatter Plot

  • Scatter plot is a graph in which the values of two variables are plotted along two axes.
  • The pattern of the resulting points reveals the correlation.
Let us switch to RStudio.
Highlight barPlots.R in the Source window

In the Source window, click on the script barPlots.R.
[RStudio]

plot(x = movies$imdb_rating,

y = movies$audience_score,

main = "IMDB Rating vs Audience Score",

xlab = "IMDB Rating",

ylab = "Audience Score",

xlim = c(0,10),

ylim = c(0,100),

col = "blue")

In the Source window, type the following command.
Highlight plot function in the Source window

Here, we have kept imdb underscore rating on the X-axis and audience underscore score on the Y-axis.

Highlight xlim in the Source window

As imdb underscore rating of any movie varies between 0 and 10, we have set the range of values on X-axis from 0 to 10.

Highlight ylim in the Source window

Similarly, we have set the range of values on Y-axis from 0 to 100.

Highlight Run button in the Source window

Save the script and run the current line.

Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.

Highlight the plot in the Plots window

We can observe that movies having a higher imdb underscore rating tend to have a higher audience underscore score.
Close this plot.
Now we will learn how to calculate the correlation coefficient between imdb underscore rating and audience underscore score.

For this, we use cor function.

[RStudio]

cor(movies$imdb_rating, movies$audience_score)

In the Source window, type the following command.
Highlight Run button in the Source window

Save the script and run the current line.

Highlight the output in the Console window

The correlation coefficient between imdb underscore rating and audience underscore score is evaluated as 0.865.

Highlight the output in the Console window

The value of correlation coefficient is always between -1 and +1.

A positive value indicates that the variables are positively related.
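
To see where this value comes from, here is a minimal sketch that computes the Pearson correlation coefficient by hand and compares it with cor. The vectors x and y are hypothetical and are not taken from moviesData.csv.

```r
# Hypothetical data, made up for illustration only.
x <- c(4.5, 5.2, 6.0, 6.8, 7.4, 8.1)
y <- c(35, 44, 55, 70, 78, 90)

# Pearson correlation: the sum of products of deviations from the means,
# divided by the square root of the product of the sums of squared deviations.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)  # the default method of cor() is "pearson"

all.equal(r_manual, r_builtin)  # checks that the two values agree

# Negating one variable flips the sign of the coefficient,
# which gives a negatively related pair.
cor(x, -y)
```

Because the denominator is never smaller in magnitude than the numerator, the result always stays between -1 and +1.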

Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt how to:
  • Plot bar charts
  • Plot scatter plots
  • Find the correlation coefficient between two objects
Show slide

Assignment

We now suggest an assignment.
  • Read the file moviesData.csv. Create a bar chart of critics underscore score for the first 10 movies.
  • Create a scatter plot of imdb underscore rating and imdb underscore num underscore votes to see their relation.
  • Save both the plots.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
Show Slide

Thank You

The script for this tutorial was contributed by Tushar Bajaj (TISS Mumbai).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst