Difference between revisions of "R/C2/Plotting-Bar-Charts-and-Scatter-Plot/English"

 
Latest revision as of 23:27, 1 June 2019

Title of the script: Plotting Bar Charts and Scatter Plots

Author: Tushar Bajaj (TISS Mumbai) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, graphs, bar chart, labels, scatter plot, correlation, video tutorial, spoken tutorial

Visual Cue Narration
Show slide

Opening slide

Welcome to this tutorial on Plotting Bar Charts and Scatter Plots.
Show slide

Learning Objectives

In this tutorial, we will learn how to:
  • Plot bar charts
  • Plot scatter plots
  • Find the correlation coefficient between two objects.
Show slide

Pre-requisites

https://spoken-tutorial.org

To understand this tutorial, you should know:
  • Data frames in R
  • Basics of Statistics

If not, please locate the relevant R tutorials on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use
  • A data file moviesData.csv
  • A script file barPlots.R.

Please download these files from the Code files link of this tutorial.

Point to the files in the Plots folder.

Highlight moviesData.csv and barPlots.R in the folder Plots

I have downloaded and moved these files to the Plots folder.

This folder is located in the myProject folder on my Desktop.

I have also set the Plots folder as my Working Directory.

Cursor on the interface. Let us switch to RStudio.
Highlight barPlots.R in the Files window of RStudio. Open the script barPlots.R in RStudio.
Highlight the Source button. Run this script by clicking on the Source button.
Highlight movies in the Source window. The movies data frame opens in the Source window.
Highlight dim(movies) in the Console window. It has 600 observations of 31 variables.
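The contents of barPlots.R are not shown in this script. As a minimal sketch, assuming the script loads the file with read.csv, the following lines build a small hypothetical data frame, write it to a temporary CSV file, read it back and check its size with dim. The names sample_movies, csv_path and movies_demo are made up for illustration.

```r
# Hypothetical stand-in for moviesData.csv: three movies, three columns.
sample_movies <- data.frame(
  title = c("Movie A", "Movie B", "Movie C"),
  imdb_rating = c(5.5, 7.3, 8.1),
  audience_score = c(73, 81, 91)
)

# Write the sample to a temporary CSV file and read it back,
# in the same way a script could read moviesData.csv.
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_movies, csv_path, row.names = FALSE)
movies_demo <- read.csv(csv_path)

# dim() returns the number of observations (rows) and variables (columns).
dim(movies_demo)
```

For the real file, dim(movies) reports 600 rows and 31 columns, as stated above.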
Highlight the scroll bar in the Source window. In the Source window, scroll from left to right.

This will enable us to see the remaining objects of the movies data frame.

Highlight imdb_rating in the Source window. Now, we will learn how to draw a bar chart of the object named imdb underscore rating in movies.
Show slide

Bar Chart

  • A bar chart represents data as rectangular bars, with the length of each bar proportional to the value of the variable.
  • R uses the function barplot to create bar charts.
Let us switch to RStudio.
Highlight movies in the Source window. For the sake of simplicity, we are considering only the first 20 observations of movies to draw a bar chart.
Highlight barPlots.R in the Source window. Click on the script barPlots.R.
Cursor on the interface.

[RStudio]

moviesSub <- movies[1:20,]

In the Source window, type the following command.

Save the script and run the current line by pressing Ctrl + Enter keys simultaneously.
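The row-and-column indexing used in this command can be tried on a smaller example. The sketch below uses a hypothetical five-row data frame named demo (not part of the tutorial files) to show that leaving the slot after the comma empty keeps all the columns, and that head selects the same rows.

```r
# Hypothetical data frame with 5 rows standing in for movies.
demo <- data.frame(id = 1:5,
                   imdb_rating = c(5.5, 7.3, 8.1, 6.2, 7.0))

# Rows 1 to 3; the empty slot after the comma keeps every column.
first_three <- demo[1:3, ]

# head() is another way to take the first n rows.
same_rows <- head(demo, 3)

nrow(first_three)   # number of rows kept
ncol(first_three)   # number of columns kept
```

In the same way, movies[1:20,] keeps the first 20 observations together with all the variables.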

Drag the boundary to resize the window. Let me resize the Source window.
Highlight moviesSub in the Environment window. moviesSub with 20 observations is loaded in the Environment.

Now, we draw a bar chart of imdb_rating for these movies.

Cursor in the Source window.

[RStudio]

barplot(moviesSub$imdb_rating,
        ylab = "IMDB Rating",
        xlab = "Movies",
        col = "blue",
        ylim = c(0, 10),
        main = "Movies' IMDB Rating")

In the Source window, type the following command.
Highlight barplot in the Source window. Here, we have used the following arguments:
  • moviesSub dollar sign imdb underscore rating is the data for plotting.
  • ylab and xlab for adding labels to the respective axes.
  • col to set the color of the bars.
  • ylim to set the range of values on the Y-axis.
  • main for adding a title to the bar chart.
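The arguments listed above can be tried without the movies file. This is a minimal sketch that uses a hypothetical vector named ratings in place of the imdb_rating column. The call to pdf(NULL) is an assumption made so that the lines also run outside RStudio; inside RStudio it can be left out so that the chart appears in the Plots window.

```r
# Hypothetical ratings standing in for moviesSub$imdb_rating.
ratings <- c(5.5, 7.3, 8.1, 6.2)

# Draw to a null device so the sketch runs without opening a window.
pdf(NULL)

# barplot() draws one bar per value and returns the bar midpoints.
midpoints <- barplot(ratings,
                     ylab = "IMDB Rating",      # label of the Y-axis
                     xlab = "Movies",           # label of the X-axis
                     col = "blue",              # color of the bars
                     ylim = c(0, 10),           # range of the Y-axis
                     main = "Movies' IMDB Rating")

invisible(dev.off())

# One midpoint is returned for each bar.
length(midpoints)
```

The returned midpoints are useful when text or points have to be placed above particular bars.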
Highlight Run button in the Source window. Run the current line.
Highlight the plot in the Plots window. The bar chart is displayed with Movies on the X-axis and their imdb_rating on the Y-axis.
Highlight Files and Plots window. In the Plots window, click on Zoom to maximize the plot.
Highlight the first bar in the plot. This particular movie has an IMDB rating of approximately 6.
Highlight the third bar in the plot. Similarly, this particular movie has an IMDB rating of approximately 8.

However, we do not know the names of the movies.

Cursor on the plot window. So, we will add more arguments to the barplot function to show the names of the movies on the X-axis.
Click on the Close button. Close this plot.
[RStudio]

barplot(moviesSub$imdb_rating,
        ylab = "IMDB Rating",
        col = "blue",
        ylim = c(0, 10),
        main = "Movies' IMDB Rating",
        names.arg = moviesSub$title)

In the Source window, type the following command.
Highlight names.arg in the Source window. Here, we have used the argument names.arg and set it to title.

Remember, the title column in moviesSub contains the names of the movies.

Highlight Run button in the Source window. Run the current line.
Highlight Files and Plots window. In the Plots window, click on Zoom to maximize the plot.
Highlight X-axis of the plot. Now, the names of the movies are displayed on the X-axis, but not for all movies.

This is because the names are too long to fit along the axis.

That’s why we will make these names perpendicular to the X-axis.

Click on the Close button. Close this plot.
[RStudio]

barplot(moviesSub$imdb_rating,
        ylab = "IMDB Rating",
        col = "blue",
        ylim = c(0, 10),
        main = "Movies' IMDB Rating",
        names.arg = moviesSub$title,
        las = 2)

In the Source window, type the following command.
Highlight las in the Source window. Here, we have used the las argument.

las equal to 2 produces labels which are at right angles to the axis.

Highlight Run button in the Source window. Run the current line.
Highlight Files and Plots window. In the Plots window, click on Zoom to maximize the plot.
Highlight the plot in the Plots window. Now the names of all the movies are displayed on the X-axis.

For example, Filly Brown has an IMDB rating of approximately 6.

Highlight the plot in the Plots window. However, longer names are being truncated.

We can add more arguments to the barplot function to adjust the labels.

For more information, please refer to the Additional Material section on this website.
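One possible way to give long labels more room is sketched below. It is not necessarily the approach taken in the Additional Material. It assumes two standard options: the mar setting of par, which sets the margins in lines of text in the order bottom, left, top, right, and the cex.names argument of barplot, which scales the size of the bar labels. The vectors titles and ratings are hypothetical.

```r
# Hypothetical titles and ratings standing in for moviesSub.
titles <- c("A Rather Long Movie Title", "Short",
            "Another Lengthy Film Name")
ratings <- c(6.1, 7.4, 8.0)

# Null device so the sketch runs without a window (an assumption;
# leave this line out in RStudio to see the chart).
pdf(NULL)

# Enlarge the bottom margin; par() returns the previous settings.
old_par <- par(mar = c(12, 4, 4, 2))

barplot(ratings,
        names.arg = titles,
        las = 2,            # labels at right angles to the axis
        cex.names = 0.7,    # shrink the labels to 70 percent
        col = "blue",
        ylim = c(0, 10),
        ylab = "IMDB Rating",
        main = "Movies' IMDB Rating")

new_mar <- par("mar")   # margins in use while the chart was drawn
par(old_par)            # restore the earlier margins
invisible(dev.off())
```

The values 12 and 0.7 are only starting points; they depend on the length of the longest title and on the size of the plot.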

Click on the Close button. Close this plot.
Highlight movies in the Source window. In the Source window, click on movies.
Highlight imdb_rating and audience_score in the Source window. Let us analyze the relation between imdb underscore rating and audience underscore score.

For this, we will draw a scatter plot with these two objects by using the plot function.

Remember, we have already learnt how to plot a single object.

Show Slide

Scatter Plot

  • A scatter plot is a graph in which the values of two variables are plotted along two axes.
  • The pattern of the resulting points reveals the correlation between the two variables.
Let us switch to RStudio.
Highlight barPlots.R in the Source window. In the Source window, click on the script barPlots.R.
[RStudio]

plot(x = movies$imdb_rating,
     y = movies$audience_score,
     main = "IMDB Rating vs Audience Score",
     xlab = "IMDB Rating",
     ylab = "Audience Score",
     xlim = c(0, 10),
     ylim = c(0, 100),
     col = "blue")

In the Source window, type the following command.
Highlight plot function in the Source window. Here, we have kept imdb underscore rating on the X-axis and audience underscore score on the Y-axis.
Highlight xlim in the Source window. As the imdb underscore rating of any movie varies between 0 and 10, we have set the range of values on the X-axis from 0 to 10.
Highlight ylim in the Source window. Similarly, we have set the range of values on the Y-axis from 0 to 100.
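The effect of xlim and ylim can be checked on a small example. This is a sketch with hypothetical vectors imdb and audience in place of the two columns of movies. After the plot is drawn, par("usr") reports the limits of the plotting region in the order left, right, bottom, top. The call to pdf(NULL) is an assumption made so that the lines also run outside RStudio.

```r
# Hypothetical values standing in for the two columns of movies.
imdb <- c(5.5, 7.3, 8.1, 6.2, 7.0)
audience <- c(60, 81, 91, 55, 74)

pdf(NULL)   # null device; leave out in RStudio to see the plot

plot(x = imdb,
     y = audience,
     main = "IMDB Rating vs Audience Score",
     xlab = "IMDB Rating",
     ylab = "Audience Score",
     xlim = c(0, 10),     # full range of possible ratings
     ylim = c(0, 100),    # full range of possible scores
     col = "blue")

# Limits of the plotting region: left, right, bottom, top.
axis_range <- par("usr")
invisible(dev.off())

axis_range
```

By default, R extends each axis slightly beyond the requested limits, so the reported region is a little wider than 0 to 10 and 0 to 100.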
Highlight Run button in the Source window. Save the script and run the current line.
Highlight Files and Plots window. In the Plots window, click on Zoom to maximize the plot.
Highlight the plot in the Plots window. We can observe that movies with a higher imdb underscore rating tend to have a higher audience underscore score.
Click on the Close button. Close this plot.
Cursor on the interface. Now we will learn how to calculate the correlation coefficient between imdb underscore rating and audience underscore score.

For this, we use the cor function.

[RStudio]

cor(movies$imdb_rating, movies$audience_score)

In the Source window, type the following command.
Highlight Run button in the Source window. Save the script and run the current line.
Highlight the output in the Console window. The correlation coefficient between imdb underscore rating and audience underscore score is evaluated as 0.865.
Highlight the output in the Console window. The value of a correlation coefficient always lies between -1 and +1.

A positive value indicates that the variables are positively related, and a value close to +1, such as 0.865, indicates a strong positive linear relationship.
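To see what cor computes by default, the Pearson correlation coefficient can also be worked out by hand. The sketch below uses hypothetical vectors imdb and audience, not the movies data, and compares the result of cor with the usual formula: the sum of the products of the deviations from the means, divided by the square root of the product of the sums of squared deviations.

```r
# Hypothetical values standing in for the two columns of movies.
imdb <- c(5.5, 7.3, 8.1, 6.2, 7.0)
audience <- c(60, 81, 91, 55, 74)

# Built-in Pearson correlation (the default method of cor).
r_builtin <- cor(imdb, audience)

# The same quantity computed from the deviations from the means.
dx <- imdb - mean(imdb)
dy <- audience - mean(audience)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Both approaches agree up to rounding error.
all.equal(r_builtin, r_manual)
```

If either column contains missing values, cor returns NA unless its use argument is set, for example to "complete.obs".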

Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt how to:
  • Plot bar charts
  • Plot scatter plots
  • Find the correlation coefficient between two objects
Show slide

Assignment

We now suggest an assignment.
  • Read the file moviesData.csv. Create a bar chart of critics underscore score for the first 10 movies.
  • Create a scatter plot of imdb underscore rating and imdb underscore num underscore votes to see their relation.
  • Save both the plots.
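This tutorial does not show how to save a plot from a script. One possible way, sketched here as an assumption rather than as the method expected in the assignment, is to open a file device with pdf, draw the plot, and close the device with dev.off. The file name demo_barplot.pdf and the scores are hypothetical.

```r
# Hypothetical output file inside the session's temporary folder.
out_file <- file.path(tempdir(), "demo_barplot.pdf")

# Open a PDF device; everything drawn next goes into the file.
pdf(out_file)

barplot(c(55, 80, 91),
        names.arg = c("A", "B", "C"),
        ylab = "Critics Score",
        col = "blue",
        ylim = c(0, 100))

# Closing the device writes the file to disk.
invisible(dev.off())

file.exists(out_file)
```

To keep the file, replace the temporary path with a path inside the Working Directory.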
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Tushar Bajaj (TISS Mumbai).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst