Difference between revisions of "R/C2/Plotting-Bar-Charts-and-Scatter-Plot/English"


Revision as of 17:33, 27 May 2019

Title of the script: Plotting Bar Charts and Scatter Plots

Author: Tushar Bajaj (TISS Mumbai) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, graphs, bar chart, labels, scatter plot, correlation, video tutorial, spoken tutorial

Visual Cue Narration
Show slide

Opening slide

Welcome to this tutorial on Plotting Bar Charts and Scatter Plots.
Show slide

Learning Objectives

In this tutorial, we will learn how to:
  • Plot bar charts
  • Plot scatter plots
  • Find the correlation coefficient between two objects
Show slide

Pre-requisites

https://spoken-tutorial.org

To understand this tutorial, you should know:
  • Data frames in R
  • Basics of Statistics

If not, please locate the relevant R tutorials on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use
  • A data frame moviesData.csv
  • A script file barPlots.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight moviesData.csv and barPlots.R in the folder Plots

I have downloaded and moved these files to the Plots folder.

This folder is located in the myProject folder on my Desktop.

I have also set the Plots folder as my Working Directory.

Let us switch to RStudio.
Highlight barPlots.R in the Files window of RStudio

Open the script barPlots.R in RStudio.
Highlight the Source button

Run this script by clicking on the Source button.
Highlight movies in the Source window

The movies data frame opens in the Source window.
Highlight dim(movies) in the Console window

It has 600 observations of 31 variables.
Highlight the scroll bar in the Source window

In the Source window, scroll from left to right. This will enable us to see the remaining variables of the movies data frame.
Highlight imdb_rating in the Source window

Now, we will learn how to draw a bar chart of the object named imdb underscore rating in movies.
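The contents of barPlots.R are not reproduced in this script. As a minimal sketch, assuming the script reads the CSV file with read.csv, loading and inspecting a data frame could look like the following. The three-row file and the name movies_demo are made up here so that the example runs without the downloaded files.

```r
# Hypothetical stand-in for moviesData.csv (the real file has 600 rows and 31 columns).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("title,imdb_rating,audience_score",
             "Movie A,6.1,55",
             "Movie B,7.3,81",
             "Movie C,8.0,89"),
           csv_path)

# Assumption: barPlots.R loads the data in a similar way,
# for example movies <- read.csv("moviesData.csv").
movies_demo <- read.csv(csv_path)

dim(movies_demo)    # number of observations (rows) and variables (columns)
names(movies_demo)  # names of the variables
# View(movies_demo) would open the data frame in the RStudio Source window.
```

dim returns the number of rows first and the number of columns second, which is how the 600 observations of 31 variables are reported for movies.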
Show slide

Bar Chart

  • A bar chart represents data in rectangular bars with length of the bar proportional to the value of the variable.
  • R uses the function barplot to create bar charts.
Let us switch to RStudio.
Highlight movies in the Source window

For the sake of simplicity, we are considering only the first 20 observations of movies to draw a bar chart.
Highlight barPlots.R in the Source window

Click on the script barPlots.R.

[RStudio]

moviesSub <- movies[1:20,]

In the Source window, type the following command.

Save the script and run the current line by pressing Ctrl + Enter keys simultaneously.

Let me resize the Source window.
Highlight moviesSub in the Environment window

moviesSub with 20 observations is loaded in the Environment.
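The row subsetting used for moviesSub can be tried on a small made-up data frame. This is only a sketch: movies_demo and its values are hypothetical, and head is shown as an alternative way to take the first rows.

```r
# Made-up data frame with 30 rows, standing in for movies.
movies_demo <- data.frame(
  title = paste("Movie", 1:30),
  imdb_rating = seq(5, by = 0.1, length.out = 30)
)

# Rows 1 to 20; the empty slot after the comma keeps all columns.
sub_rows <- movies_demo[1:20, ]

# head() with n = 20 selects the same first 20 rows.
sub_head <- head(movies_demo, 20)

nrow(sub_rows)  # number of observations in the subset
ncol(sub_rows)  # the number of variables is unchanged
```

Leaving out the comma, as in movies_demo[1:20], would select columns instead of rows, so the comma matters.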

Now, we draw a bar chart of imdb_rating for these movies.

[RStudio]

barplot(moviesSub$imdb_rating,
        ylab = "IMDB Rating",
        xlab = "Movies",
        col = "blue",
        ylim = c(0, 10),
        main = "Movies' IMDB Rating")

In the Source window, type the following command.
Highlight barplot in the Source window

Here, we have used the following arguments:
  • moviesSub dollar sign imdb underscore rating is the data for plotting
  • ylab and xlab for adding labels to the respective axes
  • col to set the color of the bars
  • ylim to set the range of values on the Y-axis
  • main for adding a title to the bar chart
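The arguments listed above can be tried without the movies data. The sketch below uses five made-up ratings. It also relies on the fact that barplot returns the midpoints of the bars, which text can use to print each value above its bar. The call to pdf(NULL) is an assumption added so that the example runs without a screen; in RStudio it can be left out together with dev.off.

```r
# Made-up ratings, standing in for moviesSub$imdb_rating.
ratings <- c(6.1, 7.3, 8.0, 5.4, 6.8)

pdf(NULL)  # null graphics device: nothing is written to disk

# barplot() returns the x coordinates of the bar midpoints.
mids <- barplot(ratings,
                ylab = "IMDB Rating",
                xlab = "Movies",
                col = "blue",
                ylim = c(0, 10),
                main = "Movies' IMDB Rating")

# Print each rating just above its bar (pos = 3 means above the point).
text(x = mids, y = ratings, labels = ratings, pos = 3)

invisible(dev.off())
```

With the values printed on the bars, the ratings do not have to be estimated from the Y-axis.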
Highlight Run button in the Source window

Run the current line.
Highlight the plot in the Plots window

The bar chart is displayed with Movies on the X-axis and their imdb_rating on the Y-axis.
Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.
Highlight the first bar in the plot

This particular movie has an IMDB rating of approximately 6.
Highlight the third bar in the plot

Similarly, this particular movie has an IMDB rating of approximately 8.

However, we do not know the names of the movies.

So, we will add more arguments to the barplot function to show the names of the movies on the X-axis.
Close this plot.
[RStudio]

barplot(moviesSub$imdb_rating,
        ylab = "IMDB Rating",
        col = "blue",
        ylim = c(0, 10),
        main = "Movies' IMDB Rating",
        names.arg = moviesSub$title)

In the Source window, type the following command.
Highlight names.arg in the Source window

Here, we have used the argument names.arg and set it to title.

Remember, the title column in moviesSub contains the names of the movies.

Highlight Run button in the Source window

Run the current line.
Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.
Highlight X-axis of the plot

Now, the names of the movies are displayed on the X-axis.

But not for all movies.

This is because the names are too long to fit side by side along the axis.

So, we will make these names perpendicular to the X-axis.

Close this plot.
[RStudio]

barplot(moviesSub$imdb_rating,
        ylab = "IMDB Rating",
        col = "blue",
        ylim = c(0, 10),
        main = "Movies' IMDB Rating",
        names.arg = moviesSub$title,
        las = 2)

In the Source window, type the following command.

Highlight las in the Source window

Here, we have used the las argument.

las equal to 2 produces labels which are at right angles to the axis.

Highlight Run button in the Source window

Run the current line.
Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.
Highlight the plot in the Plots window

Now the names of all the movies are displayed on the X-axis.

For example, Filly Brown has an IMDB rating of approximately 6.

Highlight the plot in the Plots window

However, longer names are being truncated.

We can add more arguments to the barplot function for adjusting labels.

For more information, please refer to the Additional Material section on this website.
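The Additional Material is not reproduced in this script. One common way to make room for long labels, shown here only as a sketch, is to enlarge the bottom margin with par and shrink the label text with cex.names. The titles, ratings and margin sizes below are made up, and pdf(NULL) is an assumption added so that the example runs without a screen.

```r
# Made-up titles and ratings, standing in for moviesSub.
titles <- c("A Very Long Movie Title Number One",
            "Short",
            "Another Rather Long Movie Title")
ratings <- c(6.1, 7.3, 8.0)

pdf(NULL)  # null graphics device: nothing is written to disk

# mar gives the bottom, left, top and right margins in lines of text.
# par() returns the previous settings so that they can be restored later.
old_par <- par(mar = c(12, 4, 4, 2) + 0.1)

barplot(ratings,
        ylab = "IMDB Rating",
        col = "blue",
        ylim = c(0, 10),
        main = "Movies' IMDB Rating",
        names.arg = titles,
        las = 2,
        cex.names = 0.8)  # draw the labels at 80% of the default size

par(old_par)  # restore the previous margins
invisible(dev.off())
```

A larger bottom margin leaves less room for the bars, so the values may need some adjustment for a given set of titles.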

Close this plot.
Highlight movies in the Source window

In the Source window, click on movies.
Highlight imdb_rating and audience_score in the Source window

Let us analyze the relation between imdb underscore rating and audience underscore score.

For this, we will draw a scatter plot with these two objects by using the plot function.

Remember, we have already learnt how to plot a single object.

Show Slide

Scatter Plot

  • A scatter plot is a graph in which the values of two variables are plotted along two axes.
  • The pattern of the resulting points reveals any correlation between the two variables.
Let us switch to RStudio.
Highlight barPlots.R in the Source window

In the Source window, click on the script barPlots.R.
[RStudio]

plot(x = movies$imdb_rating,
     y = movies$audience_score,
     main = "IMDB Rating vs Audience Score",
     xlab = "IMDB Rating",
     ylab = "Audience Score",
     xlim = c(0, 10),
     ylim = c(0, 100),
     col = "blue")

In the Source window, type the following command.
Highlight plot function in the Source window

Here, we have kept imdb underscore rating on the X-axis and audience underscore score on the Y-axis.
Highlight xlim in the Source window

As imdb underscore rating of any movie varies between 0 and 10, we have set the range of values on X-axis from 0 to 10.
Highlight ylim in the Source window

Similarly, we have set the range of values on Y-axis from 0 to 100.
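Before fixing xlim and ylim, it can help to check that the data actually lie inside the chosen limits. The sketch below does this with range on made-up values. The vectors rating and score are hypothetical, and pdf(NULL) is an assumption added so that the example runs without a screen.

```r
# Made-up values, standing in for movies$imdb_rating and movies$audience_score.
rating <- c(6.1, 7.3, 8.0, 5.4, 6.8)
score  <- c(55, 81, 89, 40, 70)

# range() returns the smallest and the largest value of a vector.
rating_range <- range(rating)
score_range  <- range(score)

pdf(NULL)  # null graphics device: nothing is written to disk

plot(x = rating,
     y = score,
     main = "IMDB Rating vs Audience Score",
     xlab = "IMDB Rating",
     ylab = "Audience Score",
     xlim = c(0, 10),    # covers rating_range
     ylim = c(0, 100),   # covers score_range
     col = "blue")

invisible(dev.off())
```

Points that fall outside xlim or ylim are not drawn, so limits narrower than the data would hide some observations.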
Highlight Run button in the Source window

Save the script and run the current line.
Highlight Files and Plots window

In the Plots window, click on Zoom to maximize the plot.
Highlight the plot in the Plots window

We can observe that movies with a higher imdb underscore rating tend to have a higher audience underscore score.
Close this plot.
Now we will learn how to calculate the correlation coefficient between imdb underscore rating and audience underscore score.

For this, we use the cor function.

[RStudio]

cor(movies$imdb_rating, movies$audience_score)

In the Source window, type the following command.
Highlight Run button in the Source window

Save the script and run the current line.
Highlight the output in the Console window

The correlation coefficient between imdb underscore rating and audience underscore score is evaluated as 0.865.
Highlight the output in the Console window

The value of correlation coefficient is always between -1 and +1.

A positive value indicates that the variables are positively related. A value close to +1, such as 0.865, indicates a strong positive relation.
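By default, cor computes the Pearson correlation coefficient. As a sketch on made-up values, the result of cor can be compared with the same coefficient computed step by step from its definition. The last lines show how cor behaves when a value is missing. The movies columns used in this tutorial evidently have no such gap, since a number is returned.

```r
# Made-up values, standing in for the two movies columns.
x <- c(6.1, 7.3, 8.0, 5.4, 6.8)
y <- c(55, 81, 89, 40, 70)

# Built-in function (Pearson correlation by default).
r_builtin <- cor(x, y)

# The same coefficient from its definition:
# sum of products of deviations, divided by the square root of
# the product of the two sums of squared deviations.
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

all.equal(r_builtin, r_manual)  # the two results agree

# With a missing value, cor() returns NA unless told to skip incomplete pairs.
y_na <- y
y_na[2] <- NA
r_na <- cor(x, y_na)                          # NA
r_complete <- cor(x, y_na, use = "complete.obs")
```

The argument use = "complete.obs" drops every pair in which either value is missing before the coefficient is computed.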

Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt how to:
  • Plot bar charts
  • Plot scatter plots
  • Find the correlation coefficient between two objects
Show slide

Assignment

We now suggest an assignment.
  • Read the file moviesData.csv. Create a bar chart of critics underscore score for the first 10 movies.
  • Create a scatter plot of imdb underscore rating and imdb underscore num underscore votes to see their relation.
  • Save both the plots.
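Saving a plot is not demonstrated in this tutorial. In RStudio, the Export button of the Plots window can be used. As a sketch of doing the same from a script, a plot can be drawn to a file device and the device closed afterwards. The file name, the data and the choice of pdf are assumptions made for illustration; png and jpeg are used in the same way where they are available.

```r
# Hypothetical output file in the session's temporary folder.
out_file <- file.path(tempdir(), "demo_barplot.pdf")

# Open a PDF file device; width and height are in inches.
pdf(file = out_file, width = 8, height = 6)

# Everything drawn now goes into the file instead of the Plots window.
barplot(c(3, 5, 2),
        names.arg = c("A", "B", "C"),
        col = "blue",
        main = "Demo Bar Chart")

# Closing the device finishes writing the file.
invisible(dev.off())

file.exists(out_file)  # check that the file was created
```

The file is complete only after dev.off has been called, so that call should not be skipped.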
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Tushar Bajaj (TISS Mumbai).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst