Difference between revisions of "R/C2/Plotting-Bar-Charts-and-Scatter-Plot/English"
Sudhakarst (Talk | contribs) |
Nancyvarkey (Talk | contribs) |
||
Line 39: | Line 39: | ||
System Specifications | System Specifications | ||
− | || This tutorial is recorded on* '''Ubuntu Linux '''OS version '''16.04''' | + | || This tutorial is recorded on |
+ | * '''Ubuntu Linux '''OS version '''16.04''' | ||
* '''R''' version '''3.4.4''' | * '''R''' version '''3.4.4''' | ||
* '''RStudio''' version '''1.1.463''' | * '''RStudio''' version '''1.1.463''' | ||
Line 49: | Line 50: | ||
Download Files | Download Files | ||
|| For this tutorial, we will use | || For this tutorial, we will use | ||
− | * A | + | * A '''data frame moviesData.csv''' |
− | * A script file '''barPlots.R'''. | + | * A '''script''' file '''barPlots.R'''. |
Please download these files from the '''Code files''' link of this tutorial. | Please download these files from the '''Code files''' link of this tutorial. | ||
Line 73: | Line 74: | ||
|- | |- | ||
|| Highlight '''movies''' in the '''Source''' window | || Highlight '''movies''' in the '''Source''' window | ||
− | || '''movies | + | || '''movies data frame''' opens in the '''Source''' window. |
|- | |- | ||
|| Highlight '''dim(movies)''' in the '''Console''' window | || Highlight '''dim(movies)''' in the '''Console''' window | ||
− | || It has 600 observations of 31 variables. | + | || It has 600 '''observations''' of 31 '''variables'''. |
|- | |- | ||
|| Highlight the scroll bar in the '''Source''' window | || Highlight the scroll bar in the '''Source''' window | ||
− | || In the '''Source''' window, scroll from left to right. This will enable us to see the remaining objects of '''movies | + | || In the '''Source''' window, scroll from left to right. |
+ | |||
+ | This will enable us to see the remaining objects of '''movies data frame'''. | ||
|- | |- | ||
|| Highlight '''imdb_rating''' in the '''Source''' window | || Highlight '''imdb_rating''' in the '''Source''' window | ||
− | || Now, we will learn how to draw a bar chart of the object named '''imdb | + | || Now, we will learn how to draw a bar chart of the object named '''imdb underscore rating''' in '''movies'''. |
|- | |- | ||
|| Show slide | || Show slide | ||
Line 88: | Line 91: | ||
Bar Chart | Bar Chart | ||
|| | || | ||
− | * A '''bar chart '''represents data in rectangular bars with length of the bar proportional to the value of the variable. | + | * A '''bar chart '''represents data in rectangular bars with length of the bar proportional to the value of the '''variable'''. |
− | * R uses the | + | * '''R''' uses the '''function barplot''' to create bar charts. |
|- | |- | ||
Line 95: | Line 98: | ||
|| Let us switch to '''RStudio'''. | || Let us switch to '''RStudio'''. | ||
|- | |- | ||
− | || Highlight '''movies''' in the '''Source''' window | + | || Highlight '''movies''' in the '''Source''' window. |
− | || For the sake of simplicity, we are considering only the first 20 observations of '''movies''' to draw a | + | || For the sake of simplicity, we are considering only the first 20 '''observations''' of '''movies''' to draw a bar chart. |
|- | |- | ||
|| Highlight '''barPlots.R '''in the '''Source''' window | || Highlight '''barPlots.R '''in the '''Source''' window | ||
− | || Click on the '''script | + | || Click on the '''script barPlots.R''' |
|- | |- | ||
|| ''' ''' | || ''' ''' | ||
Line 106: | Line 109: | ||
'''moviesSub <- movies[1:20,]''' | '''moviesSub <- movies[1:20,]''' | ||
− | || In the '''Source''' window, type the following command. | + | || In the '''Source''' window, type the following '''command'''. |
Save the '''script''' and run the current line by pressing '''Ctrl + Enter '''keys simultaneously. | Save the '''script''' and run the current line by pressing '''Ctrl + Enter '''keys simultaneously. | ||
Line 113: | Line 116: | ||
|| Let me resize the '''Source''' window. | || Let me resize the '''Source''' window. | ||
|- | |- | ||
− | || Highlight '''moviesSub''' in the '''Environment''' window | + | || Highlight '''moviesSub''' in the '''Environment''' window. |
− | || '''moviesSub''' with 20 observations is loaded in the '''Environment'''. | + | || '''moviesSub''' with 20 '''observations''' is loaded in the '''Environment'''. |
Now, we draw a bar chart of '''imdb_rating''' for these movies. | Now, we draw a bar chart of '''imdb_rating''' for these movies. | ||
Line 131: | Line 134: | ||
'''main="Movies' IMDB Rating")''' | '''main="Movies' IMDB Rating")''' | ||
− | || In the '''Source''' window, type the following command. | + | || In the '''Source''' window, type the following '''command'''. |
+ | |||
|- | |- | ||
|| Highlight '''barplot''' in the '''Source''' window | || Highlight '''barplot''' in the '''Source''' window | ||
− | || Here, we have used the following arguments: | + | || Here, we have used the following '''arguments''': |
− | * '''moviesSub dollar sign imdb | + | * '''moviesSub dollar sign imdb underscore rating '''is the data for plotting |
* '''ylab''' and '''xlab''' for adding labels to the respective axes | * '''ylab''' and '''xlab''' for adding labels to the respective axes | ||
* '''col''' to set the color of bins | * '''col''' to set the color of bins | ||
Line 142: | Line 146: | ||
|- | |- | ||
− | || Highlight '''Run''' button in the '''Source''' window | + | || Highlight '''Run''' button in the '''Source''' window. |
|| Run the current line. | || Run the current line. | ||
|- | |- | ||
Line 160: | Line 164: | ||
|- | |- | ||
|| | || | ||
− | || So, we will add more arguments in '''barplot ''' | + | || So, we will add more '''arguments''' in '''barplot function''' to show the names of movies on X-axis. |
|- | |- | ||
|| | || | ||
Line 178: | Line 182: | ||
'''names.arg=moviesSub$title)''' | '''names.arg=moviesSub$title)''' | ||
− | || In the '''Source''' window, type the following command. | + | || In the '''Source''' window, type the following '''command'''. |
|- | |- | ||
|| Highlight '''names.arg''' in the '''Source''' window | || Highlight '''names.arg''' in the '''Source''' window | ||
− | || Here, we have used the | + | || Here, we have used the '''argument names.arg''' and set it to '''title'''. |
Remember, '''title''' column in '''moviesSub''' contains the names of movies. | Remember, '''title''' column in '''moviesSub''' contains the names of movies. | ||
|- | |- | ||
− | || Highlight '''Run''' button in the '''Source''' window | + | || Highlight '''Run''' button in the '''Source''' window. |
|| Run the current line. | || Run the current line. | ||
|- | |- | ||
Line 218: | Line 222: | ||
'''las = 2)''' | '''las = 2)''' | ||
− | || In the '''Source''' window, type the following command. | + | || In the '''Source''' window, type the following '''command'''. |
|- | |- | ||
|| | || | ||
Highlight '''las''' in the '''Source''' window | Highlight '''las''' in the '''Source''' window | ||
− | || Here, we have used '''las''' | + | || Here, we have used '''las argument'''. |
'''las '''equal to''' 2''' produces labels which are at right angles to the axis. | '''las '''equal to''' 2''' produces labels which are at right angles to the axis. | ||
Line 241: | Line 245: | ||
|| However, longer names are being truncated. | || However, longer names are being truncated. | ||
− | We can add more arguments to '''barplot''' | + | We can add more '''arguments''' to '''barplot function''' for adjusting labels. |
For more information, please refer to the '''Additional Material''' section on this website. | For more information, please refer to the '''Additional Material''' section on this website. | ||
Line 252: | Line 256: | ||
|- | |- | ||
|| Highlight '''imdb_rating''' and '''audience_score '''in the '''Source''' window | || Highlight '''imdb_rating''' and '''audience_score '''in the '''Source''' window | ||
− | || Let us analyze the relation between '''imdb | + | || Let us analyze the relation between '''imdb underscore rating''' and '''audience underscore score. ''' |
− | For this, we will draw a '''scatter plot''' with these two objects by using '''plot''' | + | For this, we will draw a '''scatter plot''' with these two objects by using '''plot function'''. |
Remember, we have already learnt how to plot a single object. | Remember, we have already learnt how to plot a single object. | ||
Line 262: | Line 266: | ||
Scatter Plot | Scatter Plot | ||
|| | || | ||
− | * '''Scatter plot''' is a graph in which the values of two variables are plotted along two axes. | + | * '''Scatter plot''' is a graph in which the values of two '''variables''' are plotted along two axes. |
* The pattern of the resulting points reveals the correlation. | * The pattern of the resulting points reveals the correlation. | ||
Line 270: | Line 274: | ||
|- | |- | ||
|| Highlight '''barPlots.R '''in the '''Source''' window | || Highlight '''barPlots.R '''in the '''Source''' window | ||
− | || In the '''Source''' window, click on the '''script | + | || In the '''Source''' window, click on the '''script barPlots.R''' |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Line 289: | Line 293: | ||
'''col = "blue") ''' | '''col = "blue") ''' | ||
− | || In the '''Source''' window, type the following command. | + | || In the '''Source''' window, type the following '''command'''. |
|- | |- | ||
− | || Highlight '''plot''' | + | || Highlight '''plot function''' in the '''Source''' window |
− | || Here, we have kept '''imdb | + | || Here, we have kept '''imdb underscore rating''' on the X-axis and '''audience underscore score '''on the Y-axis. |
|- | |- | ||
|| Highlight '''xlim''' in the '''Source''' window | || Highlight '''xlim''' in the '''Source''' window | ||
− | || As '''imdb | + | || |
+ | *As '''imdb underscore rating''' of any movie varies between 0 and 10, | ||
+ | *we have set the range of values on X-axis from 0 to 10. | ||
|- | |- | ||
|| Highlight '''ylim''' in the '''Source''' window | || Highlight '''ylim''' in the '''Source''' window | ||
Line 301: | Line 307: | ||
|- | |- | ||
|| Highlight '''Run''' button in the '''Source''' window | || Highlight '''Run''' button in the '''Source''' window | ||
− | || Save the script and run the current line. | + | || Save the '''script''' and run the current line. |
|- | |- | ||
|| Highlight '''Files''' and '''Plots''' window | || Highlight '''Files''' and '''Plots''' window | ||
Line 307: | Line 313: | ||
|- | |- | ||
|| Highlight the plot in the '''Plots''' window | || Highlight the plot in the '''Plots''' window | ||
− | || We can observe that the movies having higher '''imdb | + | || We can observe that the movies having higher '''imdb underscore rating '''has a high '''audience underscore score'''. |
|- | |- | ||
|| | || | ||
Line 313: | Line 319: | ||
|- | |- | ||
|| | || | ||
− | || Now we will learn how to calculate the correlation coefficient between '''imdb | + | || Now we will learn how to calculate the correlation coefficient between '''imdb underscore rating '''and '''audience underscore score'''. |
− | For this, we use '''cor''' | + | For this, we use '''cor function'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''cor(movies$imdb_rating, movies$audience_score)''' | '''cor(movies$imdb_rating, movies$audience_score)''' | ||
− | || In the '''Source''' window, type the following command. | + | || In the '''Source''' window, type the following '''command'''. |
|- | |- | ||
|| Highlight '''Run''' button in the '''Source''' window | || Highlight '''Run''' button in the '''Source''' window | ||
Line 326: | Line 332: | ||
|- | |- | ||
|| Highlight the output in the '''Console''' window | || Highlight the output in the '''Console''' window | ||
− | || The correlation coefficient between '''imdb | + | || The correlation coefficient between '''imdb underscore rating '''and '''audience underscore score''' is evaluated as 0.865. |
|- | |- | ||
|| Highlight the output in the '''Console''' window | || Highlight the output in the '''Console''' window | ||
|| The value of correlation coefficient is always between -1 and +1. | || The value of correlation coefficient is always between -1 and +1. | ||
− | A positive value indicates that the variables are positively related. | + | A positive value indicates that the '''variables''' are positively related. |
|- | |- | ||
|| | || | ||
Line 349: | Line 355: | ||
Assignment | Assignment | ||
|| We now suggest an assignment. | || We now suggest an assignment. | ||
− | * Read the file '''moviesData.csv'''. Create a bar chart of '''critics | + | * Read the file '''moviesData.csv'''. Create a bar chart of '''critics underscore score''' for the first 10 movies. |
− | * Create a '''scatter plot''' of''' imdb | + | * Create a '''scatter plot''' of''' imdb underscore rating''' and '''imdb underscore num underscore votes''' to see their relation. |
* Save both the plots. | * Save both the plots. | ||
Revision as of 08:18, 31 May 2019
Title of the script: Plotting Bar Charts and Scatter Plots
Author: Tushar Bajaj (TISS Mumbai) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, graphs, bar chart, labels, scatter plot, correlation, video tutorial, spoken tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to this tutorial on plotting bar charts and scatter plots. |
Show slide
Learning Objectives |
In this tutorial, we will learn how to:
|
Show slide
Pre-requisites |
To understand this tutorial, you should know,
If not, please locate the relevant R tutorials on this website. |
Show slide
System Specifications |
This tutorial is recorded on
* Ubuntu Linux OS version 16.04
* R version 3.4.4
* RStudio version 1.1.463
Install R version 3.2.0 or higher. |
Show slide
Download Files |
For this tutorial, we will use
* A data frame moviesData.csv
* A script file barPlots.R.
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight moviesData.csv and barPlots.R in the folder Plots |
I have downloaded and moved these files to Plots folder.
This folder is located in myProject folder on my Desktop. I have also set Plots folder as my Working Directory. |
Let us switch to RStudio. | |
Highlight barPlots.R in the Files window of RStudio | Open the script barPlots.R in RStudio. |
Highlight the Source button | Run this script by clicking on Source button. |
Highlight movies in the Source window | movies data frame opens in the Source window. |
Highlight dim(movies) in the Console window | It has 600 observations of 31 variables. |
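As a rough sketch of the steps so far, the snippet below reads a CSV file into a data frame and checks its dimensions. The three-row file and its values are invented for illustration (the real moviesData.csv has 600 observations of 31 variables), and the use of read.csv to create movies is an assumption about barPlots.R rather than something the script text states.

```r
# Hypothetical stand-in for moviesData.csv: all values are invented.
csvPath <- tempfile(fileext = ".csv")
writeLines(c("title,imdb_rating,audience_score",
             "Movie A,6.1,55",
             "Movie B,7.8,81",
             "Movie C,5.4,40"),
           csvPath)

# Read the file into a data frame (assumed to be how the script creates movies).
movies <- read.csv(csvPath, stringsAsFactors = FALSE)

# dim() returns the number of rows (observations) and columns (variables).
print(dim(movies))

# str() lists every column with its type, an alternative to scrolling in the viewer.
str(movies)
```

With the real file, the same dim(movies) call is what reports 600 and 31 in the Console.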
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right.
This will enable us to see the remaining objects of movies data frame. |
Highlight imdb_rating in the Source window | Now, we will learn how to draw a bar chart of the object named imdb underscore rating in movies. |
Show slide
Bar Chart |
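* A bar chart represents data in rectangular bars, with the length of each bar proportional to the value of the variable.
* R uses the function barplot to create bar charts. |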
Let us switch to RStudio. | |
Highlight movies in the Source window. | For the sake of simplicity, we are considering only the first 20 observations of movies to draw a bar chart. |
Highlight barPlots.R in the Source window | Click on the script barPlots.R |
[RStudio]
moviesSub <- movies[1:20,] |
In the Source window, type the following command.
Save the script and run the current line by pressing Ctrl + Enter keys simultaneously. |
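The row-subsetting idiom used in this command can be tried on a small example first. This is a minimal sketch with a made-up 30-row data frame; the names toy and toySub are hypothetical and only stand in for movies and moviesSub.

```r
# Toy data frame with 30 rows; titles and ratings are invented.
toy <- data.frame(title = paste("Movie", 1:30),
                  imdb_rating = seq(4, by = 0.2, length.out = 30),
                  stringsAsFactors = FALSE)

# Rows 1 to 20, all columns: the empty slot after the comma means "every column".
toySub <- toy[1:20, ]

# The subset keeps the column count and has exactly 20 rows.
print(dim(toySub))
```

head(toy, 20) selects the same first 20 rows and can be easier to read when only the leading rows are needed.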
Let me resize the Source window. | |
Highlight moviesSub in the Environment window. | moviesSub with 20 observations is loaded in the Environment.
Now, we draw a bar chart of imdb_rating for these movies. |
[RStudio]
barplot(moviesSub$imdb_rating, ylab="IMDB Rating", xlab = "Movies", col="blue", ylim=c(0,10), main="Movies' IMDB Rating") |
In the Source window, type the following command. |
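Highlight barplot in the Source window | Here, we have used the following arguments:
* moviesSub dollar sign imdb underscore rating is the data for plotting
* ylab and xlab for adding labels to the respective axes
* col to set the color of the bars |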
Highlight Run button in the Source window. | Run the current line. |
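For readers without the data file at hand, the same call can be tried on a short numeric vector. The ratings below are hypothetical values, not taken from moviesData.csv; the arguments mirror the ones in the command above.

```r
# Hypothetical ratings for five movies (invented for illustration).
ratings <- c(6.1, 7.8, 5.4, 8.2, 6.9)

# barplot() draws one bar per value and returns the x-positions of the bar midpoints.
mids <- barplot(ratings,
                ylab = "IMDB Rating",
                xlab = "Movies",
                col = "blue",
                ylim = c(0, 10),
                main = "Movies' IMDB Rating")

# One midpoint per bar.
print(length(mids))
```

Fixing ylim to c(0, 10) keeps the Y-axis on the full rating scale instead of letting R fit it to the largest value in the data.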
Highlight the plot in the Plots window | The bar chart is displayed with Movies on X-axis and their imdb_rating on Y-axis. |
Highlight Files and Plots window | In the Plots window, click on Zoom to maximize the plot. |
Highlight the first bar in the plot | This particular movie has an IMDB rating of approximately 6. |
Highlight the third bar in the plot | Similarly, this particular movie has an IMDB rating of approximately 8.
However, we do not know the names of the movies. |
So, we will add more arguments in barplot function to show the names of movies on X-axis. | |
Close this plot. | |
[RStudio]
barplot(moviesSub$imdb_rating, ylab="IMDB Rating", col="blue", ylim=c(0,10), main="Movies' IMDB Rating", names.arg=moviesSub$title) |
In the Source window, type the following command. |
Highlight names.arg in the Source window | Here, we have used the argument names.arg and set it to title.
Remember, title column in moviesSub contains the names of movies. |
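A small sketch of names.arg in isolation, assuming a made-up three-row data frame in place of moviesSub. The vector passed to names.arg must have one entry per bar.

```r
# Hypothetical stand-in for moviesSub; titles and ratings are invented.
toySub <- data.frame(title = c("Movie A", "Movie B", "Movie C"),
                     imdb_rating = c(6.1, 7.8, 5.4),
                     stringsAsFactors = FALSE)

# names.arg supplies the label drawn under each bar, in the same order as the heights.
mids <- barplot(toySub$imdb_rating,
                names.arg = toySub$title,
                ylab = "IMDB Rating",
                col = "blue",
                ylim = c(0, 10),
                main = "Movies' IMDB Rating")
```

With short labels like these, every name fits. With longer titles, R may leave out labels that would overlap their neighbours, which is likely why some names go missing in the plot that follows.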
Highlight Run button in the Source window. | Run the current line. |
Highlight Files and Plots window | In the Plots window, click on Zoom to maximize the plot. |
Highlight X-axis of the plot | Now, the names of movies are displayed on the X-axis.
But not for all movies, because some names are too long to fit. So, we will make these names perpendicular to the X-axis. |
Close this plot. | |
[RStudio]
barplot(moviesSub$imdb_rating, ylab="IMDB Rating", col="blue", ylim=c(0,10), main="Movies' IMDB Rating", names.arg=moviesSub$title, las = 2) |
In the Source window, type the following command. |
Highlight las in the Source window |
Here, we have used las argument.
las equal to 2 produces labels which are at right angles to the axis. |
Highlight Run button in the Source window | Run the current line. |
Highlight Files and Plots window | In the Plots window, click on Zoom to maximize the plot. |
Highlight the plot in the Plots window | Now the names for all the movies are displayed on X-axis.
For example, Filly Brown has an IMDB rating of approximately 6. |
Highlight the plot in the Plots window | However, longer names are being truncated.
We can add more arguments to barplot function for adjusting labels. For more information, please refer to the Additional Material section on this website. |
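One possible way to reduce the truncation, sketched here under our own assumptions rather than taken from the Additional Material: shrink the label text with cex.names and enlarge the bottom margin with par(mar = ...). The titles, ratings and margin sizes below are made up for illustration.

```r
# Hypothetical long titles and ratings (invented for illustration).
longTitles <- c("A Very Long Hypothetical Movie Title",
                "Short One",
                "Another Fairly Long Movie Title")
ratings <- c(6.1, 7.8, 5.4)

# mar is c(bottom, left, top, right) in lines of text; a larger bottom margin
# leaves room for rotated labels. par() returns the previous settings.
oldPar <- par(mar = c(10, 4, 4, 2) + 0.1)

mids <- barplot(ratings,
                names.arg = longTitles,
                las = 2,          # labels at right angles to the axis
                cex.names = 0.8,  # draw the bar labels at 80% of the default size
                ylab = "IMDB Rating",
                col = "blue",
                ylim = c(0, 10),
                main = "Movies' IMDB Rating")

# Restore the original margins so later plots are unaffected.
par(oldPar)
```

The margin value usually needs some trial and error: too small and labels are still cut off, too large and R may report that the figure margins do not fit in a small Plots window.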
Close this plot. | |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight imdb_rating and audience_score in the Source window | Let us analyze the relation between imdb underscore rating and audience underscore score.
For this, we will draw a scatter plot with these two objects by using plot function. Remember, we have already learnt how to plot a single object. |
Show Slide
Scatter Plot |
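* Scatter plot is a graph in which the values of two variables are plotted along two axes.
* The pattern of the resulting points reveals the correlation. |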
Let us switch to RStudio. | |
Highlight barPlots.R in the Source window | In the Source window, click on the script barPlots.R |
[RStudio]
plot(x = movies$imdb_rating, y = movies$audience_score, main = "IMDB Rating vs Audience Score", xlab = "IMDB Rating", ylab = "Audience Score", xlim = c(0,10), ylim = c(0,100), col = "blue") |
In the Source window, type the following command. |
Highlight plot function in the Source window | Here, we have kept imdb underscore rating on the X-axis and audience underscore score on the Y-axis. |
Highlight xlim in the Source window |
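As imdb underscore rating of any movie varies between 0 and 10, we have set the range of values on X-axis from 0 to 10. |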
Highlight ylim in the Source window | Similarly, we have set the range of values on Y-axis from 0 to 100. |
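The effect of xlim and ylim can be checked on a handful of points. This is a minimal sketch with invented values; imdb and audience are hypothetical vectors standing in for the two columns of movies.

```r
# Hypothetical ratings and scores for five movies (invented for illustration).
imdb <- c(5.4, 6.1, 6.9, 7.8, 8.2)
audience <- c(40, 55, 66, 81, 88)

# x goes on the horizontal axis and y on the vertical axis.
plot(x = imdb,
     y = audience,
     main = "IMDB Rating vs Audience Score",
     xlab = "IMDB Rating",
     ylab = "Audience Score",
     xlim = c(0, 10),
     ylim = c(0, 100),
     col = "blue")

# par("usr") reports the extents actually drawn: c(x1, x2, y1, y2).
usr <- par("usr")
print(usr)
```

The reported extents are slightly wider than the requested limits because R pads each axis by a small margin by default.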
Highlight Run button in the Source window | Save the script and run the current line. |
Highlight Files and Plots window | In the Plots window, click on Zoom to maximize the plot. |
Highlight the plot in the Plots window | We can observe that movies with a higher imdb underscore rating tend to have a higher audience underscore score. |
Close this plot. | |
Now we will learn how to calculate the correlation coefficient between imdb underscore rating and audience underscore score.
For this, we use cor function. | |
[RStudio]
cor(movies$imdb_rating, movies$audience_score) |
In the Source window, type the following command. |
Highlight Run button in the Source window | Save the script and run the current line. |
Highlight the output in the Console window | The correlation coefficient between imdb underscore rating and audience underscore score is evaluated as 0.865. |
Highlight the output in the Console window | The value of the correlation coefficient always lies between -1 and +1.
A positive value indicates that the variables are positively related. |
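To see what the sign and size of the coefficient mean, cor can be tried on small vectors where the relation is known in advance. The values below are invented; by default cor computes the Pearson correlation coefficient.

```r
# Hypothetical ratings (invented for illustration).
x <- c(5.4, 6.1, 6.9, 7.8, 8.2)

# An exactly increasing linear relation, its mirror image, and a noisier upward trend.
yUp <- 10 * x + 3
yDown <- -yUp
yNoisy <- c(40, 55, 66, 81, 88)

rUp <- cor(x, yUp)        # exact increasing line: coefficient of +1
rDown <- cor(x, yDown)    # exact decreasing line: coefficient of -1
rNoisy <- cor(x, yNoisy)  # upward trend with scatter: positive, at most +1

print(c(rUp, rDown, rNoisy))
```

If either vector contains missing values, cor returns NA with its default settings; passing use = "complete.obs" restricts the calculation to the complete pairs.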
Let us summarize what we have learnt. | |
Show slide
Summary |
In this tutorial, we have learnt how to:
|
Show slide
Assignment |
We now suggest an assignment.
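* Read the file moviesData.csv. Create a bar chart of critics underscore score for the first 10 movies.
* Create a scatter plot of imdb underscore rating and imdb underscore num underscore votes to see their relation.
* Save both the plots. |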
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Tushar Bajaj (TISS Mumbai).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |