Latest revision as of 23:27, 1 June 2019
Title of the script: Plotting Bar Charts and Scatter Plots
Author: Tushar Bajaj (TISS Mumbai) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, graphs, bar chart, labels, scatter plot, correlation, video tutorial, spoken tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to this tutorial on Plotting Bar Charts and Scatter Plots. |
Show slide
Learning Objectives |
In this tutorial, we will learn how to:
* Plot bar charts
* Plot scatter plot
* Find the correlation coefficient between two objects. |
Show slide
Pre-requisites |
To understand this tutorial, you should know,
If not, please locate the relevant R tutorials on this website. |
Show slide
System Specifications |
This tutorial is recorded on
* Ubuntu Linux OS version 16.04
Install R version 3.2.0 or higher. |
Show slide
Download Files |
For this tutorial, we will use
* A data frame moviesData.csv
* A script file barPlots.R
Please download these files from the Code files link of this tutorial. |
Point to the files in the Plots folder.
Highlight moviesData.csv and barPlots.R in the folder Plots |
I have downloaded and moved these files to Plots folder.
This folder is located in myProject folder on my Desktop. I have also set Plots folder as my Working Directory. |
Cursor on the interface. | Let us switch to RStudio. |
Highlight barPlots.R in the Files window of RStudio | Open the script barPlots.R in RStudio. |
Highlight the Source button | Run this script by clicking on Source button. |
Highlight movies in the Source window | The movies data frame opens in the Source window. |
Highlight dim(movies) in the Console window | It has 600 observations of 31 variables. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right.
This will enable us to see the remaining objects of movies data frame. |
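The contents of barPlots.R are not reproduced in this script, so the following is only a sketch of how a CSV file might be read and inspected. The temporary file, the name demo and the three columns are made up for illustration and are not the real moviesData.csv.

```r
# Hypothetical stand-in for moviesData.csv, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("title,imdb_rating,audience_score",
             "Movie A,6.1,55",
             "Movie B,7.8,81",
             "Movie C,5.4,40"), tmp)

# Read the file into a data frame
demo <- read.csv(tmp)

# dim() returns the number of rows (observations) and columns (variables)
dim(demo)

# names() lists the column names without scrolling through the data viewer
names(demo)
```

For a wide data frame such as movies, names() can be a quicker way to find a column than scrolling in the Source window.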
Highlight imdb_rating in the Source window | Now, we will learn how to draw a bar chart of the object named imdb underscore rating in movies. |
Show slide
Bar Chart |
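* A bar chart represents data in rectangular bars, with the length of each bar proportional to the value of the variable. |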
Let us switch to RStudio. | |
Highlight movies in the Source window. | For the sake of simplicity, we are considering only the first 20 observations of movies to draw a bar chart. |
Highlight barPlots.R in the Source window | Click on the script barPlots.R |
Cursor on the interface.
[RStudio] moviesSub <- movies[1:20,] |
In the Source window, type the following command.
Save the script and run the current line by pressing Ctrl + Enter keys simultaneously. |
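As a small illustration of the row subsetting used in this command, here is a sketch on a made-up data frame. The names demo and demoSub and all the values are hypothetical.

```r
# Hypothetical data frame with 5 rows (not the real movies data)
demo <- data.frame(title = c("A", "B", "C", "D", "E"),
                   imdb_rating = c(6.1, 7.8, 5.4, 8.2, 6.9))

# Rows 1 to 3, all columns: the empty slot after the comma means "every column"
demoSub <- demo[1:3, ]

# Number of observations kept
nrow(demoSub)

# head() is an alternative way to take the first rows
head(demo, 3)
```

The same pattern with 1:20 keeps the first 20 observations, as done for moviesSub.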
Drag the boundary to resize the window. | Let me resize the Source window. |
Highlight moviesSub in the Environment window. | moviesSub with 20 observations is loaded in the Environment.
Now, we draw a bar chart of imdb_rating for these movies. |
Cursor in the Source window.
[RStudio] barplot(moviesSub$imdb_rating, ylab="IMDB Rating", xlab = "Movies", col="blue", ylim=c(0,10), main="Movies' IMDB Rating") |
In the Source window, type the following command. |
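Highlight barplot in the Source window | Here, we have used the following arguments:
* moviesSub dollar sign imdb underscore rating is the data for plotting
* ylab and xlab for adding labels to the respective axes
* col to set the color of the bars
* ylim to set the range of values on the Y-axis
* main for adding a title to the bar chart. |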
Highlight Run button in the Source window. | Run the current line. |
Highlight the plot in the Plots window | The bar chart is displayed with Movies on X-axis and their imdb_rating on Y-axis. |
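For readers without the data file at hand, the same barplot call can be tried on a few made-up numbers. This is a sketch only: the ratings vector is hypothetical, and the text() line is an optional addition that is not part of the tutorial's script.

```r
# Hypothetical ratings for five movies (not taken from moviesData.csv)
ratings <- c(6.1, 7.8, 5.4, 8.2, 6.9)

# barplot() draws one bar per value and returns the bar midpoints on the X-axis
mids <- barplot(ratings,
                ylab = "IMDB Rating",
                xlab = "Movies",
                col = "blue",
                ylim = c(0, 10),
                main = "Movies' IMDB Rating")

# Optional: write each value above its bar, using the returned midpoints
text(x = mids, y = ratings, labels = ratings, pos = 3)
```

Printing the values on the bars avoids having to read approximate heights off the Y-axis.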
Highlight Files and Plots window | In the Plots window, click on Zoom to maximize the plot. |
Highlight the first bar in the plot | This particular movie has an IMDB rating of approximately 6. |
Highlight the third bar in the plot | Similarly, this particular movie has an IMDB rating of approximately 8.
However, we do not know the name of the movies. |
Cursor on the plot window. | So, we will add more arguments in barplot function to show the names of movies on X-axis. |
Click on the Close button. | Close this plot. |
[RStudio]
barplot(moviesSub$imdb_rating, ylab="IMDB Rating", col="blue", ylim=c(0,10), main="Movies' IMDB Rating", names.arg=moviesSub$title) |
In the Source window, type the following command. |
Highlight names.arg in the Source window | Here, we have used the argument names.arg and set it to title.
Remember, title column in moviesSub contains the names of movies. |
Highlight Run button in the Source window. | Run the current line. |
Highlight Files and Plots window | In the Plots window, click on Zoom to maximize the plot. |
Highlight X-axis of the plot | Now, the names of movies are displayed on the X-axis.
But not for all movies. This is because the names are too long to fit side by side. That’s why we will make these names perpendicular to the X-axis. |
Click on the Close button. | Close this plot. |
[RStudio]
barplot(moviesSub$imdb_rating, ylab="IMDB Rating", col="blue", ylim=c(0,10), main="Movies' IMDB Rating", names.arg=moviesSub$title, las = 2) |
In the Source window, type the following command. |
Highlight las in the Source window | Here, we have used las argument.
las equal to 2 produces labels which are at right angles to the axis. |
Highlight Run button in the Source window | Run the current line. |
Highlight Files and Plots window. | In the Plots window, click on Zoom to maximize the plot. |
Highlight the plot in the Plots window | Now the names for all the movies are displayed on X-axis.
For example, Filly Brown has an IMDB rating of approximately 6. |
Highlight the plot in the Plots window | However, longer names are being truncated.
We can add more arguments to barplot function for adjusting labels. For more information, please refer to the Additional Material section on this website. |
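One possible way to make room for long labels, shown here as a sketch and not necessarily the approach described in the Additional Material, is to enlarge the bottom margin with par and shrink the label text with cex.names. The titles, ratings and margin sizes below are made up.

```r
# Hypothetical data with deliberately long titles
ratings <- c(6.1, 7.8, 5.4)
titles <- c("A Fairly Long Movie Title", "Short One", "Another Rather Long Title")

# Enlarge the bottom margin (order: bottom, left, top, right), saving the old settings
oldPar <- par(mar = c(12, 4, 4, 2))

barplot(ratings,
        names.arg = titles,
        las = 2,          # labels at right angles to the axis
        cex.names = 0.8,  # shrink the bar labels slightly
        col = "blue",
        ylim = c(0, 10),
        ylab = "IMDB Rating",
        main = "Movies' IMDB Rating")

# Restore the previous margins
par(oldPar)
```

The margin value of 12 lines is a guess that suits these labels; longer names may need a larger value.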
Click on the Close button. | Close this plot. |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight imdb_rating and audience_score in the Source window | Let us analyze the relation between imdb underscore rating and audience underscore score.
For this, we will draw a scatter plot with these two objects by using plot function. Remember, we have already learnt how to plot a single object. |
Show Slide
Scatter Plot |
* A scatter plot is a graph in which the values of two variables are plotted along two axes. |
Let us switch to RStudio. | |
Highlight barPlots.R in the Source window | In the Source window, click on the script barPlots.R |
[RStudio]
plot(x = movies$imdb_rating, y = movies$audience_score, main = "IMDB Rating vs Audience Score", xlab = "IMDB Rating", ylab = "Audience Score", xlim = c(0,10), ylim = c(0,100), col = "blue") |
In the Source window, type the following command. |
Highlight plot function in the Source window | Here, we have kept imdb underscore rating on the X-axis and audience underscore score on the Y-axis. |
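Highlight xlim in the Source window | As imdb underscore rating of any movie varies between 0 and 10, we have set the range of values on X-axis from 0 to 10. |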
Highlight ylim in the Source window | Similarly, we have set the range of values on Y-axis from 0 to 100. |
Highlight Run button in the Source window | Save the script and run the current line. |
Highlight Files and Plots window | In the Plots window, click on Zoom to maximize the plot. |
Highlight the plot in the Plots window | We can observe that movies with a higher imdb underscore rating tend to have a higher audience underscore score. |
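The scatter plot can also be tried on a handful of made-up points. In this sketch the vectors rating and score are hypothetical, and the fitted line drawn with abline is an optional extra that the tutorial itself does not use.

```r
# Hypothetical values, not taken from moviesData.csv
rating <- c(4.5, 5.8, 6.3, 7.1, 8.0, 8.6)
score  <- c(30, 48, 55, 70, 82, 90)

# Same arguments as in the tutorial's plot command
plot(x = rating,
     y = score,
     main = "IMDB Rating vs Audience Score",
     xlab = "IMDB Rating",
     ylab = "Audience Score",
     xlim = c(0, 10),
     ylim = c(0, 100),
     col = "blue")

# Optional: overlay a least-squares line to make the trend easier to see
abline(lm(score ~ rating), col = "red")
```

An upward-sloping line corresponds to the pattern described above, where higher ratings go with higher scores.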
Click on the Close button. | Close this plot. |
Cursor on the interface. | Now we will learn how to calculate the correlation coefficient between imdb underscore rating and audience underscore score.
For this, we use cor function. |
[RStudio]
cor(movies$imdb_rating, movies$audience_score) |
In the Source window, type the following command. |
Highlight Run button in the Source window | Save the script and run the current line. |
Highlight the output in the Console window | The correlation coefficient between imdb underscore rating and audience underscore score is evaluated as 0.865. |
Highlight the output in the Console window | The value of correlation coefficient is always between -1 and +1.
A positive value indicates that the variables are positively related. |
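To see what cor computes, the following sketch compares it with the Pearson formula written out by hand. The vectors x, y and yMissing are made up, and the last two lines show how missing values affect the result, an aspect the tutorial does not cover.

```r
# Hypothetical values, not taken from moviesData.csv
x <- c(4.5, 5.8, 6.3, 7.1, 8.0, 8.6)
y <- c(30, 48, 55, 70, 82, 90)

# cor() uses the Pearson method by default
r <- cor(x, y)

# The same quantity from its definition:
# sum of cross-deviations divided by the root of the product of squared deviations
rManual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Check that both agree up to floating-point tolerance
all.equal(r, rManual)

# With a missing value, cor() returns NA unless told to drop incomplete pairs
yMissing <- y
yMissing[2] <- NA
cor(x, yMissing)
cor(x, yMissing, use = "complete.obs")
```

If cor returns NA on real data, checking for missing values in either column is a reasonable first step.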
Let us summarize what we have learnt. | |
Show slide
Summary |
In this tutorial, we have learnt how to:
* Plot bar charts
* Plot scatter plot
* Find the correlation coefficient between two objects. |
Show slide
Assignment |
We now suggest an assignment.
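* Read the file moviesData.csv. Create a bar chart of critics underscore score for the first 10 movies. |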
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India. |
Show Slide
Thank You |
The script for this tutorial was contributed by Tushar Bajaj (TISS Mumbai).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |