R/C2/Introduction-to-ggplot2/English

From Script | Spoken-Tutorial

Latest revision as of 15:03, 25 July 2019

Title of the script: Introduction to ggplot2

Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, graphics, plot, ggplot2, ggplot, video tutorial, spoken tutorial 


Visual Cue Narration
Show slide

Opening Slide

Welcome to this tutorial on Introduction to ggplot2.
Show slide

Learning Objective

In this tutorial, we will learn about
  • Need for data visualization
  • Basic plot function in R
  • ggplot2 package
Show slide

Pre-requisites

https://spoken-tutorial.org/

To understand this tutorial, you should know:
  • Basics of Statistics and
  • Data frames

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use:
  • A data frame moviesData.csv and
  • A script file ggPlots.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight moviesData.csv and ggPlots.R in the ggPlots folder

I have downloaded and moved these files to the ggPlots folder.

This folder is located in the myProject folder on my Desktop.

I have also set the ggPlots folder as my working directory.
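The working directory can also be checked and set from code. Below is a minimal sketch. The folder path is an assumption modelled on the narration, so replace it with wherever your ggPlots folder actually lives.

```r
# Assumed location of the ggPlots folder (illustrative only).
project_dir <- file.path(path.expand("~"), "Desktop", "myProject", "ggPlots")

# Change directory only if the folder exists, so the snippet is safe to run.
if (dir.exists(project_dir)) {
  setwd(project_dir)
}

# Confirm the current working directory.
print(getwd())

# moviesData.csv and ggPlots.R should be listed here once the folder is set.
print(list.files())
```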

Now let us learn about visualization.
Show slide

Need for Data Visualization

  • Visualization is an important tool for insight generation.
  • It is used to understand the data structure, identify outliers and find patterns.
Show slide

Data visualization in R

There are two methods of data visualization in R:
  • Basic graphics and
  • Grammar of graphics (popularly known as ggplot2)
Let us switch to RStudio.
Highlight ggPlots.R in the Files window of RStudio.

Open the script ggPlots.R in RStudio.
Highlight the Source button.

Click on the Source button.

Let us run this script by clicking on the Source button.
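Clicking the Source button is roughly equivalent to calling source() on the script file. Every line runs, and the objects the script creates appear in the workspace. The sketch below uses a tiny throwaway script written to a temporary file. Its contents are hypothetical and are not the real ggPlots.R.

```r
# Write a small, hypothetical script to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "demo_df <- data.frame(a = 1:3, b = c(2, 4, 6))",
  "demo_total <- sum(demo_df$b)"
), script_path)

# source() runs every line of the file, much like the Source button.
# The objects it creates show up in the Environment window.
source(script_path)

print(demo_total)  # 12
```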
Highlight movies in the Environment window

The movies data frame is loaded in the workspace.

This data frame will be used later in this tutorial.

[RStudio]

x <- seq(-pi, pi, 0.1)

y <- sin(x)

plot(x, y)

First, we will plot a sine curve by taking equally spaced samples.

In the Source window, type the following commands.

Highlight seq in the Source window

Here, we have used the seq function to generate a sequence.

This sequence is from minus pi to plus pi with an interval of zero point one.
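To see exactly what seq produces, you can inspect the sequence directly. With a fixed step of 0.1, the last value stops just short of pi. If you need both end points exactly, the length.out argument fixes the number of points instead of the step.

```r
# seq(from, to, by): start at -pi and step by 0.1 without passing pi
x <- seq(-pi, pi, 0.1)

# The same call with named arguments
x_named <- seq(from = -pi, to = pi, by = 0.1)

length(x)   # 63
head(x, 3)  # -3.141593 -3.041593 -2.941593
tail(x, 1)  # 3.058407, so pi itself is not reached

# Alternative: fix the number of points; the last point is then pi
x_exact <- seq(-pi, pi, length.out = 63)
```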

Highlight plot in the Source window

In the plot command, the first argument is x and the second argument, y, is sine of x.

Highlight Run button in the Source window

Save the script, select the last three lines of code and run them by pressing Ctrl + Enter keys simultaneously.

Highlight the plot in the Plots window

A plot of the sine curve appears in the Plots window.

Highlight Plots window

In the Plots window, click on the Zoom button to maximize the plot.

Highlight the plot

Now we will add a title and a Y-axis label to this plot.

Click on X button to close.

Click on the Close (X) button to close this plot.
[RStudio]

plot(x, y, main="Plotting a Sine Curve", ylab="sin(x)")

In the Source window, type the following command.

Here, we have added main and ylab arguments to the plot function.
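The sketch below shows main and ylab together with xlab, which labels the X-axis in the same way. The xlab text is an illustrative choice that is not in the tutorial. The plot is drawn into a temporary PDF file so the example also runs outside RStudio.

```r
x <- seq(-pi, pi, 0.1)
y <- sin(x)

# Send the plot to a temporary PDF file instead of the Plots window.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# main sets the title; xlab and ylab set the axis labels.
plot(x, y,
     main = "Plotting a Sine Curve",
     xlab = "x (radians)",  # illustrative label, not from the tutorial
     ylab = "sin(x)")

invisible(dev.off())  # close the device so the file is written
```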

Highlight Run button in the Source window.

Run the current line.

Highlight Plots window

In the Plots window, click on the Zoom button to maximize the plot.

Highlight Y-axis of the plot

The title of the plot and the label of the Y-axis have been added to the plot.

Click on X button to close.

Close this plot.
[RStudio]

plot(x, y, main="Plotting sine curve", ylab="sin(x)", type="l", col="blue")

Now we will learn how to change the type of plot.

In the Source window, type the following command.

Highlight type in the plot function

Here, we have used the type argument and set it to l.

It means that the type of plot we need is lines.

Highlight col in the plot function

col equal to blue changes the colour of the plot to blue.

Highlight Run button in the Source window

Run the current line.

Highlight the plot

The type and color of the plot have been changed.
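To compare plot types, you can draw the same data twice, side by side. The sketch below uses type "p", the default, which draws points, and type "l", which draws lines. Other documented values include "b" (both points and lines) and "h" (vertical lines). The output goes to a temporary PDF file.

```r
x <- seq(-pi, pi, 0.1)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 4)

# Split the page into one row with two panels.
par(mfrow = c(1, 2))

# type = "p" (the default) draws points.
plot(x, sin(x), type = "p", col = "blue", main = 'type = "p"')

# type = "l" joins the same values with lines.
plot(x, sin(x), type = "l", col = "blue", main = 'type = "l"')

invisible(dev.off())  # closing the device also discards its par settings
```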
Cursor on the interface.

Now, we will plot one more graph on the same plot.

Let us plot cosine of x along with sine of x on the same plot.

[RStudio]

plot(x, sin(x), main="Plotting Sine and Cosine graphs on the same plot",
     ylab=" ", type="l", col="blue")

lines(x, cos(x), col="red")

In the Source window, type the following commands.
Highlight plot in the Source window

This command plots sine of x using the plot function.

Highlight lines in the Source window

Next, we use the lines function to plot cosine of x.

Highlight lines in the Source window

After the first curve is plotted, the lines function adds to the existing plot instead of starting a new one.

It takes an additional vector, cos of x, as an input to draw the second line in the plot.
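The axes come from the first plot call, so lines() only draws inside the range already set up. Sine and cosine share the range from -1 to 1, so this is not a problem here. The sketch below uses a second curve with a larger range, 2 * cos(x), chosen only for illustration. It shows how ylim keeps that curve from being cut off.

```r
x <- seq(-pi, pi, 0.1)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# The first call sets up the axes. ylim is widened so the second curve fits.
plot(x, sin(x), type = "l", col = "blue", ylim = c(-2, 2),
     main = "Adding a second curve with lines()", ylab = "")

# lines() draws on top of the existing plot.
lines(x, 2 * cos(x), col = "red")

invisible(dev.off())
```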

Highlight Run button in the Source window

Run the last two lines of code by pressing Ctrl+Enter keys simultaneously.

Highlight the plot

The two graphs appear in the same plot window.

We can now add a legend to the plot to distinguish between the two graphs.

For this, we will use the legend function.

[RStudio]

legend("topleft",
       c("sin(x)", "cos(x)"),
       fill=c("blue","red"))

In the Source window, type the following command.
Drag the boundary to resize.

I will resize the Source window.

Highlight topleft in the Source window

The first argument gives the position of the legend in our plot.

We have set the position to topleft.

Highlight c("sin(x)", "cos(x)") in the Source window

The second argument gives the labels for the legend entries.

Since we have plotted sine and cosine functions, we will pass these two names as a vector.

Highlight fill in the Source window

Next, we have used the fill argument to identify the graphs by their colors.

Recall that the sine function is plotted in blue and the cosine function in red.
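The fill argument draws a coloured box next to each label. For line plots, another option is to pass col together with lty, which draws a short line sample instead. The sketch below redraws the sine and cosine plot in a temporary PDF file and uses that variant. legend() also returns, invisibly, a list describing the box and text positions.

```r
x <- seq(-pi, pi, 0.1)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

plot(x, sin(x), type = "l", col = "blue", ylab = " ",
     main = "Plotting Sine and Cosine graphs on the same plot")
lines(x, cos(x), col = "red")

# col + lty draws line samples instead of the filled boxes from fill.
leg <- legend("topleft",
              legend = c("sin(x)", "cos(x)"),
              col = c("blue", "red"),
              lty = 1)

invisible(dev.off())
```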

Drag the boundary to resize. I will resize the Plots window.
Highlight Run button in the Source window Run the last three lines of code by pressing Ctrl+Enter keys simultaneously.
Highlight Files and Plots window In the Plots window, click on Zoom button to maximize the plot.
Highlight the plot The two plots with their names appear in the same graph.
Click on X button in the Plot window. Close the plot.
Cursor on the interface. So far, we have used the basic graphics functions of R.

Now, we will learn about the grammar of graphics using the ggplot2 package.

Show slide

Introduction to ggplot2 package

  • ggplot2 package was created by Hadley Wickham in 2005.
  • It offers a powerful graphics language for creating elegant and complex plots.
Let us switch to RStudio.
Drag the boundary to resize. I will resize the Plots window.
Cursor on the interface. To use a package in R, we need to install it once and then load it in each session.

As I have already installed the ggplot2 package, I will load it directly.

[RStudio]

install.packages("ggplot2")

If you have not installed the package, please use the install dot packages function.

Please make sure that you are connected to the Internet while installing packages.

Click at the top of the script ggPlots.R To load this package, we will add a call to the library function at the top of the script.
[RStudio]

library(ggplot2)

In the Source window, scroll up to the top of the script.

Now, at the top of the script, type library and ggplot2 in parentheses.

Save the script and run this line.
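The install and load steps can also be combined so that the same script works on a machine with or without ggplot2. This is a common idiom rather than part of the tutorial; the CRAN mirror URL passed to repos is an assumption needed only when R runs non-interactively and no mirror has been chosen.

```r
# Install ggplot2 only if it cannot be found (needs an Internet connection).
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package; this line is needed in every new R session.
library(ggplot2)
```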

[RStudio]

Point to the line having legend function

Now, in the Source window, click on the next line after the legend function.
Highlight movies in the Environment window We will use the movies data frame to explore the ggplot2 package.
[RStudio]

View(movies)

Let us view the objects available in the movies data frame.

In the Source window, type View and movies in parentheses.

Highlight Run button in the Source window Run the current line.
Highlight movies in the Source window The movies data frame opens in a new tab in the Source window.
Highlight movies in the Source window Now, we will create a simple scatter plot using two objects of movies.

Remember, a scatter plot displays the values of two variables as points, with one variable along each axis.

Highlight the scroll bar in the Source window In the Source window, scroll from left to right to see the remaining objects of the movies data frame.
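Instead of scrolling through the viewer, the objects of a data frame can also be listed with code. This sketch uses the built-in iris data frame as a stand-in so that it runs on its own; the same calls should work on movies once it is loaded as in the earlier tutorials.

```r
# Names of all objects (columns) of a data frame.
names(iris)

# Compact overview: number of rows, type of each column and its first values.
str(iris)

# The first six rows.
head(iris)
```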
Highlight critics_score and audience_score in the Source window Suppose we want to visualize the correlation between critics_score and audience_score.
Highlight ggPlots.R in the Source window In the Source window, click on the script ggPlots.R.
[RStudio]

ggplot(data = movies,

mapping = aes(x=critics_score, y = audience_score)) +

geom_point()

In the Source window, type the following command.
Highlight ggplot in the Source window A basic plot in ggplot2 is built from three components:
  • Data
  • Aesthetics
  • Geometry
Highlight data in the Source window

Highlight mapping in the Source window

Highlight aes in the Source window

In the ggplot function, we have used the following arguments:
  • data, which refers to the data set to be used for plotting. We have set data equal to movies.
  • mapping, which is used to apply the aesthetic mapping to the plot.
Inside mapping, the aes function specifies which objects are mapped to the X and Y axes. Here, critics_score is on the X axis and audience_score is on the Y axis.

We will learn more about aesthetic mappings later in this series.

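Highlight geom_point in the Source window geom underscore point supplies the geometry. It draws one point for each row of the data, at the X and Y positions given by the mapping.

The plus sign adds this layer to the plot created by the ggplot function.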
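The three components can be seen more clearly when the plot is built step by step and stored in an object. This sketch uses the built-in faithful data set as a stand-in for movies, so the column names waiting and eruptions are specific to this example.

```r
library(ggplot2)

# Data and aesthetics: waiting on the X axis, eruptions on the Y axis.
p <- ggplot(data = faithful, mapping = aes(x = waiting, y = eruptions))

# Geometry: geom_point() is added as a layer with +, one point per row.
p <- p + geom_point()

# Draw the plot; at the console, typing p alone also prints it.
print(p)
```

Storing the plot in an object makes it easy to add further layers later, or to pass it to other functions.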
Highlight Run button in the Source window Run the current line.
Highlight Plots window The scatter plot appears in the Plots window.
Highlight Plots window In the Plots window, click on the Zoom button to maximize the plot.
Highlight the plot We can see that there is a positive correlation between critics_score and audience_score.
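The visual impression of a positive correlation can be checked numerically with the cor function, which computes Pearson's correlation coefficient by default. The sketch runs on the built-in faithful data so that it is self-contained; the movies call is shown only as a comment, and its result is not reproduced here.

```r
# Pearson correlation between two numeric columns. Values near +1 indicate a
# strong positive linear relationship; values near 0, little linear relationship.
r <- cor(faithful$waiting, faithful$eruptions)
r

# With movies loaded, the corresponding call would be (use = "complete.obs"
# drops rows with missing values, if there are any):
# cor(movies$critics_score, movies$audience_score, use = "complete.obs")
```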

Now we will learn how to save a plot generated by ggplot function.

Click on x button to close the plot. Close this plot.
To save plots, the ggplot2 package provides a function named ggsave.
[RStudio]

?ggsave

To see the syntax of the ggsave function, we will open its Help page in RStudio.

In the Console window, type question mark ggsave and press Enter.

I will resize the Help window.
Highlight Help in RStudio


Highlight filename in Help

Highlight plot in Help

The first argument in this function is filename, the name of the file to be created. Its extension, such as .png, decides the file format.

Next, there is the argument named plot, which is the plot to be saved.

By default, it saves the last plot displayed.
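The argument list shown on the Help page can also be printed directly in the Console. This is a small sketch using base R's args and formals functions, which work on any R function.

```r
library(ggplot2)

# Print the full argument list of ggsave without opening the Help page.
args(ggsave)

# Only the argument names; filename comes first and plot second.
names(formals(ggsave))[1:2]
```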

Highlight Plots window Click on the Plots window.
Highlight plot in the Plots window Let us save our scatter plot with the name scatter underscore plot in png format.
[RStudio]

ggsave("scatter_plot.png")

In the Source window, type the following command.
Highlight Run button in the Source window Save the script and run the current line.
Highlight Files window Click on the Files tab.
Highlight scatter_plot.png in the Files window The plot has been saved in our current working directory.
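When width and height are not given, ggsave uses the size of the current graphics device, so the saved image depends on how large the Plots window is. This sketch passes the plot and its size explicitly. The built-in faithful data and the temporary output folder from tempdir() are stand-ins chosen so that the sketch runs on its own.

```r
library(ggplot2)

# Build the plot and keep it in an object, so it can be saved explicitly.
p <- ggplot(faithful, aes(x = waiting, y = eruptions)) + geom_point()

# The .png extension selects the file format; width and height are in inches.
# With a bare file name, ggsave() writes to the working directory shown by getwd();
# here a temporary directory is used instead.
out <- file.path(tempdir(), "scatter_plot.png")
ggsave(out, plot = p, width = 6, height = 4, units = "in", dpi = 150)

file.exists(out)
```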
Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt about,
  • Need for data visualization
  • Basic plot function in R
  • ggplot2 package
Show slide

Assignment

We now suggest an assignment.
  • Consider the built-in data set mtcars. Find the numerical variables in this data set.
  • Make a scatter plot from the objects named mpg and wt in this data set.
  • Save the plot in .jpeg format.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Varshit Dubey (CoE Pune).


This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst