R/C2/Introduction-to-ggplot2/English-timed

Revision as of 16:20, 22 May 2020 by Sakinashaikh


Time Narration
00:01 Welcome to this tutorial on Introduction to ggplot2.
00:07 In this tutorial, we will learn about
00:11 Need for data visualization
00:15 Basic plot function in R
00:18 ggplot2 package
00:21 To understand this tutorial, you should know,
00:25 Basics of Statistics and
00:28 Data frames
00:31 If not, please locate the relevant tutorials on R on this website.
00:38 This tutorial is recorded on
00:40 Ubuntu Linux OS version 16.04
00:46 R version 3.4.4
00:50 RStudio version 1.1.463
00:56 Install R version 3.2.0 or higher.
01:02 For this tutorial, we will use,
01:05 A data frame moviesData.csv
01:10 And a script file ggPlots.R.
01:15 Please download these files from the Code files link of this tutorial.
01:21 I have downloaded and moved these files to ggPlots folder.
01:28 This folder is located in myProject folder on my Desktop.
01:33 I have also set ggPlots folder as my Working Directory.
01:40 Now let us learn about visualization.
01:44 Visualization is an important tool for insight generation.
01:50 It is used to understand the data structure, identify outliers and find patterns.
01:57 There are 2 methods of data visualization in R:
02:03 Basic graphics and
02:05 Grammar of graphics (popularly known as ggplot2)
02:11 Let us switch to RStudio.
02:14 Open the script ggPlots.R in RStudio.
02:20 Let us run this script by clicking on the Source button.
02:26 movies data frame is loaded in the workspace.
02:30 This data frame will be used later in this tutorial.
02:35 First, we will plot a sine curve by taking equally spaced samples.
02:42 In the Source window, type the following commands.
02:47 Here, we have used the seq function to generate a sequence.
02:53 This sequence is from minus pi to plus pi with an interval of zero point one.
03:00 In the plot command, the first argument is x and the second argument, y, is sine of x.
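The commands typed on screen are not part of this transcript. A minimal sketch of what they might look like, assuming the sequence is stored in a variable named x (the name is an assumption):

```r
# Equally spaced samples from -pi to +pi with an interval of 0.1
x <- seq(-pi, pi, 0.1)

# First argument is x, second argument (y) is sine of x
plot(x, sin(x))
```

With these values, seq stops at the last sample that does not exceed plus pi, so the curve ends slightly before pi.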
03:12 Save the script and run the last three lines of code by pressing Ctrl + Enter keys simultaneously.
03:23 A plot of sine curve appears in the Plots window.
03:27 In the Plots window, click on the Zoom button to maximize the plot.
03:34 Now we will add some more layers in this plot.
03:39 Click on Close button to close this plot.
03:43 In the Source window, type the following commands.
03:48 Here, we have added main and ylab arguments to the plot function.
03:55 Run the current line.
03:57 In the Plots window, click on Zoom button to maximize the plot.
04:04 The title of the plot and label of Y-axis have been added to the plot.
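A possible form of this command is sketched below. The title and label strings are illustrative assumptions, not the text used in the recording.

```r
x <- seq(-pi, pi, 0.1)

# main sets the plot title, ylab sets the label of the Y-axis
plot(x, sin(x),
     main = "Plotting a sine curve",  # hypothetical title
     ylab = "sin(x)")                 # hypothetical Y-axis label
```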
04:10 Close this plot.
04:13 Now we will learn how to change the type of plot.
04:18 In the Source window, type the following commands.
04:23 Here, we have used the type argument and set it to l.
04:30 It means that the type of plot we need is lines.
04:36 col equal to blue changes the color of the plot to blue.
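As a sketch, assuming the title and label from the previous step are kept (both strings are illustrative), the command could read:

```r
x <- seq(-pi, pi, 0.1)

# type = "l" draws lines instead of points; col = "blue" colors them blue
plot(x, sin(x),
     main = "Plotting a sine curve",  # hypothetical title
     ylab = "sin(x)",                 # hypothetical Y-axis label
     type = "l",
     col = "blue")
```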
04:42 Run the current line.
04:43 The type and color of the plot have been changed.
04:49 Now, we will plot one more graph on the same plot.
04:53 Let us plot cosine of x along with sine of x on the same plot.
05:01 In the Source window, type the following commands.
05:05 This command plots sine of x using the plot function.
05:12 Next, we use lines function to plot cosine of x.
05:18 After the first line is plotted, the lines function is used.
05:23 It takes an additional vector cos of x as an input to draw the second line in the plot.
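The two commands described here might look like the following sketch. The colors follow the narration (sine in blue, cosine in red); the Y-axis label is an assumption.

```r
x <- seq(-pi, pi, 0.1)

# plot() starts a new plot and draws the first curve
plot(x, sin(x), type = "l", col = "blue", ylab = "y")

# lines() adds a second curve to the plot that is already open
lines(x, cos(x), col = "red")
```

Note that lines only adds to an existing plot, so it must be run after plot.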
05:32 Run the last two lines of code by pressing Ctrl+Enter keys simultaneously.
05:40 The two graphs appear in the same plot window.
05:45 Here we can add a legend to the plot to differentiate between the multiple graphs.
05:52 For this, we will use legend function.
05:56 In the Source window, type the following command.
06:00 I will resize the Source window.
06:04 The first argument refers to the coordinates for placing the legend in our plot.
06:12 We have set the coordinates to topleft.
06:16 The second argument is the names to be given.
06:20 Since we have plotted sine and cosine functions, we will pass these two names as a vector.
06:28 Next, we have used the fill argument to specify the graphs by their colors.
06:35 Recall that the sine function is plotted in blue and the cosine function in red.
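Put together, the three lines could be sketched as follows. The legend names and the Y-axis label are assumptions; the position and fill colors follow the narration.

```r
x <- seq(-pi, pi, 0.1)
plot(x, sin(x), type = "l", col = "blue", ylab = "y")
lines(x, cos(x), col = "red")

# First argument: position keyword; second: names; fill: matching colors
legend("topleft",
       legend = c("sin(x)", "cos(x)"),  # hypothetical names
       fill = c("blue", "red"))
```

The order of the colors in fill should match the order of the names, so that each name is shown next to the color of its curve.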
06:42 I will resize the Plots window.
06:45 Run the last three lines of code by pressing Ctrl+Enter keys simultaneously.
06:54 In the Plots window, click on Zoom button to maximize the plot.
07:00 The two plots with their names appear in the same graph.
07:05 Close the plot.
07:07 So far, we have discussed the basic graphics in R language.
07:13 Now, we will learn about the grammar of graphics by using ggplot2 package.
07:20 ggplot2 package was created by Hadley Wickham in 2005.
07:26 It offers a powerful graphics language for creating elegant and complex plots.
07:34 Let us switch to RStudio.
07:37 I will resize the Plots window.
07:40 To use any package in R, we need to install and then load it.
07:45 As I have already installed ggplot2 package, I will load this directly.
07:52 If you have not installed the package, please use install dot packages function.
07:59 Please make sure that you are connected to the Internet while installing the packages.
08:05 To load this package, we will add a library call at the top of the script.
08:11 In the Source window, scroll up to the top of the script.
08:16 Now, at the top of the script, type library and ggplot2 in parentheses.
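A minimal sketch of the install-and-load step is given below. The requireNamespace check is an addition for convenience, so that the package is installed only when it is missing; the recording simply loads the package.

```r
# Install ggplot2 only if it is not already available
# (this step needs an Internet connection)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package; this line goes at the top of the script
library(ggplot2)
```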
08:26 Save the script and run this line.
08:31 Now, in the Source window, click on the next line after the legend function.
08:38 We will use movies data frame for exploring ggplot2 package.
08:44 Let us view the variables available in movies data frame.
08:50 In the Source window, type View and movies in parentheses.
08:56 Run the current line.
08:58 movies data frame opens in the Source window.
09:02 Now, we will create a simple scatter plot with two different variables of movies.
09:10 Remember, a scatter plot is a graph in which the values of two variables are plotted along the axes.
09:19 In the Source window, scroll from left to right to see the remaining variables of movies data frame.
09:28 Suppose, we want to visualize the correlation between critics_score and audience_score.
09:37 In the Source window, click on the script ggPlots.R
09:43 In the Source window, type the following command.
09:47 ggplot function takes three basic arguments:
09:52 Data, Aesthetics and Geometry
09:56 In ggplot function, we have used the following arguments:
10:01 data, which refers to the data set to be used for plotting.
10:06 We have set data equal to movies.
10:10 mapping, which is used to apply aesthetics mapping to the plot.
10:16 aes, which is used to specify the mapping of variables to the X and Y axes.
10:23 We will learn more about aesthetics mapping later in this series.
10:28 geom underscore point is used to draw points defined by X and Y coordinates.
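The command described above can be sketched as follows. Since moviesData.csv is not reproduced here, the sketch builds a small stand-in data frame with invented values; only the column names critics_score and audience_score come from the tutorial.

```r
library(ggplot2)

# Hypothetical stand-in for the movies data frame (values are invented)
movies <- data.frame(
  critics_score  = c(20, 35, 50, 65, 80, 95),
  audience_score = c(30, 40, 55, 60, 75, 90)
)

# data: the data set; mapping + aes: which variables go on X and Y;
# geom_point: draw one point per row
p <- ggplot(data = movies,
            mapping = aes(x = critics_score, y = audience_score)) +
  geom_point()

print(p)
```

With the real movies data frame loaded by ggPlots.R, the stand-in definition would be dropped and the ggplot call used as it is.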
10:36 Run the current line.
10:39 Scatter plot appears in the Plots window.
10:42 In the Plots window, click on the Zoom button to maximize the plot.
10:49 We can see that there is a positive correlation between critics_score and audience_score.
10:56 Now we will learn how to save a plot generated by ggplot function.
11:03 Close this plot.
11:05 For saving the plots, there is a function named ggsave in ggplot2 package.
11:13 To know the syntax of ggsave function, we will access the Help section in RStudio.
11:20 In the Console window, type question mark ggsave and press Enter.
11:27 I will resize the Help window.
11:30 The first argument in this function is the filename.
11:35 Next, there is the argument named plot which means the plot to be saved.
11:42 By default, it will save the last plot.
11:46 Click on the Plots window.
11:49 Let us save our scatter plot with a name scatter underscore plot in png format.
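A sketch of the saving step is shown below. It again uses a small invented stand-in for the movies data frame, and it passes the plot explicitly instead of relying on the default last plot. The width and height values are assumptions, given in inches.

```r
library(ggplot2)

# Hypothetical stand-in for the movies data frame (values are invented)
movies <- data.frame(
  critics_score  = c(20, 35, 50, 65, 80, 95),
  audience_score = c(30, 40, 55, 60, 75, 90)
)

p <- ggplot(data = movies,
            mapping = aes(x = critics_score, y = audience_score)) +
  geom_point()

# The file extension (.png) decides the output format;
# the file is written to the current working directory
ggsave(filename = "scatter_plot.png", plot = p, width = 6, height = 4)
```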
11:58 In the Source window, type the following command.
12:02 Save the script and run the current line.
12:07 Click on the Files tab.
12:10 The plot has been saved in our current working directory.
12:15 Let us summarize what we have learnt.
12:19 In this tutorial, we have learnt about,
12:23 Need for data visualization
12:26 Basic plot function in R
12:29 ggplot2 package
12:32 We now suggest an assignment.
12:36 Consider the built-in data set mtcars. Find the numerical variables in this data set.
12:43 Make a scatter plot from the variables named mpg and wt in this data set.
12:51 Save the plot in .jpeg format.
12:56 The video at the following link summarises the Spoken Tutorial project.
13:01 Please download and watch it.
13:04 We conduct workshops using Spoken Tutorials and give certificates.
13:10 Please contact us.
13:13 Please post your timed queries in this forum.
13:17 Please post your general queries in this forum.
13:21 The FOSSEE team coordinates the TBC project.
13:24 For more details, please visit these sites.
13:29 The Spoken Tutorial project is funded by MHRD, Govt. of India.
13:36 The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
13:43 This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Sakinashaikh