R/C2/Introduction-to-ggplot2/English-timed
Time | Narration |
00:01 | Welcome to this tutorial on Introduction to ggplot2. |
00:07 | In this tutorial, we will learn about |
00:11 | Need for data visualization |
00:15 | Basic plot function in R |
00:18 | ggplot2 package |
00:21 | To understand this tutorial, you should know, |
00:25 | Basics of Statistics and |
00:28 | Data frames |
00:31 | If not, please locate the relevant tutorials on R on this website. |
00:38 | This tutorial is recorded on |
00:40 | Ubuntu Linux OS version 16.04 |
00:46 | R version 3.4.4 |
00:50 | RStudio version 1.1.463 |
00:56 | Install R version 3.2.0 or higher. |
01:02 | For this tutorial, we will use, |
01:05 | A data frame moviesData.csv |
01:10 | And a script file ggPlots.R. |
01:15 | Please download these files from the Code files link of this tutorial. |
01:21 | I have downloaded and moved these files to ggPlots folder. |
01:28 | This folder is located in myProject folder on my Desktop. |
01:33 | I have also set ggPlots folder as my Working Directory. |
01:40 | Now let us learn about visualization. |
01:44 | Visualization is an important tool for insight generation. |
01:50 | It is used to understand the data structure, identify outliers and find patterns. |
01:57 | There are 2 methods of data visualization in R: |
02:03 | Basic graphics and |
02:05 | Grammar of graphics (popularly known as ggplot2) |
02:11 | Let us switch to RStudio. |
02:14 | Open the script ggPlots.R in RStudio. |
02:20 | Let us run this script by clicking on the Source button. |
02:26 | movies data frame is loaded in the workspace. |
02:30 | This data frame will be used later in this tutorial. |
02:35 | First, we will plot a sine curve by taking equally spaced samples. |
02:42 | In the Source window, type the following commands. |
02:47 | Here, we have used the seq function to generate a sequence. |
02:53 | This sequence is from minus pi to plus pi with an interval of zero point one. |
03:00 | In the plot command, the first argument is x and the second argument, y, is sine of x. |
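The commands described above can be sketched as follows. This is a minimal sketch; the variable name x is taken from the narration, and the exact lines in ggPlots.R may differ.

```r
# Equally spaced samples from -pi to +pi with an interval of 0.1
x <- seq(-pi, pi, 0.1)

# First argument is x; second argument (y) is sine of x
plot(x, sin(x))
```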
03:12 | Save the script and run the last three lines of code by pressing Ctrl + Enter keys simultaneously. |
03:23 | A plot of sine curve appears in the Plots window. |
03:27 | In the Plots window, click on the Zoom button to maximize the plot. |
03:34 | Now we will add some more layers in this plot. |
03:39 | Click on Close button to close this plot. |
03:43 | In the Source window, type the following commands. |
03:48 | Here, we have added main and ylab arguments to the plot function. |
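A possible form of this command is shown below. The title and label strings are assumptions chosen for illustration; the narration only states that the main and ylab arguments are used.

```r
x <- seq(-pi, pi, 0.1)

# main sets the plot title, ylab sets the Y-axis label
# (both strings here are illustrative, not taken from the tutorial)
plot(x, sin(x),
     main = "Plotting a Sine Curve",
     ylab = "sin(x)")
```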
03:55 | Run the current line. |
03:57 | In the Plots window, click on Zoom button to maximize the plot. |
04:04 | The title of the plot and label of Y-axis have been added to the plot. |
04:10 | Close this plot. |
04:13 | Now we will learn how to change the type of plot. |
04:18 | In the Source window, type the following commands. |
04:23 | Here, we have used the type argument and set it to l. |
04:30 | It means that the type of plot we need is lines. |
04:36 | col equal to blue changes the color of the plot to blue. |
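A sketch of the command with the type and col arguments follows. The title and label strings are again illustrative assumptions.

```r
x <- seq(-pi, pi, 0.1)

# type = "l" draws lines instead of points; col = "blue" sets the color
plot(x, sin(x),
     main = "Plotting a Sine Curve",  # illustrative title
     ylab = "sin(x)",                 # illustrative label
     type = "l",
     col = "blue")
```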
04:42 | Run the current line. |
04:43 | The type and color of the plot have been changed. |
04:49 | Now, we will plot one more graph on the same plot. |
04:53 | Let us plot cosine of x along with sine of x on the same plot. |
05:01 | In the Source window, type the following commands. |
05:05 | This command plots sine of x using the plot function. |
05:12 | Next, we use lines function to plot cosine of x. |
05:18 | After the first line is plotted, the lines function is used. |
05:23 | It takes an additional vector cos of x as an input to draw the second line in the plot. |
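The two steps above might look like this. The red color for the cosine curve follows the narration later in this tutorial; the Y-axis label is an assumption.

```r
x <- seq(-pi, pi, 0.1)

# plot() starts a new plot with the sine curve in blue
plot(x, sin(x), type = "l", col = "blue",
     ylab = "sin(x) and cos(x)")  # illustrative label

# lines() adds a second curve to the existing plot
lines(x, cos(x), col = "red")
```

Note that lines only adds to a plot that already exists, so plot has to be called first.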
05:32 | Run the last two lines of code by pressing Ctrl+Enter keys simultaneously. |
05:40 | The two graphs appear in the same plot window. |
05:45 | Here we can add a legend to the plot to differentiate between the multiple graphs. |
05:52 | For this, we will use legend function. |
05:56 | In the Source window, type the following command. |
06:00 | I will resize the Source window. |
06:04 | The first argument refers to the coordinates for placing the legend in our plot. |
06:12 | We have set the coordinates to topleft. |
06:16 | The second argument is the names to be given. |
06:20 | Since we have plotted sine and cosine functions, we will pass these two names as a vector. |
06:28 | Next, we have used the fill argument to specify the graphs by their colors. |
06:35 | Recall that, sine function is plotted in blue and cosine function in red. |
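The legend call described above can be sketched as follows. The legend strings "sin(x)" and "cos(x)" are assumptions; the narration only says that the two names are passed as a vector.

```r
x <- seq(-pi, pi, 0.1)

plot(x, sin(x), type = "l", col = "blue")
lines(x, cos(x), col = "red")

# First argument: position keyword; second: names as a vector;
# fill: colors matching the order of the names
legend("topleft",
       c("sin(x)", "cos(x)"),
       fill = c("blue", "red"))
```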
06:42 | I will resize the Plots window. |
06:45 | Run the last three lines of code by pressing Ctrl+Enter keys simultaneously. |
06:54 | In the Plots window, click on Zoom button to maximize the plot. |
07:00 | The two plots with their names appear in the same graph. |
07:05 | Close the plot. |
07:07 | So far, we have discussed the basic graphics in R language. |
07:13 | Now, we will learn about the grammar of graphics by using ggplot2 package. |
07:20 | ggplot2 package was created by Hadley Wickham in 2005. |
07:26 | It offers a powerful graphics language for creating elegant and complex plots. |
07:34 | Let us switch to RStudio. |
07:37 | I will resize the Plots window. |
07:40 | To use any package in R, we need to install and then load it. |
07:45 | As I have already installed ggplot2 package, I will load this directly. |
07:52 | If you have not installed the package, please use install dot packages function. |
07:59 | Please make sure that you are connected to the Internet while installing the packages. |
08:05 | To load this package, we will add a library call at the top of the script. |
08:11 | In the Source window, scroll up to the top of the script. |
08:16 | Now, at the top of the script, type library and ggplot2 in parentheses. |
08:26 | Save the script and run this line. |
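The install and load steps can be sketched as below. The requireNamespace check is an addition for convenience and is not part of the narration; it skips the installation when the package is already present.

```r
# Install once if the package is missing (needs an Internet connection)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package; this line goes at the top of the script
library(ggplot2)
```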
08:31 | Now, in the Source window, click on the next line after the legend function. |
08:38 | We will use movies data frame for exploring ggplot2 package. |
08:44 | Let us view the objects available in movies data frame. |
08:50 | In the Source window, type View and movies in parentheses. |
08:56 | Run the current line. |
08:58 | movies data frame opens in the Source window. |
09:02 | Now, we will create a simple scatter plot with two different objects of movies. |
09:10 | Remember, a scatter plot is a graph in which the values of two variables are plotted along the axes. |
09:19 | In the Source window, scroll from left to right to see the remaining objects of movies data frame. |
09:28 | Suppose, we want to visualize the correlation between critics_score and audience_score. |
09:37 | In the Source window, click on the script ggPlots.R |
09:43 | In the Source window, type the following command. |
09:47 | ggplot function takes three basic arguments: |
09:52 | Data, Aesthetics and Geometry |
09:56 | In ggplot function, we have used the following arguments: |
10:01 | data, which refers to the data set to be used for plotting. |
10:06 | We have set data equal to movies. |
10:10 | mapping, which is used to apply aesthetics mapping to the plot. |
10:16 | aes, which is used to specify the mapping of objects on X and Y axes. |
10:23 | We will learn more about aesthetics mapping later in this series. |
10:28 | geom underscore point is used to draw points defined by X and Y coordinates. |
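A minimal sketch of such a command follows. It assumes critics_score goes on the X-axis and audience_score on the Y-axis. So that the sketch runs on its own, it builds a small stand-in data frame with made-up values when movies is not already loaded; in the tutorial, movies comes from ggPlots.R.

```r
library(ggplot2)

# Stand-in with made-up values, used only if movies is not already loaded
if (!exists("movies")) {
  movies <- data.frame(
    critics_score  = c(20, 45, 60, 75, 90),
    audience_score = c(35, 50, 55, 80, 85)
  )
}

# data: the data set; mapping: aes() maps columns to the X and Y axes;
# geom_point() draws one point per row
p <- ggplot(data = movies,
            mapping = aes(x = critics_score, y = audience_score)) +
  geom_point()
print(p)
```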
10:36 | Run the current line. |
10:39 | Scatter plot appears in the Plots window. |
10:42 | In the Plots window, click on the Zoom button to maximize the plot. |
10:49 | We can see that there is a positive correlation between critics_score and audience_score. |
10:56 | Now we will learn how to save a plot generated by ggplot function. |
11:03 | Close this plot. |
11:05 | For saving the plots, there is a function named ggsave in ggplot2 package. |
11:13 | To know the syntax of ggsave function, we will access the Help section in RStudio. |
11:20 | In the Console window, type question mark ggsave and press Enter. |
11:27 | I will resize the Help window. |
11:30 | The first argument in this function is the filename. |
11:35 | Next, there is the argument named plot which means the plot to be saved. |
11:42 | By default, it will save the last plot. |
11:46 | Click on the Plots window. |
11:49 | Let us save our scatter plot with a name scatter underscore plot in png format. |
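One way to write this is sketched below. The plot is passed explicitly here; leaving out the plot argument would save the last plot, as noted above. The width and height values and the stand-in data are assumptions added so that the sketch runs on its own.

```r
library(ggplot2)

# Stand-in with made-up values, used only if movies is not already loaded
if (!exists("movies")) {
  movies <- data.frame(
    critics_score  = c(20, 45, 60, 75, 90),
    audience_score = c(35, 50, 55, 80, 85)
  )
}

p <- ggplot(data = movies,
            mapping = aes(x = critics_score, y = audience_score)) +
  geom_point()

# The file extension decides the format; the file is written to the
# current working directory. Size is in inches (illustrative values).
ggsave("scatter_plot.png", plot = p, width = 6, height = 4)
```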
11:58 | In the Source window, type the following command. |
12:02 | Save the script and run the current line. |
12:07 | Click on the Files tab. |
12:10 | The plot has been saved in our current working directory. |
12:15 | Let us summarize what we have learnt. |
12:19 | In this tutorial, we have learnt about, |
12:23 | Need for data visualization |
12:26 | Basic plot function in R |
12:29 | ggplot2 package |
12:32 | We now suggest an assignment. |
12:36 | Consider the built-in data set mtcars. Find the numerical variables in this data set. |
12:43 | Make a scatter plot from the objects named mpg and wt in this data set. |
12:51 | Save the plot in .jpeg format. |
12:56 | The video at the following link summarises the Spoken Tutorial project. |
13:01 | Please download and watch it. |
13:04 | We conduct workshops using Spoken Tutorials and give certificates. |
13:10 | Please contact us. |
13:13 | Please post your timed queries in this forum. |
13:17 | Please post your general queries in this forum. |
13:21 | The FOSSEE team coordinates the TBC project. |
13:24 | For more details, please visit these sites. |
13:29 | The Spoken Tutorial project is funded by MHRD, Govt. of India. |
13:36 | The script for this tutorial was contributed by Varshit Dubey (CoE Pune). |
13:43 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |