R/C2/Introduction-to-ggplot2/English-timed
| Time | Narration |
| 00:01 | Welcome to this tutorial on Introduction to ggplot2. |
| 00:07 | In this tutorial, we will learn about |
| 00:11 | Need for data visualization |
| 00:15 | Basic plot function in R |
| 00:18 | ggplot2 package |
| 00:21 | To understand this tutorial, you should know, |
| 00:25 | Basics of Statistics and |
| 00:28 | Data frames |
| 00:31 | If not, please locate the relevant tutorials on R on this website. |
| 00:38 | This tutorial is recorded on |
| 00:40 | Ubuntu Linux OS version 16.04 |
| 00:46 | R version 3.4.4 |
| 00:50 | RStudio version 1.1.463 |
| 00:56 | Install R version 3.2.0 or higher. |
| 01:02 | For this tutorial, we will use, |
| 01:05 | A data frame moviesData.csv |
| 01:10 | And a script file ggPlots.R. |
| 01:15 | Please download these files from the Code files link of this tutorial. |
| 01:21 | I have downloaded and moved these files to ggPlots folder. |
| 01:28 | This folder is located in myProject folder on my Desktop. |
| 01:33 | I have also set ggPlots folder as my Working Directory. |
| 01:40 | Now let us learn about visualization. |
| 01:44 | Visualization is an important tool for insight generation. |
| 01:50 | It is used to understand the data structure, identify outliers and find patterns. |
| 01:57 | There are 2 methods of data visualization in R: |
| 02:03 | Basic graphics and |
| 02:05 | Grammar of graphics (popularly known as ggplot2) |
| 02:11 | Let us switch to RStudio. |
| 02:14 | Open the script ggPlots.R in RStudio. |
| 02:20 | Let us run this script by clicking on the Source button. |
| 02:26 | movies data frame is loaded in the workspace. |
| 02:30 | This data frame will be used later in this tutorial. |
| 02:35 | First, we will plot a sine curve by taking equally spaced samples. |
| 02:42 | In the Source window, type the following commands. |
| 02:47 | Here, we have used the seq function to generate a sequence. |
| 02:53 | This sequence is from minus pi to plus pi with an interval of zero point one. |
| 03:00 | In the plot command, the first argument is x and the second argument, y, is sine of x. |
| 03:12 | Save the script and run the last three lines of code by pressing Ctrl + Enter keys simultaneously. |
| 03:23 | A plot of sine curve appears in the Plots window. |
| 03:27 | In the Plots window, click on the Zoom button to maximize the plot. |
| 03:34 | Now we will add some more layers in this plot. |
| 03:39 | Click on Close button to close this plot. |
| 03:43 | In the Source window, type the following commands. |
| 03:48 | Here, we have added main and ylab arguments to the plot function. |
| 03:55 | Run the current line. |
| 03:57 | In the Plots window, click on Zoom button to maximize the plot. |
| 04:04 | The title of the plot and label of Y-axis have been added to the plot. |
| 04:10 | Close this plot. |
| 04:13 | Now we will learn how to change the type of plot. |
| 04:18 | In the Source window, type the following commands. |
| 04:23 | Here, we have used the type argument and set it to l. |
| 04:30 | It means that the type of plot we need is lines. |
| 04:36 | col equal to blue changes the color of the plot to blue. |
| 04:42 | Run the current line. |
| 04:43 | The type and color of the plot have been changed. |
| 04:49 | Now, we will plot one more graph on the same plot. |
| 04:53 | Let us plot cosine of x along with sine of x on the same plot. |
| 05:01 | In the Source window, type the following commands. |
| 05:05 | This command plots sine of x using the plot function. |
| 05:12 | Next, we use lines function to plot cosine of x. |
| 05:18 | After the first line is plotted, the lines function is used. |
| 05:23 | It takes an additional vector cos of x as an input to draw the second line in the plot. |
| 05:32 | Run the last two lines of code by pressing Ctrl+Enter keys simultaneously. |
| 05:40 | The two graphs appear in the same plot window. |
| 05:45 | Here we can add a legend to the plot to differentiate between the multiple graphs. |
| 05:52 | For this, we will use legend function. |
| 05:56 | In the Source window, type the following command. |
| 06:00 | I will resize the Source window. |
| 06:04 | The first argument refers to the coordinates for placing the legend in our plot. |
| 06:12 | We have set the coordinates to topleft. |
| 06:16 | The second argument is the names to be given. |
| 06:20 | Since we have plotted sine and cosine functions, we will pass these two names as a vector. |
| 06:28 | Next, we have used the fill argument to show the color of each graph in the legend. |
| 06:35 | Recall that the sine function is plotted in blue and the cosine function in red. |
| 06:42 | I will resize the Plots window. |
| 06:45 | Run the last three lines of code by pressing Ctrl+Enter keys simultaneously. |
| 06:54 | In the Plots window, click on Zoom button to maximize the plot. |
| 07:00 | The two plots with their names appear in the same graph. |
| 07:05 | Close the plot. |
| 07:07 | So far, we have discussed the basic graphics in R language. |
| 07:13 | Now, we will learn about the grammar of graphics by using ggplot2 package. |
| 07:20 | ggplot2 package was created by Hadley Wickham in 2005. |
| 07:26 | It offers a powerful graphics language for creating elegant and complex plots. |
| 07:34 | Let us switch to RStudio. |
| 07:37 | I will resize the Plots window. |
| 07:40 | To use any package in R, we need to install and then load it. |
| 07:45 | As I have already installed ggplot2 package, I will load this directly. |
| 07:52 | If you have not installed the package, please use install dot packages function. |
| 07:59 | Please make sure that you are connected to the Internet while installing the packages. |
| 08:05 | To load this package, we will add a library call at the top of the script. |
| 08:11 | In the Source window, scroll up to the top of the script. |
| 08:16 | Now, at the top of the script, type library and ggplot2 in parentheses. |
| 08:26 | Save the script and run this line. |
| 08:31 | Now, in the Source window, click on the next line after the legend function. |
| 08:38 | We will use movies data frame for exploring ggplot2 package. |
| 08:44 | Let us view the objects available in movies data frame. |
| 08:50 | In the Source window, type View and movies in parentheses. |
| 08:56 | Run the current line. |
| 08:58 | movies data frame opens in the Source window. |
| 09:02 | Now, we will create a simple scatter plot with two different objects of movies. |
| 09:10 | Remember, a scatter plot is a graph in which the values of two variables are plotted along the axes. |
| 09:19 | In the Source window, scroll from left to right to see the remaining objects of movies data frame. |
| 09:28 | Suppose, we want to visualize the correlation between critics_score and audience_score. |
| 09:37 | In the Source window, click on the script ggPlots.R |
| 09:43 | In the Source window, type the following command. |
| 09:47 | ggplot function takes three basic arguments: |
| 09:52 | Data, Aesthetics and Geometry |
| 09:56 | In ggplot function, we have used the following arguments: |
| 10:01 | data, which refers to the data set to be used for plotting. |
| 10:06 | We have set data equal to movies. |
| 10:10 | mapping, which is used to apply the aesthetic mapping to the plot. |
| 10:16 | aes, which is used to specify the mapping of objects on the X and Y axes. |
| 10:23 | We will learn more about aesthetic mapping later in this series. |
| 10:28 | geom underscore point is used to draw points defined by X and Y coordinates. |
| 10:36 | Run the current line. |
| 10:39 | Scatter plot appears in the Plots window. |
| 10:42 | In the Plots window, click on the Zoom button to maximize the plot. |
| 10:49 | We can see that there is a positive correlation between critics_score and audience_score. |
| 10:56 | Now we will learn how to save a plot generated by ggplot function. |
| 11:03 | Close this plot. |
| 11:05 | For saving the plots, there is a function named ggsave in ggplot2 package. |
| 11:13 | To know the syntax of ggsave function, we will access the Help section in RStudio. |
| 11:20 | In the Console window, type question mark ggsave and press Enter. |
| 11:27 | I will resize the Help window. |
| 11:30 | The first argument in this function is the filename. |
| 11:35 | Next, there is the argument named plot which means the plot to be saved. |
| 11:42 | By default, it will save the last plot. |
| 11:46 | Click on the Plots window. |
| 11:49 | Let us save our scatter plot with the name scatter underscore plot in png format. |
| 11:58 | In the Source window, type the following command. |
| 12:02 | Save the script and run the current line. |
| 12:07 | Click on the Files tab. |
| 12:10 | The plot has been saved in our current working directory. |
| 12:15 | Let us summarize what we have learnt. |
| 12:19 | In this tutorial, we have learnt about, |
| 12:23 | Need for data visualization |
| 12:26 | Basic plot function in R |
| 12:29 | ggplot2 package |
| 12:32 | We now suggest an assignment. |
| 12:36 | Consider the built-in data set mtcars. Find the numerical variables in this data set. |
| 12:43 | Make a scatter plot from the objects named mpg and wt in this data set. |
| 12:51 | Save the plot in .jpeg format. |
| 12:56 | The video at the following link summarises the Spoken Tutorial project. |
| 13:01 | Please download and watch it. |
| 13:04 | We conduct workshops using Spoken Tutorials and give certificates. |
| 13:10 | Please contact us. |
| 13:13 | Please post your timed queries in this forum. |
| 13:17 | Please post your general queries in this forum. |
| 13:21 | The FOSSEE team coordinates the TBC project. |
| 13:24 | For more details, please visit these sites. |
| 13:29 | The Spoken Tutorial project is funded by MHRD, Govt. of India. |
| 13:36 | The script for this tutorial was contributed by Varshit Dubey (CoE Pune). |
| 13:43 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |
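
The narration between 02:35 and 06:45 refers to commands typed on screen (the sine curve, its title and axis label, the line type and color, the cosine overlay, and the legend) that are not recorded in this transcript. A minimal sketch of what those steps might look like in base R is given below. Only the functions seq, plot, lines and legend and the arguments main, ylab, type, col and fill are named in the narration; the variable names x and y, the title and label text, and the legend labels are assumptions made for illustration.

```r
# Equally spaced samples from -pi to +pi with an interval of 0.1 (narrated at 02:53).
x <- seq(-pi, pi, 0.1)
y <- sin(x)  # variable name y is assumed; the narration only says "y is sine of x"

# Basic plot: first argument x, second argument y.
plot(x, y)

# Same plot with a title and a Y-axis label (the label text is assumed).
plot(x, y, main = "Sine curve", ylab = "sin(x)")

# type = "l" draws lines instead of points; col = "blue" sets the color.
plot(x, y, main = "Sine and cosine curves", ylab = "Value",
     type = "l", col = "blue")

# lines() adds a second curve to the plot that is already open.
lines(x, cos(x), col = "red")

# Legend at the top left: names as a character vector, colors via fill.
legend("topleft", c("sin(x)", "cos(x)"), fill = c("blue", "red"))
```

Each call to plot starts a fresh plot, which is why the cosine curve is added with lines rather than with a second plot call.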
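
The ggplot2 steps narrated from 07:40 to 12:10 (loading the package, drawing the scatter plot, and saving it with ggsave) can be sketched in a similar way. The tutorial's movies data frame comes from moviesData.csv, which is not reproduced in this transcript, so the block below builds a small made-up data frame, movies_demo, that reuses only the two column names mentioned in the narration; its values are purely illustrative. The object name p, the plot size, and writing to tempdir() are also assumptions, since the tutorial itself saves into the working directory.

```r
# install.packages("ggplot2")  # run once if the package is not installed (needs Internet)
library(ggplot2)

# Hypothetical stand-in for the movies data frame; values are invented for illustration.
movies_demo <- data.frame(
  critics_score  = c(20, 35, 50, 65, 80, 95),
  audience_score = c(30, 40, 55, 60, 85, 90)
)

# Data, aesthetics (mapping of columns to the X and Y axes) and geometry (points).
p <- ggplot(data = movies_demo,
            mapping = aes(x = critics_score, y = audience_score)) +
  geom_point()
print(p)

# ggsave takes the file name first; the plot argument defaults to the last plot drawn.
# The file extension (.png) selects the output format.
out_file <- file.path(tempdir(), "scatter_plot.png")
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```

Passing plot = p explicitly is optional here, but it avoids relying on which plot happened to be displayed last.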