R/C2/More-Functions-in-dplyr-Package/English-timed
Time | Narration |
00:01 | Welcome to this tutorial on More functions in the dplyr package. |
00:07 | In this tutorial, we will learn about the following functions in the dplyr package: |
00:14 | select |
00:16 | rename |
00:18 | mutate |
00:20 | To understand this tutorial, you should know: |
00:24 | Basics of statistics |
00:27 | Basics of ggplot2 package |
00:31 | Data frames |
00:33 | If not, please locate the relevant tutorials on R on this website. |
00:40 | This tutorial is recorded on |
00:43 | Ubuntu Linux OS version 16.04 |
00:48 | R version 3.4.4 |
00:52 | RStudio version 1.1.463 |
00:57 | Install R version 3.2.0 or higher. |
01:03 | For this tutorial, we will use |
01:06 | A data set moviesData.csv |
01:11 | A script file myVis.R. |
01:16 | Please download these files from the Code files link of this tutorial. |
01:23 | I have downloaded and moved these files to the DataVis folder. |
01:29 | This folder is located in myProject folder on my Desktop. |
01:36 | I have also set the DataVis folder as my Working Directory. |
01:42 | Let us switch to RStudio. |
01:45 | Open the script myVis.R in RStudio. |
01:50 | We have already learnt how to use the filter and arrange functions in the dplyr package. |
02:01 | Run this script by clicking on the Source button. |
02:06 | The movies data frame and the other filtered data frames open in the Source window. |
02:13 | We will close all the data frames except movies. |
02:19 | In the Source window, scroll from left to right. |
02:24 | This will enable us to see the remaining variables of the movies data frame. |
02:30 | To select the required variables of a data frame we will use the select function. |
02:37 | It helps us to select only those variables that are required. |
02:43 | Here, we will use the select function to select title, genre, and imdb rating for all the movies. |
02:56 | Click on the script myVis.R |
03:00 | In the Source window, type the following command. |
03:05 | The first argument in the select function is the name of the data frame.
Here it is movies. |
03:14 | Other arguments are the variables which we will select for all the movies. |
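The command itself is shown on screen and is not part of this transcript. A minimal sketch of what it might look like follows. It assumes the rating column is named imdb_rating, and it uses a small made-up data frame in place of moviesData.csv so that it runs on its own.

```r
library(dplyr)

# Made-up stand-in for the movies data frame; the values are illustrative only.
movies <- data.frame(
  title       = c("Movie A", "Movie B", "Movie C"),
  genre       = c("Drama", "Comedy", "Drama"),
  runtime     = c(120, 95, 110),
  imdb_rating = c(7.1, 6.4, 8.0)   # column name is an assumption
)

# First argument: the data frame. Remaining arguments: the variables to keep.
moviesTGI <- select(movies, title, genre, imdb_rating)

# In RStudio, View(moviesTGI) would open the result in the Source window.
print(moviesTGI)
```

The runtime column is dropped because it is not listed; select keeps only the variables that are named, in the order given.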
03:20 | Save the script and run the last two lines of code by pressing Ctrl + Enter keys simultaneously. |
03:31 | moviesTGI opens in the Source window. |
03:36 | Here, title, genre, and imdb rating of all the movies are displayed. |
03:46 | Let us close the moviesTGI data frame for now. |
03:52 | In the Source window, click on movies data frame. |
03:57 | Scroll the data frame from right to left to see other columns. |
04:04 | In the data frame, we can see variables such as thtr_rel_day, thtr_rel_month, and thtr_rel_year. |
04:27 | These variables provide information about the day, month and year of the theater release of the movies. |
04:38 | Let us select these three variables along with the title of all the movies. |
04:45 | Please note that all the theater-related variable names start with t h t r. |
04:53 | Click on the script myVis.R. |
04:57 | In the Source window, type the following command. |
05:02 | Here, we have used the starts_with function. |
05:08 | It selects all the variables in the movies data frame, whose names start with t h t r. |
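The typed command is not captured in this transcript. One way it could be written is sketched below, assuming the prefix is passed to starts_with as the string "thtr". The data frame here is a made-up stand-in for moviesData.csv.

```r
library(dplyr)

# Made-up stand-in for the movies data frame; the values are illustrative only.
movies <- data.frame(
  title          = c("Movie A", "Movie B"),
  genre          = c("Drama", "Comedy"),
  thtr_rel_year  = c(2010, 2012),
  thtr_rel_month = c(4, 11),
  thtr_rel_day   = c(16, 2)
)

# Keep title plus every variable whose name begins with "thtr".
moviesTHT <- select(movies, title, starts_with("thtr"))

print(names(moviesTHT))
```

starts_with is a selection helper, so it is used inside select rather than called on the data frame directly.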
05:16 | Run the last two lines of code. |
05:20 | moviesTHT opens in the Source window. |
05:25 | Movies with their titles and theater-release information are shown. |
05:33 | Let us close the moviesTHT data frame for now. |
05:38 | In the Source window, click on movies. |
05:42 | Let us change the name of the variable thtr_rel_year. |
05:52 | For that, we will use the rename function. |
05:56 | Click on the script myVis.R |
06:00 | In the Source window, type the following command. |
06:05 | Here, we are changing the name of the variable thtr_rel_year to rel_year. |
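The rename command is not shown in this transcript. A sketch consistent with the narration is given below, using a made-up data frame in place of moviesData.csv. Note that rename takes its arguments in the form new_name = old_name.

```r
library(dplyr)

# Made-up stand-in for the movies data frame; the values are illustrative only.
movies <- data.frame(
  title         = c("Movie A", "Movie B"),
  thtr_rel_year = c(2010, 2012)
)

# rename(data, new_name = old_name): all other variables are left unchanged.
moviesR <- rename(movies, rel_year = thtr_rel_year)

print(names(moviesR))
```

The original movies data frame is not modified; the renamed copy is stored in moviesR.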
06:14 | Run the last two lines of code. |
06:19 | moviesR opens in the Source window. |
06:23 | In the Source window, scroll from left to right. |
06:28 | Observe that the name of the variable thtr_rel_year has changed to rel_year. |
06:41 | Let us close the data frame moviesR for now. |
06:46 | In the Source window, click on movies. |
06:50 | In the Source window, scroll from left to right. |
06:55 | Suppose we want to add a new variable named CriAud to our movies data frame. |
07:04 | This variable should contain the difference between critics_score and audience_score.
For this, we will use the mutate function. |
07:16 | The mutate function adds new variables to a data frame while preserving the existing ones. |
07:23 | For simplicity, let us remove the variables appearing after audience_score in the movies data frame. |
07:33 | In the Source window, scroll from right to left. |
07:38 | We need to select the variables from title to audience_score. |
07:46 | For this, we will use the select function. |
07:50 | Click on the script myVis.R |
07:54 | In the Source window, type the following command. |
07:59 | Run the current line. |
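The command for this step is not part of the transcript. A possible form is sketched below, assuming a range of adjacent columns is selected with the colon operator. The name of the object that receives the result (moviesMu here) is also an assumption, and the data frame is a made-up stand-in for moviesData.csv.

```r
library(dplyr)

# Made-up stand-in for the movies data frame; the values are illustrative only.
movies <- data.frame(
  title          = c("Movie A", "Movie B"),
  genre          = c("Drama", "Comedy"),
  critics_score  = c(80, 55),
  audience_score = c(73, 60),
  studio         = c("Studio X", "Studio Y"),
  director       = c("Director P", "Director Q")
)

# title:audience_score keeps every column from title through audience_score,
# in the order they appear in the data frame; later columns are dropped.
moviesMu <- select(movies, title:audience_score)

print(names(moviesMu))
```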
08:02 | Now, we will use the mutate function to add a new variable. |
08:08 | In the Source window, type the following command. |
08:13 | Remember, we are adding a new variable named CriAud in the movies data frame. |
08:22 | This is to store the difference between critics_score and audience_score. |
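The mutate command is not captured in this transcript. A sketch of how it might look is given below. It assumes the difference is taken as critics_score minus audience_score, and it uses a made-up data frame in place of the reduced movies data.

```r
library(dplyr)

# Made-up stand-in for the reduced movies data; the values are illustrative only.
moviesSub <- data.frame(
  title          = c("Movie A", "Movie B", "Movie C"),
  critics_score  = c(80, 55, 91),
  audience_score = c(73, 60, 85)
)

# Add CriAud as a new column; the existing columns are kept as they are.
# The direction of the subtraction is an assumption.
moviesMu <- mutate(moviesSub, CriAud = critics_score - audience_score)

print(moviesMu)
```

A negative CriAud value means the audience rated the movie higher than the critics did.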
08:29 | Run the last two lines of code. |
08:33 | moviesMu opens in the Source window. |
08:37 | In the Source window, scroll from left to right. |
08:42 | A new variable named CriAud is added. |
08:48 | Let us summarize what we have learnt. |
08:52 | In this tutorial, we have learnt about the following functions available in the dplyr package: |
09:00 | select |
09:01 | rename |
09:02 | mutate |
09:04 | We now suggest an assignment. |
09:08 | Use the built-in data set airquality. Using the select function, select the variables Ozone, Wind, and Temp in this data set. |
09:20 | Use the built-in data set mtcars. Rename the variables mpg and cyl to MilesPerGallon and Cylinder, respectively. |
09:33 | The video at the following link summarises the Spoken Tutorial project. |
09:37 | Please download and watch it. |
09:41 | We conduct workshops using Spoken Tutorials and give certificates. |
09:46 | Please contact us. |
09:49 | Please post your timed queries in this forum. |
09:54 | Please post your general queries in this forum. |
09:59 | The FOSSEE team coordinates the TBC project. |
10:02 | For more details, please visit these sites. |
10:07 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
10:13 | The script for this tutorial was contributed by Varshit Dubey (CoE Pune). |
10:20 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |