R/C2/More-Functions-in-dplyr-Package/English-timed

Revision as of 16:34, 22 May 2020 by Sakinashaikh


Time Narration
00:01 Welcome to this tutorial on More functions in the dplyr package.
00:07 In this tutorial, we will learn about the following functions in the dplyr package:
00:14 select
00:16 rename
00:18 mutate
00:20 To understand this tutorial, you should know:
00:24 Basics of statistics
00:27 Basics of ggplot2 package
00:31 Data frames
00:33 If not, please locate the relevant tutorials on R on this website.
00:40 This tutorial is recorded on
00:43 Ubuntu Linux OS version 16.04
00:48 R version 3.4.4
00:52 RStudio version 1.1.463
00:57 Install R version 3.2.0 or higher.
01:03 For this tutorial, we will use
01:06 A data frame moviesData.csv
01:11 A script file myVis.R.
01:16 Please download these files from the Code files link of this tutorial.
01:23 I have downloaded and moved these files to the DataVis folder.
01:29 This folder is located in the myProject folder on my Desktop.
01:36 I have also set the DataVis folder as my Working Directory.
01:42 Let us switch to RStudio.
01:45 Open the script myVis.R in RStudio.
01:50 We have already learnt how to use the filter and arrange functions in the dplyr package.
02:01 Run this script by clicking on the Source button.
02:06 The movies data frame and the other filtered data frames open in the Source window.
02:13 We will close all the data frames except movies.
02:19 In the Source window, scroll from left to right.
02:24 This will enable us to see the remaining columns of the movies data frame.
02:30 To select the required variables of a data frame, we will use the select function.
02:37 It helps us to select only those variables that are required.
02:43 Here, we will use the select function to select title, genre, and imdb rating for all the movies.
02:56 Click on the script myVis.R.
03:00 In the Source window, type the following command.
03:05 The first argument in the select function is the name of the data frame. Here, it is movies.
03:14 The other arguments are the variables that we want to select for all the movies.
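The transcript does not reproduce the typed command. A minimal sketch of what it probably looks like follows. It assumes the rating column is named imdb_rating, which is inferred from the narration. A small hypothetical data frame with made-up values stands in for moviesData.csv so that the snippet runs on its own:

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame (made-up values)
movies <- data.frame(
  title       = c("Movie A", "Movie B"),
  genre       = c("Drama", "Comedy"),
  runtime     = c(110, 95),
  imdb_rating = c(7.1, 6.4),
  stringsAsFactors = FALSE
)

# First argument: the data frame; remaining arguments: the columns to keep
moviesTGI <- select(movies, title, genre, imdb_rating)

# In RStudio, View(moviesTGI) opens it in the Source window; print() works anywhere
print(moviesTGI)
```

The runtime column is dropped because it is not named in the call.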
03:20 Save the script and run the last two lines of code by pressing the Ctrl + Enter keys simultaneously.
03:31 moviesTGI opens in the Source window.
03:36 Here, the title, genre, and imdb rating of all the movies are displayed.
03:46 Let us close moviesTGI data frame for now.
03:52 In the Source window, click on the movies data frame.
03:57 Scroll the data frame from right to left to see other columns.
04:04 In the data frame, we can see variables such as thtr_rel_day, thtr_rel_month, and thtr_rel_year.
04:27 These variables provide information about the day, month and year of the theater release of the movies.
04:38 Let us select these three variables along with the title of all the movies.
04:45 Please note that all the theater-related variable names start with t h t r.
04:53 Click on the script myVis.R.
04:57 In the Source window, type the following command.
05:02 Here, we have used the starts_with function.
05:08 It selects all the variables in the movies data frame whose names start with t h t r.
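The exact command is not shown. A plausible sketch pairs the title with the starts_with helper inside select. The data frame below is a hypothetical stand-in with made-up values, and the column order is an assumption:

```r
library(dplyr)

# Hypothetical stand-in with theater-release columns (made-up values)
movies <- data.frame(
  title          = c("Movie A", "Movie B"),
  genre          = c("Drama", "Comedy"),
  thtr_rel_year  = c(2001, 1995),
  thtr_rel_month = c(4, 11),
  thtr_rel_day   = c(12, 3),
  stringsAsFactors = FALSE
)

# starts_with("thtr") matches every column whose name begins with "thtr",
# returned in the order those columns appear in the data frame
moviesTHT <- select(movies, title, starts_with("thtr"))
print(moviesTHT)
```

This approach avoids listing each theater-release column by hand.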
05:16 Run the last two lines of code.
05:20 moviesTHT opens in the Source window.
05:25 Movies with their titles and theater-release information are shown.
05:33 Let us close moviesTHT data frame for now.
05:38 In the Source window, click on movies.
05:42 Let us change the name of the variable thtr_rel_year.
05:52 For that, we will use the rename function.
05:56 Click on the script myVis.R.
06:00 In the Source window, type the following command.
06:05 Here, we are changing the name of the variable thtr_rel_year to rel_year.
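The following is a minimal sketch of the rename step, assuming the result is stored in moviesR as the narration suggests. The two-column data frame is a hypothetical stand-in. In rename, the new name goes on the left of the equals sign:

```r
library(dplyr)

# Hypothetical stand-in (made-up values)
movies <- data.frame(
  title         = c("Movie A", "Movie B"),
  thtr_rel_year = c(2001, 1995),
  stringsAsFactors = FALSE
)

# rename(data, new_name = old_name); all other columns are kept unchanged
moviesR <- rename(movies, rel_year = thtr_rel_year)
print(names(moviesR))
```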
06:14 Run the last two lines of code.
06:19 moviesR opens in the Source window.
06:23 In the Source window, scroll from left to right.
06:28 Observe that the name of the variable thtr_rel_year has changed to rel_year.
06:41 Let us close the data frame moviesR for now.
06:46 In the Source window, click on movies.
06:50 In the Source window, scroll from left to right.
06:55 Suppose we want to add a new variable named CriAud to our movies data frame.
07:04 This variable should contain the difference between critics_score and audience_score. For this, we will use the mutate function.
07:16 The mutate function is used to add new variables while preserving the existing ones.
07:23 For simplicity, let us remove the variables that appear after audience_score in the movies data frame.
07:33 In the Source window, scroll from right to left.
07:38 We need to select the variables from title to audience_score.
07:46 For this, we will use the select function.
07:50 Click on the script myVis.R.
07:54 In the Source window, type the following command.
07:59 Run the current line.
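The transcript does not show this command either. The sketch below assumes that the trimmed result is assigned back to movies. The colon selects a contiguous range of columns. The data frame is a hypothetical stand-in, and extra_col is a made-up column that plays the role of the variables after audience_score:

```r
library(dplyr)

# Hypothetical stand-in; extra_col represents the columns after audience_score
movies <- data.frame(
  title          = c("Movie A", "Movie B"),
  critics_score  = c(80, 55),
  audience_score = c(72, 60),
  extra_col      = c("no", "no"),
  stringsAsFactors = FALSE
)

# title:audience_score keeps every column from title through audience_score, in order
movies <- select(movies, title:audience_score)
print(names(movies))
```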
08:02 Now, we will use the mutate function to add a new variable.
08:08 In the Source window, type the following command.
08:13 Remember, we are adding a new variable named CriAud to the movies data frame.
08:22 It stores the difference between critics_score and audience_score.
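A minimal sketch of the mutate step follows, assuming the result is stored in moviesMu as the narration suggests. The input is a hypothetical stand-in with made-up scores:

```r
library(dplyr)

# Hypothetical stand-in (made-up scores)
movies <- data.frame(
  title          = c("Movie A", "Movie B"),
  critics_score  = c(80, 55),
  audience_score = c(72, 60),
  stringsAsFactors = FALSE
)

# mutate() appends CriAud as the last column and keeps all existing columns
moviesMu <- mutate(movies, CriAud = critics_score - audience_score)
print(moviesMu)
```

A positive CriAud means critics rated the movie higher than audiences did.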
08:29 Run the last two lines of code.
08:33 moviesMu opens in the Source window.
08:37 In the Source window, scroll from left to right.
08:42 A new variable named CriAud is added.
08:48 Let us summarize what we have learnt.
08:52 In this tutorial, we have learnt about the following functions available in the dplyr package:
09:00 select
09:01 rename
09:02 mutate
09:04 We now suggest an assignment.
09:08 Use the built-in data set airquality. Using the select function, select the variables Ozone, Wind, and Temp from this data set.
09:20 Use the built-in data set mtcars. Rename the variables mpg and cyl to MilesPerGallon and Cylinder, respectively.
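Below is one possible way to approach the assignment. It assumes dplyr is installed. The object names aqSelected and mtRenamed are hypothetical. Both data sets come with R's datasets package:

```r
library(dplyr)

# Keep only the three requested columns of airquality
aqSelected <- select(airquality, Ozone, Wind, Temp)

# rename(data, new_name = old_name) for both columns at once
mtRenamed <- rename(mtcars, MilesPerGallon = mpg, Cylinder = cyl)

print(head(aqSelected))
print(head(mtRenamed))
```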
09:33 The video at the following link summarises the Spoken Tutorial project.
09:37 Please download and watch it.
09:41 We conduct workshops using Spoken Tutorials and give certificates.
09:46 Please contact us.
09:49 Please post your timed queries in this forum.
09:54 Please post your general queries in this forum.
09:59 The FOSSEE team coordinates the TBC project.
10:02 For more details, please visit these sites.
10:07 The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
10:13 The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
10:20 This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Sakinashaikh