Title of the script: More functions in the dplyr package
Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, dplyr, select function, starts_with function, filter, rename function, mutate function, video tutorial
Visual Cue | Narration |
Show slide
Opening Slide |
Welcome to this tutorial on More functions in the dplyr package. |
Show slide Learning Objective |
In this tutorial, we will learn about the following functions in the dplyr package:
|
Show slide Pre-requisites |
To understand this tutorial, you should know,
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
Install R version 3.2.0 or higher. |
Show slide
Download Files |
For this tutorial, we will use
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight moviesData.csv and myVis.R in the folder DataVis |
I have downloaded and moved these files to DataVis folder.
This folder is located in myProject folder on my Desktop. I have also set the DataVis folder as my Working Directory. |
Let us switch to RStudio. | |
Highlight myVis.R in the Files window of RStudio | Open the script myVis.R in RStudio. |
Highlight filter and arrange in the Source window | We have already learnt how to use the filter and arrange functions in the dplyr package. |
Highlight the Source button | Run this script by clicking on the Source button. |
Highlight movies in the Source window | The movies data frame and the other filtered data frames open in the Source window.
We will close all the data frames except movies. |
Highlight the scroll bar in the Source window
Click on the movies data frame >> scroll from left to right. |
In the Source window, scroll from left to right.
This will enable us to see the remaining variables of the movies data frame. |
Highlight movies in the Source window | To keep only the required variables of a data frame, we will use the select function.
It returns a data frame that contains just the variables we name. |
Highlight title, genre, and imdb_rating in the Source window | Here, we will use the select function to select
title, genre, and imdb rating for all the movies. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesTGI <- select(movies, title, genre, imdb_rating)
View(moviesTGI) |
In the Source window, type the following command. |
Highlight select in the Source window | The first argument in the select function is the name of the data frame. Here it is movies. |
Highlight select in the Source window | Other arguments are the variables which we will select for all the movies. |
Click on Save button to save the script.
Press Ctrl + Enter keys. |
Save the script and run the last two lines of code by pressing Ctrl + Enter keys simultaneously. |
Highlight moviesTGI in the Source window | moviesTGI opens in the Source window. |
Highlight the columns of moviesTGI in the Source window | Here, title, genre, and imdb rating of all the movies are displayed. |
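The select call described above can be tried without the tutorial's files. The following is a minimal sketch, assuming the dplyr package is installed. The data frame toyMovies and its values are hypothetical stand-ins for movies, and print is used instead of View so that the code also runs outside RStudio.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame (not the real moviesData.csv)
toyMovies <- data.frame(
  title       = c("Movie A", "Movie B", "Movie C"),
  genre       = c("Drama", "Comedy", "Drama"),
  runtime     = c(120, 95, 110),
  imdb_rating = c(7.1, 6.4, 8.0)
)

# First argument: the data frame. Remaining arguments: the variables to keep.
toyTGI <- select(toyMovies, title, genre, imdb_rating)

# Only the three selected variables remain; all rows are kept.
print(names(toyTGI))
print(nrow(toyTGI))
```

The runtime variable is dropped because it was not named in the call, while the number of rows is unchanged.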
Highlight moviesTGI in the Source window | Let us close moviesTGI data frame for now. |
Highlight movies in the Source window | In the Source window, click on movies data frame. |
Highlight the scroll bar in the Source window | Scroll the data frame from right to left to see other columns. |
Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window | In the data frame, we can see variables such as thtr_rel_day, thtr_rel_month and thtr_rel_year.
These variables provide information about the day, month and year of the theatre release of the movies. |
Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window | Let us select these three variables along with the title of all the movies.
Please note that the names of all the theatre-related variables start with t h t r. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R. |
[RStudio]
moviesTHT <- select(movies, title, starts_with("thtr"))
View(moviesTHT) |
In the Source window, type the following command. |
Highlight starts_with in the Source window | Here, we have used the starts_with function.
It selects all the variables in the movies data frame, whose names start with t h t r. |
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesTHT in the Source window | moviesTHT opens in the Source window.
Movies with their titles and theatre release information are shown. |
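To see how starts_with picks variables by prefix, here is a small sketch, assuming the dplyr package is installed. The data frame toyMovies and its values are hypothetical. The column dvd_rel_year is included only to show that a name without the thtr prefix is not selected.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
toyMovies <- data.frame(
  title          = c("Movie A", "Movie B"),
  genre          = c("Drama", "Comedy"),
  thtr_rel_year  = c(2010, 2012),
  thtr_rel_month = c(5, 11),
  thtr_rel_day   = c(14, 2),
  dvd_rel_year   = c(2011, 2013)
)

# Keep title plus every variable whose name begins with "thtr"
toyTHT <- select(toyMovies, title, starts_with("thtr"))

print(names(toyTHT))
```

The result holds title followed by the three thtr variables, in the order in which they appear in toyMovies.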
Highlight moviesTHT in the Source window | Let us close moviesTHT data frame for now. |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight thtr_rel_year in the Source window | Let us change the name of the variable thtr_rel_year.
For that, we will use the rename function. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesR <- rename(movies, rel_year = "thtr_rel_year")
View(moviesR) |
In the Source window, type the following command. |
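Highlight rename in the Source window | Here, we are changing the name of the variable thtr_rel_year to rel_year. In the rename function, the new name is written to the left of the equals sign and the existing name to the right. |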
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesR in the Source window | moviesR opens in the Source window. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight rel_year in the Source window | Observe that the name of the variable thtr_rel_year has changed to rel_year. |
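The renaming step can be checked on a small example. This is a sketch, assuming the dplyr package is installed. The data frame toyMovies is hypothetical, and the existing name is written unquoted here.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
toyMovies <- data.frame(
  title         = c("Movie A", "Movie B"),
  thtr_rel_year = c(2010, 2012),
  genre         = c("Drama", "Comedy")
)

# New name on the left of =, existing name on the right
toyR <- rename(toyMovies, rel_year = thtr_rel_year)

# The variable keeps its position and values; only its name changes.
print(names(toyR))

# The original data frame is not modified.
print(names(toyMovies))
```

Because rename returns a new data frame, toyMovies still has the variable thtr_rel_year after the call.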
Highlight moviesR in the Source window | Let us close the data frame moviesR for now. |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight critics_score and audience_score in the Source window | Suppose we want to add a new variable named CriAud to our movies data frame.
This variable should contain the difference between critics_score and audience_score. For this, we will use the mutate function. The mutate function adds new variables to a data frame and preserves the existing ones. |
Highlight audience_score in the Source window | For simplicity let us remove the variables appearing after audience_score in the movies data frame. |
Highlight the scroll bar in the Source window | In the Source window, scroll from right to left. |
Highlight title in the Source window | We need to select the variables from title to audience_score.
For this, we will use the select function. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesLess <- select(movies, title:audience_score) |
In the Source window, type the following command. |
Highlight run button in the Source window | Run the current line. |
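The colon between two variable names selects a contiguous range of columns. A minimal sketch follows, assuming the dplyr package is installed. The data frame toyMovies, its values, and the trailing columns director and studio are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
toyMovies <- data.frame(
  title          = c("Movie A", "Movie B"),
  genre          = c("Drama", "Comedy"),
  critics_score  = c(90, 40),
  audience_score = c(80, 55),
  director       = c("Director X", "Director Y"),
  studio         = c("Studio P", "Studio Q")
)

# title:audience_score keeps every column from title up to and
# including audience_score, in their original order
toyLess <- select(toyMovies, title:audience_score)

print(names(toyLess))
```

The columns that come after audience_score, here director and studio, are left out of toyLess.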
Now, we will use the mutate function to add a new variable. | |
[RStudio]
moviesMu <- mutate(moviesLess, CriAud = critics_score - audience_score)
View(moviesMu) |
In the Source window, type the following command. |
Highlight mutate in the Source window | Remember, we are adding a new variable CriAud to the moviesLess data frame and storing the result in moviesMu.
CriAud stores the difference between the critics score and the audience score. |
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesMu in the Source window | moviesMu opens in the Source window. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight CriAud in the Source window | A new variable named CriAud is added. |
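The effect of mutate can be verified on a few rows. This is a sketch, assuming the dplyr package is installed. The data frame toyLess and its scores are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the moviesLess data frame
toyLess <- data.frame(
  title          = c("Movie A", "Movie B", "Movie C"),
  critics_score  = c(90, 40, 65),
  audience_score = c(80, 55, 65)
)

# Add CriAud as the row-wise difference of the two scores;
# the existing variables are preserved
toyMu <- mutate(toyLess, CriAud = critics_score - audience_score)

print(toyMu$CriAud)   # 10 -15 0
print(names(toyMu))
```

A positive value of CriAud means the critics rated the movie higher than the audience did, and a negative value means the opposite.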
Let us summarize what we have learnt. | |
Show slide Summary |
In this tutorial, we have learnt about the following functions available in the dplyr package:
|
Show slide
Assignment |
We now suggest an assignment.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |