R/C2/More-Functions-in-dplyr-Package/English

From Script | Spoken-Tutorial
Revision as of 16:22, 28 August 2019 by Sudhakarst (Talk | contribs)


Title of the script: More functions in the dplyr package

Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, dplyr, select function, starts_with function, filter, rename function, mutate function, video tutorial


Visual Cue Narration
Show slide

Opening Slide

Welcome to this tutorial on More functions in the dplyr package.

Show slide

Learning Objective

In this tutorial, we will learn about the following functions in the dplyr package:
  • select
  • rename
  • mutate

Show slide

Pre-requisites

To understand this tutorial, you should know,
  • Basics of statistics
  • Basics of ggplot2 package
  • Data frames

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use
  • A data set, moviesData.csv
  • A script file, myVis.R

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight moviesData.csv and myVis.R in the folder DataVis

I have downloaded and moved these files to the DataVis folder.

This folder is located in the myProject folder on my Desktop.

I have also set the DataVis folder as my Working Directory.
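
The same setup can also be done from the R console. The following is only a sketch: the path is an assumption pieced together from the narration (a DataVis folder inside myProject on the Desktop), and the read.csv call is a guess at how myVis.R loads the data, not a copy of that script.

```r
# Assumed location, pieced together from the narration: Desktop/myProject/DataVis.
data_dir <- path.expand(file.path("~", "Desktop", "myProject", "DataVis"))

# Change the working directory only if the folder actually exists.
if (dir.exists(data_dir)) {
  setwd(data_dir)
  # Assumption: myVis.R loads the data roughly like this.
  if (file.exists("moviesData.csv")) {
    movies <- read.csv("moviesData.csv")
  }
}

# Show the directory R is currently working in.
print(getwd())
```

If the folder lives elsewhere on your machine, adjust data_dir accordingly.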

Let us switch to RStudio.
Highlight myVis.R in the Files window of RStudio

Open the script myVis.R in RStudio.

Highlight filter and arrange in the Source window

We have already learnt how to use the filter and arrange functions in the dplyr package.

Highlight the Source button

Run this script by clicking on the Source button.

Highlight movies in the Source window

The movies data frame and other filtered data frames open in the Source window.


We will close all the data frames except movies.

Highlight the scroll bar in the Source window


Click on the movies data frame >> scroll from left to right.

In the Source window, scroll from left to right.


This will enable us to see the remaining variables of the movies data frame.

Highlight movies in the Source window

To select the required variables of a data frame, we will use the select function.

It helps us to select only those variables that are required.

Highlight title, genre, and imdb_rating in the Source window

Here, we will use the select function to select title, genre, and imdb rating for all the movies.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.

[RStudio]

moviesTGI <- select(movies, title, genre, imdb_rating)

View(moviesTGI)

In the Source window, type the following command.
Highlight select in the Source window

The first argument in the select function is the name of the data frame. Here it is movies.

Highlight select in the Source window

The other arguments are the variables which we will select for all the movies.

Click on the Save button to save the script.

Press Ctrl + Enter keys.

Save the script and run the last two lines of code by pressing Ctrl + Enter keys simultaneously.
Highlight moviesTGI in the Source window

moviesTGI opens in the Source window.

Highlight the columns of moviesTGI in the Source window

Here, title, genre, and imdb rating of all the movies are displayed.
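
To try the select function without the tutorial's data file, here is a minimal sketch. The data frame toy_movies and its values are hypothetical stand-ins for movies; only the column names title, genre, and imdb_rating come from the script.

```r
library(dplyr)

# Hypothetical miniature stand-in for the movies data frame.
toy_movies <- data.frame(
  title = c("Movie A", "Movie B", "Movie C"),
  genre = c("Drama", "Comedy", "Drama"),
  runtime = c(120, 95, 110),
  imdb_rating = c(7.1, 6.4, 8.0)
)

# Keep only the three named columns; every row is retained.
toy_TGI <- select(toy_movies, title, genre, imdb_rating)

print(names(toy_TGI))
print(nrow(toy_TGI))
```

The runtime column is dropped because it was not named, while the number of rows stays the same.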
Highlight moviesTGI in the Source window

Let us close the moviesTGI data frame for now.

Highlight movies in the Source window

In the Source window, click on the movies data frame.

Highlight the scroll bar in the Source window

Scroll the data frame from right to left to see other columns.

Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window

In the data frame, we can see variables such as thtr_rel_day, thtr_rel_month, and thtr_rel_year.


These variables provide information about the day, month and year of the theatre release of the movies.

Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window

Let us select these three variables along with the title of all the movies.


Please note that all the theatre related variable names start with t h t r.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.

[RStudio]

moviesTHT <- select(movies, title, starts_with("thtr"))

View(moviesTHT)

In the Source window, type the following command.
Highlight starts_with in the Source window

Here, we have used the starts_with function.

It selects all the variables in the movies data frame, whose names start with t h t r.

Highlight run button in the Source window

Run the last two lines of code.

Highlight moviesTHT in the Source window

moviesTHT opens in the Source window.

Movies with their titles and theatre release information are shown.
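
The prefix matching done by starts_with can be sketched on a small example. The data frame toy_movies below and its values are hypothetical; the thtr_ column names follow the script, and dvd_rel_year is included here only to show a column that does not match the prefix.

```r
library(dplyr)

# Hypothetical miniature stand-in for the movies data frame.
toy_movies <- data.frame(
  title = c("Movie A", "Movie B"),
  thtr_rel_year = c(2010, 2014),
  thtr_rel_month = c(5, 11),
  thtr_rel_day = c(21, 7),
  dvd_rel_year = c(2011, 2015)
)

# Keep title plus every column whose name begins with "thtr".
toy_THT <- select(toy_movies, title, starts_with("thtr"))

print(names(toy_THT))
```

Because dvd_rel_year does not begin with thtr, it is left out of the result.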

Highlight moviesTHT in the Source window

Let us close the moviesTHT data frame for now.

Highlight movies in the Source window

In the Source window, click on movies.

Highlight thtr_rel_year in the Source window

Let us change the name of the variable thtr_rel_year.

For that, we will use the rename function.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.

[RStudio]

moviesR <- rename(movies, rel_year = "thtr_rel_year")

View(moviesR)

In the Source window, type the following command.
Highlight rename in the Source window

Here, we are changing the name of the variable thtr_rel_year to rel_year. In the rename function, the new name is written to the left of the equals sign and the existing name to the right.

Highlight run button in the Source window

Run the last two lines of code.

Highlight moviesR in the Source window

moviesR opens in the Source window.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight rel_year in the Source window

Observe that the name of the variable thtr_rel_year has changed to rel_year.
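
A small sketch of the rename function follows, using the same quoted style as the script. The data frame toy_movies and its values are hypothetical.

```r
library(dplyr)

# Hypothetical miniature stand-in for the movies data frame.
toy_movies <- data.frame(
  title = c("Movie A", "Movie B"),
  thtr_rel_year = c(2010, 2014)
)

# New name on the left, existing name on the right.
toy_R <- rename(toy_movies, rel_year = "thtr_rel_year")

print(names(toy_R))
print(names(toy_movies))
```

Note that rename returns a new data frame. The original toy_movies keeps its old column name unless the result is assigned back to it.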
Highlight moviesR in the Source window

Let us close the data frame moviesR for now.

Highlight movies in the Source window

In the Source window, click on movies.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight critics_score and audience_score in the Source window

Suppose we want to add a new variable named CriAud to our movies data frame.

This variable should contain the difference between critics_score and audience_score.

For this, we will use the mutate function.

The mutate function adds new variables while preserving the existing ones.

Highlight audience_score in the Source window

For simplicity, let us remove the variables appearing after audience_score in the movies data frame.

Highlight the scroll bar in the Source window

In the Source window, scroll from right to left.

Highlight title in the Source window

We need to select the variables from title to audience_score.

For this, we will use the select function.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.

[RStudio]

moviesLess <- select(movies, title:audience_score)

In the Source window, type the following command.
Highlight run button in the Source window

Run the current line.
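
The colon form title:audience_score selects a run of adjacent columns, so the result depends on the column order in the data frame. A minimal sketch, where toy_movies, its values, and the trailing column best_pic_win are hypothetical:

```r
library(dplyr)

# Hypothetical miniature stand-in for the movies data frame.
toy_movies <- data.frame(
  title = c("Movie A", "Movie B"),
  genre = c("Drama", "Comedy"),
  critics_score = c(80, 45),
  audience_score = c(75, 60),
  best_pic_win = c("no", "no")
)

# Keep every column from title through audience_score, in their existing order.
toy_Less <- select(toy_movies, title:audience_score)

print(names(toy_Less))
```

Any column positioned after audience_score, such as best_pic_win here, is dropped.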
Now, we will use the mutate function to add a new variable.
[RStudio]

moviesMu <- mutate(moviesLess, CriAud = critics_score - audience_score)

View(moviesMu)

In the Source window, type the following command.
Highlight mutate in the Source window

Remember, we are adding a new variable CriAud to the moviesLess data frame and storing the result in moviesMu.

This is to store the difference of critics score and audience score.

Highlight run button in the Source window

Run the last two lines of code.

Highlight moviesMu in the Source window

moviesMu opens in the Source window.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight CriAud in the Source window

A new variable named CriAud is added.
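
The behaviour of mutate can be checked on a small example. This is only a sketch: the data frame toy_less and its scores are hypothetical, while the column names and the CriAud formula follow the script.

```r
library(dplyr)

# Hypothetical miniature stand-in for the moviesLess data frame.
toy_less <- data.frame(
  title = c("Movie A", "Movie B", "Movie C"),
  critics_score = c(80, 45, 92),
  audience_score = c(75, 60, 92)
)

# Add CriAud as a new last column; the existing columns are kept unchanged.
toy_Mu <- mutate(toy_less, CriAud = critics_score - audience_score)

print(toy_Mu)
```

A positive CriAud means the critics rated the movie higher than the audience did, and a negative value means the opposite.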
Let us summarize what we have learnt.

Show slide

Summary

In this tutorial, we have learnt about the following functions available in the dplyr package:
  • select
  • rename
  • mutate
Show slide

Assignment

We now suggest an assignment.
  • Use the built-in data set airquality. Using the select function, select the variables Ozone, Wind, and Temp in this data set.
  • Use the built-in data set mtcars. Rename the variables mpg and cyl to MilesPerGallon and Cylinder, respectively.


Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Varshit Dubey (CoE Pune).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst