R/C2/More-Functions-in-dplyr-Package/English

From Script | Spoken-Tutorial

Revision as of 13:16, 29 August 2019

Title of the script: More functions in the dplyr package

Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, dplyr, select function, starts_with function, filter, rename function, mutate function, video tutorial

Visual Cue Narration
Show slide

Opening Slide

Welcome to this tutorial on More functions in the dplyr package.

Show slide

Learning Objective

In this tutorial, we will learn about the following functions in the dplyr package:
  • select
  • rename
  • mutate

Show slide

Pre-requisites

To understand this tutorial, you should know,
  • Basics of statistics
  • Basics of ggplot2 package
  • Data frames

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use
  • A data frame moviesData.csv
  • A script file myVis.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight moviesData.csv and myVis.R in the folder DataVis

I have downloaded and moved these files to the DataVis folder.

This folder is located in the myProject folder on my Desktop.

I have also set the DataVis folder as my Working Directory.

Let us switch to RStudio.
Highlight myVis.R in the Files window of RStudio

Open the script myVis.R in RStudio.

Highlight filter and arrange in the Source window

We have already learnt how to use the filter and arrange functions in the dplyr package.

Highlight the Source button

Run this script by clicking on the Source button.

Highlight movies in the Source window

The movies data frame and other filtered data frames open in the Source window.

We will close all the data frames except movies.

Highlight the scroll bar in the Source window

Click on the movies data frame >> scroll from left to right.

In the Source window, scroll from left to right.

This will enable us to see the remaining variables of the movies data frame.

Highlight movies in the Source window

To select the required variables of a data frame, we will use the select function.

It helps us to select only those variables that are required.

Highlight title, genre, and imdb_rating in the Source window

Here, we will use the select function to select title, genre, and imdb rating for all the movies.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.
[RStudio]

moviesTGI <- select(movies,

title, genre, imdb_rating)

View(moviesTGI)

In the Source window, type the following command.
Highlight select in the Source window

The first argument in the select function is the name of the data frame.

Here it is movies.

Highlight select in the Source window

The other arguments are the variables which we will select for all the movies.
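A minimal sketch of the same select call, assuming a small hypothetical movies data frame in place of moviesData.csv (which is not reproduced here). The column values are made up; only the column names follow the tutorial.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame read from moviesData.csv
movies <- data.frame(
  title = c("Movie A", "Movie B", "Movie C"),
  genre = c("Drama", "Comedy", "Action"),
  runtime = c(110, 95, 130),
  imdb_rating = c(7.2, 6.1, 8.0),
  stringsAsFactors = FALSE
)

# First argument: the data frame; remaining arguments: the columns to keep
moviesTGI <- select(movies, title, genre, imdb_rating)

# Only the listed columns remain, in the order given; every row is kept
names(moviesTGI)
nrow(moviesTGI)
```

Here runtime is dropped because it is not named in the call.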
Click on the Save button to save the script.

Press Ctrl + Enter keys.

Save the script and run the last two lines of code by pressing Ctrl + Enter keys simultaneously.
Highlight moviesTGI in the Source window

moviesTGI opens in the Source window.

Highlight the columns of moviesTGI in the Source window

Here, title, genre, and imdb rating of all the movies are displayed.

Highlight moviesTGI in the Source window

Let us close the moviesTGI data frame for now.

Highlight movies in the Source window

In the Source window, click on the movies data frame.

Highlight the scroll bar in the Source window

Scroll the data frame from right to left to see other columns.

Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window

In the data frame, we can see variables such as thtr_rel_day, thtr_rel_month and thtr_rel_year.

These variables provide information about the day, month and year of the theater release of the movies.

Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window

Let us select these three variables along with the title of all the movies.

Please note that all the theater-related variable names start with t h t r.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.
[RStudio]

moviesTHT <- select(movies, title,

starts_with("thtr"))

View(moviesTHT)

In the Source window, type the following command.
Highlight starts_with in the Source window

Here, we have used the starts_with function.

It selects all the variables in the movies data frame whose names start with t h t r.
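A minimal sketch of starts_with, again assuming a small hypothetical movies data frame. The extra dvd_rel_year column is an assumption added only to show that a different prefix is not matched. ends_with is a related selection helper from the same family.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
movies <- data.frame(
  title = c("Movie A", "Movie B"),
  thtr_rel_year = c(2001, 2010),
  thtr_rel_month = c(5, 11),
  thtr_rel_day = c(12, 3),
  dvd_rel_year = c(2002, 2011),
  stringsAsFactors = FALSE
)

# title plus every column whose name begins with "thtr"
moviesTHT <- select(movies, title, starts_with("thtr"))
names(moviesTHT)

# ends_with() matches on the end of the name instead,
# so it picks up both thtr_rel_year and dvd_rel_year
moviesYear <- select(movies, ends_with("_year"))
names(moviesYear)
```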

Highlight the Run button in the Source window

Run the last two lines of code.

Highlight moviesTHT in the Source window

moviesTHT opens in the Source window.

Movies with their titles and theater-release information are shown.

Highlight moviesTHT in the Source window

Let us close the moviesTHT data frame for now.

Highlight movies in the Source window

In the Source window, click on movies.

Highlight thtr_rel_year in the Source window

Let us change the name of the variable thtr_rel_year.

For that, we will use the rename function.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.
[RStudio]

moviesR <- rename(movies,

rel_year = "thtr_rel_year")

View(moviesR)

In the Source window, type the following command.
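Highlight rename in the Source window

Here, we are changing the name of the variable thtr_rel_year to rel_year.

In rename, the new name goes on the left of the equals sign and the existing name on the right.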
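A minimal sketch of rename on a small hypothetical movies data frame. The new name is written with an unquoted old column name, which is the common dplyr idiom. The sketch checks that only the name changes while the values and the other columns stay as they were.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
movies <- data.frame(
  title = c("Movie A", "Movie B"),
  thtr_rel_year = c(2001, 2010),
  stringsAsFactors = FALSE
)

# new_name = old_name; all other columns are kept unchanged
moviesR <- rename(movies, rel_year = thtr_rel_year)

names(moviesR)
moviesR$rel_year
```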
Highlight the Run button in the Source window

Run the last two lines of code.

Highlight moviesR in the Source window

moviesR opens in the Source window.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight rel_year in the Source window

Observe that the name of the variable thtr_rel_year has changed to rel_year.

Highlight moviesR in the Source window

Let us close the data frame moviesR for now.

Highlight movies in the Source window

In the Source window, click on movies.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight critics_score and audience_score in the Source window

Suppose we want to add a new variable named CriAud to our movies data frame.

This variable should contain the difference between critics_score and audience_score.

For this, we will use the mutate function.

The mutate function adds new variables and preserves the existing ones.

Highlight audience_score in the Source window

For simplicity, let us remove the variables that appear after audience_score in the movies data frame.

Highlight the scroll bar in the Source window

In the Source window, scroll from right to left.

Highlight title in the Source window

We need to select the variables from title to audience_score.

For this, we will use the select function.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.
[RStudio]

moviesLess <- select(movies,

title:audience_score)

In the Source window, type the following command.
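A minimal sketch of selecting a range of columns with the colon operator, assuming a small hypothetical movies data frame whose column order mirrors the tutorial. best_pic_nom is an assumed extra column that sits after audience_score and is therefore dropped.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
movies <- data.frame(
  title = c("Movie A", "Movie B"),
  genre = c("Drama", "Comedy"),
  critics_score = c(80, 45),
  audience_score = c(70, 60),
  best_pic_nom = c("no", "yes"),
  stringsAsFactors = FALSE
)

# title:audience_score keeps every column from title up to and including
# audience_score, in their original positions
moviesLess <- select(movies, title:audience_score)
names(moviesLess)
```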
Highlight the Run button in the Source window

Run the current line.
Now, we will use the mutate function to add a new variable.
[RStudio]

moviesMu <- mutate(moviesLess,

CriAud = critics_score - audience_score)

View(moviesMu)

In the Source window, type the following command.
Highlight mutate in the Source window

Remember, we are adding a new variable named CriAud to moviesLess, the reduced copy of the movies data frame.

It stores the difference between critics score and audience score.
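A minimal sketch of the mutate step, assuming a small hypothetical moviesLess data frame with made-up scores. It shows that CriAud is computed row by row and appended as the last column, while the existing columns are kept.

```r
library(dplyr)

# Hypothetical stand-in for moviesLess
moviesLess <- data.frame(
  title = c("Movie A", "Movie B", "Movie C"),
  critics_score = c(80, 45, 90),
  audience_score = c(70, 60, 90),
  stringsAsFactors = FALSE
)

# CriAud = critics_score - audience_score, computed for every row;
# mutate() appends it after the existing columns
moviesMu <- mutate(moviesLess, CriAud = critics_score - audience_score)

names(moviesMu)
moviesMu$CriAud  # 10 -15 0
```

A positive CriAud means the critics scored the movie higher than the audience did.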

Highlight the Run button in the Source window

Run the last two lines of code.

Highlight moviesMu in the Source window

moviesMu opens in the Source window.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight CriAud in the Source window

A new variable named CriAud is added.
Let us summarize what we have learnt.

Show slide

Summary

In this tutorial, we have learnt about the following functions available in the dplyr package:
  • select
  • rename
  • mutate
Show slide

Assignment

We now suggest an assignment.
  • Use the built-in data set airquality. Using the select function, select the variables Ozone, Wind, and Temp from this data set.
  • Use the built-in data set mtcars. Rename the variables mpg and cyl to MilesPerGallon and Cylinder, respectively.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Varshit Dubey (CoE Pune).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst