R/C2/More-Functions-in-dplyr-Package/English


Latest revision as of 13:06, 3 September 2019

Title of the script: More functions in the dplyr package

Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, dplyr, select function, starts_with function, filter, rename function, mutate function, video tutorial

Visual Cue Narration
Show slide

Opening Slide

Welcome to this tutorial on More functions in the dplyr package.

Show slide

Learning Objective

In this tutorial, we will learn about the following functions in the dplyr package:
  • select
  • rename
  • mutate
Show slide

Pre-requisites

To understand this tutorial, you should know:
  • Basics of statistics
  • Basics of ggplot2 package
  • Data frames

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use
  • A data set moviesData.csv
  • A script file myVis.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight moviesData.csv and myVis.R in the folder DataVis

I have downloaded and moved these files to the DataVis folder.

This folder is located in the myProject folder on my Desktop.

I have also set the DataVis folder as my Working Directory.

Let us switch to RStudio.
Highlight myVis.R in the Files window of RStudio

Open the script myVis.R in RStudio.

Highlight filter and arrange in the Source window

We have already learnt how to use the filter and arrange functions in the dplyr package.

Highlight the Source button

Run this script by clicking on the Source button.

Highlight movies in the Source window

movies data frame and other filtered data frames open in the Source window.

We will close all the data frames except movies.

Highlight the scroll bar in the Source window

Click on the movies data frame >> scroll from left to right.

In the Source window, scroll from left to right.

This will enable us to see the remaining variables of the movies data frame.

Highlight movies in the Source window

To select the required variables of a data frame, we will use the select function.

It helps us to select only those variables that are required.

Highlight title, genre, and imdb_rating in the Source window

Here, we will use the select function to select title, genre, and imdb rating for all the movies.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.
[RStudio]

moviesTGI <- select(movies, title, genre, imdb_rating)

View(moviesTGI)

In the Source window, type the following command.
Highlight select in the Source window

The first argument in the select function is the name of the data frame.

Here it is movies.

Highlight select in the Source window

The other arguments are the variables which we will select for all the movies.
Click on the Save button to save the script.

Press Ctrl + Enter keys.

Save the script and run the last two lines of code by pressing Ctrl + Enter keys simultaneously.
Highlight moviesTGI in the Source window

moviesTGI opens in the Source window.

Highlight the columns of moviesTGI in the Source window

Here, title, genre, and imdb rating of all the movies are displayed.

Highlight moviesTGI in the Source window

Let us close the moviesTGI data frame for now.
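
The behaviour of select can also be tried without moviesData.csv. The following is only a minimal sketch: toyMovies and toyTGI are hypothetical names, and the values are made up for illustration, not taken from the tutorial's data.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame (made-up values)
toyMovies <- data.frame(
  title = c("Film A", "Film B", "Film C"),
  genre = c("Drama", "Comedy", "Drama"),
  runtime = c(120, 95, 110),
  imdb_rating = c(7.1, 6.4, 8.0),
  stringsAsFactors = FALSE
)

# First argument: the data frame; remaining arguments: the variables to keep
toyTGI <- select(toyMovies, title, genre, imdb_rating)

names(toyTGI)  # "title" "genre" "imdb_rating"
nrow(toyTGI)   # 3, because select drops columns and keeps every row
```

Note that runtime is dropped because it was not listed, while all three rows remain.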
Highlight movies in the Source window

In the Source window, click on the movies data frame.

Highlight the scroll bar in the Source window

Scroll the data frame from right to left to see other columns.

Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window

In the data frame, we can see variables like thtr_rel_day, thtr_rel_month, and thtr_rel_year.

These variables provide information about the day, month and year of the theater release of the movies.

Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window

Let us select these three variables along with the title of all the movies.

Please note that all the theater-related variable names start with t h t r.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.
[RStudio]

moviesTHT <- select(movies, title, starts_with("thtr"))

View(moviesTHT)

In the Source window, type the following command.
Highlight starts_with in the Source window

Here, we have used the starts_with function.

It selects all the variables in the movies data frame whose names start with t h t r.

Highlight run button in the Source window

Run the last two lines of code.

Highlight moviesTHT in the Source window

moviesTHT opens in the Source window.

Movies with their titles and theater-release information are shown.

Highlight moviesTHT in the Source window

Let us close the moviesTHT data frame for now.
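
To see how starts_with picks columns by prefix, here is a small sketch on made-up data. The names toyMovies and toyTHT, the dvd_rel_year column, and all values are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical data frame with three "thtr" columns and one other release column
toyMovies <- data.frame(
  title = c("Film A", "Film B"),
  genre = c("Drama", "Comedy"),
  thtr_rel_year = c(2010, 2012),
  thtr_rel_month = c(5, 11),
  thtr_rel_day = c(14, 2),
  dvd_rel_year = c(2011, 2013),
  stringsAsFactors = FALSE
)

# Keep title plus every variable whose name begins with "thtr"
toyTHT <- select(toyMovies, title, starts_with("thtr"))

names(toyTHT)  # "title" "thtr_rel_year" "thtr_rel_month" "thtr_rel_day"
```

The dvd_rel_year column is left out because its name does not begin with the given prefix, even though it contains "rel_year".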
Highlight movies in the Source window

In the Source window, click on movies.

Highlight thtr_rel_year in the Source window

Let us change the name of the variable thtr_rel_year.

For that, we will use the rename function.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.
[RStudio]

moviesR <- rename(movies, rel_year = "thtr_rel_year")

View(moviesR)

In the Source window, type the following command.
Highlight rename in the Source window

Here, we are changing the name of the variable thtr_rel_year to rel_year.

Highlight run button in the Source window

Run the last two lines of code.

Highlight moviesR in the Source window

moviesR opens in the Source window.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight rel_year in the Source window

Observe that the name of the variable thtr_rel_year has changed to rel_year.

Highlight moviesR in the Source window

Let us close the data frame moviesR for now.
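
The argument order in rename is new name on the left, old name on the right. The sketch below illustrates this on made-up data; toyMovies, toyR and toyRQuoted are hypothetical names, not part of myVis.R.

```r
library(dplyr)

# Hypothetical data frame (made-up values)
toyMovies <- data.frame(
  title = c("Film A", "Film B"),
  thtr_rel_year = c(2010, 2012),
  genre = c("Drama", "Comedy"),
  stringsAsFactors = FALSE
)

# new_name = old_name; the old name can be written without quotes
toyR <- rename(toyMovies, rel_year = thtr_rel_year)

# The quoted form of the old name, as used in this tutorial, gives the same result
toyRQuoted <- rename(toyMovies, rel_year = "thtr_rel_year")

names(toyR)  # "title" "rel_year" "genre"
```

The renamed variable stays in its original position, and toyMovies itself is not modified.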
Highlight movies in the Source window

In the Source window, click on movies.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight critics_score and audience_score in the Source window

Suppose we want to add a new variable named CriAud to our movies data frame.

This variable should contain the difference between critics_score and audience_score.

For this, we will use the mutate function.

The mutate function adds new variables to a data frame while preserving the existing ones.

Highlight audience_score in the Source window

For simplicity, let us remove the variables appearing after audience_score in the movies data frame.

Highlight the scroll bar in the Source window

In the Source window, scroll from right to left.

Highlight title in the Source window

We need to select the variables from title to audience_score.

For this, we will use the select function.

Highlight the script myVis.R in the Source window

Click on the script myVis.R.
[RStudio]

moviesLess <- select(movies, title:audience_score)

In the Source window, type the following command.
Highlight run button in the Source window

Run the current line.
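
The colon in title:audience_score selects a contiguous range of columns, so the result depends on the column order of the data frame. Here is a minimal sketch on made-up data, where toyMovies, toyLess and the best_pic_nom column are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data frame; the column order matters for range selection
toyMovies <- data.frame(
  title = c("Film A", "Film B"),
  genre = c("Drama", "Comedy"),
  critics_score = c(90, 45),
  audience_score = c(80, 60),
  best_pic_nom = c("no", "yes"),
  stringsAsFactors = FALSE
)

# Keep every column from title through audience_score, inclusive
toyLess <- select(toyMovies, title:audience_score)

names(toyLess)  # "title" "genre" "critics_score" "audience_score"
```

Everything positioned after audience_score, here best_pic_nom, is dropped.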
Now, we will use the mutate function to add a new variable.
[RStudio]

moviesMu <- mutate(moviesLess, CriAud = critics_score - audience_score)

View(moviesMu)

In the Source window, type the following command.
Highlight mutate in the Source window

Remember, we are adding a new variable named CriAud to the moviesLess data frame and storing the result in moviesMu.

This variable stores the difference between critics_score and audience_score.

Highlight run button in the Source window

Run the last two lines of code.

Highlight moviesMu in the Source window

moviesMu opens in the Source window.

Highlight the scroll bar in the Source window

In the Source window, scroll from left to right.

Highlight CriAud in the Source window

A new variable named CriAud is added.
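
The effect of mutate can be checked on a few rows by hand. The sketch below uses made-up scores; toyLess and toyMu are hypothetical names chosen to mirror moviesLess and moviesMu.

```r
library(dplyr)

# Hypothetical scores (made-up values)
toyLess <- data.frame(
  title = c("Film A", "Film B", "Film C"),
  critics_score = c(90, 45, 70),
  audience_score = c(80, 60, 70),
  stringsAsFactors = FALSE
)

# Add CriAud as a new column; the existing columns are preserved
toyMu <- mutate(toyLess, CriAud = critics_score - audience_score)

toyMu$CriAud    # 10 -15 0
names(toyLess)  # unchanged: "title" "critics_score" "audience_score"
```

A positive CriAud means the critics rated the movie higher than the audience did. The input data frame is left as it was, because mutate returns a new data frame.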
Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt about the following functions available in the dplyr package:
  • select
  • rename
  • mutate
Show slide

Assignment

We now suggest an assignment.
  • Use the built-in data set airquality. Using the select function, select the variables Ozone, Wind, and Temp in this data set.
  • Use the built-in data set mtcars. Rename the variables mpg and cyl to MilesPerGallon and Cylinder, respectively.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Varshit Dubey (CoE Pune).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst