Difference between revisions of "R/C2/More-Functions-in-dplyr-Package/English"
Revision as of 13:16, 29 August 2019
Title of the script: More functions in the dplyr package
Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, dplyr, select function, starts_with function, filter, rename function, mutate function, video tutorial
Visual Cue | Narration |
Show slide
Opening Slide |
Welcome to this tutorial on More functions in the dplyr package. |
Show slide Learning Objective |
In this tutorial, we will learn about the following functions in the dplyr package:
* select
* rename
* mutate |
Show slide Pre-requisites |
To understand this tutorial, you should know,
* Basics of statistics
* Basics of ggplot2 package
* Data frames
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
Install R version 3.2.0 or higher. |
Show slide
Download Files |
For this tutorial, we will use
* A data frame moviesData.csv
* A script file myVis.R.
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight moviesData.csv and myVis.R in the folder DataVis |
I have downloaded and moved these files to DataVis folder.
This folder is located in myProject folder on my Desktop. I have also set the DataVis folder as my Working Directory. |
Let us switch to RStudio. | |
Highlight myVis.R in the Files window of RStudio | Open the script myVis.R in RStudio. |
Highlight filter and arrange in the Source window | We have already learnt how to use the filter and arrange functions in the dplyr package. |
Highlight the Source button | Run this script by clicking on the Source button. |
Highlight movies in the Source window | movies data frame and other filtered data frames open in the Source window.
We will close all the data frames except movies. |
Highlight the scroll bar in the Source window
Click on the movies data frame >> scroll from left to right. |
In the Source window, scroll from left to right.
This will enable us to see the remaining variables of the movies data frame. |
Highlight movies in the Source window | To select the required variables of a data frame we will use the select function.
It helps us to select only those variables that are required. |
Highlight title, genre, and imdb_rating in the Source window | Here, we will use the select function to select title, genre, and imdb rating for all the movies. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesTGI <- select(movies, title, genre, imdb_rating)
View(moviesTGI) |
In the Source window, type the following command. |
Highlight select in the Source window | The first argument in the select function is the name of the data frame.
Here it is movies. |
Highlight select in the Source window | Other arguments are the variables which we will select for all the movies. |
Click on Save button to save the script.
Press Ctrl + Enter keys. |
Save the script and run the last two lines of code by pressing Ctrl + Enter keys simultaneously. |
Highlight moviesTGI in the Source window | moviesTGI opens in the Source window. |
Highlight the columns of moviesTGI in the Source window | Here, title, genre, and imdb rating of all the movies are displayed. |
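The behaviour of select described above can be tried on a small data frame. This is only a minimal sketch: movies_demo and its values are hypothetical stand-ins, not the contents of moviesData.csv, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame (not the real moviesData.csv).
movies_demo <- data.frame(
  title = c("Film A", "Film B", "Film C"),
  genre = c("Drama", "Comedy", "Drama"),
  runtime = c(100, 95, 120),
  imdb_rating = c(7.1, 6.4, 8.0)
)

# First argument: the data frame. Remaining arguments: the variables to keep.
demo_tgi <- select(movies_demo, title, genre, imdb_rating)

# Only the three requested variables remain; runtime is dropped.
print(names(demo_tgi))
```

Note that select returns a new data frame; movies_demo itself is left unchanged.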
Highlight moviesTGI in the Source window | Let us close moviesTGI data frame for now. |
Highlight movies in the Source window | In the Source window, click on movies data frame. |
Highlight the scroll bar in the Source window | Scroll the data frame from right to left to see other columns. |
Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window | In the data frame, we can see variables such as thtr_rel_day, thtr_rel_month, and thtr_rel_year.
These variables provide information about the day, month and year of the theater release of the movies. |
Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window | Let us select these three variables along with the title of all the movies.
Please note that all the theater-related variable names start with t h t r. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R. |
[RStudio]
moviesTHT <- select(movies, title, starts_with("thtr"))
View(moviesTHT) |
In the Source window, type the following command. |
Highlight starts_with in the Source window | Here, we have used the starts_with function.
It selects all the variables in the movies data frame whose names start with t h t r. |
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesTHT in the Source window | moviesTHT opens in the Source window.
Movies with their titles and theater-release information are shown. |
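The effect of starts_with can be sketched on a toy data frame. The data frame movies_demo and its values are hypothetical and used only for illustration; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in data; the column names mimic the thtr_ prefix convention.
movies_demo <- data.frame(
  title = c("Film A", "Film B"),
  genre = c("Drama", "Comedy"),
  thtr_rel_year = c(2010, 2012),
  thtr_rel_month = c(5, 11),
  thtr_rel_day = c(14, 2),
  dvd_rel_year = c(2011, 2013)
)

# Keep title plus every variable whose name begins with "thtr".
demo_tht <- select(movies_demo, title, starts_with("thtr"))

# genre and dvd_rel_year are not selected because they do not match the prefix.
print(names(demo_tht))
```

Using a prefix helper avoids typing each theater-release variable by name.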
Highlight moviesTHT in the Source window | Let us close moviesTHT data frame for now. |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight thtr_rel_year in the Source window | Let us change the name of the variable thtr_rel_year.
For that, we will use the rename function. |
Highlight the script myVis.R in the Source window. | Click on the script myVis.R |
[RStudio]
moviesR <- rename(movies, rel_year = "thtr_rel_year")
View(moviesR) |
In the Source window, type the following command. |
Highlight rename in the Source window | Here, we are changing the name of the variable thtr_rel_year. |
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesR in the Source window | moviesR opens in the Source window. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight rel_year in the Source window | Observe that the name of the variable thtr_rel_year has changed to rel_year. |
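A small sketch of rename on hypothetical data may help to fix the argument order, which is new name on the left and old name on the right. The data frame movies_demo is an assumption for illustration, and the old name is written unquoted here.

```r
library(dplyr)

# Hypothetical stand-in data (not the real moviesData.csv).
movies_demo <- data.frame(
  title = c("Film A", "Film B"),
  thtr_rel_year = c(2010, 2012)
)

# Syntax: rename(data, new_name = old_name).
demo_r <- rename(movies_demo, rel_year = thtr_rel_year)

# The renamed copy has rel_year; the original data frame keeps thtr_rel_year.
print(names(demo_r))
print(names(movies_demo))
```

All other variables and all values are kept as they are; only the name changes.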
Highlight moviesR in the Source window | Let us close the data frame moviesR for now. |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight critics_score and audience_score in the Source window | Suppose we want to add a new variable named CriAud to our movies data frame.
This variable should contain the difference between critics_score and audience_score.
For this, we will use the mutate function.
The mutate function adds a new variable while preserving the existing ones. |
Highlight audience_score in the Source window | For simplicity let us remove the variables appearing after audience_score in the movies data frame. |
Highlight the scroll bar in the Source window | In the Source window, scroll from right to left. |
Highlight title in the Source window | We need to select the variables from title to audience_score.
For this, we will use the select function. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesLess <- select(movies, title:audience_score) |
In the Source window, type the following command. |
Highlight run button in the Source window | Run the current line. |
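The colon notation used in this command selects a contiguous range of variables by position in the data frame. A minimal sketch on hypothetical data (movies_demo and its columns are assumptions, not the real file):

```r
library(dplyr)

# Hypothetical stand-in data with one extra variable after audience_score.
movies_demo <- data.frame(
  title = c("Film A", "Film B"),
  genre = c("Drama", "Comedy"),
  critics_score = c(90, 55),
  audience_score = c(80, 60),
  director = c("Director X", "Director Y")
)

# title:audience_score keeps every variable from title up to and
# including audience_score, in their existing order.
demo_less <- select(movies_demo, title:audience_score)

# director comes after audience_score, so it is dropped.
print(names(demo_less))
```

The range depends on the column order of the data frame, so it is worth checking the order before relying on it.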
Now, we will use the mutate function to add a new variable. | |
[RStudio]
moviesMu <- mutate(moviesLess, CriAud = critics_score - audience_score)
View(moviesMu) |
In the Source window, type the following command. |
Highlight mutate in the Source window | Remember, we are adding a new variable named CriAud to the moviesLess data frame and storing the result in moviesMu.
CriAud stores the difference between critics score and audience score. |
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesMu in the Source window | moviesMu opens in the Source window. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight CriAud in the Source window | A new variable named CriAud is added. |
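The same mutate call can be checked by hand on a few rows. This is a sketch with hypothetical scores; demo_less stands in for moviesLess and is not taken from the real data.

```r
library(dplyr)

# Hypothetical scores chosen so the differences are easy to verify by hand.
demo_less <- data.frame(
  title = c("Film A", "Film B", "Film C"),
  critics_score = c(90, 55, 70),
  audience_score = c(80, 60, 70)
)

# Add CriAud as critics_score minus audience_score; existing variables are kept.
demo_mu <- mutate(demo_less, CriAud = critics_score - audience_score)

# CriAud is appended as the last column: 10, -5, 0.
print(demo_mu)
```

A positive CriAud means critics rated the movie higher than the audience did, and a negative value means the opposite.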
Let us summarize what we have learnt. | |
Show slide Summary |
In this tutorial, we have learnt about the following functions available in the dplyr package:
* select
* rename
* mutate |
Show slide
Assignment |
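We now suggest an assignment.
* Use the built-in data set airquality. Using the select function, select the variables Ozone, Wind, and Temp in this data set.
* Use the built-in data set mtcars. Rename the variables mpg and cyl to MilesPerGallon and Cylinder, respectively. |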
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |