Difference between revisions of "R/C2/More-Functions-in-dplyr-Package/English"
Latest revision as of 13:06, 3 September 2019
Title of the script: More functions in the dplyr package
Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, dplyr, select function, starts_with function, filter, rename function, mutate function, video tutorial
Visual Cue | Narration |
Show slide
Opening Slide |
Welcome to this tutorial on More functions in the dplyr package. |
Show slide
Learning Objective |
In this tutorial, we will learn about the following functions in the dplyr package:
|
Show slide
Pre-requisites |
To understand this tutorial, you should know,
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
* Ubuntu Linux OS version 16.04
* R version 3.4.4
* RStudio version 1.1.463
Install R version 3.2.0 or higher. |
Show slide
Download Files |
For this tutorial, we will use
* moviesData.csv
* myVis.R
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight moviesData.csv and myVis.R in the folder DataVis |
I have downloaded and moved these files to the DataVis folder.
This folder is located in the myProject folder on my Desktop. I have also set the DataVis folder as my Working Directory. |
Let us switch to RStudio. | |
Highlight myVis.R in the Files window of RStudio | Open the script myVis.R in RStudio. |
Highlight filter and arrange in the Source window | We have already learnt how to use the filter and arrange functions in the dplyr package. |
Highlight the Source button | Run this script by clicking on the Source button. |
Highlight movies in the Source window | The movies data frame and the other filtered data frames open in the Source window.
We will close all the data frames except movies. |
Highlight the scroll bar in the Source window
Click on the movies data frame >> scroll from left to right. |
In the Source window, scroll from left to right.
This will enable us to see the remaining variables of the movies data frame. |
Highlight movies in the Source window | To select the required variables of a data frame, we will use the select function.
It helps us to select only those variables that are required. |
Highlight title, genre, and imdb_rating in the Source window | Here, we will use the select function to select title, genre, and imdb rating for all the movies. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R. |
[RStudio]
moviesTGI <- select(movies, title, genre, imdb_rating)
View(moviesTGI) |
In the Source window, type the following command. |
Highlight select in the Source window | The first argument in the select function is the name of the data frame.
Here it is movies. |
Highlight select in the Source window | Other arguments are the variables which we will select for all the movies. |
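The argument order described above can be tried on a small data frame. The following is a minimal sketch, assuming the dplyr package is installed; toyMovies and its values are hypothetical stand-ins and are not taken from moviesData.csv.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame (not the real moviesData.csv)
toyMovies <- data.frame(
  title = c("Film A", "Film B", "Film C"),
  genre = c("Drama", "Comedy", "Drama"),
  runtime = c(120, 95, 110),
  imdb_rating = c(7.1, 6.4, 8.0)
)

# First argument: the data frame. Remaining arguments: the variables to keep.
toyTGI <- select(toyMovies, title, genre, imdb_rating)

# All rows are kept; only the named variables remain
print(names(toyTGI))
print(nrow(toyTGI))
```

Note that select returns a new data frame and leaves toyMovies unchanged, which is why the result is assigned to a new name.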
Click on Save button to save the script.
Press Ctrl + Enter keys. |
Save the script and run the last two lines of code by pressing Ctrl + Enter keys simultaneously. |
Highlight moviesTGI in the Source window | moviesTGI opens in the Source window. |
Highlight the columns of moviesTGI in the Source window | Here, title, genre, and imdb rating of all the movies are displayed. |
Highlight moviesTGI in the Source window | Let us close the moviesTGI data frame for now. |
Highlight movies in the Source window | In the Source window, click on the movies data frame. |
Highlight the scroll bar in the Source window | Scroll the data frame from right to left to see other columns. |
Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window | In the data frame, we can see variables such as thtr_rel_day, thtr_rel_month, and thtr_rel_year.
These variables provide information about the day, month and year of the theater release of the movies. |
Highlight thtr_rel_year, thtr_rel_month, thtr_rel_day in the Source window | Let us select these three variables along with the title of all the movies.
Please note that all the theater-related variable names start with t h t r. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R. |
[RStudio]
moviesTHT <- select(movies, title, starts_with("thtr"))
View(moviesTHT) |
In the Source window, type the following command. |
Highlight starts_with in the Source window | Here, we have used the starts_with function.
It selects all the variables in the movies data frame whose names start with t h t r. |
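The behaviour of starts_with can be checked on a small example. This is a sketch under the assumption that dplyr is installed; toyMovies is a hypothetical data frame whose column names only imitate those of the movies data frame.

```r
library(dplyr)

# Hypothetical data frame; only the column names mirror the tutorial's data
toyMovies <- data.frame(
  title = c("Film A", "Film B"),
  runtime = c(120, 95),
  thtr_rel_year = c(2010, 2012),
  thtr_rel_month = c(5, 11),
  thtr_rel_day = c(14, 2)
)

# Keep title plus every variable whose name begins with "thtr"
toyTHT <- select(toyMovies, title, starts_with("thtr"))

# runtime is dropped because its name does not start with "thtr"
print(names(toyTHT))
```

starts_with is a selection helper, so it is meant to be used inside select rather than called on its own.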
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesTHT in the Source window | moviesTHT opens in the Source window.
Movies with their titles and theater-release information are shown. |
Highlight moviesTHT in the Source window | Let us close the moviesTHT data frame for now. |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight thtr_rel_year in the Source window | Let us change the name of the variable thtr_rel_year.
For that, we will use the rename function. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R. |
[RStudio]
moviesR <- rename(movies, rel_year = "thtr_rel_year")
View(moviesR) |
In the Source window, type the following command. |
Highlight rename in the Source window | Here, we are changing the name of the variable thtr_rel_year to rel_year. The new name is written to the left of the equals sign and the old name to the right. |
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesR in the Source window | moviesR opens in the Source window. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight rel_year in the Source window | Observe that the name of the variable thtr_rel_year has changed to rel_year. |
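The renaming step can be reproduced on a small data frame. The sketch below assumes dplyr is installed and uses a hypothetical toyMovies data frame. It writes the old name without quotes, which is a commonly seen spelling of the same call; the quoted form used in the tutorial is expected to behave the same way.

```r
library(dplyr)

# Hypothetical data frame standing in for movies
toyMovies <- data.frame(
  title = c("Film A", "Film B"),
  thtr_rel_year = c(2010, 2012)
)

# Pattern: rename(data, new_name = old_name)
toyR <- rename(toyMovies, rel_year = thtr_rel_year)

# Only the name changes; the values and the column position stay the same
print(names(toyR))
print(toyR$rel_year)
```

As with select, the original toyMovies keeps its old column name, because rename returns a modified copy.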
Highlight moviesR in the Source window | Let us close the data frame moviesR for now. |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight critics_score and audience_score in the Source window | Suppose we want to add a new variable named CriAud to our movies data frame.
This variable should contain the difference between critics_score and audience_score.
For this, we will use the mutate function.
The mutate function adds new variables to a data frame while preserving the existing ones. |
Highlight audience_score in the Source window | For simplicity, let us remove the variables appearing after audience_score in the movies data frame. |
Highlight the scroll bar in the Source window | In the Source window, scroll from right to left. |
Highlight title in the Source window | We need to select the variables from title to audience_score.
For this, we will use the select function. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R. |
[RStudio]
moviesLess <- select(movies, title:audience_score) |
In the Source window, type the following command. |
Highlight run button in the Source window | Run the current line. |
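The colon notation inside select keeps a contiguous run of variables. Here is a minimal sketch, assuming dplyr is installed; toyMovies and the trailing column best_pic_win are hypothetical and serve only to show what gets dropped.

```r
library(dplyr)

# Hypothetical data frame; column order matters for range selection
toyMovies <- data.frame(
  title = c("Film A", "Film B"),
  critics_score = c(80, 45),
  audience_score = c(70, 60),
  best_pic_win = c("no", "no")
)

# title:audience_score keeps every variable from title through audience_score
toyLess <- select(toyMovies, title:audience_score)

# best_pic_win comes after audience_score, so it is removed
print(names(toyLess))
```

Because the range depends on column order, the result changes if the columns of the data frame are rearranged.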
Now, we will use the mutate function to add a new variable. | |
[RStudio]
moviesMu <- mutate(moviesLess, CriAud = critics_score - audience_score)
View(moviesMu) |
In the Source window, type the following command. |
Highlight mutate in the Source window | Remember, we are adding a new variable named CriAud to moviesLess, the trimmed copy of the movies data frame, and storing the result in moviesMu.
This variable stores the difference between critics_score and audience_score. |
Highlight run button in the Source window | Run the last two lines of code. |
Highlight moviesMu in the Source window | moviesMu opens in the Source window. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight CriAud in the Source window | A new variable named CriAud is added. |
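The effect of mutate can be verified with a few rows of made-up scores. This sketch assumes dplyr is installed; toyLess and its values are hypothetical and do not come from moviesData.csv.

```r
library(dplyr)

# Hypothetical scores for three films
toyLess <- data.frame(
  title = c("Film A", "Film B", "Film C"),
  critics_score = c(80, 45, 90),
  audience_score = c(70, 60, 90)
)

# Add CriAud as critics_score minus audience_score, computed row by row
toyMu <- mutate(toyLess, CriAud = critics_score - audience_score)

# The existing variables are preserved and CriAud is appended as the last column
print(names(toyMu))
print(toyMu$CriAud)
```

A negative CriAud means the audience rated the film higher than the critics did. The input data frame toyLess itself does not gain the new column, since mutate returns a new data frame.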
Let us summarize what we have learnt. | |
Show slide
Summary |
In this tutorial, we have learnt about the following functions available in the dplyr package:
|
Show slide
Assignment |
We now suggest an assignment.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India. |
Show Slide
Thank You |
The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |