Title of the script: Data Manipulation using dplyr package
Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, data manipulation, dplyr, filter, video tutorial
Visual Cue | Narration |
Show slide
Opening Slide |
Welcome to this tutorial on Data manipulation using dplyr package. |
Show slide
Learning Objective |
In this tutorial, we will learn about,
|
Show slide Pre-requisites |
To understand this tutorial, you should know,
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
Install R version 3.2.0 or higher. |
Show slide
Download Files |
For this tutorial, we will use
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight moviesData.csv and myVis.R in the folder DataVis |
I have downloaded and moved these files to DataVis folder.
This folder is located in myProject folder on my Desktop. I have also set DataVis folder as my Working Directory. |
Show slide
Need for Data Manipulation |
In real life, it is rare that we get the data in exactly the right form we need. |
Show slide
Need for Data Manipulation |
Often we’ll need to
|
We will learn how to achieve all this by using dplyr package. | |
Show slide
About dplyr Package |
|
Let us switch to RStudio. | |
Highlight myVis.R in the Files window of RStudio | Open the script myVis.R in RStudio. |
Highlight the Source button | Let us run this script by clicking on the Source button. |
Highlight movies in the Source window | movies data frame opens in the Source window.
This data frame will be used later in this tutorial. |
Now, we will install the dplyr package. Please make sure that you are connected to the Internet. | |
[RStudio]
install.packages("dplyr") |
In the Console window, type the following command and press Enter. |
Highlight the red dot in the Console window | The installation of the package takes a few seconds.
We will wait while the package is being installed. |
Click at the top of the script myVis.R | To load this package, we will add the library at the top of the script. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
library(dplyr) Press Ctrl+Enter keys. |
At the top of the script, type library and dplyr in parentheses.
Save the script and run this line by pressing Ctrl+Enter keys simultaneously. |
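The install and load steps above can also be sketched in script form. This is a minimal sketch and not part of the original myVis.R: it assumes we want to skip the installation when dplyr is already present, which the tutorial itself does not do.

```r
# Install dplyr only when it is missing (needs an Internet connection),
# then load it so that filter, arrange and the other functions are available.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)
```

Loading dplyr masks the filter function of the stats package, so the library line has to run before any filter command in this tutorial.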
Show slide
Functions in dplyr package |
Now we will learn about some key functions in the dplyr package:
|
Show slide
Functions in dplyr package |
All these functions can be combined with the group underscore by function. It allows us to perform an operation separately for each group. |
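To make the idea of a grouped operation concrete, here is a small sketch. It does not use moviesData.csv; the data frame below is a hypothetical toy example with made-up titles and ratings, and the object names moviesByGenre and topPerGenre are chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for the movies data frame (values are made up)
movies <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  genre = c("Comedy", "Drama", "Comedy", "Horror", "Drama"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2),
  stringsAsFactors = FALSE
)

# Group the rows by genre; the data itself is unchanged
moviesByGenre <- group_by(movies, genre)

# On grouped data, max() is computed within each genre,
# so this keeps the highest rated movie of every genre
topPerGenre <- filter(moviesByGenre, imdb_rating == max(imdb_rating))

# Remove the grouping once the grouped operation is done
topPerGenre <- ungroup(topPerGenre)
print(topPerGenre)
```

Without group underscore by, the same filter would keep only the single highest rated movie of the whole data frame.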
Let us switch to RStudio. | |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right.
This will enable us to see the remaining columns of the movies data frame. |
Highlight genre in the Source window | Suppose we want to filter the movies having genre as Comedy.
For this, we will use the filter function. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesComedy <- filter(movies, genre == "Comedy") |
In the Source window, type the following command. |
Highlight filter in the Source window | Recall that the filter function in the dplyr package allows us to select cases based on their values. |
Highlight movies after filter in the Source window | Inside the filter function, the first argument is the name of the data frame which is movies. |
Highlight genre == "Comedy" in the Source window | The second argument is the condition by which we want to filter the movies data frame. Only the rows for which this condition is true are kept. |
Highlight the Run button in the Source window | Save the script and run the current line. |
Highlight moviesComedy in the Environment window | The resulting data frame is stored in an object called moviesComedy, which appears in the Environment window.
Let us view the data frame moviesComedy to check whether it contains movies with genre as Comedy. |
[RStudio]
View(moviesComedy) |
In the Source window, type the following command. |
Highlight the Run button in the Source window | Run the current line. |
Highlight moviesComedy in the Source window | moviesComedy data frame opens in the Source window. |
Highlight genre in the Source window | All the movies having genre as Comedy have been filtered. |
Highlight moviesComedy in the Source window | Let us close this data frame moviesComedy for now. |
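The filter step above can be tried without the downloaded files. The sketch below uses a hypothetical toy data frame with made-up values in place of moviesData.csv; only the column names genre and imdb_rating are taken from the tutorial.

```r
library(dplyr)

# Hypothetical toy data standing in for the movies data frame (values are made up)
movies <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  genre = c("Comedy", "Drama", "Comedy", "Horror", "Drama"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2),
  stringsAsFactors = FALSE
)

# First argument: the data frame. Second argument: the condition.
# Only the rows where genre is exactly "Comedy" are kept.
moviesComedy <- filter(movies, genre == "Comedy")
print(moviesComedy)
```

Note that the comparison uses a double equals sign. The original movies data frame is not modified; filter returns a new data frame.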
Highlight filter in the Source window | We can also use logical operators to combine two or more conditions. |
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight genre in the Source window | Suppose we want to filter the movies with genre as either Comedy or Drama. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama")
View(moviesComDr) |
In the Source window, type the following commands. |
Highlight filter in the Source window | Here, we have two conditions by which we would like to filter the movies data frame. |
Highlight | in the Source window | For this, we have used a logical OR operator. |
Highlight the Run button in the Source window | Run the last two lines of code. |
Highlight moviesComDr in the Source window | moviesComDr opens in the Source window.
The movies having genre as either Comedy or Drama have been filtered. |
Highlight moviesComDr in the Source window | Let us close this data frame moviesComDr for now. |
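The logical OR filter above can be sketched on a small example. The data frame below is a hypothetical toy stand-in for moviesData.csv with made-up values.

```r
library(dplyr)

# Hypothetical toy data standing in for the movies data frame (values are made up)
movies <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  genre = c("Comedy", "Drama", "Comedy", "Horror", "Drama"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2),
  stringsAsFactors = FALSE
)

# A row is kept when at least one of the two conditions is true
moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama")
print(moviesComDr)
```

In this toy data, only the Horror movie is dropped, because neither condition is true for it.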
Highlight moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama") in the Source window | This filter command can also be written using the value matching operator. |
[RStudio]
moviesComDrP <- filter(movies, genre %in% c("Comedy", "Drama"))
View(moviesComDrP) |
In the Source window, type the following commands. |
Highlight %in% in the Source window | %in% is used for value matching. |
[RStudio]
help('%in%') |
To know more about this operator, let us access the Help.
In the Console window, type the following command and press Enter. |
Highlight the Run button in the Source window | Run the last two lines of code. |
Highlight moviesComDrP in the Source window | moviesComDrP opens in the Source window.
The movies having genre as either Comedy or Drama have been filtered. |
Highlight moviesComDrP in the Source window | Let us close this data frame moviesComDrP for now. |
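To see that the value matching form selects the same rows as the logical OR form, we can compare both on a small example. This is a sketch on a hypothetical toy data frame with made-up values, not on moviesData.csv.

```r
library(dplyr)

# Hypothetical toy data standing in for the movies data frame (values are made up)
movies <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  genre = c("Comedy", "Drama", "Comedy", "Horror", "Drama"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2),
  stringsAsFactors = FALSE
)

# Two conditions joined with logical OR
moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama")

# %in% is TRUE for every row whose genre appears in the given vector
moviesComDrP <- filter(movies, genre %in% c("Comedy", "Drama"))

# Both forms keep the same movies
print(identical(moviesComDr$title, moviesComDrP$title))
```

The value matching form stays short when more genres are added to the vector, whereas the OR form needs one more condition for each genre.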
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight genre and imdb_rating in the Source window | Let us now filter movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesComIm <- filter(movies, genre == "Comedy" & imdb_rating >= 7.5)
View(moviesComIm) |
In the Source window, type the following commands. |
Highlight genre == "Comedy" & imdb_rating >= 7.5 in the Source window | Here, we have used a logical AND operator to include both conditions. |
Highlight the Run button in the Source window | Save the script and run the last two lines of code. |
Highlight moviesComIm in the Source window | moviesComIm opens in the Source window.
I will resize the Console window. There are seven movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5. |
Highlight moviesComIm in the Source window | Let us close this data frame moviesComIm for now. |
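The logical AND filter above can be sketched in the same way. The toy data frame below is hypothetical and its values are made up, so the number of rows it returns is not the count seen in the tutorial. The second form, with the conditions separated by a comma, is not used in this tutorial; it is shown only because filter combines comma separated conditions with AND.

```r
library(dplyr)

# Hypothetical toy data standing in for the movies data frame (values are made up)
movies <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  genre = c("Comedy", "Drama", "Comedy", "Horror", "Drama"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2),
  stringsAsFactors = FALSE
)

# A row is kept only when both conditions are true
moviesComIm <- filter(movies, genre == "Comedy" & imdb_rating >= 7.5)

# Conditions given as separate arguments are also combined with AND
moviesComIm2 <- filter(movies, genre == "Comedy", imdb_rating >= 7.5)

print(moviesComIm)
```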
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight imdb_rating in the Source window | Suppose we want to arrange the movies in ascending order of imdb underscore rating.
For this, we will use the arrange function. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesImA <- arrange(movies, imdb_rating)
View(moviesImA) |
In the Source window, type the following commands. |
Highlight the Run button in the Source window | Run the last two lines of code. |
Highlight moviesImA in the Source window | moviesImA opens in the Source window. |
Highlight imdb_rating in the Source window | In the Source window, scroll from left to right and locate the imdb underscore rating column.
The movies have been arranged in ascending order of imdb underscore rating. |
Highlight imdb_rating in the Source window | Now, let us say we want to arrange the movies in descending order of imdb rating.
For this, we use the desc function inside the arrange function. |
Highlight moviesImA in the Source window | Let us close this data frame moviesImA for now. |
[RStudio]
moviesImD <- arrange(movies, desc(imdb_rating))
View(moviesImD) |
In the Source window, type the following commands. |
Highlight the Run button in the Source window | Run the last two lines of code. |
Highlight moviesImD in the Source window | moviesImD opens in the Source window. |
Highlight imdb_rating in the Source window | In the Source window, scroll from left to right and locate the imdb underscore rating column.
The movies have been arranged in descending order of imdb rating. |
Highlight moviesImD in the Source window | Let us close this data frame moviesImD for now. |
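The ascending and descending arrangements above can be compared side by side. This is a sketch on a hypothetical toy data frame with made-up values, not on moviesData.csv.

```r
library(dplyr)

# Hypothetical toy data standing in for the movies data frame (values are made up)
movies <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  genre = c("Comedy", "Drama", "Comedy", "Horror", "Drama"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2),
  stringsAsFactors = FALSE
)

# arrange sorts in ascending order by default
moviesImA <- arrange(movies, imdb_rating)

# Wrapping the column in desc() reverses the order
moviesImD <- arrange(movies, desc(imdb_rating))

print(moviesImA$imdb_rating)
print(moviesImD$imdb_rating)
```

Like filter, arrange returns a new data frame and leaves movies unchanged.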
Highlight movies in the Source window | In the Source window, click on movies. |
Highlight genre and imdb_rating in the Source window | Suppose we want to arrange the movies both by genre and imdb rating. |
Highlight the script myVis.R in the Source window | Click on the script myVis.R |
[RStudio]
moviesGeIm <- arrange(movies, genre, imdb_rating)
View(moviesGeIm) |
In the Source window, type the following commands. |
Highlight the Run button in the Source window | Run the last two lines of code. |
Highlight moviesGeIm in the Source window | moviesGeIm opens in the Source window. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right.
The movies have been arranged first by genre and then, within each genre, in ascending order of imdb underscore rating. |
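The effect of arranging by two columns is easier to see on a few rows. The sketch below uses a hypothetical toy data frame with made-up values in place of moviesData.csv.

```r
library(dplyr)

# Hypothetical toy data standing in for the movies data frame (values are made up)
movies <- data.frame(
  title = c("A", "B", "C", "D", "E"),
  genre = c("Comedy", "Drama", "Comedy", "Horror", "Drama"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2),
  stringsAsFactors = FALSE
)

# The first column given decides the main order;
# the second column only orders rows that share the same genre
moviesGeIm <- arrange(movies, genre, imdb_rating)
print(moviesGeIm)
```

Here the two Comedy movies come first, ordered by their ratings, followed by the two Drama movies and then the Horror movie. Changing the order of the columns in the arrange command changes the result.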
Let us summarize what we have learnt. | |
Show slide Summary |
In this tutorial, we have learnt about:
|
Show slide
Assignment |
We now suggest an assignment.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |