R/C2/Pipe-Operator/English
Title of the script: Pipe operator
Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, dplyr package, ggplot2, summarise function, group_by function, pipe operator, boxplot, video tutorial
Visual Cue | Narration |
Show slide
Opening Slide |
Welcome to this tutorial on Pipe Operator. |
Show slide
Learning Objective |
In this tutorial, we will learn about:
|
Show slide Pre-requisites |
To understand this tutorial, you should know,
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
Install R version 3.2.0 or higher. |
Show slide
Download Files |
For this tutorial, we will use
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight moviesData.csv and myPipe.R in the folder pipeOps |
I have downloaded these files and moved them to the pipeOps folder.
This folder is located in the myProject folder on my Desktop. I have also set the pipeOps folder as my working directory. |
Show slide
summarise function |
Now, we will learn about the summarise function.
|
Let us switch to RStudio. | |
Highlight myPipe.R in the Files window of RStudio | Open the script myPipe.R in RStudio. |
Highlight the Source button | Run this script by clicking on the Source button. |
Highlight movies in the Source window | The movies data frame opens in the Source window. |
Highlight the scroll bar in the Source window | In the movies data frame, scroll from left to right.
This will enable us to see the remaining columns of the movies data frame. |
Highlight imdb_rating in the Source window | To find the mean imdb_rating of all movies, we will use the summarise function. |
Highlight the script myPipe.R in the Source window | Click on the script myPipe.R. |
[RStudio]
summarise(movies, mean(imdb_rating)) |
In the Source window, type the following command. |
Highlight summarise in the Source window | Inside the summarise function, the first argument is the data frame to be summarised.
Here, it is movies. The second argument is the summary we need, that is, the mean of imdb_rating. |
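As a quick check that does not need moviesData.csv, here is a minimal sketch of the same summarise call on a small hypothetical data frame, toyMovies, whose column names only mimic the movies data frame. It assumes the dplyr package is installed; toyMovies and the output name avg_rating are illustrative choices, not part of the tutorial files.

```r
# Load dplyr quietly (it provides summarise).
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; only the column names follow the movies data frame.
toyMovies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Drama", "Drama", "Comedy", "Horror"),
  imdb_rating = c(7.0, 8.0, 6.0, 5.0)
)

# summarise() collapses the whole (ungrouped) data frame into a single row.
# Naming the summary (avg_rating = ...) gives the output column a readable name.
overall <- summarise(toyMovies, avg_rating = mean(imdb_rating))
print(overall)
```

Because toyMovies is not grouped, the result has exactly one row holding the overall mean.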
Highlight the Run button in the Source window | Save the script and run the current line by pressing the Ctrl and Enter keys simultaneously. |
Highlight output in the Console window
|
The mean value is shown.
|
Highlight summarise in the Source window | In practice, we rarely use the summarise function alone for such a simple computation. It is most useful together with the group_by function.
|
Show slide
group_by() function |
When we use the group_by function, the data frame is divided into groups based on the values of one or more columns. |
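To see how grouping changes the result of summarise, here is a minimal sketch on hypothetical toy data. It assumes dplyr is installed. The names toyMovies, byGenre and avg_rating are illustrative and are not part of the tutorial files.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; only the column names follow the movies data frame.
toyMovies <- data.frame(
  genre       = c("Drama", "Drama", "Comedy", "Comedy", "Horror"),
  imdb_rating = c(7.0, 8.0, 6.0, 7.0, 5.0)
)

# group_by() keeps all rows; it only records which rows belong together.
byGenre <- group_by(toyMovies, genre)
print(n_groups(byGenre))   # 3 distinct genres

# summarise() on a grouped data frame returns one row per group.
genreMeans <- summarise(byGenre, avg_rating = mean(imdb_rating))
print(genreMeans)
```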
Let us switch back to RStudio. | |
Highlight movies in the Source window | In the Source window, click on the movies data frame. |
Highlight the scroll bar in the Source window | In the movies data frame, scroll from right to left. |
Highlight genre in the Source window | We will group the movies data frame based on genre.
|
Highlight the script myPipe.R in the Source window | Click on the script myPipe.R. |
[RStudio]
groupMovies <- group_by(movies, genre) |
In the Source window, type the following command. |
Highlight Run button in the Source window | Run the current line. |
Highlight groupMovies in the Environment window | A new grouped data frame, groupMovies, is stored in the Environment.
Now, we will use the summarise function on this data frame. |
[RStudio]
summarise(groupMovies, mean(imdb_rating)) |
In the Source window, type the following command. |
Highlight Run button in the Source window | Run the current line. |
I will resize the Console window | |
Highlight output in the Console window.
|
The mean imdb_rating of the movies in each genre is displayed.
|
I will resize the Console window | |
Highlight movies in the Source window | In the Source window, click on the movies data frame. |
Highlight the scroll bar in the Source window | In the movies data frame window, scroll from left to right. |
Point to imdb_rating, genre and mpaa_rating in the movies data frame. | Let us find the mean imdb_rating of Drama movies for each mpaa_rating.
|
Highlight the script myPipe.R in the Source window | Click on the script myPipe.R. |
[RStudio]
dramaMov <- filter(movies, genre == "Drama")
gr_dramaMov <- group_by(dramaMov, mpaa_rating)
summarise(gr_dramaMov, mean(imdb_rating)) |
In the Source window, type the following commands. |
Highlight filter(movies, genre == "Drama") in the Source window | First, we extract the movies of the Drama genre. |
Highlight gr_dramaMov <- group_by(dramaMov, mpaa_rating) in the Source window | Then, we group these movies based on mpaa_rating. |
Highlight summarise(gr_dramaMov, mean(imdb_rating)) in the Source window | Finally, we apply the summarise function.
|
Highlight Run button in the Source window | Run the last three lines of code. |
I will resize the Console window | |
Highlight output in the Console window | The required mean values are printed on the console. |
I will resize the Console window again. | |
Highlight the last three lines of code in the Source window | In this code, we have to name every intermediate data frame.
|
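Without the pipe operator, the only way to avoid naming intermediate data frames is to nest the function calls. The sketch below uses hypothetical toy data and assumes dplyr is installed. It shows why nesting quickly becomes hard to read.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; only the column names follow the movies data frame.
toyMovies <- data.frame(
  genre       = c("Drama", "Drama", "Drama", "Comedy"),
  mpaa_rating = c("R", "R", "PG-13", "R"),
  imdb_rating = c(7.0, 8.0, 6.0, 5.0)
)

# No intermediate names, but the calls must be read from the innermost
# (filter) outwards (group_by, then summarise).
nested <- summarise(
  group_by(
    filter(toyMovies, genre == "Drama"),
    mpaa_rating
  ),
  avg_rating = mean(imdb_rating)
)
print(nested)
```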
Show slide
Pipe operator |
* The pipe operator is denoted as %>%.
|
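Here is a minimal sketch of how %>% passes values. It assumes the dplyr package is installed, since dplyr re-exports %>% from the magrittr package. The vector x is an arbitrary example.

```r
suppressPackageStartupMessages(library(dplyr))  # makes %>% available

x <- c(4, 9, 16, 25)

# x %>% f() is evaluated as f(x).
piped1  <- x %>% sqrt()
direct1 <- sqrt(x)

# x %>% f(y) is evaluated as f(x, y): the piped value becomes the first argument.
piped2  <- x %>% head(2)
direct2 <- head(x, 2)

print(identical(piped1, direct1))  # TRUE
print(identical(piped2, direct2))  # TRUE
```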
Show slide
Example of pipe operator |
For example, to find the cosine of the sine of pi, we can write
pi %>% sin() %>% cos() |
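The pipe version and the nested version of this example can be compared directly in R. This sketch assumes dplyr is loaded so that %>% is available.

```r
suppressPackageStartupMessages(library(dplyr))

# Read left to right: take pi, apply sin(), then apply cos().
piped  <- pi %>% sin() %>% cos()

# The same computation written as nested calls, read inside out.
nested <- cos(sin(pi))

print(identical(piped, nested))  # TRUE
print(piped)                     # 1
```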
Let us switch to RStudio. | |
Highlight the last three lines of code in the Source window | We will learn how to do the same analysis by using the pipe operator. |
[RStudio]
movies %>% filter(genre=="Drama") %>% group_by(mpaa_rating) %>% summarise(mean(imdb_rating)) |
In the Source window, type the following command. |
Highlight movies %>% filter(genre=="Drama") %>%
group_by(mpaa_rating) %>% summarise(mean(imdb_rating)) in the Source window |
Here, the three lines of code have been written as a single chain of function calls connected by the pipe operator.
|
Highlight movies %>% filter(genre=="Drama") %>%
group_by(mpaa_rating) %>% summarise(mean(imdb_rating)) in the Source window |
This code is easier to read and write than the previous one, because we do not have to name the intermediate data frames.
|
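To confirm that the piped code gives the same result as the three separate lines, here is a sketch on hypothetical toy data. It assumes dplyr is installed. The names toyMovies and avg_rating are illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; only the column names follow the movies data frame.
toyMovies <- data.frame(
  genre       = c("Drama", "Drama", "Drama", "Comedy"),
  mpaa_rating = c("R", "R", "PG-13", "R"),
  imdb_rating = c(7.0, 8.0, 6.0, 5.0)
)

# Step-by-step version: every intermediate result needs a name.
dramaMov    <- filter(toyMovies, genre == "Drama")
gr_dramaMov <- group_by(dramaMov, mpaa_rating)
stepwise    <- summarise(gr_dramaMov, avg_rating = mean(imdb_rating))

# Piped version: each result flows into the next call as its first argument.
piped <- toyMovies %>%
  filter(genre == "Drama") %>%
  group_by(mpaa_rating) %>%
  summarise(avg_rating = mean(imdb_rating))

print(identical(stepwise, piped))  # TRUE
```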
Highlight movies %>% filter(genre=="Drama") %>%
group_by(mpaa_rating) %>% summarise(mean(imdb_rating)) in the Source window |
Save the script and run the current line. |
I will resize the Console window. | |
Highlight output in the Console window | The required mean values are printed on the Console. |
I will resize the Console window again. | |
Highlight movies in the Source window | In the Source window, click on the movies data frame. |
Highlight the scroll bar in the Source window | In the Source window, scroll from left to right. |
Highlight audience_score and critics_score in movies | Let us check the difference between audience_score and critics_score for all the movies.
|
Highlight the script myPipe.R in the Source window | Click on the script myPipe.R. |
[RStudio]
movies %>% mutate(diff = audience_score - critics_score) %>% ggplot(mapping = aes(x=genre, y=diff)) + geom_boxplot() |
In the Source window, type the following command. |
Highlight the Run button in the Source window | Save the script and run the current line. |
Highlight Plots window | The required box plot appears in the Plots window. |
Highlight Plots window | In the Plots window, click on the Zoom button to maximize the plot. |
Highlight Plots window, highlight Drama, Horror, and Mystery & Suspense | Here, you can see that for Drama, Horror, and Mystery & Suspense movies, the median difference is close to zero.
|
Highlight Plots window, highlight Action & Adventure and Comedy | However, for Action & Adventure and Comedy movies, the median difference is not close to zero.
|
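The box plot shows the medians only visually. In this sketch, the same mutate step is followed by group_by and summarise so that the median difference for each genre is printed as a number. The scores are hypothetical, and dplyr is assumed to be installed.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical scores; the real values come from moviesData.csv.
toyMovies <- data.frame(
  genre          = c("Drama", "Drama", "Comedy", "Comedy"),
  audience_score = c(80, 70, 75, 65),
  critics_score  = c(78, 72, 50, 40)
)

# mutate() adds the diff column, and the pipe passes the result straight on,
# so no intermediate data frame has to be named.
medianDiff <- toyMovies %>%
  mutate(diff = audience_score - critics_score) %>%
  group_by(genre) %>%
  summarise(median_diff = median(diff))

print(medianDiff)
```

In this toy data, Drama has a median difference of 0 and Comedy has 25. This mirrors the kind of contrast read off the box plot.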
Highlight the close button in the Plot Zoom window | Close this plot. |
Highlight movies in the Source window | Click on the movies data frame. |
Highlight the scroll bar in the Source window | In the Source window, scroll from right to left. |
Highlight mpaa_rating and genre in movies | Let us count the number of movies for each combination of genre and mpaa_rating. |
Highlight the script myPipe.R in the Source window | Click on the script myPipe.R. |
[RStudio]
movies %>% group_by(genre, mpaa_rating) %>% summarise(num = n()) |
In the Source window, type the following command. |
Highlight group_by in movies %>% group_by(genre, mpaa_rating) %>%
summarise(num = n()) in the Source window |
Notice that we have included both genre and mpaa_rating in the group_by function.
|
Highlight summarise(num = n()) in movies %>% group_by(genre, mpaa_rating) %>%
summarise(num = n()) in the Source window |
We used num = n(). The n() function counts the rows in each group, and num is the name of the resulting column.
|
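Here is a minimal sketch of the same counting pattern on hypothetical toy data, assuming dplyr is installed. It also shows dplyr's count() function, which is roughly equivalent to group_by() followed by summarise(n = n()).

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data; only the column names follow the movies data frame.
toyMovies <- data.frame(
  genre       = c("Drama", "Drama", "Drama", "Comedy", "Comedy"),
  mpaa_rating = c("R", "R", "PG-13", "R", "PG")
)

# n() returns the number of rows in the current group.
# ungroup() drops the remaining grouping by genre after summarise.
counts <- toyMovies %>%
  group_by(genre, mpaa_rating) %>%
  summarise(num = n()) %>%
  ungroup()
print(counts)

# count() produces the same counts in a column named n.
shortcut <- count(toyMovies, genre, mpaa_rating)
print(shortcut)
```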
Highlight the Run button in the Source window | Run the current line. |
I will resize the Console window. | |
Highlight the output in the Console window | From the output, we can see that there are 22 Action and Adventure movies with an mpaa_rating of R. |
Let us summarize what we have learnt. | |
Show slide Summary |
In this tutorial, we have learnt about:
* summarise and group_by functions
|
Show slide
Assignment |
We now suggest an assignment.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
|
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
|