R/C2/Pipe-Operator/English-timed

Time	Narration
00:01	Welcome to this tutorial on the Pipe Operator.
00:06	In this tutorial, we will learn about:
00:10	summarise and group_by functions
00:14	Operations in summarise function
00:18	Pipe operator
00:20	To understand this tutorial, you should know:
00:25	Basics of statistics
00:28	Basics of ggplot2 and dplyr packages
00:34	Data frames
00:36	If not, please locate the relevant tutorials on R on this website.
00:43	This tutorial is recorded on
00:45	Ubuntu Linux OS version 16.04
00:51	R version 3.4.4
00:55	RStudio version 1.1.463
01:01	Install R version 3.2.0 or higher.
01:07	For this tutorial, we will use
01:10	A data frame moviesData.csv
01:15	A script file myPipe.R.
01:20	Please download these files from the Code files link of this tutorial.
01:27	I have downloaded and moved these files to pipeOps folder.
01:33	This folder is located in myProject folder on my Desktop.
01:40	I have also set pipeOps folder as my Working Directory.
01:46	Now we will learn about the summarise function.
01:50	The summarise function reduces a data frame to a single row.
01:56	It gives summaries such as the mean or median of the variables in the data frame.
02:05	We usually use summarise along with the group_by function.
02:11	Let us switch to RStudio.
02:15	Open the script myPipe.R in RStudio.
02:21	Run this script by clicking on the Source button.
02:26	The movies data frame opens in the Source window.
02:31	In the movies data frame, scroll from left to right.
02:37	This will enable us to see the remaining variables of the movies data frame.
02:44	To find the mean imdb_rating of all movies, we will use the summarise function.
02:52	Click on the script myPipe.R
02:57	In the Source window, type the following command.
03:02	Inside the summarise function, the first argument is the data frame to be summarised.
03:09	Here, it is movies.
03:12	The second argument is the information we need, that is, the mean of imdb_rating.
03:21	Save the script and run the current line by pressing the Ctrl and Enter keys simultaneously.
03:31	The mean value is shown.
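The command typed in this step is not shown in the transcript. A minimal sketch of what it might look like is given below, assuming a small hypothetical stand-in for the movies data frame; the column name mean_rating is chosen only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame loaded by myPipe.R
movies <- data.frame(
  genre = c("Drama", "Drama", "Comedy", "Documentary"),
  imdb_rating = c(7, 8, 5, 8)
)

# First argument: the data frame; second argument: the summary we need
avgRating <- summarise(movies, mean_rating = mean(imdb_rating))
print(avgRating)
```

The result is a data frame with a single row that holds the mean of imdb_rating.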
03:34	One may argue that the mean can be found using the mean function along with the dollar operator.
03:43	Why install a whole package and use a more complex function?
03:49	In practice, we do not use the summarise function for such simple computations.
03:56	This function is not very useful unless we pair it with the group_by function.
04:03	When we use the group_by function, the data frame gets divided into different groups.
04:12	Let us switch back to RStudio.
04:16	In the Source window, click on movies data frame.
04:21	In the movies data frame, scroll from right to left.
04:27	We will group the movies data frame by genre.
04:33	For this, we will use the group_by function.
04:39	Click on the script myPipe.R
04:43	In the Source window, type the following command.
04:48	Run the current line.
04:51	A new data frame named groupMovies is created.
04:56	Now, we will use the summarise function on this data frame.
05:02	In the Source window, type the following command.
05:07	Run the current line.
05:10	I will resize the Console window.
05:14	The mean imdb_rating of the movies in each genre is displayed.
05:21	Notice that the Documentary genre has the highest mean imdb_rating.
05:28	And the Comedy genre has the lowest mean imdb_rating.
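The two commands for grouping and summarising are not reproduced in the transcript. The sketch below shows one way they could be written, assuming a small hypothetical movies data frame; only the name groupMovies comes from the narration, while genreMeans and mean_rating are illustrative.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
movies <- data.frame(
  genre = c("Drama", "Drama", "Comedy", "Comedy", "Documentary"),
  imdb_rating = c(7, 8, 5, 6, 8)
)

# Divide the data frame into one group per genre
groupMovies <- group_by(movies, genre)

# Compute one mean per group instead of one mean overall
genreMeans <- summarise(groupMovies, mean_rating = mean(imdb_rating))
print(genreMeans)
```

With grouped input, summarise returns one row per genre rather than a single row.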
05:34	I will resize the Console window.
05:38	In the Source window, click on movies data frame.
05:43	In the movies data frame, scroll from left to right.
05:49	Let us find the mean imdb_rating for the movies of the Drama genre.
05:58	We will also group the Drama movies by mpaa_rating.
06:05	For this, we will use the filter, group_by, and summarise functions one by one.
06:12	Click on the script myPipe.R
06:17	In the Source window, type the following commands.
06:23	First, we will extract the movies of Drama genre.
06:29	Then, we group these movies based on mpaa_rating.
06:35	Finally, we apply the summarise function.
06:39	This will calculate the mean imdb_rating of the filtered and grouped movies.
06:46	Run the last three lines of code.
06:50	I will resize the Console window.
06:54	The required mean values are printed on the console.
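The three commands described above are not shown in the transcript. A possible version is sketched below on a small hypothetical movies data frame; the intermediate names dramaMovies, dramaByRating, and dramaMeans are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
movies <- data.frame(
  genre = c("Drama", "Drama", "Drama", "Comedy"),
  mpaa_rating = c("R", "R", "PG", "PG"),
  imdb_rating = c(7, 8, 6, 5)
)

# Step 1: keep only the movies of the Drama genre
dramaMovies <- filter(movies, genre == "Drama")

# Step 2: group the filtered movies by mpaa_rating
dramaByRating <- group_by(dramaMovies, mpaa_rating)

# Step 3: compute the mean imdb_rating within each group
dramaMeans <- summarise(dramaByRating, mean_rating = mean(imdb_rating))
print(dramaMeans)
```

Each step stores its result in a separate data frame that is used only once by the next step.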
06:59	I will resize the Console window again.
07:03	In this code, we have to name every intermediate data frame.
07:10	But there is an alternative way to write these statements, using the pipe operator.
07:17	The pipe operator is denoted as %>%.
07:25	It saves us from creating unnecessary intermediate data frames.
07:30	We can read the pipe as a series of imperative statements.
07:35	If we have to find the cosine of sine of pi, we can write pi %>% sin() %>% cos().
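The expression for this example does not appear in the transcript. As a sketch, the piped form can be compared with the usual nested call; the variable names nested and piped are illustrative, and the pipe operator is assumed to come from the dplyr package used in this tutorial.

```r
library(dplyr)  # makes the %>% operator available

# Nested form: read from the inside out
nested <- cos(sin(pi))

# Piped form: read from left to right
piped <- pi %>% sin() %>% cos()

# Both forms compute the same value
print(all.equal(nested, piped))
```

The pipe passes the value on its left as the first argument of the function on its right.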
07:42	Let us switch to RStudio.
07:45	We will learn how to do the same analysis by using the pipe operator.
07:51	In the Source window, type the following command.
07:56	Here, the three lines of code have been written as a single series of statements.
08:02	We can read this code as,
08:06	Using the movies data frame, filter the movies of the Drama genre.
08:13	Next, group the filtered movies by mpaa_rating.
08:19	Finally, summarise the mean of imdb_rating of the grouped data.
08:26	This code is easier to read and write than the previous one.
08:32	With the pipe operator, we don’t have to repeat the name of the data frame.
08:39	Notice that we have written the name of the data frame only once.
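The piped command itself is not shown in the transcript. A minimal sketch that follows the reading given above, on a small hypothetical movies data frame, could be written as follows; the names dramaMeans and mean_rating are illustrative.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
movies <- data.frame(
  genre = c("Drama", "Drama", "Drama", "Comedy"),
  mpaa_rating = c("R", "R", "PG", "PG"),
  imdb_rating = c(7, 8, 6, 5)
)

# The data frame is named once; each step receives the previous result
dramaMeans <- movies %>%
  filter(genre == "Drama") %>%
  group_by(mpaa_rating) %>%
  summarise(mean_rating = mean(imdb_rating))
print(dramaMeans)
```

No intermediate data frame is created between the filter, group_by, and summarise steps.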
08:45	Save the script and run the current line.
08:50	I will resize the Console window.
08:54	The required mean values are printed on the Console.
08:59	I will resize the Console window again.
09:03	In the Source window, click on movies data frame.
09:08	In the Source window, scroll from left to right.
09:13	Let us check the difference between critics_score and audience_score for all the movies.
09:22	We will use a box plot for our study.
09:26	By using the pipe operator, we can combine the functions of ggplot2 and dplyr packages.
09:34	Click on the script myPipe.R
09:38	In the Source window, type the following command.
09:43	Save the script and run the current line.
09:49	The required box plot appears in the Plots window.
09:54	In the Plots window, click on the Zoom button to maximize the plot.
10:00	Here you can see that for Drama, Horror, and Mystery & Suspense movies, the median is close to zero.
10:14	This means that the audience's and critics' opinions are very similar for these genres.
10:22	Whereas for Action & Adventure and Comedy movies, the median is not close to zero.
10:30	This means that the audience's and critics' opinions are different for these genres.
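The plotting command is not reproduced in the transcript. One way to combine dplyr and ggplot2 in a single pipe is sketched below on hypothetical data; the column name score_diff and the sign of the difference (audience minus critics) are assumptions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the movies data frame
movies <- data.frame(
  genre = c("Drama", "Drama", "Comedy", "Comedy"),
  critics_score = c(80, 70, 40, 30),
  audience_score = c(82, 68, 60, 55)
)

# dplyr computes the difference; the result is piped into ggplot2
scorePlot <- movies %>%
  mutate(score_diff = audience_score - critics_score) %>%
  ggplot(aes(x = genre, y = score_diff)) +
  geom_boxplot()
print(scorePlot)
```

The pipe hands the modified data frame to ggplot as its data argument, and further plot layers are added with the plus sign as usual.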
10:37	Close this plot.
10:39	In the Source window, click on movies data frame.
10:44	In the Source window, scroll from right to left.
10:49	Let us check the number of movies in every mpaa_rating category within each genre.
10:57	Click on the script myPipe.R
11:01	In the Source window, type the following command.
11:06	Notice that we have included both genre and mpaa_rating in the group_by function.
11:15	So, the analysis will be done on the data divided by these two variables.
11:22	Also, we have used num = n().
11:27	The function n() counts the number of rows in each group.
11:35	Run the current line.
11:38	I will resize the Console window.
11:42	From the output, we can see that there are 22 Action & Adventure movies with mpaa_rating as R.
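The counting command is not shown in the transcript. A sketch on a small hypothetical movies data frame is given below; the counts it produces are for this toy data only and do not correspond to the figures quoted for moviesData.csv. The name genreCounts is illustrative.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame
movies <- data.frame(
  genre = c("Drama", "Drama", "Drama", "Comedy", "Comedy"),
  mpaa_rating = c("R", "R", "PG", "PG", "PG")
)

# Group by two variables and count the rows in each combination
genreCounts <- movies %>%
  group_by(genre, mpaa_rating) %>%
  summarise(num = n())
print(genreCounts)
```

Each row of the result corresponds to one combination of genre and mpaa_rating that occurs in the data.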
11:53	Let us summarize what we have learnt.
11:57	In this tutorial, we have learnt about:
12:00	summarise and group_by functions
12:04	Operations in summarise function
12:08	Pipe operator
12:10	We now suggest an assignment.
12:14	Use the built-in data set iris.
12:18	Using the pipe operator, group the flowers by their species.
12:24	Summarise the grouped data by the mean of Sepal.Length and Sepal.Width.
12:33	The video at the following link summarises the Spoken Tutorial project.
12:37	Please download and watch it.
12:41	We conduct workshops using Spoken Tutorials and give certificates.
12:46	Please contact us.
12:49	Please post your timed queries in this forum.
12:54	Please post your general queries in this forum.
12:59	The FOSSEE team coordinates the TBC project.
13:03	For more details, please visit these sites.
13:07	The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
13:13	The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
13:20	This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Sakinashaikh
