R/C2/Pipe-Operator/English-timed
| Time | Narration |
| 00:01 | Welcome to this tutorial on Pipe Operator. |
| 00:06 | In this tutorial, we will learn about: |
| 00:10 | summarise and group_by functions |
| 00:14 | Operations in summarise function |
| 00:18 | Pipe operator |
| 00:20 | To understand this tutorial, you should know: |
| 00:25 | Basics of statistics |
| 00:28 | Basics of ggplot2 and dplyr packages |
| 00:34 | Data frames |
| 00:36 | If not, please locate the relevant tutorials on R on this website. |
| 00:43 | This tutorial is recorded on |
| 00:45 | Ubuntu Linux OS version 16.04 |
| 00:51 | R version 3.4.4 |
| 00:55 | RStudio version 1.1.463 |
| 01:01 | Install R version 3.2.0 or higher. |
| 01:07 | For this tutorial, we will use |
| 01:10 | A data frame moviesData.csv |
| 01:15 | A script file myPipe.R. |
| 01:20 | Please download these files from the Code files link of this tutorial. |
| 01:27 | I have downloaded and moved these files to pipeOps folder. |
| 01:33 | This folder is located in myProject folder on my Desktop. |
| 01:40 | I have also set pipeOps folder as my Working Directory. |
| 01:46 | Now we will learn about the summarise function. |
| 01:50 | The summarise function reduces a data frame to a single row. |
| 01:56 | It gives summaries like mean, median, etc. of the variables available in the data frame. |
| 02:05 | We use summarise along with the group_by function. |
| 02:11 | Let us switch to RStudio. |
| 02:15 | Open the script myPipe.R in RStudio. |
| 02:21 | Run this script by clicking on the Source button. |
| 02:26 | The movies data frame opens in the Source window. |
| 02:31 | In the movies data frame, scroll from left to right. |
| 02:37 | This will enable us to see the remaining variables of the movies data frame. |
| 02:44 | To know the mean of imdb_rating of all movies, we will use the summarise function. |
| 02:52 | Click on the script myPipe.R |
| 02:57 | In the Source window, type the following command. |
| 03:02 | Inside the summarise function, the first argument is a data frame to be summarised. |
| 03:09 | Here, it is movies. |
| 03:12 | The second argument is the information we need, that is the mean of imdb_rating. |
| 03:21 | Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
| 03:31 | The mean value is shown. |
| 03:34 | One might argue that we can find the mean by using the mean function along with the dollar operator. |
| 03:43 | What is the use of installing a whole package and using a complex function? |
| 03:49 | Basically, we do not use the summarise function for computing such things. |
| 03:56 | This function is of little use unless we pair it with the group_by function. |
| 04:03 | When we use the group_by function, the data frame gets divided into different groups. |
| 04:12 | Let us switch back to RStudio. |
| 04:16 | In the Source window, click on movies data frame. |
| 04:21 | In the movies data frame, scroll from right to left. |
| 04:27 | We will group the movies data frame based on the genre. |
| 04:33 | For this, we will use the group_by function. |
| 04:39 | Click on the script myPipe.R |
| 04:43 | In the Source window, type the following command. |
| 04:48 | Run the current line. |
| 04:51 | A new data frame groupMovies is stored. |
| 04:56 | Now, we will use the summarise function on this data frame. |
| 05:02 | In the Source window, type the following command. |
| 05:07 | Run the current line. |
| 05:10 | I will resize the Console window. |
| 05:14 | The mean imdb_rating values of movies in different genres are displayed. |
| 05:21 | Notice that the Documentary genre has the highest mean imdb_rating. |
| 05:28 | And the Comedy genre has the lowest mean imdb_rating. |
| 05:34 | I will resize the Console window again. |
| 05:38 | In the Source window, click on movies data frame. |
| 05:43 | In the movies data frame, scroll from left to right. |
| 05:49 | Let us find the mean imdb_rating for the movies of the Drama genre. |
| 05:58 | Also, we will group the movies of the Drama genre by mpaa_rating. |
| 06:05 | For this, we will use filter, group_by, and summarise functions one by one. |
| 06:12 | Click on the script myPipe.R |
| 06:17 | In the Source window, type the following commands. |
| 06:23 | First, we will extract the movies of the Drama genre. |
| 06:29 | Then, we group these movies based on mpaa_rating. |
| 06:35 | Finally, we apply the summarise function. |
| 06:39 | This will calculate the mean imdb_rating of the filtered and grouped movies. |
| 06:46 | Run the last three lines of code. |
| 06:50 | I will resize the Console window. |
| 06:54 | The required mean values are printed on the console. |
| 06:59 | I will resize the Console window again. |
| 07:03 | In this code, we have to give a name to every intermediate data frame. |
| 07:10 | But there is an alternative way to write these statements, using the pipe operator. |
| 07:17 | The pipe operator is denoted as %>%. |
| 07:25 | It saves us from creating unnecessary intermediate data frames. |
| 07:30 | We can read the pipe as a series of imperative statements. |
| 07:35 | If we have to find the cosine of sine of pi, we can write |
| 07:42 | Let us switch to RStudio. |
| 07:45 | We will learn how to do the same analysis by using the pipe operator. |
| 07:51 | In the Source window, type the following command. |
| 07:56 | Here three lines of code have been written as a series of statements. |
| 08:02 | We can read this code as, |
| 08:06 | Using the movies data frame, filter the movies of the Drama genre |
| 08:13 | Next, group the filtered movies by mpaa_rating |
| 08:19 | Finally, summarise the mean of imdb_rating of the grouped data. |
| 08:26 | This code is easier to read and write than the previous one. |
| 08:32 | With the pipe operator, we don’t have to repeat the name of the data frame. |
| 08:39 | Notice that we have written the name of the data frame only once. |
| 08:45 | Save the script and run the current line. |
| 08:50 | I will resize the Console window. |
| 08:54 | The required mean values are printed on the Console. |
| 08:59 | I will resize the Console window again. |
| 09:03 | In the Source window, click on movies data frame. |
| 09:08 | In the Source window, scroll from left to right. |
| 09:13 | Let us check the difference between critics_score and audience_score of all the movies. |
| 09:22 | We will use a box plot for our study. |
| 09:26 | By using the pipe operator, we can combine the functions of ggplot2 and dplyr packages. |
| 09:34 | Click on the script myPipe.R |
| 09:38 | In the Source window, type the following command. |
| 09:43 | Save the script and run the current line. |
| 09:49 | The required box plot appears in the Plots window. |
| 09:54 | In the Plots window, click on the Zoom button to maximize the plot. |
| 10:00 | Here you can see that for Drama, Horror, and Mystery & Suspense movies, the median difference is close to zero. |
| 10:14 | This means that the audience's and critics' opinions are very similar for these genres. |
| 10:22 | Whereas for Action & Adventure and Comedy movies, the median difference is not close to zero. |
| 10:30 | This means that the audience's and critics' opinions differ for these genres. |
| 10:37 | Close this plot. |
| 10:39 | In the Source window, click on movies data frame. |
| 10:44 | In the Source window, scroll from right to left. |
| 10:49 | Let us check the number of movies in each mpaa_rating category within every genre. |
| 10:57 | Click on the script myPipe.R |
| 11:01 | In the Source window, type the following command. |
| 11:06 | Notice that we have included both genre and mpaa_rating in group_by function. |
| 11:15 | So, the analysis will be done on the data divided by these two variables. |
| 11:22 | Also, we have used num = n(). |
| 11:27 | The function n counts the number of rows, that is observations, in each group. |
| 11:35 | Run the current line. |
| 11:38 | I will resize the Console window. |
| 11:42 | From the output, we can see that there are 22 Action & Adventure movies with mpaa_rating as R. |
| 11:53 | Let us summarize what we have learnt. |
| 11:57 | In this tutorial, we have learnt about: |
| 12:00 | summarise and group_by functions |
| 12:04 | Operations in summarise function |
| 12:08 | Pipe operator |
| 12:10 | We now suggest an assignment. |
| 12:14 | Use the built-in data set iris. |
| 12:18 | Using the pipe operator, group the flowers by their species. |
| 12:24 | Summarise the grouped data by the mean of Sepal.Length and Sepal.Width. |
| 12:33 | The video at the following link summarises the Spoken Tutorial project. |
| 12:37 | Please download and watch it. |
| 12:41 | We conduct workshops using Spoken Tutorials and give certificates. |
| 12:46 | Please contact us. |
| 12:49 | Please post your timed queries in this forum. |
| 12:54 | Please post your general queries in this forum. |
| 12:59 | The FOSSEE team coordinates the TBC project. |
| 13:03 | For more details, please visit these sites. |
| 13:07 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India. |
| 13:13 | The script for this tutorial was contributed by Varshit Dubey (CoE Pune). |
| 13:20 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |
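
The commands typed on screen are not reproduced in this transcript. The following is a minimal sketch of the steps narrated above, not the contents of myPipe.R. It assumes the dplyr package is installed and uses a small made-up data frame in place of moviesData.csv. The object names overall, byGenre, dramaMov, grDramaMov, stepwise, piped, counts, nested and chained are hypothetical and chosen only for illustration.

```r
# Assumption: dplyr is installed. The data below are invented for this sketch
# and are NOT the tutorial's moviesData.csv.
library(dplyr)

movies <- data.frame(
  genre       = c("Drama", "Drama", "Drama", "Comedy", "Comedy", "Documentary"),
  mpaa_rating = c("R", "PG", "R", "PG", "R", "G"),
  imdb_rating = c(7, 6, 8, 5, 6, 8),
  stringsAsFactors = FALSE
)

# summarise() on an ungrouped data frame returns a single row.
overall <- summarise(movies, mean_rating = mean(imdb_rating))

# group_by() followed by summarise() returns one row per group.
groupMovies <- group_by(movies, genre)
byGenre <- summarise(groupMovies, mean_rating = mean(imdb_rating))

# Step-by-step version: every intermediate data frame needs its own name.
dramaMov   <- filter(movies, genre == "Drama")
grDramaMov <- group_by(dramaMov, mpaa_rating)
stepwise   <- summarise(grDramaMov, mean_rating = mean(imdb_rating))

# Pipe version of the same analysis: the data frame is named only once.
piped <- movies %>%
  filter(genre == "Drama") %>%
  group_by(mpaa_rating) %>%
  summarise(mean_rating = mean(imdb_rating))

# n() counts the rows in each group defined by genre and mpaa_rating.
counts <- movies %>%
  group_by(genre, mpaa_rating) %>%
  summarise(num = n())

# Cosine of sine of pi: nested call and an equivalent piped form.
nested  <- cos(sin(pi))
chained <- pi %>% sin() %>% cos()
```

Printing stepwise and piped should show the same table, which is the point of the pipe: the result does not change, only the way the steps are written. Recent dplyr versions may print an informational message about grouping when summarise is called on data grouped by more than one variable.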