R/C2/Data-Manipulation-using-dplyr-Package/English-timed
| Time | Narration |
| 00:01 | Welcome to this tutorial on Data manipulation using dplyr package. |
| 00:08 | In this tutorial, we will learn about: |
| 00:12 | Data manipulation |
| 00:15 | dplyr package |
| 00:18 | How to use filter and arrange functions |
| 00:23 | To understand this tutorial, you should know: |
| 00:27 | Basics of Statistics |
| 00:30 | Basics of ggplot2 package |
| 00:34 | Data frames |
| 00:36 | If not, please locate the relevant tutorials on R on this website. |
| 00:43 | This tutorial is recorded on: |
| 00:46 | Ubuntu Linux OS version 16.04 |
| 00:52 | R version 3.4.4 |
| 00:56 | RStudio version 1.1.463 |
| 01:02 | Install R version 3.2.0 or higher. |
| 01:08 | For this tutorial, we will use |
| 01:11 | A data frame moviesData.csv |
| 01:16 | and a script file myVis.R. |
| 01:21 | Please download these files from the Code files link of this tutorial. |
| 01:28 | I have downloaded and moved these files to the DataVis folder. |
| 01:34 | This folder is located in the myProject folder on my Desktop. |
| 01:41 | I have also set the DataVis folder as my Working Directory. |
| 01:47 | In real life, it is rare that we get the data in exactly the right form we need. |
| 01:55 | Often we’ll need to |
| 01:56 | create some new variables or summaries |
| 02:02 | rename the variables |
| 02:05 | reorder the observations in order to make the data a little easier to work with. |
| 02:12 | We will learn how to achieve all this by using dplyr package. |
| 02:19 | dplyr is a package for data manipulation, written and maintained by Hadley Wickham. |
| 02:27 | It provides many functions that perform the most commonly used data manipulation operations. |
| 02:36 | Let us switch to RStudio. |
| 02:40 | Open the script myVis.R in RStudio. |
| 02:46 | Let us run this script by clicking on the Source button. |
| 02:52 | movies data frame opens in the Source window. |
| 02:57 | This data frame will be used later in this tutorial. |
| 03:02 | Now, we will install dplyr package. Please make sure that you are connected to the Internet. |
| 03:12 | In the Console window, type the following command and press Enter. |
| 03:19 | The installation of the package takes a few seconds. |
| 03:24 | We will wait while the package is being installed. |
| 03:29 | To load this package, we will add a library call at the top of the script. |
| 03:36 | Click on the script myVis.R |
| 03:40 | At the top of the script, type library and dplyr in parentheses. |
| 03:49 | Save the script and run this line by pressing Ctrl + Enter keys simultaneously. |
| 03:59 | Now we will learn about some key functions in the dplyr package: |
| 04:05 | filter - to select cases based on their values. |
| 04:10 | arrange - to reorder the cases. |
| 04:14 | select - to select variables based on their names. |
| 04:19 | mutate - to add new variables that are functions of existing variables. |
| 04:26 | summarise - to condense multiple values to a single value. |
| 04:32 | All these functions can be combined with group underscore by function. |
| 04:39 | It allows us to perform any of these operations group by group. |
| 04:45 | Let us switch to RStudio. |
| 04:49 | In the Source window, click on movies. |
| 04:53 | In the Source window, scroll from left to right. |
| 04:58 | This will enable us to see the remaining variables of the movies data frame. |
| 05:05 | Suppose we want to filter the movies having genre as Comedy. |
| 05:12 | For this, we will use the filter function. |
| 05:16 | Click on the script myVis.R |
| 05:21 | In the Source window, type the following command. |
| 05:26 | Recall that the filter function in the dplyr package allows us to select cases based on their values. |
| 05:36 | Inside the filter function, the first argument is the name of the data frame, which is movies. |
| 05:44 | The second argument is the condition by which we want to filter the movies data frame. |
| 05:52 | Save the script and run the current line. |
| 05:57 | The resulting data frame is stored in an object called moviesComedy in the Environment window. |
| 06:06 | Let us view the data frame moviesComedy to check whether it contains movies with genre as Comedy. |
| 06:15 | In the Source window, type the following command. |
| 06:19 | Run the current line. |
| 06:22 | moviesComedy data frame opens in the Source window. |
| 06:28 | All the movies having genre as Comedy have been filtered. |
| 06:35 | Let us close this data frame moviesComedy for now. |
| 06:40 | We can also use logical operators to combine two or more conditions. |
| 06:48 | In the Source window, click on movies. |
| 06:52 | Suppose we want to filter the movies with genre as either Comedy or Drama. |
| 07:00 | Click on the script myVis.R |
| 07:05 | In the Source window, type the following commands. |
| 07:11 | Here, we have two values by which we would like to filter movies data frame. |
| 07:18 | For this, we have used a logical OR operator. |
| 07:23 | Run the last two lines of code. |
| 07:27 | moviesComDr opens in the Source window. |
| 07:32 | The movies having genre as either Comedy or Drama have been filtered. |
| 07:39 | Let us close this data frame moviesComDr for now. |
| 07:45 | This filter function can also be written using the match operator. |
| 07:52 | In the Source window, type the following command. |
| 07:57 | %in% is used for value matching. |
| 08:03 | To know more about this operator, let us access the Help. |
| 08:08 | In the Console window, type the following command and press Enter. |
| 08:17 | Run the last two lines of code. |
| 08:21 | moviesComDrP opens in the Source window. |
| 08:26 | The movies having genre as either Comedy or Drama have been filtered. |
| 08:33 | Let us close this data frame moviesComDrP for now. |
| 08:39 | In the Source window, click on movies. |
| 08:43 | In the Source window, scroll from left to right. |
| 08:48 | Let us now filter movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5. |
| 09:01 | Click on the script myVis.R |
| 09:05 | In the Source window, type the following command. |
| 09:10 | Here, we have used a logical AND operator to include both conditions. |
| 09:17 | Save the script and run the last two lines of code. |
| 09:23 | moviesComIm opens in the Source window. |
| 09:28 | I will resize the Console window. |
| 09:32 | There are seven movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5. |
| 09:43 | Let us close this data frame moviesComIm for now. |
| 09:49 | In the Source window, click on movies. |
| 09:53 | Suppose we want to arrange the movies in ascending order of imdb underscore rating. |
| 10:02 | For this, we will use the arrange function. |
| 10:06 | Click on the script myVis.R |
| 10:10 | In the Source window, type the following command. |
| 10:15 | Run the last two lines of code. |
| 10:19 | moviesImA opens in the Source window. |
| 10:24 | In the Source window, scroll from left to right and locate the imdb underscore rating column. |
| 10:35 | The movies have been arranged in ascending order of imdb rating. |
| 10:41 | Now, let us say we want to arrange the movies in descending order of imdb rating. |
| 10:48 | For this, we use the desc function. |
| 10:54 | Let us close this data frame moviesImA for now. |
| 11:00 | In the Source window, type the following command. |
| 11:05 | Run the last two lines of code. |
| 11:09 | moviesImD opens in the Source window. |
| 11:14 | In the Source window, scroll from left to right and locate the imdb underscore rating column. |
| 11:24 | The movies have been arranged in descending order of imdb rating. |
| 11:30 | Let us close this data frame moviesImD for now. |
| 11:36 | In the Source window, click on movies. |
| 11:40 | Suppose we want to arrange the movies both by genre and imdb rating. |
| 11:48 | Click on the script myVis.R |
| 11:52 | In the Source window, type the following commands. |
| 11:56 | Run the last two lines of code. |
| 12:00 | moviesGeIm opens in the Source window. |
| 12:05 | In the Source window, scroll from left to right. |
| 12:09 | Movies have been arranged both by genre and imdb rating. |
| 12:15 | Let us summarize what we have learnt. |
| 12:19 | In this tutorial, we have learnt about: |
| 12:23 | Data manipulation |
| 12:26 | dplyr package |
| 12:28 | How to use filter and arrange functions. |
| 12:32 | We now suggest an assignment. |
| 12:36 | Consider the built-in data set mtcars. Find the cars with hp greater than 100 and cyl equal to 3. |
| 12:47 | Arrange the mtcars data set based on mpg variable. |
| 12:53 | The video at the following link summarises the Spoken Tutorial project. |
| 12:58 | Please download and watch it. |
| 13:02 | We conduct workshops using Spoken Tutorials and give certificates. |
| 13:08 | Please contact us. |
| 13:11 | Please post your timed queries in this forum. |
| 13:16 | Please post your general queries in this forum. |
| 13:21 | The FOSSEE team coordinates the TBC project. |
| 13:24 | For more details, please visit these sites. |
| 13:29 | The Spoken Tutorial project is funded by MHRD, Govt. of India. |
| 13:36 | The script for this tutorial was contributed by Varshit Dubey (CoE Pune). |
| 13:44 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |
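
The narration refers several times to "the following command" without the transcript showing it. The sketch below is one plausible reconstruction of those filter and arrange calls, not the tutorial's actual script. The object names (moviesComedy, moviesComDr, moviesComDrP, moviesComIm, moviesImA, moviesImD, moviesGeIm) and the column names genre and imdb_rating come from the narration. The small movies data frame and its title column are made up here so that the example runs without moviesData.csv.

```r
# Assumes dplyr is already installed, e.g. via install.packages("dplyr").
library(dplyr)

# Hypothetical stand-in for the movies data frame (made-up rows, not moviesData.csv).
movies <- data.frame(
  title = c("A", "B", "C", "D", "E", "F"),
  genre = c("Comedy", "Drama", "Comedy", "Horror", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.1, 5.4, 7.5, 8.2, 7.5),
  stringsAsFactors = FALSE
)

# filter(): first argument is the data frame, second is the condition.
moviesComedy <- filter(movies, genre == "Comedy")

# Logical OR: genre is either Comedy or Drama.
moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama")

# The same selection written with the value-matching operator %in%.
moviesComDrP <- filter(movies, genre %in% c("Comedy", "Drama"))

# Logical AND: Comedy movies with imdb_rating of at least 7.5.
moviesComIm <- filter(movies, genre == "Comedy" & imdb_rating >= 7.5)

# arrange(): ascending order of imdb_rating (the default).
moviesImA <- arrange(movies, imdb_rating)

# desc() reverses the order, giving descending imdb_rating.
moviesImD <- arrange(movies, desc(imdb_rating))

# Sort by genre first, then by imdb_rating within each genre.
moviesGeIm <- arrange(movies, genre, imdb_rating)
```

In RStudio, a call such as View(moviesComedy) opens the result in the Source window, and ?"%in%" in the Console opens the Help page for the value-matching operator. With the real movies data frame, the row counts will of course differ from this toy example.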