R/C2/Data-Manipulation-using-dplyr-Package/English-timed
Revision as of 16:33, 5 June 2020 by Sakinashaikh
Time | Narration |
00:01 | Welcome to this tutorial on Data manipulation using the dplyr package. |
00:08 | In this tutorial, we will learn about, |
00:12 | Data manipulation |
00:15 | dplyr package |
00:18 | How to use filter and arrange functions |
00:23 | To understand this tutorial, you should know, |
00:27 | Basics of Statistics |
00:30 | Basics of ggplot2 package |
00:34 | Data frames |
00:36 | If not, please locate the relevant tutorials on R on this website. |
00:43 | This tutorial is recorded on, |
00:46 | Ubuntu Linux OS version 16.04 |
00:52 | R version 3.4.4 |
00:56 | RStudio version 1.1.463 |
01:02 | Install R version 3.2.0 or higher. |
01:08 | For this tutorial, we will use |
01:11 | A data file, moviesData.csv |
01:16 | And a script file, myVis.R. |
01:21 | Please download these files from the Code files link of this tutorial. |
01:28 | I have downloaded and moved these files to the DataVis folder. |
01:34 | This folder is located in the myProject folder on my Desktop. |
01:41 | I have also set the DataVis folder as my working directory. |
01:47 | In real life, it is rare that we get the data in exactly the right form we need. |
01:55 | Often we’ll need to |
01:56 | create some new variables or summaries |
02:02 | rename the variables |
02:05 | reorder the observations in order to make the data a little easier to work with. |
02:12 | We will learn how to achieve all this using the dplyr package. |
02:19 | dplyr is a package for data manipulation, written and maintained by Hadley Wickham. |
02:27 | It comprises functions that perform the most commonly used data manipulation operations. |
02:36 | Let us switch to RStudio. |
02:40 | Open the script myVis.R in RStudio. |
02:46 | Let us run this script by clicking on the Source button. |
02:52 | The movies data frame opens in the Source window. |
02:57 | This data frame will be used later in this tutorial. |
03:02 | Now, we will install the dplyr package. Please make sure that you are connected to the Internet. |
03:12 | In the Console window, type the following command and press Enter. |
03:19 | The installation of the package takes a few seconds. |
03:24 | We will wait while the package is being installed. |
03:29 | To load this package, we will add a library call at the top of the script. |
03:36 | Click on the script myVis.R |
03:40 | At the top of the script, type library and dplyr in parentheses. |
03:49 | Save the script and run this line by pressing the Ctrl and Enter keys together. |
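The install and load commands typed in the video are not reproduced in this transcript. A minimal sketch of the two steps, assuming the usual base R calls install.packages() and library(); the requireNamespace() check and the explicit CRAN mirror URL are additions so the snippet can be rerun safely outside RStudio, which normally sets a mirror for you.

```r
# Install dplyr only if it is not already available (needs an Internet connection).
# The repos URL is the main CRAN mirror; inside RStudio,
# install.packages("dplyr") alone is normally enough.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# Load the package for this session; this is the line placed at the top of myVis.R.
library(dplyr)
```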
03:59 | Now let us learn about some key functions in the dplyr package: |
04:05 | filter - to select cases based on their values. |
04:10 | arrange - to reorder the cases. |
04:14 | select - to select variables based on their names. |
04:19 | mutate - to add new variables that are functions of existing variables. |
04:26 | summarise - to condense multiple values to a single value. |
04:32 | All these functions can be combined with the group underscore by function. |
04:39 | It allows us to perform any of these operations group by group. |
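A compact sketch of the five functions and group_by() listed above. The movies data frame here is a hypothetical six-row toy stand-in for moviesData.csv: the titles, the runtime column and all values are invented for illustration, and only the dplyr function names come from the narration.

```r
library(dplyr)

# Hypothetical toy stand-in for the movies data frame (values are invented).
movies <- data.frame(
  title       = c("M1", "M2", "M3", "M4", "M5", "M6"),
  genre       = c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2, 7.5),
  runtime     = c(95, 120, 88, 110, 130, 101),  # hypothetical column, in minutes
  stringsAsFactors = FALSE
)

moviesHigh   <- filter(movies, imdb_rating > 7)             # keep rows by value
moviesByTime <- arrange(movies, runtime)                    # reorder rows
moviesNarrow <- select(movies, title, genre)                # keep columns by name
moviesHours  <- mutate(movies, runtime_hrs = runtime / 60)  # add a derived column
overallMean  <- summarise(movies, mean_rating = mean(imdb_rating))  # many values to one

# group_by() makes summarise() work per group: one mean rating per genre.
byGenre <- summarise(group_by(movies, genre), mean_rating = mean(imdb_rating))
print(byGenre)
```

Calling summarise() on the grouped data returns one row per genre rather than a single row for the whole data frame.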
04:45 | Let us switch to RStudio. |
04:49 | In the Source window, click on movies. |
04:53 | In the Source window, scroll from left to right. |
04:58 | This will enable us to see the remaining columns of the movies data frame. |
05:05 | Suppose we want to filter the movies having genre as Comedy. |
05:12 | For this, we will use the filter function. |
05:16 | Click on the script myVis.R |
05:21 | In the Source window, type the following command. |
05:26 | Recall that the filter function in the dplyr package allows us to select cases based on their values. |
05:36 | Inside the filter function, the first argument is the name of the data frame, which is movies. |
05:44 | The second argument is the condition by which we want to filter the movies data frame: genre equal to Comedy. |
05:52 | Save the script and run the current line. |
05:57 | The resulting data frame is stored in an object called moviesComedy, which appears in the Environment window. |
06:06 | Let us view the data frame moviesComedy to check whether it contains movies with genre as Comedy. |
06:15 | In the Source window, type the following command. |
06:19 | Run the current line. |
06:22 | The moviesComedy data frame opens in the Source window. |
06:28 | All the movies having genre as Comedy have been filtered. |
06:35 | Let us close this data frame moviesComedy for now. |
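The exact command typed on screen is not shown in this transcript. A sketch of the Comedy filter, assuming a hypothetical toy movies data frame with genre and imdb_rating columns (values invented); the object name moviesComedy follows the narration.

```r
library(dplyr)

# Hypothetical toy stand-in for the movies data frame (values are invented).
movies <- data.frame(
  title       = c("M1", "M2", "M3", "M4", "M5", "M6"),
  genre       = c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2, 7.5),
  stringsAsFactors = FALSE
)

# First argument: the data frame. Second argument: the condition a row must
# satisfy to be kept. Note == (comparison), not = (argument assignment).
moviesComedy <- filter(movies, genre == "Comedy")

# View() opens the result in a Source-window tab in RStudio; print() elsewhere.
if (interactive()) View(moviesComedy) else print(moviesComedy)
```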
06:40 | We can also use logical operators to combine two or more conditions. |
06:48 | In the Source window, click on movies. |
06:52 | Suppose we want to filter the movies with genre as either Comedy or Drama. |
07:00 | Click on the script myVis.R |
07:05 | In the Source window, type the following commands. |
07:11 | Here, we have two values by which we would like to filter the movies data frame. |
07:18 | For this, we have used the logical OR operator, the vertical bar. |
07:23 | Run the last two lines of code. |
07:27 | moviesComDr opens in the Source window. |
07:32 | The movies having genre as either Comedy or Drama have been filtered. |
07:39 | Let us close this data frame moviesComDr for now. |
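A sketch of the OR filter on the same kind of hypothetical toy data. Each side of the vertical bar must be a complete comparison: writing genre == "Comedy" | "Drama" is an error in R, because "Drama" on its own is not a logical value.

```r
library(dplyr)

# Hypothetical toy stand-in for the movies data frame (values are invented).
movies <- data.frame(
  title       = c("M1", "M2", "M3", "M4", "M5", "M6"),
  genre       = c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2, 7.5),
  stringsAsFactors = FALSE
)

# Keep rows whose genre is Comedy OR Drama; each side of | is a full comparison.
moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama")

if (interactive()) View(moviesComDr) else print(moviesComDr)
```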
07:45 | This filter function can also be written using the match operator. |
07:52 | In the Source window, type the following command. |
07:57 | %in% is used for value matching: it checks whether each value of genre is one of the given values. |
08:03 | To know more about this operator, let us open its Help page. |
08:08 | In the Console window, type the following command and press Enter. |
08:17 | Run the last two lines of code. |
08:21 | moviesComDrP opens in the Source window. |
08:26 | The movies having genre as either Comedy or Drama have been filtered. |
08:33 | Let us close this data frame moviesComDrP for now. |
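A sketch of the same filter written with %in%, again on hypothetical toy data. x %in% y returns TRUE for each element of x that appears anywhere in y, so one call replaces a chain of == comparisons joined by |. The help("%in%") call is guarded so it only opens the Help pane in an interactive session.

```r
library(dplyr)

# Hypothetical toy stand-in for the movies data frame (values are invented).
movies <- data.frame(
  title       = c("M1", "M2", "M3", "M4", "M5", "M6"),
  genre       = c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2, 7.5),
  stringsAsFactors = FALSE
)

# %in% gives one TRUE/FALSE per element of its left side,
# e.g. c("Comedy", "Action") %in% c("Comedy", "Drama") gives TRUE FALSE.
moviesComDrP <- filter(movies, genre %in% c("Comedy", "Drama"))

# Same rows as the version written with the logical OR operator.
moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama")
print(identical(moviesComDr, moviesComDrP))

# Open the Help page for %in% (only in an interactive session such as RStudio).
if (interactive()) help("%in%")
```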
08:39 | In the Source window, click on movies. |
08:43 | In the Source window, scroll from left to right. |
08:48 | Let us now filter movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5. |
09:01 | Click on the script myVis.R |
09:05 | In the Source window, type the following command. |
09:10 | Here, we have used the logical AND operator, the ampersand, so that a movie must satisfy both conditions. |
09:17 | Save the script and run the last two lines of code. |
09:23 | moviesComIm opens in the Source window. |
09:28 | I will resize the Console window. |
09:32 | There are seven movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5. |
09:43 | Let us close this data frame moviesComIm for now. |
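A sketch of the AND filter on hypothetical toy data. dplyr's filter() also accepts several conditions separated by commas and keeps only the rows where all of them are TRUE, so both forms below should return the same rows.

```r
library(dplyr)

# Hypothetical toy stand-in for the movies data frame (values are invented).
movies <- data.frame(
  title       = c("M1", "M2", "M3", "M4", "M5", "M6"),
  genre       = c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2, 7.5),
  stringsAsFactors = FALSE
)

# Both conditions joined with & (logical AND).
moviesComIm <- filter(movies, genre == "Comedy" & imdb_rating >= 7.5)

# Comma-separated conditions in filter() are also combined with AND.
moviesComIm2 <- filter(movies, genre == "Comedy", imdb_rating >= 7.5)

print(moviesComIm)
```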
09:49 | In the Source window, click on movies. |
09:53 | Suppose we want to arrange the movies in ascending order of imdb underscore rating. |
10:02 | For this, we will use the arrange function. |
10:06 | Click on the script myVis.R |
10:10 | In the Source window, type the following command. |
10:15 | Run the last two lines of code. |
10:19 | moviesImA opens in the Source window. |
10:24 | In the Source window, scroll from left to right and locate the imdb underscore rating column. |
10:35 | The movies have been arranged in ascending order of imdb rating. |
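A sketch of sorting by imdb_rating in ascending order, which is the default direction of arrange(). The data is a hypothetical toy stand-in for the movies data frame.

```r
library(dplyr)

# Hypothetical toy stand-in for the movies data frame (values are invented).
movies <- data.frame(
  title       = c("M1", "M2", "M3", "M4", "M5", "M6"),
  genre       = c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2, 7.5),
  stringsAsFactors = FALSE
)

# Lowest rating first.
moviesImA <- arrange(movies, imdb_rating)

if (interactive()) View(moviesImA) else print(moviesImA)
```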
10:41 | Now, let us say we want to arrange the movies in descending order of imdb rating. |
10:48 | For this, we use the desc function. |
10:54 | Let us close this data frame moviesImA for now. |
11:00 | In the Source window, type the following command. |
11:05 | Run the last two lines of code. |
11:09 | moviesImD opens in the Source window. |
11:14 | In the Source window, scroll from left to right and locate the imdb underscore rating column. |
11:24 | The movies have been arranged in descending order of imdb rating. |
11:30 | Let us close this data frame moviesImD for now. |
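Wrapping the column in desc() reverses the sort order. A sketch on hypothetical toy data; note that arrange() places missing values (NA), if any, at the end in both directions.

```r
library(dplyr)

# Hypothetical toy stand-in for the movies data frame (values are invented).
movies <- data.frame(
  title       = c("M1", "M2", "M3", "M4", "M5", "M6"),
  genre       = c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2, 7.5),
  stringsAsFactors = FALSE
)

# Highest rating first.
moviesImD <- arrange(movies, desc(imdb_rating))

if (interactive()) View(moviesImD) else print(moviesImD)
```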
11:36 | In the Source window, click on movies. |
11:40 | Suppose we want to arrange the movies by both genre and imdb rating: first by genre, and then by imdb rating within each genre. |
11:48 | Click on the script myVis.R |
11:52 | In the Source window, type the following commands. |
11:56 | Run the last two lines of code. |
12:00 | moviesGeIm opens in the Source window. |
12:05 | In the Source window, scroll from left to right. |
12:09 | The movies have been arranged by genre and, within each genre, by imdb rating. |
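With several columns, arrange() sorts by the first column and uses each later column only to break ties. A sketch on hypothetical toy data, plus a variant (not from the video) that sorts ratings from highest to lowest within each genre.

```r
library(dplyr)

# Hypothetical toy stand-in for the movies data frame (values are invented).
movies <- data.frame(
  title       = c("M1", "M2", "M3", "M4", "M5", "M6"),
  genre       = c("Comedy", "Drama", "Comedy", "Action", "Drama", "Comedy"),
  imdb_rating = c(7.8, 6.5, 5.9, 7.1, 8.2, 7.5),
  stringsAsFactors = FALSE
)

# Genre alphabetically, then rating ascending within each genre.
moviesGeIm <- arrange(movies, genre, imdb_rating)

# Hypothetical variant: genre alphabetically, rating descending within each genre.
moviesGeImD <- arrange(movies, genre, desc(imdb_rating))

if (interactive()) View(moviesGeIm) else print(moviesGeIm)
```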
12:15 | Let us summarize what we have learnt. |
12:19 | In this tutorial, we have learnt about: |
12:23 | Data manipulation |
12:26 | dplyr package |
12:28 | How to use filter and arrange functions. |
12:32 | We now suggest an assignment. |
12:36 | Consider the built-in data set mtcars. Find the cars with hp greater than 100 and cyl equal to 3. |
12:47 | Arrange the mtcars data set by the mpg variable. |
12:53 | The video at the following link summarises the Spoken Tutorial project. |
12:58 | Please download and watch it. |
13:02 | We conduct workshops using Spoken Tutorials and give certificates. |
13:08 | Please contact us. |
13:11 | Please post your timed queries in this forum. |
13:16 | Please post your general queries in this forum. |
13:21 | The FOSSEE team coordinates the TBC project. |
13:24 | For more details, please visit these sites. |
13:29 | The Spoken Tutorial project is funded by MHRD, Govt. of India. |
13:36 | The script for this tutorial was contributed by Varshit Dubey (CoE Pune). |
13:44 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |