R/C2/Data-Manipulation-using-dplyr-Package/English-timed

Revision as of 16:29, 22 May 2020 by Sakinashaikh



Time Narration
00:01 Welcome to this tutorial on Data manipulation using dplyr package.
00:08 In this tutorial, we will learn about:
00:12 Data manipulation
00:15 dplyr package
00:18 How to use filter and arrange functions
00:23 To understand this tutorial, you should know:
00:27 Basics of Statistics
00:30 Basics of ggplot2 package
00:34 Data frames
00:36 If not, please locate the relevant tutorials on R on this website.
00:43 This tutorial is recorded on:
00:46 Ubuntu Linux OS version 16.04
00:52 R version 3.4.4
00:56 RStudio version 1.1.463
01:02 Install R version 3.2.0 or higher.
01:08 For this tutorial, we will use
01:11 A data file moviesData.csv
01:16 And a script file myVis.R.
01:21 Please download these files from the Code files link of this tutorial.
01:28 I have downloaded and moved these files to DataVis folder.
01:34 This folder is located in myProject folder on my Desktop.
01:41 I have also set DataVis folder as my Working Directory.
01:47 In real life, it is rare that we get the data in exactly the right form we need.
01:55 Often we’ll need to
01:56 create some new variables or summaries
02:02 rename the variables
02:05 reorder the observations in order to make the data a little easier to work with.
02:12 We will learn how to achieve all this by using dplyr package.
02:19 dplyr is a package for data manipulation, written and maintained by Hadley Wickham.
02:27 It comprises many functions that perform the most commonly used data manipulation operations.
02:36 Let us switch to RStudio.
02:40 Open the script myVis.R in RStudio.
02:46 Let us run this script by clicking on the Source button.
02:52 movies data frame opens in the Source window.
02:57 This data frame will be used later in this tutorial.
03:02 Now, we will install dplyr package. Please make sure that you are connected to the Internet.
03:12 In the Console window, type the following command and press Enter.
03:19 The installation of the package takes a few seconds.
03:24 We will wait while the package is being installed.
03:29 To load this package, we will add a library call at the top of the script.
03:36 Click on the script myVis.R
03:40 At the top of the script, type library and dplyr in parentheses.
03:49 Save the script and run this line by pressing Ctrl + Enter keys simultaneously.
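The commands typed at these steps are not reproduced in the transcript. A minimal sketch of what they most likely look like is given below. The requireNamespace guard and the repos URL are additions for convenience and are not part of the narration; a plain install.packages("dplyr") in the Console does the same job.

```r
# Install dplyr from CRAN only if it is not already installed.
# The guard and the explicit CRAN mirror are assumptions added for this sketch.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# Load the package; in the tutorial this line sits at the top of myVis.R.
library(dplyr)
```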
03:59 Now we will learn about some key functions in the dplyr package:
04:05 filter - to select cases based on their values.
04:10 arrange - to reorder the cases.
04:14 select - to select variables based on their names.
04:19 mutate - to add new variables that are functions of existing variables.
04:26 summarise - to condense multiple values to a single value.
04:32 All these functions can be combined with group underscore by function.
04:39 It allows us to perform any operation by a group.
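As a rough illustration of the functions just listed, the sketch below applies each one to a small made-up data frame. The data frame, its title column, and the result names (comedies, byRating, and so on) are hypothetical; only the column names genre and imdb_rating follow the tutorial's movies data.

```r
library(dplyr)

# Hypothetical stand-in for the movies data frame (not the real moviesData.csv).
movies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Comedy", "Drama", "Comedy", "Drama"),
  imdb_rating = c(6.0, 8.0, 8.0, 7.0),
  stringsAsFactors = FALSE
)

# filter: keep only the rows that satisfy a condition
comedies <- filter(movies, genre == "Comedy")

# arrange: reorder the rows (ascending by default)
byRating <- arrange(movies, imdb_rating)

# select: keep only the named columns
twoColumns <- select(movies, title, imdb_rating)

# mutate: add a column computed from existing columns
withScaled <- mutate(movies, rating10x = imdb_rating * 10)

# summarise: condense many values into one
overall <- summarise(movies, meanRating = mean(imdb_rating))

# group_by: make the same summary run once per genre
perGenre <- summarise(group_by(movies, genre), meanRating = mean(imdb_rating))
```

With group_by, the summary returns one row per genre instead of a single row for the whole data frame.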
04:45 Let us switch to RStudio.
04:49 In the Source window, click on movies.
04:53 In the Source window, scroll from left to right.
04:58 This will enable us to see the remaining columns of the movies data frame.
05:05 Suppose we want to filter the movies having genre as Comedy.
05:12 For this, we will use the filter function.
05:16 Click on the script myVis.R
05:21 In the Source window, type the following command.
05:26 Recall that the filter function in the dplyr package allows us to select cases based on their values.
05:36 Inside the filter function, the first argument is the name of the data frame, which is movies.
05:44 The second argument is the condition by which we want to filter the movies data frame.
05:52 Save the script and run the current line.
05:57 Resulting data frame is stored in an object called moviesComedy in the Environment window.
06:06 Let us view the data frame moviesComedy to check whether it contains movies with genre as Comedy.
06:15 In the Source window, type the following command.
06:19 Run the current line.
06:22 moviesComedy data frame opens in the Source window.
06:28 All the movies having genre as Comedy have been filtered.
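The filter and view commands are not shown in the transcript. Based on the narration, they probably resemble the sketch below. The small movies data frame and its title column are made up so that the example runs on its own; the names moviesComedy and genre come from the tutorial.

```r
library(dplyr)

# Hypothetical stand-in for movies; the real data comes from moviesData.csv.
movies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Comedy", "Drama", "Comedy", "Horror"),
  imdb_rating = c(6.0, 8.0, 7.5, 7.0),
  stringsAsFactors = FALSE
)

# First argument: the data frame. Second argument: the condition a row must satisfy.
moviesComedy <- filter(movies, genre == "Comedy")

# View() opens the result as a tab in RStudio; skipped when not run interactively.
if (interactive()) View(moviesComedy)
```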
06:35 Let us close this data frame moviesComedy for now.
06:40 We can also use logical operators to combine two or more conditions.
06:48 In the Source window, click on movies.
06:52 Suppose we want to filter the movies with genre as either Comedy or Drama.
07:00 Click on the script myVis.R
07:05 In the Source window, type the following commands.
07:11 Here, we have two conditions by which we would like to filter the movies data frame.
07:18 For this, we have used a logical OR operator.
07:23 Run the last two lines of code.
07:27 moviesComDr opens in the Source window.
07:32 The movies having genre as either Comedy or Drama have been filtered.
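A sketch of the OR filter described here, assuming the same column name genre. The movies data frame below is a made-up stand-in, and the exact commands used in the video may differ.

```r
library(dplyr)

# Hypothetical stand-in for movies; the real data comes from moviesData.csv.
movies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Comedy", "Drama", "Comedy", "Horror"),
  imdb_rating = c(6.0, 8.0, 7.5, 7.0),
  stringsAsFactors = FALSE
)

# The | operator keeps a row when at least one of the two conditions is TRUE.
moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama")

# Open the result in RStudio; skipped when not run interactively.
if (interactive()) View(moviesComDr)
```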
07:39 Let us close this data frame moviesComDr for now.
07:45 This filter function can also be written using the match operator.
07:52 In the Source window, type the following command.
07:57 %in% is used for value matching.
08:03 To know more about this operator, let us access the Help.
08:08 In the Console window, type the following command and press Enter.
08:17 Run the last two lines of code.
08:21 moviesComDrP opens in the Source window.
08:26 The movies having genre as either Comedy or Drama have been filtered.
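The %in% version and the help command are not shown in the transcript. A plausible form is sketched below, using a made-up movies data frame so that the example is self-contained.

```r
library(dplyr)

# Hypothetical stand-in for movies; the real data comes from moviesData.csv.
movies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Comedy", "Drama", "Comedy", "Horror"),
  imdb_rating = c(6.0, 8.0, 7.5, 7.0),
  stringsAsFactors = FALSE
)

# %in% is TRUE for each genre value that appears in the vector on the right.
moviesComDrP <- filter(movies, genre %in% c("Comedy", "Drama"))

# Help page for the operator; the name must be quoted because it is not a regular identifier.
if (interactive()) help("%in%")
```

This form scales better than chaining OR conditions when the list of accepted values grows.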
08:33 Let us close this data frame moviesComDrP for now.
08:39 In the Source window, click on movies.
08:43 In the Source window, scroll from left to right.
08:48 Let us now filter movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5.
09:01 Click on the script myVis.R
09:05 In the Source window, type the following command.
09:10 Here, we have used a logical AND operator to include both conditions.
09:17 Save the script and run the last two lines of code.
09:23 moviesComIm opens in the Source window.
09:28 I will resize the Console window.
09:32 There are seven movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5.
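A sketch of the AND filter described here, assuming the column names genre and imdb_rating. The movies data frame below is made up, so the number of matching rows differs from the seven found in the real data.

```r
library(dplyr)

# Hypothetical stand-in for movies; the real data comes from moviesData.csv.
movies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Comedy", "Drama", "Comedy", "Horror"),
  imdb_rating = c(6.0, 8.0, 7.5, 7.0),
  stringsAsFactors = FALSE
)

# The & operator keeps a row only when both conditions are TRUE.
moviesComIm <- filter(movies, genre == "Comedy" & imdb_rating >= 7.5)

# Open the result in RStudio; skipped when not run interactively.
if (interactive()) View(moviesComIm)
```

In dplyr, separating the conditions with a comma, as in filter(movies, genre == "Comedy", imdb_rating >= 7.5), has the same effect as joining them with &.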
09:43 Let us close this data frame moviesComIm for now.
09:49 In the Source window, click on movies.
09:53 Suppose, we want to arrange the movies in an ascending order of imdb underscore rating.
10:02 For this, we will use the arrange function.
10:06 Click on the script myVis.R
10:10 In the Source window, type the following command.
10:15 Run the last two lines of code.
10:19 moviesImA opens in the Source window.
10:24 In the Source window, scroll from left to right and locate the imdb underscore rating column.
10:35 The movies have been arranged in ascending order of imdb rating.
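The arrange command is not shown in the transcript. It probably looks like the sketch below, which uses a made-up movies data frame so that it runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for movies; the real data comes from moviesData.csv.
movies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Comedy", "Drama", "Comedy", "Horror"),
  imdb_rating = c(6.0, 8.0, 7.5, 7.0),
  stringsAsFactors = FALSE
)

# arrange() sorts rows in ascending order of the given column by default.
moviesImA <- arrange(movies, imdb_rating)

# Open the result in RStudio; skipped when not run interactively.
if (interactive()) View(moviesImA)
```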
10:41 Now, let us say we want to arrange the movies in descending order of imdb rating.
10:48 For this, we use desc function.
10:54 Let us close this data frame moviesImA for now.
11:00 In the Source window, type the following command.
11:05 Run the last two lines of code.
11:09 moviesImD opens in the Source window.
11:14 In the Source window, scroll from left to right and locate the imdb underscore rating column.
11:24 The movies have been arranged in descending order of imdb rating.
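A sketch of the descending sort described here, again on a made-up movies data frame. The exact command used in the video may differ.

```r
library(dplyr)

# Hypothetical stand-in for movies; the real data comes from moviesData.csv.
movies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Comedy", "Drama", "Comedy", "Horror"),
  imdb_rating = c(6.0, 8.0, 7.5, 7.0),
  stringsAsFactors = FALSE
)

# Wrapping the column in desc() reverses the sort order for that column.
moviesImD <- arrange(movies, desc(imdb_rating))

# Open the result in RStudio; skipped when not run interactively.
if (interactive()) View(moviesImD)
```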
11:30 Let us close this data frame moviesImD for now.
11:36 In the Source window, click on movies.
11:40 Suppose we want to arrange the movies both by genre and imdb rating.
11:48 Click on the script myVis.R
11:52 In the Source window, type the following commands.
11:56 Run the last two lines of code.
12:00 moviesGeIm opens in the Source window.
12:05 In the Source window, scroll from left to right.
12:09 Movies have been arranged both by genre and imdb rating.
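The two-column sort can be sketched as follows. The order of the sort keys (genre first, then imdb_rating) is an assumption based on the wording of the narration, and the movies data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for movies; the real data comes from moviesData.csv.
movies <- data.frame(
  title       = c("A", "B", "C", "D"),
  genre       = c("Comedy", "Drama", "Comedy", "Drama"),
  imdb_rating = c(8.0, 6.5, 6.0, 7.0),
  stringsAsFactors = FALSE
)

# Rows are sorted by genre first; imdb_rating breaks ties within each genre.
moviesGeIm <- arrange(movies, genre, imdb_rating)

# Open the result in RStudio; skipped when not run interactively.
if (interactive()) View(moviesGeIm)
```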
12:15 Let us summarize what we have learnt.
12:19 In this tutorial, we have learnt about:
12:23 Data manipulation
12:26 dplyr package
12:28 How to use filter and arrange functions.
12:32 We now suggest an assignment.
12:36 Consider the built-in data set mtcars. Find the cars with hp greater than 100 and cyl equal to 3.
12:47 Arrange the mtcars data set based on mpg variable.
12:53 The video at the following link summarises the Spoken Tutorial project.
12:58 Please download and watch it.
13:02 We conduct workshops using Spoken Tutorials and give certificates.
13:08 Please contact us.
13:11 Please post your timed queries in this forum.
13:16 Please post your general queries in this forum.
13:21 The FOSSEE team coordinates the TBC project.
13:24 For more details, please visit these sites.
13:29 The Spoken Tutorial project is funded by MHRD, Govt. of India.
13:36 The script for this tutorial was contributed by Varshit Dubey (CoE Pune).
13:44 This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Sakinashaikh