Difference between revisions of "R/C2/Data-Manipulation-using-dplyr-Package/English"

From Script | Spoken-Tutorial

Latest revision as of 11:21, 28 August 2019

Title of the script: Data Manipulation using dplyr package

Author: Varshit Dubey (CoE Pune) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, data manipulation, dplyr, filter, video tutorial

Visual Cue Narration
Show slide

Opening Slide

Welcome to this tutorial on Data manipulation using dplyr package.
Show slide

Learning Objective

In this tutorial, we will learn about,
  • Data manipulation
  • dplyr package
  • How to use filter and arrange functions
Show slide

Pre-requisites

To understand this tutorial, you should know,
  • Basics of Statistics
  • Basics of ggplot2 package
  • Data frames

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on,
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.463

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use
  • A data frame moviesData.csv
  • A script file myVis.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight moviesData.csv and myVis.R in the folder DataVis

I have downloaded and moved these files to DataVis folder.

This folder is located in myProject folder on my Desktop.

I have also set DataVis folder as my Working Directory.

Show slide

Need for Data Manipulation

In real life, it is rare that we get the data in exactly the right form we need.
Show slide

Need for Data Manipulation

Often we’ll need to
  • create some new variables or summaries
  • rename the variables
  • reorder the observations in order to make the data a little easier to work with.
About dplyr package We will learn how to achieve all this by using dplyr package.
Show slide

About dplyr Package

  • dplyr is a package for data manipulation, written and maintained by Hadley Wickham.
  • It comprises many functions that perform the most commonly used data manipulation operations.
Let us switch to RStudio.
Highlight myVis.R in the Files window of RStudio Open the script myVis.R in RStudio.
Highlight the Source button Let us run this script by clicking on the Source button.
Highlight movies in the Source window movies data frame opens in the Source window.

This data frame will be used later in this tutorial.

Cursor on the interface. Now, we will install dplyr package. Please make sure that you are connected to the Internet.
[RStudio]

install.packages("dplyr")

In the Console window, type the following command and press Enter.
Highlight the red dot in the Console window The installation of the package takes a few seconds.

We will wait while the package is being installed.

Click at the top of the script myVis.R To load this package, we will add a library call at the top of the script.
Highlight the script myVis.R in the Source window Click on the script myVis.R
[RStudio]

library(dplyr)

Press Ctrl+Enter keys.

At the top of the script, type library and dplyr in parentheses.

Save the script and run this line by pressing Ctrl + Enter keys simultaneously.
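
The install and load steps above can be sketched as follows. This is a minimal sketch, assuming you would like the script to skip installation when dplyr is already present; the requireNamespace guard and the dplyr_loaded variable are additions for illustration and are not part of the recorded tutorial.

```r
# Install dplyr only if it is missing (this step needs an Internet connection).
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Attach the package so that filter(), arrange() and the other functions are available.
library(dplyr)

# Check that dplyr is now on the search path (hypothetical helper variable).
dplyr_loaded <- "package:dplyr" %in% search()
print(dplyr_loaded)
```

The guard only avoids repeated downloads; running install.packages("dplyr") directly in the Console, as in the tutorial, works just as well.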

Show slide

Functions in dplyr package

Now we learn about some key functions in dplyr package:
  • filter - to select cases based on their values.
  • arrange - to reorder the cases.
  • select - to select variables based on their names.
  • mutate - to add new variables that are functions of existing variables.
Show slide

Functions in dplyr package

  • summarise - to condense multiple values to a single value.

All these functions can be combined with group underscore by function.

It allows us to perform any operation by a group.
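
The functions listed above that are not demonstrated later in this tutorial can be sketched on a small example. This is only an illustration: toyMovies is a hypothetical data frame whose column names mimic moviesData.csv, and its values are made up.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toyMovies <- data.frame(
  title = c("A", "B", "C", "D"),
  genre = c("Comedy", "Drama", "Comedy", "Drama"),
  imdb_rating = c(6.0, 8.0, 7.0, 7.0),
  stringsAsFactors = FALSE
)

# select: keep only the named variables.
toySelected <- select(toyMovies, title, imdb_rating)

# mutate: add a new variable computed from an existing one.
toyMutated <- mutate(toyMovies, rating_pct = imdb_rating * 10)

# summarise: condense many values to a single value.
toySummary <- summarise(toyMovies, mean_rating = mean(imdb_rating))

# group_by with summarise: one summary value per genre.
toyByGenre <- summarise(group_by(toyMovies, genre),
                        mean_rating = mean(imdb_rating))
print(toyByGenre)
```

Here group_by does not change the data itself; it only marks the groups so that summarise computes the mean separately for each genre.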

Let us switch to RStudio.
Highlight movies in the Source window In the Source window, click on movies.
Highlight the scroll bar in the Source window In the Source window, scroll from left to right.

This will enable us to see the remaining variables of the movies data frame.

Highlight genre in the Source window Suppose we want to filter the movies having genre as Comedy.

For this, we will use the filter function.

Highlight the script myVis.R in the Source window Click on the script myVis.R
[RStudio]

moviesComedy <- filter(movies,

genre == "Comedy")

In the Source window, type the following command.
Highlight filter in the Source window Recall that the filter function in the dplyr package allows us to select cases based on their values.
Highlight movies after filter in the Source window Inside the filter function, the first argument is the name of the data frame which is movies.
Highlight genre == "Comedy" in the Source window The second argument is the condition by which we want to filter the movies data frame.
Highlight the Run button in the Source window Save the script and run the current line.
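
To see what the second argument does, it may help to look at the condition on its own. The sketch below uses a hypothetical data frame toyMovies with made-up values; isComedy and toyComedy are illustrative names that are not in the tutorial script.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toyMovies <- data.frame(
  title = c("A", "B", "C", "D"),
  genre = c("Comedy", "Drama", "Comedy", "Horror"),
  stringsAsFactors = FALSE
)

# The condition is evaluated for every row and gives TRUE or FALSE.
isComedy <- toyMovies$genre == "Comedy"
print(isComedy)

# filter() keeps only the rows for which the condition is TRUE.
toyComedy <- filter(toyMovies, genre == "Comedy")
print(toyComedy)
```

Inside filter, the column genre can be written without the toyMovies$ prefix because dplyr looks up column names in the data frame given as the first argument.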
Highlight moviesComedy in the Environment window Resulting data frame is stored in an object called moviesComedy in the Environment window.

Let us view the data frame moviesComedy to check whether it contains movies with genre as Comedy.

[RStudio]

View(moviesComedy)

In the Source window, type the following command.
Highlight the Run button in the Source window Run the current line.
Highlight moviesComedy in the Source window moviesComedy data frame opens in the Source window.
Highlight genre in the Source window All the movies having genre as Comedy have been filtered.
Highlight moviesComedy in the Source window Let us close this data frame moviesComedy for now.
Highlight filter in the Source window We can also use logical operators to combine two or more conditions.
Highlight movies in the Source window In the Source window, click on movies.
Highlight genre in the Source window Suppose we want to filter the movies with genre as either Comedy or Drama.
Highlight the script myVis.R in the Source window Click on the script myVis.R
[RStudio]

moviesComDr <- filter(movies,

genre == "Comedy" | genre == "Drama")

View(moviesComDr)

In the Source window, type the following commands.
Highlight filter in the Source window Here, we have two values by which we would like to filter movies data frame.
Highlight | in the Source window For this, we have used a logical OR operator.
Highlight the Run button in the Source window Run the last two lines of code.
Highlight moviesComDr in the Source window moviesComDr opens in the Source window.

The movies having genre as either Comedy or Drama have been filtered.

Highlight moviesComDr in the Source window Let us close this data frame moviesComDr for now.
Highlight moviesComDr <- filter(movies, genre == "Comedy" | genre == "Drama") in the Source window This filter function can also be written using the match operator.
[RStudio]

moviesComDrP <- filter(movies,

genre %in% c("Comedy", "Drama"))

View(moviesComDrP)

In the Source window, type the following commands.
Highlight %in% in the Source window %in% is used for value matching.
[RStudio]

help('%in%')

To know more about this operator, let us access the Help.

In the Console window, type the following command and press Enter.

Highlight the Run button in the Source window Run the last two lines of code.
Highlight moviesComDrP in the Source window moviesComDrP opens in the Source window.

The movies having genre as either Comedy or Drama have been filtered.
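
The equivalence of the logical OR form and the %in% form can be checked on a small example. This is a sketch on a hypothetical data frame toyMovies with made-up values; inSet, byOr, byIn and sameResult are illustrative names.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toyMovies <- data.frame(
  title = c("A", "B", "C", "D"),
  genre = c("Comedy", "Drama", "Horror", "Comedy"),
  stringsAsFactors = FALSE
)

# %in% gives one TRUE or FALSE for each element on its left-hand side.
inSet <- toyMovies$genre %in% c("Comedy", "Drama")
print(inSet)

# The same filter written with the logical OR operator and with %in%.
byOr <- filter(toyMovies, genre == "Comedy" | genre == "Drama")
byIn <- filter(toyMovies, genre %in% c("Comedy", "Drama"))

# Compare the two results.
sameResult <- isTRUE(all.equal(byOr, byIn))
print(sameResult)
```

The %in% form tends to be easier to read and extend when the list of accepted values grows beyond two.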

Highlight moviesComDrP in the Source window Let us close this data frame moviesComDrP for now.
Highlight movies in the Source window In the Source window, click on movies.
Highlight the scroll bar in the Source window In the Source window, scroll from left to right.
Highlight genre and imdb_rating in the Source window Let us now filter movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5.
Highlight the script myVis.R in the Source window Click on the script myVis.R
[RStudio]

moviesComIm <- filter(movies,

genre == "Comedy" & imdb_rating >= 7.5)

View(moviesComIm)

In the Source window, type the following commands.
Highlight genre == "Comedy" & imdb_rating >= 7.5 in the Source window Here, we have used a logical AND operator to include both conditions.
Highlight the Run button in the Source window Save the script and run the last two lines of code.
Highlight moviesComIm in the Source window moviesComIm opens in the Source window.

I will resize the Console window.

There are seven movies with genre as Comedy and imdb underscore rating greater than or equal to 7 point 5.
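
A small sketch of the logical AND filter, using a hypothetical data frame toyMovies with made-up values. It also shows that filter accepts several conditions separated by commas, which dplyr combines with AND; byAnd and byComma are illustrative names.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toyMovies <- data.frame(
  title = c("A", "B", "C", "D"),
  genre = c("Comedy", "Comedy", "Drama", "Comedy"),
  imdb_rating = c(7.5, 6.9, 8.1, 8.0),
  stringsAsFactors = FALSE
)

# Both conditions must be TRUE for a row to be kept.
byAnd <- filter(toyMovies, genre == "Comedy" & imdb_rating >= 7.5)

# Conditions given as separate arguments are combined with AND as well.
byComma <- filter(toyMovies, genre == "Comedy", imdb_rating >= 7.5)

print(byAnd)
print(nrow(byAnd))
```

Using nrow on the result is a quick way to count the matching movies without opening the data frame in the viewer.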

Highlight moviesComIm in the Source window Let us close this data frame moviesComIm for now.
Highlight movies in the Source window In the Source window, click on movies.
Highlight imdb_rating in the Source window Suppose we want to arrange the movies in ascending order of imdb underscore rating.

For this, we will use the arrange function.

Highlight the script myVis.R in the Source window Click on the script myVis.R
[RStudio]

moviesImA <- arrange(movies, imdb_rating)

View(moviesImA)

In the Source window, type the following commands.
Highlight the Run button in the Source window Run the last two lines of code.
Highlight moviesImA in the Source window moviesImA opens in the Source window.
Highlight imdb_rating in the Source window In the Source window, scroll from left to right and locate the imdb underscore rating column.

The movies have been arranged in ascending order of imdb underscore rating.

Highlight imdb_rating in the Source window Now, let us say we want to arrange the movies in descending order of imdb rating.

For this, we use the desc function.

Highlight moviesImA in the Source window Let us close this data frame moviesImA for now.
[RStudio]

moviesImD <- arrange(movies, desc(imdb_rating))

View(moviesImD)

In the Source window, type the following commands.
Highlight the Run button in the Source window Run the last two lines of code.
Highlight moviesImD in the Source window moviesImD opens in the Source window.
Highlight imdb_rating in the Source window In the Source window, scroll from left to right and locate the imdb underscore rating column.

The movies have been arranged in descending order of imdb rating.
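
The difference between the default ascending order and desc can be sketched on a hypothetical data frame toyMovies with made-up values; toyAsc and toyDesc are illustrative names.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toyMovies <- data.frame(
  title = c("A", "B", "C"),
  imdb_rating = c(7.2, 5.5, 8.9),
  stringsAsFactors = FALSE
)

# arrange() sorts in ascending order by default.
toyAsc <- arrange(toyMovies, imdb_rating)
print(toyAsc$imdb_rating)

# Wrapping the variable in desc() reverses the order.
toyDesc <- arrange(toyMovies, desc(imdb_rating))
print(toyDesc$imdb_rating)
```

Note that arrange returns a new data frame and leaves toyMovies unchanged, which is why the tutorial stores each result in a new object.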

Highlight moviesImD in the Source window Let us close this data frame moviesImD for now.
Highlight movies in the Source window In the Source window, click on movies.
Highlight genre and imdb_rating in the Source window Suppose we want to arrange the movies both by genre and imdb rating.
Highlight the script myVis.R in the Source window Click on the script myVis.R
[RStudio]

moviesGeIm <- arrange(movies, genre, imdb_rating)

View(moviesGeIm)

In the Source window, type the following commands.
Highlight the Run button in the Source window Run the last two lines of code.
Highlight moviesGeIm in the Source window moviesGeIm opens in the Source window.
Highlight the scroll bar in the Source window In the Source window, scroll from left to right.

Movies have been arranged both by genre and imdb underscore rating.
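
When arrange is given more than one variable, the first variable decides the main order and each later variable only orders the rows that tie on the earlier ones. A minimal sketch on a hypothetical data frame toyMovies with made-up values; toyGeIm and toyGeImD are illustrative names.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration.
toyMovies <- data.frame(
  title = c("A", "B", "C", "D"),
  genre = c("Drama", "Comedy", "Drama", "Comedy"),
  imdb_rating = c(8.0, 7.0, 6.0, 5.0),
  stringsAsFactors = FALSE
)

# Sort by genre first, then by rating in ascending order within each genre.
toyGeIm <- arrange(toyMovies, genre, imdb_rating)
print(toyGeIm)

# desc() can be applied to one variable only: genre ascending, rating descending.
toyGeImD <- arrange(toyMovies, genre, desc(imdb_rating))
print(toyGeImD)
```

In the first result the Comedy rows come before the Drama rows, and within each genre the lowest rating is listed first.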

Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt about:
  • Data manipulation
  • dplyr package
  • How to use filter and arrange functions.
Show slide

Assignment

We now suggest an assignment.
  • Consider the built-in data set mtcars. Find the cars with hp greater than 100 and cyl equal to 3.
  • Arrange the mtcars data set based on mpg variable.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Varshit Dubey (CoE Pune).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst