Difference between revisions of "R/C2/Indexing-and-Slicing-Data-Frames/English"


Latest revision as of 18:16, 11 September 2019

Title of script: Indexing and Slicing Data Frames

Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, data frames, indexing, slicing, working directory, video tutorial.

Visual Cue Narration
Show slide

Opening slide

Welcome to the spoken tutorial on Indexing and Slicing Data Frames.
Show slide

Learning Objectives

In this tutorial, we will learn how to:
  • Extract rows or columns from a data frame
  • Create a subset from a data frame
  • Retrieve data using double square brackets
Show slide

Pre-requisites

To understand this tutorial, you should have knowledge about
  • Data frames in R
  • R script in RStudio and
  • How to set Working directory in RStudio

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on
  • Ubuntu Linux OS version 16.04
  • R version 3.2.3
  • RStudio version 1.1.456

Install R version 3.2.0 or higher.

Show slide

Download Files

For this tutorial, we will use the data frame CaptaincyData.csv and a script file mydataframe.R.

Download these files from the Code files link of this tutorial.

[Computer screen]

Highlight CaptaincyData.csv and mydataframe.R in the folder myProject

I have downloaded and moved these files to the folder myProject on my Desktop.

I have also set this folder as my Working Directory.

Let us switch to RStudio.

Point to script mydataframe.R.

Click mydataframe.R in RStudio

Point to mydataframe.R in RStudio.

Let us open the script mydataframe.R in RStudio.

For this, click on the script mydataframe.R.

The script mydataframe.R opens in RStudio.

[RStudio]

captaincy <- read.csv("CaptaincyData.csv")

View(captaincy)

Here, we have declared a variable captaincy, which stores the data read from CaptaincyData.csv.

Also, the View function is used to see the contents of the file.
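
As a minimal sketch of what these two lines do, the following writes a tiny stand-in CSV to a temporary file and reads it back. The file contents and the temporary path are invented for illustration; they are not the real CaptaincyData.csv. The functions str and head are used here as Console alternatives to View, which needs an interactive session.

```r
# Hypothetical stand-in for CaptaincyData.csv; the values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("names,played,won",
             "Captain A,40,20",
             "Captain B,25,8"),
           csv_path)

# read.csv returns a data frame; the header row becomes the column names.
captaincy <- read.csv(csv_path)

str(captaincy)    # column names and types
head(captaincy)   # first rows, printed in the Console
```

A bare file name such as "CaptaincyData.csv" is resolved against the working directory, which is why the folder myProject was set as the Working Directory beforehand.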

Highlight <- symbol in the Source window

Remember, you may also use the equal to sign in place of the less than symbol followed by a hyphen.

However, we recommend the less than symbol followed by a hyphen.

For this, there is a shortcut in RStudio.

Suppose we want to assign a value of 2 to a variable named testvar.

[RStudio]

Alt + -

testvar <- 2

In the Console window, type testvar and then press the Alt and - (hyphen) keys simultaneously.

Then type 2 and press Enter.
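
A short sketch of the two assignment forms mentioned above. The variable othervar is a hypothetical name added only for comparison.

```r
testvar <- 2     # recommended form: less than symbol followed by hyphen
othervar = 2     # the equal to sign also assigns at the top level

# Both variables now hold the same value.
identical(testvar, othervar)
```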

Highlight mydataframe.R in the Source window

Let us get back to the Source window.

Run the script mydataframe.R by clicking on the Source button.

[RStudio]

Highlight third row of captaincy

Now let us extract the contents of the third row of the captaincy data frame.
Click on script mydataframe.R

Click on the script mydataframe.R.
[RStudio]

captaincy[3,]

In the Source window, type captaincy.

To do so, type capt and press Enter to autocomplete the variable name captaincy. Now, within square brackets, type 3 followed by a comma, and press Enter.

Press the Enter key at the end of every command.

Remember, one of the most important features of RStudio is intelligent auto-completion of function names, packages, and R objects.

Highlight comma in the Source window

We use a comma within square brackets when we wish to extract a row.

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and execute the current line by pressing the Ctrl+Enter keys.

Highlight the output in the Console window

The third row of the captaincy data frame is seen in the Console window.

Highlight comma in captaincy[3,]

Now, let us run the same command without a comma.
[RStudio]

captaincy[3]

In the Source window, type captaincy then within square brackets 3.
Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and run this line only, as shown earlier.

Highlight the output in the Console window

The contents of the third column of the data frame are displayed in the Console window.

Point to the command.

So, to extract a column by its number, we do not use a comma within the square brackets.

When we extract data using row number or column number, it is known as numeric indexing.
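
The difference the comma makes can be sketched on a small data frame. The data below is hypothetical (it is not the contents of CaptaincyData.csv), and row3 and col3 are names introduced only for this illustration.

```r
# Hypothetical data; not the contents of CaptaincyData.csv.
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(40, 25, 30, 12),
  won    = c(20, 8, 6, 6),
  stringsAsFactors = FALSE
)

row3 <- captaincy[3, ]   # comma present: third row, all columns
col3 <- captaincy[3]     # no comma: third column, all rows

dim(row3)   # 1 row, 3 columns
dim(col3)   # 4 rows, 1 column
```

In both cases the result is still a data frame, so it prints with its column names.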

Click on the captaincy data frame

In the Source window, click on the captaincy data frame.
[RStudio]

Highlight second and third rows of captaincy

Let us now extract the contents of second and third rows of the captaincy data frame.
[RStudio]

captaincy[c(2,3),]

To retrieve more than one row, we use a numeric index vector. Click on the script mydataframe.R.

In the Source window, type the following command and press Enter.

Highlight the c() function

The c function is used to combine the row numbers 2 and 3 into a single index vector.

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and execute the current line.

Highlight the output in the Console window

The second and third rows of the captaincy data frame are seen in the Console window.
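
A brief sketch of row selection with an index vector, again on hypothetical data. The names rows23, sameRows and picked are assumptions made for this example.

```r
# Hypothetical data; not the contents of CaptaincyData.csv.
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(40, 25, 30, 12),
  won    = c(20, 8, 6, 6),
  stringsAsFactors = FALSE
)

rows23   <- captaincy[c(2, 3), ]   # rows 2 and 3, all columns
sameRows <- captaincy[2:3, ]       # 2:3 selects the same consecutive rows
picked   <- captaincy[c(3, 1), ]   # rows come back in the order requested

rows23
picked
```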
Click on the captaincy data frame

In the Source window, click on the captaincy data frame.

Cursor on the interface.

Now, we’ll find who has played 25 matches from the played column of the captaincy data frame.

Extracting this type of information is known as logical indexing.

Click on the script mydataframe.R

Click on the script mydataframe.R.
[RStudio]

captaincy[captaincy$played==25,]

In the Source window, type captaincy. Then, within square brackets, type captaincy dollar sign played equal to equal to 25 comma. Press Enter.

Highlight dollar sign


Highlight two equal to signs

Remember, the dollar sign allows you to extract elements by name.

Please note that there is no space between the two equal to signs.

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and execute the current line.

Highlight the output in the Console window

The details of captain Dravid are shown in the Console window.
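
Logical indexing works in two steps, sketched below on hypothetical data: the comparison produces one TRUE or FALSE per row, and the square brackets keep the rows marked TRUE. The names mask, match25 and both are introduced only for this illustration.

```r
# Hypothetical data; not the contents of CaptaincyData.csv.
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(40, 25, 30, 12),
  won    = c(20, 8, 6, 6),
  stringsAsFactors = FALSE
)

mask <- captaincy$played == 25    # one TRUE/FALSE value per row
mask

match25 <- captaincy[mask, ]      # keeps only the rows where mask is TRUE
match25

# Conditions can be combined with & (and) or | (or).
both <- captaincy[captaincy$played > 20 & captaincy$won >= 10, ]
both
```

Note that a single equal to sign inside the brackets would not compare values, which is why the two equal to signs are needed.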
Click on the captaincy data frame in the Source window.

In the Source window, click on the captaincy data frame.

Cursor on the interface.

Now let us learn how to get the values of any particular attribute for all the players.

We will fetch the names of all the captains.

[RStudio]

Highlight first column of captaincy

captaincy[1]

For this, we need to know the values in the first column.

Click on the script mydataframe.R

In the Source window, type captaincy and within square brackets 1.

Highlight 1 in square brackets

Please note that I have not used a comma inside the square brackets.

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and execute the current line.

Highlight the output in the Console window

The names of the captains are seen in the Console window.

Usually, we use column names instead of column numbers.
[RStudio]

captaincy["names"]

To know the names of the captains, type captaincy. Then, within square brackets, type names inside double quotes.

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and execute the current line.

Point to the names in the Console window.

Names of the captains are shown in the Console window.

Extracting data by column names is known as name indexing.
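
The sketch below, on hypothetical data, contrasts name indexing with the dollar sign used earlier. The names byName and byDollar are assumptions made for this example.

```r
# Hypothetical data; not the contents of CaptaincyData.csv.
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(40, 25, 30, 12),
  stringsAsFactors = FALSE
)

byName   <- captaincy["names"]   # single brackets: a one-column data frame
byDollar <- captaincy$names      # dollar sign: the column as a plain vector

class(byName)
class(byDollar)
```

Both hold the same values; they differ only in whether the data frame wrapper is kept.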

Highlight the broom icon in the Console window

Clear the Console window by clicking on the broom icon.

In the Source window, click on the captaincy data frame

In the Source window, click on the captaincy data frame.

Cursor in the Source window.

Now let us view the names of the captains along with the number of matches they have won.

Click on the script mydataframe.R

Click on the script mydataframe.R.
[RStudio]

captaincy[c("names", "won")]

In the Source window, type the following command and press Enter.
Highlight c()

Here, the c function is used to combine the column names names and won into a single vector.

Highlight names and won

Observe that names and won have been written within double quotes.

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and execute this line.

Point to the names of the captains in the Console window.

The names of captains and the number of matches won are seen in the Console window.
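
A small sketch of selecting several columns by name, using hypothetical data. The names twoCols, swapped and result, and the misspelled column "wins", are introduced only for illustration.

```r
# Hypothetical data; not the contents of CaptaincyData.csv.
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(40, 25, 30, 12),
  won    = c(20, 8, 6, 6),
  stringsAsFactors = FALSE
)

twoCols <- captaincy[c("names", "won")]
swapped <- captaincy[c("won", "names")]   # columns come back in the order requested

# A column name that does not exist raises an error; tryCatch captures its message.
result <- tryCatch(captaincy["wins"], error = function(e) conditionMessage(e))
result
```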
Point to captaincy data frame

Now let us extract a subset from the captaincy data frame.

We will create a subset of captains, who have won more than 30% of their matches.

This is called slicing a data frame.

Click on the captaincy data frame in the Source window.

In the Source window, click on the captaincy data frame.
[RStudio]

Highlight victory column of captaincy in the Source window

Please note that there is a column named victory in the captaincy data frame.

For the required subset of captains, the value in the victory column should be greater than 0.3.

[RStudio]


Highlight names, played and won columns of captaincy

For this subset, we shall only show the
  • names
  • number of matches played
  • number of matches won
Click on the script mydataframe.R

Click on the script mydataframe.R.

Drag the Source window to resize it

I will resize the Source window.
[RStudio]

subData <- subset(captaincy, victory > 0.3, select = c("names", "played", "won"))

In the Source window, type the following command.

Now, press Enter after the comma that follows 0 point 3.

Press Enter.

You can press Enter after a comma for better visibility.
[RStudio]

Highlight select

print(subData)

The select parameter is used to select the required columns: names, played, and won.

In the Source window, type print and, within parentheses, subData.

Highlight two lines.

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and run these two lines.
Drag the Console window.

I am resizing the Console window to see the output properly.

Highlight the output in the Console window

The subData is shown in the Console window.
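
The subset call can be sketched on hypothetical data as follows. The victory column is assumed here to be won divided by played, which the source does not state, and sameRows is a name introduced only to show an equivalent square-bracket form.

```r
# Hypothetical data; not the contents of CaptaincyData.csv.
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(40, 25, 30, 12),
  won    = c(20, 8, 6, 6),
  stringsAsFactors = FALSE
)
# Assumption: victory is the fraction of matches won.
captaincy$victory <- captaincy$won / captaincy$played

# Row condition first, then the columns to keep.
subData <- subset(captaincy, victory > 0.3,
                  select = c("names", "played", "won"))
print(subData)

# The same rows and columns with logical and name indexing.
sameRows <- captaincy[captaincy$victory > 0.3, c("names", "played", "won")]
```

One difference worth knowing: subset drops rows where the condition is NA, while the square-bracket form returns them as rows of NA values.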
Click on the captaincy data frame in the Source window.

In the Source window, click on the captaincy data frame.
[RStudio]

Highlight the third value of fourth column in captaincy

Finally, let us learn how to extract a particular entry from some column of a data frame.

We will extract the third value in the fourth column of the captaincy data frame.

Click on the script mydataframe.R.

Click on the script mydataframe.R.
[RStudio]

captaincy[[4]][3]

In the Source window, type captaincy, then 4 within double square brackets, then 3 within single square brackets.

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and execute the current line.

Highlight the output in the Console window

The expected value 14 is seen in the Console window.
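
To see why double square brackets are needed, the sketch below compares the forms on hypothetical data. The source does not say which column is fourth in the real file; a column named lost is assumed here, with its third value set to 14 only to mirror the output described above.

```r
# Hypothetical data; not the contents of CaptaincyData.csv.
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(40, 25, 30, 12),
  won    = c(20, 8, 6, 6),
  lost   = c(15, 12, 14, 4),   # assumed fourth column
  stringsAsFactors = FALSE
)

captaincy[4]         # single brackets: a one-column data frame
captaincy[[4]]       # double brackets: the column itself, as a vector
captaincy[[4]][3]    # third element of that vector
captaincy[3, 4]      # the same cell, addressed as row 3, column 4
```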
For more information on indexing and slicing data frames, please refer to the Additional materials section on this website.
Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt how to:
  • Extract rows or columns from a data frame
  • Create a subset from a data frame
  • Retrieve data using double square brackets
Show slide

Assignment

We now suggest an assignment.
  • Create a subset from the captaincy data frame with the captains who have played more than (>) 20 matches and lost fewer than (<) 14 matches.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give Certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgement

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).


This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Sudhakarst