R/C2/Indexing-and-Slicing-Data-Frames/English

Revision as of 19:09, 11 February 2019 by Sudhakarst


Title of script: Indexing and Slicing Data Frames

Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, data frames, indexing, slicing, working directory, video tutorial.


Visual Cue Narration
Show slide

Opening slide

Welcome to the spoken tutorial on Indexing and Slicing Data Frames.
Show slide

Learning Objectives


In this tutorial, we will learn how to:
  • Extract rows or columns from a data frame
  • Create a subset from a data frame
  • Retrieve data using double square brackets
  • Retrieve data using double square brackets


Show slide

Pre-requisites


To understand this tutorial, you should have knowledge of:
  • Data frames in R
  • R scripts in RStudio and
  • How to set the working directory in RStudio


If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on:
  • Ubuntu Linux OS version 16.04
  • R version 3.2.3
  • RStudio version 1.1.456


Install R version 3.2.0 or higher.

Show slide


Download Files

For this tutorial, we will use the data file CaptaincyData.csv and the script file mydataframe.R.


Download these files from the Code files link of this tutorial.

[Computer screen]

Highlight CaptaincyData.csv and mydataframe.R in the folder myProject

I have downloaded and moved these files to the folder myProject on my Desktop.


I have also set this folder as my Working Directory.

Let us switch to RStudio.
Point to script mydataframe.R.

Click mydataframe.R in RStudio


Point to mydataframe.R in RStudio.

Let us open the script mydataframe.R in RStudio.

For this, click on the script mydataframe.R.


The script mydataframe.R opens in RStudio.

[RStudio]

captaincy <- read.csv("CaptaincyData.csv")

View(captaincy)

Here, we read CaptaincyData.csv with the read.csv function and store the resulting data frame in the variable captaincy.


The View function then displays the contents of the data frame.

Highlight <- symbol in the Source window
Remember, you may also use the equal to sign (=) in place of the less than symbol followed by a hyphen (<-).


However, we recommend the less than symbol followed by a hyphen.


RStudio provides a keyboard shortcut for typing it.


Suppose we want to assign a value of 2 to a variable named testvar.


[RStudio]

Alt + -

testvar <- 2

In the Console window, type testvar and then press the Alt and - (hyphen) keys simultaneously.

Then type 2 and press Enter.
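To see how the two assignment operators relate, here is a minimal R sketch. At the top level, <- and = both assign a value; inside a function call, = only matches an argument by name, which is one commonly cited reason to prefer <- for assignment. The variable testvar2 is a hypothetical name used only for comparison.

```r
# At the top level, `<-` and `=` both create a variable
testvar <- 2
testvar2 = 2   # hypothetical second variable, used only for comparison
print(identical(testvar, testvar2))   # [1] TRUE

# Inside a function call, `=` only matches an argument by name;
# it does not create a variable called length.out
v <- seq(1, 10, length.out = 4)
print(v)                              # [1]  1  4  7 10
print(exists("length.out"))           # [1] FALSE
```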

Highlight mydataframe.R in the Source window
Let us get back to the Source window.

Run the script mydataframe.R by clicking on the Source button.

[RStudio]

Highlight third row of captaincy

Now let us extract the contents of the third row of the captaincy data frame.
Click on the script mydataframe.R
[RStudio]


captaincy[3,]

In the Source window, type captaincy


Type capt and press Enter to autocomplete the variable name captaincy. Then, within square brackets, type 3 followed by a comma, and press Enter.


Press Enter key at the end of every command.

Remember, one of the most important features of RStudio is intelligent auto-completion of function names, packages, and R objects.

Highlight comma in the Source window
Inside square brackets, the row index goes before the comma and the column index after it. We leave the column index empty to extract a whole row.
Press Ctrl + S >> Press Ctrl+Enter keys.
Save the script and execute the current line by pressing the Ctrl+Enter keys.
Highlight the output in the Console window
The third row of the captaincy data frame is seen in the Console window.
Highlight comma in captaincy[3,]
Now, let us run the same command without a comma.
[RStudio]

captaincy[3]

In the Source window, type captaincy then within square brackets 3.
Press Ctrl + S >> Press Ctrl+Enter keys.
Save the script and run this line only, as shown earlier.
Highlight the output in the Console window
The contents of the third column of the data frame are displayed in the Console window.
So, to extract a column this way, we do not use a comma within the square brackets. With a single index and no comma, R selects columns.

When we extract data using row number or column number, it is known as numeric indexing.
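The difference between captaincy[3,] and captaincy[3] can be checked with dim(). This sketch uses a small hypothetical stand-in for CaptaincyData.csv with made-up values: the column names names, played, won and victory come from this tutorial, while the lost column, the column order and all numbers are assumptions, so the third column here need not match the real file.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values, not the real data;
# the "lost" column and the column order are assumptions)
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D", "Captain E"),
  played = c(12, 25, 30, 8, 40),
  won    = c(3, 8, 14, 2, 20),
  lost   = c(6, 10, 9, 5, 12),
  stringsAsFactors = FALSE
)
# victory is assumed here to be won / played, rounded to two decimals
captaincy$victory <- round(captaincy$won / captaincy$played, 2)

row3 <- captaincy[3, ]   # row index before the comma; empty column index = all columns
col3 <- captaincy[3]     # no comma: a single index selects a column

print(dim(row3))         # [1] 1 5
print(dim(col3))         # [1] 5 1
print(names(col3))       # [1] "won"
```

Both results are still data frames; only their shape differs.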

In the Source window, click on the captaincy data frame.
[RStudio]

Highlight second and third rows of captaincy

Let us now extract the contents of the second and third rows of the captaincy data frame.
[RStudio]


captaincy[c(2,3),]


To retrieve more than one row, we use a numeric index vector.
Click on the script mydataframe.R

In the Source window, type the following command and press Enter.

Highlight the c() function
The c function combines the row numbers 2 and 3 into a numeric vector, which is then used as the row index.
Press Ctrl + S >> Press Ctrl+Enter keys.
Save the script and execute the current line.
Highlight the output in the Console window
The second and third rows of the captaincy data frame are seen in the Console window.
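A numeric index vector before the comma selects several rows at once. The sketch below again uses a hypothetical stand-in data frame with made-up values (the lost column is an assumption), and shows that the sequence 2:3 builds the same index vector as c(2, 3).

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values, not the real data;
# the "lost" column is an assumption)
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D", "Captain E"),
  played = c(12, 25, 30, 8, 40),
  won    = c(3, 8, 14, 2, 20),
  lost   = c(6, 10, 9, 5, 12),
  stringsAsFactors = FALSE
)
captaincy$victory <- round(captaincy$won / captaincy$played, 2)

rows23 <- captaincy[c(2, 3), ]   # c(2, 3) is the numeric index vector
print(rows23$names)              # [1] "Captain B" "Captain C"

# The sequence 2:3 produces the same index vector
print(identical(rows23, captaincy[2:3, ]))   # [1] TRUE
```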
In the Source window, click on the captaincy data frame.
Now, using the played column of the captaincy data frame, we will find out which captain has played 25 matches.

Extracting this type of information is known as logical indexing, because the rows are selected by a vector of TRUE and FALSE values.

Click on the script mydataframe.R
[RStudio]


captaincy[captaincy$played==25,]

In the Source window, type captaincy

Within square brackets captaincy dollar sign played equal to equal to 25 comma. Press Enter.

Highlight dollar sign


Highlight two equal to signs

Remember, the dollar sign allows you to extract a column by its name.

Please note that there is no space between the two equal to signs. The double equal to sign tests for equality: it compares each value in the played column with 25.

Press Ctrl + S >> Press Ctrl+Enter keys.
Save the script and execute the current line.
Highlight the output in the Console window
The details of captain Dravid are shown in the Console window.
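The comparison inside the square brackets produces one TRUE or FALSE value per row, and only the rows marked TRUE are kept. This sketch makes that intermediate logical vector visible, using a hypothetical stand-in data frame with made-up values (so the matching captain here is the made-up "Captain B", not Dravid; the lost column is also an assumption).

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values, not the real data;
# the "lost" column is an assumption)
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D", "Captain E"),
  played = c(12, 25, 30, 8, 40),
  won    = c(3, 8, 14, 2, 20),
  lost   = c(6, 10, 9, 5, 12),
  stringsAsFactors = FALSE
)
captaincy$victory <- round(captaincy$won / captaincy$played, 2)

is25 <- captaincy$played == 25   # one TRUE/FALSE value per row
print(is25)                      # [1] FALSE  TRUE FALSE FALSE FALSE

result <- captaincy[is25, ]      # keep only the rows where is25 is TRUE
print(result$names)              # [1] "Captain B"
```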
Click on the captaincy data frame in the Source window.
In the Source window, click on the captaincy data frame.
Now let us learn how to get the values of any particular attribute for all the players.

We will fetch the names of all the captains.

[RStudio]

Highlight first column of captaincy

captaincy[1]

For this, we need to know the values in the first column.

Click on the script mydataframe.R

In the Source window, type captaincy and within square brackets 1.

Highlight 1 in square brackets


Please note that I have not used a comma inside the square brackets.
Press Ctrl + S >> Press Ctrl+Enter keys.
Save the script and execute the current line.
Highlight the output in the Console window
The names of the captains are seen in the Console window.
Usually, we use column names instead of column numbers. Names are easier to read, and they keep pointing to the same column even if the order of the columns changes.
[RStudio]

captaincy["names"]

To know the names of the captains, type captaincy.

Then, within square brackets and inside double quotes, type names.

Press Ctrl + S >> Press Ctrl+Enter keys.
Save the script and execute the current line.
Point to the names in the Console window.
Names of the captains are shown in the Console window.

Extracting data by column names is known as name indexing.
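Numeric and name indexing select the same column as long as the column order stays fixed. This sketch, on a hypothetical stand-in data frame with made-up values (the lost column is an assumption), checks that captaincy[1] and captaincy["names"] agree, and then shows that after the first two columns are swapped only the name still finds the names column.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values, not the real data;
# the "lost" column is an assumption)
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D", "Captain E"),
  played = c(12, 25, 30, 8, 40),
  won    = c(3, 8, 14, 2, 20),
  lost   = c(6, 10, 9, 5, 12),
  stringsAsFactors = FALSE
)
captaincy$victory <- round(captaincy$won / captaincy$played, 2)

by_number <- captaincy[1]         # numeric indexing
by_name   <- captaincy["names"]   # name indexing
print(identical(by_number, by_name))   # [1] TRUE

# Swap the first two columns: the number now points elsewhere,
# but the name still finds the right column
reordered <- captaincy[c(2, 1, 3, 4, 5)]
print(names(reordered[1]))             # [1] "played"
print(names(reordered["names"]))       # [1] "names"
```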

Highlight the broom icon in the Console window
Clear the Console window by clicking on the broom icon.
In the Source window, click on the captaincy data frame.
Cursor in the Source window.
Now let us view the names of the captains along with the number of matches they have won.
Click on the script mydataframe.R
[RStudio]

captaincy[c("names", "won")]


In the Source window, type the following command and press Enter.
Highlight c()
Here, the c function combines the column names names and won into a character vector.
Highlight names and won
Observe that names and won have been written within double quotes.
Press Ctrl + S >> Press Ctrl+Enter keys.
Save the script and execute this line.
Point to the names of the captains in the Console window.
The names of the captains and the number of matches they have won are seen in the Console window.
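When a character vector of column names is used inside the square brackets, the result contains exactly those columns, in the order given. A minimal sketch on a hypothetical stand-in data frame with made-up values (the lost column is an assumption):

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values, not the real data;
# the "lost" column is an assumption)
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D", "Captain E"),
  played = c(12, 25, 30, 8, 40),
  won    = c(3, 8, 14, 2, 20),
  lost   = c(6, 10, 9, 5, 12),
  stringsAsFactors = FALSE
)
captaincy$victory <- round(captaincy$won / captaincy$played, 2)

names_won <- captaincy[c("names", "won")]
print(names(names_won))   # [1] "names" "won"

# The columns come back in the order given in the vector
won_names <- captaincy[c("won", "names")]
print(names(won_names))   # [1] "won"   "names"
```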
Now let us extract a subset from the captaincy data frame.

We will create a subset of captains, who have won more than 30% of their matches.

This is called slicing a data frame.

Click on the captaincy data frame in the Source window.
In the Source window, click on the captaincy data frame.
[RStudio]


Highlight victory column of captaincy in the Source window

Please note that there is a column named victory in the captaincy data frame.

For the required subset of captains, the victory value should be greater than 0.3.

[RStudio]


Highlight names, played and won columns of captaincy


For this subset, we shall only show the following:
  • names
  • number of matches played
  • number of matches won


Click on the script mydataframe.R
I will resize the Source window.
[RStudio]

subData <- subset(captaincy, victory > 0.3,
                  select = c("names", "played", "won"))

In the Source window, type the following command.

Press Enter after the comma that follows 0 point 3.

You can press Enter after a comma for better visibility.
[RStudio]


Highlight select


print(subData)

The select parameter is used to choose the required columns: names, played, and won.


In the Source window, type print and, within parentheses, subData.

Highlight two lines

Press Ctrl + S >> Press Ctrl+Enter keys.

Save the script and run these two lines.
I am resizing the Console window to see the output properly.
Highlight the output in the Console window
The subset subData is shown in the Console window.
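The subset() call combines a logical row condition with a column selection. This sketch compares it with the equivalent plain bracket indexing, on a hypothetical stand-in data frame with made-up values; victory is assumed here to be won / played, and the lost column is also an assumption.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values, not the real data;
# the "lost" column and the victory formula are assumptions)
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D", "Captain E"),
  played = c(12, 25, 30, 8, 40),
  won    = c(3, 8, 14, 2, 20),
  lost   = c(6, 10, 9, 5, 12),
  stringsAsFactors = FALSE
)
captaincy$victory <- round(captaincy$won / captaincy$played, 2)

# Rows where victory > 0.3, keeping three columns
subData <- subset(captaincy, victory > 0.3,
                  select = c("names", "played", "won"))
print(subData)

# The same rows and columns with plain bracket indexing
same <- captaincy[captaincy$victory > 0.3, c("names", "played", "won")]
print(identical(subData, same))   # [1] TRUE
```

Inside subset(), column names such as victory can be written without the captaincy$ prefix, which keeps the condition short.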
Click on the captaincy data frame in the Source window.
In the Source window, click on the captaincy data frame.
[RStudio]


Highlight the third value of fourth column in captaincy

Finally, let us learn how to extract a particular entry from some column of a data frame.


We will extract the third value in the fourth column of the captaincy data frame.

Click on the script mydataframe.R
[RStudio]


captaincy[[4]][3]

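In the Source window, type captaincy, then 4 within double square brackets, and then 3 within single square brackets.
The double square brackets return the fourth column itself as a vector, and the single square brackets then pick its third element.
Press Ctrl + S >> Press Ctrl+Enter keys.
Save the script and execute the current line.
Highlight the output in the Console window
The expected value 14 is seen in the Console window.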
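Single and double square brackets return different kinds of objects: captaincy[4] is a one-column data frame, while captaincy[[4]] is the column vector itself, which can then be indexed with [3]. This sketch uses a hypothetical stand-in data frame with made-up values, so its fourth column (the assumed lost column) gives 9 rather than the 14 seen with the real file.

```r
# Hypothetical stand-in for CaptaincyData.csv (made-up values, not the real data;
# the "lost" column and the column order are assumptions)
captaincy <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D", "Captain E"),
  played = c(12, 25, 30, 8, 40),
  won    = c(3, 8, 14, 2, 20),
  lost   = c(6, 10, 9, 5, 12),
  stringsAsFactors = FALSE
)
captaincy$victory <- round(captaincy$won / captaincy$played, 2)

col4 <- captaincy[[4]]         # [[ ]] returns the column itself, as a vector
print(class(col4))             # [1] "numeric"
print(col4[3])                 # [1] 9
print(class(captaincy[4]))     # [1] "data.frame"

# Row 3, column 4 gives the same single value
print(captaincy[3, 4])         # [1] 9
```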
For more information on indexing and slicing data frames, please refer to the Additional materials section on this website.
Let us summarize what we have learnt.
Show slide

Summary

In this tutorial, we have learnt how to:
  • Extract rows or columns from a data frame
  • Create a subset from a data frame
  • Retrieve data using double square brackets


Slide 8

Assignment

We now suggest an assignment:
  • Create a subset from the captaincy data frame with the captains who have played more than 20 matches and lost fewer than 14 matches.


Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give Certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgement

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).


This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Sudhakarst