R/C2/Introduction-to-Data-Frames-in-R/English

Latest revision as of 16:24, 24 April 2019

FOSS: R through RStudio

          R version 3.0.0 (2013-04-03) and RStudio version 0.97.336

Tutorial Title: Introduction to data frames in R

Author: Kannan Moudgalya

Reviewer: Jayendran Venkateswaran, Neeraj Hatekar, S. Subramanian, T. Santhanam, Sanjeev Bakshi, Revathi Kasturi and others

Date: 14 September 2013

Keywords: Video tutorial, spoken tutorial, data frames, csv file, data display, importing csv files

Files required: None

Immediate prerequisite in the same family: Introduction to basics of R

Prerequisite from any other family of ST: None

Visual Cue Narration
Slide 1

Opening slide

Welcome to the spoken tutorial on introduction to data frames in R.

I am Kannan Moudgalya.

Slide 2

Learning Objectives

In this tutorial, we will explain
  • how to create a data frame in R
  • how to work with it
  • how to save it into a file and
  • how to read data from a file into a data frame

We will also learn about the data sets that come with the R software

Slide 3

Prerequisites

To understand this tutorial,
  • One needs to know elementary maths
  • for example, the meaning of mean, row and column
  • One needs to know how to edit a text file and how to plot
  • No programming background is required, however
  • Anyone with a simple exposure to statistics can understand this tutorial
Prerequisites (Ctd)
  • Please locate this tutorial on our web page, spoken hyphen tutorial dot org
  • Prerequisite spoken tutorials, if any, will be mentioned in this page
Slide 4

System Requirements

  • I am using version 3 of R
  • RStudio 0.97
  • and Mac OS X 10.7

But it should work in other versions and also other operating systems.

RStudio, Slide 6

Let us switch to RStudio.

sample <- c(-2,1,5.8)

Let us begin with the “c” command to create a vector.

Notice that I am using the assignment operator, that is, less than followed by hyphen.

I will use the word get to denote this assignment.

This is the way to create a vector of three elements.

c stands for combine or concatenate.

Although the equal to sign works most of the time, one usually uses this operator in R.
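
The vector creation and assignment described above can be sketched as follows. The variable names sample_vec and other_vec are assumptions made for this illustration; the tutorial itself uses the name sample, which is also the name of a built-in R function.

```r
# Create a numeric vector of three elements with c() and
# assign it using the <- operator ("gets").
sample_vec <- c(-2, 1, 5.8)   # hypothetical name; the tutorial uses `sample`

# Typing the name prints the contents of the vector.
sample_vec

# The = sign also assigns at the top level, but <- is the usual R style.
other_vec = c(-2, 1, 5.8)

# Both forms produce the same vector.
identical(sample_vec, other_vec)
```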

sample

Let us view what the vector sample contains.

Let us now build a data set.

We will also call it a data frame in this tutorial.

names <- c("Mahi","Sourav","Azhar",
           "Sunny","Pataudi","Dravid")

Let me create a vector of Indian Cricket Captains, old and new.

names gets c of Mahi, Sourav, Azhar, Sunny, Pataudi, Dravid

Each name has to be entered in between double quotes.

names

Let us view what the variable names contains.

I will create a vector to store the number of matches they captained.

One can see that all the variables got entered in the workspace.

captaincy <- data.frame(names,Y,played,won,lost)

Now, let us put these together and create a data frame as I do now.

It is data dot frame.
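
A minimal sketch of this step is given here. The values of played are taken from the direct mean calculation that appears later in this script; the vectors Y, won and lost are left out because their values are not shown in this script, so this data frame has only two columns.

```r
# Names of the captains, as entered in the tutorial.
names <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")

# Number of matches captained, taken from the mean calculation
# shown later in the script.
played <- c(45, 49, 47, 47, 40, 25)

# data.frame() (note the dot) puts vectors of equal length side by
# side as columns. Y, won and lost are omitted here because their
# values are not given in the script.
captaincy <- data.frame(names, played)

# Print the data frame in the console and inspect its structure.
print(captaincy)
str(captaincy)
```

Each vector becomes one column, and the column takes the name of the vector.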

View(captaincy)

Let us understand what the command data frame does.

To do this, let us View brackets captaincy.

The contents of the data frame appear nicely in the window above.
What will happen if you just type captaincy, without the View command?

Would you want to try it now?

In statistics, one works with such data sets.

Remember, the commands and variables in R are case sensitive.

The View command begins with a capital V.

captaincy$names

It is easy to access the parts of a data frame.

For example, this is how I will get the names in the captaincy data frame.

captaincy$won

captaincy$played

ratio <- captaincy$won / captaincy$played

Let us similarly get the number of matches won.

Let us find the ratio of matches won.

When we divide a vector by another, R carries out component-wise division.

captaincy$victory <- ratio

Let us include this ratio also in the captaincy data frame.

captaincy dollar victory gets ratio.

What is in ratio gets transferred to victory, a variable in the data frame captaincy.
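
The component-wise division and the new column can be sketched with a small stand-alone example. All names and numbers below (demo, won, played and their values) are made up for illustration and are not the captaincy data.

```r
# Hypothetical numbers, chosen only to make the arithmetic easy to check.
demo <- data.frame(won = c(6, 3, 10), played = c(12, 12, 40))

# Dividing one column by another works element by element:
# 6/12, 3/12 and 10/40.
ratio_demo <- demo$won / demo$played
ratio_demo

# Assigning to a column name that does not exist yet adds a new column.
demo$victory <- ratio_demo
print(demo)
```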

View(captaincy)

We can check that ratio is included in captaincy by the View command.

We can see that the ratio is included in victory.

options(digits=2)

Let us reduce the number of digits displayed in the ratio to 2.

We do this with the command options within brackets digits=2.

View(captaincy)

Let us view captaincy.

We can see that it has been reduced to two digits.
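
The following sketch shows the effect of the digits option on printed output in the console. The variable names x and old are assumptions for this illustration. Only the display changes; the stored value stays the same.

```r
x <- 2 / 3

# options() returns the previous settings, so they can be restored later.
old <- options(digits = 2)

# Printed with two significant digits.
print(x)

# The stored value is unchanged; only the display is shortened.
x == 2 / 3

# Restore the earlier setting.
options(old)
```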

captaincy$played

Let us see the number of matches played by all the captains.

mean(captaincy$played)

Let us find the mean of these matches.

(45+49+47+47+40+25) / 6


By direct calculation, we find that mean does exactly what it should.

These two are equal.
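
The comparison can also be carried out in code, as in this small sketch. It uses the same six values as the direct calculation above; the names m_builtin and m_manual are assumptions for this illustration.

```r
# The six values from the direct calculation above.
played <- c(45, 49, 47, 47, 40, 25)

# Mean from the built-in function.
m_builtin <- mean(played)

# Mean computed by hand: sum divided by the number of elements.
m_manual <- sum(played) / length(played)

# all.equal() compares numbers with a small tolerance,
# which is the safe way to compare floating point results.
all.equal(m_builtin, m_manual)
```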

plot(captaincy$Y,ratio)

Let us now get some plots with this data frame.

Let us plot the ratio of victories vs. the year of captaincy.

Remember that the x axis variable has to be listed first in the plot command.

plot(captaincy$names,captaincy$played)

Let us plot the number of matches captained versus the names of the captains.
Drag this a little bit. The graph expands. Put it back.
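
A note for newer versions of R: from R 4.0.0 onwards, data.frame no longer converts text columns into factors by default, so the plot of names versus matches may fail as written. A possible workaround, offered here as a sketch and not as part of the recorded tutorial, is to convert the names to a factor explicitly. The axis labels are assumptions for this illustration.

```r
names  <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
played <- c(45, 49, 47, 47, 40, 25)
captaincy <- data.frame(names, played)

# factor() turns the text column into categories, which plot()
# can place on the x axis. Levels are sorted alphabetically.
captain <- factor(captaincy$names)

# The x axis variable is listed first, as in the tutorial.
plot(captain, captaincy$played,
     xlab = "Captain", ylab = "Matches captained")
```
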
We will now explain how to write this data frame to a comma separated values file, or C S V.
write.csv(captaincy,"NewCaptaincy.csv")

We will use the command write.csv, as we do now.

Let me move this to the right.

This writes the captaincy data frame into the file NewCaptaincy.csv

emacs NewCaptaincy.csv


Let me open this file using my favourite text editor, emacs.
You may use any other text editor, such as notepad or vi or gedit.

Please do not, however, use Microsoft Office or LibreOffice Writer for this purpose, because you may not be able to load the resulting files into R.

On the emacs editor...

Let us now see what the file NewCaptaincy.csv contains.

This csv file contains everything that we created.

It also contains row numbers in every row, at the beginning.

write.csv(captaincy,
          "NewCaptaincy.csv",row.names=FALSE)

How will you get rid of these row numbers?

Add an additional flag called row dot names equals FALSE.

Let us read the new version of this file.

In emacs, I just have to refresh it.

In other editors, you may have to re-open the file.

We see that the row numbers no longer appear.
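
The difference between the two write.csv calls can be checked from within R, without a text editor. This sketch writes to a temporary folder, which is an assumption made so that it does not touch your working directory, and uses only the names and played columns.

```r
names  <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
played <- c(45, 49, 47, 47, 40, 25)
captaincy <- data.frame(names, played)

# Temporary location, assumed for this sketch only.
csv_path <- file.path(tempdir(), "NewCaptaincy.csv")

# Default call: every row starts with its row number.
write.csv(captaincy, csv_path)
with_rows <- readLines(csv_path, n = 2)
with_rows

# With row.names = FALSE the row numbers are left out.
write.csv(captaincy, csv_path, row.names = FALSE)
without_rows <- readLines(csv_path, n = 2)
without_rows
```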

Change Mahi to Dhoni and save.

Let us make a small change in this csv file and read it.

Let us change Mahi to Dhoni and save the file.

Workspace -> Import Dataset

Let us click Import Dataset.

There are two options:

  • from the text file and
  • from the web.

Reading from a local file is a fool-proof method.

It works even if the Internet is not available.

All one has to do is to download the required files ahead of time.

read a text file

View(NewCaptaincy)

Let us click the File option.

Let us select NewCaptaincy.

In order to see this, let us maximise the window, press Import, and then restore the window to its earlier size.

You will not have this problem, of course, because you will be using the whole screen.

For this tutorial, I am using only a small portion of the screen.

And that’s why we have had this problem.

The appearance of a data frame called NewCaptaincy here, and also here, shows that it has been read.

It also has Dhoni as the name.

The row numbers have also disappeared.
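
The same import can also be done from the console with read.csv, which is a standard R function; whether the Import Dataset dialog uses exactly this call is an assumption. This sketch first creates a small stand-in file in a temporary folder, with only two rows and two columns, so that it can run on its own.

```r
# Create a small stand-in for the edited file (assumed contents).
csv_path <- file.path(tempdir(), "NewCaptaincy.csv")
writeLines(c('"names","played"',
             '"Dhoni",45',
             '"Sourav",49'),
           csv_path)

# Read the csv file into a data frame.
NewCaptaincy <- read.csv(csv_path)

# The changed name is there, and no row number column was read in.
print(NewCaptaincy)
colnames(NewCaptaincy)
```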

It is also possible to read excel and ods files into R.

This involves a little more work and hence is not covered in this tutorial.

It is easy to export excel and ods files into csv format.

So, for now, please use the csv route to import excel or ods files.

data()

Let us take up the final topic for this tutorial.

R comes with several data frames.

To access them, issue the command data brackets.

R shows a list of available data frames.

We can carry out all sorts of calculations on these data sets.

View(CO2)

We will just view one data set, called C O 2.
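
If the View window is not available, for example outside RStudio, a built-in data set can be examined in the console instead. This is a small sketch using standard R functions.

```r
# Load the built-in CO2 data set into the workspace.
data(CO2)

# Show the first few rows.
head(CO2)

# Number of rows and columns, and the column names.
dim(CO2)
colnames(CO2)
```
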
We will conclude this tutorial here.
Slide 14

Summary

In this tutorial, we learnt the following:
  • We created vectors
  • We put the vectors into a data frame
  • We operated on the data frame
  • We saved the data frame into a file
  • We imported the file into R
  • We examined the data frames that came with R
Slide 15

Assignment

We now suggest an assignment.
  • Find out how to calculate median using the help button.
  • Repeat the above using web search.
  • Calculate the mean and median of the data frame C O 2.
  • Take R’s help for all the commands shown in this tutorial.
  • Import a data set from the Internet directly or through a file.
Slide 16

About the Spoken Tutorial Project

This video summarises the Spoken Tutorial project.

If you do not have good bandwidth, you may download and watch it.

Slide 17

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Slide 18

Acknowledgements

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
Slide 19

Thanks

Thanks for joining, goodbye.

Contributors and Content Editors

Nancyvarkey, Sudhakarst