R/C2/Introduction-to-Data-Frames-in-R/English

Revision as of 13:21, 10 October 2013 by Nancyvarkey (Talk | contribs)


FOSS: R through RStudio

          R version 3.0.0 (2013-04-03) and RStudio version 0.97.336

Tutorial Title: Introduction to data frames in R

Author: Kannan Moudgalya

Reviewer: Jayendran Venkateswaran, Neeraj Hatekar, S. Subramanian, T. Santhanam, Sanjeev Bakshi, Revathi Kasturi and others

Date: 14 September 2013

Keywords: Video tutorial, spoken tutorial, data frames, csv file, data display, importing csv files

Files required: None

Immediate prerequisite in the same family: Introduction to basics of R

Prerequisite from any other family of ST: None

Visual Cue Narration
Slide 1

Opening slide

Welcome to the spoken tutorial on introduction to data frames in R.


I am Kannan Moudgalya.

Slide 2

Learning Objectives





In this tutorial, we will explain
  • how to create a data frame in R
  • how to work with it
  • how to save it into a file and
  • how to read data from a file into a data frame

We will also learn about the data sets that come with the R software.

Slide 3

Prerequisites





To understand this tutorial,
  • One needs to know elementary maths,
  • for example, the meaning of mean, row and column
  • One needs to know how to edit a text file and the basics of plotting
  • No programming background is required, however
  • Anyone with some exposure to statistics can understand this tutorial
Prerequisites (Ctd)


  • Please locate this tutorial on our web page, spoken hyphen tutorial dot org
  • Prerequisite spoken tutorials, if any, will be mentioned in this page
Slide 4

Systems Requirements


  • I am using version 3 of R
  • RStudio 0.97
  • and Mac OS X 10.7

But it should work with other versions and also on other operating systems.

RStudio, Slide 6

Let us switch to RStudio.

sample <- c(-2,1,5.8)









Let us begin with the “c” command to create a vector.

Notice that I am using the assignment operator, that is, less than followed by hyphen.

I will use the word “gets” to denote this assignment.

This is the way to create a vector of three elements.

c stands for combine or concatenate.

Although the equal to sign works most of the time, one usually uses this operator in R.
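
The vector creation described above can be sketched as follows. The second variable, sample2, is a hypothetical name introduced only to compare the two forms of assignment.

```r
# Create a numeric vector of three elements with c() and the assignment operator.
sample <- c(-2, 1, 5.8)

# Typing the name of a variable prints its contents.
sample

# The equals sign performs the same assignment at the top level.
sample2 = c(-2, 1, 5.8)   # sample2 is a hypothetical name used only here

# Both vectors hold the same three values.
identical(sample, sample2)
length(sample)
```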

sample

Let us view what the vector sample contains.

Let us now build a data set.

We will also call it a data frame in this tutorial.

names <- c("Mahi","Sourav","Azhar","Sunny","Pataudi","Dravid")




Let me create a vector of Indian Cricket Captains, old and new.

names gets c of Mahi, Sourav, Azhar, Sunny, Pataudi, Dravid

Each name has to be entered in between double quotes.

names

Let us view what the variable names contains.

I will create a vector to store the number of matches they captained.

One can see that all the variables got entered in the workspace.
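
This script does not show the command that creates the vector of matches. A minimal sketch is given below. The six values are taken from the direct mean calculation that appears later in this tutorial, and it is an assumption that they follow the same order as the names. The vectors Y, won and lost are not shown in this script, so they are not filled in here.

```r
# Names of the captains, as entered earlier in the tutorial.
names <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")

# Matches captained; values taken from the direct mean calculation
# later in the tutorial (assumed to follow the order of names).
played <- c(45, 49, 47, 47, 40, 25)

# ls() lists the variables currently in the workspace.
ls()

# Both vectors must have the same length to be combined into a data frame.
length(names) == length(played)
```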

captaincy <- data.frame(names,Y,played,won,lost)


Now, let us put these together and create a data frame as I do now.

It is data dot frame.
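
As a rough illustration of data.frame, the sketch below combines only two vectors. The name captaincy_sketch is hypothetical, and the played values are assumed from the mean calculation later in this tutorial; the full command in the tutorial also uses Y, won and lost.

```r
names  <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
played <- c(45, 49, 47, 47, 40, 25)   # assumed order, see the lead-in

# data.frame() places each vector in its own column,
# using the variable names as column names.
captaincy_sketch <- data.frame(names, played)

nrow(captaincy_sketch)       # number of rows
ncol(captaincy_sketch)       # number of columns
colnames(captaincy_sketch)   # column names
str(captaincy_sketch)        # compact summary of the structure
```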

View(captaincy)


Let us understand what the command data.frame does.

To do this, let us View brackets captaincy.

The contents of the data frame appear nicely in the window above.
What will happen if you just type captaincy, without the View command?

Would you want to try it now?

In statistics, one works with such data sets.

Remember, the commands and variables in R are case sensitive.

The View command begins with a capital V.
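
The two points above, printing without View and case sensitivity, can be tried on a small data frame. The data frame toy and its values are hypothetical, and the sketch assumes a fresh session in which no variable called Toy exists.

```r
# A small hypothetical data frame used only for this illustration.
toy <- data.frame(names = c("A", "B"), played = c(10, 20))

# Typing the name prints the data frame in the console
# instead of opening it in a separate viewer.
toy

# print() does the same thing explicitly.
print(toy)

# R is case sensitive: toy exists, Toy does not.
exists("toy")
exists("Toy")
```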

captaincy$names


It is easy to access the parts of a data frame.

For example, this is how I will get the names in the captaincy data frame.
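
A short sketch of column access follows, on a hypothetical data frame called toy. The dollar sign is the form used in this tutorial; the other two forms are standard R alternatives shown only for comparison.

```r
# A small hypothetical data frame used only for this illustration.
toy <- data.frame(names = c("A", "B", "C"),
                  played = c(10, 20, 30),
                  stringsAsFactors = FALSE)

# Access a column with the dollar sign, as in the tutorial.
toy$played

# Two equivalent ways of getting the same column.
toy[["played"]]
toy[, "played"]

# A single element of a column: the second value of played.
toy$played[2]
```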

captaincy$won


captaincy$played

ratio <- captaincy$won / captaincy$played



Let us similarly get the number of matches won.


Let us find the ratio of matches won.

When we divide a vector by another, R carries out component-wise division.
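
Component-wise division can be checked on small vectors. The values below are hypothetical and are not the actual records of any captain.

```r
# Hypothetical values, chosen only to make the arithmetic easy to follow.
won    <- c(5, 8, 6)
played <- c(10, 20, 24)

# Each element of won is divided by the corresponding element of played:
# 5/10, 8/20 and 6/24.
ratio <- won / played
ratio   # 0.50 0.40 0.25
```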

captaincy$victory <- ratio




Let us include this ratio also in the captaincy data frame.

captaincy dollar victory gets ratio.

What is in ratio gets transferred to victory, a variable in the data frame captaincy.
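
Adding a new variable to a data frame can be sketched as below, using a hypothetical data frame toy with made-up values.

```r
# A small hypothetical data frame with made-up values.
toy <- data.frame(played = c(10, 20), won = c(5, 8))

# Compute the ratio outside the data frame.
ratio <- toy$won / toy$played

# Assigning to a column name that does not yet exist creates that column.
toy$victory <- ratio

colnames(toy)   # "played" "won" "victory"
toy$victory     # 0.5 0.4
```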

View(captaincy)


We can check that ratio is included in captaincy by the View command.

We can see that the ratio is included in victory.

options(digits=2)



Let us reduce the number of significant digits displayed in the ratio to 2.

We do this with the command options within brackets digits=2.

View(captaincy)

Let us view captaincy.

We can see that it has been reduced to two digits.
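
The effect of the digits option can be seen on a single number. Note that digits controls the number of significant digits used for display only; the stored value is not rounded. The variable names old and x are hypothetical.

```r
# options() returns the previous settings, so they can be restored later.
old <- options(digits = 2)

x <- 2 / 3

# Displayed with two significant digits.
print(x)

# The stored value still has full precision.
print(x, digits = 7)

# Restore the earlier display settings.
options(old)
```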

captaincy$played

Let us see the number of matches played by all the captains.

mean(captaincy$played)

Let us find the mean of these matches.

(45+49+47+47+40+25) / 6


By direct calculation, we find that mean does exactly what it should.

These two are equal.
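
The comparison above can also be done inside R. The sketch below uses the six values from the direct calculation.

```r
# The number of matches played, as in the direct calculation above.
played <- c(45, 49, 47, 47, 40, 25)

# Three ways of arriving at the same mean.
mean(played)
sum(played) / length(played)
(45 + 49 + 47 + 47 + 40 + 25) / 6

# all.equal() compares numbers while allowing for tiny rounding differences.
all.equal(mean(played), (45 + 49 + 47 + 47 + 40 + 25) / 6)
```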


plot(captaincy$Y,ratio)



Let us now get some plots with this data frame.

Let us plot the ratio of victories vs. the year of captaincy.

Remember that the x axis variable has to be listed first in the plot command.
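
A minimal sketch of the argument order in plot is given below. The vectors year and value hold hypothetical numbers, not the data of this tutorial.

```r
# Hypothetical data, used only to show the order of the arguments.
year  <- c(2001, 2002, 2003, 2004)
value <- c(0.40, 0.55, 0.50, 0.65)

# The x axis variable comes first, the y axis variable second.
plot(year, value, xlab = "year", ylab = "value")

# The formula form reads "value versus year" and gives the same axes.
plot(value ~ year)
```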

plot(captaincy$names,captaincy$played)

Let us plot the number of matches captained versus the names of the captains.

Drag this a little bit. The graph expands. Put it back.

We will now explain how to write this data frame to a comma-separated values file, or CSV.

write.csv(captaincy,"NewCaptaincy.csv")




We will use the command write.csv, as we do now.

Let me move this to the right.

This writes the captaincy data frame into the file NewCaptaincy.csv

emacs NewCaptaincy.csv


Let me open this file using my favourite text editor, emacs.

You may use any other text editor, such as Notepad or vi or gedit.

Please do not, however, use Microsoft Office or LibreOffice Writer for this purpose, because you may not be able to load the resulting files into R.

On the emacs editor...




Let us now see what the file NewCaptaincy.csv contains.

This csv file contains everything that we created.

It also contains row numbers in every row, at the beginning.

write.csv(captaincy,"NewCaptaincy.csv",row.names=FALSE)

How will you get rid of these row numbers?

Add an additional flag called row dot names equals FALSE.

Let us read the new version of this file.

In emacs, I just have to refresh it.

In other editors, you may have to re-open the file.

We see that the row numbers no longer appear.
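
The difference between the two write.csv calls can also be checked without leaving R. The sketch below uses a hypothetical data frame toy and temporary files instead of NewCaptaincy.csv, and reads the files back as plain text with readLines.

```r
# A small hypothetical data frame used only for this illustration.
toy <- data.frame(names = c("A", "B"), played = c(10, 20))

# Temporary files, so that no existing file is overwritten.
with_rows    <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")

# Default behaviour: row numbers are written as the first column.
write.csv(toy, with_rows)

# With row.names = FALSE the row numbers are left out.
write.csv(toy, without_rows, row.names = FALSE)

# Show the raw text of each file, one element per line.
readLines(with_rows)
readLines(without_rows)
```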

Change Mahi to Dhoni and save.


Let us make a small change in this csv file and read it.

Let us change Mahi to Dhoni and save the file.

Workspace -> Import Dataset






Let us click Import Dataset.

There are two options:

  • from the text file and
  • from the web.

Reading from a local file is a fool-proof method.

It works even if the Internet is not available.

All one has to do is to download the required files ahead of time.

read a text file








View(NewCaptaincy)




Let us click the File option.

Let us select NewCaptaincy.

In order to see this, let us maximise the window, press Import, and then unmaximise it.

You will not have this problem, of course, because you will be using the whole screen.

For this tutorial, I am using only a small portion of the screen.

And that’s why we have had this problem.

The appearance of a data frame called NewCaptaincy here, and also here, shows that it has been read.

It also has Dhoni as the name.

The row numbers have also disappeared.
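
The same import can be done from the console with the base R function read.csv. This is a sketch of a script-based alternative, not the exact steps of the Import Dataset menu. It uses a hypothetical data frame toy and a temporary file instead of NewCaptaincy.csv.

```r
# Write a small hypothetical data frame to a temporary csv file.
toy <- data.frame(names = c("A", "B"),
                  played = c(10, 20),
                  stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read the file back into a new data frame.
# stringsAsFactors = FALSE keeps the names as plain text in all R versions.
imported <- read.csv(path, stringsAsFactors = FALSE)

imported
```

To read the file created in this tutorial, one would pass "NewCaptaincy.csv" as the first argument, assuming it is in the current working directory.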

It is also possible to read Excel and ods files into R.

This involves a little more work and hence is not covered in this tutorial.

It is easy to export Excel and ods files into csv format.

So, for now, please use the csv route to import Excel or ods files.



data()




Let us take up the final topic for this tutorial.

R comes with several data frames.

To access them, issue the command data brackets.

R shows a list of available data frames.

We can carry out all sorts of calculations on these data sets.

View(CO2)

We will just view one data set, called C O 2.
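
Besides View, a built-in data set can be inspected from the console. The sketch below uses only standard R functions on CO2.

```r
# Show the first six rows of the built-in CO2 data set.
head(CO2)

# Number of rows and columns.
dim(CO2)   # 84 rows and 5 columns

# Names of the variables in the data set.
colnames(CO2)

# Compact summary of the type of each variable.
str(CO2)
```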
We will conclude this tutorial here.
Slide 14

Summary





In this tutorial, we learnt the following:
  • We created vectors
  • We put the vectors into a data frame
  • We operated on the data frame
  • We saved the data frame into a file
  • Imported the file into R
  • Examined the data frames that came with R
Slide 15

Assignment





We now suggest an assignment.
  • Find out how to calculate median using the help button.
  • Repeat the above using web search.
  • Calculate the mean and median of the data frame C O 2.
  • Take R’s help for all the commands shown in this tutorial.
  • Import a data set from the Internet directly or through a file.
Slide 16

About the Spoken Tutorial Project

This video summarises the Spoken Tutorial project.

If you do not have good bandwidth, you may download and watch it.

Slide 17

Spoken Tutorial Workshops


We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Slide 18

Acknowledgements

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
Slide 19

Thanks

Thanks for joining, goodbye.

Contributors and Content Editors

Nancyvarkey, Sudhakarst