FOSS: R through RStudio
R version 3.0.0 (2013-04-03) and RStudio version 0.97.336
Tutorial Title: Introduction to data frames in R
Author: Kannan Moudgalya
Reviewer: Jayendran Venkateswaran, Neeraj Hatekar, S. Subramanian, T. Santhanam, Sanjeev Bakshi, Revathi Kasturi and others
Date: 14 September 2013
Keywords: Video tutorial, spoken tutorial, data frames, csv file, data display, importing csv files
Files required: None
Immediate prerequisite in the same family: Introduction to basics of R
Prerequisite from any other family of ST: None
Visual Cue | Narration |
---|---|
Slide 1
Opening slide |
Welcome to the spoken tutorial on introduction to data frames in R.
I am Kannan Moudgalya. |
Slide 2
Learning Objectives |
In this tutorial, we will explain
We will also learn about the data sets that come with the R software |
Slide 3
Prerequisites |
To understand this tutorial,
|
Prerequisites (Ctd) |
|
Slide 4
Systems Requirements |
But it should work in other versions and also other operating systems. |
RStudio, Slide 6 | Let us switch to RStudio. |
sample <- c(-2,1,5.8) | Let us begin with the “c” command to create a vector.
Notice that I am using the assignment operator, that is, less than followed by hyphen. I will use the word gets to denote this assignment. This is the way to create a vector of three elements. c stands for combine or concatenate. Although the equal to sign works most of the time, one usually uses this operator in R. |
sample | Let us view what the vector sample contains. |
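As a minimal sketch of the two commands above, the following can be typed at the R console. The name `sample_eq` is an illustrative assumption and does not appear in the tutorial.

```r
# Create a numeric vector of three elements with c() and the <- operator
sample <- c(-2, 1, 5.8)

# The equal to sign also assigns at the top level
# (sample_eq is a hypothetical name used only for this comparison)
sample_eq = c(-2, 1, 5.8)

# Typing the name prints the contents of the vector
sample

length(sample)                # number of elements
identical(sample, sample_eq)  # both assignments give the same vector
```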
Let us now build a data set.
We will also call it a data frame in this tutorial. | |
names <- c("Mahi","Sourav","Azhar",
"Sunny","Pataudi","Dravid") |
Let me create a vector of Indian Cricket Captains, old and new.
names gets c of Mahi, Sourav, Azhar, Sunny, Pataudi, Dravid. Each name has to be entered in between double quotes. |
names | Let us view what the variable names contains. |
I will create a vector to store the number of matches they captained.
One can see that all the variables got entered in the workspace. | |
captaincy <- data.frame(names,Y,played,won,lost) | Now, let us put these together and create a data frame as I do now.
It is data dot frame. |
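The following is a reduced sketch of the data.frame call, not the full data set used in the video. The played values are taken from the direct calculation of the mean later in this tutorial and are assumed to be in the same order as the names. The columns Y, won and lost are left out because their values are not listed in this script.

```r
# Vector of names, as entered in the tutorial
names <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")

# Matches captained (assumed to follow the order of the names above)
played <- c(45, 49, 47, 47, 40, 25)

# data.frame() binds vectors of equal length into columns
captaincy <- data.frame(names, played)

captaincy       # print the data frame at the console
str(captaincy)  # show the type of each column
dim(captaincy)  # number of rows and columns
```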
View(captaincy) | Let us understand what the command data frame does.
To do this, let us View brackets captaincy. |
The contents of the data frame appear nicely in the window above. | |
What will happen if you just type captaincy, without the View command?
Would you want to try it now? | |
In statistics, one works with such data sets.
Remember, the commands and variables in R are case sensitive. The View command begins with a capital V. | |
captaincy$names | It is easy to access the parts of a data frame.
For example, this is how I will get the names in the captaincy data frame. |
captaincy$won
captaincy$played
ratio <- captaincy$won / captaincy$played |
Let us similarly get the number of matches won.
Let us find the ratio of matches won. When we divide a vector by another, R carries out component-wise division. |
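Component-wise division can be checked on a small example. The vectors below are illustrative values only and are not the captaincy figures; the names `won_demo`, `played_demo` and `ratio_demo` are assumptions made for this sketch.

```r
# Illustrative vectors only; these are not the captaincy data
won_demo <- c(10, 20, 30)
played_demo <- c(20, 40, 50)

# Each element of won_demo is divided by the element of
# played_demo in the same position
ratio_demo <- won_demo / played_demo

ratio_demo  # 0.5 0.5 0.6
```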
captaincy$victory <- ratio | Let us include this ratio also in the captaincy data frame.
captaincy dollar victory gets ratio. What is in ratio gets transferred to victory, a variable in the data frame captaincy. |
View(captaincy) | We can check that ratio is included in captaincy by the View command.
We can see that the ratio is included in victory. |
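Adding a column with the dollar operator can be sketched on a small stand-in data frame. The data frame `demo` and its values are hypothetical and are used only to show the mechanism.

```r
# A small hypothetical data frame; the values are for illustration only
demo <- data.frame(team = c("A", "B", "C"),
                   won = c(10, 20, 30),
                   played = c(20, 40, 50))

# Assigning to a name that does not yet exist creates a new column
demo$victory <- demo$won / demo$played

colnames(demo)  # victory now appears as the last column
demo
```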
options(digits=2) | Let us reduce the number of digits displayed in the ratio to 2.
We do this with the command options within brackets digits=2. |
View(captaincy) | Let us view captaincy.
We can see that it has been reduced to two digits. |
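The digits option changes only how numbers are displayed, not the values that are stored. A minimal sketch, in which `x`, `old` and `shown` are illustrative names:

```r
x <- 2 / 3

# options() returns the previous settings, so they can be restored later
old <- options(digits = 2)

shown <- format(x)  # format() follows the digits option
print(shown)        # "0.67"

# Restore the earlier setting
options(old)

# The stored value is unchanged by the display setting
x == 2 / 3
```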
captaincy$played | Let us see the number of matches played by all the captains. |
mean(captaincy$played) | Let us find the mean of these matches. |
(45+49+47+47+40+25) / 6
|
By direct calculation, we find that mean does exactly what it should.
These two are equal. |
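The comparison can also be made by R itself. This sketch uses the six values from the direct calculation above; `by_function` and `by_hand` are illustrative names.

```r
# Matches captained, as in the direct calculation
played <- c(45, 49, 47, 47, 40, 25)

by_function <- mean(played)
by_hand <- (45 + 49 + 47 + 47 + 40 + 25) / 6

sum(played)                      # 253
all.equal(by_function, by_hand)  # TRUE
```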
plot(captaincy$Y,ratio) |
Let us now get some plots with this data frame.
Let us plot the ratio of victories vs. the year of captaincy. Remember that the x axis variable has to be listed first in the plot command. |
plot(captaincy$names,captaincy$played) | Let us plot the number of matches captained versus the names of the captains. |
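A note for newer versions of R: since R 4.0.0, data.frame() keeps strings as character vectors by default instead of converting them to factors, so the names column may need to be wrapped in factor() before plotting. The sketch below assumes the played values from the direct calculation of the mean, in the same order as the names. The call to pdf(NULL) is an assumption made only so that the sketch runs without a display; it can be left out in RStudio.

```r
names <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
played <- c(45, 49, 47, 47, 40, 25)
captaincy <- data.frame(names, played)

# Open a null graphics device so that no window or file is needed
pdf(NULL)

# Convert the names to a factor so that plot() treats them as categories
captains <- factor(captaincy$names)
plot(captains, captaincy$played)

dev.off()
```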
Drag this a little bit. The graph expands. Put it back. | |
We will now explain how to write this data frame to a comma separated values file or C S V. | |
write.csv(captaincy,"NewCaptaincy.csv") | We will use the command write.csv, as we do now.
Let me move this to the right. This writes the captaincy data frame into the file NewCaptaincy.csv. |
emacs NewCaptaincy.csv
|
Let me open this file using my favourite text editor, emacs. |
You may use any other text editor, such as notepad or vi or gedit.
Please do not, however, use Microsoft Office or LibreOffice Writer for this purpose, because you may not be able to load these files into R. | |
On the emacs editor... | Let us now see what the file NewCaptaincy.csv contains.
This csv file contains everything that we created. It also contains row numbers in every row, at the beginning. |
write.csv(captaincy,
"NewCaptaincy.csv",row.names=FALSE) |
How will you get rid of these row numbers?
Add an additional flag called row dot names equals FALSE. |
Let us read the new version of this file.
In emacs, I just have to refresh it. In other editors, you may have to re-open the file. We see that the row numbers no longer appear. | |
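The effect of row.names=FALSE can also be seen without a text editor. This sketch writes to temporary files rather than to NewCaptaincy.csv, and uses a reduced data frame with only the names and played columns; `with_rows` and `without_rows` are illustrative names.

```r
names <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
played <- c(45, 49, 47, 47, 40, 25)
captaincy <- data.frame(names, played)

# Temporary files stand in for NewCaptaincy.csv in this sketch
with_rows <- tempfile(fileext = ".csv")
without_rows <- tempfile(fileext = ".csv")

write.csv(captaincy, with_rows)                         # row numbers included
write.csv(captaincy, without_rows, row.names = FALSE)   # row numbers left out

# Compare the first two lines of each file
readLines(with_rows, n = 2)
readLines(without_rows, n = 2)
```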
Change Mahi to Dhoni and save. | Let us make a small change in this csv file and read it.
Let us change Mahi to Dhoni and save the file. |
Workspace -> Import Dataset | Let us click Import Dataset.
There are two options:
Reading from a local file is a fool-proof method. It works even if Internet is not available. All one has to do is to download the required files ahead of time. |
read a text file
View(NewCaptaincy) |
Let us click the File option.
Let us select NewCaptaincy. In order to see this, let us maximise this, let us press Import, let us press unmaximise. You will not have this problem, of course, because, you will be using the whole screen. For this tutorial, I am using only a small portion of the screen. And that’s why we have had this problem. The appearance of a data frame called NewCaptaincy here, and also here, shows that it has been read. It also has Dhoni as the name. The row numbers have also disappeared. |
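A csv file can also be read from the console with read.csv, which gives roughly the same result as the Import Dataset dialog. In this sketch, a temporary file with two hypothetical rows stands in for the edited NewCaptaincy.csv; `csv_path` is an illustrative name.

```r
# Write a small stand-in file (two rows only, for illustration)
csv_path <- tempfile(fileext = ".csv")
writeLines(c('"names","played"',
             '"Dhoni",45',
             '"Sourav",49'),
           csv_path)

# read.csv() takes the first line as the column names
NewCaptaincy <- read.csv(csv_path)

NewCaptaincy
```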
It is also possible to read excel and ods files into R.
This involves a little more work and hence is not covered in this tutorial. It is easy to export Excel and ods files into csv format. So, for now, please use the csv route to import Excel or ods files. | |
data() |
Let us take up the final topic for this tutorial.
R comes with several data frames. To access them, issue the command data brackets. R shows a list of available data frames. We can carry out all sorts of calculations on these data sets. |
View(CO2) | We will just view one data set, called C O 2. |
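For readers working at the console instead of the View window, a built-in data set can be inspected as sketched below. The calculation on the uptake column is only an example of the kind of calculation that is possible.

```r
# CO2 is available without loading anything
head(CO2)      # first six rows
dim(CO2)       # 84 rows and 5 columns
colnames(CO2)  # names of the columns

# An example calculation on one column
mean(CO2$uptake)
```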
We will conclude this tutorial here. | |
Slide 14
Summary |
In this tutorial, we learnt the following:
|
Slide 15
Assignment |
We now suggest an assignment.
|
Slide 16
About the Spoken Tutorial Project |
This video summarises the Spoken Tutorial project.
If you do not have good bandwidth, you may download and watch it. |
Slide 17
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials; give Certificates.
Please contact us. |
Slide 18
Acknowledgements |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Slide 19 Thanks | Thanks for joining, goodbye. |