R/C2/Introduction-to-Data-Frames-in-R/English
Latest revision as of 16:24, 24 April 2019
FOSS: R through RStudio
R version 3.0.0 (2013-04-03) and RStudio version 0.97.336
Tutorial Title: Introduction to data frames in R
Author: Kannan Moudgalya
Reviewer: Jayendran Venkateswaran, Neeraj Hatekar, S. Subramanian, T. Santhanam, Sanjeev Bakshi, Revathi Kasturi and others
Date: 14 September 2013
Keywords: Video tutorial, spoken tutorial, data frames, csv file, data display, importing csv files
Files required: None
Immediate prerequisite in the same family: Introduction to basics of R
Prerequisite from any other family of ST: None
Visual Cue | Narration |
---|---|
Slide 1
Opening slide |
Welcome to the spoken tutorial on introduction to data frames in R.
I am Kannan Moudgalya. |
Slide 2
Learning Objectives |
In this tutorial, we will explain
We will also learn about the data sets that come with the R software |
Slide 3
Prerequisites |
To understand this tutorial,
|
Prerequisites (Ctd) |
Please locate this tutorial on our web page, spoken hyphen tutorial dot org. |
Slide 4
Systems Requirements |
I am using version 3 of R.
But it should work in other versions and also on other operating systems. |
RStudio, Slide 6 | Let us switch to RStudio. |
sample <- c(-2,1,5.8) | Let us begin with the “c” command to create a vector.
Notice that I am using the assignment operator, that is, less than followed by hyphen. I will read this assignment as gets. This is the way to create a vector of three elements. c stands for combine; it is also read as concatenate. Although the equal to sign works most of the time, one usually uses this operator in R. |
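A minimal sketch of the same step at the R console. `sample2` is an illustrative name added here, not part of the original script; it shows that the equal sign gives the same vector at the top level.

```r
# Create a numeric vector of three elements with c() and the <- operator
sample <- c(-2, 1, 5.8)
print(sample)

# The equal sign also assigns at the top level, though <- is the usual style in R
sample2 = c(-2, 1, 5.8)

# Both approaches produce the same vector
identical(sample, sample2)   # TRUE
length(sample)               # 3
```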
sample | Let us view what the vector sample contains. |
Let us now build a data set.
We will also call it a data frame in this tutorial. | |
names <- c("Mahi","Sourav","Azhar",
"Sunny","Pataudi","Dravid") |
Let me create a vector of Indian Cricket Captains, old and new.
names gets c of Mahi, Sourav, Azhar, Sunny, Pataudi, Dravid. Each name has to be entered between double quotes. |
names | Let us view what the variable names contains. |
I will create a vector to store the number of matches they captained.
One can see that all the variables have been entered in the workspace. | |
captaincy <- data.frame(names,Y,played,won,lost) | Now, let us put these together and create a data frame as I do now.
It is data dot frame. |
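The script does not list the values of Y, won and lost, so the sketch below builds a smaller data frame from names and played only. It assumes that the matches-played figures used in the mean calculation later in this tutorial (45, 49, 47, 47, 40, 25) are listed in the same order as the names.

```r
# Vectors of equal length become the columns of the data frame
names  <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
played <- c(45, 49, 47, 47, 40, 25)   # assumed to follow the order of names

# data.frame() takes each vector as a column, named after the variable
captaincy <- data.frame(names, played)

dim(captaincy)        # 6 rows, 2 columns
colnames(captaincy)   # "names" "played"
str(captaincy)        # column types; names is a factor in R versions before 4.0
```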
View(captaincy) | Let us understand what the command data.frame does.
To do this, let us type View brackets captaincy. |
The contents of the data frame appear nicely in the window above. | |
What will happen if you just type captaincy, without the View command?
Would you want to try it now? | |
In statistics, one works with such data sets.
Remember, the commands and variables in R are case sensitive. The View command begins with a capital V. | |
captaincy$names | It is easy to access the parts of a data frame.
For example, this is how I will get the names in the captaincy data frame. |
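The dollar sign is one of several ways to pull a column out of a data frame. A short sketch, again assuming the played figures follow the order of the names:

```r
captaincy <- data.frame(
  names  = c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid"),
  played = c(45, 49, 47, 47, 40, 25)   # assumed to follow the order of names
)

# Three equivalent ways to extract one column as a vector
captaincy$played
captaincy[["played"]]
captaincy[, "played"]

# A single element: the second entry of the played column
captaincy$played[2]   # 49
```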
captaincy$won
captaincy$played
ratio <- captaincy$won / captaincy$played |
Let us similarly get the number of matches won, and then the number of matches played.
Let us find the ratio of matches won to matches played. When we divide one vector by another, R carries out component-wise division. |
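Component-wise division can be checked on small vectors. The numbers below are hypothetical toy values, not the captains' records, which the script does not list.

```r
# Hypothetical toy values, chosen only to make the arithmetic easy to follow
won    <- c(6, 3, 10)
played <- c(12, 4, 20)

# Element i of the result is won[i] / played[i]
ratio <- won / played
ratio   # 0.50 0.75 0.50
```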
captaincy$victory <- ratio | Let us include this ratio also in the captaincy data frame.
captaincy dollar victory gets ratio. What is in ratio gets copied into victory, a new variable in the data frame captaincy. |
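Assigning to a column name that does not yet exist adds that column to the data frame. A minimal sketch with a hypothetical data frame `toy`; the team labels and numbers are made up for illustration.

```r
# Hypothetical data, not the captaincy records
toy <- data.frame(team   = c("A", "B", "C"),
                  won    = c(6, 3, 10),
                  played = c(12, 4, 20))

# Compute the ratio and store it as a new column called victory
toy$victory <- toy$won / toy$played

ncol(toy)    # 4: the new column has been appended
names(toy)   # "team" "won" "played" "victory"
```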
View(captaincy) | We can check that ratio is included in captaincy by the View command.
We can see that the ratio is included in victory. |
options(digits=2) | Let us reduce the number of digits displayed in the ratio to 2.
We do this with the command options within brackets digits=2. |
View(captaincy) | Let us view captaincy.
We can see that it has been reduced to two digits. |
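options(digits = 2) changes only how numbers are printed; the stored values keep full precision. The sketch below saves and restores the previous setting, which is a common habit rather than a step from the original tutorial, and shows format() and round() as alternatives that act on a single value.

```r
x <- 2 / 3

# options() returns the old settings, so they can be restored later
old <- options(digits = 2)
print(x)        # displays 0.67
x == 2 / 3      # TRUE: the stored value is unchanged
options(old)    # restore the previous setting

# Alternatives that do not change the global option
format(x, digits = 2)   # "0.67" as a character string
round(x, 2)             # a new numeric value rounded to 2 decimals
```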
captaincy$played | Let us see the number of matches played by all the captains. |
mean(captaincy$played) | Let us find the mean number of matches played. |
(45+49+47+47+40+25) / 6
|
By direct calculation, we find that mean does exactly what it should.
These two are equal. |
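The same check can be written with sum() and length(), using the figures from the direct calculation above:

```r
played <- c(45, 49, 47, 47, 40, 25)

# mean() is the sum divided by the number of elements
mean(played)
sum(played) / length(played)   # same value, 253 / 6
```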
plot(captaincy$Y,ratio) |
Let us now get some plots with this data frame.
Let us plot the ratio of victories vs. the year of captaincy. Remember that the x axis variable has to be listed first in the plot command. |
plot(captaincy$names,captaincy$played) | Let us plot the number of matches captained versus the names of the captains. |
Drag this a little bit. The graph expands. Put it back. | |
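This plot relies on names being a factor. data.frame() converted character vectors to factors by default until R 4.0.0, which changed the default of stringsAsFactors to FALSE; with a character column the same call is unlikely to give this chart. A hedged sketch for newer versions converts the column explicitly. `captain_names` is a hypothetical helper variable, and the played figures are assumed to follow the order of the names.

```r
captain_names <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")

captaincy <- data.frame(
  # factor() with explicit levels keeps the original order on the x axis
  names  = factor(captain_names, levels = captain_names),
  played = c(45, 49, 47, 47, 40, 25)   # assumed to follow the order of names
)

# With a factor on the x axis, plot() draws one box per level
plot(captaincy$names, captaincy$played)

# A bar chart is another option when there is one value per captain
barplot(captaincy$played, names.arg = captain_names)
```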
We will now explain how to write this data frame to a comma separated values file, or C S V. | |
write.csv(captaincy,"NewCaptaincy.csv") | We will use the command write.csv, as we do now.
Let me move this to the right. This writes the captaincy data frame into the file NewCaptaincy.csv. |
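To see what write.csv produces without leaving R, one can write to a temporary file and read the raw lines back with readLines(). The temporary path and the two-column data frame are assumptions for illustration.

```r
captaincy <- data.frame(
  names  = c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid"),
  played = c(45, 49, 47, 47, 40, 25)   # assumed to follow the order of names
)

# Write to a temporary file so the working directory is left untouched
path <- file.path(tempdir(), "NewCaptaincy.csv")
write.csv(captaincy, path)

# By default, the row names 1 to 6 are written as an extra first column
lines <- readLines(path)
cat(lines, sep = "\n")
# First two lines of output:
# "","names","played"
# "1","Mahi",45
```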
emacs NewCaptaincy.csv
|
Let me open this file using my favourite text editor, emacs. |
You may use any other text editor, such as Notepad, vi or gedit.
Please do not, however, use Microsoft Office or LibreOffice Writer for this purpose, because you may not be able to load the saved files back into R. | |
On the emacs editor... | Let us now see what the file NewCaptaincy.csv contains.
This csv file contains everything that we created. It also contains a row number at the beginning of every row. |
write.csv(captaincy,
"NewCaptaincy.csv",row.names=FALSE) |
How will you get rid of these row numbers?
Add an additional argument, row dot names equals FALSE. |
Let us read the new version of this file.
In emacs, I just have to refresh it. In other editors, you may have to re-open the file. We see that the row numbers no longer appear. | |
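The effect of row.names = FALSE can be confirmed the same way, by writing to a temporary file (an assumption for illustration) and inspecting the raw lines:

```r
captaincy <- data.frame(
  names  = c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid"),
  played = c(45, 49, 47, 47, 40, 25)   # assumed to follow the order of names
)
path <- file.path(tempdir(), "NewCaptaincy.csv")

# row.names = FALSE drops the leading column of row numbers
write.csv(captaincy, path, row.names = FALSE)

lines <- readLines(path)
cat(lines, sep = "\n")
# First two lines of output:
# "names","played"
# "Mahi",45
```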
Change Mahi to Dhoni and save. | Let us make a small change in this csv file and read it.
Let us change Mahi to Dhoni and save the file. |
Workspace -> Import Dataset | Let us click Import Dataset.
There are two options:
Reading from a local file is a fool-proof method. It works even if the Internet is not available. All one has to do is to download the required files ahead of time. |
read a text file
View(NewCaptaincy) |
Let us click the File option.
Let us select NewCaptaincy. In order to see this properly, let us maximise this window, press Import and then unmaximise it. You will not, of course, have this problem, because you will be using the whole screen. For this tutorial, I am using only a small portion of the screen, and that is why we had this problem. The appearance of a data frame called NewCaptaincy here, and also here, shows that it has been read. It also has Dhoni as the name. The row numbers have also disappeared. |
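The Import Dataset dialog is a point-and-click route; a script can do the same with read.csv(). So that the sketch runs on its own, it recreates the file in a temporary directory (an assumption) and mimics the manual edit of Mahi to Dhoni with sub().

```r
captaincy <- data.frame(
  names  = c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid"),
  played = c(45, 49, 47, 47, 40, 25)   # assumed to follow the order of names
)
path <- file.path(tempdir(), "NewCaptaincy.csv")
write.csv(captaincy, path, row.names = FALSE)

# Stand-in for editing the file by hand: replace Mahi with Dhoni and save
lines <- readLines(path)
writeLines(sub("Mahi", "Dhoni", lines), path)

# Read the edited file back into a new data frame
NewCaptaincy <- read.csv(path)
head(NewCaptaincy)
as.character(NewCaptaincy$names[1])   # "Dhoni"
```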
It is also possible to read Excel and ods files into R.
This involves a little more work and hence is not covered in this tutorial. It is easy to export Excel and ods files into csv format. So, for now, please use the csv route to import Excel or ods files. | |
data() |
Let us take up the final topic for this tutorial.
R comes with several data sets, many of which are data frames. To list them, issue the command data brackets. R shows a list of the available data sets. We can carry out all sorts of calculations on these data sets. |
View(CO2) | We will just view one data set, called C O 2. |
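Besides View, a few base R calls summarise a built-in data set such as CO2. The sketch stores the listing from data() in a variable instead of printing it; `available` is a hypothetical name.

```r
# data() returns an object whose $results matrix lists the available data sets
available <- data(package = "datasets")$results
head(available[, "Item"])

# CO2 is a data frame (with some extra classes attached)
is.data.frame(CO2)   # TRUE
dim(CO2)             # 84 rows, 5 columns
head(CO2)
summary(CO2$uptake)
```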
We will conclude this tutorial here. | |
Slide 14
Summary |
In this tutorial, we learnt the following:
|
Slide 15
Assignment |
We now suggest an assignment.
|
Slide 16
About the Spoken Tutorial Project |
This video summarises the Spoken Tutorial project.
If you do not have good bandwidth, you may download and watch it. |
Slide 17
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Slide 18
Acknowledgements |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Slide 19 Thanks | Thanks for joining, goodbye. |