R/C2/Introduction-to-Data-Frames-in-R/English-timed

From Script | Spoken-Tutorial


Time Narration
00:00 Welcome to the spoken tutorial on Introduction to Data Frames in R.
00:05 I am Kannan Moudgalya.
00:07 In this tutorial, we will explain
00:10 how to create a data frame in R
00:13 how to work with it
00:15 how to save it into a file and
00:17 how to read data from a file into a data frame
00:21 We will also learn about the data sets that come with the R software
00:27 To understand this tutorial,
00:29 One needs to know elementary maths
00:32 for example, the meaning of mean, row and column
00:35 One also needs to know how to edit a text file and be familiar with plotting.
00:39 No programming background is required, however
00:42 Anyone with a simple exposure to statistics can understand this tutorial
00:46 Please locate this tutorial on our web page, spoken hyphen tutorial dot org
00:53 Prerequisite spoken tutorials, if any, will be mentioned in this page
00:57 I am using version 3 of R, RStudio 0.97 and Mac OS X 10.7.

01:05 But it should work in other versions and also other operating systems.
01:12 Let us switch to RStudio.
01:14 Let us begin with the “c” command to create a vector
01:23 Notice that I am using the assignment operator, that is, less than followed by hyphen.
01:31 I will use the word gets to denote this assignment.
01:35 This is the way to create a vector of three elements.
01:38 c stands for combine or concatenate.
01:42 Although the equal to sign works most of the time, one usually uses this operator in R.
01:49 Let us view what the vector sample contains.
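The narration does not spell out what was typed at this point. A minimal sketch of these steps might look like the following, assuming three arbitrary numbers; the actual elements of the vector sample are not stated in the recording.

```r
# Assumed values: the recording does not state the three elements.
sample <- c(10, 20, 30)   # read as "sample gets c of 10, 20, 30"

# Typing the name of a variable prints its contents in the console.
sample

# length() confirms that the vector has three elements.
length(sample)
```

Writing sample = c(10, 20, 30) would give the same result here, but the arrow form is the usual convention in R.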
01:56 Let us now build a data set.
01:58 We will also call it a data frame in this tutorial.
02:03 Let me create a vector of Indian Cricket Captains, old and new.
02:10 names gets c of Mahi, Sourav, Azhar, Sunny, Pataudi, Dravid
02:46 Each name has to be entered in between double quotes.
02:50 Let us view what the variable names contains.
02:57 I will create a vector to store the number of matches they captained.
03:15 One can see that all the variables got entered in the workspace.
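The two vectors described above could be entered roughly as follows. The captain names are the ones read out in the narration. The variable name played and its numbers are placeholders for illustration, since the recording does not state them.

```r
# Character values must be enclosed in double quotes.
names <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
names

# Placeholder figures, not the values typed in the recording.
played <- c(45, 49, 47, 47, 40, 25)
played

# ls() lists the variables currently in the workspace.
ls()
```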
03:21 Now, let us put these together and create a data frame as I do now.
03:44 It is data dot frame.
03:50 Let us understand what the command data frame does.
03:55 To do this, let us View brackets captaincy.
04:02 The contents of the data frame appear nicely in the window above.
04:09 What will happen if you just type captaincy, without the View command?
04:14 Would you want to try it now?
04:16 In statistics, one works with such data sets.
04:19 Remember, the commands and variables in R are case sensitive.
04:24 The View command begins with a capital V.
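Here is a hedged sketch of how the data frame might be put together. The column names year, played and won, and all the numbers, are assumptions for illustration; only the captain names and the name captaincy come from the narration.

```r
names  <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
# Illustrative figures, not taken from the recording.
year   <- c(2008, 2000, 1990, 1976, 1962, 2005)
played <- c(45, 49, 47, 47, 40, 25)
won    <- c(24, 21, 14, 9, 9, 8)

# data.frame() places the vectors side by side as columns.
captaincy <- data.frame(names, year, played, won)

# Typing the name alone prints the data frame in the console.
captaincy

# View() opens a spreadsheet-style viewer, so it is run only in an
# interactive session. Note the capital V: view() is not a base R command.
if (interactive()) View(captaincy)
```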
04:29 It is easy to access the parts of a data frame.
04:32 For example, this is how I will get the names in the captaincy data frame.
04:39 Let us similarly get the number of matches won.
04:45 Let us find the ratio of matches won.
04:54 When we divide a vector by another, R carries out component-wise division.
04:59 Let us include this ratio also in the captaincy data frame.
05:04 captaincy dollar victory gets ratio.
05:13 What is in ratio gets transferred to victory, a variable in the data frame captaincy.
05:19 We can check that ratio is included in captaincy by the View command.
05:26 We can see that the ratio is included in victory.
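The dollar sign is what selects a column from a data frame. The steps just described can be sketched as below, assuming columns called played and won with placeholder numbers; ratio and victory are the names used in the narration.

```r
# Placeholder data frame; the figures are for illustration only.
captaincy <- data.frame(
  names  = c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid"),
  played = c(45, 49, 47, 47, 40, 25),
  won    = c(24, 21, 14, 9, 9, 8)
)

captaincy$names   # the names column
captaincy$won     # the matches won column

# Dividing one vector by another works element by element.
ratio <- captaincy$won / captaincy$played

# Assigning to a new column name adds that column to the data frame.
captaincy$victory <- ratio
captaincy
```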
05:31 Let us reduce the number of digits displayed in the ratio to 2.
05:35 We do this with the command options within brackets digits=2.
05:48 Let us view captaincy
05:51 We can see that it has been reduced to two digits.
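The options command changes only how numbers are displayed, not the values that are stored. A small sketch with placeholder numbers:

```r
# Placeholder ratios for illustration.
ratio <- c(24, 21, 14) / c(45, 49, 47)

print(ratio)                 # printed with the current digits setting

# options() returns the previous settings, so they can be restored later.
old <- options(digits = 2)
print(ratio)                 # now printed with about two significant digits

options(old)                 # restore the earlier setting
```

Restoring the old setting is optional. It is included here so that the sketch does not change the rest of the session.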
05:56 Let us see the number of matches played by all the captains.
06:03 Let us find the mean of these matches.
06:07 By direct calculation, we find that mean does exactly what it should.
06:16 These two are equal.
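The comparison between mean and the direct calculation could be done as in this sketch. The numbers are placeholders; played is an assumed name for the matches column.

```r
# Placeholder figures for illustration.
played <- c(45, 49, 47, 47, 40, 25)

mean(played)                   # built-in mean

# Direct calculation: the sum of the values divided by how many there are.
sum(played) / length(played)
```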
06:18 Let us now get some plots with this data frame.
06:22 Let us plot the ratio of victories vs. the year of captaincy.
06:32 Remember that the x axis variable has to be listed first in the plot command.
06:40 Let us plot the number of matches captained versus the names of the captains.
06:50 Drag this a little bit. The graph expands. Put it back.
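The two plots might be produced as sketched below. The column names and numbers are assumptions for illustration. In R 4.0 and later, data.frame keeps text columns as character rather than factor, so the names are wrapped in factor() here; in the R version 3 used in the recording this step would not have been needed.

```r
# Placeholder data frame; the figures are for illustration only.
captaincy <- data.frame(
  names  = c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid"),
  year   = c(2008, 2000, 1990, 1976, 1962, 2005),
  played = c(45, 49, 47, 47, 40, 25),
  won    = c(24, 21, 14, 9, 9, 8)
)
captaincy$victory <- captaincy$won / captaincy$played

# The x axis variable comes first, the y axis variable second.
plot(captaincy$year, captaincy$victory)

# With a factor on the x axis, plot() draws one box per captain.
plot(factor(captaincy$names), captaincy$played)
```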
06:58 We will now explain how to write this data frame to a comma separated values file, or CSV.
07:06 We will use the command write.csv, as we do now.
07:15 Let me move this to the right.
07:23 This writes the captaincy data frame into the file NewCaptaincy.csv
07:29 Let me open this file using my favourite text editor, emacs.
07:34 You may use any other text editor, such as Notepad or vi or gedit.
07:40 Please do not, however, use Microsoft Office or LibreOffice Writer for this purpose,
07:46 because you may not be able to load these files into R.
07:50 Let us now see what the file NewCaptaincy.csv contains.
07:56 This csv file contains everything that we created.
08:00 It also contains row numbers in every row, at the beginning.
08:06 How will you get rid of these row numbers?
08:09 Add an additional flag called row dot names equals FALSE.
08:32 Let us read the new version of this file in emacs.
08:40 In emacs, I just have to refresh it.
08:46 In other editors, you may have to re-open the file.
08:50 We see that the row numbers no longer appear.
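The effect of the row dot names flag can be sketched as follows. The data are placeholders, and the file is written to a temporary folder here, which is an assumption; the recording writes NewCaptaincy.csv to the working directory.

```r
# Placeholder data frame; the figures are for illustration only.
captaincy <- data.frame(
  names  = c("Mahi", "Sourav", "Azhar"),
  played = c(45, 49, 47),
  won    = c(24, 21, 14)
)

# Assumed location: a temporary folder, to avoid overwriting any real file.
out <- file.path(tempdir(), "NewCaptaincy.csv")

# By default, write.csv() adds a leading column of row numbers.
write.csv(captaincy, out)
readLines(out, n = 2)

# row.names = FALSE leaves that column out.
write.csv(captaincy, out, row.names = FALSE)
readLines(out, n = 2)
```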
08:53 Let us make a small change in this csv file and read it.
08:57 Let us change Mahi to Dhoni and save the file.
09:09 Let us click Import Dataset.
09:19 There are two options:
09:21 from the text file and from the web.
09:25 Reading from a local file is a fool-proof method.
09:28 It works even if the Internet is not available.
09:31 All one has to do is to download the required files ahead of time.
09:36 Let us click the File option.
09:40 Let us select NewCaptaincy.
09:47 In order to see this, let's maximise this, let's Import this, let's minimise this.
09:57 You will not have this problem, of course, because, you will be using the whole screen.
10:03 For this tutorial, I am using only a small portion of the screen.
10:08 And that’s why we have had this problem.
10:11 The appearance of a data frame called NewCaptaincy here, and also here, shows that it has been read.
10:20 It also has Dhoni as the name.
10:23 The row numbers have also disappeared.
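The recording uses the Import Dataset button. A comparable result can presumably be obtained from the console with read.csv, as in this sketch. The file contents below are placeholders written to a temporary folder so that the example is self-contained.

```r
# Assumed location and contents, for illustration only.
path <- file.path(tempdir(), "NewCaptaincy.csv")
writeLines(c('"names","played","won"',
             '"Dhoni",45,24',
             '"Sourav",49,21'), path)

# read.csv() reads a comma separated file into a data frame.
# The first line of the file is taken as the column names.
NewCaptaincy <- read.csv(path)
NewCaptaincy
```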
10:26 It is also possible to read Excel and ODS files into R.
10:31 This involves a little more work and hence is not covered in this tutorial.
10:35 It is easy to export Excel and ODS files into CSV format.
10:40 So, for now, please use the CSV route to import Excel or ODS files.
10:47 Let us take up the final topic for this tutorial.
10:50 R comes with several data frames.
10:54 To access them, issue the command data brackets.
11:03 R shows a list of available data frames.
11:07 We can carry out all sorts of calculations on these data sets.
11:12 We will just view one data set, called C O 2.
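Listing and inspecting the built-in data sets might look like the sketch below. The recording uses data() and the viewer; here the list is stored in a variable called available, which is a name chosen for this example, so that it can also be run outside RStudio.

```r
# data() lists the available data sets; the names are in the "Item" column.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# CO2 is one of the data frames that come with R.
head(CO2)    # first few rows
str(CO2)     # column names and types

# In RStudio, the viewer shows the whole data set.
if (interactive()) View(CO2)
```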
11:18 We will conclude this tutorial here.
11:20 In this tutorial, we learnt the following:
11:23 We created vectors
11:25 We put the vectors into a data frame
11:27 We operated on the data frame
11:29 We saved the data frame into a file
11:32 Imported the file into R
11:34 Examined the data frames that came with R
11:38 We now suggest an assignment.
11:40 Find out how to calculate median using the help button.
11:48 Repeat the above using web search.
11:51 Calculate the mean and median of the data frame C O 2.
11:57 Take R’s help for all the commands shown in this tutorial.
12:03 Import a data set from the Internet directly or through a file.
12:08 This video summarises the Spoken Tutorial project.
12:13 If you do not have good bandwidth, you may download and watch it.
12:17 We conduct workshops using Spoken Tutorials and give certificates.
12:22 Please contact us.
12:24 The Spoken Tutorial project is funded by MHRD, Govt. of India
12:30 Thanks for joining, goodbye.

Contributors and Content Editors

Sakinashaikh