R/C2/Introduction-to-Data-Frames-in-R/English-timed
Revision as of 12:03, 8 May 2020 by Sakinashaikh (Talk | contribs)
Time | Narration |
---|---|
00:00 | Welcome to the spoken tutorial on Introduction to Data Frames in R. |
00:05 | I am Kannan Moudgalya. |
00:07 | In this tutorial, we will explain |
00:10 | how to create a data frame in R |
00:13 | how to work with it |
00:15 | how to save it into a file and |
00:17 | how to read data from a file into a data frame |
00:21 | We will also learn about the data sets that come with the R software |
00:27 | To understand this tutorial, |
00:29 | One needs to know elementary maths |
00:32 | for example, the meaning of mean, row and column |
00:35 | One needs to know how to edit a text file and the basics of plotting |
00:39 | However, no programming background is required |
00:42 | Anyone with a basic exposure to statistics can understand this tutorial |
00:46 | Please locate this tutorial on our web page, spoken hyphen tutorial dot org |
00:53 | Prerequisite spoken tutorials, if any, will be mentioned on this page |
00:57 | I am using version 3 of R, RStudio 0.97 and Mac OS X 10.7 |
01:05 | But it should work in other versions and also other operating systems. |
01:12 | Let us switch to RStudio. |
01:14 | Let us begin with the “c” command to create a vector |
01:23 | Notice that I am using the assignment operator, that is, less than followed by hyphen. |
01:31 | I will use the word gets to denote this assignment. |
01:35 | This is the way to create a vector of three elements. |
01:38 | c stands for combine or concatenate. |
01:42 | Although the equal to sign works most of the time, one usually uses this operator in R. |
01:49 | Let us view what the vector sample contains. |
01:56 | Let us now build a data set. |
01:58 | We will also call it a data frame in this tutorial. |
02:03 | Let me create a vector of Indian Cricket Captains, old and new. |
02:10 | names gets c of Mahi, Sourav, Azhar, Sunny, Pataudi, Dravid |
02:46 | Each name has to be entered in between double quotes. |
02:50 | Let us view what the variable names contains. |
02:57 | I will create a vector to store the number of matches they captained. |
03:15 | One can see that all the variables got entered in the workspace. |
03:21 | Now, let us put these together and create a data frame as I do now. |
03:44 | It is data dot frame. |
03:50 | Let us understand what the command data frame does. |
03:55 | To do this, let us View brackets captaincy. |
04:02 | The contents of the data frame appear nicely in the window above. |
04:09 | What will happen if you just type captaincy, without the View command? |
04:14 | Would you want to try it now? |
04:16 | In statistics, one works with such data sets. |
04:19 | Remember, the commands and variables in R are case sensitive. |
04:24 | The View command begins with a capital V. |
04:29 | It is easy to access the parts of a data frame. |
04:32 | For example, this is how I will get the names in the captaincy data frame. |
04:39 | Let us similarly get the number of matches won. |
04:45 | Let us find the ratio of matches won to matches played. |
04:54 | When we divide a vector by another, R carries out component-wise division. |
04:59 | Let us include this ratio also in the captaincy data frame. |
05:04 | captaincy dollar victory gets ratio. |
05:13 | What is in ratio gets transferred to victory, a variable in the data frame captaincy. |
05:19 | We can check that ratio is included in captaincy by the View command. |
05:26 | We can see that the ratio is included in victory. |
05:31 | Let us reduce the number of digits displayed in the ratio to 2. |
05:35 | We do this with the command options within brackets digits=2. |
05:48 | Let us view captaincy |
05:51 | We can see that it has been reduced to two digits. |
05:56 | Let us see the number of matches played by all the captains. |
06:03 | Let us find the mean of these matches. |
06:07 | By direct calculation, we find that mean does exactly what it should. |
06:16 | These two are equal. |
06:18 | Let us now get some plots with this data frame. |
06:22 | Let us plot the ratio of victories vs. the year of captaincy. |
06:32 | Remember that the x axis variable has to be listed first in the plot command. |
06:40 | Let us plot the number of matches captained versus the names of the captains. |
06:50 | Drag this a little bit. The graph expands. Put it back. |
06:58 | We will now explain how to write this data frame to a comma-separated values file, or C S V. |
07:06 | We will use the command write.csv, as we do now. |
07:15 | Let me move this to the right. |
07:23 | This writes the captaincy data frame into the file NewCaptaincy.csv |
07:29 | Let me open this file using my favourite text editor, emacs. |
07:34 | You may use any other text editor, such as notepad or vi or gedit. |
07:40 | Please do not, however, use Microsoft Office or LibreOffice Writer for this purpose, |
07:46 | because you may not be able to load these files into R. |
07:50 | Let us now see what the file NewCaptaincy.csv contains. |
07:56 | This csv file contains everything that we created. |
08:00 | It also contains row numbers in every row, at the beginning. |
08:06 | How will you get rid of these row numbers? |
08:09 | Add an additional flag called row dot names equals FALSE. |
08:32 | Let us read the new version of this file in emacs. |
08:40 | In emacs, I just have to refresh it. |
08:46 | In other editors, you may have to re-open the file. |
08:50 | We see that the row numbers no longer appear. |
08:53 | Let us make a small change in this csv file and read it. |
08:57 | Let us change Mahi to Dhoni and save the file. |
09:09 | Let us click Import Dataset. |
09:19 | There are two options: |
09:21 | from a text file and from the web. |
09:25 | Reading from a local file is a fool-proof method. |
09:28 | It works even if the Internet is not available. |
09:31 | All one has to do is to download the required files ahead of time. |
09:36 | Let us click the File option. |
09:40 | Let us select NewCaptaincy. |
09:47 | In order to see this, let's maximise this, let's Import this, let's minimise this. |
09:57 | You will not have this problem, of course, because you will be using the whole screen. |
10:03 | For this tutorial, I am using only a small portion of the screen. |
10:08 | And that’s why we have had this problem. |
10:11 | The appearance of a data frame called NewCaptaincy here, and also here, shows that it has been read. |
10:20 | It also has Dhoni as the name. |
10:23 | The row numbers have also disappeared. |
10:26 | It is also possible to read Excel and ods files into R. |
10:31 | This involves a little more work and hence is not covered in this tutorial. |
10:35 | It is easy to export Excel and ods files into csv format. |
10:40 | So, for now, please use the csv route to import Excel or ods files. |
10:47 | Let us take up the final topic for this tutorial. |
10:50 | R comes with several data frames. |
10:54 | To access them, issue the command data brackets. |
11:03 | R shows a list of available data frames. |
11:07 | We can carry out all sorts of calculations on these data sets. |
11:12 | We will just view one data set, called C O 2. |
11:18 | We will conclude this tutorial here. |
11:20 | In this tutorial, we learnt the following: |
11:23 | We created vectors |
11:25 | We put the vectors into a data frame |
11:27 | We operated on the data frame |
11:29 | We saved the data frame into a file |
11:32 | We imported the file into R |
11:34 | We examined the data frames that come with R |
11:38 | We now suggest an assignment. |
11:40 | Find out how to calculate median using the help button. |
11:48 | Repeat the above using web search. |
11:51 | Calculate the mean and median of the data frame C O 2. |
11:57 | Take R’s help for all the commands shown in this tutorial. |
12:03 | Import a data set from the Internet directly or through a file. |
12:08 | This video summarises the Spoken Tutorial project. |
12:13 | If you do not have good bandwidth, you may download and watch it. |
12:17 | We conduct workshops using Spoken Tutorials and give certificates. |
12:22 | Please contact us. |
12:24 | The Spoken Tutorial project is funded by MHRD, Govt. of India |
12:30 | Thanks for joining, goodbye. |
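
The transcript records the narration but not the commands typed on screen. A minimal sketch of what the session might look like is given below. The captain names and the identifiers names, captaincy, ratio, victory and NewCaptaincy.csv come from the narration. The variables y, played and won, and every number stored in them, are hypothetical placeholders and not real cricket statistics. Writing the file to a temporary directory is also an assumption, made so that the sketch runs anywhere.

```r
# Captain names, as listed in the narration at 02:10
names <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")

# HYPOTHETICAL placeholder values: the transcript does not give the real data
y      <- c(2008, 2000, 1990, 1976, 1962, 2005)  # assumed year of captaincy
played <- c(40, 45, 47, 47, 40, 25)              # assumed matches captained
won    <- c(20, 21, 14, 9, 9, 8)                 # assumed matches won

# Put the vectors together into a data frame; each vector becomes a column
captaincy <- data.frame(names, y, played, won)

# Access parts of the data frame with the dollar operator
captaincy$names
captaincy$won

# Dividing one vector by another is carried out component-wise
ratio <- captaincy$won / captaincy$played
captaincy$victory <- ratio

# Show fewer digits when printing; the stored values are unchanged
options(digits = 2)
print(captaincy)

# mean() agrees with the direct calculation
mean(captaincy$played)
sum(captaincy$played) / length(captaincy$played)

# Write to CSV without the leading row numbers, then read it back
csv_path <- file.path(tempdir(), "NewCaptaincy.csv")
write.csv(captaincy, csv_path, row.names = FALSE)
NewCaptaincy <- read.csv(csv_path)
print(NewCaptaincy)

# CO2 is one of the data sets that ship with R
head(CO2)
```

In RStudio, View(captaincy) would show the same table in a separate pane instead of printing it, and data() would list the bundled data sets. The sketch uses print and head so that it also runs outside RStudio.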