R/C2/Merging-and-Importing-Data/English-timed
Revision as of 15:50, 22 May 2020 by Sakinashaikh (Talk | contribs)
Time | Narration |
00:01 | Welcome to the spoken tutorial on Merging and Importing Data. |
00:07 | In this tutorial, we will learn how to: |
00:11 | Use built-in functions for exploring a data frame |
00:16 | Merge two data frames. |
00:19 | Import data in different formats in R |
00:24 | To understand this tutorial, you should know |
00:28 | Data frames in R |
00:31 | R script in RStudio |
00:34 | How to set working directory in RStudio |
00:39 | If not, please locate the relevant tutorials on R on this website. |
00:46 | This tutorial is recorded on, |
00:50 | Ubuntu Linux OS version 16.04 |
00:55 | R version 3.4.4 |
01:00 | RStudio version 1.1.456 |
01:06 | Install R version 3.2.0 or higher. |
01:12 | For this tutorial, we will use, |
01:15 | five data frames in different formats and |
01:20 | a script file myDataSet.R. |
01:24 | Please download these files from the Code files link of this tutorial. |
01:30 | I have downloaded these files from the Code files link and moved them to the DataMerging folder in the myProject folder on the Desktop. |
01:42 | I have also set this folder as my Working Directory. |
01:48 | Let us switch to RStudio. |
01:51 | Open the script myDataSet.R in RStudio. |
01:57 | For this, click on the script myDataSet.R. |
02:03 | The script myDataSet.R opens in RStudio. |
02:09 | Run this script by clicking on the Source button. |
02:14 | captaincyOne appears in the Source window. |
02:18 | We will use some built-in functions of R to explore captaincyOne. |
02:24 | For all the built-in functions used in this tutorial, please refer to the Additional Material. |
02:31 | First, we will use the summary function. |
02:35 | Click on the script myDataSet.R |
02:39 | In the Source window, type summary and then captaincyOne in parentheses. |
02:46 | Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
02:55 | In the Console window, scroll up to locate the output. |
03:02 | Statistical parameters for each column of captaincyOne are shown on the Console. |
03:09 | In the Source window, press Enter. |
03:13 | Press Enter at the end of every command. |
03:17 | Now, let us look at the class function. |
03:21 | In the Source window, type class and then captaincyOne in parentheses. |
03:29 | Save the script and run the current line. |
03:34 | The class function returns the class of captaincyOne, which is data frame. |
03:41 | Next, let us look at the typeof function. |
03:45 | In the Source window, type typeof and then captaincyOne in parentheses. |
03:53 | Save the script and run the current line. |
03:58 | The typeof function returns the storage type of captaincyOne, which is list. |
04:05 | To know more about the typeof function, we will access the help section of RStudio. |
04:12 | In the Console window, type help, within parentheses typeof. Press Enter. |
04:20 | typeof determines the R internal type or storage mode of any object. |
04:27 | Click on the Files tab. |
04:30 | Clear the Console window by clicking on the broom icon. |
04:35 | Click on the data frame captaincyOne. |
04:39 | Now, let us extract the first two rows of captaincyOne. |
04:45 | For this, we will use the head function. |
04:49 | Click on the script myDataSet.R |
04:53 | In the Source window, type head within parentheses captaincyOne comma space 2. |
05:02 | Save the script and run the current line. |
05:07 | The top two rows of captaincyOne are shown on the Console window. |
05:13 | Click on the data frame captaincyOne. |
05:17 | Suppose we want to extract the last two rows of captaincyOne. |
05:24 | For this, we will use the tail function. |
05:28 | Click on the script myDataSet.R |
05:32 | In the Source window, type tail within parentheses captaincyOne comma space 2. |
05:41 | Save the script and run the current line. |
05:46 | The last two rows of captaincyOne are shown on the Console window. |
05:52 | Next, let us learn about the str function. |
05:57 | This function is used to display the structure of an R object. |
06:03 | In the Source window, type str within parentheses captaincyOne. |
06:10 | Save the script and run the current line. |
06:15 | The structural details of captaincyOne are shown on the Console. |
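The exploration functions narrated so far can be tried together in one short script. This is only a sketch: the contents of captaincyOne are not reproduced in this transcript, so the data frame below is a hypothetical stand-in with made-up column names and values.

```r
# Hypothetical stand-in for captaincyOne; the real one is created by myDataSet.R.
captaincyOne <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(60, 49, 47, 25),
  won    = c(27, 21, 14, 9),
  stringsAsFactors = FALSE
)

summary(captaincyOne)   # statistical parameters for each column
class(captaincyOne)     # class of the object: "data.frame"
typeof(captaincyOne)    # internal storage type: "list"
head(captaincyOne, 2)   # first two rows
tail(captaincyOne, 2)   # last two rows
str(captaincyOne)       # compact display of the structure
```

A data frame is stored internally as a list of equal-length columns, which is why class and typeof give different answers for the same object.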
06:21 | Now, we will look at merging of data frames. |
06:26 | Merging data frames has advantages like: |
06:30 | It makes data more available. |
06:33 | It helps in improving data quality. |
06:37 | Combining similar data also reduces data complexity. |
06:42 | Let us switch to RStudio. |
06:45 | We will learn how to merge the data from two files, CaptaincyData.csv and CaptaincyData2.csv. |
06:55 | We will declare a variable captaincyTwo to read and store the contents of CaptaincyData2.csv. |
07:03 | In the Source window, type the following command and press Enter. |
07:11 | Now, type View within parentheses captaincyTwo. |
07:17 | Save the script and run the last two lines. |
07:23 | The contents of captaincyTwo appear in the Source window. |
07:28 | This data frame has the same captains as captaincyOne. |
07:34 | However, it has different information about them, like the number of matches drawn. |
07:40 | Now, we will update captaincyOne by adding information from captaincyTwo. |
07:48 | For this, we use the merge function. |
07:52 | Click on the script myDataSet.R |
07:56 | I am resizing the Source window. |
07:59 | In the Source window, type the following command. |
08:04 | Press Enter. |
08:06 | In the merge function, we specify the column by which we want to merge the two data frames. |
08:13 | Here, it is the column names. |
08:16 | Now, type View and captaincyOne in parentheses. |
08:22 | Save the script and run these two lines. |
08:28 | The contents of the updated captaincyOne appear in the Source window. |
08:35 | Close the two tabs captaincyOne and captaincyTwo. |
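The commands typed on screen in this part are not reproduced in the transcript. The sketch below shows one way the merge could look, assuming read.csv is used to read the second file and merge is called with by = "names". All data values are hypothetical, and a temporary file stands in for CaptaincyData2.csv so that the example runs on its own.

```r
# Hypothetical stand-in for captaincyOne.
captaincyOne <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(60, 49, 47),
  won    = c(27, 21, 14),
  stringsAsFactors = FALSE
)

# Write a small CSV to a temporary file (stand-in for CaptaincyData2.csv).
csvPath <- tempfile(fileext = ".csv")
write.csv(
  data.frame(names = c("Captain C", "Captain A", "Captain B"),
             drawn = c(22, 15, 16)),
  csvPath, row.names = FALSE
)

# Read the second data set, then merge on the shared column "names".
captaincyTwo <- read.csv(csvPath, stringsAsFactors = FALSE)
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names")

print(captaincyOne)   # in RStudio, View(captaincyOne) opens it in a tab instead
```

The row order of the two inputs does not need to match, because merge pairs rows by the value in the by column.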
08:41 | Now, we will learn how to import data of different formats in R. |
08:47 | We shall add one comment first. |
08:50 | In the Source window, type hash, space, Importing data in different formats. |
08:58 | Now, let us import CaptaincyData.xml file. |
09:04 | For that, we need to install the XML package. |
09:09 | Make sure that you are connected to the Internet. |
09:13 | We need to install the Ubuntu package libxml2-dev before installing the XML package. |
09:23 | Information on how to install this package is provided in the Additional Material. |
09:29 | I have already installed libxml2-dev package. |
09:35 | Hence, I will proceed to install the XML package now. |
09:41 | In the Console window, type install dot packages. |
09:47 | Now, type XML inside double quotes, within parentheses. |
09:53 | Press Enter. We will wait until R installs the package. |
10:00 | Then, we load this package using the library function. |
10:05 | Click on the script myDataSet.R |
10:09 | Since we are loading a package, we will add it at the top of the script. |
10:15 | In the Source window, scroll up. |
10:18 | Now, at the top of the script myDataSet.R, type library and XML in parentheses. |
10:29 | Save the script and run this line. |
10:34 | Now, in the Source window, click on the next line after the comment Importing data in different formats. |
10:43 | Type the following command and press Enter. |
10:49 | Then type View and xmldata in parentheses. |
10:56 | Save the script and run these two lines. |
11:00 | The contents of the xml file are shown here. |
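The import command itself is not reproduced in the transcript. As a sketch, assuming the file has one XML element per record and that xmlToDataFrame from the XML package is used, the import could look as follows. The XML content is hypothetical and is written to a temporary file that stands in for CaptaincyData.xml. The block runs only if the XML package is installed.

```r
# One-time installation, done from the Console: install.packages("XML")
if (requireNamespace("XML", quietly = TRUE)) {
  library(XML)

  # Hypothetical XML content; a stand-in for CaptaincyData.xml.
  xmlPath <- tempfile(fileext = ".xml")
  writeLines(c(
    "<records>",
    "  <record><names>Captain A</names><played>60</played></record>",
    "  <record><names>Captain B</names><played>49</played></record>",
    "</records>"
  ), xmlPath)

  # Each child of the root becomes a row; its children become columns.
  xmldata <- xmlToDataFrame(xmlPath)
  print(xmldata)   # in RStudio, View(xmldata) opens it in a tab instead
}
```

Values read this way arrive as text, so numeric columns may need as.numeric before any calculation.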
11:05 | Next let us learn how to import CaptaincyData.txt. |
11:12 | Click on the script myDataSet.R |
11:16 | In the Source window, type the following command |
11:21 | and press Enter. |
11:24 | Next, type View and txtdata in parentheses. |
11:30 | Save the script and run these two lines. |
11:35 | The contents of the txt file are shown. |
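The command for the text file is also shown only on screen. A minimal sketch, assuming read.table is used and that the file has a header row and tab-separated columns (the actual delimiter of CaptaincyData.txt is not stated in the transcript):

```r
# Hypothetical tab-separated content; a stand-in for CaptaincyData.txt.
txtPath <- tempfile(fileext = ".txt")
writeLines(c(
  "names\tplayed\twon",
  "Captain A\t60\t27",
  "Captain B\t49\t21"
), txtPath)

# header = TRUE takes column names from the first line;
# sep = "\t" is an assumption about the file's delimiter.
txtdata <- read.table(txtPath, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
print(txtdata)   # in RStudio, View(txtdata) opens it in a tab instead
```

If the file uses a different separator, such as a comma or a space, only the sep argument needs to change.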
11:40 | Now, we will learn how to import data using the user interface of RStudio. |
11:47 | I am resizing the Source window. |
11:51 | We will import the Excel file CaptaincyData.xlsx using this method. |
12:00 | Please ensure that you have packages like readxl and Rcpp installed on your system. |
12:08 | In the top right corner of RStudio, click on the Environment tab. |
12:15 | In the Environment tab, click on Import Dataset. |
12:20 | From the drop-down menu, select From Excel. |
12:25 | A window named Import Excel Data appears. |
12:30 | You can select a file on your computer or type the URL from which you want to load an Excel file. |
12:39 | We will select a file on our computer. |
12:43 | In the upper right corner of this window, near File/Url text field, click on Browse. |
12:52 | I will select the file CaptaincyData.xlsx located in DataMerging folder. |
13:01 | This folder is in myProject folder on the Desktop. |
13:06 | Click Open to load this file. |
13:10 | Below the field File/Url, RStudio shows the preview of the Excel file being imported. |
13:17 | At the bottom right corner of this window, you can see the code for importing this Excel file. |
13:24 | Finally, click on the Import button. |
13:28 | The contents of the Excel file are shown here. |
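The code that RStudio previews in the import window is not reproduced in the transcript. It is typically a call to read_excel from the readxl package, so the same import can be scripted. The sketch below assumes readxl is installed and uses the example workbook that ships with the package as a stand-in for CaptaincyData.xlsx.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  library(readxl)

  # Example workbook bundled with readxl; a stand-in for CaptaincyData.xlsx.
  xlsxPath <- readxl_example("datasets.xlsx")

  # Reads the first sheet by default; the sheet argument selects another one.
  excelData <- read_excel(xlsxPath)
  print(head(excelData, 2))   # in RStudio, View(excelData) opens it in a tab
}
```

Scripting the import this way keeps the step reproducible, which the point-and-click route does not.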
13:32 | Let us summarize what we have learnt. |
13:36 | In this tutorial, we have learnt how to: |
13:40 | Use built-in functions for exploring a data frame |
13:45 | Merge two data frames |
13:48 | Import data in different formats in R |
13:53 | We now suggest an assignment. |
13:57 | Using built-in dataset iris, implement all the functions we have learnt in this tutorial. |
14:04 | The video at the following link summarises the Spoken Tutorial project. |
14:09 | Please download and watch it. |
14:12 | We conduct workshops using Spoken Tutorials and give certificates. |
14:18 | Please contact us. |
14:21 | Please post your timed queries in this forum. |
14:25 | Please post your general queries in this forum. |
14:29 | The FOSSEE team coordinates the TBC project. For more details, please visit these sites. |
14:37 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
14:44 | The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018). |
14:52 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |