R/C2/Merging-and-Importing-Data/English-timed
| Time | Narration |
| 00:01 | Welcome to the spoken tutorial on Merging and Importing Data |
| 00:07 | In this tutorial, we will learn how to: |
| 00:11 | Use built-in functions for exploring a data frame |
| 00:16 | Merge two data frames. |
| 00:19 | Import data in different formats in R |
| 00:24 | To understand this tutorial, you should know |
| 00:28 | Data frames in R |
| 00:31 | R script in RStudio |
| 00:34 | How to set working directory in RStudio |
| 00:39 | If not, please locate the relevant tutorials on R on this website. |
| 00:46 | This tutorial is recorded on: |
| 00:50 | Ubuntu Linux OS version 16.04 |
| 00:55 | R version 3.4.4 |
| 01:00 | RStudio version 1.1.456 |
| 01:06 | Install R version 3.2.0 or higher. |
| 01:12 | For this tutorial, we will use: |
| 01:15 | five data frames in different formats and |
| 01:20 | a script file myDataSet.R. |
| 01:24 | Please download these files from the Code files link of this tutorial. |
| 01:30 | I have downloaded these files from the Code files link and moved them to the DataMerging folder in the myProject folder on the Desktop. |
| 01:42 | I have also set this folder as my Working Directory. |
| 01:48 | Let us switch to RStudio. |
| 01:51 | Open the script myDataSet.R in RStudio. |
| 01:57 | For this, click on the script myDataSet.R. |
| 02:03 | The script myDataSet.R opens in RStudio. |
| 02:09 | Run this script by clicking on Source button. |
| 02:14 | captaincyOne appears in the Source window. |
| 02:18 | We will use some built-in functions of R to explore captaincyOne. |
| 02:24 | For all the built-in functions used in this tutorial, please refer to the Additional Material. |
| 02:31 | First, we will use summary function. |
| 02:35 | Click on the script myDataSet.R |
| 02:39 | In the Source window, type summary and then captaincyOne in parentheses. |
| 02:46 | Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
| 02:55 | In the Console window, scroll up to locate the output. |
| 03:02 | Statistical parameters for each column of captaincyOne are shown on the Console. |
| 03:09 | In the Source window, press Enter. |
| 03:13 | Press Enter at the end of every command. |
| 03:17 | Now, let us look at class function. |
| 03:21 | In the Source window, type class and then captaincyOne in parentheses. |
| 03:29 | Save the script and run the current line. |
| 03:34 | class function returns the class of captaincyOne, which is data frame. |
| 03:41 | Next, let us look at the typeof function. |
| 03:45 | In the Source window, type typeof and then captaincyOne in parentheses. |
| 03:53 | Save the script and run the current line. |
| 03:58 | typeof function returns the storage type of captaincyOne, which is list. |
| 04:05 | To know more about typeof function, we will access the help section of RStudio. |
| 04:12 | In the Console window, type help, within parentheses typeof. Press Enter. |
| 04:20 | typeof determines the R internal type or storage mode of any object. |
| 04:27 | Click on the Files tab. |
| 04:30 | Clear the Console window by clicking on the broom icon. |
| 04:35 | Click on the data frame captaincyOne. |
| 04:39 | Now, let us extract two rows from top of captaincyOne. |
| 04:45 | For this, we will use head function. |
| 04:49 | Click on the script myDataSet.R |
| 04:53 | In the Source window, type head within parentheses captaincyOne comma space 2. |
| 05:02 | Save the script and run the current line. |
| 05:07 | The top two rows of captaincyOne are shown on the Console window. |
| 05:13 | Click on the data frame captaincyOne. |
| 05:17 | Suppose we want to extract two rows from bottom of captaincyOne. |
| 05:24 | For this, we will use the tail function. |
| 05:28 | Click on the script myDataSet.R |
| 05:32 | In the Source window, type tail within parentheses captaincyOne comma space 2. |
| 05:41 | Save the script and run the current line. |
| 05:46 | The last two rows of captaincyOne are shown on the Console window. |
| 05:52 | Next, let us learn about str function. |
| 05:57 | This function is used to display the structure of an R object. |
| 06:03 | In the Source window, type str within parentheses captaincyOne. |
| 06:10 | Save the script and run the current line. |
| 06:15 | The structural details of captaincyOne are shown on the Console. |
| 06:21 | Now, we will look at merging of data frames. |
| 06:26 | Merging data frames has advantages like: |
| 06:30 | It makes data more available. |
| 06:33 | It helps in improving data quality. |
| 06:37 | Combining similar data also reduces data complexity. |
| 06:42 | Let us switch to RStudio. |
| 06:45 | We will learn how to merge two data frames CaptaincyData.csv and CaptaincyData2.csv. |
| 06:55 | We will declare a variable captaincyTwo to read and store the contents of CaptaincyData2.csv. |
| 07:03 | In the Source window, type the following command and press Enter. |
| 07:11 | Now, type View within parentheses captaincyTwo. |
| 07:17 | Save the script and run the last two lines. |
| 07:23 | The contents of captaincyTwo appear in the Source window. |
| 07:28 | This data frame has the same captains as captaincyOne. |
| 07:34 | However, it has different information about them, such as the number of matches drawn. |
| 07:40 | Now, we will update captaincyOne by adding information from captaincyTwo. |
| 07:48 | For this, we use merge function. |
| 07:52 | Click on the script myDataSet.R |
| 07:56 | I am resizing the Source window. |
| 07:59 | In the Source window, type the following command. |
| 08:04 | Press Enter. |
| 08:06 | In the merge function, we specify the name of the column by which we want to merge the two data frames. |
| 08:13 | Here, it is names. |
| 08:16 | Now, type View and captaincyOne in parentheses. |
| 08:22 | Save the script and run these two lines. |
| 08:28 | The contents of the updated captaincyOne appear in the Source window. |
| 08:35 | Close the two tabs captaincyOne and captaincyTwo. |
| 08:41 | Now, we will learn how to import data of different formats in R. |
| 08:47 | We shall add one comment first. |
| 08:50 | In the Source window, type # (hash), space, Importing data in different formats. |
| 08:58 | Now, let us import CaptaincyData.xml file. |
| 09:04 | For that, we need to install XML package. |
| 09:09 | Make sure that you are connected to Internet. |
| 09:13 | We need to install Ubuntu package libxml2-dev before installing XML package. |
| 09:23 | Information on how to install this package is provided in the Additional Material. |
| 09:29 | I have already installed libxml2-dev package. |
| 09:35 | Hence, I will proceed to install the XML package now. |
| 09:41 | In the Console window, type install dot packages. |
| 09:47 | Now, type XML inside double quotes and in parentheses. |
| 09:53 | Press Enter. We will wait until R installs the package. |
| 10:00 | Then, we load this package using library function. |
| 10:05 | Click on the script myDataSet.R |
| 10:09 | Since we are loading a package, we will add it at the top of the script. |
| 10:15 | In the Source window, scroll up. |
| 10:18 | Now, at the top of the script myDataSet.R, type library and XML in parentheses. |
| 10:29 | Save the script and run this line. |
| 10:34 | Now, in the Source window, click on the next line after the comment Importing data in different formats. |
| 10:43 | Type the following command and press Enter. |
| 10:49 | Then type View and xmldata in parentheses. |
| 10:56 | Save the script and run these two lines. |
| 11:00 | The contents of the xml file are shown here. |
| 11:05 | Next, let us learn how to import CaptaincyData.txt. |
| 11:12 | Click on the script myDataSet.R |
| 11:16 | In the Source window, type the following command |
| 11:21 | and press Enter. |
| 11:24 | Next, type View and txtdata in parentheses. |
| 11:30 | Save the script and run these two lines. |
| 11:35 | The contents of the txt file are shown. |
| 11:40 | Now, we will learn how to import data from the user interface of RStudio. |
| 11:47 | I am resizing the Source window. |
| 11:51 | We will import the Excel file CaptaincyData.xlsx using this method. |
| 12:00 | Please ensure that you have packages like readxl and Rcpp installed on your system. |
| 12:08 | In the top right corner of RStudio, click on the Environment tab. |
| 12:15 | In the Environment tab, click on Import Dataset. |
| 12:20 | From the drop-down menu, select From Excel. |
| 12:25 | A window named Import Excel Data appears. |
| 12:30 | You can select a file on your computer or type the URL from which you want to load an Excel file. |
| 12:39 | We will select a file on our computer. |
| 12:43 | In the upper right corner of this window, near File/Url text field, click on Browse. |
| 12:52 | I will select the file CaptaincyData.xlsx located in DataMerging folder. |
| 13:01 | This folder is in myProject folder on the Desktop. |
| 13:06 | Click Open to load this file. |
| 13:10 | Below the field File/Url, RStudio shows the preview of the Excel file being imported. |
| 13:17 | At the bottom right corner of this window, you can see the code for importing this Excel file. |
| 13:24 | Finally, click on the Import button. |
| 13:28 | The contents of the Excel file are shown here. |
| 13:32 | Let us summarize what we have learnt. |
| 13:36 | In this tutorial, we have learnt how to: |
| 13:40 | Use built-in functions for exploring a data frame |
| 13:45 | Merge two data frames |
| 13:48 | Import data in different formats in R |
| 13:53 | We now suggest an assignment. |
| 13:57 | Using built-in dataset iris, implement all the functions we have learnt in this tutorial. |
| 14:04 | The video at the following link summarises the Spoken Tutorial project. |
| 14:09 | Please download and watch it. |
| 14:12 | We conduct workshops using Spoken Tutorials and give certificates. |
| 14:18 | Please contact us. |
| 14:21 | Please post your timed queries in this forum. |
| 14:25 | Please post your general queries in this forum. |
| 14:29 | The FOSSEE team coordinates the TBC project. For more details, please visit these sites. |
| 14:37 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
| 14:44 | The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018). |
| 14:52 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |
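
The narration says "type the following command" several times without showing the command itself. The sketch below is one possible way to reproduce the main steps using base R only. It is not the tutorial's own script: the two small data frames are hypothetical stand-ins for CaptaincyData.csv and CaptaincyData2.csv, and every column name other than names, as well as every value, is invented for illustration. The text import is shown as a round trip through a temporary file so that the block runs without the tutorial's code files.

```r
# Hypothetical stand-ins for CaptaincyData.csv and CaptaincyData2.csv.
# Only the key column "names" comes from the narration; the other
# columns and all values are assumptions made for this sketch.
captaincyOne <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(40, 25, 60),
  won    = c(20, 10, 27),
  stringsAsFactors = FALSE
)
captaincyTwo <- data.frame(
  names = c("Captain C", "Captain A", "Captain B"),
  drawn = c(15, 8, 5),
  stringsAsFactors = FALSE
)

# Built-in functions for exploring a data frame
print(summary(captaincyOne))   # statistical parameters for each column
print(class(captaincyOne))     # "data.frame"
print(typeof(captaincyOne))    # "list"
print(head(captaincyOne, 2))   # top two rows
print(tail(captaincyOne, 2))   # last two rows
str(captaincyOne)              # structure of the object

# Merge the two data frames on the shared column "names".
# merge() matches rows by key, so the row order of captaincyTwo
# does not need to agree with that of captaincyOne.
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names")
print(captaincyOne)

# Import a text file: write the merged data to a temporary
# tab-separated file, then read it back with read.table().
txtFile <- tempfile(fileext = ".txt")
write.table(captaincyOne, txtFile, sep = "\t",
            row.names = FALSE, quote = FALSE)
txtdata <- read.table(txtFile, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
print(txtdata)
```

In RStudio, View(captaincyOne) can replace print(captaincyOne) to open the data frame in a tab of the Source window, as in the recording. Reading the actual CSV files would typically use read.csv with the file name in double quotes. The XML and Excel imports need the XML and readxl packages, so they are left out of this sketch; functions such as xmlToDataFrame and read_excel are likely candidates, but the narration does not confirm which calls were used.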