R/C2/Merging-and-Importing-Data/English-timed

From Script | Spoken-Tutorial
Time Narration
00:01 Welcome to the spoken tutorial on Merging and Importing Data
00:07 In this tutorial, we will learn how to:
00:11 Use built-in functions for exploring a data frame
00:16 Merge two data frames.
00:19 Import data in different formats in R
00:24 To understand this tutorial, you should know
00:28 Data frames in R
00:31 R script in RStudio
00:34 How to set working directory in RStudio
00:39 If not, please locate the relevant tutorials on R on this website.
00:46 This tutorial is recorded on:
00:50 Ubuntu Linux OS version 16.04
00:55 R version 3.4.4
01:00 RStudio version 1.1.456
01:06 Install R version 3.2.0 or higher.
01:12 For this tutorial, we will use:
01:15 five data frames in different formats and
01:20 a script file myDataSet.R.
01:24 Please download these files from the Code files link of this tutorial.
01:30 I have downloaded these files from Code files link and moved them to DataMerging folder in myProject folder on the Desktop.
01:42 I have also set this folder as my Working Directory.
01:48 Let us switch to RStudio.
01:51 Open the script myDataSet.R in RStudio.
01:57 For this, click on the script myDataSet.R.
02:03 Script myDataSet.R opens in RStudio.
02:09 Run this script by clicking on Source button.
02:14 captaincyOne appears in the Source window.
02:18 We will use some built-in functions of R to explore captaincyOne.
02:24 For all the built-in functions used in this tutorial, please refer to the Additional Material.
02:31 First, we will use summary function.
02:35 Click on the script myDataSet.R
02:39 In the Source window, type summary and then captaincyOne in parentheses.
02:46 Save the script and run the current line by pressing Ctrl+Enter keys simultaneously.
02:55 In the Console window, scroll up to locate the output.
03:02 Statistical parameters for each column of captaincyOne are shown on the Console.
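The summary step above can be sketched as follows. The contents of captaincyOne are not reproduced in this transcript, so the small data frame below, with its columns names, played and won, is a hypothetical stand-in with made-up values.

```r
# Hypothetical stand-in for captaincyOne; the real file's columns may differ.
captaincyOne <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC"),
  played = c(10, 20, 30),
  won    = c(4, 9, 12),
  stringsAsFactors = FALSE
)

# For numeric columns, summary reports the minimum, quartiles, mean and maximum.
# For character columns, it reports length, class and mode.
captaincySummary <- summary(captaincyOne)
print(captaincySummary)
```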
03:09 In the Source window, press Enter.
03:13 Press Enter at the end of every command.
03:17 Now, let us look at class function.
03:21 In the Source window, type class and then captaincyOne in parentheses.
03:29 Save the script and run the current line.
03:34 class function returns the class of captaincyOne, which is data.frame.
03:41 Next let us look at typeof function.
03:45 In the Source window, type typeof and then captaincyOne in parentheses.
03:53 Save the script and run the current line.
03:58 typeof function returns the storage type of captaincyOne, which is list.
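The difference between class and typeof can be seen in a short sketch. The data frame here is a hypothetical stand-in for captaincyOne, with made-up values.

```r
# Hypothetical stand-in for captaincyOne.
captaincyOne <- data.frame(
  names  = c("CaptainA", "CaptainB"),
  played = c(10, 20)
)

# class reports how R treats the object; typeof reports how it is stored.
captaincyClass <- class(captaincyOne)    # "data.frame"
captaincyType  <- typeof(captaincyOne)   # "list"
print(captaincyClass)
print(captaincyType)
```

A data frame is stored internally as a list of equal-length columns, which is why typeof returns list.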
04:05 To know more about typeof function, we will access the help section of RStudio.
04:12 In the Console window, type help, within parentheses typeof. Press Enter.
04:20 typeof determines the R internal type or storage mode of any object.
04:27 Click on the Files tab.
04:30 Clear the Console window by clicking on the broom icon.
04:35 Click on the data frame captaincyOne.
04:39 Now, let us extract two rows from top of captaincyOne.
04:45 For this, we will use head function.
04:49 Click on the script myDataSet.R
04:53 In the Source window, type head within parentheses captaincyOne comma space 2.
05:02 Save the script and run the current line.
05:07 The top two rows of captaincyOne are shown on the Console window.
05:13 Click on the data frame captaincyOne.
05:17 Suppose we want to extract two rows from bottom of captaincyOne.
05:24 For this, we will use the tail function.
05:28 Click on the script myDataSet.R
05:32 In the Source window, type tail within parentheses captaincyOne comma space 2.
05:41 Save the script and run the current line.
05:46 The last two rows of captaincyOne are shown on the Console window.
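The head and tail calls described above can be sketched as follows, again using a hypothetical five-row stand-in for captaincyOne.

```r
# Hypothetical stand-in for captaincyOne with five rows.
captaincyOne <- data.frame(
  names  = paste0("Captain", LETTERS[1:5]),
  played = c(10, 20, 30, 40, 50)
)

# The second argument is the number of rows to return.
topTwo    <- head(captaincyOne, 2)   # first two rows
bottomTwo <- tail(captaincyOne, 2)   # last two rows
print(topTwo)
print(bottomTwo)
```

If the second argument is omitted, both functions return six rows by default.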
05:52 Next, let us learn about str function.
05:57 This function is used to display the structure of an R object.
06:03 In the Source window, type str within parentheses captaincyOne.
06:10 Save the script and run the current line.
06:15 The structural details of captaincyOne are shown on the Console.
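A minimal sketch of the str call, assuming a hypothetical stand-in for captaincyOne:

```r
# Hypothetical stand-in for captaincyOne.
captaincyOne <- data.frame(
  names  = c("CaptainA", "CaptainB"),
  played = c(10, 20)
)

# str prints the number of observations and variables,
# followed by one line per column with its type and first few values.
str(captaincyOne)
```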
06:21 Now, we will look at merging of data frames.
06:26 Merging data frames has advantages like:
06:30 It makes data more available.
06:33 It helps in improving data quality.
06:37 Combining similar data also reduces data complexity.
06:42 Let us switch to RStudio.
06:45 We will learn how to merge two data frames CaptaincyData.csv and CaptaincyData2.csv.
06:55 We will declare a variable captaincyTwo to read and store CaptaincyData2.csv.
07:03 In the Source window, type the following command and press Enter.
07:11 Now, type View within parentheses captaincyTwo.
07:17 Save the script and run the last two lines.
07:23 The contents of captaincyTwo appear in the Source window.
07:28 This data frame has the same captains as captaincyOne.
07:34 However, it has different information about them like the number of matches drawn.
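The command typed in the video is not reproduced in this transcript. A likely form, assuming the file is read with read.csv, is sketched below. To keep the sketch self-contained, it first writes a small hypothetical CSV file to a temporary folder; the column names and values are made up.

```r
# Write a small hypothetical CSV file so that the sketch runs on its own.
csvPath <- file.path(tempdir(), "CaptaincyData2.csv")
writeLines(c("names,drawn", "CaptainA,3", "CaptainB,5"), csvPath)

# Read the CSV file into a data frame.
captaincyTwo <- read.csv(csvPath)

# View opens the data viewer, so call it only in an interactive session.
if (interactive()) View(captaincyTwo)
print(captaincyTwo)
```

With the tutorial's files in the working directory, the path would simply be the file name, for example read.csv("CaptaincyData2.csv").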
07:40 Now, we will update captaincyOne by adding information from captaincyTwo.
07:48 For this, we use merge function.
07:52 Click on the script myDataSet.R
07:56 I am resizing the Source window.
07:59 In the Source window, type the following command.
08:04 Press Enter.
08:06 In the merge function, we specify the column by which we want to merge the two data frames.
08:13 Here, it is the column names.
08:16 Now, type View and captaincyOne in parentheses.
08:22 Save the script and run these two lines.
08:28 The contents of the updated captaincyOne appear in the Source window.
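The merge command itself does not appear in this transcript. A minimal sketch, assuming both data frames share a column called names and using made-up values, could look like this:

```r
# Hypothetical stand-ins for the two data frames.
captaincyOne <- data.frame(
  names  = c("CaptainA", "CaptainB", "CaptainC"),
  played = c(10, 20, 30)
)
captaincyTwo <- data.frame(
  names = c("CaptainC", "CaptainA", "CaptainB"),
  drawn = c(1, 2, 3)
)

# Match rows on the shared column and overwrite captaincyOne with the result.
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names")
print(captaincyOne)
```

Rows are matched by value rather than by position, so the two data frames do not need to list the captains in the same order. By default, merge keeps only the rows whose key appears in both data frames.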
08:35 Close the two tabs captaincyOne and captaincyTwo.
08:41 Now, we will learn how to import data of different formats in R.
08:47 We shall add one comment first.
08:50 In the Source window, type # hash space Importing data in different formats.
08:58 Now, let us import CaptaincyData.xml file.
09:04 For that, we need to install XML package.
09:09 Make sure that you are connected to Internet.
09:13 We need to install Ubuntu package libxml2-dev before installing XML package.
09:23 Information on how to install this package is provided in the Additional Material.
09:29 I have already installed libxml2-dev package.
09:35 Hence, I will proceed to install the XML package now.
09:41 On the Console window, type install dot packages.
09:47 Now, type XML inside double quotes, within parentheses.
09:53 Press Enter. We will wait until R installs the package.
10:00 Then, we load this package using library function.
10:05 Click on the script myDataSet.R
10:09 Since we are loading a package, we will add it at the top of the script.
10:15 In the Source window, scroll up.
10:18 Now, at the top of the script myDataSet.R, type library and XML in parentheses.
10:29 Save the script and run this line.
10:34 Now, in the Source window, click on the next line after the comment Importing data in different formats.
10:43 Type the following command and press Enter.
10:49 Then type View and xmldata in parentheses.
10:56 Save the script and run these two lines.
11:00 The contents of the xml file are shown here.
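The import command is not reproduced in this transcript. One possible form, assuming the xmlToDataFrame function of the XML package is used, is sketched below. The XML layout and values are hypothetical, and the sketch does nothing if the XML package is not installed.

```r
# Run only if the XML package is available.
if (requireNamespace("XML", quietly = TRUE)) {
  # Write a small hypothetical XML file so that the sketch runs on its own.
  xmlPath <- file.path(tempdir(), "CaptaincyData.xml")
  writeLines(c(
    "<records>",
    "  <captain><names>CaptainA</names><played>10</played></captain>",
    "  <captain><names>CaptainB</names><played>20</played></captain>",
    "</records>"
  ), xmlPath)

  # Each child of the root element becomes one row of the data frame.
  xmldata <- XML::xmlToDataFrame(xmlPath, stringsAsFactors = FALSE)
  print(xmldata)
}
```

Values read this way arrive as text, so numeric columns may need converting with as.numeric before any calculation.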
11:05 Next let us learn how to import CaptaincyData.txt.
11:12 Click on the script myDataSet.R
11:16 In the Source window, type the following command
11:21 and press Enter.
11:24 Next, type View and txtdata in parentheses.
11:30 Save the script and run these two lines.
11:35 The contents of the txt file are shown.
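The command for the text file is also not shown in this transcript. A minimal sketch, assuming read.table is used and that the file has a header row with space-separated columns, could look like this. The file contents are made up.

```r
# Write a small hypothetical text file so that the sketch runs on its own.
txtPath <- file.path(tempdir(), "CaptaincyData.txt")
writeLines(c("names played won", "CaptainA 10 4", "CaptainB 20 9"), txtPath)

# header = TRUE treats the first line as column names.
txtdata <- read.table(txtPath, header = TRUE)
print(txtdata)
```

If the real file uses a different separator, such as a tab or a comma, it can be set with the sep argument of read.table.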
11:40 Now, we will learn how to import data from the user interface of RStudio.
11:47 I am resizing the Source window.
11:51 We will import the Excel file CaptaincyData.xlsx using this method.
12:00 Please ensure that you have packages like readxl and Rcpp installed on your system.
12:08 In the top right corner of RStudio, click on the Environment tab.
12:15 In the Environment tab, click on Import Dataset.
12:20 From the drop-down menu, select From Excel.
12:25 A window named Import Excel Data appears.
12:30 You can select a file on your computer or type the URL from which you want to load an Excel file.
12:39 We will select a file on our computer.
12:43 In the upper right corner of this window, near File/Url text field, click on Browse.
12:52 I will select the file CaptaincyData.xlsx located in DataMerging folder.
13:01 This folder is in myProject folder on the Desktop.
13:06 Click Open to load this file.
13:10 Below the field File/Url, RStudio shows the preview of the Excel file being imported.
13:17 At the bottom right corner of this window, you can see the code for importing this Excel file.
13:24 Finally, click on the Import button.
13:28 The contents of the Excel file are shown here.
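The code shown in the Import Excel Data window is not reproduced in this transcript. It is typically a call to read_excel from the readxl package. The sketch below assumes that, and uses an example workbook shipped with readxl instead of CaptaincyData.xlsx so that it runs on its own. It does nothing if readxl is not installed.

```r
# Run only if the readxl package is available.
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook bundled with readxl (not the tutorial's file).
  xlsxPath <- readxl::readxl_example("datasets.xlsx")

  # Read the first sheet of the workbook.
  exceldata <- readxl::read_excel(xlsxPath)
  print(head(exceldata, 2))
}
```

For the tutorial's own file, the path would be replaced with the location of CaptaincyData.xlsx in the DataMerging folder.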
13:32 Let us summarize what we have learnt.
13:36 In this tutorial, we have learnt how to:
13:40 Use built-in functions for exploring a data frame
13:45 Merge two data frames
13:48 Import data in different formats in R
13:53 We now suggest an assignment.
13:57 Using built-in dataset iris, implement all the functions we have learnt in this tutorial.
14:04 The video at the following link summarises the Spoken Tutorial project.
14:09 Please download and watch it.
14:12 We conduct workshops using Spoken Tutorials and give certificates.
14:18 Please contact us.
14:21 Please post your timed queries in this forum.
14:25 Please post your general queries in this forum.
14:29 The FOSSEE team coordinates the TBC project. For more details, please visit these sites.
14:37 The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
14:44 The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
14:52 This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Sakinashaikh