R/C2/Merging-and-Importing-Data/English
Title of script: Merging and Importing Data
Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, data merge, data import, video tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to the spoken tutorial on Merging and Importing Data |
Show slide
Learning Objectives |
In this tutorial, we will learn how to:
|
Show slide
Pre-requisites |
To understand this tutorial, you should know
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
Install R version 3.2.0 or higher. |
Show slide
Download files |
For this tutorial, we will use,
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight data frames and myDataSet.R in the folder myProject |
I have downloaded these files from the Code files link.
I have moved them to the DataMerging folder inside the myProject folder on the Desktop. I have also set this folder as my working directory. |
Let us switch to RStudio. | |
Click myDataSet.R in RStudio
Point to myDataSet.R in RStudio |
Open the script myDataSet.R in RStudio.
For this, click on the script myDataSet.R. Script myDataSet.R opens in RStudio. |
Highlight the Source button | Run this script by clicking on the Source button. |
Highlight captaincyOne in the Source window | captaincyOne appears in the Source window. |
[RStudio]
Highlight captaincyOne in the Source window |
We will use some built-in functions of R to explore captaincyOne.
For all the built-in functions used in this tutorial, please refer to the Additional Material. |
First, we will use the summary function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
summary(captaincyOne) Highlight Source button |
In the Source window, type summary and then captaincyOne in parentheses.
Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
Highlight the output in the Console window | In the Console window, scroll up to locate the output.
Summary statistics for each column of captaincyOne are shown on the Console. |
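The following is a minimal sketch of what summary reports. The data frame captaincy_demo and its values are hypothetical stand-ins, since the real contents of captaincyOne come from the code files.

```r
# Hypothetical stand-in for captaincyOne; names and values are made up.
captaincy_demo <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(60, 47, 49),
  won    = c(27, 14, 21),
  stringsAsFactors = FALSE
)

# For numeric columns, summary() reports the minimum, quartiles, median,
# mean and maximum. For character columns it reports length, class and mode.
demo_summary <- summary(captaincy_demo)
print(demo_summary)
```

Each column of the data frame gets its own column in the printed summary.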
Highlight summary(captaincyOne) in the Source window | In the Source window, press Enter.
Press Enter at the end of every command. |
Now, let us look at the class function. | |
[RStudio]
class(captaincyOne) |
In the Source window, type class and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The class function returns the class of captaincyOne, which is data.frame. |
Next, let us look at the typeof function. | |
[RStudio]
typeof(captaincyOne) |
In the Source window, type typeof and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The typeof function returns the storage type of captaincyOne, which is list. |
Highlight typeof in the Source window | To know more about the typeof function, we will access the help section of RStudio. |
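The difference between class and typeof can be seen in a short sketch. The data frame captaincy_demo below is a hypothetical example, not the tutorial's data.

```r
# Hypothetical two-row data frame used only for illustration.
captaincy_demo <- data.frame(
  names  = c("Captain A", "Captain B"),
  played = c(60, 47),
  stringsAsFactors = FALSE
)

demo_class <- class(captaincy_demo)   # how R treats the object: "data.frame"
demo_type  <- typeof(captaincy_demo)  # how R stores the object: "list"

# A data frame is stored as a list of columns, so each column has its own type.
column_types <- sapply(captaincy_demo, typeof)

print(demo_class)
print(demo_type)
print(column_types)
```

This is why typeof reports list for a data frame: each column is one element of that list.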
[RStudio]
help(typeof) |
In the Console window, type help, within parentheses typeof. Press Enter. |
Highlight Description in the help window | typeof determines the R internal type or storage mode of any object. |
Highlight Files tab in the lower right of RStudio | Click on the Files tab. |
Highlight broom icon in the Console window | Clear the Console window by clicking on the broom icon. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in the Source window | Now, let us extract the first two rows of captaincyOne.
For this, we will use the head function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
head(captaincyOne, 2) |
In the Source window, type head within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The top two rows of captaincyOne are shown on the Console window. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in the Source window | Suppose we want to extract the last two rows of captaincyOne.
For this, we will use the tail function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
tail(captaincyOne, 2) |
In the Source window, type tail within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The last two rows of captaincyOne are shown on the Console window. |
Next, let us learn about the str function.
This function is used to display the structure of an R object. | |
[RStudio]
str(captaincyOne) |
In the Source window, type str within parentheses captaincyOne.
Save the script and run the current line. |
Highlight the output in the Console window | The structural details of captaincyOne are shown on the Console. |
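The head, tail and str calls above can be tried together on a small example. The data frame captaincy_demo is hypothetical and only assumes that a names column exists, as in the tutorial.

```r
# Hypothetical four-row data frame used only for illustration.
captaincy_demo <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(60, 47, 49, 25),
  stringsAsFactors = FALSE
)

top_two    <- head(captaincy_demo, 2)  # first two rows
bottom_two <- tail(captaincy_demo, 2)  # last two rows, original row names kept

print(top_two)
print(bottom_two)

# str() prints the number of observations and variables,
# followed by the type and first few values of each column.
str(captaincy_demo)
```

Note that tail keeps the original row numbers, so the last two rows are still labelled 3 and 4.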
Now, we will look at merging of data frames. | |
Show slide
Merging data frames |
Merging data frames has advantages like:
|
Let us switch to RStudio. | |
[RStudio]
Highlight CaptaincyData.csv and CaptaincyData2.csv under Files tab |
We will learn how to merge the data frames read from CaptaincyData.csv and CaptaincyData2.csv. |
[RStudio]
captaincyTwo <- read.csv("CaptaincyData2.csv") |
We will declare a variable captaincyTwo to read CaptaincyData2.csv and store its contents.
In the Source window, type the following command and press Enter. |
[RStudio]
View(captaincyTwo) |
Now, type View within parentheses captaincyTwo.
Save the script and run the last two lines. |
Highlight captaincyTwo in Source window | The contents of captaincyTwo appear in the Source window. |
Highlight the name of captains in captaincyTwo
Highlight the column drawn in captaincyTwo |
This data frame has the same captains as captaincyOne.
However, it has different information about them, such as the number of matches drawn. |
Highlight captaincyOne in the Source window | Now, we will update captaincyOne by adding information from captaincyTwo.
For this, we use the merge function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
I am resizing the Source window. | |
[RStudio]
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names") |
In the Source window, type the following command. Press Enter. |
Highlight by = "names" in the Source window | In the merge function, the by argument specifies the column that is common to both data frames and is used to match their rows.
Here, it is names. |
[RStudio]
View(captaincyOne) |
Now, type View and captaincyOne in parentheses.
Save the script and run these two lines. |
Highlight captaincyOne in the Source window | The contents of the updated captaincyOne appear in the Source window. |
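The read and merge steps can be sketched on made-up data. The data frames one and two, their values, and the temporary CSV file are assumptions for illustration; only the names and drawn columns are mentioned in the tutorial.

```r
# Hypothetical stand-ins for captaincyOne and captaincyTwo.
one <- data.frame(
  names = c("Captain A", "Captain B", "Captain C"),
  won   = c(27, 14, 21),
  stringsAsFactors = FALSE
)
two <- data.frame(
  names = c("Captain C", "Captain A", "Captain B"),  # same captains, different order
  drawn = c(15, 11, 9),
  stringsAsFactors = FALSE
)

# Round-trip the second data frame through a temporary CSV file,
# mirroring the read.csv step in the tutorial.
csv_path <- tempfile(fileext = ".csv")
write.csv(two, csv_path, row.names = FALSE)
two_from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Rows are matched on the shared "names" column, not on row position.
# By default only keys present in both data frames are kept,
# and the result is sorted by the key.
merged <- merge(one, two_from_csv, by = "names")
print(merged)
```

Because rows are matched by key, the different row order in the second data frame does not affect the result.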
[RStudio]
Highlight the tabs captaincyOne and captaincyTwo |
Close the two tabs captaincyOne and captaincyTwo. |
Cursor on the interface. | Now, we will learn how to import data of different formats in R. |
[RStudio]
# Importing data in different formats |
We shall add one comment first.
In the Source window, type # hash space Importing data in different formats. |
Highlight CaptaincyData.xml under Files tab | Now, let us import CaptaincyData.xml file.
For that, we need to install the XML package. Make sure that you are connected to the Internet. |
We need to install the Ubuntu package libxml2-dev before installing the XML package.
Information on how to install this package is provided in the Additional Material. | |
[RStudio]
Click in the Console window |
I have already installed the libxml2-dev package.
Hence, I will proceed to install the XML package now. |
[RStudio]
install.packages("XML") Highlight the red dot in the Console window |
In the Console window, type install dot packages.
Now, type XML inside double quotes and in parentheses. Press Enter. We will wait until R installs the package. |
Then, we load this package using library function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
Click at the top of the script myDataSet.R | Since we are loading a package, we will add it at the top of the script. |
[RStudio]
library(XML) |
In the Source window, at the top of the script myDataSet.R, type library and XML in parentheses.
Save the script and run this line. |
[RStudio]
Point to the comment. xmldata <- xmlToDataFrame("CaptaincyData.xml") |
Now, in the Source window, click on the next line after the comment Importing data in different formats.
Type the following command and press Enter. |
[RStudio]
View(xmldata) |
Then type View and xmldata in parentheses.
Save the script and run these two lines. |
Highlight xmldata in the Source window | The contents of the XML file are shown here. |
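The XML import can be tried without the tutorial's file, as in the sketch below. The element names records and record, the values, and the temporary file are assumptions; the actual layout of CaptaincyData.xml may differ. The code runs the import only if the XML package is installed.

```r
# Build a small, hypothetical XML file: one <record> element per row,
# with one child element per column.
xml_text <- paste0(
  "<records>",
  "<record><names>Captain A</names><played>60</played></record>",
  "<record><names>Captain B</names><played>47</played></record>",
  "</records>"
)
xml_path <- tempfile(fileext = ".xml")
writeLines(xml_text, xml_path)

xml_demo <- NULL
if (requireNamespace("XML", quietly = TRUE)) {
  # Each child of the root element becomes one row of the data frame.
  xml_demo <- XML::xmlToDataFrame(xml_path, stringsAsFactors = FALSE)
  # Values are read as text here, so convert numeric columns explicitly.
  xml_demo$played <- as.numeric(xml_demo$played)
  print(xml_demo)
}
```

After an import like this, it is worth checking the column types with str before doing any arithmetic.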
Highlight CaptaincyData.txt under Files tab | Next let us learn how to import CaptaincyData.txt. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
txtdata <- read.table("CaptaincyData.txt") |
In the Source window, type the following command and press Enter. |
[RStudio] View(txtdata) |
Next, type View and txtdata in parentheses.
Save the script and run these two lines. |
Highlight txtdata in the Source window | The contents of the text file are shown. |
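Whether CaptaincyData.txt has a header row is not stated here, so the sketch below uses a hypothetical text file to show how read.table behaves with and without header = TRUE.

```r
# Hypothetical whitespace-separated text file with a header line.
txt_path <- tempfile(fileext = ".txt")
writeLines(
  c("names played won",
    "CaptainA 60 27",
    "CaptainB 47 14"),
  txt_path
)

# Without header = TRUE, the first line is treated as data here,
# and the columns get the default names V1, V2, V3.
no_header <- read.table(txt_path, stringsAsFactors = FALSE)

# With header = TRUE, the first line supplies the column names.
with_header <- read.table(txt_path, header = TRUE, stringsAsFactors = FALSE)

print(no_header)
print(with_header)
```

If the first row of txtdata looks like column names in the viewer, reading the file again with header = TRUE is one way to fix it.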
Highlight CaptaincyData.xlsx under the Files tab | Now, we will learn how to import data from the user interface of RStudio.
We will import the Excel file CaptaincyData.xlsx using this method. Please ensure that you have packages like readxl and Rcpp installed in your system. |
Highlight Environment tab | In the top right corner of RStudio, click on the Environment tab. |
Highlight Import Dataset button
Highlight From Excel option |
In the Environment tab, click on Import Dataset.
From the drop-down menu, select From Excel. |
Highlight Import Excel Data window | A window named Import Excel Data appears. |
Highlight File/Url option | You can select a file on your computer or type the URL from which you want to load an Excel file.
We will select a file on our computer. |
Highlight Browse option | In the upper right corner of this window, near File/Url text field, click on Browse. |
Highlight CaptaincyData.xlsx in the folder myProject | I will select the file CaptaincyData.xlsx located in the DataMerging folder.
This folder is in the myProject folder on the Desktop. Click Open to load this file. |
Highlight Data Preview option | Below the field File/Url, RStudio shows the preview of the Excel file being imported. |
Highlight Code Preview option | At the bottom right corner of this window, you can see the code for importing this Excel file. |
Highlight Import button | Finally, click on the Import button. |
Highlight CaptaincyData in the Source window | The contents of the Excel file are shown here. |
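The same import can also be done from a script with the readxl package. The sketch below is an approximation, not the exact code shown in the Code Preview. It assumes that CaptaincyData.xlsx is in the working directory and skips the import otherwise.

```r
# Assumed location: the working directory set earlier in the tutorial.
excel_path <- "CaptaincyData.xlsx"

excel_demo <- NULL
if (requireNamespace("readxl", quietly = TRUE) && file.exists(excel_path)) {
  # read_excel() reads the first sheet by default.
  excel_demo <- readxl::read_excel(excel_path)
  print(head(excel_demo, 2))
} else {
  message("readxl is not installed or the file was not found; import skipped.")
}
```

Keeping the import in the script makes the analysis reproducible without going through the Import Dataset dialog again.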
Let us summarize what we have learnt. | |
Show Slide
Summary |
In this tutorial, we have learnt how to:
|
Show Slide
Assignment |
We now suggest an assignment.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgement |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |