R/C2/Merging-and-Importing-Data/English
Latest revision as of 18:05, 2 April 2019
Title of script: Merging and Importing Data
Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, data merge, data import, video tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to the spoken tutorial on Merging and Importing Data |
Show slide
Learning Objectives |
In this tutorial, we will learn how to:
|
Show slide
Pre-requisites |
To understand this tutorial, you should know
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on,
Install R version 3.2.0 or higher. |
Show slide
Download files |
For this tutorial, we will use,
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight data frames and myDataSet.R in the folder myProject |
I have downloaded these files from the Code files link and moved them to the DataMerging folder inside the myProject folder on the Desktop.
I have also set this folder as my working directory. |
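The working directory can also be set from code instead of through the RStudio menus. A minimal sketch follows, assuming the folder layout described above; the path is a hypothetical example and will differ on other systems.

```r
# Hypothetical path following the layout described above; adjust it for your system
dataDir <- file.path("~", "Desktop", "myProject", "DataMerging")

if (dir.exists(dataDir)) {
  setwd(dataDir)       # make DataMerging the working directory
} else {
  message("Folder not found: ", dataDir)
}

getwd()        # the current working directory
list.files()   # files R can read from it without giving a full path
```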
Let us switch to RStudio. | |
Click myDataSet.R in RStudio
Point to myDataSet.R in RStudio |
Open the script myDataSet.R in RStudio.
For this, click on the script myDataSet.R. The script myDataSet.R opens in RStudio. |
Highlight the Source button | Run this script by clicking on the Source button. |
Highlight captaincyOne in the Source window | The data frame captaincyOne appears in the Source window. |
[RStudio]
Highlight captaincyOne in the Source window |
We will use some built-in functions of R to explore captaincyOne.
For all the built-in functions used in this tutorial, please refer to the Additional Material. |
Cursor on the interface. | First, we will use the summary function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
[RStudio]
summary(captaincyOne) Highlight Source button |
In the Source window, type summary and then captaincyOne in parentheses.
Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
Highlight the output in the Console window | In the Console window, scroll up to locate the output.
Summary statistics for each column of captaincyOne are shown in the Console. For numeric columns, these are the minimum, first quartile, median, mean, third quartile and maximum. |
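To see what summary reports without the tutorial files, here is a small sketch. The data frame is a hypothetical stand-in for captaincyOne: the captain names and the won column are invented for illustration and are not taken from CaptaincyData.csv.

```r
# Hypothetical stand-in for captaincyOne (names and values are made up)
captaincy <- data.frame(
  names = c("Captain A", "Captain B", "Captain C", "Captain D"),
  won   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Summary of the whole data frame: one block of statistics per column
summary(captaincy)

# Summary of one numeric column: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
wonSummary <- summary(captaincy$won)
wonSummary
```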
Highlight summary(captaincyOne) in the Source window | In the Source window, press Enter.
Press Enter at the end of every command. |
Now, let us look at the class function. | |
[RStudio]
class(captaincyOne) |
In the Source window, type class and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The class function returns the class of captaincyOne, which is data.frame. |
Point to Source window. | Next, let us look at the typeof function. |
[RStudio]
typeof(captaincyOne) |
In the Source window, type typeof and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The typeof function returns the internal storage type of captaincyOne, which is list. This is because R stores a data frame as a list of columns. |
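The difference between class and typeof can be checked on a small hypothetical data frame (the names and values below are invented for illustration). The sketch also confirms that a data frame is stored as a list whose length equals the number of columns.

```r
# Hypothetical two-column data frame
captaincy <- data.frame(
  names = c("Captain A", "Captain B"),
  won   = c(12, 9),
  stringsAsFactors = FALSE
)

class(captaincy)    # "data.frame": used by R to pick methods such as print and summary
typeof(captaincy)   # "list": how the object is stored internally

# A data frame is a list of equal-length columns, so list operations work on it
is.list(captaincy)
length(captaincy)   # number of columns
ncol(captaincy)
```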
Highlight typeof in the Source window | To learn more about the typeof function, we will open its help page in RStudio. |
[RStudio]
help(typeof) |
In the Console window, type help and then typeof in parentheses. Press Enter. |
Highlight Description in the help window | typeof determines the R internal type or storage mode of any object. |
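A few quick calls show how the storage type reported by typeof can differ from the class. This is a sketch using only base R; the factor values are arbitrary examples.

```r
# A factor has class "factor" but is stored as integer codes
f <- factor(c("won", "lost", "won"))
class(f)    # "factor"
typeof(f)   # "integer"

typeof(1L)     # "integer"
typeof(2.5)    # "double"
typeof("abc")  # "character"
typeof(TRUE)   # "logical"
typeof(mean)   # "closure": a function written in R
typeof(sum)    # "builtin": a primitive function implemented in C

# The help page can also be opened with the ? shortcut:
# ?typeof
```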
Highlight Files tab in the lower right of RStudio | Click on the Files tab. |
Highlight broom icon in the Console window | Clear the Console window by clicking on the broom icon. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in Source window | Now, let us extract the first two rows of captaincyOne.
For this, we will use the head function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
[RStudio]
head(captaincyOne, 2) |
In the Source window, type head and then, in parentheses, captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The top two rows of captaincyOne are shown on the Console window. |
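A sketch of head on a hypothetical eight-row data frame (the names and won values are made up). Besides a positive row count, head can be called without a count (six rows by default) or with a negative count, which drops rows from the end.

```r
# Hypothetical eight-row data frame
captaincy <- data.frame(
  names = paste("Captain", LETTERS[1:8]),
  won   = c(5, 12, 7, 20, 3, 9, 14, 11),
  stringsAsFactors = FALSE
)

head(captaincy, 2)    # first two rows
head(captaincy)       # default: first six rows
head(captaincy, -1)   # negative count: every row except the last one
```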
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in the Source window | Suppose we want to extract the last two rows of captaincyOne.
For this, we will use the tail function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
[RStudio]
tail(captaincyOne, 2) |
In the Source window, type tail and then, in parentheses, captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The last two rows of captaincyOne are shown on the Console window. |
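A similar hypothetical eight-row data frame illustrates tail. The returned rows keep their original row names, and a negative count drops rows from the beginning.

```r
# Hypothetical eight-row data frame
captaincy <- data.frame(
  names = paste("Captain", LETTERS[1:8]),
  won   = c(5, 12, 7, 20, 3, 9, 14, 11),
  stringsAsFactors = FALSE
)

lastTwo <- tail(captaincy, 2)   # last two rows
lastTwo
rownames(lastTwo)               # original row names are kept: "7" "8"

tail(captaincy, -6)             # negative count: every row except the first six
```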
Cursor on the interface. | Next, let us learn about the str function.
This function displays the structure of an R object in a compact form. |
[RStudio]
str(captaincyOne) |
In the Source window, type str and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The structure of captaincyOne is shown in the Console: the number of observations and variables, followed by the type and first few values of each column. |
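A sketch of the str output for a hypothetical three-column data frame (the names, won and lost values are invented for illustration). capture.output is used here only to count the printed lines.

```r
# Hypothetical three-column data frame
captaincy <- data.frame(
  names = c("Captain A", "Captain B", "Captain C"),
  won   = c(12L, 9L, 21L),
  lost  = c(4L, 6L, 10L),
  stringsAsFactors = FALSE
)

# One header line (class, observations, variables), then one line per column
str(captaincy)

# Keep the printed lines as a character vector to inspect them
strLines <- capture.output(str(captaincy))
length(strLines)   # 1 header line + 3 column lines
```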
Now, we will look at merging data frames. | |
Show slide
Merging data frames |
Merging data frames has advantages like:
|
Let us switch to RStudio. | |
[RStudio]
Highlight CaptaincyData.csv and CaptaincyData2.csv under Files tab |
We will learn how to merge the data from two files, CaptaincyData.csv and CaptaincyData2.csv. |
[RStudio]
captaincyTwo <- read.csv("CaptaincyData2.csv") |
We will read CaptaincyData2.csv and store its contents in a variable named captaincyTwo.
In the Source window, type the following command and press Enter. |
[RStudio]
View(captaincyTwo) |
Now, type View and then captaincyTwo in parentheses.
Save the script and run the last two lines. |
Highlight captaincyTwo in Source window | The contents of captaincyTwo appear in the Source window. |
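Because CaptaincyData2.csv is not reproduced here, the sketch below writes a small hypothetical CSV file with a names column and a drawn column to a temporary location, then reads it back with read.csv in the same way.

```r
# Hypothetical stand-in for CaptaincyData2.csv, written to a temporary file
csvFile <- tempfile(fileext = ".csv")
writeLines(c(
  "names,drawn",
  "Captain A,15",
  "Captain B,7",
  "Captain C,2"
), csvFile)

# read.csv() expects a header row and comma-separated fields by default
captaincyTwo <- read.csv(csvFile)
captaincyTwo
```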
Highlight the name of captains in captaincyTwo
Highlight the column drawn in captaincyTwo |
This data frame has the same captains as captaincyOne.
However, it contains different information about them, such as the number of matches drawn. |
Highlight captaincyOne in the Source window | Now, we will update captaincyOne by adding the information from captaincyTwo.
For this, we use the merge function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
Drag the Source window. | I am resizing the Source window. |
[RStudio]
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names") |
In the Source window, type the following command. Press Enter. |
Highlight by = "names" in the Source window | In the merge function, the by argument gives the name of the column on which the two data frames are matched.
Here, it is the column names. |
[RStudio]
View(captaincyOne) |
Now, type View and captaincyOne in parentheses.
Save the script and run these two lines. |
Highlight captaincyOne in the Source window | The contents of the updated captaincyOne appear in the Source window. |
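The merge step can be sketched with two small hypothetical data frames (captain names and numbers are invented). The rows of the second data frame are deliberately in a different order, and it includes an extra captain, to show how merge matches rows on the by column. The all = TRUE call at the end is added only for comparison; the tutorial itself uses the default.

```r
# Hypothetical data frames sharing the key column "names"
captaincyOne <- data.frame(
  names = c("Captain A", "Captain B", "Captain C"),
  won   = c(27, 21, 8),
  stringsAsFactors = FALSE
)
captaincyTwo <- data.frame(
  names = c("Captain C", "Captain A", "Captain B", "Captain D"),
  drawn = c(7, 15, 15, 2),
  stringsAsFactors = FALSE
)

# Rows are matched on "names" regardless of row order. By default only keys
# present in both data frames are kept, so "Captain D" is dropped here.
merged <- merge(captaincyOne, captaincyTwo, by = "names")
merged

# all = TRUE keeps keys from either data frame and fills missing values with NA
mergedAll <- merge(captaincyOne, captaincyTwo, by = "names", all = TRUE)
mergedAll
```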
[RStudio]
Highlight the tabs captaincyOne and captaincyTwo |
Close the two tabs captaincyOne and captaincyTwo. |
Cursor on the interface. | Now, we will learn how to import data in different formats into R. |
[RStudio]
# Importing data in different formats |
First, we will add a comment.
In the Source window, type hash, space, Importing data in different formats. |
Highlight CaptaincyData.xml under Files tab | Now, let us import the CaptaincyData.xml file.
For that, we need to install the XML package. Make sure that you are connected to the Internet. |
We need to install the Ubuntu package libxml2-dev before installing the XML package.
Information on how to install this package is provided in the Additional Material. | |
[RStudio]
Click in the Console window |
I have already installed the libxml2-dev package.
Hence, I will now proceed to install the XML package. |
[RStudio]
install.packages("XML") Highlight the red dot in the Console window |
In the Console window, type install dot packages.
Then, in parentheses, type XML inside double quotes. Press Enter. We will wait until R installs the package. |
Then, we load this package using the library function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
Click at the top of the script myDataSet.R | Since we are loading a package, we will add it at the top of the script. |
[RStudio]
library(XML) |
In the Source window, scroll up.
Now, at the top of the script myDataSet.R, type library and XML in parentheses. Save the script and run this line. |
[RStudio]
Point to the comment. xmldata <- xmlToDataFrame("CaptaincyData.xml") |
Now, in the Source window, click on the next line after the comment Importing data in different formats.
Type the following command and press Enter. |
[RStudio]
View(xmldata) |
Then type View and xmldata in parentheses.
Save the script and run these two lines. |
Highlight xmldata in the Source window | The contents of the XML file are shown here. |
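A self-contained sketch of xmlToDataFrame, assuming the XML package is installed. The XML content is hypothetical: it assumes one element per record with one child element per column, which is the simple, shallow layout xmlToDataFrame is designed for; the actual structure of CaptaincyData.xml may differ.

```r
if (requireNamespace("XML", quietly = TRUE)) {
  # Hypothetical XML: one <captain> record per row, one child element per column
  xmlFile <- tempfile(fileext = ".xml")
  writeLines(paste0(
    '<?xml version="1.0"?>',
    "<captains>",
    "<captain><names>Captain A</names><drawn>15</drawn></captain>",
    "<captain><names>Captain B</names><drawn>7</drawn></captain>",
    "</captains>"
  ), xmlFile)

  xmldata <- XML::xmlToDataFrame(xmlFile)

  # Values come back as text (character or factor), so convert numbers explicitly
  xmldata$drawn <- as.numeric(as.character(xmldata$drawn))
  str(xmldata)
} else {
  message("Package XML is not installed.")
}
```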
Highlight CaptaincyData.txt under Files tab | Next, let us learn how to import CaptaincyData.txt. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
[RStudio]
txtdata <- read.table("CaptaincyData.txt") |
In the Source window, type the following command and press Enter. |
[RStudio] View(txtdata) |
Next, type View and txtdata in parentheses.
Save the script and run these two lines. |
Highlight txtdata in the Source window | The contents of the txt file are shown. |
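read.table detects a header line by itself in only one case: when the first line has one fewer field than the data lines, as happens when the file also stores row names. Since the exact layout of CaptaincyData.txt is not shown here, the sketch below writes both layouts from a hypothetical data frame and reads them back.

```r
# Hypothetical data frame
captaincy <- data.frame(
  names = c("Captain A", "Captain B", "Captain C"),
  drawn = c(15, 7, 2),
  stringsAsFactors = FALSE
)

# Layout 1: write.table() stores row names by default, so the header line has
# one fewer field; read.table() then treats it as a header automatically.
txtFile <- tempfile(fileext = ".txt")
write.table(captaincy, txtFile)
txtdata <- read.table(txtFile)

# Layout 2: without row names, header and data lines have the same number of
# fields, so the header has to be requested with header = TRUE.
txtFile2 <- tempfile(fileext = ".txt")
write.table(captaincy, txtFile2, row.names = FALSE)
txtdataNoHeader <- read.table(txtFile2)                 # columns V1, V2; header read as data
txtdataHeader   <- read.table(txtFile2, header = TRUE)  # columns names, drawn
```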
Highlight CaptaincyData.xlsx under the Files tab | Now, we will learn how to import data using the user interface of RStudio.
I am resizing the Source window. We will import the Excel file CaptaincyData.xlsx using this method. Please ensure that packages such as readxl and Rcpp are installed on your system. |
Highlight Environment tab | In the top right corner of RStudio, click on the Environment tab. |
Highlight Import Dataset button
Highlight From Excel option |
In the Environment tab, click on Import Dataset.
From the drop-down menu, select From Excel. |
Highlight Import Excel Data window | A window named Import Excel Data appears. |
Highlight File/Url option | You can select a file on your computer or type the URL from which you want to load an Excel file.
We will select a file on our computer. |
Highlight Browse option | In the upper right corner of this window, next to the File/Url text field, click on Browse. |
Highlight CaptaincyData.xlsx in the folder myProject | I will select the file CaptaincyData.xlsx located in the DataMerging folder.
This folder is in the myProject folder on the Desktop. Click Open to load this file. |
Highlight Data Preview option | Below the File/Url field, RStudio shows a preview of the Excel file being imported. |
Highlight Code Preview option | At the bottom right corner of this window, you can see the code for importing this Excel file. |
Highlight Import button | Finally, click on the Import button. |
Highlight CaptaincyData in the Source window | The contents of the Excel file are shown here. |
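The code in the Code Preview pane relies on the readxl package, so the same import can also be written in a script. A sketch follows, assuming readxl is installed; since CaptaincyData.xlsx is not reproduced here, it reads the datasets.xlsx example workbook that ships with readxl instead.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook bundled with readxl, used as a stand-in for CaptaincyData.xlsx
  xlsxFile <- readxl::readxl_example("datasets.xlsx")

  print(readxl::excel_sheets(xlsxFile))       # names of the sheets in the workbook
  exceldata <- readxl::read_excel(xlsxFile)   # reads the first sheet by default
  print(dim(exceldata))
} else {
  message("Package readxl is not installed.")
}
```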
Let us summarize what we have learnt. | |
Show Slide
Summary |
In this tutorial, we have learnt how to:
|
Show Slide
Assignment |
We now suggest an assignment.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgement |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India. |
Show Slide
Thank You |
The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |