Latest revision as of 18:05, 2 April 2019

Title of script: Merging and Importing Data

Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, data merge, data import, video tutorial

Visual Cue || Narration
Show slide

Opening slide

Welcome to the spoken tutorial on Merging and Importing Data
Show slide

Learning Objectives

In this tutorial, we will learn how to:
  • Use built-in functions for exploring a data frame
  • Merge two data frames
  • Import data in different formats in R
Show slide

Pre-requisites

http://spoken-tutorial.org

To understand this tutorial, you should know
  • Data frames in R
  • R script in RStudio
  • How to set working directory in RStudio

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on:
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.456

Install R version 3.2.0 or higher.

Show slide

Download files

For this tutorial, we will use:
  • five data frames in different formats and
  • a script file myDataSet.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight data frames and myDataSet.R in the folder myProject

I have downloaded these files from the Code files link.

I have moved them to the DataMerging folder in the myProject folder on the Desktop.

I have also set this folder as my Working Directory.

Let us switch to RStudio.
Click myDataSet.R in RStudio

Point to myDataSet.R in RStudio

Open the script myDataSet.R in RStudio.

For this, click on the script myDataSet.R.

Script myDataSet.R opens in RStudio.

Highlight the Source button || Run this script by clicking on the Source button.
Highlight captaincyOne in the Source window || captaincyOne appears in the Source window.
[RStudio]

Highlight captaincyOne in the Source window

We will use some built-in functions of R to explore captaincyOne.

For all the built-in functions used in this tutorial, please refer to the Additional Material.

Cursor on the interface. || First, we will use the summary function.
Highlight myDataSet.R in the Source window || Click on the script myDataSet.R.
[RStudio]

summary(captaincyOne)

Highlight Source button

In the Source window, type summary and then captaincyOne in parentheses.

Save the script and run the current line by pressing Ctrl+Enter keys simultaneously.

Highlight the output in the Console window || In the Console window, scroll up to locate the output.

Summary statistics for each column of captaincyOne are shown in the Console.
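
As an illustration of what summary reports, the sketch below runs it on a small stand-in data frame. The captain names and figures are hypothetical placeholders, not the contents of CaptaincyData.csv; only the function call mirrors the tutorial.

```r
# Hypothetical stand-in for captaincyOne; the real data comes from the tutorial's Code files.
captaincyOne <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(47, 49, 60, 68),
  won    = c(14, 21, 27, 41),
  stringsAsFactors = FALSE
)

# One block of statistics per column: quartiles and mean for numeric columns,
# length, class and mode for character columns.
frame_stats <- summary(captaincyOne)
print(frame_stats)

# summary() also works on a single column and returns named values.
played_stats <- summary(captaincyOne$played)
print(played_stats)
```

For a numeric column the result lists the minimum, first quartile, median, mean, third quartile and maximum, which is usually enough to spot obviously wrong values before merging.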

Highlight summary(captaincyOne) in the Source window || In the Source window, press Enter.

Press Enter at the end of every command.

Now, let us look at the class function.
[RStudio]

class(captaincyOne)

In the Source window, type class and then captaincyOne in parentheses.

Save the script and run the current line.

Highlight the output in the Console window || The class function returns the class of captaincyOne, which is data frame.
Point to Source window. || Next, let us look at the typeof function.
[RStudio]

typeof(captaincyOne)

In the Source window, type typeof and then captaincyOne in parentheses.

Save the script and run the current line.

Highlight the output in the Console window || The typeof function returns the storage type of captaincyOne, which is list.
Highlight typeof in the Source window || To know more about the typeof function, we will access the help section of RStudio.
[RStudio]

help(typeof)

In the Console window, type help, within parentheses typeof. Press Enter.
Highlight Description in the help window || typeof determines the R internal type or storage mode of any object.
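
The difference between class and typeof can be seen side by side in a short sketch. The data frame below is a hypothetical stand-in for captaincyOne, and the variable names frame_class, storage_type and column_types are chosen only for this illustration.

```r
# Hypothetical stand-in for captaincyOne.
captaincyOne <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(47, 49, 60),
  won    = c(14, 21, 27),
  stringsAsFactors = FALSE
)

frame_class  <- class(captaincyOne)   # how R treats the object: "data.frame"
storage_type <- typeof(captaincyOne)  # how R stores it internally: "list"
print(frame_class)
print(storage_type)

# Each column is one element of that list, with its own storage type.
column_types <- sapply(captaincyOne, typeof)
print(column_types)
```

A data frame is stored as a list of equal-length columns, which is why typeof reports list while class reports data.frame.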
Highlight Files tab in the lower right of RStudio || Click on the Files tab.
Highlight broom icon in the Console window || Clear the Console window by clicking on the broom icon.
Highlight captaincyOne in the Source window || Click on the data frame captaincyOne.
Highlight captaincyOne in Source window || Now, let us extract two rows from the top of captaincyOne.

For this, we will use the head function.

Highlight myDataSet.R in the Source window || Click on the script myDataSet.R.
[RStudio]

head(captaincyOne, 2)

In the Source window, type head within parentheses captaincyOne comma space 2.

Save the script and run the current line.

Highlight the output in the Console window || The top two rows of captaincyOne are shown in the Console window.
Highlight captaincyOne in the Source window || Click on the data frame captaincyOne.
Highlight captaincyOne in the Source window || Suppose we want to extract two rows from the bottom of captaincyOne.

For this, we will use the tail function.

Highlight myDataSet.R in the Source window || Click on the script myDataSet.R.
[RStudio]

tail(captaincyOne, 2)

In the Source window, type tail within parentheses captaincyOne comma space 2.

Save the script and run the current line.

Highlight the output in the Console window || The last two rows of captaincyOne are shown in the Console window.
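
The head and tail calls above can be tried on any data frame. The sketch below uses a hypothetical four-row stand-in for captaincyOne; the names top_two and bottom_two are introduced only for this example.

```r
# Hypothetical stand-in for captaincyOne.
captaincyOne <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(47, 49, 60, 68),
  stringsAsFactors = FALSE
)

# First two rows.
top_two <- head(captaincyOne, 2)
print(top_two)

# Last two rows; the original row numbers (3 and 4) are kept as row names.
bottom_two <- tail(captaincyOne, 2)
print(bottom_two)
```

If the second argument is left out, both functions default to six rows.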
Cursor on the interface. || Next, let us learn about the str function.

This function is used to display the structure of an R object.

[RStudio]

str(captaincyOne)

In the Source window, type str within parentheses captaincyOne.

Save the script and run the current line.

Highlight the output in the Console window || The structural details of captaincyOne are shown in the Console.
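
To show the kind of output str produces, here is a minimal sketch on a hypothetical stand-in for captaincyOne. The capture.output call is added only so the printed lines can be inspected as text; it is not part of the tutorial.

```r
# Hypothetical stand-in for captaincyOne.
captaincyOne <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(47, 49, 60, 68),
  won    = c(14, 21, 27, 41),
  lost   = c(19, 13, 18, 13),
  stringsAsFactors = FALSE
)

# Prints one header line, then one line per column with its type and first values.
str(captaincyOne)

# The same output captured as a character vector, one element per printed line.
structure_lines <- capture.output(str(captaincyOne))
```

The header line gives the number of observations (rows) and variables (columns), which is a quick way to confirm that a file was read completely.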
Now, we will look at merging of data frames.
Show slide

Merging data frames

Merging data frames has advantages like:
  • It makes data more available.
  • It helps in improving data quality.
  • Combining similar data also reduces data complexity.
Let us switch to RStudio.
[RStudio]

Highlight CaptaincyData.csv and CaptaincyData2.csv under Files tab

We will learn how to merge the two data frames read from CaptaincyData.csv and CaptaincyData2.csv.
[RStudio]

captaincyTwo <- read.csv("CaptaincyData2.csv")

We will declare a variable captaincyTwo to read and store CaptaincyData2.csv.

In the Source window, type the following command and press Enter.

[RStudio]

View(captaincyTwo)

Now, type View within parentheses captaincyTwo.

Save the script and run the last two lines.

Highlight captaincyTwo in Source window || The contents of captaincyTwo appear in the Source window.
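
For readers without the tutorial files at hand, the read.csv step can be reproduced with a temporary file. This is a sketch only: the two data rows are hypothetical, and csv_path is a throwaway path rather than the tutorial's CaptaincyData2.csv.

```r
# Write a small hypothetical CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("names,drawn",
             "Captain A,14",
             "Captain B,15"),
           csv_path)

# read.csv treats the first line as the header by default.
captaincyTwo <- read.csv(csv_path)
print(captaincyTwo)

# Remove the temporary file once it has been read.
unlink(csv_path)
```

In the tutorial itself the file name alone is enough, because the DataMerging folder has been set as the working directory.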
Highlight the name of captains in captaincyTwo

Highlight the column drawn in captaincyTwo

This data frame has the same captains as those in captaincyOne.

However, it has different information about them like the number of matches drawn.

Highlight captaincyOne in the Source window Now, we will update captaincyOne by adding information from captaincyTwo.

For this, we use the merge function.

Highlight myDataSet.R in the Source window Click on the script myDataSet.R
Drag the Source window. I am resizing the Source window.
[RStudio]

captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names")

In the Source window, type the following command. Press Enter.
Highlight by = "names" in the Source window In the merge function, the by argument specifies the column on which the two data frames are merged.

Here, it is names.

[RStudio]

View(captaincyOne)

Now, type View and captaincyOne in parentheses.

Save the script and run these two lines.

Highlight captaincyOne in the Source window The contents of the updated captaincyOne appear in the Source window.
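The role of the by argument can be sketched on two small, made-up data frames. The names first and second and all values are hypothetical. Only the key column names mirrors the tutorial.

```r
# Hypothetical data frames sharing the key column "names"
first <- data.frame(
  names  = c("A", "B", "C"),
  played = c(10, 20, 30),
  stringsAsFactors = FALSE
)
second <- data.frame(
  names = c("C", "A", "B"),
  drawn = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Rows are matched on "names"; row order in the inputs does not matter
merged <- merge(first, second, by = "names")
print(merged)
```

By default, merge keeps only the rows whose key appears in both data frames. Passing all = TRUE keeps unmatched rows as well and fills the missing values with NA.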
[RStudio]

Highlight the tabs captaincyOne and captaincyTwo

Close the two tabs captaincyOne and captaincyTwo.
Cursor on the interface. Now, we will learn how to import data of different formats in R.
[RStudio]

# Importing data in different formats

We shall add one comment first.

In the Source window, type # hash space Importing data in different formats.

Highlight CaptaincyData.xml under Files tab Now, let us import the CaptaincyData.xml file.

For that, we need to install the XML package.

Make sure that you are connected to the Internet.

We need to install the Ubuntu package libxml2-dev before installing the XML package.

Information on how to install this package is provided in the Additional Material.

[RStudio]

Click in the Console window

I have already installed the libxml2-dev package.

Hence, I will proceed to install the XML package now.

[RStudio]

install.packages("XML")

Highlight the red dot in the Console window

In the Console window, type install dot packages.

Now, within parentheses, type XML inside double quotes.

Press Enter.

We will wait until R installs the package.

Then, we load this package using the library function.
Highlight myDataSet.R in the Source window Click on the script myDataSet.R
Click at the top of the script myDataSet.R Since we are loading a package, we will add it at the top of the script.
[RStudio]

library(XML)

In the Source window, scroll up.

Now, at the top of the script myDataSet.R, type library and XML in parentheses.

Save the script and run this line.

[RStudio]

Point to the comment.

xmldata <- xmlToDataFrame("CaptaincyData.xml")

Now, in the Source window, click on the next line after the comment Importing data in different formats.

Type the following command and press Enter.

[RStudio]

View(xmldata)

Then type View and xmldata in parentheses.

Save the script and run these two lines.

Highlight xmldata in the Source window The contents of the xml file are shown here.
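The following sketch shows the kind of XML layout that xmlToDataFrame can turn into a data frame. The file contents and the names xmlLines, tmpXml and demoXml are assumptions for illustration. The actual structure of CaptaincyData.xml is not shown in this tutorial. The sketch also assumes that the XML package is installed and skips the import otherwise.

```r
# Assumption: the XML package is installed; otherwise the import is skipped
if (requireNamespace("XML", quietly = TRUE)) {
  # Hypothetical file contents, written to a temporary file
  xmlLines <- c(
    "<records>",
    "  <record><names>A</names><played>10</played></record>",
    "  <record><names>B</names><played>20</played></record>",
    "</records>"
  )
  tmpXml <- tempfile(fileext = ".xml")
  writeLines(xmlLines, tmpXml)

  # Each <record> becomes a row; each child element becomes a column
  demoXml <- XML::xmlToDataFrame(tmpXml)
  print(demoXml)

  # Values are read as text, so numeric columns need converting
  demoXml$played <- as.numeric(as.character(demoXml$played))
  str(demoXml)
}
```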
Highlight CaptaincyData.txt under Files tab Next, let us learn how to import CaptaincyData.txt.
Highlight myDataSet.R in the Source window Click on the script myDataSet.R
[RStudio]

txtdata <- read.table("CaptaincyData.txt")

In the Source window, type the following command and press Enter.

[RStudio]

View(txtdata)

Next, type View and txtdata in parentheses.

Save the script and run these two lines.

Highlight txtdata in the Source window The contents of the txt file are shown.
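Whether a text file starts with a header line affects how read.table should be called. The sketch below uses made-up file contents. Whether CaptaincyData.txt has a header line is not stated in this tutorial, so the names txtLines, tmpTxt, noHeader and withHeader are for illustration only.

```r
# Hypothetical file contents: a header line followed by space-separated rows
txtLines <- c(
  "names played won",
  "A 10 4",
  "B 20 9"
)
tmpTxt <- tempfile(fileext = ".txt")
writeLines(txtLines, tmpTxt)

# With the default settings the first line is treated as data here,
# because it has as many fields as the other lines
noHeader <- read.table(tmpTxt)
print(noHeader)

# header = TRUE uses the first line for the column names
withHeader <- read.table(tmpTxt, header = TRUE)
print(withHeader)
```

If the columns were separated by commas or tabs instead of spaces, the sep argument of read.table could be set accordingly.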
Highlight CaptaincyData.xlsx under the Files tab Now, we will learn how to import data using the user interface of RStudio.

I am resizing the Source window.

We will import the Excel file CaptaincyData.xlsx using this method.

Please ensure that you have packages like readxl and Rcpp installed on your system.

Highlight Environment tab In the top right corner of RStudio, click on the Environment tab.
Highlight Import Dataset button

Highlight From Excel option

In the Environment tab, click on Import Dataset.

From the drop-down menu, select From Excel.

Highlight Import Excel Data window A window named Import Excel Data appears.
Highlight File/Url option You can select a file on your computer or type the URL from which you want to load an Excel file.

We will select a file on our computer.

Highlight Browse option In the upper right corner of this window, near File/Url text field, click on Browse.
Highlight CaptaincyData.xlsx in the folder myProject I will select the file CaptaincyData.xlsx located in the DataMerging folder.

This folder is in the myProject folder on the Desktop.

Click Open to load this file.

Highlight Data Preview option Below the field File/Url, RStudio shows the preview of the Excel file being imported.
Highlight Code Preview option At the bottom right corner of this window, you can see the code for importing this Excel file.
Highlight Import button Finally, click on the Import button.
Highlight CaptaincyData in the Source window The contents of the Excel file are shown here.
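The same import can also be written in a script. The code shown in Code Preview typically calls read_excel from the readxl package. The sketch below wraps that call in a helper. The helper name importExcel is hypothetical, and the sketch assumes that CaptaincyData.xlsx is in the working directory. It returns NULL if readxl is not installed or the file is not found.

```r
# Assumptions: readxl is installed and CaptaincyData.xlsx is in the
# working directory; the helper name importExcel is hypothetical
importExcel <- function(path) {
  if (!requireNamespace("readxl", quietly = TRUE)) {
    message("Package readxl is not installed")
    return(NULL)
  }
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  # read_excel reads the first sheet by default
  readxl::read_excel(path)
}

CaptaincyData <- importExcel("CaptaincyData.xlsx")
if (!is.null(CaptaincyData)) {
  print(head(CaptaincyData))
}
```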
Let us summarize what we have learnt.
Show Slide

Summary

In this tutorial, we have learnt how to:
  • Use built-in functions for exploring a data frame
  • Merge two data frames
  • Import data in different formats in R
Show Slide

Assignment

We now suggest an assignment.
  • Using the built-in dataset iris, apply all the functions we have learnt in this tutorial.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgement

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India.
Show Slide

Thank You

The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst