Difference between revisions of "R/C2/Merging-and-Importing-Data/English"
Sudhakarst (Talk | contribs) |
Nancyvarkey (Talk | contribs) m (Nancyvarkey moved page R/C2/Data-Merging-and-Data-Import/English to R/C2/Merging-and-Importing-Data/English without leaving a redirect) |
||
(10 intermediate revisions by 3 users not shown) | |||
Line 12: | Line 12: | ||
Opening slide | Opening slide | ||
− | || Welcome to the spoken tutorial on '''Merging | + | || Welcome to the spoken tutorial on '''Merging and Importing Data''' |
|- | |- | ||
|| Show slide | || Show slide | ||
− | Learning Objectives | + | '''Learning Objectives''' |
|| In this tutorial, we will learn how to: | || In this tutorial, we will learn how to: | ||
− | * Use built-in functions for exploring a '''data frame''' | + | * Use '''built-in functions''' for exploring a '''data frame''' |
− | * Merge two '''data frames''' | + | * '''Merge''' two '''data frames''' |
− | * Import data in different formats in '''R''' | + | * Import '''data''' in different formats in '''R''' |
|- | |- | ||
|| Show slide | || Show slide | ||
− | Pre-requisites | + | '''Pre-requisites''' |
http://spoken-tutorial.org | http://spoken-tutorial.org | ||
|| To understand this tutorial, you should know | || To understand this tutorial, you should know | ||
* '''Data frames''' in '''R''' | * '''Data frames''' in '''R''' | ||
− | * '''R script '''in '''RStudio ''' | + | * '''R script''' in '''RStudio ''' |
− | * How to set working directory in '''RStudio''' | + | * How to set '''working directory''' in '''RStudio''' |
+ | |||
If not, please locate the relevant tutorials on '''R''' on this website. | If not, please locate the relevant tutorials on '''R''' on this website. | ||
|- | |- | ||
|| Show slide | || Show slide | ||
− | System Specifications | + | '''System Specifications''' |
− | || This tutorial is recorded on | + | || This tutorial is recorded on, |
− | * '''Ubuntu Linux '''OS version | + | * '''Ubuntu Linux''' OS version 16.04 |
− | * '''R '''version | + | * '''R''' version 3.4.4 |
− | * '''RStudio''' version | + | * '''RStudio''' version 1.1.456 |
− | Install '''R''' version | + | Install '''R''' version 3.2.0 or higher. |
|- | |- | ||
|| Show slide | || Show slide | ||
− | Download files | + | '''Download files''' |
|| For this tutorial, we will use, | || For this tutorial, we will use, | ||
* five '''data frames ''' in different formats and | * five '''data frames ''' in different formats and | ||
Line 54: | Line 55: | ||
|| [Computer screen] | || [Computer screen] | ||
− | Highlight '''data frames '''and''' myDataSet.R '''in the folder '''myProject''' | + | Highlight '''data frames''' and '''myDataSet.R''' in the folder '''myProject''' |
|| I have downloaded these files from '''Code files''' link. | || I have downloaded these files from '''Code files''' link. | ||
And moved them to '''DataMerging '''folder in '''myProject''' folder on the '''Desktop'''. | And moved them to '''DataMerging '''folder in '''myProject''' folder on the '''Desktop'''. | ||
− | I have also set this folder as my '''Working Directory | + | I have also set this folder as my '''Working Directory'''. |
|- | |- | ||
|| | || | ||
Line 67: | Line 68: | ||
Point to''' myDataSet.R''' in '''Rstudio''' | Point to''' myDataSet.R''' in '''Rstudio''' | ||
− | || Open the '''script myDataSet.R''' in '''RStudio | + | || Open the '''script myDataSet.R''' in '''RStudio'''. |
For this, click on the '''script myDataSet.R'''. | For this, click on the '''script myDataSet.R'''. | ||
− | '''Script myDataSet.R '''opens in''' RStudio.''' | + | '''Script myDataSet.R''' opens in '''RStudio.''' |
|- | |- | ||
|| Highlight the '''Source''' button | || Highlight the '''Source''' button | ||
Line 82: | Line 83: | ||
Highlight '''captaincyOne''' in the '''Source''' window | Highlight '''captaincyOne''' in the '''Source''' window | ||
− | || We will use some built-in functions of '''R''' to explore''' captaincyOne'''. | + | || We will use some '''built-in functions''' of '''R''' to explore''' captaincyOne'''. |
− | For all the built-in functions used in this tutorial, please refer to the '''Additional Material'''. | + | For all the '''built-in functions''' used in this tutorial, please refer to the '''Additional Material'''. |
|- | |- | ||
− | || | + | || Cursor on the interface. |
− | || First, we will use '''summary '''function. | + | || First, we will use '''summary''' function. |
|- | |- | ||
|| Highlight '''myDataSet.R '''in the '''Source''' window | || Highlight '''myDataSet.R '''in the '''Source''' window | ||
Line 97: | Line 98: | ||
Highlight '''Source''' button | Highlight '''Source''' button | ||
− | || In the '''Source '''window, type '''summary '''and then '''captaincyOne '''in parentheses. | + | || In the '''Source''' window, type '''summary''' and then '''captaincyOne '''in parentheses. |
− | Save the '''script '''and run the current line by pressing '''Ctrl+Enter''' keys simultaneously. | + | Save the '''script''' and run the current line by pressing '''Ctrl+Enter''' keys simultaneously. |
|- | |- | ||
|| Highlight the output in the '''Console '''window | || Highlight the output in the '''Console '''window | ||
|| In the '''Console''' window, scroll up to locate the output. | || In the '''Console''' window, scroll up to locate the output. | ||
− | Statistical parameters for each column of '''captaincyOne '''are shown on the '''Console'''. | + | '''Statistical parameters''' for each column of '''captaincyOne''' are shown on the '''Console'''. |
|- | |- | ||
Line 110: | Line 111: | ||
|| In the '''Source''' window, press '''Enter'''. | || In the '''Source''' window, press '''Enter'''. | ||
− | Press '''Enter '''at the end of every command. | + | Press '''Enter''' at the end of every command. |
|- | |- | ||
|| | || | ||
− | || Now, let us look at '''class ''' | + | || Now, let us look at '''class function'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''class(captaincyOne)''' | '''class(captaincyOne)''' | ||
− | || In the '''Source '''window, type '''class '''and then '''captaincyOne '''in parentheses. | + | || In the '''Source''' window, type '''class''' and then '''captaincyOne''' in parentheses. |
Save the '''script '''and run the current line. | Save the '''script '''and run the current line. | ||
|- | |- | ||
|| Highlight the output in the '''Console '''window | || Highlight the output in the '''Console '''window | ||
− | || '''class ''' | + | || '''class function''' returns the class of '''captaincyOne''', which is '''data frame'''. |
|- | |- | ||
− | || | + | || Point to '''Source''' window. |
− | || Next let us look at '''typeof ''' | + | || Next let us look at '''typeof function'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''typeof(captaincyOne)''' | '''typeof(captaincyOne)''' | ||
− | || In the '''Source '''window, type '''typeof '''and then '''captaincyOne '''in parentheses. | + | || In the '''Source '''window, type '''typeof''' and then '''captaincyOne''' in parentheses. |
− | Save the '''script '''and run the current line. | + | Save the '''script''' and run the current line. |
|- | |- | ||
− | || Highlight the output in the '''Console '''window | + | || Highlight the output in the '''Console''' window |
− | || '''typeof ''' | + | || '''typeof function''' returns the storage type of '''captaincyOne''', which is '''list'''. |
|- | |- | ||
|| Highlight '''typeof''' in the '''Source''' window | || Highlight '''typeof''' in the '''Source''' window | ||
− | || To know more about '''typeof''' | + | || To know more about '''typeof function''', we will access the '''help''' section of '''RStudio'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Line 148: | Line 149: | ||
|- | |- | ||
|| Highlight '''Description''' in the '''help''' window | || Highlight '''Description''' in the '''help''' window | ||
− | || '''typeof''' determines the R internal type or storage mode of any object. | + | || '''typeof''' determines the '''R internal type''' or '''storage mode''' of any '''object'''. |
|- | |- | ||
|| Highlight '''Files''' tab in the lower right of '''RStudio''' | || Highlight '''Files''' tab in the lower right of '''RStudio''' | ||
Line 154: | Line 155: | ||
|- | |- | ||
|| Highlight broom icon in the '''Console''' window | || Highlight broom icon in the '''Console''' window | ||
− | || Clear the '''Console '''window by clicking on the broom icon. | + | || Clear the '''Console''' window by clicking on the broom icon. |
|- | |- | ||
|| Highlight '''captaincyOne''' in the '''Source''' window | || Highlight '''captaincyOne''' in the '''Source''' window | ||
− | || Click on the | + | || Click on the '''data frame captaincyOne'''. |
|- | |- | ||
|| Highlight '''captaincyOne '''in '''Source '''window | || Highlight '''captaincyOne '''in '''Source '''window | ||
|| Now, let us extract two rows from top of '''captaincyOne'''. | || Now, let us extract two rows from top of '''captaincyOne'''. | ||
− | For this, we will use '''head ''' | + | For this, we will use '''head function'''. |
|- | |- | ||
|| Highlight '''myDataSet.R '''in the '''Source''' window | || Highlight '''myDataSet.R '''in the '''Source''' window | ||
− | || Click on the | + | || Click on the '''script myDataSet.R''' |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''head(captaincyOne, 2)''' | '''head(captaincyOne, 2)''' | ||
− | || In the '''Source '''window, type '''head '''within parentheses '''captaincyOne comma '''space 2. | + | || In the '''Source''' window, type '''head''' within parentheses '''captaincyOne comma '''space 2. |
− | Save the '''script '''and run the current line. | + | Save the '''script''' and run the current line. |
|- | |- | ||
|| Highlight the output in the '''Console '''window | || Highlight the output in the '''Console '''window | ||
− | || The top two rows of '''captaincyOne '''are shown on the '''Console''' window. | + | || The top two rows of '''captaincyOne''' are shown on the '''Console''' window. |
|- | |- | ||
|| Highlight '''captaincyOne''' in the '''Source''' window | || Highlight '''captaincyOne''' in the '''Source''' window | ||
− | || Click on the | + | || Click on the '''data frame captaincyOne'''. |
|- | |- | ||
|| Highlight '''CaptaincyOne''' in the '''Source''' window | || Highlight '''CaptaincyOne''' in the '''Source''' window | ||
|| Suppose we want to extract two rows from bottom of '''captaincyOne'''. | || Suppose we want to extract two rows from bottom of '''captaincyOne'''. | ||
− | For this, we will use the '''tail ''' | + | For this, we will use the '''tail function'''. |
|- | |- | ||
|| Highlight '''myDataSet.R '''in the '''Source''' window | || Highlight '''myDataSet.R '''in the '''Source''' window | ||
− | || Click on the | + | || Click on the '''script myDataSet.R''' |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''tail(captaincyOne, 2)''' | '''tail(captaincyOne, 2)''' | ||
− | || In the '''Source '''window, type '''tail '''within parentheses '''captaincyOne comma '''space 2. | + | || In the '''Source''' window, type '''tail '''within parentheses '''captaincyOne comma '''space 2. |
Save the '''script '''and run the current line. | Save the '''script '''and run the current line. | ||
|- | |- | ||
|| Highlight the output in the '''Console '''window | || Highlight the output in the '''Console '''window | ||
− | || The last two rows of '''captaincyOne '''are shown on the '''Console''' window. | + | || The last two rows of '''captaincyOne''' are shown on the '''Console''' window. |
|- | |- | ||
− | || | + | || Cursor on the interface. |
− | || Next, let us learn about '''str ''' | + | || Next, let us learn about '''str function'''. |
− | This function is used to display the structure of an '''R''' | + | This '''function''' is used to display the structure of an '''R object'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''str(captaincyOne)''' | '''str(captaincyOne)''' | ||
− | || In the '''Source '''window, type '''str '''within parentheses '''captaincyOne'''. | + | || In the '''Source''' window, type '''str '''within parentheses '''captaincyOne'''. |
− | Save the '''script '''and run the current line. | + | Save the '''script''' and run the current line. |
|- | |- | ||
|| Highlight the output in the '''Console '''window | || Highlight the output in the '''Console '''window | ||
Line 214: | Line 215: | ||
|- | |- | ||
|| | || | ||
− | || Now, we will look at merging of '''data frames'''. | + | || Now, we will look at '''merging''' of '''data frames'''. |
|- | |- | ||
|| Show slide | || Show slide | ||
− | Merging data frames | + | Merging '''data frames''' |
− | || | + | || '''Merging data frames '''has advantages like: |
− | * It makes data more available. | + | * It makes '''data''' more available. |
− | * It helps in improving data quality. | + | * It helps in improving '''data''' quality. |
− | * Combining similar data also reduces data complexity. | + | * Combining similar '''data''' also reduces data complexity. |
|- | |- | ||
Line 232: | Line 233: | ||
Highlight '''CaptaincyData.csv''' and '''CaptaincyData2.csv''' under '''Files''' tab''' ''' | Highlight '''CaptaincyData.csv''' and '''CaptaincyData2.csv''' under '''Files''' tab''' ''' | ||
− | || We will learn how to merge two '''data frames CaptaincyData.csv '''and '''CaptaincyData2.csv'''. | + | || We will learn how to '''merge''' two '''data frames CaptaincyData.csv''' and '''CaptaincyData2.csv'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Line 244: | Line 245: | ||
'''View(captaincyTwo)''' | '''View(captaincyTwo)''' | ||
− | || | + | || Now, type '''View '''within parentheses '''captaincyTwo.''' |
Save the '''script '''and run the last two lines. | Save the '''script '''and run the last two lines. | ||
Line 259: | Line 260: | ||
|- | |- | ||
|| Highlight '''captaincyOne '''in the '''Source '''window | || Highlight '''captaincyOne '''in the '''Source '''window | ||
− | || Now, we will update '''captaincyOne '''by adding information from '''captaincyTwo'''. | + | || Now, we will update '''captaincyOne''' by adding information from '''captaincyTwo'''. |
− | For this, we use '''merge ''' | + | For this, we use '''merge function'''. |
|- | |- | ||
|| Highlight '''myDataSet.R '''in the '''Source''' window | || Highlight '''myDataSet.R '''in the '''Source''' window | ||
|| Click on the script '''myDataSet.R''' | || Click on the script '''myDataSet.R''' | ||
+ | |||
+ | |- | ||
+ | || Drag the Source window. | ||
+ | || I am resizing the '''Source''' window. | ||
+ | |||
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names")''' | '''captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names")''' | ||
− | || In the '''Source''' window, type the following command | + | || In the '''Source''' window, type the following command. Press '''Enter'''. |
|- | |- | ||
|| Highlight '''by =''' '''"names" '''in the '''Source''' window | || Highlight '''by =''' '''"names" '''in the '''Source''' window | ||
− | || In the '''merge''' | + | || In the '''merge function''', we use column names by which we want to merge two '''data frames'''. |
Here, it is '''names'''. | Here, it is '''names'''. | ||
Line 284: | Line 290: | ||
|- | |- | ||
|| Highlight '''captaincyOne''' in the '''Source '''window | || Highlight '''captaincyOne''' in the '''Source '''window | ||
− | || The contents of the | + | || The contents of the updated '''captaincyOne''' appear in the '''Source''' window. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Highlight the tabs '''captaincyOne '''and '''captaincyTwo''' | Highlight the tabs '''captaincyOne '''and '''captaincyTwo''' | ||
− | || Close the two tabs '''captaincyOne '''and '''captaincyTwo'''. | + | || Close the two tabs '''captaincyOne''' and '''captaincyTwo'''. |
|- | |- | ||
|| Cursor on the interface. | || Cursor on the interface. | ||
− | || Now, we will learn how to import '''data''' of different formats in R. | + | || Now, we will learn how to import '''data''' of different '''formats''' in '''R'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Line 299: | Line 305: | ||
|| We shall add one comment first. | || We shall add one comment first. | ||
− | In the '''Source '''window, type | + | In the '''Source '''window, type '''#''' hash space '''Importing data in different formats'''. |
− | + | ||
− | '''#''' hash space '''Importing data in different formats'''. | + | |
|- | |- | ||
|| Highlight '''CaptaincyData.xml''' under '''Files''' tab''' ''' | || Highlight '''CaptaincyData.xml''' under '''Files''' tab''' ''' | ||
Line 310: | Line 314: | ||
Make sure that you are connected to '''Internet'''. | Make sure that you are connected to '''Internet'''. | ||
|- | |- | ||
− | || | + | || |
+ | || We need to install '''Ubuntu''' package '''libxml2-dev ''' before installing '''XML''' package. | ||
− | + | Information on how to install this package, is provided in the '''Additional Material'''. | |
− | + | ||
− | |||
− | |||
− | |||
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Click in the '''Console''' window | Click in the '''Console''' window | ||
− | || I have already installed '''libxml2-dev '''package. | + | || I have already installed '''libxml2-dev''' package. |
Hence, I will proceed for installing '''XML''' package now. | Hence, I will proceed for installing '''XML''' package now. | ||
Line 331: | Line 332: | ||
Highlight the red dot in the '''Console''' window | Highlight the red dot in the '''Console''' window | ||
− | || On the '''Console '''window, type '''install dot packages'''. | + | || On the '''Console '''window, type '''install dot packages'''. |
− | Now | + | Now, type '''XML''' inside double quotes and in parentheses. |
Press '''Enter'''. | Press '''Enter'''. | ||
− | We will wait until '''R''' installs the package. | + | We will wait until '''R''' installs the '''package'''. |
|- | |- | ||
|| | || | ||
− | || Then, we load this | + | || Then, we load this '''package '''using '''library function'''. |
|- | |- | ||
|| Highlight '''myDataSet.R '''in the '''Source''' window | || Highlight '''myDataSet.R '''in the '''Source''' window | ||
− | || Click on the | + | || Click on the '''script myDataSet.R''' |
|- | |- | ||
− | || Click at the top of the '''script | + | || Click at the top of the '''script myDataSet.R''' |
− | || Since we are loading a package, we will add it at the top of the '''script'''. | + | || Since we are loading a '''package''', we will add it at the top of the '''script'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''library(XML)''' | '''library(XML)''' | ||
− | || In the '''Source '''window, at the top of the '''script myDataSet.R''', type '''library '''and '''XML '''in parentheses'''. ''' | + | || In the '''Source '''window, scroll up. |
+ | |||
+ | Now, at the top of the '''script myDataSet.R''', type '''library '''and '''XML '''in parentheses'''. ''' | ||
Save the '''script '''and run this line. | Save the '''script '''and run this line. | ||
Line 360: | Line 363: | ||
'''xmldata <- xmlToDataFrame("CaptaincyData.xml")''' | '''xmldata <- xmlToDataFrame("CaptaincyData.xml")''' | ||
− | || Now, in the '''Source '''window, click on the next line after the | + | || Now, in the '''Source '''window, click on the next line after the '''comment Importing data in different formats'''. |
− | Type the following command and press '''Enter'''. | + | Type the following '''command''' and press '''Enter'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Line 378: | Line 381: | ||
|- | |- | ||
|| Highlight '''myDataSet.R '''in the '''Source''' window | || Highlight '''myDataSet.R '''in the '''Source''' window | ||
− | || Click on the | + | || Click on the '''script myDataSet.R''' |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
'''txtdata <- read.table("CaptaincyData.txt")''' | '''txtdata <- read.table("CaptaincyData.txt")''' | ||
− | || In the '''Source''' window, type the following command and press '''Enter'''. | + | || In the '''Source''' window, type the following '''command''' and press '''Enter'''. |
|- | |- | ||
|| | || | ||
Line 395: | Line 398: | ||
|- | |- | ||
|| Highlight '''txtdata '''in the '''Source '''window | || Highlight '''txtdata '''in the '''Source '''window | ||
− | || The contents of the ''' | + | || The contents of the '''txt''' file are shown. |
|- | |- | ||
|| Highlight '''CaptaincyData.xlsx''' under the '''Files''' tab | || Highlight '''CaptaincyData.xlsx''' under the '''Files''' tab | ||
− | || Now, we will learn how to import '''data''' from | + | || Now, we will learn how to import '''data''' from '''user interface''' of '''Rstudio'''. |
+ | |||
+ | I am resizing the '''Source''' window. | ||
We will import the '''Excel''' file '''CaptaincyData.xlsx''' using this method. | We will import the '''Excel''' file '''CaptaincyData.xlsx''' using this method. | ||
− | Please ensure that you have packages like '''readxl''' and '''Rcpp''' installed in your system. | + | Please ensure that you have '''packages''' like '''readxl''' and '''Rcpp''' installed in your system. |
|- | |- | ||
|| Highlight '''Environment '''tab | || Highlight '''Environment '''tab | ||
Line 418: | Line 423: | ||
|- | |- | ||
|| Highlight '''File/Url''' option | || Highlight '''File/Url''' option | ||
− | || You can select a '''file '''on your computer or type the ''' | + | || You can select a '''file '''on your computer or type the '''URL '''from which you want to load an '''Excel '''file. |
We will select a file on our computer. | We will select a file on our computer. | ||
Line 426: | Line 431: | ||
|- | |- | ||
|| Highlight '''CaptaincyData.xlsx''' in the folder '''myProject''' | || Highlight '''CaptaincyData.xlsx''' in the folder '''myProject''' | ||
− | || I will select the file '''CaptaincyData.xlsx''' located in '''DataMerging '''folder. | + | || I will select the file '''CaptaincyData.xlsx''' located in '''DataMerging''' folder. |
This folder is in '''myProject''' folder on the '''Desktop'''. | This folder is in '''myProject''' folder on the '''Desktop'''. | ||
− | Click '''Open '''to load this '''file'''. | + | Click '''Open''' to load this '''file'''. |
|- | |- | ||
|| Highlight '''Data Preview''' option | || Highlight '''Data Preview''' option | ||
− | || Below the field '''File/Url, RStudio '''shows the preview of the '''Excel '''file being imported. | + | || Below the field '''File/Url, RStudio '''shows the preview of the '''Excel''' file being imported. |
|- | |- | ||
|| Highlight '''Code Preview''' option | || Highlight '''Code Preview''' option | ||
− | || At the bottom right corner of this window, you can see the code for importing this ''' | + | || At the bottom right corner of this window, you can see the code for importing this '''Excel''' file. |
|- | |- | ||
|| Highlight '''Import '''button | || Highlight '''Import '''button | ||
Line 442: | Line 447: | ||
|- | |- | ||
|| Highlight '''CaptaincyData '''in the '''Source '''window | || Highlight '''CaptaincyData '''in the '''Source '''window | ||
− | || The contents of the '''Excel '''file are shown here. | + | || The contents of the '''Excel''' file are shown here. |
|- | |- | ||
|| | || | ||
Line 451: | Line 456: | ||
Summary | Summary | ||
|| In this tutorial, we have learnt how to: | || In this tutorial, we have learnt how to: | ||
− | * Use built-in functions for exploring a '''data frame''' | + | * Use '''built-in functions''' for exploring a '''data frame''' |
− | * Merge two '''data frames''' | + | * '''Merge''' two '''data frames''' |
− | * Import data in different formats in '''R''' | + | * Import '''data''' in different '''formats''' in '''R''' |
|- | |- | ||
Line 460: | Line 465: | ||
Assignment | Assignment | ||
|| We now suggest an assignment. | || We now suggest an assignment. | ||
− | * Using | + | * Using '''built-in dataset iris''', implement all the '''functions''' we have learnt in this tutorial. |
|- | |- | ||
|| Show slide | || Show slide | ||
Line 479: | Line 484: | ||
Forum to answer questions | Forum to answer questions | ||
− | || | + | || Please post your timed queries in this forum. |
|- | |- | ||
|| Show Slide | || Show Slide | ||
Forum to answer questions | Forum to answer questions | ||
− | || | + | || Please post your general queries in this forum. |
|- | |- | ||
|| Show Slide | || Show Slide |
Latest revision as of 18:05, 2 April 2019
Title of script: Merging and Importing Data
Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, data merge, data import, video tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to the spoken tutorial on Merging and Importing Data |
Show slide
Learning Objectives |
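In this tutorial, we will learn how to:
* Use built-in functions for exploring a data frame
* Merge two data frames
* Import data in different formats in R |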
Show slide
Pre-requisites |
To understand this tutorial, you should know
* Data frames in R
* R script in RStudio
* How to set working directory in RStudio
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
* Ubuntu Linux OS version 16.04
* R version 3.4.4
* RStudio version 1.1.456
Install R version 3.2.0 or higher. |
Show slide
Download files |
For this tutorial, we will use,
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight data frames and myDataSet.R in the folder myProject |
I have downloaded these files from Code files link.
And moved them to DataMerging folder in myProject folder on the Desktop. I have also set this folder as my Working Directory. |
Let us switch to RStudio. | |
Click myDataSet.R in RStudio
Point to myDataSet.R in RStudio |
Open the script myDataSet.R in RStudio.
For this, click on the script myDataSet.R. Script myDataSet.R opens in RStudio. |
Highlight the Source button | Run this script by clicking on Source button. |
Highlight captaincyOne in the Source window | captaincyOne appears in the Source window. |
[RStudio]
Highlight captaincyOne in the Source window |
We will use some built-in functions of R to explore captaincyOne.
For all the built-in functions used in this tutorial, please refer to the Additional Material. |
Cursor on the interface. | First, we will use summary function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
summary(captaincyOne) Highlight Source button |
In the Source window, type summary and then captaincyOne in parentheses.
Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
Highlight the output in the Console window | In the Console window, scroll up to locate the output.
Statistical parameters for each column of captaincyOne are shown on the Console. |
Highlight summary(captaincyOne) in the Source window | In the Source window, press Enter.
Press Enter at the end of every command. |
Now, let us look at class function. | |
[RStudio]
class(captaincyOne) |
In the Source window, type class and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | class function returns the class of captaincyOne, which is data frame. |
Point to Source window. | Next let us look at typeof function. |
[RStudio]
typeof(captaincyOne) |
In the Source window, type typeof and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | typeof function returns the storage type of captaincyOne, which is list. |
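The three exploration calls used so far can be tried without the tutorial's files. The sketch below is only an illustration: the data frame captains and all its values are invented stand-ins for captaincyOne, whose actual columns come from the downloaded code files.

```r
# Hypothetical stand-in for captaincyOne; names and numbers are invented.
captains <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(10, 20, 30),
  won    = c(4, 9, 15),
  stringsAsFactors = FALSE
)

# Per-column statistics (minimum, quartiles, mean, maximum for numeric columns).
print(summary(captains))

# class() reports the high-level class, typeof() the internal storage type.
frame_class <- class(captains)
frame_type  <- typeof(captains)
print(frame_class)
print(frame_type)
```

A data frame is stored internally as a list of equal-length columns, which is why typeof reports list while class reports data.frame.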
Highlight typeof in the Source window | To know more about typeof function, we will access the help section of RStudio. |
[RStudio]
help(typeof) |
In the Console window, type help, within parentheses typeof. Press Enter. |
Highlight Description in the help window | typeof determines the R internal type or storage mode of any object. |
Highlight Files tab in the lower right of RStudio | Click on the Files tab. |
Highlight broom icon in the Console window | Clear the Console window by clicking on the broom icon. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in Source window | Now, let us extract two rows from top of captaincyOne.
For this, we will use head function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
head(captaincyOne, 2) |
In the Source window, type head within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The top two rows of captaincyOne are shown on the Console window. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in the Source window | Suppose we want to extract two rows from the bottom of captaincyOne.
For this, we will use the tail function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
tail(captaincyOne, 2) |
In the Source window, type tail within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The last two rows of captaincyOne are shown on the Console window. |
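As a small sketch of head and tail, assuming an invented four-row data frame named captains in place of captaincyOne:

```r
# Hypothetical data; the real captaincyOne comes from the tutorial's code files.
captains <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# The second argument is the number of rows to keep.
first_two <- head(captains, 2)  # rows 1 and 2
last_two  <- tail(captains, 2)  # rows 3 and 4

print(first_two)
print(last_two)
```

If the second argument is omitted, both functions return six rows by default.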
Cursor on the interface. | Next, let us learn about str function.
This function is used to display the structure of an R object. |
[RStudio]
str(captaincyOne) |
In the Source window, type str within parentheses captaincyOne.
Save the script and run the current line. |
Highlight the output in the Console window | The structural details of captaincyOne are shown on the Console. |
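A minimal sketch of str on an invented data frame (captains is a hypothetical stand-in for captaincyOne). The output is captured in a variable here only so that it can be inspected; calling str(captains) directly prints the same text.

```r
# Hypothetical data; column names and values are assumptions.
captains <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# str() prints one line for the object and one line per column,
# giving each column's type and its first few values.
structure_lines <- capture.output(str(captains))
print(structure_lines)
```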
Now, we will look at merging of data frames. | |
Show slide
Merging data frames |
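Merging data frames has advantages like:
* It makes data more available.
* It helps in improving data quality.
* Combining similar data also reduces data complexity. |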
Let us switch to RStudio. | |
[RStudio]
Highlight CaptaincyData.csv and CaptaincyData2.csv under Files tab |
We will learn how to merge two data frames CaptaincyData.csv and CaptaincyData2.csv. |
[RStudio]
captaincyTwo <- read.csv("CaptaincyData2.csv") |
We will declare a variable captaincyTwo to read and store the contents of CaptaincyData2.csv.
In the Source window, type the following command and press Enter. |
[RStudio]
View(captaincyTwo) |
Now, type View within parentheses captaincyTwo.
Save the script and run the last two lines. |
Highlight captaincyTwo in Source window | The contents of captaincyTwo appear in the Source window. |
Highlight the name of captains in captaincyTwo
Highlight the column drawn in captaincyTwo |
This data frame has the same captains as that in captaincyOne.
However, it has different information about them like the number of matches drawn. |
Highlight captaincyOne in the Source window | Now, we will update captaincyOne by adding information from captaincyTwo.
For this, we use merge function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
Drag the Source window. | I am resizing the Source window. |
[RStudio]
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names") |
In the Source window, type the following command. Press Enter. |
Highlight by = "names" in the Source window | In the merge function, we specify the name of the column by which we want to merge the two data frames.
Here, it is names. |
[RStudio]
View(captaincyOne) |
Now, type View and captaincyOne in parentheses.
Save the script and run these two lines. |
Highlight captaincyOne in the Source window | The contents of the updated captaincyOne appear in the Source window. |
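The merge step can be sketched with two small invented data frames. The names one and two, the columns played and drawn, and all values are assumptions for illustration; only the shared key column names and the by = "names" argument follow the tutorial.

```r
# Two hypothetical data frames that share the key column "names".
one <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(10, 20, 30),
  stringsAsFactors = FALSE
)
two <- data.frame(
  names = c("Captain C", "Captain A", "Captain B"),  # different row order
  drawn = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Rows are matched on the key column, not on their position.
merged <- merge(one, two, by = "names")
print(merged)
```

By default merge keeps only the rows whose key appears in both data frames and sorts the result by the key. Passing all = TRUE keeps unmatched rows as well and fills the missing values with NA.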
[RStudio]
Highlight the tabs captaincyOne and captaincyTwo |
Close the two tabs captaincyOne and captaincyTwo. |
Cursor on the interface. | Now, we will learn how to import data of different formats in R. |
[RStudio]
# Importing data in different formats |
We shall add one comment first.
In the Source window, type # hash space Importing data in different formats. |
Highlight CaptaincyData.xml under Files tab | Now, let us import CaptaincyData.xml file.
For that, we need to install the XML package. Make sure that you are connected to the Internet. |
We need to install the Ubuntu package libxml2-dev before installing the XML package.
Information on how to install this package is provided in the Additional Material. | |
[RStudio]
Click in the Console window |
I have already installed libxml2-dev package.
Hence, I will proceed to install the XML package now. |
[RStudio]
install.packages("XML") Highlight the red dot in the Console window |
In the Console window, type install dot packages.
Then type XML inside double quotes, within parentheses. Press Enter. We will wait until R installs the package. |
Then, we load this package using the library function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
Click at the top of the script myDataSet.R | Since we are loading a package, we will add it at the top of the script. |
[RStudio]
library(XML) |
In the Source window, scroll up.
Now, at the top of the script myDataSet.R, type library and XML in parentheses. Save the script and run this line. |
[RStudio]
Point to the comment. xmldata <- xmlToDataFrame("CaptaincyData.xml") |
Now, in the Source window, click on the next line after the comment Importing data in different formats.
Type the following command and press Enter. |
[RStudio]
View(xmldata) |
Then type View and xmldata in parentheses.
Save the script and run these two lines. |
Highlight xmldata in the Source window | The contents of the xml file are shown here. |
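The following is a minimal sketch of the same import, assuming the XML package is already installed. Because the layout of CaptaincyData.xml is not shown here, it writes a small made-up file to a temporary location. The element names and values are hypothetical.

```r
# Assumes the XML package is already installed.
library(XML)

# Write a small, made-up XML file; each <captain> element becomes one row.
tmp <- tempfile(fileext = ".xml")
writeLines(c(
  "<captains>",
  "  <captain><names>CaptainA</names><played>60</played></captain>",
  "  <captain><names>CaptainB</names><played>48</played></captain>",
  "</captains>"
), tmp)

# Child element names become column names; values are read as text.
xmldata <- xmlToDataFrame(tmp, stringsAsFactors = FALSE)

# Convert the text column to numbers before doing arithmetic on it.
xmldata$played <- as.numeric(xmldata$played)
str(xmldata)
unlink(tmp)
```

Unless colClasses is supplied, xmlToDataFrame returns every column as text, so numeric columns usually need an explicit conversion like the one above.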
Highlight CaptaincyData.txt under Files tab | Next let us learn how to import CaptaincyData.txt. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
txtdata <- read.table("CaptaincyData.txt") |
In the Source window, type the following command and press Enter. |
[RStudio] View(txtdata) |
Next, type View and txtdata in parentheses.
Save the script and run these two lines. |
Highlight txtdata in the Source window | The contents of the txt file are shown. |
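Whether CaptaincyData.txt starts with a header line is not stated here, so the sketch below uses a small made-up file to show how the header argument of read.table changes the result. The file contents are hypothetical.

```r
# Write a small, made-up whitespace-separated file to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "names played won",
  "CaptainA 60 27",
  "CaptainB 48 28"
), tmp)

# Default call: the first line is treated as data here,
# and the columns are named V1, V2 and V3.
noHeader <- read.table(tmp)

# header = TRUE uses the first line as the column names instead.
withHeader <- read.table(tmp, header = TRUE)

str(withHeader)
unlink(tmp)
```

If the imported data frame shows V1, V2 and so on as column names and the real names as the first row, passing header = TRUE is the usual fix.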
Highlight CaptaincyData.xlsx under the Files tab | Now, we will learn how to import data using the user interface of RStudio.
I am resizing the Source window. We will import the Excel file CaptaincyData.xlsx using this method. Please ensure that packages like readxl and Rcpp are installed on your system. |
Highlight Environment tab | In the top right corner of RStudio, click on the Environment tab. |
Highlight Import Dataset button
Highlight From Excel option |
In the Environment tab, click on Import Dataset.
From the drop-down menu, select From Excel. |
Highlight Import Excel Data window | A window named Import Excel Data appears. |
Highlight File/Url option | You can select a file on your computer or type the URL from which you want to load an Excel file.
We will select a file on our computer. |
Highlight Browse option | In the upper right corner of this window, near File/Url text field, click on Browse. |
Highlight CaptaincyData.xlsx in the folder myProject | I will select the file CaptaincyData.xlsx located in the DataMerging folder.
This folder is in the myProject folder on the Desktop. Click Open to load this file. |
Highlight Data Preview option | Below the field File/Url, RStudio shows the preview of the Excel file being imported. |
Highlight Code Preview option | At the bottom right corner of this window, you can see the code for importing this Excel file. |
Highlight Import button | Finally, click on the Import button. |
Highlight CaptaincyData in the Source window | The contents of the Excel file are shown here. |
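The code shown in Code Preview typically calls read_excel from the readxl package, so the same import can also be run from a script. The sketch below assumes readxl is installed and, since CaptaincyData.xlsx is not available here, uses a sample workbook bundled with readxl as a stand-in.

```r
# Assumes the readxl package is already installed.
library(readxl)

# readxl_example() returns the path to a sample workbook shipped with readxl;
# it stands in here for CaptaincyData.xlsx.
path <- readxl_example("datasets.xlsx")

# List the sheet names, then read the first sheet into a data frame.
sheets <- excel_sheets(path)
exceldata <- read_excel(path, sheet = 1)

print(sheets)
head(exceldata)
```

To import the tutorial's file this way, replace path with the location of CaptaincyData.xlsx on your computer.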
Let us summarize what we have learnt. | |
Show Slide
Summary |
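In this tutorial, we have learnt how to:
* Use built-in functions for exploring a data frame
* Merge two data frames
* Import data in different formats in R |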
Show Slide
Assignment |
We now suggest an assignment.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgement |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |