Difference between revisions of "R/C2/Merging-and-Importing-Data/English"
Sudhakarst (Talk | contribs) |
Nancyvarkey (Talk | contribs) |
Revision as of 18:35, 20 March 2019
Title of script: Merging and Importing Data
Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, data merge, data import, video tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to the spoken tutorial on Merging and Importing Data |
Show slide
Learning Objectives |
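In this tutorial, we will learn how to:
* Use built-in functions for exploring a data frame
* Merge two data frames
* Import data in different formats in R |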
Show slide
Pre-requisites |
To understand this tutorial, you should know
* Data frames in R
* R script in RStudio
* How to set working directory in RStudio
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
Install R version 3.2.0 or higher. |
Show slide
Download files |
For this tutorial, we will use,
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight data frames and myDataSet.R in the folder myProject |
I have downloaded these files from the Code files link
and moved them to the DataMerging folder inside the myProject folder on the Desktop. I have also set this folder as my Working Directory. |
Let us switch to RStudio. | |
Click myDataSet.R in RStudio
Point to myDataSet.R in RStudio |
Open the script myDataSet.R in RStudio.
For this, click on the script myDataSet.R. Script myDataSet.R opens in RStudio. |
Highlight the Source button | Run this script by clicking on Source button. |
Highlight captaincyOne in the Source window | captaincyOne appears in the Source window. |
[RStudio]
Highlight captaincyOne in the Source window |
We will use some built-in functions of R to explore captaincyOne.
For all the built-in functions used in this tutorial, please refer to the Additional Material. |
First, we will use summary function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
summary(captaincyOne) Highlight Source button |
In the Source window, type summary and then captaincyOne in parentheses.
Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
Highlight the output in the Console window | In the Console window, scroll up to locate the output.
Statistical parameters for each column of captaincyOne are shown on the Console. |
Highlight summary(captaincyOne) in the Source window | In the Source window, press Enter.
Press Enter at the end of every command. |
Now, let us look at class function. | |
[RStudio]
class(captaincyOne) |
In the Source window, type class and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | class function returns the class of captaincyOne, which is data frame. |
Next let us look at typeof function. | |
[RStudio]
typeof(captaincyOne) |
In the Source window, type typeof and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | typeof function returns the storage type of captaincyOne, which is list. |
Highlight typeof in the Source window | To know more about typeof function, we will access the help section of RStudio. |
[RStudio]
help(typeof) |
In the Console window, type help, within parentheses typeof. Press Enter. |
Highlight Description in the help window | typeof determines the R internal type or storage mode of any object. |
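The three calls used so far can be tried on any data frame. The sketch below uses a small made-up data frame, since the columns of CaptaincyData.csv are not listed in this script; the name demo and its columns played and won are assumptions for illustration only.

```r
# Hypothetical stand-in for captaincyOne; the real data comes from CaptaincyData.csv.
demo <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(60, 47, 25),
  won    = c(27, 14, 8),
  stringsAsFactors = FALSE
)

# summary() reports min, quartiles, median, mean and max for numeric columns,
# and length/class/mode for character columns.
print(summary(demo))

# class() gives the high-level class of the object.
print(class(demo))   # "data.frame"

# typeof() gives the internal storage type: a data frame is stored as a list of columns.
print(typeof(demo))  # "list"
```

The difference between the last two results is the point of the comparison: class describes how R treats the object, while typeof describes how it is stored.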
Highlight Files tab in the lower right of RStudio | Click on the Files tab. |
Highlight broom icon in the Console window | Clear the Console window by clicking on the broom icon. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in Source window | Now, let us extract two rows from the top of captaincyOne.
For this, we will use the head function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
head(captaincyOne, 2) |
In the Source window, type head within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The top two rows of captaincyOne are shown on the Console window. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in the Source window | Suppose we want to extract two rows from the bottom of captaincyOne.
For this, we will use the tail function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
tail(captaincyOne, 2) |
In the Source window, type tail within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The last two rows of captaincyOne are shown on the Console window. |
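As a minimal sketch of head and tail, the code below applies both to a made-up four-row data frame. The name demo and its values are hypothetical and are not taken from CaptaincyData.csv.

```r
# Hypothetical four-row data frame used only for illustration.
demo <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(60, 47, 25, 49),
  stringsAsFactors = FALSE
)

# The second argument is the number of rows to keep.
top_two    <- head(demo, 2)  # first two rows
bottom_two <- tail(demo, 2)  # last two rows

print(top_two)
print(bottom_two)
```

If the second argument is left out, both functions return six rows by default.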
Next, let us learn about str function.
This function is used to display the structure of an R object. | |
[RStudio]
str(captaincyOne) |
In the Source window, type str within parentheses captaincyOne.
Save the script and run the current line. |
Highlight the output in the Console window | The structural details of captaincyOne are shown on the Console. |
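The kind of output str produces can be seen on a small made-up data frame. This is only a sketch; demo and its columns are assumptions, not the contents of captaincyOne.

```r
# Hypothetical data frame used only for illustration.
demo <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(60, 47, 25),
  stringsAsFactors = FALSE
)

# str() prints one header line (object class, number of rows and columns)
# followed by one line per column with its type and first few values.
str(demo)
```

Because str prints its result instead of returning it, it is mainly useful for a quick look at column types before further processing.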
Now, we will look at merging of data frames. | |
Show slide
Merging data frames |
Merging data frames has advantages like:
|
Let us switch to RStudio. | |
[RStudio]
Highlight CaptaincyData.csv and CaptaincyData2.csv under Files tab |
We will learn how to merge the two data frames read from CaptaincyData.csv and CaptaincyData2.csv. |
[RStudio]
captaincyTwo <- read.csv("CaptaincyData2.csv") |
We will read CaptaincyData2.csv and store it in a variable named captaincyTwo.
In the Source window, type the following command and press Enter. |
[RStudio]
View(captaincyTwo) |
Now, type View within parentheses captaincyTwo.
Save the script and run the last two lines. |
Highlight captaincyTwo in Source window | The contents of captaincyTwo appear in the Source window. |
Highlight the name of captains in captaincyTwo
Highlight the column drawn in captaincyTwo |
This data frame has the same captains as captaincyOne.
However, it holds different information about them, such as the number of matches drawn. |
Highlight captaincyOne in the Source window | Now, we will update captaincyOne by adding information from captaincyTwo.
For this, we use merge function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
I am resizing the Source window. | |
[RStudio]
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names") |
In the Source window, type the following command. Press Enter. |
Highlight by = "names" in the Source window | In the merge function, the by argument gives the name of the column on which the two data frames are matched.
Here, it is names. |
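The effect of merging on a shared column can be sketched with two small made-up data frames. The column names follows the tutorial and drawn is mentioned for captaincyTwo; the variable names one and two, the column played and all values are hypothetical.

```r
# Hypothetical stand-ins for captaincyOne and captaincyTwo.
one <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(60, 47, 25),
  stringsAsFactors = FALSE
)
two <- data.frame(
  names = c("Captain C", "Captain A", "Captain B"),  # same captains, different order
  drawn = c(5, 15, 11),
  stringsAsFactors = FALSE
)

# Rows are matched on the values in the "names" column, not on row position.
merged <- merge(one, two, by = "names")
print(merged)
```

By default, merge keeps only the rows whose key appears in both data frames and sorts the result by the key. Passing all = TRUE keeps unmatched rows as well and fills the missing values with NA.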
[RStudio]
View(captaincyOne) |
Now, type View and captaincyOne in parentheses.
Save the script and run these two lines. |
Highlight captaincyOne in the Source window | The contents of the updated captaincyOne appear in the Source window. |
[RStudio]
Highlight the tabs captaincyOne and captaincyTwo |
Close the two tabs captaincyOne and captaincyTwo. |
Cursor on the interface. | Now, we will learn how to import data of different formats in R. |
[RStudio]
# Importing data in different formats |
We shall add one comment first.
In the Source window, type # hash space Importing data in different formats. |
Highlight CaptaincyData.xml under Files tab | Now, let us import CaptaincyData.xml file.
For that, we need to install the XML package. Make sure that you are connected to the Internet. |
We need to install the Ubuntu package libxml2-dev before installing the XML package.
Information on how to install this package is provided in the Additional Material. | |
[RStudio]
Click in the Console window |
I have already installed the libxml2-dev package.
Hence, I will proceed to install the XML package now. |
[RStudio]
install.packages("XML") Highlight the red dot in the Console window |
In the Console window, type install dot packages.
Now, type XML inside double quotes and in parentheses. Press Enter. We will wait until R installs the package. |
Then, we load this package using library function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
Click at the top of the script myDataSet.R | Since we are loading a package, we will add it at the top of the script. |
[RStudio]
library(XML) |
In the Source window, scroll up.
Now, at the top of the script myDataSet.R, type library and XML in parentheses. Save the script and run this line. |
[RStudio]
Point to the comment. xmldata <- xmlToDataFrame("CaptaincyData.xml") |
Now, in the Source window, click on the next line after the comment Importing data in different formats.
Type the following command and press Enter. |
[RStudio]
View(xmldata) |
Then type View and xmldata in parentheses.
Save the script and run these two lines. |
Highlight xmldata in the Source window | The contents of the xml file are shown here. |
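To see what xmlToDataFrame expects, the sketch below writes a tiny made-up XML file to a temporary location and reads it back. It assumes the XML package is installed; the element names and values are hypothetical and the layout of the real CaptaincyData.xml may differ.

```r
# Runs only if the XML package is available.
if (requireNamespace("XML", quietly = TRUE)) {
  # Hypothetical file: each child of the root element is one record,
  # and each element inside a record becomes one column.
  xml_path <- tempfile(fileext = ".xml")
  writeLines(paste0(
    "<captains>",
    "<captain><names>Captain A</names><played>60</played></captain>",
    "<captain><names>Captain B</names><played>47</played></captain>",
    "</captains>"
  ), xml_path)

  xml_demo <- XML::xmlToDataFrame(xml_path)
  print(xml_demo)
  str(xml_demo)
}
```

Unless column classes are specified, the values typically arrive as text, so numeric columns may need a conversion such as as.numeric before calculations.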
Highlight CaptaincyData.txt under Files tab | Next let us learn how to import CaptaincyData.txt. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
txtdata <- read.table("CaptaincyData.txt") |
In the Source window, type the following command and press Enter. |
[RStudio] View(txtdata) |
Next, type View and txtdata in parentheses.
Save the script and run these two lines. |
Highlight txtdata in the Source window | The contents of the txt file are shown. |
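The script does not say whether CaptaincyData.txt has a header row, so the following sketch shows how read.table behaves in both cases. The file written here is a made-up stand-in with hypothetical columns.

```r
# Hypothetical whitespace-separated file with a header row.
txt_path <- tempfile(fileext = ".txt")
writeLines(c(
  "names played won",
  "CaptainA 60 27",
  "CaptainB 47 14"
), txt_path)

# Without header = TRUE, the first line is read as data here and the
# columns get the default names V1, V2, V3.
no_header <- read.table(txt_path, stringsAsFactors = FALSE)
print(no_header)

# With header = TRUE, the first line supplies the column names.
with_header <- read.table(txt_path, header = TRUE, stringsAsFactors = FALSE)
print(with_header)
```

If the imported data frame shows column names as its first row, adding header = TRUE to the call is the usual fix.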
Highlight CaptaincyData.xlsx under the Files tab | Now, we will learn how to import data from the user interface of RStudio.
I am resizing the Source window. We will import the Excel file CaptaincyData.xlsx using this method. Please ensure that you have packages like readxl and Rcpp installed on your system. |
Highlight Environment tab | In the top right corner of RStudio, click on the Environment tab. |
Highlight Import Dataset button
Highlight From Excel option |
In the Environment tab, click on Import Dataset.
From the drop-down menu, select From Excel. |
Highlight Import Excel Data window | A window named Import Excel Data appears. |
Highlight File/Url option | You can select a file on your computer or type the URL from which you want to load an Excel file.
We will select a file on our computer. |
Highlight Browse option | In the upper right corner of this window, near File/Url text field, click on Browse. |
Highlight CaptaincyData.xlsx in the folder myProject | I will select the file CaptaincyData.xlsx located in DataMerging folder.
This folder is in myProject folder on the Desktop. Click Open to load this file. |
Highlight Data Preview option | Below the field File/Url, RStudio shows the preview of the Excel file being imported. |
Highlight Code Preview option | At the bottom right corner of this window, you can see the code for importing this Excel file. |
Highlight Import button | Finally, click on the Import button. |
Highlight CaptaincyData in the Source window | The contents of the Excel file are shown here. |
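The code shown in the Code Preview is typically a call to read_excel from the readxl package, which can also be typed into a script directly. The sketch below assumes readxl is installed and uses the example workbook bundled with that package instead of CaptaincyData.xlsx, which is not available here; the name excel_demo is hypothetical.

```r
# Runs only if the readxl package is available.
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook that ships with readxl (stand-in for CaptaincyData.xlsx).
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # read_excel() reads the first sheet unless a sheet argument is given.
  excel_demo <- readxl::read_excel(xlsx_path)
  print(head(excel_demo, 2))
}
```

To import the tutorial file the same way, the path would be replaced by "CaptaincyData.xlsx", provided the working directory is set to the folder that contains it.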
Let us summarize what we have learnt. | |
Show Slide
Summary |
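In this tutorial, we have learnt how to:
* Use built-in functions for exploring a data frame
* Merge two data frames
* Import data in different formats in R |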
Show Slide
Assignment |
We now suggest an assignment.
* Using the built-in dataset iris, implement all the functions we have learnt in this tutorial. |
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgement |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |