Latest revision as of 18:05, 2 April 2019
Title of script: Merging and Importing Data
Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, data merge, data import, video tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to the spoken tutorial on Merging and Importing Data |
Show slide
Learning Objectives |
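In this tutorial, we will learn how to:
* Use built-in functions for exploring a data frame
* Merge two data frames
* Import data in different formats in R
|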
Show slide
Pre-requisites |
To understand this tutorial, you should know
* Data frames in R
* R script in RStudio
* How to set working directory in RStudio
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on,
* Ubuntu Linux OS version 16.04
* R version 3.4.4
* RStudio version 1.1.456
Install R version 3.2.0 or higher. |
Show slide
Download files |
For this tutorial, we will use,
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight data frames and myDataSet.R in the folder myProject |
I have downloaded these files from the Code files link.
I have moved them to the DataMerging folder in the myProject folder on the Desktop. I have also set this folder as my Working Directory. |
Let us switch to RStudio. | |
Click myDataSet.R in RStudio
Point to myDataSet.R in RStudio |
Open the script myDataSet.R in RStudio.
For this, click on the script myDataSet.R. Script myDataSet.R opens in RStudio. |
Highlight the Source button | Run this script by clicking on the Source button. |
Highlight captaincyOne in the Source window | captaincyOne appears in the Source window. |
[RStudio]
Highlight captaincyOne in the Source window |
We will use some built-in functions of R to explore captaincyOne.
For all the built-in functions used in this tutorial, please refer to the Additional Material. |
Cursor on the interface. | First, we will use the summary function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
summary(captaincyOne) Highlight Source button |
In the Source window, type summary and then captaincyOne in parentheses.
Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
Highlight the output in the Console window | In the Console window, scroll up to locate the output.
Statistical parameters for each column of captaincyOne are shown on the Console. |
Highlight summary(captaincyOne) in the Source window | In the Source window, press Enter.
Press Enter at the end of every command. |
Now, let us look at the class function. | |
[RStudio]
class(captaincyOne) |
In the Source window, type class and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The class function returns the class of captaincyOne, which is data.frame. |
Point to Source window. | Next, let us look at the typeof function. |
[RStudio]
typeof(captaincyOne) |
In the Source window, type typeof and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The typeof function returns the storage type of captaincyOne, which is list. |
Highlight typeof in the Source window | To know more about the typeof function, we will access the help section of RStudio. |
[RStudio]
help(typeof) |
In the Console window, type help, within parentheses typeof. Press Enter. |
Highlight Description in the help window | typeof determines the R internal type or storage mode of any object. |
Highlight Files tab in the lower right of RStudio | Click on the Files tab. |
Highlight broom icon in the Console window | Clear the Console window by clicking on the broom icon. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in the Source window | Now, let us extract the top two rows of captaincyOne.
For this, we will use the head function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
head(captaincyOne, 2) |
In the Source window, type head within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The top two rows of captaincyOne are shown on the Console window. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in the Source window | Suppose we want to extract the bottom two rows of captaincyOne.
For this, we will use the tail function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
tail(captaincyOne, 2) |
In the Source window, type tail within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The last two rows of captaincyOne are shown on the Console window. |
Cursor on the interface. | Next, let us learn about the str function.
This function is used to display the structure of an R object. |
[RStudio]
str(captaincyOne) |
In the Source window, type str within parentheses captaincyOne.
Save the script and run the current line. |
Highlight the output in the Console window | The structural details of captaincyOne are shown on the Console. |
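The exploration functions used so far can be tried together in one short sketch. Since the contents of captaincyOne are not listed in this script, the block below assumes a small made-up data frame called captains; its column names and values are hypothetical and serve only as an illustration.

```r
# Hypothetical stand-in for captaincyOne; the real columns and values may differ.
captains <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C", "Captain D"),
  played = c(10, 20, 30, 40),
  won    = c(4, 9, 15, 22),
  stringsAsFactors = FALSE
)

summary(captains)    # statistical parameters for each column
class(captains)      # class of the object: "data.frame"
typeof(captains)     # storage type of the object: "list"
head(captains, 2)    # top two rows
tail(captains, 2)    # bottom two rows
str(captains)        # compact view of the structure
```

Running each line with Ctrl+Enter, as in the tutorial, prints its result on the Console.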
Now, we will look at merging of data frames. | |
Show slide
Merging data frames |
Merging data frames has advantages like:
|
Let us switch to RStudio. | |
[RStudio]
Highlight CaptaincyData.csv and CaptaincyData2.csv under Files tab |
We will learn how to merge two data frames read from CaptaincyData.csv and CaptaincyData2.csv. |
[RStudio]
captaincyTwo <- read.csv("CaptaincyData2.csv") |
We will declare a variable captaincyTwo to read and store the contents of CaptaincyData2.csv.
In the Source window, type the following command and press Enter. |
[RStudio]
View(captaincyTwo) |
Now, type View within parentheses captaincyTwo.
Save the script and run the last two lines. |
Highlight captaincyTwo in Source window | The contents of captaincyTwo appear in the Source window. |
Highlight the name of captains in captaincyTwo
Highlight the column drawn in captaincyTwo |
This data frame has the same captains as captaincyOne.
However, it has different information about them, like the number of matches drawn. |
Highlight captaincyOne in the Source window | Now, we will update captaincyOne by adding information from captaincyTwo.
For this, we use the merge function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
Drag the Source window. | I am resizing the Source window. |
[RStudio]
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names") |
In the Source window, type the following command. Press Enter. |
Highlight by = "names" in the Source window | In the merge function, the by argument gives the name of the column on which we want to merge the two data frames.
Here, it is names. |
[RStudio]
View(captaincyOne) |
Now, type View and captaincyOne in parentheses.
Save the script and run these two lines. |
Highlight captaincyOne in the Source window | The contents of the updated captaincyOne appear in the Source window. |
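The merge step can be illustrated with a minimal sketch. The two data frames below, one and two, are hypothetical stand-ins for captaincyOne and captaincyTwo; only the shared column names is taken from the tutorial, and all values are made up.

```r
# Hypothetical stand-ins for captaincyOne and captaincyTwo.
one <- data.frame(
  names  = c("Captain A", "Captain B", "Captain C"),
  played = c(10, 20, 30),
  stringsAsFactors = FALSE
)
two <- data.frame(
  names = c("Captain C", "Captain A", "Captain B"),  # same captains, different order
  drawn = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Rows are matched on the shared column "names", not on their position.
merged <- merge(one, two, by = "names")
print(merged)
```

By default, merge keeps only the rows whose names value appears in both data frames, and it sorts the result on the by column.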
[RStudio]
Highlight the tabs captaincyOne and captaincyTwo |
Close the two tabs captaincyOne and captaincyTwo. |
Cursor on the interface. | Now, we will learn how to import data in different formats into R. |
[RStudio]
# Importing data in different formats |
We shall add one comment first.
In the Source window, type # hash space Importing data in different formats. |
Highlight CaptaincyData.xml under Files tab | Now, let us import the CaptaincyData.xml file.
For that, we need to install the XML package. Make sure that you are connected to the Internet. |
We need to install the Ubuntu package libxml2-dev before installing the XML package.
Information on how to install this package is provided in the Additional Material. | |
[RStudio]
Click in the Console window |
I have already installed the libxml2-dev package.
Hence, I will proceed to install the XML package now. |
[RStudio]
install.packages("XML") Highlight the red dot in the Console window |
In the Console window, type install dot packages.
Now, type XML inside double quotes, within parentheses. Press Enter. We will wait until R installs the package. |
Then, we load this package using the library function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
Click at the top of the script myDataSet.R | Since we are loading a package, we will add it at the top of the script. |
[RStudio]
library(XML) |
In the Source window, scroll up.
Now, at the top of the script myDataSet.R, type library and XML in parentheses. Save the script and run this line. |
[RStudio]
Point to the comment. xmldata <- xmlToDataFrame("CaptaincyData.xml") |
Now, in the Source window, click on the next line after the comment Importing data in different formats.
Type the following command and press Enter. |
[RStudio]
View(xmldata) |
Then type View and xmldata in parentheses.
Save the script and run these two lines. |
Highlight xmldata in the Source window | The contents of the xml file are shown here. |
Highlight CaptaincyData.txt under Files tab | Next, let us learn how to import CaptaincyData.txt. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R |
[RStudio]
txtdata <- read.table("CaptaincyData.txt") |
In the Source window, type the following command and press Enter. |
[RStudio] View(txtdata) |
Next, type View and txtdata in parentheses.
Save the script and run these two lines. |
Highlight txtdata in the Source window | The contents of the txt file are shown. |
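The behaviour of read.table can be checked with a small self-contained sketch. This script does not say whether CaptaincyData.txt has a header row, so the block below writes a hypothetical three-line text file to a temporary location and reads it both ways; the file contents are made up for illustration.

```r
# Write a small, made-up whitespace-separated file to a temporary path.
path <- tempfile(fileext = ".txt")
writeLines(c("names played won",
             "CaptainA 10 4",
             "CaptainB 20 9"), path)

# Here the first line has as many fields as the data lines, so read.table
# treats it as data and names the columns V1, V2, V3.
no_header <- read.table(path)

# With header = TRUE, the first line supplies the column names instead.
with_header <- read.table(path, header = TRUE)

print(no_header)
print(with_header)
```

If the first row of txtdata shows column titles instead of data, adding header = TRUE to the read.table call is one possible remedy.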
Highlight CaptaincyData.xlsx under the Files tab | Now, we will learn how to import data using the user interface of RStudio.
I am resizing the Source window. We will import the Excel file CaptaincyData.xlsx using this method. Please ensure that you have packages like readxl and Rcpp installed on your system. |
Highlight Environment tab | In the top right corner of RStudio, click on the Environment tab. |
Highlight Import Dataset button
Highlight From Excel option |
In the Environment tab, click on Import Dataset.
From the drop-down menu, select From Excel. |
Highlight Import Excel Data window | A window named Import Excel Data appears. |
Highlight File/Url option | You can select a file on your computer or type the URL from which you want to load an Excel file.
We will select a file on our computer. |
Highlight Browse option | In the upper right corner of this window, near File/Url text field, click on Browse. |
Highlight CaptaincyData.xlsx in the folder myProject | I will select the file CaptaincyData.xlsx located in the DataMerging folder.
This folder is in the myProject folder on the Desktop. Click Open to load this file. |
Highlight Data Preview option | Below the field File/Url, RStudio shows the preview of the Excel file being imported. |
Highlight Code Preview option | At the bottom right corner of this window, you can see the code for importing this Excel file. |
Highlight Import button | Finally, click on the Import button. |
Highlight CaptaincyData in the Source window | The contents of the Excel file are shown here. |
Let us summarize what we have learnt. | |
Show Slide
Summary |
In this tutorial, we have learnt how to:
|
Show Slide
Assignment |
We now suggest an assignment.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgement |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |