R/C2/Merging-and-Importing-Data/English
Revision as of 00:28, 20 March 2019
Title of script: Merging and Importing Data
Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, data merge, data import, video tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to the spoken tutorial on Merging and Importing Data. |
Show slide
Learning Objectives |
In this tutorial, we will learn how to:
|
Show slide
Pre-requisites |
To understand this tutorial, you should know
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on
Install R version 3.2.0 or higher. |
Show slide
Download files |
For this tutorial, we will use five data frames in different formats and a script file myDataSet.R.
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight data frames and myDataSet.R in the folder myProject |
I have downloaded these files from the Code files link and moved them to the DataMerging folder in the myProject folder on the Desktop.
I have also set this folder as my working directory. |
Let us switch to RStudio. | |
Click myDataSet.R in RStudio
|
Open the script myDataSet.R in RStudio.
For this, click on the script myDataSet.R. The script myDataSet.R opens in RStudio. |
Highlight the Source button | Run this script by clicking on the Source button. |
Highlight captaincyOne in the Source window | captaincyOne opens in the Source window. |
[RStudio]
Highlight captaincyOne in the Source window |
We will use some built-in functions of R to explore captaincyOne.
For all the built-in functions used in this tutorial, please refer to the Additional Material. |
First, we will use the summary function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
[RStudio]
summary(captaincyOne) Highlight Source button |
In the Source window, type summary and then captaincyOne in parentheses.
Save the script and run the current line by pressing Ctrl+Enter keys simultaneously. |
Highlight the output in the Console window | Statistical parameters for each column of captaincyOne are shown on the Console. |
Highlight summary(captaincyOne) in the Source window | In the Source window, press Enter.
Press Enter at the end of every command. |
Now, let us look at the class function. | |
[RStudio]
class(captaincyOne) |
In the Source window, type class and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The class function returns the class of captaincyOne, which is data.frame. |
Next, let us look at the typeof function. | |
[RStudio]
typeof(captaincyOne) |
In the Source window, type typeof and then captaincyOne in parentheses.
Save the script and run the current line. |
Highlight the output in the Console window | The typeof function returns the storage type of captaincyOne, which is list.
We will learn more about lists later in this series. |
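To show what summary, class and typeof report, here is a minimal sketch that uses a small hypothetical data frame in place of captaincyOne. The column names and values are made up for illustration and are not the contents of the tutorial's files. It also shows why typeof gives list: a data frame is stored as a list of equal-length column vectors.

```r
# Hypothetical stand-in for captaincyOne (made-up values, not the tutorial data)
captains <- data.frame(
  names   = c("A", "B", "C"),
  matches = c(10, 12, 8),
  stringsAsFactors = FALSE
)

# summary() gives per-column statistics (Min, Median, Mean, Max, ... for numbers)
print(summary(captains))

# class() reports the high-level class of the object
print(class(captains))   # "data.frame"

# typeof() reports the internal storage type: a data frame is a list of columns
print(typeof(captains))  # "list"

# Because it is a list of columns, its length is the number of columns
print(length(captains))  # 2
```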
Highlight typeof in the Source window | To learn more about the typeof function, we will access the help section of RStudio. |
[RStudio]
help(typeof) |
In the Console window, type help and then typeof in parentheses. Press Enter. |
Highlight Description in the help window | typeof determines the R internal type or storage mode of any object. |
Highlight Files tab in the lower right of RStudio | Click on the Files tab. |
Highlight broom icon in the Console window | Clear the Console window by clicking on the broom icon. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in Source window | Now, let us extract two rows from the top of captaincyOne.
For this, we will use the head function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
[RStudio]
head(captaincyOne, 2) |
In the Source window,
type head within parentheses captaincyOne comma space 2. Save the script and run the current line. |
Highlight the output in the Console window | The top two rows of captaincyOne are shown on the Console window. |
Highlight captaincyOne in the Source window | Click on the data frame captaincyOne. |
Highlight captaincyOne in the Source window | Suppose we want to extract two rows from the bottom of captaincyOne.
For this, we will use the tail function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
[RStudio]
tail(captaincyOne, 2) |
In the Source window, type tail within parentheses captaincyOne comma space 2.
Save the script and run the current line. |
Highlight the output in the Console window | The last two rows of captaincyOne are shown on the Console window. |
Next, let us learn about the str function.
This function is used to display the structure of an R object. | |
[RStudio]
str(captaincyOne) |
In the Source window, type str within parentheses captaincyOne.
Save the script and run the current line. |
Highlight the output in the Console window | The structural details of captaincyOne are shown on the Console. |
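The head, tail and str steps above can be tried on a small hypothetical data frame. The five made-up rows below stand in for captaincyOne. The sketch also shows that tail keeps the original row numbers, so the last two rows stay labelled 4 and 5.

```r
# Hypothetical data frame standing in for captaincyOne (made-up values)
captains <- data.frame(
  names   = c("A", "B", "C", "D", "E"),
  matches = c(10, 12, 8, 15, 9),
  stringsAsFactors = FALSE
)

# head(x, n) returns the first n rows; tail(x, n) returns the last n rows
top2    <- head(captains, 2)
bottom2 <- tail(captains, 2)
print(top2)
print(bottom2)

# tail() keeps the original row names of the data frame
print(rownames(bottom2))  # "4" "5"

# str() prints a compact description: dimensions, then one line per column
str(captains)
```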
Now, we will look at merging of data frames. | |
Show slide
Merging data frames |
Merging data frames has advantages like:
|
[RStudio]
Highlight CaptaincyData.csv and CaptaincyData2.csv under Files tab |
We will learn how to merge the data from the two files CaptaincyData.csv and CaptaincyData2.csv. |
[RStudio]
captaincyTwo <- read.csv("CaptaincyData2.csv") |
We will declare a variable captaincyTwo to store the data read from CaptaincyData2.csv.
In the Source window, type the following command and press Enter. |
[RStudio]
View(captaincyTwo) |
In the Source window, type View within parentheses captaincyTwo.
Save the script and run the last two lines. |
Highlight captaincyTwo in Source window | The contents of captaincyTwo appear in the Source window. |
Highlight the name of captains in captaincyTwo
Highlight the column drawn in captaincyTwo |
This data frame has the same captains as captaincyOne.
However, it has different information about them, such as the number of matches drawn. |
Highlight captaincyOne in the Source window | Now, we will update captaincyOne by adding information from captaincyTwo.
For this, we use the merge function. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
[RStudio]
captaincyOne <- merge(captaincyOne, captaincyTwo, by = "names") |
In the Source window, type the following command and press Enter. |
Highlight by = "names" in the Source window | In the merge function, the by argument names the column on which we want to merge the two data frames.
Here, it is names. |
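Here is a minimal sketch of the same merge step on two small hypothetical data frames that share a names column. The values are made up and stand in for captaincyOne and captaincyTwo. The sketch shows that merge matches rows by the key column regardless of row order. It also shows that, by default, merge keeps only the names found in both data frames. The all = TRUE option is part of base R's merge but is not used in this tutorial.

```r
# Two hypothetical data frames sharing the key column "names" (made-up values)
one <- data.frame(names = c("A", "B", "C"),
                  won   = c(5, 7, 3),
                  stringsAsFactors = FALSE)
two <- data.frame(names = c("C", "A", "B"),
                  drawn = c(2, 1, 4),
                  stringsAsFactors = FALSE)

# merge() pairs rows with the same value in the 'by' column, whatever their order;
# the result is sorted by the key column
merged <- merge(one, two, by = "names")
print(merged)

# A third hypothetical data frame where only "A" matches
three <- data.frame(names = c("A", "D"), drawn = c(1, 6),
                    stringsAsFactors = FALSE)

# Default: only names present in both data frames are kept
print(merge(one, three, by = "names"))

# all = TRUE keeps unmatched names too and fills missing values with NA
print(merge(one, three, by = "names", all = TRUE))
```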
[RStudio]
View(captaincyOne) |
Now, type View and captaincyOne in parentheses.
Save the script and run these two lines. |
Highlight captaincyOne in the Source window | The contents of the updated captaincyOne appear in the Source window. |
[RStudio]
Highlight the tabs captaincyOne and captaincyTwo |
Close the two tabs captaincyOne and captaincyTwo. |
Cursor on the interface. | Now, we will learn how to import data of different formats in R. |
[RStudio]
# Importing data in different formats |
We shall add one comment first.
In the Source window, type # hash space Importing data in different formats. |
Highlight CaptaincyData.xml under Files tab | Now, let us import the CaptaincyData.xml file.
For that, we need to install the XML package. Make sure that you are connected to the Internet. |
This information can be shown as text while editing.
Please mention this to the editing team. |
We need to install the Ubuntu package libxml2-dev
before installing the XML package. Information on how to install this package is provided in the Additional Material. |
[RStudio]
Click in the Console window |
I have already installed the libxml2-dev package.
Hence, I will now proceed to install the XML package. |
[RStudio]
install.packages("XML") Highlight the red dot in the Console window |
In the Console window, type install dot packages.
Now, type XML inside double quotes and in parentheses. Press Enter. We will wait until R installs the package. |
Then, we load this package using the library function. | |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
Click at the top of the script myDataSet.R | Since we are loading a package, we will add it at the top of the script. |
[RStudio]
library(XML) |
In the Source window, at the top of the script myDataSet.R, type library and XML in parentheses.
Save the script and run this line. |
[RStudio]
Point to the comment. xmldata <- xmlToDataFrame("CaptaincyData.xml") |
Now, in the Source window, click on the next line after the comment Importing data in different formats.
Type the following command and press Enter. |
[RStudio]
View(xmldata) |
Then type View and xmldata in parentheses.
Save the script and run these two lines. |
Highlight xmldata in the Source window | The contents of the XML file are shown here. |
Highlight CaptaincyData.txt under Files tab | Next, let us learn how to import CaptaincyData.txt. |
Highlight myDataSet.R in the Source window | Click on the script myDataSet.R. |
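The tutorial does not show the layout of CaptaincyData.xml. As a rough illustration, here is a minimal sketch of the kind of XML that xmlToDataFrame can flatten: a root node whose children are records, with one child node per column. The element names and values are hypothetical, and the real file may differ. The sketch assumes the XML package is already installed, as described above.

```r
# Assumes the XML package is installed: install.packages("XML")
# (on Ubuntu, the libxml2-dev system package is needed first)
library(XML)

# Hypothetical record-oriented XML: root -> records -> one child per column.
# Element names and values are made up; CaptaincyData.xml may differ.
xml_text <- paste0(
  "<captains>",
  "<captain><names>A</names><matches>10</matches></captain>",
  "<captain><names>B</names><matches>12</matches></captain>",
  "</captains>"
)

# Write it to a temporary file so the call mirrors xmlToDataFrame("CaptaincyData.xml")
xml_file <- tempfile(fileext = ".xml")
writeLines(xml_text, xml_file)

xmldata <- xmlToDataFrame(xml_file)
print(xmldata)

# Values come in as text; convert numeric columns explicitly when needed
xmldata$matches <- as.numeric(as.character(xmldata$matches))
str(xmldata)
```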
[RStudio]
txtdata <- read.table("CaptaincyData.txt") |
In the Source window, type the following command and press Enter. |
[RStudio] View(txtdata) |
Next, type View and txtdata in parentheses.
Save the script and run these two lines. |
Highlight txtdata in the Source window | The contents of the text file are shown. |
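One point worth knowing about read.table is that its default is header = FALSE, while read.csv presets header = TRUE and sep = ",". The minimal sketch below uses made-up temporary files. It shows that a text file with a header line is read as an extra data row, with columns named V1 and V2, unless header = TRUE is passed. Whether CaptaincyData.txt has a header line is not stated in this tutorial.

```r
# Hypothetical whitespace-separated text file with a header line (made-up contents)
txt_file <- tempfile(fileext = ".txt")
writeLines(c("names matches",
             "A 10",
             "B 12"), txt_file)

# Default header = FALSE: the header line becomes a data row, columns are V1, V2
raw <- read.table(txt_file)
print(raw)

# header = TRUE uses the first line as column names
txtdata <- read.table(txt_file, header = TRUE)
print(txtdata)

# read.csv() is read.table() with header = TRUE and sep = "," preset
csv_file <- tempfile(fileext = ".csv")
writeLines(c("names,matches", "A,10", "B,12"), csv_file)
csvdata <- read.csv(csv_file)
print(csvdata)
```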
Highlight CaptaincyData.xlsx under the Files tab | Now, we will learn how to import data from the user interface of RStudio.
We will import the Excel file CaptaincyData.xlsx using this method. Please ensure that packages such as readxl and Rcpp are installed on your system. |
Highlight Environment tab | In the top right corner of RStudio, click on the Environment tab. |
Highlight Import Dataset button
Highlight From Excel option |
In the Environment tab, click on Import Dataset.
From the drop-down menu, select From Excel. |
Highlight Import Excel Data window | A window named Import Excel Data appears. |
Highlight File/Url option | You can select a file on your computer or type the URL from which you want to load an Excel file.
We will select a file on our computer. |
Highlight Browse option | In the upper right corner of this window, near File/Url text field, click on Browse. |
Highlight CaptaincyData.xlsx in the folder myProject | I will select the file CaptaincyData.xlsx located in the DataMerging folder.
This folder is in the myProject folder on the Desktop. Click Open to load this file. |
Highlight Data Preview option | Below the field File/Url, RStudio shows the preview of the Excel file being imported. |
Highlight Code Preview option | At the bottom right corner of this window, you can see the code for importing this Excel file. |
Highlight Import button | Finally, click on the Import button. |
Highlight CaptaincyData in the Source window | The contents of the Excel file are shown here. |
Let us summarize what we have learnt. | |
Show Slide
Summary |
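In this tutorial, we have learnt how to use built-in functions for exploring a data frame, merge two data frames and import data in different formats in R.
|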
Show Slide
Assignment |
We now suggest an assignment.
Using the built-in dataset iris, implement all the functions we have learnt in this tutorial.
|
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgement |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |