Title of script: Data Types and Factors
Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)
Keywords: R, RStudio, factor, levels, categorical, video tutorial
Visual Cue | Narration |
Show slide
Opening slide |
Welcome to the Spoken Tutorial on Data Types and Factors. |
Show slide
Learning Objectives |
In this tutorial, we will learn how to:
|
Show slide
Pre-requisites |
To understand this tutorial, you should know,
If not, please locate the relevant tutorials on R on this website. |
Show slide
System Specifications |
This tutorial is recorded on,
Install R version 3.2.0 or higher. |
Show slide
Download files |
For this tutorial, we will use the files CaptaincyData.csv and myFactor.R.
Please download these files from the Code files link of this tutorial. |
[Computer screen]
Highlight CaptaincyData.csv and myFactor.R in the folder DataTypes |
I have downloaded these files and moved them to the DataTypes folder.
This folder is located in the myProject folder on my Desktop. I have also set this folder as my Working Directory. |
Show slide
R-Objects |
In R programming language,
It means, everything in R is an object. |
Show slide
Type of R-Objects |
The frequently used R-Objects are:
The simplest of these objects is the vector object. |
Show slide
Types of vectors |
R language has the following atomic vector types:
By atomic, we mean that a vector holds data of a single data type. |
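To see what "a single data type" means in practice, here is a minimal sketch. The variable names mixed and nums are chosen only for illustration. When values of different types are combined with c, R converts all of them to one common type.

```r
# A vector holds one data type, so mixed inputs are coerced to a common type.
mixed <- c(1, "a", TRUE)   # number, text and logical value together
class(mixed)               # "character": every element became text
print(mixed)               # "1" "a" "TRUE"

nums <- c(1, TRUE)         # number and logical value together
class(nums)                # "numeric": TRUE was converted to 1
print(nums)                # 1 1
```

Observe that R does not report an error here. It silently picks the type that can represent every element.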
Show slide
Types of vectors |
Now, we will learn how to declare these vector types. |
Let us switch to RStudio. | |
[RStudio]
testData <- TRUE
class(testData) |
In the Console window, type testData with capital D.
Press Alt and -(hyphen) keys simultaneously. Now type TRUE in capitals. Press Enter. Now, type class and then testData in parentheses. Press Enter. |
Highlight the output in the Console window | Observe that the data type shown here is logical. |
[RStudio]
testData <- "TRUE"
class(testData) |
Now, type testData
Press Alt and -(hyphen) keys simultaneously. Type TRUE in capitals, within double quotes. Press Enter. Now, type class and then testData in parentheses. Press Enter. Remember to press Enter at the end of every command. |
Highlight the output in the Console window | Observe that the data type shown here is character. |
Highlight TRUE and "TRUE" in the Console window | Note that, R considers TRUE as logical data and TRUE within double quotes as character data. |
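The difference between the two values can be checked side by side. This is a small sketch. The names flag and label are illustrative and are not part of the tutorial files.

```r
flag  <- TRUE      # logical value, written without quotes
label <- "TRUE"    # character value, written within double quotes

class(flag)        # "logical"
class(label)       # "character"

# The text "TRUE" can be converted back to a logical value when needed.
converted <- as.logical(label)
class(converted)   # "logical"
```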
Cursor on the interface. | Now, we will learn about numeric data type.
For this, we will assign a value of 12 to our testData. |
[RStudio]
testData <- "TRUE" |
We will modify the previous command.
Click in the Console window and press the up arrow key twice. The command with testData appears. Delete TRUE in capitals, along with the double quotes. |
[RStudio]
testData <- 12
class(testData) |
Now, type 12 and press Enter.
Type class and then testData in parentheses. Press Enter. |
Highlight the output in the Console window | The data type shown here is numeric. |
[RStudio]
testData <- 12.5
class(testData) |
Now, we will assign a value of 12.5 to our testData.
In the Console window, press the up arrow key. Locate the command with testData and assign 12 point 5 to this variable. Press Enter. Now, type class and then testData in parentheses. Press Enter. |
Highlight the output in the Console window | The data type is shown again as numeric. |
Highlight 12 and 12.5 in the Console window | Here, R considers both 12 and 12.5 as numeric. |
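A short sketch shows why 12 is not treated as an integer even though it has no decimal part. The names whole and fractional are illustrative. The typeof function reports how R stores a value internally.

```r
whole      <- 12
fractional <- 12.5

class(whole)        # "numeric"
class(fractional)   # "numeric"

# Both values are stored as double-precision numbers.
typeof(whole)       # "double"
is.integer(whole)   # FALSE
```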
In order to declare an integer variable in R, we will invoke the as dot integer function. | |
[RStudio]
testData <- as.integer(12)
class(testData) |
In the Console window, type testData
Press Alt and -(hyphen) keys simultaneously. Now, type as dot integer and in parentheses 12. Press Enter. Now, type class and then testData in parentheses. Press Enter. |
Highlight the output in the Console window | The data type is shown as integer. |
We can also declare an integer by appending an L suffix. | |
[RStudio]
testData <- 12L
class(testData) |
In the Console window, type the following commands. |
Highlight the output in the Console window. | Again, the data type shown is integer. |
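The two ways of declaring an integer can be compared directly. This is a minimal sketch with illustrative names a, b and truncated.

```r
a <- as.integer(12)   # convert a numeric value to integer
b <- 12L              # the L suffix declares an integer directly

identical(a, b)       # TRUE: both are the integer 12
identical(b, 12)      # FALSE: 12 without the suffix is stored as double

# as.integer drops the fractional part instead of rounding.
truncated <- as.integer(12.9)
print(truncated)      # 12
```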
To know more about the vector types, please refer to the Additional material section on this website. | |
Highlight myFactor.R in the Files window of RStudio | Open the script myFactor.R. |
Click and drag the Source window. | I am resizing the Source window. |
Highlight the Source button | Run this script by clicking on Source button. |
Click and drag the Source window. | I am resizing the Source window. |
Highlight captaincy in the Source window | captaincy opens in the Source window. |
Highlight captaincy in the Source window | Let us find the data type of each column of captaincy, using the str function. |
Highlight myFactor.R in the Source window | Click on the script myFactor.R |
[RStudio]
str(captaincy) |
In the Source window, type str and within parentheses captaincy.
Save the script and run the current line by pressing Ctrl + Enter keys simultaneously. |
Drag the Console window. | I am resizing the Console window. |
Highlight output in the Console window | On the Console, the details of captaincy are shown.
There are 6 observations of 9 variables. |
Highlight Factor in the Console window | The type of the names column in captaincy is shown as Factor. |
Cursor on the interface. | In R language, Factors are data objects.
They are used to categorize the data and store it as levels. |
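Whether a text column appears as Factor depends on how the file is read and on the R version. From R 4.0.0 onwards, read.csv keeps text columns as character unless stringsAsFactors = TRUE is passed. The sketch below assumes a small made-up table standing in for CaptaincyData.csv. The values and the name demo are invented, and the way myFactor.R actually reads the file is not shown in this script.

```r
# Hypothetical stand-in for the captaincy data; all values are invented.
csv_text <- "names,formats
Captain A,3
Captain B,1
Captain C,2"

# stringsAsFactors = TRUE asks R to store text columns as factors.
demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(demo)             # lists each column with its type
class(demo$names)     # "factor"
class(demo$formats)   # "integer"
```

If names appears as chr instead of Factor on your machine, the most likely reason is this change in the default setting.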
Show Slide
Factors in R |
Factors are variables, which take on a limited number of different values.
They are often referred to as categorical variables. |
Let us switch to RStudio. | |
Highlight captaincy in the Source window | Click on captaincy data frame. |
Click and drag the Source window. | I am resizing the Source window. |
Highlight names column in the Source window | We will look at the data in names column of captaincy. |
Highlight myFactor.R in the Source window | Click on the script myFactor.R |
[RStudio]
print(captaincy$names) |
In the Source window, type print
within parentheses, captaincy dollar sign names. Here, the dollar sign is used to extract a column by its name. Run the current line. |
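The dollar sign is one of several ways to pick a column from a data frame. Here is a small sketch on an invented data frame named demo. It is not the captaincy data.

```r
# Hypothetical data frame; the values are invented for illustration.
demo <- data.frame(names   = c("Captain A", "Captain B"),
                   formats = c(3L, 1L))

by_dollar  <- demo$names        # extract the column as a vector
by_bracket <- demo[["names"]]   # double brackets do the same

identical(by_dollar, by_bracket)   # TRUE

# Single brackets keep the result as a one-column data frame.
is.data.frame(demo["names"])       # TRUE
```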
Highlight output in the Console window | The names of the captains are shown in the Console window.
Also, the levels are shown. |
Highlight Levels in the Console window | Levels are distinct values in a Factor.
R language considers names as a Factor. |
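To see how levels relate to the values, consider this minimal sketch. The vector grades and its values are invented. A factor stores each distinct value once as a level, and stores the data as integer codes that point to those levels.

```r
# Hypothetical categorical data.
grades <- factor(c("low", "high", "low", "medium", "high"))

levels(grades)       # "high" "low" "medium": the distinct values, sorted
nlevels(grades)      # 3

# Internally, each value is an integer code that refers to a level.
as.integer(grades)   # 2 1 2 3 1

table(grades)        # counts of observations in each level
```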
Highlight captaincy in the Source window | Click on captaincy data frame. |
Highlight names column in the Source window | names should be of character data type.
We will change its type from Factor to character. |
Highlight myFactor.R in the Source window | Click on the script myFactor.R |
[RStudio]
captaincy$names <- as.character(captaincy$names) |
In the Source window, type captaincy dollar sign names
Press Alt and -(hyphen) keys simultaneously. Now, type as dot character within parentheses, captaincy dollar sign names. |
Now, we will check the variable type of names again. | |
[RStudio]
str(captaincy) |
In the Source window, type str and captaincy in parentheses.
Run the last two lines. |
Click and drag the Source window. | I am resizing the Source window. |
Highlight output in the Console window | Now, the type of names is changed to character. |
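The effect of as dot character can be checked on a small example. The vector players below is invented and only stands in for the names column.

```r
# Hypothetical factor standing in for the names column.
players <- factor(c("Captain A", "Captain B", "Captain A"))
class(players)        # "factor"

players_chr <- as.character(players)
class(players_chr)    # "character"

# A character vector has no levels, only the text values themselves.
levels(players_chr)   # NULL
print(players_chr)    # "Captain A" "Captain B" "Captain A"
```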
Let us learn how to identify a categorical variable. | |
Click on captaincy data frame. | Click on captaincy data frame. |
Drag the Console window. | I am resizing the Console window. |
Highlight captaincy in the Source window | formats represents the number of cricket formats played by a captain.
There are three formats of cricket played at the international level: Test, One Day International (ODI) and Twenty20 International (T20I). |
Highlight formats in the Source window | Accordingly, formats can take one of three distinct values: 1, 2 or 3.
Observe that formats in captaincy should be a categorical variable. |
Highlight formats in the Console window | At present, the type of formats is integer. |
Now, we will change the type of formats from integer to Factor. | |
Highlight myFactor.R in the Source window | Click on the script myFactor.R |
[RStudio]
captaincy$formats <- factor(captaincy$formats) |
In the Source window, type captaincy dollar sign formats
Press Alt and -(hyphen) keys simultaneously. Type factor within parentheses, captaincy dollar sign formats. |
Highlight factor in the Source window | The factor function is used to create a factor. |
[RStudio]
str(captaincy) |
Now, type str and in parentheses captaincy.
Run the last two lines. |
Highlight formats in the Console window | formats is shown as a Factor with 3 different levels: 1, 2 and 3.
Although the original values were integers, these levels are of character type. A factor’s levels are always stored as character values. |
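Because levels are stored as text, converting a factor back to numbers needs some care. The sketch below uses invented vectors formats_f and scores. Calling as.numeric directly on a factor returns the internal codes, not the original values.

```r
# Hypothetical integer data converted to a factor.
formats_f <- factor(c(3L, 1L, 2L, 3L))
class(levels(formats_f))   # "character"

# A second example where codes and values differ.
scores <- factor(c(10, 20, 10))
codes  <- as.numeric(scores)                 # 1 2 1: the internal codes
values <- as.numeric(as.character(scores))   # 10 20 10: the original numbers
```

To recover the original numbers, convert to character first and then to numeric.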
We can also check the levels of a factor variable using levels function. | |
[RStudio]
levels(captaincy$formats) |
In the Source window, type levels, within parentheses captaincy dollar sign formats.
Run the current line. |
Highlight output in the Console window | The levels of formats are shown as 1 2 3. |
Highlight "1" "2" "3" in the Console window | We can also change the values of levels using levels function.
Let us change the levels of formats from 1, 2, 3 in digits to One, Two, Three in words. |
[RStudio]
levels(captaincy$formats) <- c("One", "Two", "Three") |
In the Source window, type the following command.
Press Enter. |
[RStudio]
print(captaincy$formats) |
Now, type print, within parentheses captaincy dollar sign formats.
Save the script and run the last two lines. |
Highlight output in the Console window | The values and levels of formats have been changed. |
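When levels are renamed, the new names are matched to the old levels by position, so their order matters. This sketch uses invented data in formats_f and formats_g. It also shows the labels argument of factor as an alternative that gives the same result in one step.

```r
# Hypothetical data standing in for the formats column.
formats_f <- factor(c(3L, 1L, 2L, 3L))

# Old levels "1", "2", "3" are replaced in this order.
levels(formats_f) <- c("One", "Two", "Three")
as.character(formats_f)   # "Three" "One" "Two" "Three"

# Alternative: set the labels while creating the factor.
formats_g <- factor(c(3L, 1L, 2L, 3L),
                    levels = 1:3,
                    labels = c("One", "Two", "Three"))
all(as.character(formats_f) == as.character(formats_g))   # TRUE
```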
Let us summarize what we have learnt. | |
Show Slide
Summary |
In this tutorial, we have learnt how to:
|
Show Slide
Assignment |
We now suggest an assignment.
If yes, change it to categorical. |
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Please post your general queries in this forum. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the TBC project.
For more details, please visit these sites. |
Show Slide
Acknowledgement |
The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India |
Show Slide
Thank You |
The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |