Difference between revisions of "R/C2/Data-types-and-Factors/English"


Revision as of 21:13, 10 April 2019

Title of script: Data Types and Factors

Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, factor, levels, categorical, video tutorial

Visual Cue Narration
Show slide

Opening slide

Welcome to the spoken tutorial on Data types and Factors.
Show slide

Learning Objectives

In this tutorial, we will learn how to:
  • Find types of vectors
  • Identify categorical variables
  • Use the factor and levels functions
Show slide

Pre-requisites

https://spoken-tutorial.org

To understand this tutorial, you should know,
  • Data frames in R
  • How to set working directory in RStudio

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on,
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.456

Install R version 3.2.0 or higher.

Show slide

Download files

For this tutorial, we will use
  • A data frame CaptaincyData.csv and
  • A script file myFactor.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight CaptaincyData.csv and myFactor.R in the folder DataTypes

I have downloaded and moved these files to DataTypes folder.

This folder is located in myProject folder on my Desktop.

I have also set this folder as my Working Directory.

Show slide

R-Objects

In the R programming language,
  • Variables are not declared with a data type.
  • Variables are assigned R-Objects.
  • The data type of the R-Object becomes the data type of the variable.

This means that everything in R is an object.
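
The points above can be sketched in a few lines of base R. This is only an illustration; the variable name x is hypothetical and not part of the tutorial files. The same name is bound in turn to objects of different types, and class() reports the type of whichever object it currently holds.

```r
# Illustrative only: x is a hypothetical variable name.
x <- TRUE              # x now refers to a logical object
class_1 <- class(x)    # "logical"

x <- "hello"           # rebinding x needs no declaration
class_2 <- class(x)    # "character"

x <- c(1.5, 2)         # x is now a numeric vector
class_3 <- class(x)    # "numeric"

print(c(class_1, class_2, class_3))
```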

Show slide

Type of R-Objects

The frequently used R-Objects are:
  • Vectors
  • Lists
  • Matrices
  • Factors and
  • Data Frames

The simplest of these objects is the vector object.

Show slide

Types of vectors

R language has the following atomic vector types:
  • Logical
  • Integer
  • Numeric
  • Complex and
  • Character

By atomic, we mean that a vector holds data of a single data type.
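
As a rough sketch of what "atomic" means in practice, the block below builds one small vector of each listed type and then mixes types in a single call to c(). All variable names are made up for illustration. When types are mixed, R coerces every element to one common type so that the vector stays atomic.

```r
# One example vector per atomic type (names are illustrative).
v_logical   <- c(TRUE, FALSE)
v_integer   <- c(1L, 2L)
v_numeric   <- c(1.5, 2)
v_complex   <- c(1+2i, 3-1i)
v_character <- c("a", "b")

type_names <- sapply(
  list(v_logical, v_integer, v_numeric, v_complex, v_character),
  class
)
print(type_names)

# Mixing types does not give a mixed vector: everything is coerced.
mixed <- c(1, "a", TRUE)
print(mixed)          # "1" "a" "TRUE"
print(class(mixed))   # "character"
```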

Show slide

Types of vectors

Now, we will learn how to declare these vector types.
Let us switch to RStudio.
[RStudio]

testData <- TRUE

class(testData)

In the Console window, type testData with a capital D.

Press Alt and -(hyphen) keys simultaneously.

Now type TRUE in capitals. Press Enter.

Now, type class and then testData in parentheses. Press Enter.

Highlight the output in the Console window

Observe that the data type shown here is logical.
[RStudio]

testData <- "TRUE"

class(testData)

Now, type testData

Press Alt and -(hyphen) keys simultaneously.

Type TRUE in capitals, within double quotes.

Now, type class and then testData in parentheses. Press Enter.

Press Enter at the end of every command.

Highlight the output in the Console window

Observe that the data type shown here is character.

Highlight TRUE and "TRUE" in the Console window

Note that R considers TRUE as logical data and TRUE within double quotes as character data.
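
A short sketch of why the distinction matters, using hypothetical variable names: the unquoted value behaves as a logical, while the quoted one is plain text until it is converted with as.logical().

```r
unquoted <- TRUE     # logical
quoted   <- "TRUE"   # character

print(is.logical(unquoted))   # TRUE
print(is.character(quoted))   # TRUE

# The text can be converted back to a logical value explicitly.
converted <- as.logical(quoted)
print(identical(converted, unquoted))   # TRUE

# Logical values count as 1 and 0 in arithmetic; text does not.
n_true <- sum(c(TRUE, FALSE, TRUE))
print(n_true)                 # 2
```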
Cursor on the interface.

Now, we will learn about the numeric data type.

For this, we will assign a value of 12 to our testData.

[RStudio]

testData <- "TRUE"

We will modify the previous command.

Click in the Console window and press the up arrow key twice.

The command with testData appears.

Delete the word TRUE in capitals.

[RStudio]

testData <- 12

class(testData)

Now, type 12 and press Enter.

Type class and then testData in parentheses.

Press Enter.

Highlight the output in the Console window

The data type shown here is numeric.
[RStudio]

testData <- 12.5

class(testData)

Now, we will assign a value of 12.5 to our testData.

In the Console window, press the up arrow key.

Locate the command with testData and assign 12 point 5 to this variable. Press Enter.

Now, type class and then testData in parentheses.

Highlight the output in the Console window

The data type is shown again as numeric.

Highlight 12 and 12.5 in the Console window

Here, R considers both 12 and 12.5 as numeric.
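
As a small illustration (variable names are hypothetical), typeof() shows that a plain 12 is stored as a double even though it has no fractional part. This is why class() reports numeric for both values.

```r
whole      <- 12
fractional <- 12.5

print(class(whole))        # "numeric"
print(class(fractional))   # "numeric"

# typeof() reports the internal storage type.
print(typeof(whole))       # "double"
print(is.integer(whole))   # FALSE: 12 without a suffix is not an integer
```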
In order to declare an integer variable in R, we will invoke the as dot integer function.
[RStudio]

testData <- as.integer(12)

class(testData)

In the Console window, type testData

Press Alt and -(hyphen) keys simultaneously.

Now, type as dot integer and in parentheses 12.

Now, type class and then testData in parentheses.

Highlight the output in the Console window

The data type is shown as integer.
We can also declare an integer by appending an L suffix.
[RStudio]

testData <- 12L

class(testData)

In the Console window, type the following commands.

Highlight the output in the Console window.

Again, the data type shown is integer.
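
The two ways of creating an integer can be compared in a short sketch; the variable names are illustrative. One point worth keeping in mind is that as.integer() drops the fractional part rather than rounding, and that dividing two integers gives a numeric result.

```r
from_function <- as.integer(12)
from_suffix   <- 12L
print(identical(from_function, from_suffix))   # TRUE

# as.integer() truncates toward zero; it does not round.
truncated_pos <- as.integer(12.9)    # 12
truncated_neg <- as.integer(-12.9)   # -12
print(c(truncated_pos, truncated_neg))

# Integer arithmetic stays integer, except for division.
print(class(12L * 2L))   # "integer"
print(class(12L / 2L))   # "numeric"
```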
To know more about the vector types, please refer to the Additional material section on this website.
Highlight myFactor.R in the Files window of RStudio

Open the script myFactor.R.

Click and drag the Source window.

I am resizing the Source window.

Highlight the Source button

Run this script by clicking on the Source button.

Click and drag the Source window.

I am resizing the Source window.

Highlight captaincy in the Source window

captaincy opens in the Source window.

Highlight captaincy in the Source window

Let us find the data type of the data in each column of captaincy, using the str function.

Highlight myFactor.R in the Source window

Click on the script myFactor.R
[RStudio]

str(captaincy)

In the Source window, type str and within parentheses captaincy.

Save the script and run the current line by pressing Ctrl + Enter keys simultaneously.

Drag the Console window.

I am resizing the Console window.

Highlight output in the Console window

On the Console, the details of captaincy are shown.

There are 6 observations of 9 variables.

Highlight Factor in the Console window

The structure of names in captaincy is denoted as Factor.

Cursor on the interface.

In the R language, Factors are data objects.

They are used to categorize the data and store it as levels.
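
The following sketch shows what str() reports for a small stand-in data frame. The data frame captains_demo and its values are invented for illustration; it is not the CaptaincyData.csv file. Note one assumption that matters: from R 4.0.0 onward, data.frame() and read.csv() default to stringsAsFactors = FALSE, so the argument is set explicitly here to reproduce the Factor column seen in the recording, which was made on R 3.4.4.

```r
# Hypothetical stand-in for the captaincy data frame.
captains_demo <- data.frame(
  names   = c("Asha", "Ravi", "Meena"),
  formats = c(3L, 1L, 2L),
  matches = c(120, 45.5, 80),
  stringsAsFactors = TRUE   # text columns become factors
)

# str() lists each column with its type and first few values.
str(captains_demo)

# The same information, one class per column.
col_classes <- sapply(captains_demo, class)
print(col_classes)
```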

Show Slide

Factors in R

Factors are variables, which can be assigned a limited number of different values.

They are often referred to as categorical variables.
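
A minimal sketch of a categorical variable, using made-up data: the vector has six values, but only three distinct categories, and those categories become the levels of the factor.

```r
# Hypothetical categorical data: six observations, three categories.
sizes <- factor(c("small", "large", "small", "medium", "large", "small"))

print(levels(sizes))    # "large" "medium" "small"
print(nlevels(sizes))   # 3

# table() counts how often each level occurs.
counts <- table(sizes)
print(counts)
```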

Let us switch to RStudio.
Highlight captaincy in the Source window

Click on captaincy data frame.

Click and drag the Source window.

I am resizing the Source window.

Highlight names column in the Source window

We will look at the data in names column of captaincy.

Highlight myFactor.R in the Source window

Click on the script myFactor.R
[RStudio]

print(captaincy$names)

In the Source window, type print

within parentheses, captaincy dollar sign names.

Here dollar sign is used to extract elements by name.

Run the current line.

Highlight output in the Console window

The names of the captains are shown in the Console window.

Also, the levels are shown.

Highlight Levels in the Console window

Levels are distinct values in a Factor.

R language considers names as Factor.
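
As an illustration with an invented data frame (not the tutorial's CSV file), the dollar sign and double square brackets both extract a column by name. A repeated value appears only once among the levels, which is what "distinct values" means here.

```r
# Hypothetical data; stringsAsFactors = TRUE mimics the R 3.x default.
captains_demo <- data.frame(
  names   = c("Asha", "Ravi", "Asha"),
  formats = c(3L, 1L, 2L),
  stringsAsFactors = TRUE
)

by_dollar  <- captains_demo$names
by_bracket <- captains_demo[["names"]]
print(identical(by_dollar, by_bracket))   # TRUE

print(by_dollar)           # three values, followed by the Levels line
print(length(by_dollar))   # 3 values
print(levels(by_dollar))   # "Asha" "Ravi": only the distinct values
```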

Highlight captaincy in the Source window

Click on captaincy data frame.

Highlight names column in the Source window

names should be of character data type.

We will change its type from Factor to character.

Highlight myFactor.R in the Source window

Click on the script myFactor.R
[RStudio]

captaincy$names <- as.character(captaincy$names)

In the Source window, type captaincy dollar sign names

Press Alt and -(hyphen) keys simultaneously.

Now, type as dot character within parentheses, captaincy dollar sign names.

Now, we will check the variable type of names again.
[RStudio]

str(captaincy)

In the Source window, type str and captaincy in parentheses.

Run the last two lines.

Click and drag the Source window.

I am resizing the Source window.

Highlight output in the Console window

Now, the type of names is changed to character.
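
The conversion can be checked on a small invented data frame, as sketched below. The names and values are hypothetical. Once the column is character, string functions such as nchar() can be applied to it directly.

```r
# Hypothetical data; the names column starts out as a factor.
captains_demo <- data.frame(
  names = c("Asha", "Ravi", "Meena"),
  stringsAsFactors = TRUE
)

before <- class(captains_demo$names)   # "factor"
captains_demo$names <- as.character(captains_demo$names)
after  <- class(captains_demo$names)   # "character"
print(c(before, after))

# String functions now work on the column.
name_lengths <- nchar(captains_demo$names)
print(name_lengths)                    # 4 4 5
```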
Let us learn how to identify a categorical variable.
Click on captaincy data frame.

Click on captaincy data frame.

Drag the Console window.

I am resizing the Console window.

Highlight captaincy in the Source window

formats represents the number of cricket formats played by a captain.

There are three formats of cricket played at the international level:

  • Test matches,
  • One-Day Internationals and
  • Twenty20 Internationals.
Highlight formats in the Source window

Accordingly, formats can take one of three distinct values: 1, 2 or 3.

Observe that formats in captaincy should be a categorical variable.

Highlight formats in the Console window

At this instant, the variable type of formats is set as integer.

Now, we will change the type of formats from integer to Factor.

Highlight myFactor.R in the Source window

Click on the script myFactor.R
[RStudio]

captaincy$formats <- factor(captaincy$formats)

In the Source window, type captaincy dollar sign formats

Press Alt and -(hyphen) keys simultaneously.

Type factor within parentheses, captaincy dollar sign formats.

Highlight factor in the Source window

The factor function is used to create a factor.
[RStudio]

str(captaincy)

Now, type str and in parentheses captaincy.

Run the last two lines.

Highlight formats in the Console window

formats is shown as Factor with 3 different levels 1, 2 and 3.

These levels are of character type.

So, a factor’s levels are always character values.
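
Because the levels are stored as text, converting a factor back to numbers needs some care. The sketch below uses invented values (10, 20, 30 rather than the tutorial's 1, 2, 3) so that the difference is visible: as.integer() on a factor returns the internal level codes, not the original numbers.

```r
# Hypothetical integer data, chosen so codes and values differ.
scores   <- c(10L, 30L, 20L, 30L)
scores_f <- factor(scores)

print(levels(scores_f))                 # "10" "20" "30"
print(is.character(levels(scores_f)))   # TRUE

# Direct conversion gives the position of each level, not the value.
codes <- as.integer(scores_f)
print(codes)                            # 1 3 2 3

# Going through character recovers the original numbers.
restored <- as.integer(as.character(scores_f))
print(restored)                         # 10 30 20 30
```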

We can also check the levels of a factor variable using levels function.
[RStudio]

levels(captaincy$formats)

In the Source window, type levels, within parentheses captaincy dollar sign formats.

Run the current line.

Highlight output in the Console window

The levels of formats are shown as 1 2 3.

Highlight "1" "2" "3" in the Console window

We can also change the values of levels using levels function.

Let us change the levels of formats from 1, 2, 3 in digits, to One, Two, Three in words.

[RStudio]

levels(captaincy$formats)[1:3] <- c("One", "Two", "Three")

In the Source window, type the following command.

Press Enter.

[RStudio]

print(captaincy$formats)

Now, type print, within parentheses captaincy dollar sign formats.

Save the script and run the last two lines.

Highlight output in the Console window

The values and levels of formats have been changed.
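
The renaming step can be tried on a small stand-alone factor, as sketched here with invented data. The second half shows an alternative that is not used in the tutorial: passing levels and labels to factor() when the factor is created, which gives the same values in one step.

```r
# Hypothetical stand-in for the formats column.
formats_demo <- factor(c(3L, 1L, 2L, 3L))
print(levels(formats_demo))   # "1" "2" "3"

# Rename the levels; every value follows its level.
levels(formats_demo)[1:3] <- c("One", "Two", "Three")
print(formats_demo)

# Alternative (an assumption, not the tutorial's method):
# set the labels when the factor is created.
relabelled <- factor(c(3L, 1L, 2L, 3L),
                     levels = 1:3,
                     labels = c("One", "Two", "Three"))
print(relabelled)
```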
Let us summarize what we have learnt.
Show Slide

Summary

In this tutorial, we have learnt how to:
  • Find types of vectors
  • Identify categorical variables
  • Use the factor and levels functions
Show Slide

Assignment

We now suggest an assignment.
  • Using built-in dataset iris, find out the categorical variables.
  • Can you find a variable which is categorical, but R reads as numeric?

If yes, change it to categorical.

Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgement

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
Show Slide

Thank You

The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst