R/C2/Data-types-and-Factors/English

From Script | Spoken-Tutorial
Revision as of 15:04, 11 April 2019 by Madhurig (Talk | contribs)


Title of script: Data Types and Factors

Author: Shaik Sameer (IIIT Vadodara) and Sudhakar Kumar (IIT Bombay)

Keywords: R, RStudio, factor, levels, categorical, video tutorial

Visual Cue Narration
Show slide

Opening slide

Welcome to the spoken tutorial on Data types and Factors.
Show slide

Learning Objectives

In this tutorial, we will learn how to:
  • Find types of vectors
  • Identify categorical variables
  • Use the factor and levels functions
Show slide

Pre-requisites

https://spoken-tutorial.org

To understand this tutorial, you should know,
  • Data frames in R
  • How to set working directory in RStudio

If not, please locate the relevant tutorials on R on this website.

Show slide

System Specifications

This tutorial is recorded on,
  • Ubuntu Linux OS version 16.04
  • R version 3.4.4
  • RStudio version 1.1.456

Install R version 3.2.0 or higher.

Show slide

Download files

For this tutorial, we will use
  • A data frame CaptaincyData.csv and
  • A script file myFactor.R.

Please download these files from the Code files link of this tutorial.

[Computer screen]

Highlight CaptaincyData.csv and myFactor.R in the folder DataTypes

I have downloaded and moved these files to the DataTypes folder.

This folder is located in the myProject folder on my Desktop.

I have also set this folder as my Working Directory.

Show slide

R-Objects

In R programming language,
  • Variables are not declared with a data type.
  • Variables are assigned with R-Objects.
  • The data type of the R-Object becomes the data type of the variable.

This means that everything in R is an object.

Show slide

Types of R-Objects

The frequently used R-Objects are:
  • Vectors
  • Lists
  • Matrices
  • Factors and
  • Data Frames

The simplest of these objects is the vector object.

Show slide

Types of vectors

The R language has the following commonly used atomic vector types:
  • Logical
  • Integer
  • Numeric
  • Complex and
  • Character

By atomic, we mean that a vector holds data of a single data type.
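
As a minimal sketch of what "a single data type" means in practice, the example below builds a few small vectors with the c function. The variable names and values are hypothetical and are not part of the tutorial's files. When values of different types are combined, R converts them all to one common type.

```r
# Hypothetical vectors, used only for illustration
flags  <- c(TRUE, FALSE, TRUE)   # all logical
counts <- c(1L, 2L, 3L)          # all integer
mixed  <- c(1, "a", TRUE)        # mixed input is converted to character

class(flags)   # "logical"
class(counts)  # "integer"
class(mixed)   # "character"
print(mixed)   # every element is now a character string
```

Because the vector must stay atomic, the number 1 and the logical TRUE in mixed are stored as the strings "1" and "TRUE".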

Show slide

Types of vectors

Now, we will learn how to declare these vector types.
Let us switch to RStudio.
[RStudio]

testData <- TRUE

class(testData)

On the Console window, type testData with a capital D.

Press Alt and -(hyphen) keys simultaneously.

Now type TRUE in capitals. Press Enter.

Now, type class and then testData in parentheses. Press Enter.

Highlight the output in the Console window

Observe that the data type shown here is logical.
[RStudio]

testData <- "TRUE"

class(testData)

Now, type testData

Press Alt and -(hyphen) keys simultaneously.

Type TRUE in capitals, within double quotes.

Now, type class and then testData in parentheses. Press Enter.

Press Enter at the end of every command.

Highlight the output in the Console window

Observe that the data type shown here is character.
Highlight TRUE and "TRUE" in the Console window

Note that R considers TRUE as logical data and TRUE within double quotes as character data.
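
The difference between the two values can also be checked with the is.logical and is.character functions. The sketch below uses hypothetical variable names so that testData is left untouched. It also shows that as.logical can convert the character string back to a logical value.

```r
# Hypothetical names, chosen so that testData is not overwritten
logicalValue   <- TRUE
characterValue <- "TRUE"

is.logical(logicalValue)      # TRUE
is.logical(characterValue)    # FALSE
is.character(characterValue)  # TRUE

# Convert the character string back to a logical value
converted <- as.logical(characterValue)
class(converted)              # "logical"
```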
Cursor on the interface.

Now, we will learn about the numeric data type.

For this, we will assign a value of 12 to our testData.

[RStudio]

testData <- "TRUE"

We will modify the previous command.

Click in the Console window and press the up arrow key twice.

The command with testData appears.

Delete the word TRUE in capitals, along with its double quotes.

[RStudio]

testData <- 12

class(testData)

Now, type 12 and press Enter.

Type class and then testData in parentheses.

Press Enter.

Highlight the output in the Console window

The data type shown here is numeric.
[RStudio]

testData <- 12.5

class(testData)

Now, we will assign a value of 12.5 to our testData.

In the Console window, press the up arrow key.

Locate the command with testData and assign 12 point 5 to this variable. Press Enter.

Now, type class and then testData in parentheses.

Highlight the output in the Console window

The data type is shown again as numeric.
Highlight 12 and 12.5 in the Console window

Here, R considers both 12 and 12.5 as numeric.
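
To see why 12 is not reported as an integer, it may help to look at how R stores it. This is a small sketch with hypothetical variable names. It uses the typeof function, which reports the internal storage type, alongside class.

```r
# Hypothetical names, used only for illustration
wholeNumber   <- 12
decimalNumber <- 12.5

class(wholeNumber)       # "numeric"
typeof(wholeNumber)      # "double": stored as a double-precision number
typeof(decimalNumber)    # "double"
is.integer(wholeNumber)  # FALSE, even though 12 has no decimal part
```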
In order to declare an integer variable in R, we will invoke the as dot integer function.
[RStudio]

testData <- as.integer(12)

class(testData)

In the Console window, type testData

Press Alt and -(hyphen) keys simultaneously.

Now, type as dot integer and in parentheses 12.

Now, type class and then testData in parentheses.

Highlight the output in the Console window

The data type is shown as integer.
We can also declare an integer by appending an L suffix.
[RStudio]

testData <- 12L

class(testData)

On the Console window, type the following commands.
Highlight the output in the Console window.

Again, the data type shown is integer.
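
The two ways of creating an integer can be compared directly. The sketch below uses hypothetical variable names. It also illustrates one point to keep in mind: as.integer drops the fractional part of a number rather than rounding it.

```r
# Hypothetical names, used only for illustration
fromFunction <- as.integer(12)
fromSuffix   <- 12L

identical(fromFunction, fromSuffix)  # TRUE: both are the integer 12

# as.integer truncates towards zero; it does not round
truncated <- as.integer(12.9)
print(truncated)                     # 12
class(truncated)                     # "integer"
```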
To know more about the vector types, please refer to the Additional material section on this website.
Highlight myFactor.R in the Files window of RStudio

Open the script myFactor.R.
Click and drag the Source window.

I am resizing the Source window.
Highlight the Source button

Run this script by clicking on the Source button.
Click and drag the Source window.

I am resizing the Source window.
Highlight captaincy in the Source window

captaincy opens in the Source window.
Highlight captaincy in the Source window

Let us find the data type of the data in each column of captaincy, using the str function.
Highlight myFactor.R in the Source window

Click on the script myFactor.R.
[RStudio]

str(captaincy)

In the Source window, type str and within parentheses captaincy.

Save the script and run the current line by pressing Ctrl + Enter keys simultaneously.

Drag the Console window.

I am resizing the Console window.
Highlight output in the Console window

On the Console, the details of captaincy are shown.

There are 6 observations of 9 variables.

Highlight Factor in the Console window

The type of the names column in captaincy is shown as Factor.
Cursor on the interface.

In the R language, Factors are data objects.

They are used to categorize the data and store it as levels.

Show Slide

Factors in R

Factors are variables, which take on a limited number of different values.

They are often referred to as categorical variables.
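
As a minimal sketch of how a factor stores categorical data, the example below converts a small character vector into a factor. The vector and its values are hypothetical and do not come from CaptaincyData.csv. Internally, a factor keeps the set of distinct values as levels and stores each element as an integer code pointing to one of those levels.

```r
# Hypothetical categorical data, used only for illustration
sizes <- c("low", "high", "low", "medium", "high")
sizeFactor <- factor(sizes)

levels(sizeFactor)      # "high" "low" "medium" (sorted alphabetically)
nlevels(sizeFactor)     # 3
as.integer(sizeFactor)  # 2 1 2 3 1: the code of each element's level
table(sizeFactor)       # how many times each level occurs
```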

Let us switch to RStudio.
Highlight captaincy in the Source window

Click on the captaincy data frame.
Click and drag the Source window.

I am resizing the Source window.
Highlight names column in the Source window

We will look at the data in the names column of captaincy.
Highlight myFactor.R in the Source window

Click on the script myFactor.R.
[RStudio]

print(captaincy$names)

In the Source window, type print

within parentheses, captaincy dollar sign names.

Here, the dollar sign is used to extract a column of the data frame by its name.

Run the current line.

Highlight output in the Console window

The names of the captains are shown in the Console window.

Also, the levels are shown.

Highlight Levels in the Console window

Levels are the distinct values in a Factor.

The R language considers names as a Factor.

Highlight captaincy in the Source window

Click on the captaincy data frame.
Highlight names column in the Source window

names should be of character data type.

We will change its type from Factor to character.

Highlight myFactor.R in the Source window

Click on the script myFactor.R.
[RStudio]

captaincy$names <- as.character(captaincy$names)

In the Source window, type captaincy dollar sign names

Press Alt and -(hyphen) keys simultaneously.

Now, type as dot character within parentheses, captaincy dollar sign names.

Now, we will check the variable type of names again.
[RStudio]

str(captaincy)

In the Source window, type str and captaincy in parentheses.

Run the last two lines.

Click and drag the Source window.

I am resizing the Source window.
Highlight output in the Console window

Now, the type of names is changed to character.
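
The same conversion can be tried without the tutorial's files. The sketch below uses a hypothetical data frame called players, with made-up values, as a stand-in for captaincy. It sets stringsAsFactors = TRUE explicitly, because from R version 4.0.0 onwards, data.frame and read.csv no longer convert character columns to factors by default. This tutorial was recorded on R 3.4.4, where that conversion was the default.

```r
# Hypothetical data frame with made-up values, standing in for captaincy
players <- data.frame(
  names   = c("Asha", "Ravi", "Meena"),
  matches = c(10L, 25L, 17L),
  stringsAsFactors = TRUE   # request the older default behaviour explicitly
)

class(players$names)   # "factor"

# Convert the column from factor to character
players$names <- as.character(players$names)
class(players$names)   # "character"
```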
Let us learn how to identify a categorical variable.
Click on captaincy data frame.

Click on the captaincy data frame.
Drag the Console window.

I am resizing the Console window.
Highlight captaincy in the Source window

formats represents the number of cricket formats played by a captain.

There are three formats of cricket played at the international level:

  • Test matches,
  • One-Day Internationals and
  • Twenty20 Internationals.
Highlight formats in the Source window

Accordingly, formats can take one of three distinct values: 1, 2 or 3.

Observe that formats in captaincy should be a categorical variable.

Highlight formats in the Console window

At present, the variable type of formats is integer.
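
One possible way to spot such a variable is to count the distinct values in each column. A column stored as a number but holding only a few repeated values is a candidate for a categorical variable. The sketch below assumes a hypothetical data frame called teamData. Its values are made up for illustration and are not taken from CaptaincyData.csv.

```r
# Hypothetical data frame with made-up values, used only for illustration
teamData <- data.frame(
  runs    = c(5230L, 10773L, 7212L, 3852L, 6284L, 9378L),
  formats = c(3L, 3L, 2L, 1L, 2L, 3L)
)

# The class of every column: both are reported as "integer"
sapply(teamData, class)

# Number of distinct values in every column
distinctCounts <- sapply(teamData, function(column) length(unique(column)))
print(distinctCounts)    # runs has 6 distinct values, formats has only 3

# Frequency of each value of formats
table(teamData$formats)
```

A small count of distinct values is only a hint. Whether a column is truly categorical depends on what the numbers mean.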
Now, we will change the type of formats from integer to Factor.
Highlight myFactor.R in the Source window

Click on the script myFactor.R.
[RStudio]

captaincy$formats <- factor(captaincy$formats)

In the Source window, type captaincy dollar sign formats

Press Alt and -(hyphen) keys simultaneously.

Type factor within parentheses, captaincy dollar sign formats.

Highlight factor in the Source window

The factor function is used to create a factor.
[RStudio]

str(captaincy)

Now, type str and in parentheses captaincy.

Run the last two lines.

Highlight formats in the Console window

formats is shown as a Factor with 3 different levels: 1, 2 and 3.

These levels are of character type.

A factor's levels are always stored as character values, even when they look like numbers.

We can also check the levels of a factor variable using the levels function.
[RStudio]

levels(captaincy$formats)

In the Source window, type levels, within parentheses captaincy dollar sign formats.

Run the current line.

Highlight output in the Console window

The levels of formats are shown as 1 2 3.
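
Because the levels are character values, converting a factor of numbers straight back to numeric does not return the original numbers. It returns the internal level codes instead. The sketch below uses a hypothetical factor called scores to show this, along with one common way to recover the original values.

```r
# Hypothetical factor built from numbers, used only for illustration
scores <- factor(c(10, 20, 10))

levels(scores)                                 # "10" "20": character values
as.numeric(scores)                             # 1 2 1: level codes, not the data

# Convert to character first, then to numeric
recovered <- as.numeric(as.character(scores))
print(recovered)                               # 10 20 10
```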
Highlight "1" "2" "3" in the Console window

We can also change the values of the levels using the levels function.

Let us change the levels of formats from 1, 2, 3 in digits to One, Two, Three in words.

[RStudio]

levels(captaincy$formats) <- c("One", "Two", "Three")

In the Source window, type the following command.

Press Enter.

[RStudio]

print(captaincy$formats)

Now, type print, within parentheses captaincy dollar sign formats.

Save the script and run the last two lines.

Highlight output in the Console window

The values and levels of formats have been changed.
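
The renaming step can be tried on a small vector as well. The sketch below uses a hypothetical vector called formatCodes. It also shows an alternative that is not used in this tutorial: passing the levels and labels arguments to the factor function, which sets the names at the time the factor is created.

```r
# Hypothetical vector of format counts, used only for illustration
formatCodes <- c(3, 1, 2, 3)

# Approach used in this tutorial: create the factor, then rename its levels
renamed <- factor(formatCodes)
levels(renamed) <- c("One", "Two", "Three")
print(renamed)

# Alternative: name the levels while creating the factor
labelled <- factor(formatCodes,
                   levels = c(1, 2, 3),
                   labels = c("One", "Two", "Three"))
print(labelled)
```

With the first approach, the new names are matched to the existing levels by position, so they must be given in the same order as the output of levels.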
Let us summarize what we have learnt.
Show Slide

Summary

In this tutorial, we have learnt how to:
  • Find types of vectors
  • Identify categorical variables
  • Use the factor and levels functions
Show Slide

Assignment

We now suggest an assignment.
  • Using the built-in dataset iris, find the categorical variables.
  • Can you find a variable which is categorical, but which R reads as numeric?

If yes, change it to categorical.

Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Please post your general queries in this forum.
Show Slide

Textbook Companion

The FOSSEE team coordinates the TBC project.

For more details, please visit these sites.

Show Slide

Acknowledgement

The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
Show Slide

Thank You

The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).

This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey, Sudhakarst