R/C2/Data-types-and-Factors/English-timed

Time Narration
00:01 Welcome to the spoken tutorial on Data types and Factors.
00:06 In this tutorial, we will learn how to:
00:10 Find types of vectors
00:13 Identify categorical variables
00:17 Use the factor and levels functions
00:21 To understand this tutorial, you should know,
00:25 Data frames in R
00:28 How to set the working directory in RStudio
00:33 If not, please locate the relevant tutorials on R on this website.
00:40 This tutorial is recorded on,
00:43 Ubuntu Linux OS version 16.04
00:48 R version 3.4.4
00:52 RStudio version 1.1.456
00:58 Install R version 3.2.0 or higher.
01:04 For this tutorial, we will use
01:07 A data frame CaptaincyData.csv and
01:13 A script file myFactor.R.
01:17 Please download these files from the Code files link of this tutorial.
01:23 I have downloaded and moved these files to DataTypes folder.
01:29 This folder is located in myProject folder on my Desktop.
01:35 I have also set this folder as my Working Directory.
01:41 In the R programming language, variables are not declared with a data type.
01:48 Instead, variables are assigned R-Objects.
01:52 The data type of the R-Object becomes the data type of the variable.
01:58 This means that everything in R is an object.
02:03 The frequently used R-Objects are:
02:07 Vectors
02:09 Lists
02:11 Matrices
02:13 Factors
02:15 and Data Frames
02:18 The simplest of these objects is the vector object.
02:23 The R language has the following commonly used atomic vector types:
02:28 Logical
02:30 Integer
02:32 Numeric
02:34 Complex
02:36 and Character
02:38 By atomic, we mean that a vector holds data of a single data type.
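As a small illustration of the types listed above, the sketch below checks the class of one literal of each type and then shows what "atomic" implies in practice: when values of different types are combined with c(), R converts them to one common type. The variable names mixed and numbers are only illustrative.

```r
# One literal of each atomic type named in the tutorial
class(TRUE)     # "logical"
class(12L)      # "integer"
class(12.5)     # "numeric"
class(2 + 3i)   # "complex"
class("R")      # "character"

# A vector holds a single type, so mixed input is coerced
mixed <- c(1, "a", TRUE)
class(mixed)    # "character"

numbers <- c(1, TRUE)
class(numbers)  # "numeric"
```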
02:45 Now, we will learn how to declare these vector types.
02:51 Let us switch to RStudio.
02:55 On the Console window, type testData with capital D.
03:01 Press Alt and -(hyphen) keys simultaneously.
03:06 Now type TRUE in capitals.
03:10 Press Enter.
03:13 Now, type class and then testData in parentheses.
03:18 Press Enter.
03:21 Observe that the data type shown here is logical.
03:26 Now, type testData
03:30 Press Alt and -(hyphen) keys simultaneously.
03:35 Type TRUE in capitals, within double quotes.
03:40 Now, type class and then testData in parentheses.
03:46 Press Enter.
03:48 Press Enter at the end of every command.
03:52 Observe that the data type shown here is character.
03:57 Note that, R considers TRUE as logical data and TRUE within double quotes as character data.
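The Console steps narrated above correspond roughly to the following commands. Pressing Alt and hyphen in RStudio inserts the assignment operator <-.

```r
# TRUE without quotes is a logical value
testData <- TRUE
class(testData)   # "logical"

# The same letters inside double quotes form a character string
testData <- "TRUE"
class(testData)   # "character"
```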
04:06 Now, we will learn about numeric data type.
04:11 For this, we will assign a value of 12 to our testData.
04:17 We will modify the previous command.
04:20 Click in the Console window and press the up arrow key twice.
04:26 The command with testData appears.
04:30 Delete the word TRUE in capitals.
04:34 Now, type 12 and press Enter.
04:39 Type class and then testData in parentheses.
04:45 Press Enter.
04:47 The data type shown here is numeric.
04:52 Now, we will assign a value of 12.5 to our testData.
04:59 In the Console window, press the up arrow key.
05:03 Locate the command with testData
05:07 and assign 12 point 5 to this variable.
05:12 Press Enter.
05:14 Now, type class and then testData in parentheses.
05:20 The data type is shown again as numeric.
05:25 Here, R considers both 12 and 12.5 as numeric.
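A sketch of these two steps is given below. The is.integer check is an addition that is not part of the narration; it shows that a plain 12 is stored as a double-precision number, not as an integer.

```r
testData <- 12
class(testData)        # "numeric"
is.integer(testData)   # FALSE, a plain 12 is stored as a double

testData <- 12.5
class(testData)        # "numeric"
```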
05:32 In order to declare an integer variable in R, we will invoke the as dot integer function.
05:40 In the Console window, type testData
05:44 Press Alt and -(hyphen) keys simultaneously.
05:48 Now, type as dot integer and in parentheses 12.
05:55 Now, type class and then testData in parentheses.
06:02 The data type is shown as integer.
06:06 We can also declare an integer by appending an L suffix.
06:12 On the Console window, type the following commands.
06:16 Again, the data type shown is integer.
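Both ways of creating an integer can be sketched as follows. The transcript does not show the commands used for the L suffix, so the value 12L below is an assumption made for illustration.

```r
# Convert a numeric value to integer explicitly
testData <- as.integer(12)
class(testData)   # "integer"

# Append the L suffix to an integer literal (12L is an assumed value)
testData <- 12L
class(testData)   # "integer"
```

Note that as.integer drops the fractional part, so as.integer(12.5) gives 12.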
06:21 To know more about the vector types, please refer to the Additional material section on this website.
06:29 Open the script myFactor.R.
06:33 I am resizing the Source window.
06:37 Run this script by clicking on the Source button.
06:42 I am resizing the Source window.
06:46 captaincy opens in the Source window.
06:50 Let us find the data type of the data in each column of captaincy, using the str function.
06:59 Click on the script myFactor.R
07:03 In the Source window, type str and within parentheses captaincy.
07:10 Save the script and run the current line by pressing Ctrl + Enter keys simultaneously.
07:19 I am resizing the Console window.
07:23 On the Console, the details of captaincy are shown.
07:28 There are 6 observations of 9 variables.
07:33 The type of the names column in captaincy is shown as Factor.
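The str call can be tried without the CSV file using the sketch below. The data frame here is a hypothetical stand-in with only two of the columns and made-up captain names; the real captaincy has 9 variables. The tutorial was recorded on R 3.4.4, where strings became factors by default. Since R 4.0.0, data.frame() and read.csv() keep strings as character unless stringsAsFactors = TRUE is passed, so it is set explicitly here.

```r
# Hypothetical stand-in for the captaincy data frame
captaincy <- data.frame(
  names   = c("Captain A", "Captain B", "Captain C"),
  formats = c(3L, 1L, 2L),
  stringsAsFactors = TRUE   # needed on R >= 4.0.0 to get a Factor column
)

# Show the type of every column
str(captaincy)
```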
07:39 In R language, Factors are data objects.
07:44 They are used to categorize the data and store it as levels.
07:50 Factors are variables, which take on a limited number of different values.
07:57 They are often referred to as categorical variables.
08:02 Let us switch to RStudio.
08:05 Click on captaincy data frame.
08:09 I am resizing the Source window.
08:13 We will look at the data in names column of captaincy.
08:18 Click on the script myFactor.R
08:22 In the Source window, type print and, within parentheses, captaincy dollar sign names.
08:30 Here, the dollar sign is used to extract elements by name.
08:36 Run the current line.
08:38 The names of the captains are shown in the Console window.
08:43 Also, the levels are shown.
08:46 Levels are distinct values in a Factor.
08:50 R language considers names as a Factor.
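A minimal sketch of printing a factor column and its levels, using a hypothetical data frame with made-up captain names in place of the real data:

```r
# Hypothetical data; one name is repeated on purpose
captaincy <- data.frame(
  names = c("Captain B", "Captain A", "Captain B"),
  stringsAsFactors = TRUE
)

# Printing a factor shows the values followed by a "Levels:" line
print(captaincy$names)

# The levels are the distinct values only
levels(captaincy$names)
nlevels(captaincy$names)
```

Three values are stored, but only two levels are reported because "Captain B" occurs twice.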
08:55 Click on captaincy data frame.
08:58 names should be of character data type.
09:03 We will change its type from Factor to character.
09:08 Click on the script myFactor.R
09:12 In the Source window, type captaincy dollar sign names
09:18 Press Alt and -(hyphen) keys simultaneously.
09:22 Now, type as dot character within parentheses, captaincy dollar sign names.
09:31 Now, we will check the variable type of names again.
09:36 In the Source window, type str and captaincy in parentheses.
09:44 Run the last two lines.
09:47 I am resizing the Source window.
09:51 Now, the type of names is changed to character.
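The conversion from Factor to character can be sketched as below, again on a hypothetical data frame with made-up names:

```r
# Hypothetical data with names stored as a Factor
captaincy <- data.frame(
  names = c("Captain A", "Captain B", "Captain C"),
  stringsAsFactors = TRUE
)
class(captaincy$names)   # "factor"

# Overwrite the column with its character version
captaincy$names <- as.character(captaincy$names)

# names is now reported as chr
str(captaincy)
```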
09:56 Let us learn how to identify a categorical variable.
10:01 Click on captaincy data frame.
10:05 I am resizing the Console window.
10:09 formats represents the number of cricket formats played by a captain.
10:15 There are three formats of cricket played at the international level:
10:21 Test matches,
10:23 One-Day Internationals
10:25 and Twenty20 Internationals.
10:28 Accordingly, formats can take one of three distinct values:
10:34 1, 2 or 3.
10:37 Observe that formats in captaincy should be a categorical variable.
10:43 At this instant, the variable type of formats is set as integer.
10:49 Now, we will change the type of formats from integer to Factor.
10:55 Click on the script myFactor.R
10:59 In the Source window, type captaincy dollar sign formats
11:05 Press Alt and -(hyphen) keys simultaneously.
11:10 Type factor within parentheses, captaincy dollar sign formats.
11:18 The factor function is used to create a factor.
11:23 Now, type str and in parentheses captaincy.
11:30 Run the last two lines.
11:33 formats is shown as Factor with 3 different levels 1, 2 and 3.
11:41 These levels are of character type.
11:45 So, a factor’s levels are always character values.
11:51 We can also check the levels of a factor variable using the levels function.
11:58 In the Source window, type levels, within parentheses captaincy dollar sign formats.
12:08 Run the current line.
12:11 The levels of formats are shown as 1 2 3.
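The steps for formats can be sketched as follows. The values in the data frame are hypothetical; only the column name comes from the tutorial.

```r
# Hypothetical formats column stored as integer
captaincy <- data.frame(formats = c(3L, 1L, 2L, 3L))
class(captaincy$formats)   # "integer"

# Convert the integer column to a factor
captaincy$formats <- factor(captaincy$formats)
str(captaincy)

# The levels are stored as character strings
levels(captaincy$formats)          # "1" "2" "3"
class(levels(captaincy$formats))   # "character"
```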
12:17 We can also change the values of levels using the levels function.
12:23 Let us change the levels of formats from 1, 2, 3 in digits, to One, Two, Three in words.
12:33 In the Source window, type the following command.
12:38 Press Enter.
12:40 Now, type print, within parentheses captaincy dollar sign formats.
12:48 Save the script and run the last two lines.
12:54 The values and levels of formats have been changed.
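The transcript does not show the command used to rename the levels. One common way, given here as an assumption, is to assign a character vector to levels(). The data values are hypothetical.

```r
# Hypothetical formats column, already converted to a factor
captaincy <- data.frame(formats = factor(c(3L, 1L, 2L, 3L)))

# Replace the levels "1", "2", "3" in order (assumed form of the command)
levels(captaincy$formats) <- c("One", "Two", "Three")

# Both the displayed values and the levels now use the new labels
print(captaincy$formats)
```

The new labels are matched to the old levels by position, so the order of the vector matters.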
12:59 Let us summarize what we have learnt.
13:03 In this tutorial, we have learnt how to: Find types of vectors
13:09 Identify categorical variables
13:14 Use the factor and levels functions
13:17 We now suggest an assignment.
13:21 Using built-in dataset iris, find out the categorical variables.
13:27 Can you find a variable which is categorical, but R reads as numeric?
13:33 If yes, change it to categorical.
13:37 The video at the following link summarises the Spoken Tutorial project.
13:43 Please download and watch it.
13:46 We conduct workshops using Spoken Tutorials and give certificates.
13:52 Please contact us.
13:55 Please post your timed queries in this forum.
14:00 Please post your general queries in this forum.
14:04 The FOSSEE team coordinates the TBC project.
14:09 For more details, please visit these sites.
14:13 The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India
14:20 The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018).
14:28 This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching.

Contributors and Content Editors

Sakinashaikh