R/C2/Data-types-and-Factors/English-timed
Time | Narration |
00:01 | Welcome to the spoken tutorial on Data types and Factors. |
00:06 | In this tutorial, we will learn how to: |
00:10 | Find types of vectors |
00:13 | Identify categorical variables |
00:17 | Use the factor and levels functions |
00:21 | To understand this tutorial, you should know: |
00:25 | Data frames in R |
00:28 | How to set the working directory in RStudio |
00:33 | If not, please locate the relevant tutorials on R on this website. |
00:40 | This tutorial is recorded on: |
00:43 | Ubuntu Linux OS version 16.04 |
00:48 | R version 3.4.4 |
00:52 | RStudio version 1.1.456 |
00:58 | Install R version 3.2.0 or higher. |
01:04 | For this tutorial, we will use: |
01:07 | The data file CaptaincyData.csv and |
01:13 | A script file myFactor.R. |
01:17 | Please download these files from the Code files link of this tutorial. |
01:23 | I have downloaded and moved these files to the DataTypes folder. |
01:29 | This folder is located in the myProject folder on my Desktop. |
01:35 | I have also set this folder as my working directory. |
01:41 | In the R programming language, variables are not declared with a data type. |
01:48 | Instead, variables are assigned R objects. |
01:52 | The data type of the R object becomes the data type of the variable. |
01:58 | In R, everything is an object. |
02:03 | The most frequently used R objects are: |
02:07 | Vectors |
02:09 | Lists |
02:11 | Matrices |
02:13 | Factors |
02:15 | and Data Frames |
02:18 | The simplest of these objects is the vector object. |
02:23 | The commonly used atomic vector types in R are: |
02:28 | Logical |
02:30 | Integer |
02:32 | Numeric |
02:34 | Complex |
02:36 | and Character |
02:38 | By atomic, we mean that a vector holds data of a single data type. |
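A minimal R sketch of the atomic types listed above, checked with the class function. The variable names are illustrative only. The last two lines show what "a single data type" implies: when values of different types are combined, R coerces them all to the most general type.

```r
# One example vector for each commonly used atomic type
v_logical   <- c(TRUE, FALSE)
v_integer   <- c(1L, 2L)
v_numeric   <- c(1.5, 2)
v_complex   <- c(1+2i, 3i)
v_character <- c("a", "b")

# Returns "logical" "integer" "numeric" "complex" "character"
types <- sapply(list(v_logical, v_integer, v_numeric, v_complex, v_character), class)
print(types)

# A vector holds one type only, so mixed values are coerced to character here
mixed <- c(TRUE, 1L, "a")
class(mixed)
```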
02:45 | Now, we will learn how to declare these vector types. |
02:51 | Let us switch to RStudio. |
02:55 | In the Console window, type testData with a capital D. |
03:01 | Press the Alt and - (hyphen) keys simultaneously. This inserts the assignment operator <-. |
03:06 | Now type TRUE in capitals. |
03:10 | Press Enter. |
03:13 | Now, type class and then testData in parentheses. |
03:18 | Press Enter. |
03:21 | Observe that the data type shown here is logical. |
03:26 | Now, type testData. |
03:30 | Press the Alt and - (hyphen) keys simultaneously. |
03:35 | Type TRUE in capitals, within double quotes. |
03:40 | Now, type class and then testData in parentheses. |
03:46 | Press Enter. |
03:48 | Press Enter at the end of every command. |
03:52 | Observe that the data type shown here is character. |
03:57 | Note that R treats TRUE as logical data and "TRUE" within double quotes as character data. |
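The Console steps above correspond to the following R commands; in RStudio, the Alt and - shortcut types the assignment operator <- for you.

```r
# TRUE without quotes is a logical value
testData <- TRUE
class(testData)   # "logical"

# The same word inside double quotes is a character string
testData <- "TRUE"
class(testData)   # "character"
```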
04:06 | Now, we will learn about numeric data type. |
04:11 | For this, we will assign a value of 12 to our testData. |
04:17 | We will modify the previous command. |
04:20 | Click in the Console window and press the up arrow key twice. |
04:26 | The command with testData appears. |
04:30 | Delete the word TRUE in capitals, along with its double quotes. |
04:34 | Now, type 12 and press Enter. |
04:39 | Type class and then testData in parentheses. |
04:45 | Press Enter. |
04:47 | The data type shown here is numeric. |
04:52 | Now, we will assign a value of 12.5 to our testData. |
04:59 | In the Console window, press the up arrow key. |
05:03 | Locate the command with testData |
05:07 | and assign 12.5 to this variable. |
05:12 | Press Enter. |
05:14 | Now, type class and then testData in parentheses. |
05:20 | The data type is shown again as numeric. |
05:25 | Here, R treats both 12 and 12.5 as numeric values. |
05:32 | To create an integer variable in R, we will use the as dot integer function. |
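A short R sketch of the two assignments above. It also uses typeof and is.integer (not used in the tutorial itself) to show that a plain number such as 12 is stored as a double-precision value, not as an integer.

```r
testData <- 12
class(testData)     # "numeric"

testData <- 12.5
class(testData)     # "numeric"

# Whole-number literals without a suffix are still doubles
typeof(12)          # "double"
is.integer(12)      # FALSE
```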
05:40 | In the Console window, type testData |
05:44 | Press the Alt and - (hyphen) keys simultaneously. |
05:48 | Now, type as dot integer and in parentheses 12. |
05:55 | Now, type class and then testData in parentheses. |
06:02 | The data type is shown as integer. |
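The as dot integer step can be written as below. The second call is an extra illustration, not part of the tutorial: it shows that as.integer drops the fractional part of a decimal number (truncating toward zero) rather than rounding it.

```r
testData <- as.integer(12)
class(testData)     # "integer"

# Converting a decimal truncates toward zero
as.integer(12.5)    # 12
```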
06:06 | We can also create an integer by appending the suffix L to a number. |
06:12 | In the Console window, type the following commands. |
06:16 | Again, the data type shown is integer. |
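The exact commands shown on screen are not reproduced in this transcript. A plausible equivalent, assuming the same testData variable, is sketched below; the last line checks that the L suffix and as.integer produce the same value.

```r
# The L suffix makes the literal an integer
testData <- 12L
class(testData)                   # "integer"

is.integer(12L)                   # TRUE
identical(12L, as.integer(12))    # TRUE
```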
06:21 | To know more about the vector types, please refer to the Additional material section on this website. |
06:29 | Open the script myFactor.R. |
06:33 | I am resizing the Source window. |
06:37 | Run this script by clicking on the Source button. |
06:42 | I am resizing the Source window. |
06:46 | The captaincy data frame opens in the Source window. |
06:50 | Let us find the data type of each column of captaincy, using the str function. |
06:59 | Click on the script myFactor.R |
07:03 | In the Source window, type str and within parentheses captaincy. |
07:10 | Save the script and run the current line by pressing the Ctrl and Enter keys simultaneously. |
07:19 | I am resizing the Console window. |
07:23 | In the Console, the details of captaincy are shown. |
07:28 | There are 6 observations of 9 variables. |
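The contents of myFactor.R are not reproduced here. As a sketch, assume the script reads the file with read.csv. R 3.4.4, used in this tutorial, converts text columns to factors by default, but R 4.0.0 and later default to stringsAsFactors = FALSE, so on a newer R the names column appears as chr unless the argument is set explicitly. The small data frame below is a hypothetical stand-in with made-up values, not the contents of CaptaincyData.csv.

```r
# Assumption: the script does something like
# captaincy <- read.csv("CaptaincyData.csv", stringsAsFactors = TRUE)

# Hypothetical stand-in data (NOT the real CaptaincyData.csv)
captaincy <- data.frame(
  names   = c("Player A", "Player B", "Player C"),
  formats = c(3L, 1L, 2L),
  stringsAsFactors = TRUE   # reproduce the R 3.x default explicitly
)

# Shows the number of observations and variables, and the type of each column
str(captaincy)
```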
07:33 | The names column of captaincy is shown as a Factor. |
07:39 | In R, factors are data objects. |
07:44 | They are used to categorize the data and store it as levels. |
07:50 | Factors are variables that take on a limited number of distinct values. |
07:57 | They are often referred to as categorical variables. |
08:02 | Let us switch to RStudio. |
08:05 | Click on the captaincy data frame. |
08:09 | I am resizing the Source window. |
08:13 | We will look at the data in the names column of captaincy. |
08:18 | Click on the script myFactor.R |
08:22 | In the Source window, type print and, within parentheses, captaincy dollar sign names. |
08:30 | Here, the dollar sign is used to extract a column by its name. |
08:36 | Run the current line. |
08:38 | The names of the captains are shown in the Console window. |
08:43 | Also, the levels are shown. |
08:46 | Levels are the distinct values of a factor. |
08:50 | R treats names as a factor. |
08:55 | Click on the captaincy data frame. |
08:58 | Each name identifies a single captain rather than a category, so names should be of character data type. |
09:03 | We will change its type from Factor to character. |
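A minimal R sketch of inspecting a factor column. The captaincy data frame here is a hypothetical stand-in with made-up names, not the contents of CaptaincyData.csv. Printing a factor shows its values followed by a Levels line, and levels returns the distinct values, sorted alphabetically by default.

```r
# Hypothetical stand-in data with a factor column
captaincy <- data.frame(
  names = c("Player B", "Player A", "Player C"),
  stringsAsFactors = TRUE
)

# $ extracts the column by name; print shows values and levels
print(captaincy$names)

levels(captaincy$names)    # distinct values, sorted
nlevels(captaincy$names)   # number of levels
```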
09:08 | Click on the script myFactor.R |
09:12 | In the Source window, type captaincy dollar sign names |
09:18 | Press the Alt and - (hyphen) keys simultaneously. |
09:22 | Now, type as dot character and, within parentheses, captaincy dollar sign names. |
09:31 | Now, we will check the variable type of names again. |
09:36 | In the Source window, type str and captaincy in parentheses. |
09:44 | Run the last two lines. |
09:47 | I am resizing the Source window. |
09:51 | Now, the type of names has changed to character. |
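The two lines run above amount to the following sketch, again on a hypothetical stand-in data frame. Note that as.character on a factor returns the level labels (the names themselves), not the internal integer codes.

```r
# Hypothetical stand-in data with a factor column
captaincy <- data.frame(
  names = c("Player A", "Player B"),
  stringsAsFactors = TRUE
)

# Convert the factor column to character
captaincy$names <- as.character(captaincy$names)

# names is now listed as chr
str(captaincy)
```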
09:56 | Let us learn how to identify a categorical variable. |
10:01 | Click on the captaincy data frame. |
10:05 | I am resizing the Console window. |
10:09 | formats represents the number of cricket formats played by a captain. |
10:15 | There are three formats of cricket played at the international level: |
10:21 | Test matches, |
10:23 | One-Day Internationals |
10:25 | and Twenty20 Internationals. |
10:28 | Accordingly, formats can take one of three distinct values: |
10:34 | 1, 2 or 3. |
10:37 | Hence, formats in captaincy should be a categorical variable. |
10:43 | At present, formats is stored as an integer. |
10:49 | Now, we will change the type of formats from integer to Factor. |
10:55 | Click on the script myFactor.R |
10:59 | In the Source window, type captaincy dollar sign formats |
11:05 | Press the Alt and - (hyphen) keys simultaneously. |
11:10 | Type factor and, within parentheses, captaincy dollar sign formats. |
11:18 | The factor function is used to create a factor. |
11:23 | Now, type str and in parentheses captaincy. |
11:30 | Run the last two lines. |
11:33 | formats is shown as a Factor with 3 levels: "1", "2" and "3". |
11:41 | These levels are of character type. |
11:45 | A factor's levels are always stored as character values. |
11:51 | We can also check the levels of a factor variable using the levels function. |
11:58 | In the Source window, type levels and, within parentheses, captaincy dollar sign formats. |
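A sketch of converting an integer column to a factor, using a hypothetical stand-in for the formats column (made-up values, not the real data). After the conversion, str reports a Factor with 3 levels, and the levels are the character strings "1", "2" and "3".

```r
# Hypothetical stand-in for the formats column
captaincy <- data.frame(formats = c(3L, 1L, 2L, 3L))

# Convert the integer column into a factor
captaincy$formats <- factor(captaincy$formats)

# formats is now a Factor with 3 levels
str(captaincy)
```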
12:08 | Run the current line. |
12:11 | The levels of formats are shown as "1", "2" and "3". |
12:17 | We can also rename the levels using the levels function. |
12:23 | Let us change the levels of formats from 1, 2, 3 in digits to One, Two, Three in words. |
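A minimal check, on hypothetical values, that the levels function returns a character vector even when the original data were integers.

```r
formats <- factor(c(3L, 1L, 2L, 3L))

levels(formats)                 # "1" "2" "3"
is.character(levels(formats))   # TRUE
```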
12:33 | In the Source window, type the following command. |
12:38 | Press Enter. |
12:40 | Now, type print and, within parentheses, captaincy dollar sign formats. |
12:48 | Save the script and run the last two lines. |
12:54 | The values and levels of formats have been changed. |
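The command typed on screen is not shown in this transcript. One way to rename the levels, assuming they are in the order "1", "2", "3", is to assign a new character vector to levels. The stand-in data below are hypothetical. The new labels replace the old ones by position, so every value in the column changes along with the levels.

```r
# Hypothetical stand-in for the formats column, already a factor
captaincy <- data.frame(formats = factor(c(3L, 1L, 2L, 3L)))

# Replace levels by position: "1" -> "One", "2" -> "Two", "3" -> "Three"
levels(captaincy$formats) <- c("One", "Two", "Three")

# Prints Three One Two Three, followed by the new levels
print(captaincy$formats)
```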
12:59 | Let us summarize what we have learnt. |
13:03 | In this tutorial, we have learnt how to: Find types of vectors |
13:09 | Identify categorical variables |
13:14 | Use the factor and levels functions |
13:17 | We now suggest an assignment. |
13:21 | Using the built-in dataset iris, identify the categorical variables. |
13:27 | Can you find a variable that is categorical, but that R reads as numeric? |
13:33 | If yes, change it to categorical. |
13:37 | The video at the following link summarises the Spoken Tutorial project. |
13:43 | Please download and watch it. |
13:46 | We conduct workshops using Spoken Tutorials and give certificates. |
13:52 | Please contact us. |
13:55 | Please post your timed queries in this forum. |
14:00 | Please post your general queries in this forum. |
14:04 | The FOSSEE team coordinates the TBC project. |
14:09 | For more details, please visit these sites. |
14:13 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India. |
14:20 | The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018). |
14:28 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |