R/C2/Data-types-and-Factors/English-timed
| Time | Narration |
| 00:01 | Welcome to the spoken tutorial on Data types and Factors. |
| 00:06 | In this tutorial, we will learn how to: |
| 00:10 | Find types of vectors |
| 00:13 | Identify categorical variables |
| 00:17 | Use the factor and levels functions |
| 00:21 | To understand this tutorial, you should know: |
| 00:25 | Data frames in R |
| 00:28 | How to set the working directory in RStudio |
| 00:33 | If not, please locate the relevant tutorials on R on this website. |
| 00:40 | This tutorial is recorded on: |
| 00:43 | Ubuntu Linux OS version 16.04 |
| 00:48 | R version 3.4.4 |
| 00:52 | RStudio version 1.1.456 |
| 00:58 | Install R version 3.2.0 or higher. |
| 01:04 | For this tutorial, we will use |
| 01:07 | A data frame CaptaincyData.csv and |
| 01:13 | A script file myFactor.R. |
| 01:17 | Please download these files from the Code files link of this tutorial. |
| 01:23 | I have downloaded and moved these files to the DataTypes folder. |
| 01:29 | This folder is located in the myProject folder on my Desktop. |
| 01:35 | I have also set this folder as my Working Directory. |
| 01:41 | In the R programming language, variables are not declared with a data type. |
| 01:48 | Instead, R-Objects are assigned to variables. |
| 01:52 | The data type of the R-Object becomes the data type of the variable. |
| 01:58 | This means that everything in R is an object. |
| 02:03 | The frequently used R-Objects are: |
| 02:07 | Vectors |
| 02:09 | Lists |
| 02:11 | Matrices |
| 02:13 | Factors |
| 02:15 | and Data Frames |
| 02:18 | The simplest of these objects is the vector object. |
| 02:23 | The R language has the following commonly used atomic vector types: |
| 02:28 | Logical |
| 02:30 | Integer |
| 02:32 | Numeric |
| 02:34 | Complex |
| 02:36 | and Character |
| 02:38 | By atomic, we mean that a vector holds data of a single data type. |
| 02:45 | Now, we will learn how to declare these vector types. |
| 02:51 | Let us switch to RStudio. |
| 02:55 | In the Console window, type testData, with a capital D. |
| 03:01 | Press Alt and -(hyphen) keys simultaneously to insert the assignment operator. |
| 03:06 | Now type TRUE in capitals. |
| 03:10 | Press Enter. |
| 03:13 | Now, type class and then testData in parentheses. |
| 03:18 | Press Enter. |
| 03:21 | Observe that the data type shown here is logical. |
| 03:26 | Now, type testData |
| 03:30 | Press Alt and -(hyphen) keys simultaneously. |
| 03:35 | Type, within double quotes TRUE in capitals. |
| 03:40 | Now, type class and then testData in parentheses. |
| 03:46 | Press Enter. |
| 03:48 | Press Enter at the end of every command. |
| 03:52 | Observe that the data type shown here is character. |
| 03:57 | Note that, R considers TRUE as logical data and TRUE within double quotes as character data. |
| 04:06 | Now, we will learn about numeric data type. |
| 04:11 | For this, we will assign a value of 12 to our testData. |
| 04:17 | We will modify the previous command. |
| 04:20 | Click in the Console window and press the up arrow key twice. |
| 04:26 | The command with testData appears. |
| 04:30 | Delete the word TRUE in capitals, along with the double quotes around it. |
| 04:34 | Now, type 12 and press Enter. |
| 04:39 | Type class and then testData in parentheses. |
| 04:45 | Press Enter. |
| 04:47 | The data type shown here is numeric. |
| 04:52 | Now, we will assign a value of 12.5 to our testData. |
| 04:59 | In the Console window, press the up arrow key. |
| 05:03 | Locate the command with testData |
| 05:07 | and assign 12.5 to this variable. |
| 05:12 | Press Enter. |
| 05:14 | Now, type class and then testData in parentheses. |
| 05:20 | The data type is shown again as numeric. |
| 05:25 | Here, R considers both 12 and 12.5 as numeric. |
| 05:32 | In order to declare an integer variable in R, we will invoke the as dot integer function. |
| 05:40 | In the Console window, type testData |
| 05:44 | Press Alt and -(hyphen) keys simultaneously. |
| 05:48 | Now, type as dot integer and in parentheses 12. |
| 05:55 | Now, type class and then testData in parentheses. |
| 06:02 | The data type is shown as integer. |
| 06:06 | We can also declare an integer by appending an L suffix. |
| 06:12 | On the Console window, type the following commands. |
| 06:16 | Again, the data type shown is integer. |
| 06:21 | To know more about the vector types, please refer to the Additional material section on this website. |
| 06:29 | Open the script myFactor.R. |
| 06:33 | I am resizing the Source window. |
| 06:37 | Run this script by clicking on Source button. |
| 06:42 | I am resizing the Source window. |
| 06:46 | captaincy opens in the Source window. |
| 06:50 | Let us find the data type for the data in each column of captaincy, using str function. |
| 06:59 | Click on the script myFactor.R |
| 07:03 | In the Source window, type str and within parentheses captaincy. |
| 07:10 | Save the script and run the current line by pressing Ctrl + Enter keys simultaneously. |
| 07:19 | I am resizing the Console window. |
| 07:23 | On the Console, the details of captaincy are shown. |
| 07:28 | There are 6 observations of 9 variables. |
| 07:33 | The type of the names column in captaincy is shown as Factor. |
| 07:39 | In the R language, factors are data objects. |
| 07:44 | They are used to categorize the data and store it as levels. |
| 07:50 | Factors are variables that take on a limited number of different values. |
| 07:57 | They are often referred to as categorical variables. |
| 08:02 | Let us switch to RStudio. |
| 08:05 | Click on captaincy data frame. |
| 08:09 | I am resizing the Source window. |
| 08:13 | We will look at the data in names column of captaincy. |
| 08:18 | Click on the script myFactor.R |
| 08:22 | In the Source window, type print within parentheses, captaincy dollar sign names. |
| 08:30 | Here dollar sign is used to extract elements by name. |
| 08:36 | Run the current line. |
| 08:38 | The names of the captains are shown in the Console window. |
| 08:43 | Also, the levels are shown. |
| 08:46 | Levels are distinct values in a Factor. |
| 08:50 | R treats names as a Factor. |
| 08:55 | Click on captaincy data frame. |
| 08:58 | names should be of character data type. |
| 09:03 | We will change its type from Factor to character. |
| 09:08 | Click on the script myFactor.R |
| 09:12 | In the Source window, type captaincy dollar sign names |
| 09:18 | Press Alt and -(hyphen) keys simultaneously. |
| 09:22 | Now, type as dot character within parentheses, captaincy dollar sign names. |
| 09:31 | Now, we will check the variable type of names again. |
| 09:36 | In the Source window, type str and captaincy in parentheses. |
| 09:44 | Run the last two lines. |
| 09:47 | I am resizing the Source window. |
| 09:51 | Now, the type of names is changed to character. |
| 09:56 | Let us learn how to identify a categorical variable. |
| 10:01 | Click on captaincy data frame. |
| 10:05 | I am resizing the Console window. |
| 10:09 | formats represents the number of cricket formats played by a captain. |
| 10:15 | There are three formats of cricket played at the international level: |
| 10:21 | Test matches, |
| 10:23 | One-Day Internationals |
| 10:25 | and Twenty20 Internationals. |
| 10:28 | Accordingly, formats can take one of three distinct values: |
| 10:34 | 1, 2 or 3. |
| 10:37 | Observe that formats in captaincy should be a categorical variable. |
| 10:43 | At present, the variable type of formats is set as integer. |
| 10:49 | Now, we will change the type of formats from integer to Factor. |
| 10:55 | Click on the script myFactor.R |
| 10:59 | In the Source window, type captaincy dollar sign formats |
| 11:05 | Press Alt and -(hyphen) keys simultaneously. |
| 11:10 | Type factor within parentheses, captaincy dollar sign formats. |
| 11:18 | The factor function is used to create a factor. |
| 11:23 | Now, type str and in parentheses captaincy. |
| 11:30 | Run the last two lines. |
| 11:33 | formats is shown as Factor with 3 different levels 1, 2 and 3. |
| 11:41 | These levels are stored as character values, even though the original data was of integer type. |
| 11:45 | A factor’s levels are always character values. |
| 11:51 | We can also check the levels of a factor variable using the levels function. |
| 11:58 | In the Source window, type levels, within parentheses captaincy dollar sign formats. |
| 12:08 | Run the current line. |
| 12:11 | The levels of formats are shown as 1 2 3. |
| 12:17 | We can also change the values of levels using the levels function. |
| 12:23 | Let us change the levels of formats from 1, 2, 3 in digits, to One, Two, Three in words. |
| 12:33 | In the Source window, type the following command. |
| 12:38 | Press Enter. |
| 12:40 | Now, type print, within parentheses captaincy dollar sign formats. |
| 12:48 | Save the script and run the last two lines. |
| 12:54 | The values and levels of formats have been changed. |
| 12:59 | Let us summarize what we have learnt. |
| 13:03 | In this tutorial, we have learnt how to: Find types of vectors |
| 13:09 | Identify categorical variables |
| 13:14 | Use the factor and levels functions |
| 13:17 | We now suggest an assignment. |
| 13:21 | Using the built-in dataset iris, find the categorical variables. |
| 13:27 | Can you find a variable which is categorical, but R reads as numeric? |
| 13:33 | If yes, change it to categorical. |
| 13:37 | The video at the following link summarises the Spoken Tutorial project. |
| 13:43 | Please download and watch it. |
| 13:46 | We conduct workshops using Spoken Tutorials and give certificates. |
| 13:52 | Please contact us. |
| 13:55 | Please post your timed queries in this forum. |
| 14:00 | Please post your general queries in this forum. |
| 14:04 | The FOSSEE team coordinates the TBC project. |
| 14:09 | For more details, please visit these sites. |
| 14:13 | The Spoken Tutorial project is funded by NMEICT, MHRD, Govt. of India. |
| 14:20 | The script for this tutorial was contributed by Shaik Sameer (FOSSEE Fellow 2018). |
| 14:28 | This is Sudhakar Kumar from IIT Bombay signing off. Thanks for watching. |
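
The commands narrated in this tutorial can be collected into a short R sketch for practice. Two points are assumptions, not part of the recording: the narration does not spell out the commands for the L suffix or for renaming the levels, so the lines shown for those steps are a plausible reading of what was typed; and the small data frame below is a hypothetical stand-in for CaptaincyData.csv, with made-up captain names and only two of the nine columns. The argument stringsAsFactors = TRUE is set explicitly so that names starts out as a Factor, as it does in the R 3.4.4 session of the recording; from R 4.0.0 onwards, strings are no longer converted to factors by default.

```r
# Part 1: data types of simple vectors, as typed in the Console.
testData <- TRUE
class(testData)              # "logical"

testData <- "TRUE"
class(testData)              # "character"

testData <- 12
class(testData)              # "numeric"

testData <- 12.5
class(testData)              # "numeric"

testData <- as.integer(12)
class(testData)              # "integer"

testData <- 12L              # assumed form of the L-suffix command
class(testData)              # "integer"

# Part 2: factors.
# Hypothetical stand-in for CaptaincyData.csv (names and values are made up).
captaincy <- data.frame(
  names   = c("Captain A", "Captain B", "Captain C", "Captain D"),
  formats = c(3L, 1L, 2L, 3L),
  stringsAsFactors = TRUE    # mirror the pre-4.0 default used in the recording
)
str(captaincy)               # names is a Factor, formats is an integer

# Change names from Factor to character.
captaincy$names <- as.character(captaincy$names)

# Change formats from integer to Factor and inspect its levels.
captaincy$formats <- factor(captaincy$formats)
levels(captaincy$formats)    # "1" "2" "3"

# Rename the levels (assumed form of the command used in the tutorial).
levels(captaincy$formats) <- c("One", "Two", "Three")
print(captaincy$formats)
```

Renaming the levels also changes the displayed values, because a factor stores each value as a reference to one of its levels.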