Difference between revisions of "Machine-Learning-using-R - old 2022/C2/Supervised-Learning/English"

Revision as of 18:43, 21 February 2022

Title of the script: Supervised Learning

Author: Sudhakar Kumar

Keywords: R, RStudio, machine learning, supervised learning, unsupervised, classification, Naive Bayes, confusion matrix, video tutorial.

Visual Cue	Narration
Show Slide Opening Slide	Welcome to this spoken tutorial on Supervised Learning.
Show Slide Learning Objectives	In this tutorial, we will learn about: Machine Learning and its types, Supervised learning, a Classification model on iris data, and the Confusion matrix.
Show Slide System Specifications	This tutorial is recorded using: Ubuntu Linux OS version 20.04, R version 4.1.2, and RStudio version 1.4.1717. It is recommended to install R version 4.1.0 or higher.
Show Slide Prerequisites https://spoken-tutorial.org	To understand this tutorial, you should know: Basics of R programming and Basics of Statistics. If not, please access the relevant tutorials on R on this website.
Show Slide What is Machine Learning?	Now let us see what machine learning is. ML is a science that enables computers to learn without being explicitly programmed. Its applications include self-driving cars, speech recognition, etc. It is seen as a subset of Artificial Intelligence, also known as AI.
Show Slide Classification of Machine Learning	ML is broadly classified into the following types: Supervised learning, Unsupervised learning, Semi-supervised learning and Reinforcement learning. In this series, we will focus on Supervised and Unsupervised learning.
Show Slide Iris Flower Highlight the iris flower	Let us consider a flower named iris. An image of this flower is shown here. There are two critical parameters of an iris flower: Sepal and Petal. One can measure the length and width of these two parameters.
Show Slide Species of an iris flower Highlight the species of an iris flower	Based on the measurements, three species of iris flower are available: Setosa, Versicolor, and Virginica.
Show Slide Tabulating the Data	Consider a situation: A botanist wants to distinguish the species of iris flowers. She collects four features of some iris flowers: Sepal length and Sepal width, Petal length and Petal width.
Show Slide Tabulating the Data	She gets these flowers labeled as one of the three species by an expert.
Show Slide Download Files	For this tutorial, we will use: a data set iris.csv and a script file irisModel.R. Please download these files from the Code files link of this tutorial. Make a copy and then use them for practising.
[Computer screen] Highlight irisModel.R and the folder SupervisedLearning	I have downloaded and moved these files to the SupervisedLearning folder. This folder is located in the MLProject folder on my Desktop. I have also set the SupervisedLearning folder as my Working Directory.
	Let us switch to RStudio.
Double click on irisModel.R to open in RStudio Point to irisModel.R in RStudio.	Let’s open the script irisModel.R in RStudio. For this, double-click on the script irisModel.R. Script irisModel.R opens in RStudio.
Highlight irisModel.R in the Source window	Run this script by clicking on the Source button.
Highlight iris_data in the Source window	The iris data frame is displayed in the Source window.
Highlight 100 entries, 5 total columns at the bottom of the Source window	Here we can see five columns with 100 rows.
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window	The columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species.
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window	The first four columns are the features of an iris flower. The fifth column, Species, is the label of each iris flower.
Highlight Species column in the Source window	In the Source window, scroll down to locate the different Species. Notice that there are two species of the iris flower, setosa and versicolor. A typical iris dataset contains three different Species.
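
As a rough sketch of the data described above, a comparable two-species data frame can be built from R's built-in iris dataset. This is an assumption for illustration: the tutorial itself reads iris.csv through irisModel.R, whose contents are not shown here.

```r
# Assumption: R's built-in iris data stands in for iris.csv
# from the Code files link of the tutorial.
iris_data <- subset(iris, Species %in% c("setosa", "versicolor"))

# Drop the unused third species level so that only two classes remain
iris_data$Species <- droplevels(iris_data$Species)

dim(iris_data)            # number of rows and columns
names(iris_data)          # the four feature columns and the Species label
table(iris_data$Species)  # number of flowers per species
```
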
Show Slide Posing the Problem	Suppose that the botanist considers the following about the iris flower: Can I build a model that learns from labels of known species? Can this model accurately predict the species from its measurements?
Show Slide Mapping of Features and Labels	We will map the dimensions of sepal and petal to iris species. The classification model would work as a function as given below: This mechanism is supervised learning.
Highlight the function Show Slide Supervised Learning	In Supervised learning, the desired output labels are available for training datasets. These labels can be called supervisors.
Show Slide Supervised Learning	While learning, the model iteratively makes predictions on the given training dataset. The supervisor corrects the model.
Show Slide Types of Supervised Learning	There are two types of supervised learning: Regression and Classification. Regression is applied to predict a continuous-valued output. For example, predicting prices for the real estate sector.
Types of Supervised Learning	Classification is applied to predict a discrete-valued output. For example, predicting the species of an iris flower.
	Let’s model a classification algorithm to predict the Species of an iris flower. Here we will perform a 2-class classification. The species which we will try to predict are setosa and versicolor. For this task, we will apply a Naive Bayes classifier.
	Let us switch to RStudio.
Highlight irisModel.R in the Source window button	In the Source window, click on the script irisModel.R.
Libraries e1071 and caret. install.packages() function.	Here we need to install and import libraries e1071 and caret. These packages are needed to fit a Naive Bayes classifier and visualize its performance. To know more about these packages, please refer to Additional Reading Material. As I have already installed e1071 and caret, I will directly import these. If you have not installed them, please install them using the install.packages function.
[RStudio] library(e1071) library(caret)	Let us type the following commands at the top of the script. Press Ctrl + S keys to save the script.
Highlight library(e1071) and library(caret) in the Source window	Select the commands and click the Run button to load these libraries.
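
Before loading the libraries, it can help to check whether they are installed. The following is a minimal sketch; the names pkgs and missing_pkgs are hypothetical and not part of irisModel.R.

```r
# Packages named in this tutorial
pkgs <- c("e1071", "caret")

# requireNamespace() returns TRUE when a package is available;
# keep only the names of the packages that are not
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

missing_pkgs  # empty if both packages are already installed

# Uncomment to install whatever is missing (needs an internet connection)
# install.packages(missing_pkgs)
```
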
Highlight iris_data in the Source window n <- nrow(iris_data)	Type the following command below the View(iris_data) command. Using this command, we can find the number of rows in iris_data.
[RStudio] Type n_train <- round(0.80 * n)	We will now set the number of data points reserved for the training set. Type the following command. For this model, I will use 80% of the data points for training the model. The remaining 20% of the data points will be used for testing the model. To know more about splitting the dataset, please refer to Additional Reading Material.
Click on Save button.	Save the script.
[RStudio] train_indices <- sample(1:n, n_train) iris_train <- iris_data[train_indices, ]	Next, we will create a vector of indices. Type the following commands. The vector will be an 80% random sample of the row indices. This vector will be used to extract the data points for the train set.
iris_test <- iris_data[-train_indices, ] Highlight the minus sign before train_indices	Now, we will create a test set. Type the following command. Note that there is a minus sign before train_indices. It is to exclude the data points already used in the train set.
Highlight the Source button Click Save and Run buttons.	Save the script and select the commands after View to the end. Click on the Run button to execute the selected commands.
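
The split described above can be sketched in one piece as follows. The built-in iris data stands in for iris.csv, and the set.seed call is an assumption added only to make the random sample repeatable; it is not part of the tutorial's script.

```r
# Assumption: built-in iris data, restricted to two species, replaces iris.csv
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

set.seed(42)  # assumption: fixed seed so that the same rows are sampled each run

n <- nrow(iris_data)         # total number of data points
n_train <- round(0.80 * n)   # 80% of them go to the train set

# Random sample of row indices, drawn without replacement
train_indices <- sample(1:n, n_train)

iris_train <- iris_data[train_indices, ]   # rows selected for training
iris_test  <- iris_data[-train_indices, ]  # all remaining rows, for testing

nrow(iris_train)
nrow(iris_test)
```

Because the minus sign removes exactly the sampled rows, no data point appears in both sets.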
Drag the boundary.	I will drag the boundary to see the Environment tab clearly.
Highlight iris_train and iris_test in the Environment.	In the Environment tab, click the train set and test set to load them in the Source window. In the Source window, click on iris_train and iris_test to see the details.
Drag the boundary.	Now, we will train a classification model with a Naive Bayes classifier. Again I will drag the boundary to see the Source window clearly.
[RStudio] iris_model <- naiveBayes(formula = Species~., data = iris_train) Highlight the above command	In the Source window, type the following command. We will learn more about arguments in the upcoming tutorials in this series. Save the script and run this line by pressing Ctrl + Enter keys together.
Highlight the iris_model command	Now, let's use the test set to evaluate the performance of the model created.
[RStudio] class_prediction <- predict(object = iris_model, newdata = iris_test)	Type the following command. Using this, we will predict the Species of the data points in the test set.
Highlight the class_prediction command	Save the script and run this line by pressing Ctrl + Enter keys together.
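
Put together, training and prediction might look like the sketch below. It assumes the e1071 package is installed, uses the built-in iris data in place of iris.csv, and adds a set.seed call that the tutorial does not use.

```r
library(e1071)  # provides naiveBayes()

# Assumption: built-in iris data, restricted to two species, replaces iris.csv
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

set.seed(42)  # assumption: fixed seed for a repeatable split
n <- nrow(iris_data)
train_indices <- sample(1:n, round(0.80 * n))
iris_train <- iris_data[train_indices, ]
iris_test  <- iris_data[-train_indices, ]

# Fit the classifier: Species is the label, all other columns are features
iris_model <- naiveBayes(formula = Species ~ ., data = iris_train)

# Predict a species for every data point in the test set
class_prediction <- predict(object = iris_model, newdata = iris_test)

head(class_prediction)
```
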
Highlight class_prediction in the Environment window	Now we can use class_prediction values to evaluate the performance of our model. For this, we can use a confusion matrix.
Show Slide Confusion Matrix	A confusion matrix is a performance measurement for ML classification problems. In these classification problems, the output can be two or more classes. To know more about Confusion matrix, please refer to Additional Reading Material.
	Let us switch to RStudio.
[RStudio] Highlight iris_model in the Source window confusionMatrix(data = class_prediction, reference = as.factor(iris_test$Species))	Now, we will generate the confusion matrix to check the performance of this model. In the Source window, type the following command. Save the script and run this line by pressing Ctrl + Enter keys together.
Drag boundary.	Drag boundary to see the Console window clearly.
[RStudio] Highlight the Console window Highlight the Confusion Matrix and Statistics	In the Console window, scroll up and locate the Confusion Matrix and Statistics. The confusion matrix and its corresponding values are displayed.
[RStudio] Highlight Reference in the Console window Highlight Prediction in the Console window	Here, the Reference represents the actual values. Prediction represents the predicted values.
[RStudio] Highlight the figures in the confusion matrix on the Console window	Accuracy of the model can be checked using the values of True Positive and True Negative. It is the number of correct predictions divided by the total number of predictions. In this case, the accuracy of the model is 1. The classification model correctly predicted the values for all the points in the test set.
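
To see how accuracy follows from the counts in a confusion matrix, here is a small base R sketch. The six labels are hypothetical values chosen for illustration, not output from this tutorial, and setosa is treated as the positive class by assumption.

```r
# Hypothetical actual and predicted labels for six flowers (illustration only)
actual <- factor(c("setosa", "setosa", "setosa",
                   "versicolor", "versicolor", "versicolor"))
predicted <- factor(c("setosa", "setosa", "versicolor",
                      "versicolor", "versicolor", "versicolor"),
                    levels = levels(actual))

# Cross-tabulate predictions (rows) against actual labels (columns)
cm <- table(Prediction = predicted, Reference = actual)
cm

# Assumption: "setosa" is the positive class
tp <- cm["setosa", "setosa"]          # setosa flowers predicted as setosa
tn <- cm["versicolor", "versicolor"]  # versicolor flowers predicted as versicolor

# Accuracy = correct predictions / all predictions
accuracy <- (tp + tn) / sum(cm)
accuracy  # 5 of the 6 hypothetical predictions are correct
```

An accuracy of 1, as reported in this tutorial, corresponds to a matrix in which every count lies on the diagonal.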
	With this we come to the end of this tutorial. Let us summarize.
Show Slide Summary	In this tutorial, we have learnt about: Machine Learning and its types Supervised Learning Classification model on iris data Confusion Matrix
Show Slide About the Spoken Tutorial Project	The video at the following link summarises the Spoken Tutorial project. Please download and watch it.
Show Slide Spoken Tutorial Workshops	We conduct workshops using Spoken Tutorials and give certificates. Please contact us.
Show Slide Spoken Tutorial Forum to answer questions	Do you have questions about THIS Spoken Tutorial? Please visit this site. Choose the minute and second where you have the question. Explain your question briefly. The FOSSEE project will ensure an answer. You will have to register to ask questions.
Show Slide Spoken Tutorial Forum for specific questions:	The Spoken Tutorial forum is for specific questions on this tutorial. Please do not post unrelated and general questions on it. This will help reduce the clutter. With less clutter, we can use these discussions as instructional material.
Show Slide Forum to answer questions	Do you have any general/technical questions? Please visit the forum given in the link.
Show Slide Textbook Companion	The FOSSEE team coordinates the coding of solved examples of popular books and case study projects. We give certificates to those who do this. For more details, please visit these sites.
Show Slide Acknowledgment	The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India.
Show Slide About the Contributors	This tutorial is contributed by Sudhakar Kumar and Madhuri Ganapathi from IIT Bombay. Thank you for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey

@@ Line 24: / Line 24: @@
 || In this tutorial, we will learn about:
 * '''Machine Learning''' and its types
-* '''Supervised''' learning
+* '''Supervised learning '''
-* Classification model on '''iris''' data
+* '''Classification model''' on '''iris data'''
 * '''Confusion matrix'''
 |-
@@ Line 50: / Line 49: @@
 '''https://spoken-tutorial.org'''
 || To understand this tutorial, you should know,
-* Basics of '''R '''programming
+* Basics of '''R programming'''
 * Basics of Statistics
@@ Line 64: / Line 63: @@
 || Now let us see what machine learning is?
-* ML is a science that enables computers to learn without being explicitly programmed
+* '''ML''' is a science that enables computers to learn without being explicitly '''programmed'''
 * Its applications include self-driven cars, speech recognition, etc.
 * It is seen as a subset of '''Artificial Intelligence''', also known as '''AI'''.
@@ Line 74: / Line 73: @@
 '''Classification of Machine Learning'''
-|| ML is broadly classified into the following types:
+|| '''ML''' is broadly classified into the following types:
-* '''Supervised '''learning''',
+* '''Supervised learning''',
-* '''Unsupervised '''learning''',
+* '''Unsupervised learning''',
-* '''Semi-supervised '''learning''' and
+* '''Semi-supervised learning''' and
-* '''Reinforcement '''learning'''.
+* '''Reinforcement learning'''.
-In this series, we will focus on '''Supervised''' and '''Unsupervised''' learning.
+In this series, we will focus on '''Supervised''' and '''Unsupervised learning'''.
 |-
 || '''Show Slide '''
@@ Line 95: / Line 94: @@
-There are two critical parameters of an '''iris''' flower:
+There are two critical '''parameters''' of an '''iris''' flower:
 * '''Sepal''', and
 * '''Petal'''
-One can measure the length and width of these two parameters.
+One can measure the length and width of these two '''parameters'''.
 |-
 || '''Show Slide'''
@@ Line 129: / Line 128: @@
 '''Tabulating the Data'''
-||
+||She gets these flowers labeled as one of the three species by an expert.
-* She gets these flowers labeled as one of the three species by an expert.
@@ Line 139: / Line 137: @@
 '''Download Files '''
 || For this tutorial, we will use:
-* A '''data set''' '''iris.csv'''
+* A '''data set iris.csv'''
 * A '''script''' file '''irisModel.R '''
@@ Line 154: / Line 152: @@
-This folder is located in the '''MLProject''' folder on my Desktop.
+This folder is located in the '''MLProject''' folder on my '''Desktop'''.
-I have also set the '''SupervisedLearning''' folder as my working Directory.
+I have also set the '''SupervisedLearning''' folder as my '''Working Directory.'''
 |-
 ||
@@ Line 165: / Line 163: @@
 Point to '''irisModel.R''' in RStudio.
-|| Let’s open the script '''irisModel.R''' in '''RStudio'''.
+|| Let’s open the '''script irisModel.R''' in '''RStudio'''.
-For this, double-click on the script '''irisModel.R'''
+For this, double-click on the '''script irisModel.R'''
-Script '''irisModel'''.'''R''' opens in '''RStudio'''.
+'''Script irisModel'''.'''R''' opens in '''RStudio'''.
 |-
 || Highlight '''irisModel.R''' in the '''Source''' window
-|| Run this script by clicking on the '''Source''' button.
+|| '''Run''' this '''script''' by clicking on the '''Source''' button.
 |-
 || Highlight '''iris_data''' in the '''Source''' window
-|| The '''iris''' data frame is displayed in the '''Source''' window.
+|| The '''iris data frame''' is displayed in the '''Source''' window.
 |-
@@ Line 210: / Line 208: @@
-A typical '''iris''' dataset contains three different '''Species'''.
+A typical '''iris dataset''' contains three different '''Species'''.
 |-
@@ Line 217: / Line 215: @@
 '''Posing the Problem '''
-|| Suppose that the botanist considers the following about the iris flower:
+|| Suppose that the botanist considers the following about the '''iris''' flower:
-* Can I build a model that learns from labels of known species?
+* Can I build a '''model''' that learns from labels of known species?
-* Can this model accurately predict the species from its measurements?
+* Can this '''model''' accurately predict the species from its measurements?
 |-
@@ Line 230: / Line 227: @@
-The classification model would work as a function as given below:
+The '''classification model''' would work as a '''function''' as given below:
-This mechanism is '''supervised''' learning.
+This mechanism is '''supervised learning'''.
 |-
 ||
@@ Line 242: / Line 239: @@
 '''Supervised Learning '''
-|| In '''Supervised''' learning,
+|| In '''Supervised learning''',
-* The desired output labels are available for training datasets.
+* The desired output labels are available for training '''datasets'''.
-* These labels can be called supervisors.
+* These labels can be called '''supervisors'''.
@@ Line 253: / Line 250: @@
 '''Supervised Learning '''
 ||
-* While learning, the model makes predictions using the given training dataset.
+* While '''learning''', the model makes predictions using the given training '''dataset'''.
-* The model iteratively makes predictions on the training dataset
+* The '''model''' iteratively makes predictions on the '''training dataset'''.
-* The supervisor corrects the model.
+* The '''supervisor''' corrects the '''model'''.
 |-
@@ Line 262: / Line 258: @@
 '''Types of Supervised Learning '''
-|| There are two types of '''supervised''' learning:
+|| There are two types of '''supervised learning''':
-Regression and Classification.
+'''Regression''' and '''Classification'''.
 * '''Regression''' is applied to predict a continuous-valued output.
 * For example, predicting prices for the real estate sector.
 |-
@@ Line 274: / Line 269: @@
 ||
 * '''Classification''' is applied to predict a discrete-valued output.
-* For example, predicting the species of an iris flower
+* For example, predicting the species of an '''iris''' flower.
 |-
 ||
-|| Let’s model a classification algorithm to predict the '''Species''' of an '''iris''' flower.
+|| Let’s '''model''' a '''classification algorithm''' to predict the '''Species''' of an '''iris''' flower.
 Here we will perform a '''2-class classification'''.
@@ Line 285: / Line 280: @@
-For this task, we will apply a '''Naive Bayes''' classifier.
+For this task, we will apply a '''Naive Bayes classifier'''.
 |-
 ||
@@ Line 291: / Line 286: @@
 |-
 || Highlight '''irisModel.R''' in the '''Source''' window button
-|| In the '''Source''' window, click on the script '''irisModel.R'''.
+|| In the '''Source''' window, click on the '''script irisModel.R'''.
 |-
 || Libraries '''e1071''' and '''caret.'''
 '''Install.packages() '''function.
-|| Here we need to install and import libraries '''e1071''' and '''caret.'''
+|| Here we need to install and import '''libraries e1071''' and '''caret.'''
-These packages are needed to fit a '''Naive Bayes classifier''' and visualize its performance.
+These '''packages''' are needed to fit a '''Naive Bayes classifier''' and visualize its performance.
-To know more about these packages, please refer to '''Additional Reading Material'''.
+To know more about these '''packages''', please refer to '''Additional Reading Material'''.
-As I have already installed '''e1071 '''and '''caret''', I will directly import these.
+As I have already installed '''e1071 '''and '''caret'''. I will directly import these.
-If you have not installed, please install using the '''install.packages '''function.
+If you have not installed, please install using the '''install.packages function'''.
 |-
 || [RStudio]
@@ Line 315: / Line 310: @@
 '''library(caret)'''
-|| Let us type the following commands at the top of the script.
+|| Let us type the following '''commands''' at the top of the '''script'''.
-Press '''Ctrl''' + '''S''' keys to save the script.
+Press '''Ctrl''' + '''S''' keys to save the '''script'''.
 |-
 || Highlight '''library(e1071)''' and
 '''library(caret)''' in the '''Source''' window
-|| Select the commands and click the '''Run''' button to load these libraries.
+|| Select the '''commands''' and click the '''Run''' button to load these '''libraries'''.
 |-
 || Highlight '''iris_data''' in the '''Source''' window
 '''n <- nrow(iris_data)'''
-|| Type the following command below the '''View(iris_data)''' command.
+|| Type the following '''command''' below the '''View(iris_data) command.'''
-Using this command we can find rows in the '''iris_data'''.
+Using this '''command''' we can find rows in the '''iris_data'''.
 |-
 || [RStudio]
@@ Line 338: / Line 333: @@
 '''n_train <- round(0.80 * n) '''
-|| We will now reserve the number of data points for the '''training set.'''
+|| We will now reserve the number of '''data points''' for the '''training set.'''
-Type the following command.
+Type the following '''command'''.
-For this model, I will use 80% of the data points for training the model.
+For this '''model''', I will use 80% of the '''data points''' for '''training''' the '''model'''.
-The remaining 20% of the data points will be used for testing the model.
+The remaining 20% of the '''data points''' will be used for '''testing''' the '''model'''.
-To know more about splitting the''' dataset, '''please refer''' '''to '''Additional Reading Material.'''
+To know more about splitting the''' dataset, '''please refer to '''Additional Reading Material.'''
 |-
 || Click on Save button.
-|| Save the script.
+|| Save the '''script'''.
 |-
@@ Line 367: / Line 361: @@
 '''iris_train <- iris_data[train_indices, ] '''
-|| Next, we will create a '''vector''' of indices.
+|| Next, we will create a '''vector''' of '''indices'''.
-Type the following commands.
+Type the following '''commands'''.
@@ Line 376: / Line 370: @@
-This vector will be used to extract the data points for the '''train''' set.
+This '''vector''' will be used to extract the '''data points''' for the '''train set'''.
 |-
@@ Line 383: / Line 377: @@
 Highlight the minus sign before '''train_indices'''
-|| Now, we will create a '''test''' set.
+|| Now, we will create a '''test set'''.
-Type the following command.
+Type the following '''command'''.
@@ Line 392: / Line 386: @@
-It is to exclude the data points already used in the '''train''' set.
+It is to exclude the '''data points''' already used in the '''train set'''.
 |-
@@ Line 399: / Line 393: @@
 Click Save and Run buttons.
-|| Save the script and select the commands after '''View''' to the end.
+|| Save the '''script''' and select the '''commands''' after '''View''' to the end.
-Click on the '''Run''' button to execute the selected commands.
+Click on the '''Run''' button to '''execute''' the selected '''commands'''.
 |-
@@ Line 414: / Line 406: @@
 || Highlight '''iris_train''' and '''iris_test '''in the '''Environment.'''
-|| Click the train set and test set to load them in the Source window.
+|| Click the '''train set''' and '''test set''' to load them in the '''Source''' window.
@@ Line 422: / Line 414: @@
 || Drag the boundary.
-|| Now, we will '''train''' a '''classification model''' with a '''Naive Bayes''' '''classifier'''.
+|| Now, we will '''train''' a '''classification model''' with a '''Naive Bayes classifier'''.
@@ Line 435: / Line 427: @@
 Highlight the above command
-|| In the '''Source''' window, type the following command.
+|| In the '''Source''' window, type the following '''command'''.
-We will learn more about arguments in the upcoming tutorials in this series.
+We will learn more about '''arguments''' in the upcoming tutorials in this series.
-Save the script and run this line by pressing '''Ctrl''' + '''Enter''' keys together.
+Save the '''script''' and '''run''' this line by pressing '''Ctrl''' + '''Enter''' keys together.
 |-
 || Highlight the '''iris_model''' command
-|| Now, let's use the '''test''' set to evaluate the performance of the model created.
+|| Now, let's use the '''test set''' to evaluate the performance of the '''model''' created.
 |-
@@ Line 453: / Line 445: @@
 '''class_prediction <- predict(object = iris_model, newdata = iris_test)'''
-|| Type the following command.
+|| Type the following '''command'''.
-Using this, we will predict the '''Species''' of the data points in the '''test''' set.
+Using this, we will predict the '''Species''' of the '''data points''' in the '''test set'''.
@@ Line 462: / Line 454: @@
 || Highlight the '''class_prediction''' command
-|| Save the script and run this line by pressing '''Ctrl''' + '''Enter''' keys together.
+|| Save the '''script''' and run this line by pressing '''Ctrl''' + '''Enter''' keys together.
 |-
 || Highlight '''class_prediction''' in the '''Environment''' window
-|| Now we can use '''class_prediction''' values to evaluate the performance of our model.
+|| Now we can use '''class_prediction''' values to evaluate the performance of our '''model'''.
@@ Line 478: / Line 470: @@
 Confusion Matrix
 ||
-* It is a performance measurement for '''ML''' classification problems.
+* It is a performance measurement for '''ML classification ''' problems.
-* In these classification problems, the output can be two or more classes.<br/>
+* In these '''classification problems''', the output can be two or more '''classes'''.<br/>
@@ Line 497: / Line 489: @@
 '''reference = as.factor(iris_test$Species))'''
-|| Now, we will draw the '''confusion matrix''' to check the performance of this model.
+|| Now, we will draw the '''confusion matrix''' to check the performance of this '''model'''.
-In the '''Source''' window, type the following command:
+In the '''Source''' window, type the following '''command'''.
-Save the script and run this line by pressing '''Ctrl''' + '''Enter''' keys together.
+Save the '''script''' and run this line by pressing '''Ctrl''' + '''Enter''' keys together.
 |-
@@ Line 541: / Line 533: @@
 Highlight the figures in the '''confusion matrix''' on the '''Console''' window
-|| Accuracy of the model can be checked using the values of '''True Positive''' and '''True Negative.'''
+|| Accuracy of the '''model''' can be checked using the values of '''True Positive''' and '''True Negative.'''
-In this case, the accuracy of the model is 1.
+In this case, the accuracy of the '''model''' is 1.
-The classification model correctly predicted the values for all the points in the '''test''' set.
+The '''classification model''' correctly predicted the values for all the '''points''' in the '''test set'''.
 |-
@@ Line 563: / Line 555: @@
 || In this tutorial, we have learnt about:
 * '''Machine Learning''' and its types
-* '''Supervised''' Learning
+* '''Supervised Learning'''
-* Classification model on '''iris''' data
+* '''Classification model''' on '''iris data '''
 * '''Confusion Matrix'''
 |-
 || Show Slide