Difference between revisions of "Machine-Learning-using-R - old 2022/C2/Supervised-Learning/English"

From Script | Spoken-Tutorial

Revision as of 18:43, 21 February 2022

Title of the script: Supervised Learning

Author: Sudhakar Kumar

Keywords: R, RStudio, machine learning, supervised learning, unsupervised, classification, Naive Bayes, confusion matrix, video tutorial.


Visual Cue Narration
Show Slide

Opening Slide

Welcome to this spoken tutorial on Supervised Learning.
Show Slide

Learning Objectives

In this tutorial, we will learn about:
  • Machine Learning and its types
  • Supervised learning
  • Classification model on iris data
  • Confusion matrix
Show Slide

System Specifications

This tutorial is recorded using,
  • Ubuntu Linux OS version 20.04
  • R version 4.1.2
  • RStudio version 1.4.1717

It is recommended to install R version 4.1.0 or higher.

Show Slide

Prerequisites


https://spoken-tutorial.org

To understand this tutorial, you should know,
  • Basics of R programming
  • Basics of Statistics


If not, please access the relevant tutorials on R on this website.

Show Slide

What is Machine Learning?

Now let us see what machine learning is.
  • ML is a science that enables computers to learn without being explicitly programmed.
  • Its applications include self-driven cars, speech recognition, etc.
  • It is seen as a subset of Artificial Intelligence, also known as AI.
Show Slide

Classification of Machine Learning

ML is broadly classified into the following types:
  • Supervised learning,
  • Unsupervised learning,
  • Semi-supervised learning and
  • Reinforcement learning.


In this series, we will focus on Supervised and Unsupervised learning.

Show Slide

Iris Flower


Highlight the iris flower

Let us consider a flower named iris.

An image of this flower is shown here.


There are two critical parameters of an iris flower:

  • Sepal, and
  • Petal


One can measure the length and width of these two parameters.

Show Slide

Species of an iris flower


Highlight the species of an iris flower

Based on the measurements, three species of iris flower are available:
  • Setosa
  • Versicolor
  • Virginica
Show Slide

Tabulating the Data

Consider a situation:
  • A botanist wants to distinguish the species of iris flowers.
  • She collects four features of some iris flowers:
    • Sepal length and Sepal width
    • Petal length and Petal width
Show Slide

Tabulating the Data

She gets these flowers labeled as one of the three species by an expert.


Show Slide

Download Files

For this tutorial, we will use:
  • A data set iris.csv
  • A script file irisModel.R


Please download these files from the Code files link of this tutorial.

Make a copy and then use them for practising.

[Computer screen]


Highlight irisModel.R and the folder SupervisedLearning

I have downloaded and moved these files to the SupervisedLearning folder.


This folder is located in the MLProject folder on my Desktop.


I have also set the SupervisedLearning folder as my Working Directory.
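The working directory can also be set from the R Console. The following is only a sketch: it assumes that the Desktop folder sits under the home directory (~), which may not hold on every system, so adjust the path to wherever the SupervisedLearning folder is actually stored.

```r
# Assumed location: ~/Desktop/MLProject/SupervisedLearning (adjust as needed).
project_dir <- file.path("~", "Desktop", "MLProject", "SupervisedLearning")

# Change the working directory only if the folder exists on this machine.
if (dir.exists(project_dir)) {
  setwd(project_dir)
}

# Print the current working directory to confirm where R will look for files.
print(getwd())
```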

Let us switch to RStudio.
Double click on irisModel.R to open in RStudio

Point to irisModel.R in RStudio.

Let’s open the script irisModel.R in RStudio.


For this, double-click on the script irisModel.R


Script irisModel.R opens in RStudio.

Highlight irisModel.R in the Source window

Run this script by clicking on the Source button.


Highlight iris_data in the Source window

The iris data frame is displayed in the Source window.

Highlight 100 entries, 5 total columns at the bottom of the Source window

Here we can see five columns with 100 rows.

Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window

The columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species.

Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window

The first four columns are the features of an iris flower.


The fifth column, Species, is the label of each iris flower.

Highlight Species column in the Source window

In the Source window, scroll down to locate the different Species.


Notice that there are two species of the iris flower, setosa and versicolor.


A typical iris dataset contains three different Species.
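The same checks can be made from the Console. This is only a sketch: since the contents of iris.csv are not reproduced here, it assumes that the first 100 rows of R's built-in iris data frame (which hold only setosa and versicolor) are a reasonable stand-in for the file.

```r
# Stand-in for iris.csv (an assumption): first 100 rows of the built-in iris data.
iris_data <- droplevels(iris[1:100, ])

# Number of rows and columns.
print(dim(iris_data))

# Column names: four features followed by the Species label.
print(names(iris_data))

# Number of flowers recorded for each species.
print(table(iris_data$Species))
```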

Show Slide

Posing the Problem

Suppose that the botanist considers the following about the iris flower:
  • Can I build a model that learns from labels of known species?
  • Can this model accurately predict the species from its measurements?
Show Slide

Mapping of Features and Labels

We will map the dimensions of sepal and petal to iris species.


The classification model would work as a function as given below:


This mechanism is supervised learning.
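To make the idea of a model as a function concrete, here is a deliberately simple sketch. The function name predict_species and the petal-length threshold of 2.5 are illustrative assumptions, written by hand; a real classifier learns such a mapping from the labeled data instead.

```r
# Hypothetical hand-written rule standing in for a learned model:
# it maps the four measurements of a flower to a species label.
predict_species <- function(sepal_length, sepal_width, petal_length, petal_width) {
  # Only petal_length is used here; the threshold 2.5 is an assumption.
  ifelse(petal_length < 2.5, "setosa", "versicolor")
}

# Two example flowers (measurements in centimetres).
print(predict_species(5.1, 3.5, 1.4, 0.2))
print(predict_species(7.0, 3.2, 4.7, 1.4))
```

In supervised learning, the rule inside the function is not written by a person but is estimated from the features and labels of the training data.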

Highlight the function

Show Slide

Supervised Learning

In Supervised learning,
  • The desired output labels are available for training datasets.
  • These labels can be called supervisors.


Show Slide

Supervised Learning

  • While learning, the model makes predictions using the given training dataset.
  • The model iteratively makes predictions on the training dataset.
  • The supervisor corrects the model.
Show Slide

Types of Supervised Learning

There are two types of supervised learning:

Regression and Classification.

  • Regression is applied to predict a continuous-valued output.
  • For example, predicting prices for the real estate sector.
Types of Supervised Learning
  • Classification is applied to predict a discrete-valued output.
  • For example, predicting the species of an iris flower.
Let’s model a classification algorithm to predict the Species of an iris flower.

Here we will perform a 2-class classification.

The species which we will try to predict are setosa and versicolor.


For this task, we will apply a Naive Bayes classifier.

Let us switch to RStudio.
Highlight irisModel.R in the Source window

In the Source window, click on the script irisModel.R.
Libraries e1071 and caret.

install.packages() function.

Here we need to install and import libraries e1071 and caret.


These packages are needed to fit a Naive Bayes classifier and visualize its performance.


To know more about these packages, please refer to Additional Reading Material.


As I have already installed e1071 and caret, I will directly import these.


If you have not installed them, please install them using the install.packages function.
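One possible way to check which of the two packages still need installing is sketched below. The helper name missing_packages is an assumption made for this example; the install line is left commented out because it needs an internet connection.

```r
# Return the names of the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1),
                      quietly = TRUE, USE.NAMES = FALSE)
  pkgs[!installed]
}

to_install <- missing_packages(c("e1071", "caret"))
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```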

[RStudio]

library(e1071)

library(caret)

Let us type the following commands at the top of the script.


Press Ctrl + S keys to save the script.

Highlight library(e1071) and

library(caret) in the Source window

Select the commands and click the Run button to load these libraries.
Highlight iris_data in the Source window

n <- nrow(iris_data)

Type the following command below the View(iris_data) command.


Using this command, we can find the number of rows in iris_data.

[RStudio]

Type n_train <- round(0.80 * n)

We will now compute the number of data points to reserve for the training set.


Type the following command.


For this model, I will use 80% of the data points for training the model.


The remaining 20% of the data points will be used for testing the model.


To know more about splitting the dataset, please refer to Additional Reading Material.
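The arithmetic of the split can be checked with a short sketch. It assumes, as a stand-in for iris.csv, the first 100 rows of R's built-in iris data; the name n_test is introduced here only for illustration.

```r
# Stand-in for iris.csv (an assumption): first 100 rows of the built-in iris data.
iris_data <- droplevels(iris[1:100, ])

n <- nrow(iris_data)        # total number of data points
n_train <- round(0.80 * n)  # 80% of the data points for training
n_test <- n - n_train       # remaining 20% for testing (illustrative name)

print(c(total = n, train = n_train, test = n_test))
```

With 100 rows, this gives 80 data points for training and 20 for testing.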

Click on the Save button.

Save the script.
[RStudio]


train_indices <- sample(1:n, n_train)


iris_train <- iris_data[train_indices, ]

Next, we will create a vector of indices.


Type the following commands.


It will be a random sample containing 80% of the row indices.


This vector will be used to extract the data points for the train set.

iris_test <- iris_data[-train_indices, ]


Highlight the minus sign before train_indices

Now, we will create a test set.


Type the following command.


Note that there is a minus sign before train_indices.


It is to exclude the data points already used in the train set.
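The split can be sketched end to end as follows. Two things here are assumptions not taken from the tutorial: the stand-in data (first 100 rows of the built-in iris data instead of iris.csv) and the call to set.seed, which is added only so that the random sample is repeatable.

```r
# Stand-in for iris.csv (an assumption): first 100 rows of the built-in iris data.
iris_data <- droplevels(iris[1:100, ])
n <- nrow(iris_data)
n_train <- round(0.80 * n)

set.seed(42)  # assumption: fixed seed so the same rows are drawn on every run

# Random row indices for the train set.
train_indices <- sample(1:n, n_train)

# Rows at those indices form the train set.
iris_train <- iris_data[train_indices, ]

# The minus sign drops those rows, leaving the test set.
iris_test <- iris_data[-train_indices, ]

# No row should appear in both sets, so this prints 0.
print(length(intersect(rownames(iris_train), rownames(iris_test))))
```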

Highlight the Source button

Click Save and Run buttons.

Save the script and select all the commands after the View(iris_data) command, up to the end of the script.


Click on the Run button to execute the selected commands.

Drag the boundary.

I will drag the boundary to see the Environment tab clearly.

Highlight iris_train and iris_test in the Environment.

Click the train set and test set to load them in the Source window.


In the Source window, click on iris_train and iris_test to see the details.

Drag the boundary.

Now, we will train a classification model with a Naive Bayes classifier.


Again I will drag the boundary to see the Source window clearly.

[RStudio]

iris_model <- naiveBayes(formula = Species~., data = iris_train)


Highlight the above command

In the Source window, type the following command.


We will learn more about arguments in the upcoming tutorials in this series.


Save the script and run this line by pressing Ctrl + Enter keys together.

Highlight the iris_model command

Now, let's use the test set to evaluate the performance of the model created.
[RStudio]

class_prediction <- predict(object = iris_model, newdata = iris_test)

Type the following command.

Using this, we will predict the Species of the data points in the test set.
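The training and prediction steps can be tried together in one self-contained sketch. It needs the e1071 package to be installed. The stand-in data (first 100 rows of the built-in iris data instead of iris.csv) and the fixed seed are assumptions made for this example.

```r
library(e1071)

# Stand-in for iris.csv (an assumption): first 100 rows of the built-in iris data.
iris_data <- droplevels(iris[1:100, ])

set.seed(42)  # assumption: fixed seed so the split is repeatable
n <- nrow(iris_data)
train_indices <- sample(1:n, round(0.80 * n))
iris_train <- iris_data[train_indices, ]
iris_test <- iris_data[-train_indices, ]

# Fit a Naive Bayes classifier: Species is the label, all other columns are features.
iris_model <- naiveBayes(formula = Species ~ ., data = iris_train)

# Predict the Species of every data point in the test set.
class_prediction <- predict(object = iris_model, newdata = iris_test)

# Show the first few predicted labels.
print(head(class_prediction))
```

The result is a factor with one predicted label for each row of the test set.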


Highlight the class_prediction command

Save the script and run this line by pressing Ctrl + Enter keys together.

Highlight class_prediction in the Environment window

Now we can use class_prediction values to evaluate the performance of our model.


For this, we can use a confusion matrix.

Show Slide

Confusion Matrix

  • It is a performance measurement for ML classification problems.
  • In these classification problems, the output can be two or more classes.


To know more about Confusion matrix, please refer to Additional Reading Material.

Let us switch to RStudio.
[RStudio]

Highlight iris_model in the Source window

confusionMatrix(data = class_prediction,

reference = as.factor(iris_test$Species))

Now, we will draw the confusion matrix to check the performance of this model.


In the Source window, type the following command.

Save the script and run this line by pressing Ctrl + Enter keys together.

Drag the boundary.

Drag the boundary to see the Console window clearly.
[RStudio]

Highlight the Console window


Highlight the Confusion Matrix and Statistics

In the Console window, scroll up and locate the Confusion Matrix and Statistics.


The confusion matrix and its corresponding values are displayed.

[RStudio]

Highlight Reference in the Console window

Highlight Prediction in the Console window

Here, the Reference represents the actual values.


Prediction represents the predicted values.

[RStudio]

Highlight the figures in the confusion matrix on the Console window

The accuracy of the model is the share of correct predictions, that is, the True Positive and True Negative counts divided by the total number of predictions.


In this case, the accuracy of the model is 1.


The classification model correctly predicted the values for all the points in the test set.
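How accuracy follows from the confusion matrix can be seen in a small base R sketch. The labels below are made up for illustration and deliberately contain one wrong prediction, so the result differs from the accuracy of 1 obtained in this tutorial.

```r
species_levels <- c("setosa", "versicolor")

# Made-up actual labels (Reference) and predicted labels (Prediction).
reference <- factor(c("setosa", "setosa", "setosa", "versicolor", "versicolor"),
                    levels = species_levels)
prediction <- factor(c("setosa", "setosa", "versicolor", "versicolor", "versicolor"),
                     levels = species_levels)

# Rows are the predicted values, columns are the actual values.
cm <- table(Prediction = prediction, Reference = reference)
print(cm)

# Correct predictions lie on the diagonal of the matrix.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Here four of the five predictions are correct, so the accuracy is 0.8. When every prediction matches its reference, all counts fall on the diagonal and the accuracy is 1.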

With this, we come to the end of this tutorial.

Let us summarize.

Show Slide

Summary

In this tutorial, we have learnt about:
  • Machine Learning and its types
  • Supervised Learning
  • Classification model on iris data
  • Confusion Matrix
Show Slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show Slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

Please contact us.

Show Slide

Spoken Tutorial Forum to answer questions

Do you have questions about THIS Spoken Tutorial?

Please visit this site.

Choose the minute and second where you have the question.

Explain your question briefly.

The FOSSEE project will ensure an answer.

You will have to register to ask questions.

Show Slide

Spoken Tutorial Forum for specific questions:

The Spoken Tutorial forum is for specific questions on this tutorial.

Please do not post unrelated and general questions on it.

This will help reduce the clutter.

With less clutter, we can use these discussions as instructional material.

Show Slide

Forum to answer questions

Do you have any general/technical questions?

Please visit the forum given in the link.

Show Slide

Textbook Companion

The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.

We give certificates to those who do this.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India.
Show Slide

About the Contributors

This tutorial is contributed by Sudhakar Kumar and Madhuri Ganapathi from IIT Bombay.

Thank you for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey