Machine-Learning-using-R - old 2022/C2/Supervised-Learning/English
Title of the script: Supervised Learning
Author: Sudhakar Kumar
Keywords: R, RStudio, machine learning, supervised learning, unsupervised, classification, Naive Bayes, confusion matrix, video tutorial.
Visual Cue | Narration |
Show Slide
Opening Slide |
Welcome to this spoken tutorial on Supervised Learning. |
Show Slide
Learning Objectives |
In this tutorial, we will learn about:
|
Show Slide
System Specifications |
This tutorial is recorded using,
It is recommended to install R version 4.1.0 or higher. |
Show Slide
Prerequisites |
To understand this tutorial, you should know,
|
Show Slide
What is Machine Learning? |
Now let us see what machine learning is.
|
Show Slide
Classification of Machine Learning |
ML is broadly classified into the following types:
|
Show Slide
Iris Flower
|
Let us consider a flower named iris.
An image of this flower is shown here. There are two critical parameters of an iris flower:
|
Show Slide
Species of an iris flower
|
Based on the measurements, three species of iris flowers are available:
|
Show Slide
Tabulating the Data |
Consider a situation:
|
Show Slide
Tabulating the Data |
She gets these flowers labeled as one of the three species by an expert.
|
Show Slide
Download Files |
For this tutorial, we will use:
Make a copy and then use them for practising. |
[Computer screen]
|
I have downloaded and moved these files to the SupervisedLearning folder.
|
Cursor near irisModel.R file. | Let us switch to RStudio. |
Double click on irisModel.R to open in RStudio
Point to irisModel.R in RStudio. |
Let’s open the script irisModel.R in RStudio.
|
Highlight irisModel.R in the Source window | Run this script by clicking on the Source button.
|
Highlight iris_data in the Source window | The iris data frame is displayed in the Source window. |
Highlight 100 entries, 5 total columns at the bottom of the Source window | Here we can see five columns with 100 rows. |
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window | The columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species. |
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window | The first four columns are the features of an iris flower.
|
Highlight Species column in the Source window | In the Source window, scroll down to locate the different Species.
|
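The contents of irisModel.R are not shown in this script. As a minimal sketch, assuming that iris_data is a two-species subset of R's built-in iris dataset, a data frame with the same shape (100 rows, 5 columns) could be built as follows. The filtering step is an assumption for illustration, not the confirmed content of irisModel.R.

```r
# Assumption: iris_data is a two-species subset of R's built-in iris dataset.
# The actual irisModel.R script may build it differently.
iris_data <- subset(iris, Species %in% c("setosa", "versicolor"))

# Drop the unused "virginica" factor level left over from the full dataset.
iris_data$Species <- droplevels(iris_data$Species)

dim(iris_data)            # number of rows and columns
names(iris_data)          # column names
table(iris_data$Species)  # number of flowers per species
```

Dropping the unused level keeps later summaries and confusion matrices limited to the two species being classified.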
Show Slide
Posing the Problem |
Suppose that the botanist considers the following about the iris flower:
|
Show Slide
Mapping of Features and Labels |
We will map the dimensions of sepal and petal to iris species.
|
Show Slide
Supervised Learning |
In Supervised learning,
|
Show Slide
Supervised Learning |
|
Show Slide
Types of Supervised Learning |
There are two types of supervised learning:
Regression and Classification.
|
Types of Supervised Learning |
|
Let’s model a classification algorithm to predict the Species of an iris flower.
Here we will perform a 2-class classification. The species which we will try to predict are setosa and versicolor.
| |
Let us switch to RStudio. | |
Highlight irisModel.R in the Source window | In the Source window, click on the script irisModel.R. |
Libraries e1071 and caret.
install.packages() function. |
Here we need to install and load the libraries e1071 and caret.
|
[RStudio]
library(e1071)
library(caret) |
Let us type the following commands at the top of the script.
|
Highlight library(e1071) and
library(caret) in the Source window |
Select the commands and click the Run button to load these libraries. |
Highlight iris_data in the Source window
n <- nrow(iris_data) |
Type the following command below the View(iris_data) command.
|
[RStudio]
Type n_train <- round(0.80 * n) |
We will now compute the number of data points to reserve for the training set, which is 80% of the total rows.
|
Click on Save button. | Save the script. |
[RStudio]
|
Next, we will create a vector of indices.
|
iris_test <- iris_data[-train_indices, ]
|
Now, we will create a test set.
|
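The splitting steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: a two-species subset of the built-in iris dataset stands in for iris_data, the indices are drawn with sample(), and set.seed() is added only to make the split reproducible. The command that creates train_indices is not shown in this script, so that line is an assumption.

```r
# Assumption: a two-species subset of the built-in iris dataset stands in
# for the iris_data data frame loaded by irisModel.R.
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

set.seed(1)  # assumption: fixed seed so the random split is reproducible

n <- nrow(iris_data)        # total number of data points
n_train <- round(0.80 * n)  # 80% of the rows go to the training set

# Assumption: indices are drawn at random, without replacement, using sample().
train_indices <- sample(1:n, n_train)

iris_train <- iris_data[train_indices, ]   # rows selected for training
iris_test  <- iris_data[-train_indices, ]  # all remaining rows for testing

nrow(iris_train)
nrow(iris_test)
```

The negative index in iris_data[-train_indices, ] selects every row that is not in the training set, so the two sets never share a flower.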
Highlight the Source button
Click Save and Run buttons. |
Save the script, select the commands from the one after View to the end, and click the Run button.
|
Drag the boundary. | I will drag the boundary to see the Environment tab clearly. |
Highlight iris_train and iris_test in the Environment. | Click the train set and test set to load them in the Source window.
|
Drag the boundary. | Now, we will train a classification model with a Naive Bayes classifier.
|
[RStudio]
iris_model <- naiveBayes(formula = Species~., data = iris_train)
|
In the Source window, type the following command.
|
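To give an idea of what a Naive Bayes classifier estimates for numeric features, here is a minimal sketch written in base R instead of the e1071 package. It is not the implementation used by naiveBayes(). It assumes Gaussian features, a two-species subset of the built-in iris dataset in place of iris_data, and a random 80/20 split. The helper score_flower and the name manual_prediction are hypothetical.

```r
# Assumption: a two-species subset of the built-in iris dataset stands in
# for iris_data, split 80/20 with sample().
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
set.seed(1)
train_indices <- sample(seq_len(nrow(iris_data)), round(0.80 * nrow(iris_data)))
iris_train <- iris_data[train_indices, ]
iris_test  <- iris_data[-train_indices, ]

features <- setdiff(names(iris_train), "Species")
classes  <- levels(iris_train$Species)

# "Training": estimate the prior of each class, and the mean and standard
# deviation of every feature within each class.
priors <- sapply(classes, function(cl) mean(iris_train$Species == cl))
means  <- sapply(classes, function(cl)
  colMeans(iris_train[iris_train$Species == cl, features]))
sds    <- sapply(classes, function(cl)
  apply(iris_train[iris_train$Species == cl, features], 2, sd))

# Hypothetical helper: log posterior score of one flower for each class,
# treating the features as independent Gaussians within a class.
score_flower <- function(x) {
  sapply(classes, function(cl) {
    log(priors[[cl]]) +
      sum(dnorm(x, mean = means[, cl], sd = sds[, cl], log = TRUE))
  })
}

# Predict the class with the highest score for every test flower.
scores <- apply(as.matrix(iris_test[, features]), 1, score_flower)
manual_prediction <- factor(classes[apply(scores, 2, which.max)],
                            levels = classes)

mean(manual_prediction == iris_test$Species)  # fraction predicted correctly
```

The "naive" part is the sum over per-feature log densities: each feature is treated as independent of the others once the species is known.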
Highlight the iris_model command | Now, let's use the test set to evaluate the performance of the model created. |
[RStudio]
class_prediction <- predict(object = iris_model, newdata = iris_test) |
Type the following command.
Using this, we will predict the Species of the data points in the test set.
|
Highlight the class_prediction command | Save the script and run this line by pressing Ctrl + Enter keys together. |
Highlight class_prediction in the Environment window | Now we can use class_prediction values to evaluate the performance of our model.
|
Show Slide
Confusion Matrix |
|
Let us switch to RStudio. | |
[RStudio]
Highlight iris_model in the Source window
confusionMatrix(data = class_prediction, reference = as.factor(iris_test$Species)) |
Now, we will create the confusion matrix to check the performance of this model.
Save the script and run this line by pressing Ctrl + Enter keys together. |
Drag boundary. | Drag boundary to see the Console window clearly. |
[RStudio]
Highlight the Console window
|
In the Console window, scroll up and locate the Confusion Matrix and Statistics.
|
[RStudio]
Highlight Reference in the Console window
Highlight Prediction in the Console window |
Here, Reference represents the actual values and Prediction represents the values predicted by the model.
|
[RStudio]
Highlight the figures in the confusion matrix on the Console window |
The accuracy of the model is the proportion of correct predictions. It is calculated by dividing the sum of True Positives and True Negatives by the total number of predictions.
|
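The relation between the confusion matrix and accuracy can be illustrated with a small base R sketch. The label vectors below are hypothetical values made up for illustration, not results from this tutorial, and table() is used here in place of confusionMatrix() so that the example needs no extra packages.

```r
# Hypothetical labels for ten test flowers (not real results from the tutorial).
actual    <- factor(c("setosa", "setosa", "setosa", "setosa", "setosa",
                      "versicolor", "versicolor", "versicolor",
                      "versicolor", "versicolor"))
predicted <- factor(c("setosa", "setosa", "setosa", "setosa", "versicolor",
                      "versicolor", "versicolor", "versicolor",
                      "versicolor", "setosa"),
                    levels = levels(actual))

# Rows are predictions and columns are reference (actual) values,
# the same layout that confusionMatrix() prints.
cm <- table(Prediction = predicted, Reference = actual)
print(cm)

# Treat "setosa" as the positive class.
TP <- cm["setosa", "setosa"]          # setosa predicted as setosa
TN <- cm["versicolor", "versicolor"]  # versicolor predicted as versicolor
FP <- cm["setosa", "versicolor"]      # versicolor predicted as setosa
FN <- cm["versicolor", "setosa"]      # setosa predicted as versicolor

accuracy <- (TP + TN) / (TP + TN + FP + FN)
accuracy  # 0.8 for these hypothetical labels
```

The correct predictions lie on the diagonal of the matrix, so the same value can also be obtained with sum(diag(cm)) / sum(cm).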
With this, we come to the end of this tutorial.
Let us summarize. | |
Show Slide
Summary |
In this tutorial, we have learnt about:
|
Show Slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show Slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Spoken Tutorial Forum to answer questions |
Do you have questions about THIS Spoken Tutorial?
Please visit this site. Choose the minute and second where you have the question. Explain your question briefly. The FOSSEE project will ensure an answer. You will have to register to ask questions. |
Show Slide
Spoken Tutorial Forum for specific questions: |
The Spoken Tutorial forum is for specific questions on this tutorial.
Please do not post unrelated and general questions on it. This will help reduce the clutter. With less clutter, we can use these discussions as instructional material. |
Show Slide
Forum to answer questions |
Do you have any general/technical questions?
Please visit the forum given in the link. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.
We give certificates to those who do this. For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India. |
Show Slide
About the Contributors |
This tutorial is contributed by Sudhakar Kumar and Madhuri Ganapathi from IIT Bombay.
Thank you for watching. |