Difference between revisions of "Machine-Learning-using-R - old 2022/C2/Supervised-Learning/English"
Nancyvarkey (Talk | contribs) m (Nancyvarkey moved page Machine-Learning-using-R/C2/Supervised-Learning/English to Machine-Learning-using-R - old 2022/C2/Supervised-Learning/English without leaving a redirect: Archiving previous version because new version will be created) |
Latest revision as of 08:25, 9 October 2023
Title of the script: Supervised Learning
Author: Sudhakar Kumar
Keywords: R, RStudio, machine learning, supervised learning, unsupervised, classification, Naive Bayes, confusion matrix, video tutorial.
Visual Cue | Narration |
Show Slide
Opening Slide |
Welcome to this spoken tutorial on Supervised Learning. |
Show Slide
Learning Objectives |
In this tutorial, we will learn about:
|
Show Slide
System Specifications |
This tutorial is recorded using,
It is recommended to install R version 4.1.0 or higher. |
Show Slide
Prerequisites |
To understand this tutorial, you should know:
* Basics of R programming
* Basics of Statistics
If not, please access the relevant tutorials on R on this website. |
Show Slide
What is Machine Learning? |
Now let us see what machine learning is.
|
Show Slide
Classification of Machine Learning |
ML is broadly classified into the following types:
|
Show Slide
Iris Flower
|
Let us consider a flower named iris.
An image of this flower is shown here. There are two critical parameters of an iris flower:
|
Show Slide
Species of an iris flower
|
Based on the measurements, three species of iris flowers are available:
* setosa
* versicolor
* virginica |
Show Slide
Tabulating the Data |
Consider a situation:
|
Show Slide
Tabulating the Data |
She gets these flowers labeled as one of the three species by an expert. |
Show Slide
Download Files |
For this tutorial, we will use:
* A data set iris.csv
* A script file irisModel.R
Make a copy and then use them for practising. |
[Computer screen]
|
I have downloaded and moved these files to the SupervisedLearning folder.
I have also set the SupervisedLearning folder as my Working Directory. |
Cursor near irisModel.R file. | Let us switch to RStudio. |
Double click on irisModel.R to open in RStudio
Point to irisModel.R in RStudio. |
Let’s open the script irisModel.R in RStudio.
For this, double-click on the script irisModel.R. |
Highlight irisModel.R in the Source window | Run this script by clicking on the Source button.
|
Highlight iris_data in the Source window | The iris data frame is displayed in the Source window. |
Highlight 100 entries, 5 total columns at the bottom of the Source window | Here we can see five columns with 100 rows. |
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window | The columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species. |
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window | The first four columns are the features of an iris flower.
|
Highlight Species column in the Source window | In the Source window, scroll down to locate the different Species.
|
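The contents of iris.csv and irisModel.R are not reproduced in this script. As a rough sketch, assuming that R's built-in iris data set can stand in for iris.csv, a data frame with the same five columns and 100 rows could be built like this:

```r
# Assumption: R's built-in `iris` data set stands in for iris.csv,
# which is not shown in this script.
# Keep only two species and drop the unused factor level.
iris_data <- droplevels(subset(iris, Species != "virginica"))

dim(iris_data)             # number of rows and columns
names(iris_data)           # column names
table(iris_data$Species)   # data points per species
```

In RStudio, View(iris_data) would then open this data frame in the Source window.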
Show Slide
Posing the Problem |
Suppose that the botanist considers the following about the iris flower:
|
Show Slide
Mapping of Features and Labels |
We will map the dimensions of sepal and petal to iris species.
|
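To make the idea of a mapping concrete, here is a small sketch of a hand-written rule that maps one feature to a label. The function name classify_by_petal and the 2.5 cm cut-off are illustrative assumptions chosen by looking at the built-in iris data; they are not part of irisModel.R.

```r
# A hand-written mapping from one feature to a label, for illustration only.
# Assumptions: the built-in `iris` data set and a cut-off of 2.5 cm on
# Petal.Length, chosen by inspecting the data rather than learned.
classify_by_petal <- function(petal_length) {
  ifelse(petal_length < 2.5, "setosa", "versicolor")
}

two_species <- droplevels(subset(iris, Species != "virginica"))
predicted <- classify_by_petal(two_species$Petal.Length)

# Fraction of flowers for which the rule agrees with the expert label
mean(predicted == as.character(two_species$Species))
```

Supervised learning replaces such a hand-picked rule with one that is estimated from the labeled examples.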
Highlight the function
Show Slide
Supervised Learning |
In Supervised learning,
|
Show Slide
Supervised Learning |
|
Show Slide
Types of Supervised Learning |
There are two types of supervised learning:
Regression and Classification.
|
Types of Supervised Learning |
|
Let’s build a classification model to predict the Species of an iris flower.
Here we will perform a 2-class classification. The species which we will try to predict are setosa and versicolor.
| |
Let us switch to RStudio. | |
Highlight irisModel.R in the Source window | In the Source window, click on the script irisModel.R. |
Libraries e1071 and caret.
install.packages() function. |
Here we need to install and import the libraries e1071 and caret.
I have already installed e1071 and caret.
I will directly import these. |
[RStudio]
library(e1071)
library(caret) |
Let us type the following commands at the top of the script.
Press the Ctrl and S keys to save the script. |
Highlight library(e1071) and
library(caret) in the Source window |
Select the commands and click the Run button to load these libraries. |
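If the libraries are not yet installed, library() will stop with an error. The following is a minimal sketch of how one might check for missing packages first. The helper name missing_packages is hypothetical and is not part of irisModel.R.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("e1071", "caret"))
if (length(to_install) > 0) {
  # Only report what is missing; installation is left to the user.
  message("Install first with: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
}
```

Reporting instead of installing keeps the script from downloading anything without the user's consent.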
Highlight iris_data in the Source window
n <- nrow(iris_data) |
Type the following command below the View(iris_data) command.
Using this command, we can find the number of rows in iris_data. |
[RStudio]
Type n_train <- round(0.80 * n) |
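We will now compute the number of data points to reserve for the training set, which is 80% of the rows.
To know more about splitting the dataset, please refer to the Additional Reading Material. |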
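The arithmetic behind this step can be sketched as follows, again assuming that the built-in iris data set stands in for iris.csv. The variable n_test is added here only for illustration.

```r
# Assumption: the built-in `iris` data set stands in for iris.csv.
iris_data <- droplevels(subset(iris, Species != "virginica"))

n <- nrow(iris_data)          # total number of data points
n_train <- round(0.80 * n)    # 80% of the rows go to the training set
n_test <- n - n_train         # the remaining rows form the test set

c(total = n, train = n_train, test = n_test)
```

With 100 rows, this reserves 80 data points for training and leaves 20 for testing.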
Click on Save button. | Save the script. |
[RStudio]
|
Next, we will create a vector of indices.
|
iris_test <- iris_data[-train_indices, ]
|
Now, we will create a test set.
|
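The commands that create the index vector and the training set are not shown above. One common way to write the whole split is sketched below. The use of sample(), the seed value, and the built-in iris data are assumptions made for this illustration and may differ from irisModel.R.

```r
# Assumptions: built-in `iris` stands in for iris.csv; the seed and the use
# of sample() are illustrative choices, not taken from irisModel.R.
iris_data <- droplevels(subset(iris, Species != "virginica"))
n <- nrow(iris_data)
n_train <- round(0.80 * n)

set.seed(42)                                   # makes the random split repeatable
train_indices <- sample(seq_len(n), n_train)   # row numbers for the training set

iris_train <- iris_data[train_indices, ]       # rows picked for training
iris_test <- iris_data[-train_indices, ]       # all remaining rows

nrow(iris_train)
nrow(iris_test)
```

The negative index in the last assignment drops the training rows, so no data point appears in both sets.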
Highlight the Source button
Click Save and Run buttons. |
Save the script, then select the commands from the line after View up to the end and click the Run button.
|
Drag the boundary. | I will drag the boundary to see the Environment tab clearly. |
Highlight iris_train and iris_test in the Environment. | Click the train set and test set to load them in the Source window.
|
Drag the boundary. | Now, we will train a classification model with a Naive Bayes classifier.
|
[RStudio]
iris_model <- naiveBayes(formula = Species~., data = iris_train)
|
In the Source window, type the following command.
|
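After training, it can help to look inside the fitted model. The sketch below assumes that e1071 is installed, that the built-in iris data stands in for iris.csv, and that the split uses an arbitrary seed. It prints the class counts and the per-species statistics that naiveBayes() stores for one feature.

```r
# Assumptions: built-in `iris` stands in for iris.csv; the seed is arbitrary;
# the e1071 package is already installed (otherwise the block is skipped).
iris_data <- droplevels(subset(iris, Species != "virginica"))
set.seed(42)
train_indices <- sample(seq_len(nrow(iris_data)), round(0.80 * nrow(iris_data)))
iris_train <- iris_data[train_indices, ]

if (requireNamespace("e1071", quietly = TRUE)) {
  iris_model <- e1071::naiveBayes(Species ~ ., data = iris_train)

  # Number of training data points per species
  print(iris_model$apriori)

  # Per-species mean (first column) and standard deviation (second column)
  # of Petal.Length, as estimated from the training set
  print(iris_model$tables$Petal.Length)
}
```

For numeric features such as these, naiveBayes() models each feature within a species as normally distributed, which is why only a mean and a standard deviation are stored.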
Highlight the iris_model command | Now, let's use the test set to evaluate the performance of the model created. |
[RStudio]
class_prediction <- predict(object = iris_model, newdata = iris_test) |
Type the following command.
Using this, we will predict the Species of the data points in the test set. |
Highlight the class_prediction command | Save the script and run this line by pressing Ctrl + Enter keys together. |
Highlight class_prediction in the Environment window | Now we can use the class_prediction values to evaluate the performance of our model.
|
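The prediction step can be sketched end to end as follows. The built-in iris data, the seed, and the extra call with type = "raw" are assumptions added for illustration; only the class prediction appears in the tutorial itself.

```r
# Assumptions: built-in `iris` stands in for iris.csv; the seed is arbitrary;
# the e1071 package is already installed (otherwise the block is skipped).
iris_data <- droplevels(subset(iris, Species != "virginica"))
set.seed(42)
train_indices <- sample(seq_len(nrow(iris_data)), round(0.80 * nrow(iris_data)))
iris_train <- iris_data[train_indices, ]
iris_test <- iris_data[-train_indices, ]

if (requireNamespace("e1071", quietly = TRUE)) {
  iris_model <- e1071::naiveBayes(Species ~ ., data = iris_train)

  # Predicted species for each data point in the test set
  class_prediction <- predict(iris_model, newdata = iris_test)

  # Posterior probability of each species for the same data points
  class_probability <- predict(iris_model, newdata = iris_test, type = "raw")

  # Compare the predictions with the expert labels
  print(head(data.frame(actual = iris_test$Species,
                        predicted = class_prediction,
                        round(class_probability, 3))))
}
```

Placing the actual and predicted labels side by side gives a quick visual check before computing a confusion matrix.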
Show Slide
Confusion Matrix |
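* It is a performance measurement for ML classification problems.
* In these classification problems, the output can be two or more classes.
To know more about the confusion matrix, please refer to the Additional Reading Material. |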
Let us switch to RStudio. | |
[RStudio]
Highlight iris_model in the Source window
confusionMatrix(data = class_prediction, reference = as.factor(iris_test$Species)) |
Now, we will draw the confusion matrix to check the performance of this model.
Save the script and run this line by pressing Ctrl + Enter keys together. |
Drag boundary. | Drag the boundary to see the Console window clearly. |
[RStudio]
Highlight the Console window
|
In the Console window, scroll up and locate the Confusion Matrix and Statistics.
|
[RStudio]
Highlight Reference in the Console window
Highlight Prediction in the Console window |
Here, the Reference represents the actual values.
|
[RStudio]
Highlight the figures in the confusion matrix on the Console window |
The accuracy of the model is the sum of the True Positive and True Negative counts divided by the total number of predictions.
|
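The accuracy calculation can be sketched in base R with a small, made-up example. The ten labels below are hypothetical and are not output from the tutorial's model. Here setosa is treated as the positive class, which matches caret's default of using the first factor level.

```r
# Hypothetical labels for ten flowers, for illustration only.
actual <- factor(c("setosa", "setosa", "setosa", "setosa", "setosa",
                   "versicolor", "versicolor", "versicolor", "versicolor",
                   "versicolor"))
predicted <- factor(c("setosa", "setosa", "setosa", "setosa", "versicolor",
                      "versicolor", "versicolor", "versicolor", "versicolor",
                      "setosa"))

# Rows are predictions, columns are reference (actual) values
cm <- table(Prediction = predicted, Reference = actual)
print(cm)

# Treat "setosa" as the positive class
tp <- cm["setosa", "setosa"]           # setosa predicted as setosa
tn <- cm["versicolor", "versicolor"]   # versicolor predicted as versicolor
accuracy <- (tp + tn) / sum(cm)
accuracy
```

In this made-up example, 8 of the 10 predictions are correct, so the accuracy is 0.8.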
Only Narration | With this, we come to the end of this tutorial.
Let us summarize. |
Show Slide
Summary |
In this tutorial, we have learnt about:
|
Show Slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show Slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Spoken Tutorial Forum to answer questions |
Do you have questions about THIS Spoken Tutorial?
Please visit this site. Choose the minute and second where you have the question. Explain your question briefly. The FOSSEE project will ensure an answer. You will have to register to ask questions. |
Show Slide
Spoken Tutorial Forum for specific questions |
The Spoken Tutorial forum is for specific questions on this tutorial.
Please do not post unrelated or general questions on it. This will help reduce the clutter. With less clutter, we can use these discussions as instructional material. |
Show Slide
Forum to answer questions |
Do you have any general/technical questions?
Please visit the forum given in the link. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.
We give certificates to those who do this. For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India. |
Show Slide
About the Contributors |
This tutorial is contributed by Sudhakar Kumar and Madhuri Ganapathi from IIT Bombay.
Thank you for watching. |