Difference between revisions of "Machine-Learning-using-R - old 2022/C2/Supervised-Learning/English"
Nancyvarkey (Talk | contribs) |
|||
Line 24: | Line 24: | ||
|| In this tutorial, we will learn about: | || In this tutorial, we will learn about: | ||
* '''Machine Learning''' and its types | * '''Machine Learning''' and its types | ||
− | * '''Supervised''' | + | * '''Supervised learning ''' |
− | * Classification model on '''iris''' | + | * '''Classification model''' on '''iris data''' |
* '''Confusion matrix''' | * '''Confusion matrix''' | ||
− | |||
|- | |- | ||
Line 50: | Line 49: | ||
'''https://spoken-tutorial.org''' | '''https://spoken-tutorial.org''' | ||
|| To understand this tutorial, you should know, | || To understand this tutorial, you should know, | ||
− | * Basics of '''R ''' | + | * Basics of '''R programming''' |
* Basics of Statistics | * Basics of Statistics | ||
Line 64: | Line 63: | ||
|| Now let us see what machine learning is? | || Now let us see what machine learning is? | ||
− | * ML is a science that enables computers to learn without being explicitly programmed | + | * '''ML''' is a science that enables computers to learn without being explicitly '''programmed''' |
* Its applications include self-driven cars, speech recognition, etc. | * Its applications include self-driven cars, speech recognition, etc. | ||
* It is seen as a subset of '''Artificial Intelligence''', also known as '''AI'''. | * It is seen as a subset of '''Artificial Intelligence''', also known as '''AI'''. | ||
Line 74: | Line 73: | ||
'''Classification of Machine Learning''' | '''Classification of Machine Learning''' | ||
− | || ML is broadly classified into the following types: | + | || '''ML''' is broadly classified into the following types: |
− | * '''Supervised | + | * '''Supervised learning''', |
− | * '''Unsupervised | + | * '''Unsupervised learning''', |
− | * '''Semi-supervised | + | * '''Semi-supervised learning''' and |
− | * '''Reinforcement | + | * '''Reinforcement learning'''. |
− | In this series, we will focus on '''Supervised''' and '''Unsupervised''' | + | In this series, we will focus on '''Supervised''' and '''Unsupervised learning'''. |
|- | |- | ||
|| '''Show Slide ''' | || '''Show Slide ''' | ||
Line 95: | Line 94: | ||
− | There are two critical parameters of an '''iris''' flower: | + | There are two critical '''parameters''' of an '''iris''' flower: |
* '''Sepal''', and | * '''Sepal''', and | ||
* '''Petal''' | * '''Petal''' | ||
− | One can measure the length and width of these two parameters. | + | One can measure the length and width of these two '''parameters'''. |
|- | |- | ||
|| '''Show Slide''' | || '''Show Slide''' | ||
Line 129: | Line 128: | ||
'''Tabulating the Data''' | '''Tabulating the Data''' | ||
− | || | + | ||She gets these flowers labeled as one of the three species by an expert. |
− | + | ||
Line 139: | Line 137: | ||
'''Download Files ''' | '''Download Files ''' | ||
|| For this tutorial, we will use: | || For this tutorial, we will use: | ||
− | * A '''data set | + | * A '''data set iris.csv''' |
* A '''script''' file '''irisModel.R ''' | * A '''script''' file '''irisModel.R ''' | ||
Line 154: | Line 152: | ||
− | This folder is located in the '''MLProject''' folder on my Desktop. | + | This folder is located in the '''MLProject''' folder on my '''Desktop'''. |
− | I have also set the '''SupervisedLearning''' folder as my | + | I have also set the '''SupervisedLearning''' folder as my '''Working Directory.''' |
|- | |- | ||
|| | || | ||
Line 165: | Line 163: | ||
Point to '''irisModel.R''' in RStudio. | Point to '''irisModel.R''' in RStudio. | ||
− | || Let’s open the | + | || Let’s open the '''script irisModel.R''' in '''RStudio'''. |
− | For this, double-click on the | + | For this, double-click on the '''script irisModel.R''' |
− | + | '''Script irisModel'''.'''R''' opens in '''RStudio'''. | |
|- | |- | ||
|| Highlight '''irisModel.R''' in the '''Source''' window | || Highlight '''irisModel.R''' in the '''Source''' window | ||
− | || Run this script by clicking on the '''Source''' button. | + | || '''Run''' this '''script''' by clicking on the '''Source''' button. |
|- | |- | ||
|| Highlight '''iris_data''' in the '''Source''' window | || Highlight '''iris_data''' in the '''Source''' window | ||
− | || The '''iris''' | + | || The '''iris data frame''' is displayed in the '''Source''' window. |
|- | |- | ||
Line 210: | Line 208: | ||
− | A typical '''iris''' | + | A typical '''iris dataset''' contains three different '''Species'''. |
|- | |- | ||
Line 217: | Line 215: | ||
'''Posing the Problem ''' | '''Posing the Problem ''' | ||
− | || Suppose that the botanist considers the following about the iris flower: | + | || Suppose that the botanist considers the following about the '''iris''' flower: |
− | * Can I build a model that learns from labels of known species? | + | * Can I build a '''model''' that learns from labels of known species? |
− | * Can this model accurately predict the species from its measurements? | + | * Can this '''model''' accurately predict the species from its measurements? |
− | + | ||
|- | |- | ||
Line 230: | Line 227: | ||
− | The classification model would work as a function as given below: | + | The '''classification model''' would work as a '''function''' as given below: |
− | This mechanism is '''supervised''' | + | This mechanism is '''supervised learning'''. |
|- | |- | ||
|| | || | ||
Line 242: | Line 239: | ||
'''Supervised Learning ''' | '''Supervised Learning ''' | ||
− | || In '''Supervised''' | + | || In '''Supervised learning''', |
− | * The desired output labels are available for training datasets. | + | * The desired output labels are available for training '''datasets'''. |
− | * These labels can be called supervisors. | + | * These labels can be called '''supervisors'''. |
Line 253: | Line 250: | ||
'''Supervised Learning ''' | '''Supervised Learning ''' | ||
|| | || | ||
− | * While learning, the model makes predictions using the given training dataset. | + | * While '''learning''', the model makes predictions using the given training '''dataset'''. |
− | * The model iteratively makes predictions on the training dataset | + | * The '''model''' iteratively makes predictions on the '''training dataset'''. |
− | * The supervisor corrects the model. | + | * The '''supervisor''' corrects the '''model'''. |
− | + | ||
|- | |- | ||
Line 262: | Line 258: | ||
'''Types of Supervised Learning ''' | '''Types of Supervised Learning ''' | ||
− | || There are two types of '''supervised''' | + | || There are two types of '''supervised learning''': |
− | Regression and Classification. | + | '''Regression''' and '''Classification'''. |
* '''Regression''' is applied to predict a continuous-valued output. | * '''Regression''' is applied to predict a continuous-valued output. | ||
* For example, predicting prices for the real estate sector. | * For example, predicting prices for the real estate sector. | ||
− | |||
|- | |- | ||
Line 274: | Line 269: | ||
|| | || | ||
* '''Classification''' is applied to predict a discrete-valued output. | * '''Classification''' is applied to predict a discrete-valued output. | ||
− | * For example, predicting the species of an iris flower | + | * For example, predicting the species of an '''iris''' flower. |
|- | |- | ||
|| | || | ||
− | || Let’s model a classification algorithm to predict the '''Species''' of an '''iris''' flower. | + | || Let’s '''model''' a '''classification algorithm''' to predict the '''Species''' of an '''iris''' flower. |
Here we will perform a '''2-class classification'''. | Here we will perform a '''2-class classification'''. | ||
Line 285: | Line 280: | ||
− | For this task, we will apply a '''Naive Bayes''' | + | For this task, we will apply a '''Naive Bayes classifier'''. |
|- | |- | ||
|| | || | ||
Line 291: | Line 286: | ||
|- | |- | ||
|| Highlight '''irisModel.R''' in the '''Source''' window button | || Highlight '''irisModel.R''' in the '''Source''' window button | ||
− | || In the '''Source''' window, click on the | + | || In the '''Source''' window, click on the '''script irisModel.R'''. |
|- | |- | ||
|| Libraries '''e1071''' and '''caret.''' | || Libraries '''e1071''' and '''caret.''' | ||
'''Install.packages() '''function. | '''Install.packages() '''function. | ||
− | || Here we need to install and import | + | || Here we need to install and import '''libraries e1071''' and '''caret.''' |
− | These packages are needed to fit a '''Naive Bayes classifier''' and visualize its performance. | + | These '''packages''' are needed to fit a '''Naive Bayes classifier''' and visualize its performance. |
− | To know more about these packages, please refer to '''Additional Reading Material'''. | + | To know more about these '''packages''', please refer to '''Additional Reading Material'''. |
− | As I have already installed '''e1071 '''and '''caret''' | + | As I have already installed '''e1071 '''and '''caret'''. I will directly import these. |
− | If you have not installed, please install using the '''install.packages ''' | + | If you have not installed, please install using the '''install.packages function'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Line 315: | Line 310: | ||
'''library(caret)''' | '''library(caret)''' | ||
− | || Let us type the following commands at the top of the script. | + | || Let us type the following '''commands''' at the top of the '''script'''. |
− | Press '''Ctrl''' + '''S''' keys to save the script. | + | Press '''Ctrl''' + '''S''' keys to save the '''script'''. |
|- | |- | ||
|| Highlight '''library(e1071)''' and | || Highlight '''library(e1071)''' and | ||
'''library(caret)''' in the '''Source''' window | '''library(caret)''' in the '''Source''' window | ||
− | || Select the commands and click the '''Run''' button to load these libraries. | + | || Select the '''commands''' and click the '''Run''' button to load these '''libraries'''. |
|- | |- | ||
|| Highlight '''iris_data''' in the '''Source''' window | || Highlight '''iris_data''' in the '''Source''' window | ||
'''n <- nrow(iris_data)''' | '''n <- nrow(iris_data)''' | ||
− | || Type the following command below the '''View(iris_data)''' | + | || Type the following '''command''' below the '''View(iris_data) command.''' |
− | Using this command we can find rows in the '''iris_data'''. | + | Using this '''command''' we can find rows in the '''iris_data'''. |
|- | |- | ||
|| [RStudio] | || [RStudio] | ||
Line 338: | Line 333: | ||
'''n_train <- round(0.80 * n) ''' | '''n_train <- round(0.80 * n) ''' | ||
− | || We will now reserve the number of data points for the '''training set.''' | + | || We will now reserve the number of '''data points''' for the '''training set.''' |
− | Type the following command. | + | Type the following '''command'''. |
− | For this model, I will use 80% of the data points for training the model. | + | For this '''model''', I will use 80% of the '''data points''' for '''training''' the '''model'''. |
− | The remaining 20% of the data points will be used for testing the model. | + | The remaining 20% of the '''data points''' will be used for '''testing''' the '''model'''. |
− | To know more about splitting the''' dataset, '''please refer | + | To know more about splitting the''' dataset, '''please refer to '''Additional Reading Material.''' |
− | + | ||
|- | |- | ||
|| Click on Save button. | || Click on Save button. | ||
− | || Save the script. | + | || Save the '''script'''. |
|- | |- | ||
Line 367: | Line 361: | ||
'''iris_train <- iris_data[train_indices, ] ''' | '''iris_train <- iris_data[train_indices, ] ''' | ||
− | || Next, we will create a '''vector''' of indices. | + | || Next, we will create a '''vector''' of '''indices'''. |
− | Type the following commands. | + | Type the following '''commands'''. |
Line 376: | Line 370: | ||
− | This vector will be used to extract the data points for the '''train''' | + | This '''vector''' will be used to extract the '''data points''' for the '''train set'''. |
|- | |- | ||
Line 383: | Line 377: | ||
Highlight the minus sign before '''train_indices''' | Highlight the minus sign before '''train_indices''' | ||
− | || Now, we will create a '''test''' | + | || Now, we will create a '''test set'''. |
− | Type the following command. | + | Type the following '''command'''. |
Line 392: | Line 386: | ||
− | It is to exclude the data points already used in the '''train''' | + | It is to exclude the '''data points''' already used in the '''train set'''. |
|- | |- | ||
Line 399: | Line 393: | ||
Click Save and Run buttons. | Click Save and Run buttons. | ||
− | || Save the script and select the | + | || Save the '''script''' and select the '''commands''' after '''View''' to the end. |
− | + | ||
− | + | ||
− | + | ||
+ | Click on the '''Run''' button to '''execute''' the selected '''commands'''. | ||
|- | |- | ||
Line 414: | Line 406: | ||
|| Highlight '''iris_train''' and '''iris_test '''in the '''Environment.''' | || Highlight '''iris_train''' and '''iris_test '''in the '''Environment.''' | ||
− | || Click the train set and test set to load them in the Source window. | + | || Click the '''train set''' and '''test set''' to load them in the '''Source''' window. |
Line 422: | Line 414: | ||
|| Drag the boundary. | || Drag the boundary. | ||
− | || Now, we will '''train''' a '''classification model''' with a '''Naive Bayes | + | || Now, we will '''train''' a '''classification model''' with a '''Naive Bayes classifier'''. |
Line 435: | Line 427: | ||
Highlight the above command | Highlight the above command | ||
− | || In the '''Source''' window, type the following command. | + | || In the '''Source''' window, type the following '''command'''. |
− | We will learn more about arguments in the upcoming tutorials in this series. | + | We will learn more about '''arguments''' in the upcoming tutorials in this series. |
− | Save the script and run this line by pressing '''Ctrl''' + '''Enter''' keys together. | + | Save the '''script''' and '''run''' this line by pressing '''Ctrl''' + '''Enter''' keys together. |
|- | |- | ||
|| Highlight the '''iris_model''' command | || Highlight the '''iris_model''' command | ||
− | || Now, let's use the '''test''' | + | || Now, let's use the '''test set''' to evaluate the performance of the '''model''' created. |
|- | |- | ||
Line 453: | Line 445: | ||
'''class_prediction <- predict(object = iris_model, newdata = iris_test)''' | '''class_prediction <- predict(object = iris_model, newdata = iris_test)''' | ||
− | || Type the following command. | + | || Type the following '''command'''. |
− | Using this, we will predict the '''Species''' of the data points in the '''test''' | + | Using this, we will predict the '''Species''' of the '''data points''' in the '''test set'''. |
Line 462: | Line 454: | ||
|| Highlight the '''class_prediction''' command | || Highlight the '''class_prediction''' command | ||
− | || Save the script and run this line by pressing '''Ctrl''' + '''Enter''' keys together. | + | || Save the '''script''' and run this line by pressing '''Ctrl''' + '''Enter''' keys together. |
|- | |- | ||
|| Highlight '''class_prediction''' in the '''Environment''' window | || Highlight '''class_prediction''' in the '''Environment''' window | ||
− | || Now we can use '''class_prediction''' values to evaluate the performance of our model. | + | || Now we can use '''class_prediction''' values to evaluate the performance of our '''model'''. |
Line 478: | Line 470: | ||
Confusion Matrix | Confusion Matrix | ||
|| | || | ||
− | * It is a performance measurement for '''ML''' | + | * It is a performance measurement for '''ML classification ''' problems. |
− | * In these classification problems, the output can be two or more classes.<br/> | + | * In these '''classification problems''', the output can be two or more '''classes'''.<br/> |
Line 497: | Line 489: | ||
'''reference = as.factor(iris_test$Species))''' | '''reference = as.factor(iris_test$Species))''' | ||
− | || Now, we will draw the '''confusion matrix''' to check the performance of this model. | + | || Now, we will draw the '''confusion matrix''' to check the performance of this '''model'''. |
− | In the '''Source''' window, type the following command | + | In the '''Source''' window, type the following '''command'''. |
− | Save the script and run this line by pressing '''Ctrl''' + '''Enter''' keys together. | + | Save the '''script''' and run this line by pressing '''Ctrl''' + '''Enter''' keys together. |
|- | |- | ||
Line 541: | Line 533: | ||
Highlight the figures in the '''confusion matrix''' on the '''Console''' window | Highlight the figures in the '''confusion matrix''' on the '''Console''' window | ||
− | || Accuracy of the model can be checked using the values of '''True Positive''' and '''True Negative.''' | + | || Accuracy of the '''model''' can be checked using the values of '''True Positive''' and '''True Negative.''' |
− | In this case, the accuracy of the model is 1. | + | In this case, the accuracy of the '''model''' is 1. |
− | The classification model correctly predicted the values for all the points in the '''test''' | + | The '''classification model''' correctly predicted the values for all the '''points''' in the '''test set'''. |
|- | |- | ||
Line 563: | Line 555: | ||
|| In this tutorial, we have learnt about: | || In this tutorial, we have learnt about: | ||
* '''Machine Learning''' and its types | * '''Machine Learning''' and its types | ||
− | * '''Supervised''' | + | * '''Supervised Learning''' |
− | * Classification model on '''iris''' | + | * '''Classification model''' on '''iris data ''' |
* '''Confusion Matrix''' | * '''Confusion Matrix''' | ||
− | |||
|- | |- | ||
− | |||
|| Show Slide | || Show Slide | ||
Revision as of 18:43, 21 February 2022
Title of the script: Supervised Learning
Author: Sudhakar Kumar
Keywords: R, RStudio, machine learning, supervised learning, unsupervised, classification, Naive Bayes, confusion matrix, video tutorial.
Visual Cue | Narration |
Show Slide
Opening Slide |
Welcome to this spoken tutorial on Supervised Learning. |
Show Slide
Learning Objectives |
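In this tutorial, we will learn about:
* Machine Learning and its types
* Supervised learning
* Classification model on iris data
* Confusion matrix
|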
Show Slide
System Specifications |
This tutorial is recorded using,
It is recommended to install R version 4.1.0 or higher. |
Show Slide
Prerequisites |
To understand this tutorial, you should know:
* Basics of R programming
* Basics of Statistics
If not, please access the relevant tutorials on R on this website. |
Show Slide
What is Machine Learning? |
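Now let us see what machine learning is.
* ML is a science that enables computers to learn without being explicitly programmed.
* Its applications include self-driving cars, speech recognition, etc.
* It is seen as a subset of Artificial Intelligence, also known as AI.
|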
Show Slide
Classification of Machine Learning |
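ML is broadly classified into the following types:
* Supervised learning,
* Unsupervised learning,
* Semi-supervised learning, and
* Reinforcement learning.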
In this series, we will focus on Supervised and Unsupervised learning. |
Show Slide
Iris Flower
|
Let us consider a flower named iris.
An image of this flower is shown here.
|
Show Slide
Species of an iris flower
|
Based on the measurements, three species of iris flower are available:
|
Show Slide
Tabulating the Data |
Consider a situation:
|
Show Slide
Tabulating the Data |
She gets these flowers labeled as one of the three species by an expert.
|
Show Slide
Download Files |
For this tutorial, we will use:
* A data set iris.csv
* A script file irisModel.R
Make a copy and then use them for practising. |
[Computer screen]
|
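I have downloaded and moved these files to the SupervisedLearning folder.
This folder is located in the MLProject folder on my Desktop.
I have also set the SupervisedLearning folder as my Working Directory.
|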
Let us switch to RStudio. | |
Double click on irisModel.R to open in RStudio
Point to irisModel.R in RStudio. |
Let’s open the script irisModel.R in RStudio.
For this, double-click on the script irisModel.R.
The script irisModel.R opens in RStudio.
|
Highlight irisModel.R in the Source window | Run this script by clicking on the Source button.
|
Highlight iris_data in the Source window | The iris data frame is displayed in the Source window. |
Highlight 100 entries, 5 total columns at the bottom of the Source window | Here we can see five columns with 100 rows. |
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window | The columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species. |
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window | The first four columns are the features of an iris flower.
|
Highlight Species column in the Source window | In the Source window, scroll down to locate the different Species.
|
Show Slide
Posing the Problem |
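Suppose that the botanist considers the following about the iris flower:
* Can I build a model that learns from labels of known species?
* Can this model accurately predict the species from its measurements?
|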
Show Slide
Mapping of Features and Labels |
We will map the dimensions of sepal and petal to iris species.
|
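Highlight the function
Show Slide
Supervised Learning |
In Supervised learning,
* The desired output labels are available for training datasets.
* These labels can be called supervisors.
|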
Show Slide
Supervised Learning |
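* While learning, the model makes predictions using the given training dataset.
* The model iteratively makes predictions on the training dataset.
* The supervisor corrects the model.
|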
Show Slide
Types of Supervised Learning |
There are two types of supervised learning:
Regression and Classification.
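* Regression is applied to predict a continuous-valued output.
* For example, predicting prices for the real estate sector.
|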
Types of Supervised Learning |
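* Classification is applied to predict a discrete-valued output.
* For example, predicting the species of an iris flower.
|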
Let’s model a classification algorithm to predict the Species of an iris flower.
Here we will perform a 2-class classification. The species which we will try to predict are setosa and versicolor.
| |
Let us switch to RStudio. | |
Highlight irisModel.R in the Source window | In the Source window, click on the script irisModel.R. |
Libraries e1071 and caret.
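install.packages() function. |
Here we need to install and import the libraries e1071 and caret.
These packages are needed to fit a Naive Bayes classifier and visualize its performance.
To know more about these packages, please refer to Additional Reading Material.
As I have already installed e1071 and caret, I will directly import these.
If you have not installed them, please install them using the install.packages function.
|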
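As a rough sketch of the install-then-import step, the check below reports which of the two packages are missing before loading them. The variable names `pkgs`, `installed` and `missing_pkgs` are hypothetical and are not part of irisModel.R.

```r
# Packages named in the tutorial.
pkgs <- c("e1071", "caret")

# requireNamespace() returns TRUE when a package is installed,
# without attaching it to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # Install only in an interactive session, so that a script
  # does not download packages silently.
  if (interactive()) install.packages(missing_pkgs)
} else {
  message("Both packages are already installed.")
}
```

Once both packages are installed, `library(e1071)` and `library(caret)` load them as shown in the tutorial.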
[RStudio]
library(e1071)
library(caret) |
Let us type the following commands at the top of the script.
Press Ctrl + S keys to save the script.
|
Highlight library(e1071) and
library(caret) in the Source window |
Select the commands and click the Run button to load these libraries. |
Highlight iris_data in the Source window
n <- nrow(iris_data) |
Type the following command below the View(iris_data) command.
Using this command, we can find the number of rows in iris_data.
|
[RStudio]
Type n_train <- round(0.80 * n) |
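We will now reserve the number of data points for the training set.
Type the following command.
For this model, I will use 80% of the data points for training the model.
The remaining 20% of the data points will be used for testing the model.
To know more about splitting the dataset, please refer to Additional Reading Material.
|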
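The arithmetic of the 80/20 split can be checked with a short sketch. It assumes that R's built-in iris data, restricted to setosa and versicolor, stands in for iris.csv, which is not reproduced here. The name `n_test` is hypothetical.

```r
# Assumption: the built-in iris data, limited to two species,
# stands in for the 100-row iris.csv used in the tutorial.
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

n <- nrow(iris_data)         # total number of data points
n_train <- round(0.80 * n)   # data points reserved for training
n_test <- n - n_train        # data points left for testing (hypothetical name)

print(c(n = n, n_train = n_train, n_test = n_test))
```

With 100 rows, this reserves 80 data points for training and leaves 20 for testing.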
Click on Save button. | Save the script. |
[RStudio]
|
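Next, we will create a vector of indices.
Type the following commands.
This vector will be used to extract the data points for the train set.
|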
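The command that creates train_indices is not shown in this script. One possible way to build such a vector is sketched below, assuming a random draw with `sample()` and a fixed seed; the tutorial's actual command may differ. The built-in iris data again stands in for iris.csv.

```r
# Assumption: fixed seed so that the same rows are drawn on every run.
set.seed(1)

# Assumption: built-in iris data, two species only, in place of iris.csv.
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
n <- nrow(iris_data)
n_train <- round(0.80 * n)

# Assumption: draw n_train distinct row positions at random.
train_indices <- sample(seq_len(n), size = n_train)

# Row indexing with the vector extracts the train set, as in the tutorial.
iris_train <- iris_data[train_indices, ]
print(dim(iris_train))
```

Because `sample()` draws without replacement by default, no row appears twice in the train set.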
iris_test <- iris_data[-train_indices, ]
|
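Now, we will create a test set.
Type the following command.
The minus sign before train_indices excludes the data points already used in the train set.
|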
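The effect of the minus sign is easier to see on a tiny example. The data frame `toy` and the chosen indices below are made up for illustration only.

```r
# A made-up data frame with five rows.
toy <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))

# Hypothetical training rows.
train_indices <- c(2, 4, 5)

toy_train <- toy[train_indices, ]   # keeps rows 2, 4 and 5
toy_test <- toy[-train_indices, ]   # drops rows 2, 4 and 5, keeps the rest

print(toy_test$id)
```

A negative index removes the listed rows, so the train set and the test set never share a data point.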
Highlight the Source button
Click Save and Run buttons. |
Save the script and select the commands after View to the end.
Click on the Run button to execute the selected commands.
|
Drag the boundary. | I will drag the boundary to see the Environment tab clearly. |
Highlight iris_train and iris_test in the Environment. | Click the train set and test set to load them in the Source window.
|
Drag the boundary. | Now, we will train a classification model with a Naive Bayes classifier.
|
[RStudio]
iris_model <- naiveBayes(formula = Species~., data = iris_train)
|
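In the Source window, type the following command.
We will learn more about arguments in the upcoming tutorials in this series.
Save the script and run this line by pressing Ctrl + Enter keys together.
|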
Highlight the iris_model command | Now, let's use the test set to evaluate the performance of the model created. |
[RStudio]
class_prediction <- predict(object = iris_model, newdata = iris_test) |
Type the following command.
Using this, we will predict the Species of the data points in the test set.
|
Highlight the class_prediction command | Save the script and run this line by pressing Ctrl + Enter keys together. |
Highlight class_prediction in the Environment window | Now we can use class_prediction values to evaluate the performance of our model.
|
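The training and prediction steps can be put together in one runnable sketch. It assumes the e1071 package is installed, uses the built-in iris data in place of iris.csv, and assumes a random split with `sample()`, which this script does not show. The `naiveBayes()` and `predict()` calls are the ones given in the tutorial.

```r
library(e1071)

# Assumptions: fixed seed, built-in iris data limited to two species,
# and a random 80/20 split drawn with sample().
set.seed(1)
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
n <- nrow(iris_data)
train_indices <- sample(seq_len(n), size = round(0.80 * n))
iris_train <- iris_data[train_indices, ]
iris_test <- iris_data[-train_indices, ]

# Fit the classifier: Species is the label, all other columns are features.
iris_model <- naiveBayes(formula = Species ~ ., data = iris_train)

# Predict one class label for every row of the test set.
class_prediction <- predict(object = iris_model, newdata = iris_test)
print(head(class_prediction))
```

The result is a factor with one predicted species per test row, which can then be compared with `iris_test$Species`.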
Show Slide
Confusion Matrix |
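* It is a performance measurement for ML classification problems.
* In these classification problems, the output can be two or more classes.
|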
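To illustrate what a confusion matrix holds, the sketch below cross-tabulates predicted labels against actual labels with base R's `table()`. The six labels are made up, and this is not the `confusionMatrix()` function from caret that the tutorial uses.

```r
# Made-up actual and predicted labels for six flowers.
species_levels <- c("setosa", "versicolor")
actual <- factor(c("setosa", "setosa", "versicolor",
                   "versicolor", "versicolor", "setosa"),
                 levels = species_levels)
predicted <- factor(c("setosa", "versicolor", "versicolor",
                      "versicolor", "versicolor", "setosa"),
                    levels = species_levels)

# Rows are predictions, columns are the reference (actual) values.
conf_mat <- table(Prediction = predicted, Reference = actual)
print(conf_mat)
```

Here one setosa flower is predicted as versicolor, so it lands in an off-diagonal cell. Correct predictions lie on the diagonal.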
Let us switch to RStudio. | |
[RStudio]
Highlight iris_model in the Source window
confusionMatrix(data = class_prediction, reference = as.factor(iris_test$Species)) |
Now, we will draw the confusion matrix to check the performance of this model.
In the Source window, type the following command.
Save the script and run this line by pressing Ctrl + Enter keys together. |
Drag boundary. | Drag boundary to see the Console window clearly. |
[RStudio]
Highlight the Console window
|
In the Console window, scroll up and locate the Confusion Matrix and Statistics.
|
[RStudio]
Highlight Reference in the Console window
Highlight Prediction in the Console window |
Here, the Reference represents the actual values.
|
[RStudio]
Highlight the figures in the confusion matrix on the Console window |
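Accuracy of the model can be checked using the values of True Positive and True Negative.
In this case, the accuracy of the model is 1.
The classification model correctly predicted the values for all the points in the test set.
|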
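Accuracy is the share of correct predictions, that is, (True Positive + True Negative) divided by the total number of test points. The sketch below works this out for a hypothetical matrix in which all 20 test points are classified correctly. The 10/10 split between species is an assumption and is not taken from the tutorial's output.

```r
# Hypothetical confusion matrix: rows are predictions, columns are actual values.
conf_mat <- matrix(c(10, 0, 0, 10), nrow = 2,
                   dimnames = list(Prediction = c("setosa", "versicolor"),
                                   Reference = c("setosa", "versicolor")))

# Treating setosa as the positive class (an assumption for illustration).
true_positive <- conf_mat["setosa", "setosa"]
true_negative <- conf_mat["versicolor", "versicolor"]

accuracy <- (true_positive + true_negative) / sum(conf_mat)
print(accuracy)
```

With no off-diagonal entries, the ratio is 20 / 20, which gives an accuracy of 1.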
With this, we come to the end of this tutorial.
Let us summarize. | |
Show Slide
Summary |
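In this tutorial, we have learnt about:
* Machine Learning and its types
* Supervised Learning
* Classification model on iris data
* Confusion Matrix
|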
Show Slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show Slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
Please contact us. |
Show Slide
Spoken Tutorial Forum to answer questions |
Do you have questions about THIS Spoken Tutorial?
Please visit this site. Choose the minute and second where you have the question. Explain your question briefly. The FOSSEE project will ensure an answer. You will have to register to ask questions. |
Show Slide
Spoken Tutorial Forum for specific questions: |
The Spoken Tutorial forum is for specific questions on this tutorial.
Please do not post unrelated and general questions on them. This will help reduce the clutter. With less clutter, we can use these discussions as instructional material. |
Show Slide
Forum to answer questions |
Do you have any general/technical questions?
Please visit the forum given in the link. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.
We give certificates to those who do this. For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India. |
Show Slide
About the Contributors |
This tutorial is contributed by Sudhakar Kumar and Madhuri Ganapathi from IIT Bombay.
Thank you for watching. |