Machine-Learning-using-R - old 2022/C2/Supervised-Learning/English

Title of the script: Supervised Learning

Author: Sudhakar Kumar

Keywords: R, RStudio, machine learning, supervised learning, unsupervised, classification, Naive Bayes, confusion matrix, video tutorial.

Visual Cue	Narration
Show Slide Opening Slide	Welcome to this spoken tutorial on Supervised Learning.
Show Slide Learning Objectives	In this tutorial, we will learn about: Machine Learning and its types, Supervised learning, Classification model on iris data, and Confusion matrix.
Show Slide System Specifications	This tutorial is recorded using: Ubuntu Linux OS version 20.04, R version 4.1.2, and RStudio version 1.4.1717. It is recommended to install R version 4.1.0 or higher.
Show Slide Prerequisites https://spoken-tutorial.org	To understand this tutorial, you should know: Basics of R programming and Basics of Statistics. If not, please access the relevant tutorials on R on this website.
Show Slide What is Machine Learning?	Now let us see what machine learning is. ML is a science that enables computers to learn without being explicitly programmed. Its applications include self-driving cars, speech recognition, etc. It is seen as a subset of Artificial Intelligence, also known as AI.
Show Slide Classification of Machine Learning	ML is broadly classified into the following types: Supervised learning, Unsupervised learning, Semi-supervised learning and Reinforcement learning. In this series, we will focus on Supervised and Unsupervised learning.
Show Slide Iris Flower Highlight the iris flower	Let us consider a flower named iris. An image of this flower is shown here. There are two critical parameters of an iris flower: Sepal and Petal. One can measure the length and width of these two parameters.
Show Slide Species of an iris flower Highlight the species of an iris flower	Based on the measurements, three species of iris flowers are available: Setosa, Versicolor and Virginica.
Show Slide Tabulating the Data	Consider a situation: A botanist wants to distinguish the species of iris flowers. She collects four features of some iris flowers: Sepal length, Sepal width, Petal length and Petal width.
Show Slide Tabulating the Data	She gets these flowers labeled as one of the three species by an expert.
Show Slide Download Files	For this tutorial, we will use: a data set iris.csv and a script file irisModel.R. Please download these files from the Code files link of this tutorial. Make a copy and then use them for practising.
[Computer screen] Highlight irisModel.R and the folder SupervisedLearning	I have downloaded and moved these files to the SupervisedLearning folder. This folder is located in the MLProject on my Desktop. I have also set the SupervisedLearning folder as my Working Directory.
Cursor near irisModel.R file.	Let us switch to RStudio.
Double click on irisModel.R to open in RStudio Point to irisModel.R in RStudio.	Let’s open the script irisModel.R in RStudio. For this, double-click on the script irisModel.R. Script irisModel.R opens in RStudio.
Highlight irisModel.R in the Source window	Run this script by clicking on the Source button.
Highlight iris_data in the Source window	The iris data frame is displayed in the Source window.
Highlight 100 entries, 5 total columns at the bottom of the Source window	Here we can see five columns with 100 rows.
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window	The columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species.
Highlight Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species in the Source window	The first four columns are the features of an iris flower. The fifth column, Species, is the label of each iris flower.
Highlight Species column in the Source window	In the Source window, scroll down to locate the different Species. Notice that there are two species of the iris flower, setosa and versicolor. A typical iris dataset contains three different Species.
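The check on the Species column can also be done from the Console instead of scrolling. The following is a minimal sketch. It assumes that iris.csv holds the setosa and versicolor rows of the standard iris data, so it uses R's built-in iris data frame as a stand-in for the downloaded file.

```r
# Stand-in for iris.csv (assumption): keep only setosa and versicolor
# from R's built-in iris data and drop the unused virginica level.
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Number of rows and columns in the data frame
dim(iris_data)

# Number of flowers recorded for each species
table(iris_data$Species)
```

Here dim reports the rows and columns, and table counts the rows for each label in the Species column.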
Show Slide Posing the Problem	Suppose that the botanist considers the following about the iris flower: Can I build a model that learns from labels of known species? Can this model accurately predict the species from its measurements?
Show Slide Mapping of Features and Labels	We will map the dimensions of sepal and petal to iris species. The classification model would work as a function as given below: This mechanism is supervised learning.
Highlight the function Show Slide Supervised Learning	In supervised learning, the desired output labels are available for training datasets. These labels can be called supervisors.
Show Slide Supervised Learning	While learning, the model makes predictions using the given training dataset. The model iteratively makes predictions on the training dataset. The supervisor corrects the model.
Show Slide Types of Supervised Learning	There are two types of supervised learning: Regression and Classification. Regression is applied to predict a continuous-valued output. For example, predicting prices for the real estate sector.
Types of Supervised Learning	Classification is applied to predict a discrete-valued output. For example, predicting the species of an iris flower.
	Let’s model a classification algorithm to predict the Species of an iris flower. Here we will perform a 2-class classification. The species which we will try to predict are setosa and versicolor. For this task, we will apply a Naive Bayes classifier.
	Let us switch to RStudio.
Highlight irisModel.R in the Source window	In the Source window, click on the script irisModel.R.
Libraries e1071 and caret. install.packages() function.	Here we need to install and import the libraries e1071 and caret. These packages are needed to fit a Naive Bayes classifier and evaluate its performance. To know more about these packages, please refer to Additional Reading Material. I have already installed e1071 and caret. I will directly import these. If you have not installed them, please install them using the install.packages function.
[RStudio] library(e1071) library(caret)	Let us type the following commands at the top of the script. Press Ctrl + S keys to save the script.
Highlight library(e1071) and library(caret) in the Source window	Select the commands and click the Run button to load these libraries.
Highlight iris_data in the Source window n <- nrow(iris_data)	Type the following command below the View(iris_data) command. Using this command, we can find the number of rows in iris_data.
[RStudio] Type n_train <- round(0.80 * n)	We will now reserve the number of data points for the training set. Type the following command. For this model, I will use 80% of the data points for training the model. The remaining 20% of the data points will be used for testing the model. To know more about splitting the dataset, please refer to Additional Reading Material.
Click on Save button.	Save the script.
[RStudio] train_indices <- sample(1:n, n_train) iris_train <- iris_data[train_indices, ]	Next, we will create a vector of indices. Type the following commands. It will be an 80% random sample of the total number of rows. This vector will be used to extract the data points for the training set.
iris_test <- iris_data[-train_indices, ] Highlight the minus sign before train_indices	Now, we will create a test set. Type the following command. Note that there is a minus sign before train_indices. It is to exclude the data points already used in the training set.
Highlight the Source button Click Save and Run buttons.	Save the script and select the commands after View to the end. Click on the Run button to execute the selected commands.
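The splitting steps typed so far can be put together as one runnable sketch. It is an illustration under two assumptions that are not part of the tutorial script: the built-in iris data frame stands in for iris.csv, and set.seed is called with an arbitrary value so that the random split is the same on every run.

```r
# Stand-in for iris.csv (assumption): setosa and versicolor rows only.
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Assumption: fix the random seed so the split is repeatable.
set.seed(42)

n <- nrow(iris_data)          # total number of rows
n_train <- round(0.80 * n)    # 80% of the rows are reserved for training

# Random sample of row positions for the training set
train_indices <- sample(1:n, n_train)

iris_train <- iris_data[train_indices, ]    # rows picked for training
iris_test  <- iris_data[-train_indices, ]   # the minus sign keeps the rest

nrow(iris_train)
nrow(iris_test)
```

Without set.seed, sample picks different rows on every run, so the rows in iris_train and iris_test will differ between runs.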
Drag the boundary.	I will drag the boundary to see the Environment tab clearly.
Highlight iris_train and iris_test in the Environment.	In the Environment tab, click on iris_train and iris_test to load them in the Source window. In the Source window, click on the iris_train and iris_test tabs to see the details.
Drag the boundary.	Now, we will train a classification model with a Naive Bayes classifier. Again I will drag the boundary to see the Source window clearly.
[RStudio] iris_model <- naiveBayes(formula = Species~., data = iris_train) Highlight the above command	In the Source window, type the following command. We will learn more about arguments in the upcoming tutorials in this series. Save the script and run this line by pressing Ctrl + Enter keys together.
Highlight the iris_model command	Now, let's use the test set to evaluate the performance of the model created.
[RStudio] class_prediction <- predict(object = iris_model, newdata = iris_test)	Type the following command. Using this, we will predict the Species of the data points in the test set.
Highlight the class_prediction command	Save the script and run this line by pressing Ctrl + Enter keys together.
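The training and prediction steps can be tried end to end with the sketch below. It assumes that the e1071 package is installed, uses the built-in iris data frame as a stand-in for iris.csv, and fixes a seed value that is not part of the tutorial script.

```r
library(e1071)  # provides naiveBayes()

# Stand-in for iris.csv (assumption): setosa and versicolor rows only.
iris_data <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

set.seed(42)  # assumption: repeatable 80/20 split
n <- nrow(iris_data)
train_indices <- sample(1:n, round(0.80 * n))
iris_train <- iris_data[train_indices, ]
iris_test  <- iris_data[-train_indices, ]

# Species ~ . means: predict Species from all the other columns
iris_model <- naiveBayes(formula = Species ~ ., data = iris_train)

# Predict one class label for every row of the test set
class_prediction <- predict(object = iris_model, newdata = iris_test)

head(class_prediction)
```

The result class_prediction is a factor with one predicted Species for each row of iris_test.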
Highlight class_prediction in the Environment window	Now we can use the class_prediction values to evaluate the performance of our model. For this, we can use a confusion matrix.
Show Slide Confusion Matrix	It is a performance measurement for ML classification problems. In these classification problems, the output can be two or more classes. To know more about Confusion matrix, please refer to Additional Reading Material.
	Let us switch to RStudio.
[RStudio] Highlight iris_model in the Source window confusionMatrix(data= class_prediction, reference = as.factor(iris_test$Species))	Now, we will draw the confusion matrix to check the performance of this model. In the Source window, type the following command. Save the script and run this line by pressing Ctrl + Enter keys together.
Drag boundary.	Drag the boundary to see the Console window clearly.
[RStudio] Highlight the Console window Highlight the Confusion Matrix and Statistics	In the Console window, scroll up and locate the Confusion Matrix and Statistics. The confusion matrix and its corresponding values are displayed.
[RStudio] Highlight Reference in the Console window Highlight Prediction in the Console window	Here, the Reference represents the actual values. Prediction represents the predicted values.
[RStudio] Highlight the figures in the confusion matrix on the Console window	Accuracy is the proportion of correct predictions. It is the sum of True Positive and True Negative values divided by the total number of points. In this case, the accuracy of the model is 1. The classification model correctly predicted the values for all the points in the test set.
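To see how accuracy follows from the cells of a confusion matrix, here is a small sketch in base R. The actual and predicted labels are hypothetical values invented for this illustration. They are not the output of the tutorial's model. One flower is deliberately misclassified so that the accuracy is less than 1.

```r
# Hypothetical labels, invented for illustration only
actual <- factor(c("setosa", "setosa", "setosa",
                   "versicolor", "versicolor", "versicolor"))
predicted <- factor(c("setosa", "setosa", "versicolor",
                      "versicolor", "versicolor", "versicolor"),
                    levels = levels(actual))

# Rows hold the predictions and columns hold the reference (actual) values
conf_mat <- table(Prediction = predicted, Reference = actual)
print(conf_mat)

# Accuracy = correctly classified points (the diagonal) / all points
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

Five of the six hypothetical points lie on the diagonal, so the accuracy here is 5/6. When every point lies on the diagonal, as in the tutorial's test set, the accuracy is 1.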
Only Narration	With this, we come to the end of this tutorial. Let us summarize.
Show Slide Summary	In this tutorial, we have learnt about: Machine Learning and its types, Supervised Learning, Classification model on iris data, and Confusion Matrix.
Show Slide About the Spoken Tutorial Project	The video at the following link summarises the Spoken Tutorial project. Please download and watch it.
Show Slide Spoken Tutorial Workshops	We conduct workshops using Spoken Tutorials and give certificates. Please contact us.
Show Slide Spoken Tutorial Forum to answer questions	Do you have questions about THIS Spoken Tutorial? Please visit this site. Choose the minute and second where you have the question. Explain your question briefly. The FOSSEE project will ensure an answer. You will have to register to ask questions.
Show Slide Spoken Tutorial Forum for specific questions	The Spoken Tutorial forum is for specific questions on this tutorial. Please do not post unrelated and general questions on it. This will help reduce the clutter. With less clutter, we can use these discussions as instructional material.
Show Slide Forum to answer questions	Do you have any general/technical questions? Please visit the forum given in the link.
Show Slide Textbook Companion	The FOSSEE team coordinates the coding of solved examples of popular books and case study projects. We give certificates to those who do this. For more details, please visit these sites.
Show Slide Acknowledgment	The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India.
Show Slide About the Contributors	This tutorial is contributed by Sudhakar Kumar and Madhuri Ganapathi from IIT Bombay. Thank you for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey
