Difference between revisions of "Machine-Learning-using-R - old 2022/C3/Support-Vector-Machine-using-R/English"

 
Revision as of 15:20, 6 March 2023

Title of the script: Support Vector Machine using R

Author: Tanmay Srinath

Keywords: R, RStudio, support vector machine, machine learning, console, dataset, confusion matrix, radial kernel, video tutorial.


Visual Cue Narration
Show slide

Opening Slide

Welcome to this spoken tutorial on Support Vector Machine using R.
Show slide

Learning Objectives


In this tutorial, we will learn about:
  • Support Vector Machine or SVM.
  • Advantages and applications of SVM.
  • Practical implementation of SVM in R.
  • Disadvantages of SVM.


Show slide

System Specifications

This tutorial is recorded using:
  • Ubuntu Linux OS version 20.04
  • R version 4.1.2
  • RStudio version 1.4.1717

It is recommended to install R version 4.1.0 or higher.

Show slide

Prerequisites

To follow this tutorial, the learner should know:
  • Basic programming in R.
  • Basics of Machine Learning.

If not, please access the relevant tutorials on this website.

Show slide

Support Vector Machine

Support Vector Machine.
  • It is a supervised machine learning technique.
  • SVM constructs a line to separate 2-dimensional data into classes.
  • For n-dimensional data, it constructs a hyperplane to separate the classes.
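
To make the idea of a separating line concrete, here is a minimal sketch in base R. The weights `w`, the offset `b`, the class names and the function `classify_point` are hypothetical values chosen by hand for illustration; a real SVM learns the line from the training data.

```r
# Hypothetical weights and offset, picked by hand for illustration only.
# A trained SVM would estimate these from the data.
w <- c(1, 1)   # normal vector of the separating line
b <- -3        # offset of the line from the origin

# A point is assigned to a class according to the side of the line
# w . x + b = 0 on which it falls.
classify_point <- function(point) {
  score <- sum(w * point) + b
  if (score >= 0) "class A" else "class B"
}

label_above <- classify_point(c(3, 2))   # score is  2, so "class A"
label_below <- classify_point(c(1, 1))   # score is -1, so "class B"
print(c(label_above, label_below))
```

In more than two dimensions the same rule applies, with `w` having one entry per feature; the line then becomes a hyperplane.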


Show slide

Advantages of SVM

Advantages of SVM.
  • SVM is relatively memory efficient.
  • SVM can be used for both classification and regression.
Show slide

Applications of SVM

Applications of SVM.
  • Image classification.
  • Recognition of handwritten characters.
  • Face detection.
Show Slide

Applications of SVM

The following slides contain links to 3 datasets.

These datasets correspond to each application of SVM.

One can use them to practically implement the applications.

Show Slide

Datasets for Implementation

  • Image classification:

https://github.com/zalandoresearch/fashion-mnist

  • Handwritten characters:

https://archive.ics.uci.edu/ml/datasets/Devanagari+Handwritten+Character+Dataset

  • and Face detection:

https://archive.ics.uci.edu/ml/datasets/cmu+face+images

Show slide

Support Vector Machine

Now we will perform Support Vector Machine classification on the built-in iris dataset.

Show slide

Download Files

For this tutorial, I will use a script file SVM.R.

Please download this file from the Code files link of this tutorial.

Make a copy and then use it for practising.

[Computer screen]

Highlight SVM.R and the folder SVM.

I have downloaded and moved this file to the SVM folder.

The SVM folder is in the MLProject folder on the Desktop.


I have also set the SVM folder as my Working Directory.

Cursor on SVM.R file. Let us switch to RStudio.
Double click SVM.R to open in RStudio.

Point to SVM.R in RStudio.

Open the script SVM.R in RStudio.

Script SVM.R opens in RStudio.

[RStudio] Highlight

library("e1071")

library("caret")


Select and run these commands to import the packages.
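
If the library() commands fail because a package is not installed, a check along the following lines may help. This is only a sketch and not part of SVM.R; the variable names are illustrative, and the CRAN mirror URL is an assumption that you may replace with any mirror you prefer.

```r
# Packages used in this tutorial.
required_pkgs <- c("e1071", "caret")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

# Install only what is missing (assumes an internet connection).
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```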


Highlight data("iris")

Double click iris in the Environment tab to load the iris dataset.


Run this command to import the iris dataset.

In the Environment tab, double click on the iris value to load the data.

Then click iris to view the dataset in the Source window.

[RStudio]

Highlight e1071

Highlight caret

We will use the e1071 library for the svm() function.

We will use the caret package to create the confusion matrix.

Cursor on the dataset. We will now split our dataset into dependent and independent variables.
[RStudio] Highlight.

x <- subset(iris[,-c(1,2,5)])

y <- iris$Species

Type these commands.

Now run these commands.

Click the x data frame in the Environment tab. In the Environment tab, click x to view the data in the Source window.
Point to the Petal.Length and Petal.Width columns.


Highlight:

x <- subset(iris[,-c(1,2,5)])

in the Source window.


Highlight:

y <- iris$Species

We will consider petal length and petal width as our two features.


Hence, we subset the iris dataset to take only these columns.


The variable y contains the dependent variable, Species.

This is what our Support Vector Machine will try to predict.
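
The column numbers 1, 2 and 5 in the subset command refer to Sepal.Length, Sepal.Width and Species. As a sketch of an alternative, the same two feature columns can be selected by name, which may be easier to read. The variable names `x_by_index`, `x_by_name` and `same_features` are illustrative and do not appear in SVM.R.

```r
data("iris")

# Selection by dropping columns 1, 2 and 5, as in the tutorial script.
x_by_index <- subset(iris[, -c(1, 2, 5)])

# Equivalent selection by column name.
x_by_name <- iris[, c("Petal.Length", "Petal.Width")]

# Check that both approaches give the same data frame.
same_features <- isTRUE(all.equal(x_by_index, x_by_name))
print(same_features)

# The dependent variable is a factor with one level per species.
y <- iris$Species
print(levels(y))
```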

Cursor in the Source window. Let us now create our Support Vector Machine model.
[RStudio]

svm_model <- svm(Species~Petal.Length+Petal.Width,data=iris)

summary(svm_model)

Type the following commands.
Drag boundary to see the Source window. Drag boundary to see the Source window clearly.
Click on the Run button.


Point to the output in the console window.

Run the command to see the output.


The output is shown in the console window.

Drag boundary to see the console window clearly. Drag boundary to see the console window clearly.
In the Console window.

Highlight:

summary(svm_model)


We have used the summary command to provide a summary of our SVM model.
Highlight SVM-Type: C-classification

Highlight SVM-Kernel: radial

Highlight cost: 1

Our SVM is performing a classification task.


We are using a radial kernel function.

To know more about it, please refer to the Additional Material on this tutorial page.


This is the cost of constraint violation.

It is set to 1 by default.
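
The values shown in the summary are the defaults of the svm() function for a factor response. As a sketch, they can also be written out explicitly; the names `svm_default`, `svm_explicit` and `same_predictions` are illustrative and not part of SVM.R.

```r
library(e1071)
data("iris")

# Model as created in the tutorial, relying on the defaults.
svm_default <- svm(Species ~ Petal.Length + Petal.Width, data = iris)

# The same model with the type, kernel and cost spelled out.
svm_explicit <- svm(Species ~ Petal.Length + Petal.Width, data = iris,
                    type = "C-classification", kernel = "radial", cost = 1)

# Both models should label the training data identically.
same_predictions <- all(fitted(svm_default) == fitted(svm_explicit))
print(same_predictions)
print(svm_explicit$cost)
```

Writing the arguments out makes it easier to experiment later, for example by trying a different cost value.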

Highlight:

Number of Support Vectors: 37 ( 5 16 16 )


Highlight:

Number of Classes: 3

This gives us the number of support vectors that are used for each class.


The model uses 5 support vectors for setosa and 16 each for versicolor and virginica.


This tells us that our data has 3 classes.
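
The same numbers can be read from the fitted model object instead of the printed summary. The sketch below assumes the nSV, tot.nSV, labels, levels and SV components of an e1071 svm object; the name `sv_per_class` is illustrative.

```r
library(e1071)
data("iris")

svm_model <- svm(Species ~ Petal.Length + Petal.Width, data = iris)

# Number of support vectors per class, named by species.
sv_per_class <- setNames(svm_model$nSV, svm_model$levels[svm_model$labels])
print(sv_per_class)

# Total number of support vectors; the matrix SV holds one row for each.
print(svm_model$tot.nSV)
print(nrow(svm_model$SV))
```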

Drag boundary to see the Source window. Drag boundary to see the Source window clearly.
Cursor in the Source window. Now we use our model to predict the class of species.
[RStudio]

pred <- predict(svm_model,x)

Type this command.
Highlight pred <- predict(svm_model,x)


Click on the Save and Run buttons.

This command stores the results of the predict function in a variable pred.


Save and run the command.


We will use the pred variable to find the confusion matrix for our model.

[RStudio]

confusionMatrix(pred,y)

Click the Run button.

Type this command.


This command calculates the confusion matrix for our model.

It compares the predicted species values with the actual species values.


Run the command to see the output in the console window.

Drag boundary to see the console window clearly. Drag boundary to see the console window clearly.
Highlight output in console. 50 Setosa samples have been correctly classified.


3 Versicolor samples have been incorrectly classified.


3 Virginica samples have been incorrectly classified.


Overall, the model has misclassified only 6 samples.
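
To see how such counts arise, the confusion matrix can also be built with the base R table() function. This is a sketch and not the caret implementation; the names `conf_table`, `n_correct`, `n_wrong` and `accuracy` are illustrative.

```r
library(e1071)
data("iris")

svm_model <- svm(Species ~ Petal.Length + Petal.Width, data = iris)
pred <- predict(svm_model, iris)

# Rows are predicted species, columns are actual species.
conf_table <- table(Predicted = pred, Actual = iris$Species)
print(conf_table)

# Correct predictions lie on the diagonal of the table.
n_correct <- sum(diag(conf_table))
n_wrong <- sum(conf_table) - n_correct
accuracy <- n_correct / sum(conf_table)
print(c(correct = n_correct, wrong = n_wrong, accuracy = accuracy))
```

The off-diagonal cells show which species are confused with each other.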

Drag boundary to see the Source window. Drag boundary to see the Source window clearly.
Cursor in the Source window. Let us plot our results.
[RStudio]

plot(svm_model, data=iris,formula = Petal.Length~Petal.Width)

Type the following command.
Drag boundary to see the Source window. Drag boundary to see the Source window clearly.
Highlight

library e1071


plot(svm_model, data=iris,formula = Petal.Length~Petal.Width)

The e1071 package contains a modified version of the plot function.


It takes an SVM model, the data to plot, and a formula.

The formula selects the two variables to show on the axes. It is needed only if the model has more than two input variables.
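
As an illustration of when the formula matters, the sketch below fits a model on all four iris features, which is an assumption that goes beyond the two-feature model of this tutorial. The plot can show only two features at a time, so the remaining two are held at fixed values through the slice argument; the values 3 and 4 are arbitrary choices, and the name `svm_full` is illustrative.

```r
library(e1071)
data("iris")

# Model using all four measurements as input variables.
svm_full <- svm(Species ~ ., data = iris)

# The formula picks the two plotted features; slice fixes the other two.
plot(svm_full, data = iris, formula = Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4))
```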

Click the Run button.

Drag boundaries to see the plot window clearly.

Run the command to see the plot.

Drag boundaries to see the plot window clearly.

Highlight output in plot window. The SVM boundaries have correctly classified most examples in our iris dataset.

This suggests that our model fits the iris data well. Note that the model was evaluated on the same data it was trained on.
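
A common way to check a model on data it has not seen is to hold out part of the dataset for testing. The following is a minimal sketch of that idea and is not part of SVM.R; the 70/30 split, the seed and all variable names are assumptions made for illustration.

```r
library(e1071)
data("iris")

# Fix the random seed so that the split is reproducible.
set.seed(42)

# Use 105 of the 150 rows (70%) for training and the rest for testing.
train_rows <- sample(nrow(iris), size = 105)
train_set <- iris[train_rows, ]
test_set <- iris[-train_rows, ]

# Train on the training rows only.
svm_holdout <- svm(Species ~ Petal.Length + Petal.Width, data = train_set)

# Predict the species of the unseen rows and measure the accuracy.
test_pred <- predict(svm_holdout, test_set)
test_accuracy <- mean(test_pred == test_set$Species)
print(test_accuracy)
```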

Let us now discuss the disadvantages of Support Vector Machine.
Show Slide

Disadvantages of SVM

  • The SVM algorithm is time-consuming for large datasets.
  • If the data is not linearly separable, an SVM with a linear kernel may not perform well.
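
The effect of the kernel on data that is not linearly separable can be seen in a small experiment. The sketch below uses a synthetic dataset of two concentric rings, which is an assumption made for illustration and is unrelated to the iris data; all variable names are illustrative. Accuracy is measured on the training data.

```r
library(e1071)

# Synthetic data: an inner disc and an outer ring around the origin.
set.seed(1)
n_per_class <- 100
angle <- runif(2 * n_per_class, 0, 2 * pi)
radius <- c(runif(n_per_class, 0, 1), runif(n_per_class, 2, 3))
rings <- data.frame(
  x1 = radius * cos(angle),
  x2 = radius * sin(angle),
  ring = factor(rep(c("inner", "outer"), each = n_per_class))
)

# Fit one SVM with a linear kernel and one with a radial kernel.
svm_linear <- svm(ring ~ x1 + x2, data = rings, kernel = "linear")
svm_radial <- svm(ring ~ x1 + x2, data = rings, kernel = "radial")

# Share of training points that each model labels correctly.
acc_linear <- mean(fitted(svm_linear) == rings$ring)
acc_radial <- mean(fitted(svm_radial) == rings$ring)
print(c(linear = acc_linear, radial = acc_radial))
```

No straight line can separate the inner disc from the surrounding ring, so the linear kernel is expected to score lower than the radial kernel here.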
Only Narration. With this we come to the end of this tutorial.

Let us summarize.

Show Slide

Summary

In this tutorial we have learnt about:
  • Support Vector Machine or SVM.
  • Advantages and applications of SVM.
  • Practical implementation of SVM in R.
  • Disadvantages of SVM.


Show Slide

Assignment

Here is an assignment for this Spoken Tutorial.
  • Perform SVM on the wine dataset.
  • This dataset can be found in the HDclassif package.
  • Install the package and import the dataset using the data() command.
  • Evaluate the model using a confusion matrix.


Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.


For more details please contact us.

Show Slide

Spoken Tutorial Forum to answer questions

Do you have questions in THIS Spoken Tutorial?

Choose the minute and second where you have the question.

Explain your question briefly.

Someone from the FOSSEE team will answer them.

Please visit this site.


Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Do you have any general/technical questions?

Please visit the forum given in the link.

Show Slide

Textbook Companion


The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.


We give certificates to those who do this.


For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India.
Show Slide

Thank You

This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay.


Thank you for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey