Machine-Learning-using-R - old 2022/C3/Support-Vector-Machine-using-R/English

Revision as of 18:40, 8 February 2023 by Madhurig (Talk | contribs)


Title of the script: Support Vector Machine using R

Author: Tanmay Srinath

Keywords: R, RStudio, support vector machine, machine learning, console, dataset, confusion matrix, radial kernel, video tutorial.


Visual Cue | Narration
Show slide

Opening Slide

Welcome to this spoken tutorial on Support Vector Machine using R.
Show slide

Learning Objectives


In this tutorial, we will learn about:
  • Support Vector Machine or SVM.
  • Advantages and applications of SVM.
  • Practical implementation of SVM in R.
  • Disadvantages of SVM.


Show slide

System Specifications

This tutorial is recorded using:
  • Ubuntu Linux OS version 20.04
  • R version 4.1.2
  • RStudio version 1.4.1717

It is recommended to install R version 4.1.0 or higher.

Show slide

Prerequisites

To follow this tutorial, the learner should know:
  • Basic programming in R.
  • Basics of Machine Learning.

If not, please access the relevant tutorials on this website.

Show slide

Support Vector Machine

Support Vector Machine.
  • It is a supervised machine learning technique.
  • SVM constructs a line to separate 2-dimensional data into classes.
  • For n-dimensional data, it constructs a hyperplane to separate the classes.
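
To make the idea of a separating line concrete, here is a minimal sketch in base R. The weights w, the offset b, the three points and the class names "A" and "B" are all made up for illustration. They are not produced by any SVM fit. A point is assigned to a class according to which side of the line w1*x1 + w2*x2 + b = 0 it falls on.

```r
# Hypothetical separating line in 2 dimensions: w and b are illustrative values,
# not the result of training an SVM.
w <- c(1, -1)
b <- 0

# Three made-up points, one per row (columns are x1 and x2).
pts <- rbind(c(2, 1), c(1, 3), c(4, 0))

# Signed score of each point relative to the line w . x + b = 0.
scores <- as.vector(pts %*% w + b)

# Points with a non-negative score go to class "A", the rest to class "B".
classes <- ifelse(scores >= 0, "A", "B")

print(data.frame(x1 = pts[, 1], x2 = pts[, 2], score = scores, class = classes))
```

For n-dimensional data the same rule applies, with w holding n weights, so the line becomes a hyperplane.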
Show slide

Advantages of SVM

Advantages of SVM.
  • SVM is relatively memory efficient.
  • SVM can be used for both classification and regression.
Show slide

Applications of SVM

Applications of SVM.
  • Image classification.
  • Recognition of handwritten characters.
  • Face detection.
The following slides contain links to 3 datasets.

These datasets correspond to each application of SVM.

One can use them to practically implement the applications.

Show Slide

Datasets for Implementation

  • Image classification:

https://github.com/zalandoresearch/fashion-mnist

  • Handwritten characters:

https://archive.ics.uci.edu/ml/datasets/Devanagari+Handwritten+Character+Dataset

  • Face detection:

https://archive.ics.uci.edu/ml/datasets/cmu+face+images

Show slide

Support Vector Machine

Now we will implement a Support Vector Machine on the built-in iris dataset.

Show slide

Download Files

For this tutorial, I will use a script file SVM.R.

Please download this file from the Code files link of this tutorial.

Make a copy and then use it for practising.

[Computer screen]

Highlight SVM.R and the folder SVM.

I have downloaded and moved this file to the SVM folder.

The SVM folder is in the MLProject folder on the Desktop.


I have also set the SVM folder as my Working Directory.

Let us switch to RStudio.
Double click SVM.R to open it in RStudio.

Point to SVM.R in RStudio.

Open the script SVM.R in RStudio.

Script SVM.R opens in RStudio.

[RStudio] Highlight

library("e1071")

library("caret")


Select and run these commands to import the packages.


Highlight data("iris")

Double click iris in the Environment tab to load the iris dataset.


Run this command to import the iris dataset.

In the Environment tab, double click on the iris values to load the data.

Then click the iris data to view the dataset in the Source window.

[RStudio]

Highlight e1071

Highlight caret

We will use the e1071 package for the svm() function.

We will use the caret package to create the confusion matrix.

Cursor on the dataset.
We will now split our dataset into dependent and independent variables.
[RStudio] Highlight.

x <- subset(iris[,-c(1,2,5)])

y <- iris$Species

Type these commands.

Now run these commands.

Click the x data frame in the Environment tab.
In the Environment tab, click the x data to view the dataset in the Source window.
Point to the Petal.Length and Petal.Width columns.


Highlight:

x <- subset(iris[,-c(1,2,5)])

in the Source window.


Highlight:

y <- iris$Species

We will consider petal length and petal width as our two features.


Hence, we subset the iris dataset to take only these columns.


The variable y contains the dependent variable, Species.

This is what our Support Vector Machine will try to predict.
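
As a quick check of the split, the sketch below repeats the two commands from the script and inspects the result using base R only. The calls to names(), dim(), levels() and table() are additions for illustration and are not part of SVM.R.

```r
# Load the built-in iris dataset.
data("iris")

# Same commands as in the script: drop columns 1, 2 and 5
# (Sepal.Length, Sepal.Width, Species) to keep the two petal features.
x <- subset(iris[, -c(1, 2, 5)])
y <- iris$Species

# Inspect the independent variables.
print(names(x))   # column names of the feature data frame
print(dim(x))     # number of rows and columns

# Inspect the dependent variable.
print(levels(y))  # the species labels
print(table(y))   # number of samples per species
```

Selecting the columns by name, as in iris[, c("Petal.Length", "Petal.Width")], gives the same two features and does not depend on the column order.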

Cursor in the Source window.
Let us now create our Support Vector Machine model.
[RStudio]

svm_model <- svm(Species~Petal.Length+Petal.Width,data=iris)

summary(svm_model)

Type the following commands.
Drag boundary to see the Source window.
Drag boundary to see the Source window clearly.
Click on the Run button.


Point to the output in the console window.

Run the command to see the output.


The output is shown in the console window.

Drag boundary to see the console window clearly.
Drag boundary to see the console window clearly.
In the Console window.

Highlight:

summary(svm_model)


We have used the summary command to provide a summary of our SVM model.
Highlight SVM-Type: C-classification

Highlight SVM-Kernel: radial

Highlight cost: 1

The SVM-Type shows that our SVM is performing a classification task.


The SVM-Kernel shows that we are using a radial kernel function.

To know more about it, please refer to the Additional Material on this tutorial page.


The cost parameter is the cost of constraint violation.

It is set to 1 by default.
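
The kernel and cost shown in the summary are the defaults of svm(). As a hedged sketch, they can be changed through the kernel and cost arguments of svm() in e1071. The model name svm_linear and the value cost = 10 are arbitrary choices for illustration and are not part of SVM.R.

```r
library(e1071)
data("iris")

# Default settings, as in the script: radial kernel, cost = 1.
svm_model <- svm(Species ~ Petal.Length + Petal.Width, data = iris)

# Illustrative alternative: a linear kernel with a larger cost.
# A larger cost penalises constraint violations more heavily.
svm_linear <- svm(Species ~ Petal.Length + Petal.Width, data = iris,
                  kernel = "linear", cost = 10)

# Compare the summaries of the two models in the console.
print(summary(svm_model))
print(summary(svm_linear))
```

Which setting works better depends on the data, so this comparison is only a starting point.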

Highlight:

Number of Support Vectors: 37 ( 5 16 16 )


Highlight:

Number of Classes: 3

The first line gives the total number of support vectors, followed by the number used for each class.


The model uses 5 for setosa and 16 each for versicolor and virginica.


The second line tells us that our data has 3 classes.
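
The same counts can also be read directly from the fitted model object. The sketch below assumes the model from the script and uses the tot.nSV, nSV, labels, levels and index components of an e1071 svm object. Inspecting them this way is an addition and is not part of SVM.R.

```r
library(e1071)
data("iris")

# Same model as in the script.
svm_model <- svm(Species ~ Petal.Length + Petal.Width, data = iris)

# Total number of support vectors.
print(svm_model$tot.nSV)

# Number of support vectors per class, named by species.
# labels gives the order in which the classes are stored in the model.
sv_per_class <- setNames(svm_model$nSV, svm_model$levels[svm_model$labels])
print(sv_per_class)

# Rows of iris that act as support vectors (first few only).
print(head(iris[svm_model$index, c("Petal.Length", "Petal.Width", "Species")]))
```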

Drag boundary to see the Source window.
Drag boundary to see the Source window clearly.
Cursor in the Source window.
Now we use our model to predict the class of species.
[RStudio]

pred <- predict(svm_model,x)

Type this command.
Highlight pred <- predict(svm_model,x)


Click on the Save and Run buttons.

This command stores the results of the predict function in a variable pred.


Save and run the command.


We will use the pred variable to compute the confusion matrix for our model.

[RStudio]

confusionMatrix(pred,y)

Click the Run button.

Type this command.


This command calculates the confusion matrix for our model.

It compares the predicted species values against the actual species values.


Run the command to see the output in the console window.

Drag boundary to see the console window clearly.
Drag boundary to see the console window clearly.
Highlight output in console.
50 setosa samples have been correctly classified.


3 Versicolor samples have been incorrectly classified.


3 Virginica samples have been incorrectly classified.


Overall, the model has misclassified only 6 samples.
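
As a cross-check on the confusionMatrix() output, the counts and the overall accuracy can be sketched with base R. The names cm and accuracy are illustrative and are not part of SVM.R. With 6 of the 150 samples misclassified, as narrated, the accuracy would be 144/150 = 0.96.

```r
library(e1071)
data("iris")

# Same model and predictions as in the script.
svm_model <- svm(Species ~ Petal.Length + Petal.Width, data = iris)
x <- subset(iris[, -c(1, 2, 5)])
y <- iris$Species
pred <- predict(svm_model, x)

# Cross-tabulate predicted against actual species.
# Correct predictions lie on the diagonal.
cm <- table(Predicted = pred, Actual = y)
print(cm)

# Overall accuracy: correct predictions divided by all predictions.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```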

Drag boundary to see the Source window.
Drag boundary to see the Source window clearly.
Cursor in the Source window.
Let us plot our results.
[RStudio]

plot(svm_model, data=iris,formula = Petal.Length~Petal.Width)

Type the following command.
Drag boundary to see the Source window.
Drag boundary to see the Source window clearly.
Highlight

library e1071


plot(svm_model, data=iris,formula = Petal.Length~Petal.Width)

The e1071 package contains a modified version of the plot function.


It takes an SVM model, the data to plot and a formula.

The formula selects the two variables to plot when the model has more than two input variables.
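
The role of the formula is easier to see with a model that has more than two input variables. The sketch below is modelled on the example in the e1071 documentation and is not part of SVM.R. The model name svm_full and the slice values 3 and 4 are illustrative. The slice argument holds the variables that are not plotted at fixed values.

```r
library(e1071)
data("iris")

# A model that uses all four measurements as input variables.
svm_full <- svm(Species ~ ., data = iris)

# The formula picks the two dimensions to draw.
# The remaining inputs are held constant through slice.
plot(svm_full, data = iris, formula = Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4))
```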

Click the Run button.

Drag boundaries to see the plot window clearly.

Run the command to see the plot.

Drag boundaries to see the plot window clearly.

Highlight output in plot window.
The SVM boundaries have correctly classified most examples in our iris dataset.

This indicates that our model fits the data it was trained on accurately.
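
In this tutorial the predictions are made on the same rows that were used for training. A common way to estimate how the model behaves on unseen data is to hold out part of the dataset. The following is a minimal sketch under assumptions that are not part of SVM.R: a 70/30 split, the seed 42, and the names train_idx, train_data, test_data and test_accuracy.

```r
library(e1071)
data("iris")

# Fix the random seed so that the split is reproducible.
set.seed(42)

# Randomly pick 70% of the rows for training. The rest is held out for testing.
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_data <- iris[train_idx, ]
test_data <- iris[-train_idx, ]

# Fit the model on the training rows only.
svm_holdout <- svm(Species ~ Petal.Length + Petal.Width, data = train_data)

# Predict the held-out rows and compute the share of correct predictions.
test_pred <- predict(svm_holdout, test_data)
test_accuracy <- mean(test_pred == test_data$Species)
print(test_accuracy)
```

The exact value depends on the random split, so it is best read as an estimate and not as a fixed score.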

Let us now discuss the disadvantages of Support Vector Machine.
Show Slide

Disadvantages of SVM

  • The SVM algorithm is time consuming for large datasets.
  • If the data is not linearly separable, SVM with a linear kernel may not perform well.
Only Narration.
With this we come to the end of this tutorial.

Let us summarize.

Show Slide

Summary

In this tutorial we have learnt about:
  • Support Vector Machine or SVM
  • Advantages and applications of SVM.
  • Practical implementation of SVM in R.
  • Disadvantages of SVM.


Show Slide

Assignment

Now we will suggest the assignment for this Spoken Tutorial.
  • Perform SVM on the wine dataset.
  • This dataset can be found in the HDclassif package.
  • Install the package and import the dataset using the data() command.
  • Evaluate the model using a confusion matrix.


Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.


For more details please contact us.

Show Slide

Spoken Tutorial Forum to answer questions

Do you have questions in THIS Spoken Tutorial?

Choose the minute and second where you have the question.

Explain your question briefly.

Someone from the FOSSEE team will answer them.

Please visit this site.


Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Do you have any general/technical questions?

Please visit the forum given in the link.

Show Slide

Textbook Companion


The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.


We give certificates to those who do this.


For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India.
Show Slide

Thank You

This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay.


Thank you for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey