Machine-Learning-using-R - old 2022/C3/Quadratic-Discriminant-Analysis-in-R/English


Title of the script: Quadratic Discriminant Analysis in R

Author: Tanmay Srinath

Keywords: R, RStudio, machine learning, QDA, quadratic discriminant analysis, LDA, heteroscedastic Gaussian data, MASS library, video tutorial.


Visual Cue Narration
Show slide

Opening Slide

Welcome to this spoken tutorial on Quadratic Discriminant Analysis in R.
Show slide

Learning Objectives

In this tutorial, we will learn about:
  • Quadratic Discriminant Analysis or QDA.
  • Differences between linear discriminant analysis and quadratic discriminant analysis.
  • When to use quadratic discriminant analysis.
  • Implementation of quadratic discriminant analysis in R.
Show slide

System Specifications

This tutorial is recorded using:
  • Ubuntu Linux OS version 20.04
  • R version 4.1.2
  • RStudio version 1.4.1717.

It is recommended to install R version 4.1.0 or higher.

Show slide

Prerequisites

https://spoken-tutorial.org

To follow this tutorial, the learner should know:
  • Basic programming in R.
  • Machine Learning in R.

If not, please access the relevant tutorials on this website.

Show slide

Quadratic Discriminant Analysis

Quadratic discriminant analysis:
  • It is the discriminant analysis that is performed on heteroscedastic Gaussian data.
  • It is used when the covariance structures of the classes are different.
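As a rough illustration of what "different covariance structures" means, the sketch below computes one covariance matrix per species for two petal measurements of the iris dataset. The choice of these two variables and the name class_cov are assumptions made here for illustration only.

```r
# Per-class covariance matrices for two petal measurements in iris.
data(iris)
vars <- c("Petal.Length", "Petal.Width")

# split() gives one data frame per species; cov() is applied to each.
class_cov <- lapply(split(iris[, vars], iris$Species), cov)

# Print the three 2 x 2 matrices. Visibly different matrices suggest
# heteroscedastic classes, which is the situation QDA is meant for.
class_cov
```

If the printed matrices were nearly identical, the equal-covariance assumption of LDA would be more plausible.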


Show Slide

Differences between LDA and QDA

Differences between LDA and QDA.
  • LDA assumes that each class has the same covariance matrix.
  • On the other hand, QDA assumes that each class has a different covariance matrix.
  • LDA constructs a linear boundary, while QDA constructs a quadratic (curved) boundary.
  • When the covariance matrices of different classes are the same, QDA reduces to LDA.
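The difference in covariance assumptions can also be seen in the fitted objects. The following is a minimal sketch, assuming the MASS package is installed; fitting both models on the full iris dataset and the names lda_fit and qda_fit are illustrative choices, not part of the tutorial script.

```r
library(MASS)
data(iris)

# Fit both models with the same formula on the same data.
lda_fit <- lda(Species ~ Petal.Length + Petal.Width, data = iris)
qda_fit <- qda(Species ~ Petal.Length + Petal.Width, data = iris)

# LDA keeps a single transformation matrix, based on the pooled
# within-class covariance that all classes share.
dim(lda_fit$scaling)

# QDA keeps a separate transformation matrix for every class,
# stored as a 3-dimensional array (variables x variables x classes).
dim(qda_fit$scaling)
```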


Show Slides

When to use QDA

QDA is primarily used when the data is multivariate Gaussian and the classes have different covariance matrices.
Only Narration Let us see how we can do it in RStudio.
Show slide

Download Files

We will use a script file QDA.R

Please download this file from the Code files link of this tutorial.

Make a copy and then use it for practising.

[Computer screen]

Highlight


QDA.R and the QDA folder.

I have downloaded and moved this file to the QDA folder.

This folder is located in the MLProject folder on my Desktop.

I have also set the QDA folder as my Working Directory.

Cursor in QDA folder. Let us switch to RStudio.
Double-click QDA.R in RStudio.

Point to QDA.R in RStudio.

Let us open the script QDA.R in RStudio.

Script QDA.R opens in RStudio.

Highlight

library(MASS)

data(iris)


Highlight MASS


Click iris in the Environment tab to load the iris dataset.

The MASS library contains the qda() function.


Run these commands to import the library and the dataset.


Click iris in the Environment tab to load the iris dataset.

Cursor on iris dataset. Now let us split our data into training and testing.
[RStudio]

set.seed(1)

trn_ind=sample(1:nrow(iris),

size=0.7*nrow(iris),replace=FALSE)

train <- iris[trn_ind, ]

test <- iris[-c(trn_ind), ]

In the Source window, type these commands.
Highlight

set.seed(1)


Highlight

trn_ind=sample(1:nrow(iris),

size=0.7*nrow(iris),replace=FALSE)

train <- iris[trn_ind, ]

test <- iris[-c(trn_ind), ]

We set a seed for reproducible results.

We sample 70% of the data from iris for training and 30% for testing.
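A quick way to confirm the split is to check the sizes of the two sets. This is a small sketch that repeats the commands above so that it runs on its own; the extra checks at the end are additions for illustration.

```r
data(iris)
set.seed(1)  # same seed as in the tutorial, for reproducible sampling

# Draw 70% of the row indices without replacement.
trn_ind <- sample(1:nrow(iris), size = 0.7 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
test  <- iris[-trn_ind, ]

nrow(train)  # 105 rows, which is 70% of 150
nrow(test)   # 45 rows, the remaining 30%

# No row should appear in both sets.
length(intersect(rownames(train), rownames(test)))

# Number of training rows per species.
table(train$Species)
```

Because the rows are sampled at random, the three species need not be equally represented in the training set.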

Select the commands and click the Run button.

Click the test set and train set.

Select the commands and run them.

The datasets are shown in the Environment tab.


Click the test set and train set to load them in the Source window.

Point to iris dataset. Now we will perform QDA on the iris dataset.
[RStudio]

model <- qda(Species~Petal.Length+

Petal.Width, data=train)

model


Highlight

model <-

qda(Species~Petal.Length+Petal.Width, data=train)


Click Save and Click Run buttons.

In the Source window type these commands.

This is the command that we use to create the model.

It models Species as a function of Petal.Length and Petal.Width.


Save and run the commands.


The output is shown in the console.

Drag boundary to see the console window. Drag boundary to see the console window clearly.
Highlight output in console


Highlight Prior probabilities of group


Highlight Group means

These are the parameters of our model.

The prior probabilities indicate the proportion of each species in the training data.

The group means indicate the mean values of the predictor variables for each species.
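These two parts of the output can be reproduced by hand. The sketch below assumes the same seed, split and formula as in the tutorial, and compares the model components with values computed directly from the training data.

```r
library(MASS)
data(iris)
set.seed(1)
trn_ind <- sample(1:nrow(iris), size = 0.7 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]

model <- qda(Species ~ Petal.Length + Petal.Width, data = train)

# By default, the priors are the class proportions in the training data.
model$prior
prop.table(table(train$Species))

# The group means are the per-species means of each predictor.
model$means
aggregate(cbind(Petal.Length, Petal.Width) ~ Species,
          data = train, FUN = mean)
```

The two outputs in each pair should show the same numbers, only laid out differently.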

Drag boundary to see the Source window. Drag boundary to see the Source window clearly.
Cursor in the Source window. Let us now use our model to make predictions on test data.
[RStudio]

predicted <- predict(model, test)

names(predicted)


Highlight

predicted <- predict(model, test)


Highlight

names(predicted)


Click on Save and Run buttons.

In the Source window type these commands.

This command is used to predict the species from the test data.

This command gives us the contents of the predicted variable.


Save and run the commands.

Highlight output in console

Highlight class


Highlight posterior

This shows us that our predicted variable has two components.

The class component holds the predicted species for each observation.


The posterior component gives the posterior probability of an observation belonging to each class.
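The relationship between the two components can be checked directly. This is a minimal sketch, assuming the same split and model as in the tutorial; the name best is introduced here only for illustration.

```r
library(MASS)
data(iris)
set.seed(1)
trn_ind <- sample(1:nrow(iris), size = 0.7 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
test  <- iris[-trn_ind, ]

model <- qda(Species ~ Petal.Length + Petal.Width, data = train)
predicted <- predict(model, test)

# One row per test observation, one column per species.
head(predicted$posterior)

# The posterior probabilities in each row sum to 1.
range(rowSums(predicted$posterior))

# The predicted class is the species with the largest posterior
# probability in each row.
best <- colnames(predicted$posterior)[apply(predicted$posterior, 1, which.max)]
all(best == as.character(predicted$class))
```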

Cursor in the Source window. Let us now compute the accuracy of our model.
[RStudio]

table(test$Species,predicted$class)

Highlight


output in console

In the Source window type this command.


Save and run the command.

It will tabulate the original species against the predicted species.


Our model has no erroneous predictions.

This shows that QDA has successfully separated the three species in the test data of the iris dataset.
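The table can also be reduced to a single accuracy value. The sketch below is one possible way to do this, assuming the same split and model as above; the names conf_mat and accuracy are illustrative and not part of QDA.R.

```r
library(MASS)
data(iris)
set.seed(1)
trn_ind <- sample(1:nrow(iris), size = 0.7 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
test  <- iris[-trn_ind, ]

model <- qda(Species ~ Petal.Length + Petal.Width, data = train)
predicted <- predict(model, test)

# Confusion matrix: actual species in rows, predicted species in columns.
conf_mat <- table(Actual = test$Species, Predicted = predicted$class)
conf_mat

# Accuracy is the share of correct predictions, which lie on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy

# The same value, computed directly from the two label vectors.
mean(predicted$class == test$Species)
```

The same two lines for accuracy can be reused in the assignment once a model has been fitted on another dataset.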

Only Narration. With this we come to the end of this tutorial.

Let us summarise.

Show Slide

Summary

In this tutorial we have learnt about:
  • Quadratic Discriminant Analysis or QDA.
  • Differences between linear discriminant analysis and quadratic discriminant analysis.
  • When to use quadratic discriminant analysis.
  • Implementation of quadratic discriminant analysis in R.


Show Slide

Assignment

Here is an assignment for you.
  • Apply QDA on the Wine dataset.
  • This dataset can be found in the HDclassif package.
  • Install the package and import the dataset using the data() command.
  • Measure the accuracy of the model.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.


For more details, please contact us.

Show Slide

Spoken Tutorial Forum to answer questions

Do you have questions in THIS Spoken Tutorial?

Choose the minute and second where you have the question.

Explain your question briefly.

Someone from the FOSSEE team will answer them.

Please visit this site.

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Do you have any general/technical questions?

Please visit the forum given in the link.

Show Slide

Textbook Companion

The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.

We give certificates to those who do this.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India.
Show Slide

Thank You

This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay.

Thank you for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey