Title of the script: Linear Discriminant Analysis in R
Author: Tanmay Srinath
Keywords: R, RStudio, machine learning, dimensionality reduction, LDA, confusion matrix, dataset, Gaussian, Bayes classifier, homoscedasticity, heteroscedastic, QDA, spoken tutorial, video tutorial.
Visual Cue | Narration |
Show Slide
Opening Slide |
Welcome to this spoken tutorial on Linear Discriminant Analysis in R. |
Show Slide
Learning Objectives |
In this tutorial, we will learn about:
|
Show Slide
System Specifications |
This tutorial is recorded using,
It is recommended to install R version 4.1.0 or higher. |
Show Slide
Prerequisites |
To follow this tutorial, the learner should know:
If not, please access the relevant tutorials on R on this website. |
Show Slide
Linear Discriminant Analysis |
Linear Discriminant Analysis.
|
Show Slide
Applications of LDA |
LDA is primarily a multi-class classifier. |
Let us now understand the assumptions of LDA. | |
Show Slide
Assumptions of LDA |
|
Show Slide
Robustness of LDA |
|
Show Slide
LDA |
Now let us implement LDA on the iris dataset. |
Show Slide
Download Files |
We will use the script file LDA.R.
Please download this file from the Code files link of this tutorial. Make a copy and then use it for practising. |
[Computer screen]
Highlight LDA.R and the folder LDA |
I have downloaded and moved this file to the LDA folder.
|
Show Slide
LDA Classifier Model |
In this tutorial, we will create an LDA classifier model on the iris dataset. |
Let us switch to RStudio. | |
Click LDA.R in RStudio.
|
Open the script LDA.R in RStudio.
|
Highlight library(MASS)
Highlight library(e1071)
Highlight library(caret) |
The MASS package contains the lda() function that we use for our analysis.
I will import these packages directly.
|
[RStudio]
library(MASS)
library(e1071)
library(ggplot2)
library(caret) |
Select and run these commands to import the requisite packages. |
Highlight data(iris)
|
Run this command to load the iris dataset.
In the Environment tab, double-click on iris to view the data.
|
Point to the dataset. | Now we split our dataset into training and testing data. |
[RStudio]
set.seed(222)
trn_ind = sample(1:nrow(iris), size = 0.8*nrow(iris), replace = FALSE) |
Click on LDA.R in the Source window
|
Highlight set.seed(222)
Highlight trn_ind = sample(1:nrow(iris), size = 0.8*nrow(iris), replace = FALSE)
Select the commands and click the Run button. |
First we set a seed for reproducible results.
Then we create a vector of row indices named trn_ind using the sample() function.
It is an 80% random sample of the row numbers of iris. We sample without replacement so that the training data does not contain duplicate rows.
|
Point to Environment tab. | The vector is shown in the Environment tab. |
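The sampling step can be checked with a short sketch like the one below. It assumes only base R and the built-in iris dataset; the checks themselves are illustrative and are not part of LDA.R.

```r
# Recreate the index vector from the tutorial
data(iris)
set.seed(222)
trn_ind <- sample(1:nrow(iris), size = 0.8 * nrow(iris), replace = FALSE)

# Number of sampled indices: 80% of the 150 rows
length(trn_ind)

# anyDuplicated() returns 0 when no index is repeated,
# which is what sampling without replacement guarantees
anyDuplicated(trn_ind)
```

Running the same seed again reproduces the same vector, which is why the seed is set before sampling.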
Point to Environment tab. | We use the indices that we previously generated to obtain our train-test split. |
[RStudio]
train <- iris[trn_ind, ]
test <- iris[-c(trn_ind), ] |
In the Source window type these commands. |
Highlight train <- iris[trn_ind, ]
Highlight test <- iris[-c(trn_ind), ] |
The first command creates the training data, consisting of 120 unique rows.
The second command creates the testing data from the remaining 30 rows.
|
Select the commands and click the Run button.
|
Select the commands and run them.
Click the test set and train set to load them in the Source window. |
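As a rough sanity check on the split, a sketch along the following lines may help. It uses only base R; the checks are illustrative additions and do not appear in LDA.R.

```r
# Recreate the train-test split from the tutorial
data(iris)
set.seed(222)
trn_ind <- sample(1:nrow(iris), size = 0.8 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
test <- iris[-c(trn_ind), ]

# Sizes of the two sets
nrow(train)
nrow(test)

# No row should appear in both sets
length(intersect(rownames(train), rownames(test)))

# Class counts in each set; a random split need not be perfectly balanced
table(train$Species)
table(test$Species)
```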
Cursor in the panel. | Let us train our LDA model. |
[RStudio]
lda_model <- lda(Species ~ ., data = train)
lda_model |
In the Source window, type these commands. |
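Highlight lda_model <- lda(Species ~ ., data = train)
Click on Save and Run buttons. |
We pass two parameters to the lda() function.
The formula Species ~ . models Species using all the other columns as predictors.
The data parameter supplies the training data.
Save and run these commands.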
|
Drag boundary to see the console window. | Drag boundary to see the console window clearly. |
Highlight output in console. | Our lda_model provides us a lot of information.
Let us go through them one at a time. |
Highlight Prior probabilities of groups.
Highlight Coefficients of linear discriminants.
Highlight Proportion of trace. |
The prior probabilities of groups show the distribution of classes in the training dataset.
|
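The printed sections of the model can also be pulled out individually. The sketch below assumes the MASS package is installed; the component names prior, scaling and svd are those of the object returned by lda(), and the proportion of trace is recomputed here from the singular values as an illustration.

```r
library(MASS)

# Recreate the model from the tutorial
data(iris)
set.seed(222)
trn_ind <- sample(1:nrow(iris), size = 0.8 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
lda_model <- lda(Species ~ ., data = train)

# Prior probabilities: class proportions in the training data
lda_model$prior

# Coefficients of linear discriminants: weights that combine
# the four measurements into LD1 and LD2
lda_model$scaling

# Proportion of trace: share of between-class separation
# captured by each discriminant
prop_trace <- lda_model$svd^2 / sum(lda_model$svd^2)
names(prop_trace) <- colnames(lda_model$scaling)
prop_trace
```

With three classes there are at most two linear discriminants, so the proportions for LD1 and LD2 add up to 1.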
Drag boundary to see the Source window. | Drag boundary to see the Source window. |
Cursor in the window. | Let us use this model to make predictions on the testing data. |
[RStudio]
predicted_values <- predict(lda_model, test) |
In the Source window type this command and run it.
Let us check what predicted_values contain. |
Click the predicted_values data in the Environment tab.
|
Click the predicted_values data in the Environment tab.
|
[RStudio]
head(predicted_values$class)
head(predicted_values$posterior)
head(predicted_values$x) |
In the Source window type these commands.
The output is seen in the console window. |
Drag boundary to see the console window clearly. | Drag boundary to see the console window clearly. |
Highlight output of
head(predicted_values$class) in console
head(predicted_values$posterior) in console
head(predicted_values$x) in console |
The class component contains the species that the model has predicted for each observation.
The posterior component contains the posterior probability of the observation belonging to each class. The x component contains the linear discriminants for each observation. |
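The relationship between the three components can be illustrated with a small sketch. It assumes the MASS package is installed; the helper variable max_post_class is a hypothetical name introduced only for this check.

```r
library(MASS)

# Recreate the model and predictions from the tutorial
data(iris)
set.seed(222)
trn_ind <- sample(1:nrow(iris), size = 0.8 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
test <- iris[-c(trn_ind), ]
lda_model <- lda(Species ~ ., data = train)
predicted_values <- predict(lda_model, test)

# Each row of posterior probabilities sums to 1
post <- predicted_values$posterior
range(rowSums(post))

# The predicted class is the column with the largest posterior
max_post_class <- colnames(post)[apply(post, 1, which.max)]
all(max_post_class == as.character(predicted_values$class))

# One row of discriminant scores (LD1, LD2) per test observation
dim(predicted_values$x)
```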
Drag boundary to see the Source window clearly. | Drag boundary to see the Source window clearly. |
Cursor in the source window. | Now we will measure the performance of our model using the Confusion Matrix. |
[RStudio]
confusionMatrix(predicted_values$class, test$Species)
|
In the Source window type this command.
Save and run the command. |
Drag boundary to see the console window clearly. | Drag boundary to see the console window clearly. |
Highlight output in console | Our model has misclassified just one observation.
|
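To see what the confusion matrix counts, the same table can be built by hand. The sketch below uses table() from base R instead of caret's confusionMatrix(), so it reports only the counts and the overall accuracy; the variable names cm and accuracy are illustrative.

```r
library(MASS)

# Recreate the model and predictions from the tutorial
data(iris)
set.seed(222)
trn_ind <- sample(1:nrow(iris), size = 0.8 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
test <- iris[-c(trn_ind), ]
lda_model <- lda(Species ~ ., data = train)
predicted_values <- predict(lda_model, test)

# Rows are predicted classes, columns are actual classes
cm <- table(Predicted = predicted_values$class, Actual = test$Species)
cm

# Accuracy: correctly classified observations (the diagonal)
# divided by the total number of test observations
accuracy <- sum(diag(cm)) / sum(cm)
accuracy

# Number of misclassified observations (the off-diagonal entries)
sum(cm) - sum(diag(cm))
```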
Drag boundary to see the Source window clearly. | Drag boundary to see the Source window clearly. |
Let us visualise how well our model separates different classes. | |
[RStudio]
lda_plot <- cbind(train, predict(lda_model)$x)
ggplot(lda_plot, aes(LD1, LD2)) + geom_point(aes(color = Species)) |
In the Source window, type these commands. |
[RStudio]
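Highlight lda_plot <- cbind(train, predict(lda_model)$x)
Highlight ggplot(lda_plot, aes(LD1, LD2)) + geom_point(aes(color = Species))
Select the commands and run them.
|
The first command creates the data for our plot by adding the linear discriminants to the training data.
The second command plots LD1 against LD2 and colours the points by species.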
|
[RStudio]
Highlight output in Plots |
We can see that our model has separated almost all the data points clearly. |
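The separation seen in the plot can also be summarised numerically. The following sketch computes the mean LD1 and LD2 score of each species; it assumes the MASS package is installed, and the name centroids is introduced here only for illustration.

```r
library(MASS)

# Recreate the plotting data from the tutorial
data(iris)
set.seed(222)
trn_ind <- sample(1:nrow(iris), size = 0.8 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
lda_model <- lda(Species ~ ., data = train)
lda_plot <- cbind(train, predict(lda_model)$x)

# Mean position of each species in the LD1-LD2 plane
centroids <- aggregate(cbind(LD1, LD2) ~ Species, data = lda_plot, FUN = mean)
centroids
```

Class centres that lie far apart along LD1 or LD2 correspond to clusters that appear well separated in the scatter plot.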
With this we come to the end of this tutorial. Let us summarise. | |
Show Slide
Summary |
In this tutorial we have learnt about:
|
Show Slide
Assignment |
Now we will suggest an assignment.
Perform LDA on the in-built PlantGrowth dataset.
|
Show Slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show Slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
|
Show Slide
Spoken Tutorial Forum to answer questions
Do you have questions in THIS Spoken Tutorial?
Choose the minute and second where you have the question.
Explain your question briefly.
Someone from the FOSSEE team will answer them.
Please visit this site. |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Do you have any general or technical questions?
Please visit the forum given in the link. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.
We give certificates to those who do this. For more details, please visit these sites. |
Show Slide
Acknowledgment |
The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India. |
Show Slide
Thank You |
This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay.
Thank you for watching. |