Machine-Learning-using-R - old 2022/C3/Logistic-Regression-using-R/English

From Script | Spoken-Tutorial
Revision as of 11:24, 23 February 2023 by Madhurig (Talk | contribs)


Title of the script: Logistic Regression using R

Author: Tanmay Srinath

Keywords: R, RStudio, machine learning, supervised, unsupervised, classification, logistic regression, dataset, video tutorial.


Visual Cue Narration
Show Slide

Opening Slide

Welcome to this spoken tutorial on Logistic Regression using R.
Show Slide

Learning Objectives

In this tutorial, we will learn about:
  • Logistic Regression
  • Applications of Logistic Regression
  • Implementation of Logistic Regression in R
  • Drawbacks of Logistic Regression
Show Slide

System Specifications

This tutorial is recorded using:
  • Ubuntu Linux OS version 20.04
  • R version 4.1.2
  • RStudio version 1.4.1717
  • It is recommended to install R version 4.1.0 or higher.


Show Slide

Prerequisites

To follow this tutorial, the learner should know:
  • Basics of R programming.
  • Basics of Machine Learning using R.

If not, please access the relevant tutorials on this website.

Only Narration Let us learn what logistic regression is.
Show Slide

What is Logistic Regression?

  • It is a popular statistical model.
  • It is used for regression of binary responses and classification tasks.
  • It is used to predict a binary outcome based on a set of independent variables.
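To make the idea of a binary outcome concrete, here is a minimal sketch of the logistic function turning a linear score into a probability. The intercept and slope values are hypothetical and chosen only for illustration; they are not taken from this tutorial.

```r
# Logistic (sigmoid) function: maps any real number to a value between 0 and 1.
logistic <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients, used only for illustration.
intercept <- -3
slope <- 1.5

# Example values of a single independent variable.
x <- c(0, 2, 4)

# Predicted probability of the positive class for each value of x.
p <- logistic(intercept + slope * x)

# Assign class 1 when the probability is above 0.5, otherwise class 0.
predicted_class <- ifelse(p > 0.5, 1, 0)

print(round(p, 3))
print(predicted_class)
```

The 0.5 cut-off is a common default and not a requirement; a different threshold can be used when one type of error is more costly than the other.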
Only Narration Let us now learn a few practical applications of logistic regression.
Show Slide

Applications of Logistic Regression

  • Predict if a patient has diabetes or not.
  • Classify various plant species.
  • Detect fraudulent credit card transactions.


Show Slide

Logistic Regression

Now we will implement logistic regression on the iris dataset.

Let us see how we can do it in RStudio.

Show Slide

Download Files

We will use a script file LogisticRegression.R

Please download this file from the Code files link of this tutorial.

Make a copy and then use it for practising.

[Computer screen]

Highlight:

LogisticRegression.R

Logistic Regression folder.

I have downloaded and moved this file to the Logistic Regression folder.

This folder is located in the MLProject folder on my Desktop.


I have also set the Logistic Regression folder as my Working Directory.

Cursor in the Logistic Regression folder. Let us switch to RStudio.
Click LogisticRegression.R in RStudio

Point to LogisticRegression.R in RStudio.

Open the script LogisticRegression.R in RStudio.


Script LogisticRegression.R opens in RStudio.

Highlight:

library(stats4)

library(splines)

library(VGAM)

data(iris)

I have already installed the required libraries, so I will directly import them.


If you have not installed them, please install the libraries before importing them.
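One possible way to check which of the required packages are missing is sketched below. The helper name missing_packages is hypothetical and not part of the tutorial's script; the install.packages() call is left commented out because it needs an internet connection.

```r
# Hypothetical helper: return the names of packages in `pkgs`
# that are not installed on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this tutorial.
to_install <- missing_packages(c("stats4", "splines", "VGAM"))
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The stats4 and splines packages are shipped with R itself, so usually only VGAM needs to be installed.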


Select and run these commands

Highlight

library(stats4)

library(splines)


Highlight

library(VGAM)

stats4 and splines packages are needed to load the VGAM package.


The VGAM package provides the vglm function that we will use to perform logistic regression.

Click in the Environment tab to load the iris dataset. Click in the Environment tab to load the iris dataset.
Cursor in RStudio. Now we will scale our input variables.
[RStudio]

iris[1:4] <- scale(iris[1:4])

Click on Save and Run buttons.

Type this command.

This will scale our input variables.


Save and run the command.
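As a quick check of what scaling does, the sketch below scales a copy of iris and confirms that each numeric column ends up with mean 0 and standard deviation 1. The name scaled is an assumption made here so that the original iris data stays untouched.

```r
data(iris)

# Work on a copy so that the original iris data frame is unchanged.
scaled <- iris
scaled[1:4] <- scale(iris[1:4])

# After scaling, every numeric column has mean 0 and standard deviation 1
# (up to floating-point rounding).
col_means <- round(colMeans(scaled[1:4]), 10)
col_sds <- round(sapply(scaled[1:4], sd), 10)

print(col_means)
print(col_sds)
```

The Species column is not numeric, so it is left as it is.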

Cursor in RStudio. Now let’s split our data into training and testing parts.
[RStudio]

set.seed(222)

trn_ind=sample(1:nrow(iris),

size=0.8*nrow(iris),replace=FALSE)

train <- iris[trn_ind, ]

test <- iris[-c(trn_ind),]


Type the following commands.
Highlight

set.seed(222)


Highlight

trn_ind=sample(1:nrow(iris),

size=0.8*nrow(iris),replace=FALSE)


Highlight

train <- iris[trn_ind, ]


Highlight

test <- iris[-c(trn_ind),]

This command sets a seed for reproducible results.


This command randomly samples 80% of the 150 row indices.


It will be used for subsetting training data.

This command creates the train dataframe.


This command creates the test dataframe.
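The split can be sanity-checked with a short sketch like the one below. It repeats the commands from this tutorial and then confirms that the two data frames together cover all rows and share none.

```r
data(iris)

# Same split as in the tutorial: 80% of the rows for training, the rest for testing.
set.seed(222)
trn_ind <- sample(1:nrow(iris), size = 0.8 * nrow(iris), replace = FALSE)
train <- iris[trn_ind, ]
test <- iris[-trn_ind, ]

# Number of rows and columns in each part.
print(dim(train))
print(dim(test))

# Row names shared by both parts; an empty result means there is no overlap.
shared_rows <- intersect(rownames(train), rownames(test))
print(length(shared_rows))
```

Because the seed is fixed, running these commands again gives the same train and test rows.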

Highlight

set.seed(222)


Highlight

trn_ind=sample(1:nrow(iris),

size=0.8*nrow(iris),replace=FALSE)


Highlight

train <- iris[trn_ind, ]


Highlight

test <- iris[-c(trn_ind),]

Select the commands.

Click on the Run button.


Point to the Environment tab.


Select the commands and run them.


The data sets are shown in the Environment tab.

Click the test set and train set to load them in the Source window. Click the test set and train set to load them in the Source window.
Cursor in RStudio. Now we will train our model.
[RStudio]

model=vglm( Species ~ Petal.Length + Petal.Width,

family=multinomial, train)

Type the following command.


The command creates certain warnings.

These warnings don’t affect the output and can be ignored.

Highlight:

vglm()


Highlight:

Species ~ Petal.Length + Petal.Width


Highlight:

family=multinomial


Highlight train

This is the function that creates a logistic regression model.


This is the formula for our model.

We try to predict species based on petal length and petal width features.


This ensures that our model predicts probability for more than 2 classes.


This is the data that is used to train our model.

Click on Save and Run buttons. Save and run the command.

Here we can see the warnings.

Cursor in the Source window. Now let us predict the output classes from test data.
[RStudio]

prob <- predict(model, test[,1:4], type="response")

pred <- apply(prob, 1, which.max)

Type and run these commands.


This command calculates the probability of each iris class for each test data entry.


This command selects the iris class with the highest probability value.

Highlight

prob <- predict(model, test[,1:4], type="response")

Highlight pred <- apply(prob, 1, which.max)

This predicts the probability of the given data belonging to a certain class.


This retrieves the predicted classes from the probabilities.

It is done by selecting the class with the highest probability using which.max.
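Note that which.max returns the column position of the highest probability, not the class name. The sketch below shows one way to turn those positions into labels. The probability matrix is hypothetical and stands in for the output of predict, so that the example runs without a fitted model.

```r
# Hypothetical probability matrix: one row per observation, one column per class.
prob <- matrix(c(0.90, 0.08, 0.02,
                 0.10, 0.30, 0.60,
                 0.05, 0.80, 0.15),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("setosa", "versicolor", "virginica")))

# Column position of the highest probability in each row.
pred <- apply(prob, 1, which.max)

# Convert the positions into class labels using the column names.
pred_label <- factor(colnames(prob)[pred], levels = colnames(prob))

print(pred)
print(pred_label)
```

Using labels instead of positions makes a table of predicted versus actual species easier to read.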

Cursor in the Source window. Now let us measure the accuracy of our model.


This can be done by tabulating predicted species versus actual species.

[RStudio]

table(pred,test$Species)


Highlight output in console

Type and run this command.

The output is shown in the console window.


We can see that our model has misclassified only one sample.

This suggests that the model is accurate on this test set.
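The table can also be reduced to a single accuracy value. The sketch below uses small hypothetical vectors of actual and predicted species, not the tutorial's results, to show the calculation.

```r
classes <- c("setosa", "versicolor", "virginica")

# Hypothetical actual and predicted labels, used only for illustration.
actual <- factor(c("setosa", "setosa", "versicolor",
                   "versicolor", "virginica", "virginica"),
                 levels = classes)
predicted <- factor(c("setosa", "setosa", "versicolor",
                      "virginica", "virginica", "virginica"),
                    levels = classes)

# Rows are predicted classes, columns are actual classes.
conf_mat <- table(predicted, actual)
print(conf_mat)

# Accuracy is the number of correct predictions (the diagonal)
# divided by the total number of predictions.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

Giving both factors the same levels keeps the table square even when a class is never predicted, so the diagonal always lines up with the correct predictions.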

Show Slide

Drawbacks of Logistic Regression

Let us now understand the drawbacks of logistic regression.
  • Logistic regression works well only when the observations are independent of each other.
  • Logistic regression doesn’t perform well when the relationship between the independent variables and the response is non-linear.
Only Narration. With this we come to the end of this tutorial.

Let us summarise.

Show Slide

Summary

In this tutorial we have learnt about:
  • Logistic Regression
  • Applications of Logistic Regression
  • Implementation of Logistic Regression in R
  • Drawbacks of Logistic Regression.


Show Slide

Assignment

Now we will suggest an assignment for this Spoken Tutorial.
  • Apply logistic regression on the Wine dataset.
  • This dataset can be found in the HDclassif package.
  • Install the package and import the dataset using the data() command.
  • Measure its accuracy by tabulating the results.


Show Slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show Slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.


For more details, please contact us.

Show Slide

Spoken Tutorial Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Do you have any general/technical questions?

Please visit the forum given in the link.

Show Slide

Textbook Companion

The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.

We give certificates to those who do this.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India.
Show Slide

Thank You

This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay.

Thank you for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey