Machine-Learning-using-R - old 2022/C4/Decision-Tree-using-R/English

From Script | Spoken-Tutorial
Revision as of 17:53, 23 March 2023 by Madhurig (Talk | contribs)


Title of the script: Decision Tree using R

Author: Tanmay Srinath

Keywords: R, RStudio, machine learning, supervised, unsupervised, classification, regression, decision tree, video tutorial

Visual Cue Narration
Show slide

Opening Slide

Welcome to this spoken tutorial on Decision Tree using R.
Show slide

Learning Objectives

In this tutorial, we will learn about:
  • Decision Tree
  • Advantages of Decision Trees.
  • Application of Decision Trees.
  • Practical implementation of Decision Trees in R.
  • Disadvantages of Decision Trees.


Show slide

System Specifications

This tutorial is recorded using:
  • Ubuntu Linux OS version 20.04
  • R version 4.2.0
  • RStudio version 2022.02.3
Show slide

Prerequisites

To follow this tutorial, the learner should know:
  • Basics of R programming
  • Basics of Machine Learning.

If not, please access the relevant tutorials on this website.

Show slide

Decision Tree

Now let’s learn about Decision Tree.
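  • It is a supervised learning technique that represents decisions as a tree, which makes it easy to visualise.
  • It can be used for both classification and regression tasks.
  • It makes no assumption about the distribution of the data, so it works on non-Gaussian or discrete data.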
Show slide

Advantages of Decision Tree

Next let’s see the advantages of Decision tree.
  • Decision trees are easy to understand.
  • They are mirrors of human decision making.
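To see why a decision tree mirrors human decision making, here is a minimal sketch of hand-written if-else rules on the iris data. The function name classify_by_rules and the thresholds 2.45 and 1.75 are assumptions chosen for illustration. They are not taken from the model built later in this tutorial.

```r
# Hand-written rules that mimic how a person might sort iris flowers.
# The thresholds 2.45 and 1.75 are illustrative assumptions, not values
# read from the tutorial's fitted model.
classify_by_rules <- function(petal_length, petal_width) {
  ifelse(petal_length < 2.45, "setosa",
         ifelse(petal_width < 1.75, "versicolor", "virginica"))
}

data("iris")
rule_pred <- classify_by_rules(iris$Petal.Length, iris$Petal.Width)

# Compare the rule-based guesses with the actual species
rule_table <- table(predicted = rule_pred, actual = iris$Species)
print(rule_table)
```

A decision tree algorithm finds such questions and thresholds automatically from the data.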
Show slide

Applications of Decision Tree

Now let us learn about the applications of decision tree.
  • Predicting salaries of employees.
  • Diagnosis of diseases and ailments.
Show Slide

Decision Tree

Now we will construct a Decision Tree on the built-in iris dataset.

It will be used to predict the species of a given sample.

Show slide

Download Files

For this tutorial, I will use a script file DecisionTree.R.

Please download this file from the Code files link of this tutorial.

Make a copy and then use it for practising.

[Computer screen]

Highlight DecisionTree.R and the folder Decision Tree.

I have downloaded and moved this file to the Decision Tree folder.

This folder is located in the MLProject folder on my Desktop.

I have also set the Decision Tree folder as my Working Directory.

Let us switch to RStudio.
Double-click DecisionTree.R in RStudio.

Point to DecisionTree.R in RStudio.

Open the script DecisionTree.R in RStudio.


Script DecisionTree.R opens in RStudio.

[RStudio]

Highlight rpart

Highlight rpart.plot

We will use the rpart package to construct a decision tree.

We will use the rpart.plot package for plotting the decision tree.

[RStudio]

library(rpart)

library(rpart.plot)

data("iris")

Please install the rpart.plot package before importing.
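One possible way to check whether the required packages are already installed is sketched below. The variable names needed, is_installed and missing_pkgs are assumptions made for this example. The sketch only reports what is missing and does not install anything by itself.

```r
# Packages this tutorial relies on
needed <- c("rpart", "rpart.plot")

# requireNamespace() returns TRUE if a package is installed,
# without attaching it to the session
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  # Run the printed command once, with an internet connection
  message("Run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```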


Select and run these commands.

Click in the Environment tab to load the iris dataset.
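Before building a model, it may help to look at the data from the console as well. The following is a small sketch using base R functions only. The variable name species_counts is an assumption made for this example.

```r
data("iris")

# Structure of the dataset: column names and types
str(iris)

# Number of samples for each species
species_counts <- table(iris$Species)
print(species_counts)

# Summary of the two columns used as predictors in this tutorial
summary(iris[, c("Petal.Length", "Petal.Width")])
```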
Let us now create our Decision Tree model.
[RStudio]

set.seed(121)

tree <- rpart(Species~Petal.Length+Petal.Width,data=iris,method = 'class', control=rpart.control(minsplit = 5, usesurrogate = 2))


Highlight: set.seed(121)

Highlight Species~Petal.Length+Petal.Width


Highlight data=iris


Highlight method = 'class'


Highlight control=rpart.control(minsplit = 5, usesurrogate = 2)


Click the Run button.

Point to Environment tab.

Type these commands.


I will drag the horizontal scroll bar to show the full command.

We set a seed for reproducible results.

This is the formula we use for this model. Species is predicted from Petal.Length and Petal.Width.

This uses the entire iris dataset to train our model.

This tells our model that we are doing a classification task.

This parameter controls how our decision tree is grown. Here, minsplit = 5 means a node must contain at least 5 observations before a split is attempted.

Run the command.

Tree data is shown in the Environment tab.
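The fitted model can also be inspected from the console. The sketch below repeats the same model and stores the control settings in a separate variable named ctrl, which is an assumption made for readability. When minbucket is not given, rpart.control sets it to round(minsplit/3).

```r
library(rpart)
data("iris")

set.seed(121)

# Same control settings as in the tutorial, kept in a separate variable
ctrl <- rpart.control(minsplit = 5, usesurrogate = 2)

tree <- rpart(Species ~ Petal.Length + Petal.Width,
              data = iris, method = "class", control = ctrl)

# Text view of the splits and the class predicted at each node
print(tree)

# Complexity parameter table, one row per tree size
printcp(tree)

# Control settings actually stored in the model
tree$control$minsplit
tree$control$minbucket
```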

Point to the model.

Now let us plot our model.
[RStudio]

rpart.plot(tree, box.col=c("red", "green"))


Highlight output in plot window

Type and run this command.

Our decision tree has 4 levels.

The green nodes indicate the class that has been predicted.
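The number of levels can also be checked without the plot. This is a rough sketch that assumes the root node counts as level 1. It relies on the way rpart numbers its nodes: the root is node 1, and the children of node n are nodes 2n and 2n+1. The names node_ids, node_depth, n_levels and n_leaves are assumptions made for this example.

```r
library(rpart)
data("iris")

set.seed(121)
tree <- rpart(Species ~ Petal.Length + Petal.Width, data = iris,
              method = "class",
              control = rpart.control(minsplit = 5, usesurrogate = 2))

# Row names of tree$frame are the node numbers
node_ids <- as.numeric(rownames(tree$frame))

# Depth of a node below the root (the root has depth 0)
node_depth <- floor(log2(node_ids))

# Number of levels, counting the root as level 1
n_levels <- max(node_depth) + 1

# Leaf nodes are marked as "<leaf>" in the var column
n_leaves <- sum(tree$frame$var == "<leaf>")

cat("Levels:", n_levels, " Leaf nodes:", n_leaves, "\n")
```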

Now we use our model to predict the class of species.
[RStudio]

pred <- predict(tree,newdata=iris[,-c(1,2,5)],type = 'class')


Highlight pred <- predict(tree,newdata=iris[,-c(1,2,5)],type = 'class')

Type and run this command.


This command stores the results of the predict function in a variable pred. We will use this variable to find the accuracy of our model.
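Apart from the predicted class, the predict function can also return class probabilities when type = 'prob' is used. The sketch below is an optional variation, not a step of this tutorial. The variable name prob is an assumption.

```r
library(rpart)
data("iris")

set.seed(121)
tree <- rpart(Species ~ Petal.Length + Petal.Width, data = iris,
              method = "class",
              control = rpart.control(minsplit = 5, usesurrogate = 2))

# Select the two predictor columns by name instead of by position
predictors <- iris[, c("Petal.Length", "Petal.Width")]

# One row per sample, one column per species
prob <- predict(tree, newdata = predictors, type = "prob")
head(prob)
```

Each row gives the proportion of each species in the leaf node that the sample falls into.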

[RStudio]

table(pred,iris$Species)

Type and run this command.


This tabulates the results by comparing the predicted species with the actual species.

Highlight output in console.

50 Setosa samples have been correctly classified.


3 Versicolor samples have been incorrectly classified.


1 Virginica sample has been incorrectly classified.


Overall, the model has misclassified only 4 samples.
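With 4 misclassified samples out of 150, the accuracy works out to 146/150, which is about 97.3 percent. One way to compute accuracy directly from the table is sketched below. The names conf_mat, accuracy and per_class_error are assumptions made for this example.

```r
library(rpart)
data("iris")

set.seed(121)
tree <- rpart(Species ~ Petal.Length + Petal.Width, data = iris,
              method = "class",
              control = rpart.control(minsplit = 5, usesurrogate = 2))
pred <- predict(tree, newdata = iris[, -c(1, 2, 5)], type = "class")

# Rows are predicted species, columns are actual species
conf_mat <- table(predicted = pred, actual = iris$Species)

# Correct predictions lie on the diagonal of the table
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Share of each actual species that was misclassified
per_class_error <- 1 - diag(conf_mat) / colSums(conf_mat)

print(accuracy)
print(per_class_error)
```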

Let us now discuss the disadvantages of Decision Tree.
Show Slide

Disadvantages of Decision tree

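  • A single decision tree often has lower predictive power than other methods, such as ensembles of trees.
  • They can be non-robust. A small change in the training data can produce a very different tree.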
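In this tutorial the model was trained and evaluated on the same data, so the accuracy seen earlier can be optimistic. A minimal sketch of a hold-out check is given below. The 70/30 split and the names train_idx, train_set, test_set, fit, train_acc and test_acc are assumptions made for this example, not part of the tutorial's script.

```r
library(rpart)
data("iris")

set.seed(121)

# Randomly keep 105 of the 150 rows (70 percent) for training
train_idx <- sample(nrow(iris), size = 105)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

fit <- rpart(Species ~ Petal.Length + Petal.Width, data = train_set,
             method = "class",
             control = rpart.control(minsplit = 5, usesurrogate = 2))

# Accuracy on data the model has seen and on data it has not seen
train_acc <- mean(predict(fit, train_set, type = "class") == train_set$Species)
test_acc  <- mean(predict(fit, test_set,  type = "class") == test_set$Species)

cat("Training accuracy:", train_acc, "\n")
cat("Test accuracy:", test_acc, "\n")
```

Changing the seed changes the split, and the fitted tree may change with it. This is one way to observe the non-robustness mentioned above.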
With this we come to the end of this tutorial.

Let us summarize.

Show Slide

Summary

In this tutorial we have learnt about:
  • Decision Tree
  • Advantages of Decision Trees.
  • Application of Decision Trees.
  • Practical implementation of Decision Trees in R.
  • Disadvantages of Decision Trees.
Now we will suggest the assignment for this Spoken Tutorial.
Show Slide

Assignment

  • Build a Decision Tree on the PimaIndiansDiabetes2 dataset.
  • Install and import the mlbench package.
  • Run the data("PimaIndiansDiabetes2") command to load the dataset.
  • Tabulate the results.
Show slide

About the Spoken Tutorial Project

The video at the following link summarises the Spoken Tutorial project.

Please download and watch it.

Show slide

Spoken Tutorial Workshops

We conduct workshops using Spoken Tutorials and give certificates.

For more details, please contact us.

Show Slide

Spoken Tutorial Forum to answer questions

Please post your timed queries in this forum.
Show Slide

Forum to answer questions

Do you have any general/technical questions?

Please visit the forum given in the link.

Show Slide

Textbook Companion

The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.

We give certificates to those who do this.

For more details, please visit these sites.

Show Slide

Acknowledgment

The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Govt. of India.
Show Slide

Thank You

This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay.

Thank you for watching.

Contributors and Content Editors

Madhurig, Nancyvarkey