Machine-Learning-using-R - old 2022/C4/Random-Forest-using-R/English
Title of the script: Random Forest using R
Author: Tanmay Srinath
Keywords: R, RStudio, machine learning, supervised, classification, random forest, bagging, decision tree, video tutorial.
Visual Cue | Narration |
Show slide
Opening Slide |
Welcome to this spoken tutorial on Random Forest using R. |
Show slide
Learning Objectives |
In this tutorial, we will learn about:
* Random Forest and its benefits
* Bagging
* Applications of Random Forest
* Implementing a Random Forest model on the iris dataset
* Tuning a Random Forest model
|
Show slide
System Specifications |
This tutorial is recorded using,
|
Show slide
Prerequisites |
To follow this tutorial, the learner should know:
|
Show slide
Random Forest |
Let us begin with Random Forest.
Random Forest is a supervised machine learning algorithm.
It builds a large collection of decision trees and combines their individual predictions into a single output.
|
Show slide
Benefits of Random Forest |
Now we will learn some benefits of using Random Forests.
Each tree in the forest makes its own prediction. The final output is the average of these predictions for regression, or the majority vote for classification. Hence the problem of overfitting is reduced. |
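As a minimal sketch of the majority-vote idea, here are five hypothetical per-tree votes for a single sample (the values are made up for illustration):
tree_preds <- c("setosa", "versicolor", "setosa", "setosa", "virginica")  # hypothetical votes
names(which.max(table(tree_preds)))                                       # majority vote: "setosa"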
Show slide
Bagging |
Random Forest uses a concept called bagging to improve its performance.
Let us learn more about it. |
Show slide
Bagging |
Bagging stands for Bootstrap Aggregating.
In bagging, each model is trained on a bootstrap sample: a random sample of the training data, drawn with replacement, of the same size as the original data.
|
Show slide
Bagging |
Training on different bootstrap samples makes the individual models less correlated, so their aggregated prediction is more stable.
|
Show slide
Bagging in Random Forests |
In a Random Forest, each tree is grown on a bootstrap sample of the data.
In addition, only a random subset of the features is considered at each split, which further decorrelates the trees.
|
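A minimal sketch of the bootstrap sampling behind bagging, assuming the iris dataset is available in the R session:
set.seed(1)
idx <- sample(nrow(iris), replace = TRUE)  # row indices drawn with replacement
boot_sample <- iris[idx, ]                 # one tree would train on this sample
oob_rows <- setdiff(1:nrow(iris), idx)     # rows never drawn are "out-of-bag"
length(oob_rows)                           # roughly a third of the rows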
Show Slide
Applications of Random Forest |
Now let us learn about a few applications of Random Forest.
Random Forest is used, for example, in retail analytics and in medical diagnosis, as in these datasets:
https://archive.ics.uci.edu/ml/datasets/Online+Retail+II
https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic) |
Show Slide
Random Forest |
Now let us implement Random Forest on the iris dataset. |
Show slide
Download Files |
For this tutorial, we will use a script file RandomForest.R.
Please download this file from the Code files link of this tutorial. Make a copy and then use it for practising. |
[Computer screen]
Highlight RandomForest.R and the folder RandomForest. |
I have downloaded and moved this file to the RandomForest folder.
|
Only Narration. | Let us switch to RStudio. |
Click Script RandomForest.R
Point to RandomForest.R in RStudio. |
Open the script RandomForest.R in RStudio.
Script RandomForest.R opens in RStudio. |
[RStudio]
Highlight library(randomForest) |
We will be using the randomForest package for creating our model.
|
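If the package is not already installed, it can be installed from CRAN first; a minimal sketch:
if (!requireNamespace("randomForest", quietly = TRUE))
  install.packages("randomForest")  # one-time installation from CRAN
library(randomForest)               # load the package for this session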
[RStudio]
library(randomForest)
data(iris) |
Select and run these commands. |
Click in the Environment tab to load the iris dataset. | Click in the Environment tab to load the iris dataset. |
Cursor in the Source window. | Now let us create our Random Forest model. |
[RStudio]
set.seed(1007)
model = randomForest(formula = Species ~ ., data = iris, ntree = 1000)
print(model) |
Type the following commands. |
Highlight set.seed(1007)
Highlight data = iris Highlight ntree=1000 |
We set a seed for reproducible results.
We model Species as the dependent variable. The remaining attributes are independent variables.
ntree = 1000 specifies the number of trees grown in the forest.
|
Click Save and Run buttons. | Save and run the commands.
|
Drag boundary to see the console window clearly. | Drag boundary to see the console window clearly. |
Highlight output in console
|
Our model’s specifications are displayed here.
|
Highlight output in console
|
This gives the number of variables sampled randomly at each split.
For classification, its default value is the square root of p, where p is the number of features. |
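For iris this default can be checked by hand; a small sketch, with Species excluded as the dependent variable:
p <- ncol(iris) - 1  # 4 features, excluding the Species column
floor(sqrt(p))       # default mtry for classification: 2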
Highlight output in console
Highlight Confusion matrix |
This gives the OOB or out-of-bag error rate for our model.
It measures the prediction error of the model on the samples left out of each tree's bootstrap sample.
Our model has misclassified 8 of the 150 samples, 3 in versicolor and 5 in virginica, an OOB error rate of about 5.33%. |
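The fitted randomForest object stores these results, so the same numbers can be read off directly; a small sketch using the model object created above:
model$confusion                   # confusion matrix with per-class error rates
tail(model$err.rate[, "OOB"], 1)  # final OOB error rate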
Drag boundary to see the Source window clearly. | Drag boundary to see the Source window clearly. |
Cursor in the Source window. | Let us plot our model. |
[RStudio]
plot(model)
|
Type and run this command to get a plot.
|
Highlight Output in Plot window. | This plot compares the number of trees against the error of the model.
This knowledge will be used to tune our Random Forest. |
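One way to read a candidate tree count off this information; a small sketch using the model object from above:
best_ntree <- which.min(model$err.rate[, "OOB"])  # tree count with the lowest OOB error
best_ntree                                        # a candidate value for ntreeTry below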
Show Slide
Tuning a Random Forest |
Tuning means choosing good values for parameters such as the number of trees (ntree) and the number of variables tried at each split (mtry).
The tuneRF() function of the randomForest package searches for the value of mtry with the lowest OOB error.
|
Drag boundaries to see the Source window clearly. | Now let us switch to RStudio.
|
[RStudio]
tuneRF(iris[,-5], iris[,5], stepFactor = 0.5, plot = TRUE, ntreeTry = 380, improve = 0.01)
Highlight stepFactor = 0.5 Highlight ntreeTry = 380
Click the Run button. |
Type this command.
iris[,-5] is the set of predictor variables that we pass as input. iris[,5] is the Species column that we are predicting.
stepFactor controls how much mtry is changed at each iteration, and ntreeTry sets the number of trees used for each trial.
|
Drag boundary to see the console window clearly.
Highlight output in console And plot window. |
Drag boundary to see the console window clearly.
We see that our model is performing just as well for 2 and 4 features. So we will stick with 2 features. |
Cursor on the interface. | Now let us create the optimised model. |
Drag boundary to see the Source window clearly. | Drag boundary to see the Source window clearly. |
[RStudio]
model_new = randomForest(Species ~ ., data = iris, ntree = 380, mtry = 2)
print(model_new) |
Type and run these commands. |
Highlight output in console | We can see that OOB error has dropped to 4%. |
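The tuned model can now be used to classify new observations; a minimal sketch on one known row of each species:
predict(model_new, newdata = iris[c(1, 51, 101), -5])  # rows 1, 51, 101: one per species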
Cursor in the Source window. | Let us create a variable importance plot. |
[RStudio]
varImpPlot(model_new)
Drag boundaries of the windows. |
Type this command.
Drag boundaries to see the plot clearly. We see that Petal Length is the most important variable for the Random Forest model. |
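The underlying scores are also available as a table through the importance() function of the randomForest package; a minimal sketch:
importance(model_new)  # mean decrease in Gini for each predictor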
Only Narration. | With this we come to the end of this tutorial. Let us summarize. |
Show Slide
Summary |
In this tutorial we have learnt about:
* Random Forest and its benefits
* Bagging
* Applications of Random Forest
* Implementing a Random Forest model on the iris dataset
* Tuning a Random Forest model
|
Show Slide
Assignment |
Now we will suggest an assignment for this Spoken Tutorial.
Tune the model using the tuneRF() command. |
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
|
Show Slide
Spoken Tutorial Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Do you have any general/technical questions?
Please visit the forum given in the link. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.
|
Show Slide
Acknowledgment |
The Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Government of India. |
Show Slide
Thank You |
This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay.
Thank you for watching. |