Machine-Learning-using-R - old 2022/C4/Random-Forest-using-R/English
Latest revision as of 08:32, 9 October 2023
Title of the script: Random Forest using R
Author: Tanmay Srinath
Keywords: R, RStudio, machine learning, supervised, unsupervised, classification, random forest, bagging, decision tree, video tutorial.
Visual Cue | Narration |
Show slide
Opening Slide |
Welcome to this spoken tutorial on Random Forest using R. |
Show slide
Learning Objectives |
In this tutorial, we will learn about:

* Benefits of Random Forest
* Applications of Random Forest
* Random Forest on the iris data set
* Tuning a Random Forest model.
|
Show slide
System Specifications |
This tutorial is recorded using,

* Ubuntu Linux OS version 20.04
* R version 4.2.0
* RStudio version 2022.02.3
|
Show slide
Prerequisites |
To follow this tutorial, the learner should know:
|
Show slide
Random Forest |
Let us begin with Random Forest.

* It is a powerful and versatile supervised machine learning algorithm.
* It grows and combines multiple decision trees to create a “forest”.
* It can be used for both classification and regression problems.
|
Show slide
Benefits of Random Forest |
Now we will learn some benefits of using Random Forests.

Random Forests are created from subsets of data.

The final output is based on average or majority ranking. Hence the problem of overfitting is taken care of. |
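The benefit of majority ranking can be illustrated with a small base-R sketch (this is not part of the tutorial's script, and the 70% single-tree accuracy is an assumed figure): if each tree independently classifies correctly 70% of the time, a majority vote of many trees is correct far more often.

```r
# Majority-vote sketch (illustrative; assumes independent trees).
p <- 0.7   # assumed accuracy of one tree
B <- 101   # assumed number of trees

# Probability that a single tree is correct
single_tree <- p

# Probability that the majority of B trees is correct:
# more than B/2 successes in a Binomial(B, p)
majority_vote <- 1 - pbinom(floor(B / 2), size = B, prob = p)

cat("single tree        :", single_tree, "\n")
cat("majority of", B, "trees:", round(majority_vote, 4), "\n")
```

Real trees trained on the same data are correlated, so the gain is smaller in practice; that is exactly why bagging (discussed next) works to decorrelate them.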
Show slide
Bagging |
Random Forest uses a concept called bagging to improve its performance.
Let us learn more about it. |
Show slide
Bagging |
Bagging
|
Show slide
Bagging |
|
Show slide
Bagging in Random Forests |
* The bagging process is used to decorrelate the trees that make up a random forest.
* A random sample of predictors is chosen for each split in the decision tree.
* Thus, the average of the trees will be less variable and more reliable.
|
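The two ingredients above can be sketched in base R (a simplified illustration, not the tutorial's script; the randomForest package performs these steps internally and per split):

```r
# Bagging sketch: bootstrap rows + random predictor subsets (illustrative).
data(iris)
set.seed(1007)

B    <- 5   # number of trees shown (small, for display)
mtry <- 2   # predictors considered per split (default floor(sqrt(4)) for iris)

for (b in 1:B) {
  # Bootstrap sample: draw nrow(iris) rows with replacement
  boot_rows <- sample(nrow(iris), replace = TRUE)
  boot_data <- iris[boot_rows, ]

  # At each split, a tree considers only a random subset of predictors
  split_vars <- sample(names(iris)[1:4], mtry)

  cat("tree", b, ":", length(unique(boot_rows)), "unique rows;",
      "candidate predictors:", paste(split_vars, collapse = ", "), "\n")
}
```

Because every tree sees a different bootstrap sample and different candidate predictors, the trees disagree in different ways, and their average is less variable.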
Show Slide
Applications of Random Forest |
Now let us learn about a few applications of Random Forest.

* It is used in customer segmentation.

https://archive.ics.uci.edu/ml/datasets/Online+Retail+II

https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic) |
Show Slide
Random Forest |
Now let us implement Random Forest on the iris dataset. |
Show slide
Download Files |
For this tutorial, we will use a script file RandomForest.R.
Please download this file from the Code files link of this tutorial. Make a copy and then use it for practising. |
[Computer screen]
Highlight RandomForest.R and the folder RandomForest. |
I have downloaded and moved this file to the RandomForest folder.
I have also set the RandomForest folder as my Working Directory.
|
Let us switch to RStudio. | |
Click Script RandomForest.R
Point to RandomForest.R in RStudio. |
Open the script RandomForest.R in RStudio.
Script RandomForest.R opens in RStudio. |
[RStudio]
Highlight library(randomForest) |
We will be using the randomForest package for creating our model.
|
[RStudio]
library(randomForest) data(iris) |
Select and run these commands. |
Click in the Environment tab to load the iris dataset. | Click in the Environment tab to load the iris dataset. |
Cursor in the Source window. | Now let us create our Random Forest model. |
[RStudio]
set.seed(1007)
model=randomForest(formula = Species ~ ., data = iris, ntree=1000)
print(model)
|
Type the following commands. |
Highlight set.seed(1007)
Highlight data = iris Highlight ntree=1000 |
We set a seed for reproducible results.

We are predicting Species; the remaining attributes are independent variables.

The number of trees that we are growing for this model is set to 1000.
|
Click Save and Run buttons. | Save and run the commands.
The output is shown in the console window.
|
Drag boundary to see the console window clearly. | Drag boundary to see the console window clearly. |
Highlight output in console
|
Our model’s specifications are displayed here.
|
Highlight output in console
|
This gives the number of variables sampled randomly at each split.

The default value for classification is sqrt(p),

where p is the number of features. |
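For the iris data there are p = 4 predictor columns, so the default works out as follows (a quick check, not part of the tutorial's script):

```r
# Default mtry for classification in randomForest: floor(sqrt(p))
p <- 4                        # iris has 4 predictor columns
mtry_default <- floor(sqrt(p))
mtry_default                  # 2 variables tried at each split
```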
Highlight output in console
Highlight Confusion matrix |
This gives the OOB or out-of-bag error rate for our model.
It measures prediction error of random forests and decision trees. Our model has misclassified 8 samples, 3 in versicolor and 5 in virginica. |
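The 8 misclassified samples out of the 150 rows in iris are what produce the OOB estimate reported by print(model); the arithmetic can be checked in base R:

```r
# OOB error rate implied by the confusion matrix above
n_total  <- 150        # rows in the iris dataset
n_wrong  <- 3 + 5      # versicolor + virginica misclassifications
oob_rate <- 100 * n_wrong / n_total
round(oob_rate, 2)     # roughly 5.33 (percent)
```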
Drag boundary to see the Source window clearly. | Drag boundary to see the Source window clearly. |
Cursor in the Source window. | Let us plot our model. |
[RStudio]
plot(model)
|
Type and run this command to get a plot.
|
Highlight Output in Plot window. | This plot compares the number of trees against the error of the model.
This knowledge will be used to tune our Random Forest. |
Show Slide
Tuning a Random Forest |
Tuning a Random Forest

* Sometimes the default parameters of the model are not optimal.
* Thus we need to tune our Random Forest by changing a few parameters.
* In R, this is done using the tuneRF() function.
|
Drag boundaries to see the Source window clearly. | Now let us switch to RStudio.
|
[RStudio]
tuneRF(iris[,-5], iris[,5], stepFactor = 0.5, plot = TRUE, ntreeTry = 380, improve = 0.01)
Highlight stepFactor = 0.5 Highlight ntreeTry = 380
Click the Run button. |
Type this command.
This is the data that we pass as input. We are predicting the species, given here by iris[,5].

This indicates the amount that the OOB error needs to improve.

Run the command to see the output in the console and plot windows. |
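With stepFactor = 0.5, tuneRF starts from the default mtry and inflates or deflates it by that factor while the OOB error keeps improving by at least the improve threshold. For iris, the candidate values it visits can be sketched in base R (a simplified sketch assuming one step in each direction; tuneRF's exact stopping rule depends on the observed OOB errors):

```r
# Candidate mtry values explored by tuneRF on iris (sketch, one step each way)
mtry_start <- floor(sqrt(4))        # default mtry for 4 predictors: 2
step       <- 0.5
candidates <- sort(unique(c(mtry_start,
                            floor(mtry_start * step),      # step down: 1
                            ceiling(mtry_start / step))))  # step up:   4
candidates                          # 1 2 4
```

This matches the console output discussed below, where mtry values of 1, 2 and 4 are compared.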
Drag boundary to see the console window clearly.
Highlight output in console And plot window. |
Drag boundary to see the console window clearly.
We see that our model is performing just as well for 2 and 4 features. So we will stick with 2 features. |
Cursor on the interface. | Now let us create the optimised model. |
Drag boundary to see the Source window clearly. | Drag boundary to see the Source window clearly. |
[RStudio]
model_new=randomForest(Species~., data=iris, ntree = 380, mtry = 2)
print(model_new)
Type and run these commands. |
Highlight output in console | We can see that OOB error has dropped to 4%. |
Cursor in the Source window. | Let us create a variable importance plot. |
[RStudio]
varImpPlot(model_new)
Drag boundaries of the windows. |
Type this command.

Run the command to see the output in the plot window.

Drag boundaries to see the plot clearly. We see that Petal Length is the most important variable for the Random Forest model. |
Only Narration. | With this we come to the end of this tutorial. Let us summarize. |
Show Slide
Summary |
In this tutorial we have learnt about:

* Random Forest
* Bagging
|
Show Slide
Assignment |
Now we will suggest an assignment for this Spoken Tutorial.

Create a Random Forest for the PimaIndiansDiabetes dataset.

Tune the model using the tuneRF() command. |
Show slide
About the Spoken Tutorial Project |
The video at the following link summarises the Spoken Tutorial project.
Please download and watch it. |
Show slide
Spoken Tutorial Workshops |
We conduct workshops using Spoken Tutorials and give certificates.
For more details, please write to us.
|
Show Slide
Spoken Tutorial Forum to answer questions |
Please post your timed queries in this forum. |
Show Slide
Forum to answer questions |
Do you have any general/technical questions?
Please visit the forum given in the link. |
Show Slide
Textbook Companion |
The FOSSEE team coordinates the coding of solved examples of popular books and case study projects.
|
Show Slide
Acknowledgment |
Spoken Tutorial and FOSSEE projects are funded by the Ministry of Education, Government of India. |
Show Slide
Thank You |
This tutorial is contributed by Tanmay Srinath and Madhuri Ganapathi from IIT Bombay.
Thank you for watching. |