Difference between revisions of "Python-for-Machine-Learning/C3/Random-Forest/English"
Revision as of 15:38, 16 July 2025
| Visual Cue | Narration |
| Show slide:
Welcome |
Welcome to the Spoken Tutorial on Random Forest. |
| Show Slide:
Learning Objectives |
In this tutorial, we will learn about
* Ensemble Learning and
* Random Forest |
| Show Slide: | To Record this tutorial, I am using
* Ubuntu Linux operating system 24.04
* Jupyter Notebook IDE |
| Show Slide:
Prerequisite |
To follow this tutorial,
* The learner must have basic knowledge of Python.
* For pre-requisite Python tutorials, please visit this website. |
| Show Slide:
Code files |
* The files used in this tutorial are provided in the Code files link.
* Please download and extract the files.
* Make a copy and then use them while practicing. |
| Show slide:
Ensemble Learning |
* Ensemble learning combines multiple models to improve performance.
* It reduces errors by averaging predictions from different models.
* One popular ensemble method is Random Forest, which uses decision trees. |
| Show slide:
Random Forest |
* Random Forest builds multiple trees and takes the majority vote (or, for regression, the average of the trees' predictions).
* This improves accuracy and reduces overfitting compared to a single tree.
* It is widely used for classification and regression tasks. |
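The averaging idea above can be illustrated with a minimal sketch comparing a single decision tree to a Random Forest. This is not the tutorial's notebook code; it uses a small synthetic dataset purely to show that averaging many trees typically generalizes better than one fully grown tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# synthetic noisy nonlinear data (illustrative only)
rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = np.sin(4 * X[:, 0]) + 0.3 * rng.normal(size=300)

# cross-validated R^2: a single tree overfits the noise,
# while the forest averages away much of that variance
tree_score = cross_val_score(DecisionTreeRegressor(random_state=0), X, y, cv=5).mean()
forest_score = cross_val_score(RandomForestRegressor(n_estimators=100, random_state=0), X, y, cv=5).mean()
print(f"single tree R2: {tree_score:.3f}, random forest R2: {forest_score:.3f}")
```

On data like this, the forest's cross-validated score should come out clearly higher than the single tree's.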
| Point to the RandomForest.ipynb |
RandomForest.ipynb is the Python notebook file for the demonstration of Random Forest. |
| Press Ctrl, Alt and T keys

Type conda activate ml and press Enter |
Let us open the Linux terminal. Press Ctrl, Alt and T keys together.

Activate the machine learning environment by typing conda space activate space ml. Then press Enter. |
| Go to the Downloads folder
Type cd Downloads and press Enter

Type jupyter notebook and press Enter |
I have saved my code file in the Downloads folder.
Please navigate to the respective folder of your code file location. Then type, jupyter space notebook and press Enter to open Jupyter Notebook. |
| Show Jupyter Notebook Home page:
Click on RandomForest.ipynb file |
We can see the Jupyter Notebook Home page has opened in the web browser.
Click the RandomForest dot ipynb file to open it. Note that each cell will have the output displayed in this file. |
| Highlight the lines:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error |
We start by importing the required libraries for the Random Forest model.
Please remember to Execute each cell by pressing Shift and Enter to get output. |
| Highlight the lines:
california = fetch_california_housing()
housing = pd.DataFrame(california.data, columns=california.feature_names)
housing['MedHouseVal'] = california.target |
We use the California Housing dataset from the sklearn library.

It has housing data from California districts. We analyze the dataset and predict MedHouseVal. Now, we load the California Housing dataset into a Pandas DataFrame. We add the target column, MedHouseVal, which represents the median house value. |
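The loading step described above can be sketched as follows. Note that fetch_california_housing downloads the dataset from the internet on first use and caches it locally; the exact notebook may differ slightly.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing

# downloads the data on first use and caches it locally
california = fetch_california_housing()
housing = pd.DataFrame(california.data, columns=california.feature_names)
housing["MedHouseVal"] = california.target  # target: median house value, in units of $100,000
print(housing.shape)  # 20640 rows, 8 features + 1 target column
```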
| Highlight the lines:
housing.head() |
To check the dataset, we display the first few rows using the head function. |
| Highlight the lines:
housing.shape |
Next, we use the shape function to check the number of rows and columns. |
| Highlight the lines:
housing.info() |
Now, we print dataset details to understand its structure and statistics. |
| Highlight the lines:
housing_sorted = housing.sort_values(by="HouseAge") |
Let's visualize house age vs median value using a line plot. |
| Highlight the output: | The plot shows newer homes have higher values, then prices stabilize.
The shaded region represents price variability in different age groups. |
| Highlight the lines:
skewed_features = ['AveRooms', 'AveBedrms', 'Population', 'AveOccup'] |
Before training, we check for skewed features that may affect predictions.
Next, we apply log transformation to reduce skewness in these features. |
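The log transformation above can be sketched like this. The feature values here are hypothetical, and np.log1p (log(1 + x)) is one common choice since it handles zeros gracefully; the notebook's exact transform may differ.

```python
import numpy as np
import pandas as pd

# hypothetical values for one skewed feature (AveOccup), with an extreme outlier
df = pd.DataFrame({"AveOccup": [1.5, 2.0, 3.2, 250.0]})
df["AveOccup"] = np.log1p(df["AveOccup"])  # log(1 + x) compresses the large value
print(df["AveOccup"].round(2).tolist())
```

After the transform, the outlier is pulled much closer to the rest of the values, reducing skewness.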
| Highlight the lines:
scaler = StandardScaler() |
Now, we scale the feature values using StandardScaler.
This ensures all features have a similar range for better modeling. |
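The scaling step can be sketched as follows, using a tiny made-up matrix of two features on very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# two features on very different scales
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has zero mean and unit variance
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```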
| Highlight the lines:
x = housing.drop(columns=["MedHouseVal"])
x.head() |
Then we separate the features by dropping the target column.
After that, we display the first few rows to check the data. |
| Highlight the lines:
y = housing.MedHouseVal
y.head() |
Next, we extract the target variable and store it in y. |
| Highlight the lines:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42) |
Now, we split the dataset into training and testing sets. |
| Highlight the lines:
rf = |
Next, we create a Random Forest Regressor with 100 trees.
Then, we train the model using the training data. |
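The split, model creation, and training steps can be sketched together as follows. Synthetic data stands in for the housing features here, and the constructor arguments (n_estimators=100, random_state=42) are assumptions consistent with the narration, not necessarily the notebook's exact call.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# synthetic stand-in for the housing features and target
rng = np.random.RandomState(42)
X = rng.rand(500, 8)
y = X @ rng.rand(8) + rng.normal(scale=0.1, size=500)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
rf = RandomForestRegressor(n_estimators=100, random_state=42)  # 100 trees
rf.fit(x_train, y_train)
print(len(rf.estimators_))  # one fitted DecisionTreeRegressor per tree
```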
| Highlight the lines:
y_train_pred = rf.predict(x_train) |
After training, we predict house values for the training set. |
| Highlight the lines:
training_mse = |
Now, we compute the Mean Squared Error for training data.
We also compute the adjusted R squared score for better evaluation. |
| Highlight the output: | The Training MSE is 0.037, showing a low prediction error.

The Training Adjusted R squared Score is 0.972. A high Adjusted R squared score suggests the model explains most of the variance. |
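scikit-learn provides r2_score but has no built-in adjusted R squared function, so the notebook presumably computes it from the standard formula. A hedged sketch of such a helper:

```python
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where n is the sample count and p the number of features."""
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

# perfect predictions give an adjusted R^2 of exactly 1.0
print(adjusted_r2([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 5.0], n_features=1))  # prints 1.0
```

Adjusted R squared penalizes extra features, so it is always at most the plain R squared.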
| Highlight the lines:
y_pred = rf.predict(x_test) |
Now, we predict house prices for the test dataset. |
| Highlight the lines:
print("Random Forest - Actual vs Predicted:") |
Next, we compare actual vs predicted values for the test set. |
| Highlight the lines:
test_mse = mean_squared_error(y_test,y_pred) |
After that, we calculate the MSE for the test predictions.
Then, we compute the Adjusted R squared score for the test set. |
| Highlight the output | For the test set, the MSE is 0.257, indicating reasonable accuracy.
The Test Adjusted R squared Score is 0.804. A higher Adjusted R Squared score indicates a good fit. |
| Highlight the lines:
residuals = y_test - y_pred
plt.show() |
To further analyze performance, we examine the residuals. |
| Highlight the output: | Residuals are the difference between actual and predicted values.
The red dashed line marks zero residuals for reference. Most residuals are near zero, meaning the prediction error is low. |
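The residual plot described above can be sketched as follows. The actual and predicted values here are hypothetical placeholders, not the tutorial's outputs; the Agg backend and the output filename are also assumptions for non-interactive use.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

# hypothetical actual and predicted test values
y_test = np.array([2.0, 1.5, 3.0, 2.5])
y_pred = np.array([1.9, 1.6, 2.8, 2.6])
residuals = y_test - y_pred  # difference between actual and predicted

plt.scatter(y_pred, residuals)
plt.axhline(0, color="red", linestyle="--")  # zero-residual reference line
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.savefig("residuals.png")
```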
| Highlight the lines:
feature_importances = rf.feature_importances_
plt.show() |
Besides accuracy, we analyze feature importance to see key prediction factors.
Feature importance shows how much each feature impacts predictions. Higher values mean a feature is more important for the model. |
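The feature-importance idea can be demonstrated on a small synthetic dataset where we know which feature matters most (this stands in for the housing data, where MedInc dominates):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.rand(300, 3)
# the target depends strongly on column 0, weakly on column 1, not at all on column 2
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=300)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = rf.feature_importances_  # non-negative values that sum to 1
print(importances)  # column 0 should rank highest
```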
| Show the output: | The plot shows MedInc has the highest impact on house price predictions.
Other features contribute less but still affect model performance. |
| Narration | Thus, we successfully built a Random Forest model for house price prediction.
The model showed high accuracy, indicating strong performance. |
| Show slide:
Summary |
This brings us to the end of the tutorial. Let us summarize.
In this tutorial, we have learnt about
* Ensemble Learning and
* Random Forest |
| Show Slide:
Assignment |
As an assignment, please do the following:
* Replace the test_size parameter as shown here.
* Observe the change in MSE and Adjusted R squared score. |
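The assignment above can be sketched like this: loop over a few test_size values and compare the resulting test MSE. Synthetic data and the specific test_size values (0.2, 0.3, 0.4) are assumptions for illustration; the assignment's exact value may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# synthetic stand-in for the housing data
rng = np.random.RandomState(42)
X = rng.rand(400, 4)
y = X @ np.array([2.0, 1.0, 0.5, 0.1]) + rng.normal(scale=0.1, size=400)

results = {}
for test_size in (0.2, 0.3, 0.4):
    x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=42)
    rf = RandomForestRegressor(n_estimators=50, random_state=42).fit(x_tr, y_tr)
    results[test_size] = mean_squared_error(y_te, rf.predict(x_te))
    print(f"test_size={test_size}: MSE={results[test_size]:.4f}")
```

Larger test sizes leave less data for training, so the MSE typically shifts as test_size grows.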
| Show Slide:
Assignment Solution Show x img |
After completing the assignment, the output should match the expected result. |
| Show Slide:
FOSSEE Forum |
For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question. |
| Show Slide:
Thank you |
This is Harini Theiveegan, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.

Thanks for joining. |