Difference between revisions of "Python-for-Machine-Learning/C3/Decision-Tree/English"

From Script | Spoken-Tutorial
(Created page with " <div style="margin-left:1.27cm;margin-right:0cm;"></div> {| border="1" |- || '''Visual Cue''' || '''Narration''' |- |- style="border:0.5pt solid #000000;padding-top:0cm;pad...")
 
Line 2: Line 2:
  
  
<div style="margin-left:1.27cm;margin-right:0cm;"></div>
+
 
 
{| border="1"
 
{| border="1"
 
|-
 
|-
Line 8: Line 8:
 
|| '''Narration'''
 
|| '''Narration'''
 
|-
 
|-
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
|| <div style="color:#000000;">Show slide:</div>
+
|| Show slide:
  
<div style="color:#000000;">'''Welcome'''</div>
+
'''Welcome'''
  
|| <span style="color:#000000;">Welcome to the Spoken Tutorial on</span><span style="color:#000000;">''' Decision Tree'''</span><span style="color:#000000;">.</span>
+
||Welcome to the Spoken Tutorial on''' Decision Tree'''.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
|| <div style="color:#000000;">Show Slide:</div>
+
|| Show Slide:
  
<div style="color:#000000;">'''Learning Objectives'''</div>
+
'''Learning Objectives'''
  
 
|| In this tutorial, we will learn about
 
|| In this tutorial, we will learn about
* <div style="margin-left:1.27cm;margin-right:0cm;">'''Decision Tree'''</div>
+
* '''Decision Tree'''
* <div style="margin-left:1.27cm;margin-right:0cm;">'''Decision Tree Structure and Nodes'''</div>
+
* '''Decision Tree Structure and Nodes'''
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
|| <div style="color:#000000;">Show Slide:</div>
+
|| Show Slide:
 +
 
 +
'''System Requirements'''
  
<div style="color:#000000;">'''System Requirements'''</div>
 
<div style="color:#000000;"></div>
 
 
|| To Record this tutorial, I am using
 
|| To Record this tutorial, I am using
* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="color:#000000;">'''Ubuntu Linux </span>OS<span style="color:#000000;"> 2</span>4<span style="color:#000000;">.04'''</span></div>
+
* '''Ubuntu Linux OS 24.04'''
* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="color:#000000;">'''Jupyter Notebook'''</span><span style="color:#000000;"> </span><span style="color:#000000;">'''IDE'''</span></div>
+
* '''Jupyter Notebook''' '''IDE'''
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
|| <div style="color:#000000;">Show Slide:</div>
+
|| Show Slide:
  
<div style="color:#000000;">'''Prerequisite'''</div>
+
'''Prerequisite'''
  
To follow this tutorial,<div style="margin-left:0.635cm;margin-right:0cm;"></div>
+
To follow this tutorial,
 
|| To follow this tutorial,
 
|| To follow this tutorial,
* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="color:#000000;">The learner must have basic </span>knowledge of<span style="color:#000000;"> </span>'''P<span style="color:#000000;">ython</span>.'''</div>
+
* The learner must have basic knowledge of '''Python.'''
* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="color:#000000;">For prerequisite </span><span style="color:#000000;">'''Python'''</span><span style="color:#000000;"> tutorials, please visit this website.</span></div>
+
* For prerequisite '''Python''' tutorials, please visit this website.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
|| <div style="color:#000000;">Show Slide:</div>
+
|| Show Slide:
  
<div style="color:#000000;">'''Code files'''</div>
+
'''Code files'''
 
||
 
||
* <div style="margin-left:1.27cm;margin-right:0cm;">The files used in this tutorial are provided in the '''Code files '''link.</div>
+
* The files used in this tutorial are provided in the '''Code files '''link.
* <div style="color:#252525;margin-left:1.27cm;margin-right:0cm;">Please download and extract the files.</div>
+
* Please download and extract the files.
* <div style="color:#252525;margin-left:1.27cm;margin-right:0cm;">Make a copy and then use them while practicing.</div>
+
* Make a copy and then use them while practicing.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show Slide:  
 
|| Show Slide:  
  
 
'''Decision Tree'''
 
'''Decision Tree'''
 
||
 
||
* <div style="margin-left:1.27cm;margin-right:0cm;">A '''decision tree''' is a tool used in machine learning that helps make decisions.</div>
+
* A '''decision tree''' is a tool used in machine learning that helps make decisions.
* <div style="margin-left:1.27cm;margin-right:0cm;">It uses a tree like structure to make predictions.</div>
+
* It uses a tree like structure to make predictions.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show Slide:
 
|| Show Slide:
  
Line 66: Line 66:
 
'''Show dt.png img'''
 
'''Show dt.png img'''
 
||
 
||
* <div style="margin-left:1.27cm;margin-right:0cm;">The '''root node''' starts the decision tree with a question or condition.</div>
+
* The '''root node''' starts the decision tree with a question or condition.
* <div style="margin-left:1.27cm;margin-right:0cm;">Based on the answer, we follow a '''branch''' to another '''node.'''</div>
+
* Based on the answer, we follow a '''branch''' to another '''node.'''
* <div style="margin-left:1.27cm;margin-right:0cm;">A '''branch''' connects nodes, where each '''node '''represents a condition with outcomes.</div>
+
* A '''branch''' connects nodes, where each '''node '''represents a condition with outcomes.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| '''Show dt.png img'''
 
|| '''Show dt.png img'''
  
<div style="color:#ff0000;"></div>
+
<div style="color:#ff0000;">
 
||
 
||
* <div style="margin-left:1.27cm;margin-right:0cm;">This new '''node''' poses another question or condition.</div>
+
* This new '''node''' poses another question or condition.
* <div style="margin-left:1.27cm;margin-right:0cm;">We repeat this process of asking questions and following '''branches.'''</div>
+
* We repeat this process of asking questions and following '''branches.'''
* <div style="margin-left:1.27cm;margin-right:0cm;">Finally, we reach a '''leaf node''', which gives us the final decision or outcome.</div>
+
* Finally, we reach a '''leaf node''', which gives us the final decision or outcome.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Hover over the files
 
|| Hover over the files
|| I have created required files for the demonstration of Decision Tree.<span style="color:#ff0000;"> </span>
+
|| I have created required files for the demonstration of Decision Tree.<span style="color:#ff0000;">  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Open the file drug200.csv and point to the fields as per narration.
 
|| Open the file drug200.csv and point to the fields as per narration.
 
|| To implement the '''Decision Tree model, '''we use the '''drug200 dot csv '''dataset.
 
|| To implement the '''Decision Tree model, '''we use the '''drug200 dot csv '''dataset.
Line 90: Line 90:
 
'''drug200 '''dataset contains '''Age, Sex, BP, Cholesterol, Na to K ratio''' and '''Drug.'''
 
'''drug200 '''dataset contains '''Age, Sex, BP, Cholesterol, Na to K ratio''' and '''Drug.'''
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Point to the '''DecisionTree.pynb'''
 
|| Point to the '''DecisionTree.pynb'''
 
|| '''DecisionTree dot ipynb''' is the ipython notebook file for this demonstration.
 
|| '''DecisionTree dot ipynb''' is the ipython notebook file for this demonstration.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Press '''Ctrl,Alt and T''' keys  
 
|| Press '''Ctrl,Alt and T''' keys  
  
Line 103: Line 103:
 
'''conda space activate''' '''space ml'''
 
'''conda space activate''' '''space ml'''
 
Press '''Enter.'''
 
Press '''Enter.'''
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Go to the '''Downloads '''folder
 
|| Go to the '''Downloads '''folder
  
Line 116: Line 116:
  
 
Type, '''jupyter space notebook '''and press Enter to open Jupyter Notebook.
 
Type, '''jupyter space notebook '''and press Enter to open Jupyter Notebook.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show '''Jupyter Notebook Home page''':
 
|| Show '''Jupyter Notebook Home page''':
  
Line 124: Line 124:
 
Click the''' DecisionTree dot ipynb '''file to open it.
 
Click the''' DecisionTree dot ipynb '''file to open it.
  
<div style="color:#000000;">Note that each cell will have the output displayed in this file.</div>
+
Note that each cell will have the output displayed in this file.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
|| <div style="color:#000000;">Highlight The lines:</div>
+
|| Highlight The lines:
<div style="color:#000000;">'''import numpy as np'''</div>
+
'''import numpy as np'''
<div style="color:#000000;">'''import pandas as pd'''</div>
+
'''import pandas as pd'''
<div style="color:#000000;">'''from sklearn.model_selection import train_test_split '''</div>
+
'''from sklearn.model_selection import train_test_split '''
<span style="color:#000000;">'''from sklearn.tree import DecisionTreeClassifier '''</span>
+
'''from sklearn.tree import DecisionTreeClassifier '''
 
Press '''Shift+Enter'''
 
Press '''Shift+Enter'''
  
Line 137: Line 137:
 
Please remember to Execute each cell by pressing '''Shift and Enter''' to get output.
 
Please remember to Execute each cell by pressing '''Shift and Enter''' to get output.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
 
'''df_drug=pd.read_csv("drug200.csv") '''
 
'''df_drug=pd.read_csv("drug200.csv") '''
Line 144: Line 144:
 
|| We start by loading the '''dataset''' from a CSV file and display the first few rows.
 
|| We start by loading the '''dataset''' from a CSV file and display the first few rows.
  
<div style="color:#000000;"></div>
+
 
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight The lines:
 
|| Highlight The lines:
 
'''print("Length of Dataset:", len(df_drug)) print("Dataset Shape:", df_drug.shape)'''
 
'''print("Length of Dataset:", len(df_drug)) print("Dataset Shape:", df_drug.shape)'''
Line 151: Line 151:
 
|| Then we print the '''number of rows''' and the '''shape''' of the dataset.
 
|| Then we print the '''number of rows''' and the '''shape''' of the dataset.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
 
'''le_sex = LabelEncoder() '''
 
'''le_sex = LabelEncoder() '''
Line 157: Line 157:
 
Press '''Shift+Enter'''
 
Press '''Shift+Enter'''
 
|| Next, we encode the categorical variables like '''Sex and BP''' into numerical values.
 
|| Next, we encode the categorical variables like '''Sex and BP''' into numerical values.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight The lines:
 
|| Highlight The lines:
  
Line 165: Line 165:
 
|| We then separate the '''features x''' and the target '''variable y'''.
 
|| We then separate the '''features x''' and the target '''variable y'''.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
 
'''x'''
 
'''x'''
 
|| Now we print the values of '''features.'''
 
|| Now we print the values of '''features.'''
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
 
'''y'''
 
'''y'''
 
|| Similarly, we print the values of the target.
 
|| Similarly, we print the values of the target.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 184: Line 184:
  
 
We also print the '''dataset shape''' after removing the duplicates.
 
We also print the '''dataset shape''' after removing the duplicates.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
 
'''numerical_columns = df.select_dtypes(include=['int64', 'float64']).columns'''
 
'''numerical_columns = df.select_dtypes(include=['int64', 'float64']).columns'''
 
'''scaler = StandardScaler()'''
 
'''scaler = StandardScaler()'''
 
|| Now we use '''StandardScaler''' to standardize the numerical columns.
 
|| Now we use '''StandardScaler''' to standardize the numerical columns.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 199: Line 199:
 
|| We then visualize the '''data distribution''' for numerical columns.
 
|| We then visualize the '''data distribution''' for numerical columns.
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 206: Line 206:
 
Press '''Shift+Enter'''
 
Press '''Shift+Enter'''
 
|| Now, we split the data into '''training''' and '''testing sets.'''
 
|| Now, we split the data into '''training''' and '''testing sets.'''
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 217: Line 217:
  
 
By navigating through the root and branches, we arrive at a decision of classes.
 
By navigating through the root and branches, we arrive at a decision of classes.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 225: Line 225:
  
 
Once trained, we predict the '''target values''' for the training set.
 
Once trained, we predict the '''target values''' for the training set.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines
 
|| Highlight the lines
  
 
'''print("Training Accuracy is", accuracy_score(y_train, y_pred_train) * 100 )'''
 
'''print("Training Accuracy is", accuracy_score(y_train, y_pred_train) * 100 )'''
 
|| Now we print the '''training accuracy.'''
 
|| Now we print the '''training accuracy.'''
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:'''y_pred_en = clf_entropy.predict(x_test) '''
 
|| Highlight the lines:'''y_pred_en = clf_entropy.predict(x_test) '''
  
 
'''y_pred_en '''
 
'''y_pred_en '''
 
|| Next we make predictions on the test data.
 
|| Next we make predictions on the test data.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight The lines:
 
|| Highlight The lines:
  
Line 244: Line 244:
 
Press '''Shift+Enter'''
 
Press '''Shift+Enter'''
 
|| Then we print the''' test accuracy'''
 
|| Then we print the''' test accuracy'''
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the output
 
|| Highlight the output
 
|| The accuracy is '''98.33''' which indicates the model performs very well.
 
|| The accuracy is '''98.33''' which indicates the model performs very well.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 256: Line 256:
  
 
It shows how well the model is correctly classifying the instances.
 
It shows how well the model is correctly classifying the instances.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 267: Line 267:
  
 
This report gives '''precision, recall, f1-score''', and '''support''' for each class.
 
This report gives '''precision, recall, f1-score''', and '''support''' for each class.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 280: Line 280:
  
 
Then, we predict class probabilities using the model.
 
Then, we predict class probabilities using the model.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 295: Line 295:
  
 
The '''ROC curve''' shows how well the model distinguishes between classes.
 
The '''ROC curve''' shows how well the model distinguishes between classes.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show the output:
 
|| Show the output:
  
Line 309: Line 309:
  
 
So, our classifier performs very well for all classes.
 
So, our classifier performs very well for all classes.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Highlight the lines:
 
|| Highlight the lines:
  
Line 320: Line 320:
  
 
Then we save and display the''' tree visualization''' as a PNG image file.
 
Then we save and display the''' tree visualization''' as a PNG image file.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show the output
 
|| Show the output
 
|| The tree helps classify which drug to give based on patient features.  
 
|| The tree helps classify which drug to give based on patient features.  
Line 329: Line 329:
  
 
The tree splits until leaves mostly contain one drug, showing zero entropy.
 
The tree splits until leaves mostly contain one drug, showing zero entropy.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Narration
 
|| Narration
 
|| Thus, we built a decision tree to predict '''drug''' types based on patient data.
 
|| Thus, we built a decision tree to predict '''drug''' types based on patient data.
  
 
The model showed '''high accuracy''', indicating '''strong predictive performance.'''
 
The model showed '''high accuracy''', indicating '''strong predictive performance.'''
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show slide:
 
|| Show slide:
  
Line 341: Line 341:
  
 
In this tutorial, we have learnt about
 
In this tutorial, we have learnt about
* <div style="margin-left:1.27cm;margin-right:0cm;">'''Decision Tree'''</div>
+
* '''Decision Tree'''
* <div style="margin-left:1.27cm;margin-right:0cm;">'''Decision Tree Structure and Nodes'''</div>
+
* '''Decision Tree Structure and Nodes'''
  
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show Slide:
 
|| Show Slide:
  
Line 352: Line 352:
 
|| As an assignment, please do the following:
 
|| As an assignment, please do the following:
  
<span style="background-color:#ffffff;">Replace the max underscore depth as shown here.</span>
+
Replace the max underscore depth as shown here.
  
<span style="background-color:#ffffff;">Observe the change in Testing accuracy</span>.
+
Observe the change in Testing accuracy.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show Slide:
 
|| Show Slide:
  
Line 362: Line 362:
 
'''Show x img'''
 
'''Show x img'''
 
|| After completing the assignment, the output should match as the expected result.
 
|| After completing the assignment, the output should match as the expected result.
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show Slide:
 
|| Show Slide:
  
 
'''FOSSEE Forum'''
 
'''FOSSEE Forum'''
 
|| For any general or technical questions on '''Python for Machine Learning''', visit the '''FOSSEE forum''' and post your question
 
|| For any general or technical questions on '''Python for Machine Learning''', visit the '''FOSSEE forum''' and post your question
|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+
|-  
 
|| Show Slide:  
 
|| Show Slide:  
  
 
'''Thank you'''
 
'''Thank you'''
  
|| This is '''Harini Theiveegan''',<span style="color:#000000;"> a FOSSEE Summer Fellowship 2025, IIT Bombay signing off</span>
+
|| This is '''Harini Theiveegan''', a FOSSEE Summer Fellowship 2025, IIT Bombay signing off
  
<div style="color:#000000;">Thanks for joining</div>
+
Thanks for joining
 
|-
 
|-
 
|}
 
|}

Revision as of 13:15, 28 June 2025



Visual Cue Narration
Show slide:

Welcome

Welcome to the Spoken Tutorial on Decision Tree.
Show Slide:

Learning Objectives

In this tutorial, we will learn about
  • Decision Tree
  • Decision Tree Structure and Nodes
Show Slide:

System Requirements

To record this tutorial, I am using
  • Ubuntu Linux OS 24.04
  • Jupyter Notebook IDE
Show Slide:

Prerequisite

To follow this tutorial,

To follow this tutorial,
  • The learner must have basic knowledge of Python.
  • For prerequisite Python tutorials, please visit this website.
Show Slide:

Code files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show Slide:

Decision Tree

  • A decision tree is a tool used in machine learning that helps make decisions.
  • It uses a tree-like structure to make predictions.
Show Slide:

Working of Decision Tree

Show dt.png img

  • The root node starts the decision tree with a question or condition.
  • Based on the answer, we follow a branch to another node.
  • A branch connects nodes, where each node represents a condition with outcomes.
Show dt.png img
  • This new node poses another question or condition.
  • We repeat this process of asking questions and following branches.
  • Finally, we reach a leaf node, which gives us the final decision or outcome.
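
The walk from the root node to a leaf node described above can be sketched in plain Python. This is only an illustration, not the tutorial's code: the tree, its thresholds, and the names toy_tree and predict are hypothetical.

```python
# A hand-built toy tree: internal nodes ask a question, leaves hold a decision.
# All feature names, thresholds, and outcomes here are made up for illustration.
toy_tree = {
    "feature": "Na_to_K", "threshold": 15.0,      # root node
    "yes": {"leaf": "DrugY"},                     # leaf node
    "no": {
        "feature": "Age", "threshold": 50,        # internal node
        "yes": {"leaf": "DrugB"},
        "no": {"leaf": "DrugA"},
    },
}


def predict(node, sample):
    """Follow branches from the root until a leaf node is reached."""
    while "leaf" not in node:
        # The condition at this node: is the feature value above the threshold?
        answer = sample[node["feature"]] > node["threshold"]
        node = node["yes"] if answer else node["no"]
    return node["leaf"]


print(predict(toy_tree, {"Na_to_K": 20.1, "Age": 30}))  # DrugY
print(predict(toy_tree, {"Na_to_K": 9.5, "Age": 61}))   # DrugB
```

Each pass through the loop answers one question and follows one branch, which mirrors the three steps listed above.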
Hover over the files

I have created the required files for the Decision Tree demonstration.

Open the file drug200.csv and point to the fields as per narration.

To implement the Decision Tree model, we use the drug200 dot csv dataset.

Here, we analyze patient data to predict the most suitable drug for each patient.

The drug200 dataset contains Age, Sex, BP, Cholesterol, Na to K ratio and Drug.

Point to the DecisionTree.ipynb

DecisionTree dot ipynb is the IPython notebook file for this demonstration.
Press Ctrl, Alt and T keys

Type conda activate ml

Press Enter

Let us open the Linux terminal by pressing Ctrl, Alt and T keys together.

Activate the machine learning environment by typing conda space activate space ml. Press Enter.

Go to the Downloads folder

Type cd Downloads

Press Enter

Type jupyter notebook

Press Enter

I have saved my code file in the Downloads folder.

Please navigate to the respective folder of your code file location.

Type, jupyter space notebook and press Enter to open Jupyter Notebook.

Show Jupyter Notebook Home page:

Click on DecisionTree.ipynb file

We can see the Jupyter Notebook Home page has opened in the web browser.

Click the DecisionTree dot ipynb file to open it.

Note that each cell will have the output displayed in this file.

Highlight the lines:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

Press Shift+Enter

These are the necessary libraries to be imported for the Decision Tree.

Please remember to execute each cell by pressing Shift and Enter to get the output.

Highlight the lines:

df_drug = pd.read_csv("drug200.csv")
df_drug.head()

Press Shift+Enter

We start by loading the dataset from a CSV file and display the first few rows.


Highlight the lines:

print("Length of Dataset:", len(df_drug))
print("Dataset Shape:", df_drug.shape)

Press Shift+Enter

Then we print the number of rows and the shape of the dataset.
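
The loading and shape checks above can be tried without the real file. The sketch below reads a small in-memory stand-in for drug200.csv. The three rows are made up, and the column name Na_to_K is an assumption based on the "Na to K ratio" field named in the narration.

```python
import io

import pandas as pd

# Stand-in for drug200.csv: made-up rows, assumed column names.
csv_text = """Age,Sex,BP,Cholesterol,Na_to_K,Drug
31,F,HIGH,NORMAL,18.2,DrugY
52,M,LOW,HIGH,11.4,DrugC
44,F,NORMAL,HIGH,8.9,DrugX
"""

# pd.read_csv accepts a file path or any file-like object.
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head())                          # first rows (at most five)
print("Length of Dataset:", len(df_demo))      # number of rows
print("Dataset Shape:", df_demo.shape)         # (rows, columns)
```

Here len returns only the row count, while shape returns both the row and column counts.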
Highlight the lines:

le_sex = LabelEncoder()
le_BP = LabelEncoder()

Press Shift+Enter

Next, we encode the categorical variables like Sex and BP into numerical values.
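
As a minimal sketch of what the encoders do, the example below fits LabelEncoder from sklearn.preprocessing on two short, made-up lists. The values F, M, HIGH, LOW and NORMAL are assumed category labels, not values taken from the dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up categorical values standing in for the Sex and BP columns.
sex = ["F", "M", "M", "F"]
bp = ["HIGH", "LOW", "NORMAL", "HIGH"]

le_sex = LabelEncoder()
le_BP = LabelEncoder()

# fit_transform learns the sorted set of categories and maps each to an integer.
sex_codes = le_sex.fit_transform(sex)
bp_codes = le_BP.fit_transform(bp)

print(le_sex.classes_.tolist(), sex_codes.tolist())  # ['F', 'M'] [0, 1, 1, 0]
print(le_BP.classes_.tolist(), bp_codes.tolist())    # ['HIGH', 'LOW', 'NORMAL'] [0, 1, 2, 0]
```

The integer codes follow the sorted order of the category names, so they carry no meaning beyond identifying the category.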
Highlight the lines:

x = df_drug.drop(columns=['Drug']).values
y = df_drug['Drug'].values

Press Shift+Enter

We then separate the features x and the target variable y.
Highlight the lines:

x

Now we print the values of features.
Highlight the lines:

y

Similarly, we print the values of the target.
Highlight the lines:

print("\nNumber of Duplicate Rows:", df_drug.duplicated().sum())
df = df_drug.drop_duplicates()
print("Dataset Shape After Removing Duplicates:", df.shape)

Next, we check for duplicate rows and remove them if found.

We also print the dataset shape after removing the duplicates.
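
The duplicate check can be illustrated on a tiny, made-up table in which the third row repeats the first. The name df_demo is hypothetical.

```python
import pandas as pd

# Made-up rows: the third row is an exact copy of the first.
df_demo = pd.DataFrame({
    "Age": [23, 47, 23],
    "BP": ["HIGH", "LOW", "HIGH"],
    "Drug": ["DrugY", "DrugC", "DrugY"],
})

# duplicated() marks rows that repeat an earlier row; sum() counts them.
n_duplicates = int(df_demo.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each repeated row.
df_unique = df_demo.drop_duplicates()

print("Number of Duplicate Rows:", n_duplicates)                      # 1
print("Dataset Shape After Removing Duplicates:", df_unique.shape)    # (2, 3)
```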

Highlight the lines:

numerical_columns = df.select_dtypes(include=['int64', 'float64']).columns
scaler = StandardScaler()

Now we use StandardScaler to standardize the numerical columns.
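
The cue shows the scaler being created but not how it is applied. One plausible way, sketched here as an assumption on made-up data, is to call fit_transform on the selected numerical columns. After scaling, each column has mean 0 and standard deviation 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up data: two numerical columns and one text column.
df_demo = pd.DataFrame({
    "Age": [20, 40, 60],
    "Na_to_K": [10.0, 15.0, 20.0],
    "Sex": ["F", "M", "F"],
})

# Select only integer and float columns, as in the cue above.
numerical_columns = df_demo.select_dtypes(include=["int64", "float64"]).columns

scaler = StandardScaler()
# Each column becomes (value - column mean) / column standard deviation.
scaled = scaler.fit_transform(df_demo[numerical_columns])

print(list(numerical_columns))          # ['Age', 'Na_to_K']
print(np.round(scaled, 3))
print(np.round(scaled.mean(axis=0), 6), np.round(scaled.std(axis=0), 6))
```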
Highlight the lines:

df.hist(figsize=(10, 8), bins=20, color='skyblue', edgecolor='black')

plt.suptitle("Distribution of Numerical Features")

plt.show()

We then visualize the data distribution for numerical columns.
Highlight the lines:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100)

Press Shift+Enter

Now, we split the data into training and testing sets.
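
To see what test_size=0.3 means in practice, the sketch below splits 200 made-up rows. The row count of 200 is an assumption suggested by the file name drug200. The data itself is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 200 made-up rows with 2 features each, and an arbitrary two-class target.
x = np.arange(400).reshape(200, 2)
y = np.arange(200) % 2

# 30% of the rows go to the test set; random_state makes the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=100
)

print(x_train.shape, x_test.shape)  # (140, 2) (60, 2)
```

Using the same random_state again returns the same split, which keeps the accuracy figures reproducible between runs.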
Highlight the lines:

clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=5, max_depth=4, min_samples_leaf=5)

Press Shift+Enter

After that we create a decision tree classifier using entropy as the criterion.

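Entropy measures the disorder, or impurity, of the class labels in a set of samples. At each node, the classifier picks the feature and threshold whose split reduces entropy the most, and this is how the root and branches of the decision tree are built.

By following the branches from the root, we arrive at a predicted class.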
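
The entropy criterion can be worked through by hand. The sketch below is plain Python and not scikit-learn's internal code. The helper names entropy and information_gain are hypothetical, and the labels are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
    return entropy(parent) - weighted


parent = ["DrugX", "DrugX", "DrugY", "DrugY"]

print(entropy(parent))                                   # 1.0 (evenly mixed)
print(entropy(["DrugY", "DrugY"]))                       # 0.0 (pure)
print(information_gain(parent, parent[:2], parent[2:]))  # 1.0 (perfect split)
```

A split that separates the two drugs completely removes all of the entropy, so it has the highest possible gain for this parent.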

Highlight the lines:

clf_entropy.fit(x_train, y_train)
y_pred_train = clf_entropy.predict(x_train)

We then fit the classifier to the training data.

Once trained, we predict the target values for the training set.

Highlight the lines

print("Training Accuracy is", accuracy_score(y_train, y_pred_train) * 100 )

Now we print the training accuracy.
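
The fit, predict and training accuracy steps can be run end to end on a small, made-up dataset. The classifier settings are the ones shown earlier in this script. The data is hypothetical: a single feature whose value decides the class. accuracy_score comes from sklearn.metrics.

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Made-up training data: one feature, class decided by whether the value is below 10.
x_train = [[v] for v in range(20)]
y_train = ["DrugX" if v < 10 else "DrugY" for v in range(20)]

clf_entropy = DecisionTreeClassifier(
    criterion="entropy", random_state=5, max_depth=4, min_samples_leaf=5
)

# Learn the splits from the training data, then predict on the same data.
clf_entropy.fit(x_train, y_train)
y_pred_train = clf_entropy.predict(x_train)

train_accuracy = accuracy_score(y_train, y_pred_train) * 100
print("Training Accuracy is", train_accuracy)  # 100.0 on this separable toy data
```

Training accuracy only shows how well the tree fits data it has already seen, which is why the script goes on to check the test set.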
Highlight the lines:

y_pred_en = clf_entropy.predict(x_test)

y_pred_en

Next we make predictions on the test data.
Highlight The lines:

accuracy = round(accuracy_score(y_test, y_pred_en) * 100, 3)

print("Accuracy is", accuracy)

Press Shift+Enter

Then we print the test accuracy.

Highlight the output

The accuracy is 98.33 percent, which indicates the model performs very well on the test data.
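
Accuracy is the share of predictions that match the true labels. As a worked example under stated assumptions, if the test set holds 60 rows (200 rows with test_size=0.3, where 200 is assumed from the file name), then 59 correct predictions would give the figure reported above. The labels below are made up, and accuracy_percent is a hypothetical helper.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of predictions equal to the true label, rounded to 3 places."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return round(correct / len(y_true) * 100, 3)


# Made-up labels: 60 test rows, exactly one of them predicted wrongly.
y_true = ["DrugY"] * 60
y_pred = ["DrugY"] * 59 + ["DrugX"]

print("Accuracy is", accuracy_percent(y_true, y_pred))  # 98.333
```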
Highlight the lines:

cm = confusion_matrix(y_test, y_pred_en)
plt.figure(figsize=(10, 7))

Press Shift+Enter

To analyze model performance further, we create and display a confusion matrix.

It shows how well the model is correctly classifying the instances.
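
A small example shows how to read the matrix. The labels below are made up. The labels argument is added here only to fix the row and column order. Rows are the true classes and columns are the predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: one DrugX sample is wrongly predicted as DrugY.
y_true = ["DrugX", "DrugY", "DrugX", "DrugY", "DrugY"]
y_pred = ["DrugX", "DrugY", "DrugY", "DrugY", "DrugY"]
labels = ["DrugX", "DrugY"]

# cm[i][j] counts samples whose true class is labels[i] and predicted class is labels[j].
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(cm.tolist())  # [[1, 1], [0, 3]]
```

Correct predictions lie on the diagonal, so off-diagonal counts show which classes are being confused with each other.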

Highlight the lines:

report = classification_report(y_test, y_pred_en,zero_division=0)

print("Classification Report:")

Press Shift+Enter

We also generate and print the classification report.

This report gives precision, recall, f1-score, and support for each class.
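
The same made-up labels can be used to check the report's numbers by hand. The output_dict=True argument is used here only so that individual values can be read back. The script itself prints the text form.

```python
from sklearn.metrics import classification_report

# Made-up labels: one DrugX sample is wrongly predicted as DrugY.
y_true = ["DrugX", "DrugY", "DrugX", "DrugY", "DrugY"]
y_pred = ["DrugX", "DrugY", "DrugY", "DrugY", "DrugY"]

report = classification_report(y_true, y_pred, zero_division=0, output_dict=True)

# DrugX: 1 of 1 predicted DrugX is right (precision), 1 of 2 true DrugX found (recall).
print(report["DrugX"]["precision"], report["DrugX"]["recall"])  # 1.0 0.5
# DrugY: 3 of 4 predicted DrugY are right, 3 of 3 true DrugY found.
print(report["DrugY"]["precision"], report["DrugY"]["recall"])  # 0.75 1.0
print(report["DrugY"]["support"])  # number of true DrugY samples
```

The zero_division=0 setting reports 0 instead of raising a warning when a class is never predicted.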

Highlight the lines:

classes = np.unique(y)
y_test_bin = label_binarize(y_test, classes=classes)
y_score = clf_entropy.predict_proba(x_test)

Now we get all unique target classes from the dataset.

Next, we binarize y underscore test for multi class ROC plotting.

An ROC curve is defined for two classes, so each class is converted into its own binary, one versus rest, column.

Then, we predict class probabilities using the model.
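
The binarizing step can be seen on a few made-up labels. label_binarize comes from sklearn.preprocessing. With three or more classes it returns one column per class.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Made-up test labels covering three classes.
y_test = np.array(["DrugA", "DrugC", "DrugB", "DrugA"])

# np.unique returns the sorted distinct labels.
classes = np.unique(y_test)

# Each row gets a 1 in the column of its class and 0 elsewhere.
y_test_bin = label_binarize(y_test, classes=classes)

print(classes.tolist())     # ['DrugA', 'DrugB', 'DrugC']
print(y_test_bin.tolist())  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Each column can then be compared with the matching column of predict_proba, which holds the predicted probability for that class.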

Highlight the lines:

fpr = dict()
tpr = dict()
roc_auc = dict()
n_classes = len(classes)
for i in range(n_classes):

plt.legend(loc="lower right")
plt.grid(True)
plt.show()

We plot the ROC curve for each class using the predicted probabilities.

The ROC curve shows how well the model distinguishes between classes.

Show the output:

The output displays the multi class ROC curve of our classifier.

DrugA, DrugB, DrugC have an AUC score of 1, indicating perfect classification.

DrugX and DrugY have AUC scores of 0.96 and 0.98, which are very high.

The closer the curve is to the top left, the better the model performs.

Here, all curves are close to the top left corner of the plot.

So, our classifier performs very well for all classes.
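
The body of the for loop is not shown in the cue above. A common way to compute one curve per class, sketched here as an assumption and not as the notebook's actual code, is to call roc_curve and auc from sklearn.metrics on each class column. The labels and scores below are made up, and the plotting calls are left out.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

classes = ["DrugA", "DrugB", "DrugC"]

# Made-up one-hot true labels: two samples per class.
y_test_bin = np.array([
    [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1],
])

# Made-up predicted probabilities, one column per class.
y_score = np.array([
    [0.80, 0.10, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.50, 0.20, 0.30],   # a DrugB sample that gets a low DrugB score
    [0.10, 0.25, 0.65],
    [0.25, 0.15, 0.60],
])

fpr, tpr, roc_auc = dict(), dict(), dict()
n_classes = len(classes)
for i in range(n_classes):
    # Treat class i as positive and every other class as negative.
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

for i, name in enumerate(classes):
    print(name, round(float(roc_auc[i]), 2))
```

In this toy data DrugA and DrugC are ranked perfectly, so their AUC is 1, while the poorly scored DrugB sample pulls that class down to 0.75.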

Highlight the lines:

feature_names = df_drug.columns[:-1]

plt.figure(figsize=(29, 10))

Then we extract the column names excluding Drug for tree visualization.

Next, we set the figure size and plot the decision tree.

Then we save and display the tree visualization as a PNG image file.
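
The plotting call itself is not shown in this script. As a lightweight alternative for inspecting a fitted tree, the sketch below swaps in export_text from sklearn.tree, which prints the splits as text instead of drawing them. The data and the feature names are made up.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data: the first feature decides the class, the second is noise.
x = [[v, v % 2] for v in range(20)]
y = ["DrugX" if v < 10 else "DrugY" for v in range(20)]
feature_names = ["Na_to_K", "Sex"]  # assumed names, for illustration only

clf = DecisionTreeClassifier(
    criterion="entropy", random_state=5, max_depth=4, min_samples_leaf=5
)
clf.fit(x, y)

# One line per node: the condition on a feature, or the class at a leaf.
tree_text = export_text(clf, feature_names=feature_names)
print(tree_text)
```

The text form is handy for a quick check in the terminal, while the plotted PNG described above is easier to read for larger trees.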

Show the output

The tree helps classify which drug to give based on patient features.

It first checks the Na to K value, then splits further using BP, Age, and Cholesterol.

Each colored box shows sample count and predicted drug type.

The tree keeps splitting until most leaves contain a single drug, and such pure leaves have zero entropy.

Narration

Thus, we built a decision tree to predict drug types based on patient data.

The model showed high accuracy, indicating strong predictive performance.

Show slide:

Summary

This brings us to the end of the tutorial. Let us summarize.

In this tutorial, we have learnt about

  • Decision Tree
  • Decision Tree Structure and Nodes
Show Slide:

Assignment

As an assignment, please do the following

As an assignment, please do the following:

Replace the max underscore depth as shown here.

Observe the change in Testing accuracy.
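
The replacement value appears only on the slide, so the sketch below uses hypothetical depths of 1, 2 and 4 to show the kind of comparison the assignment asks for. It runs on synthetic data from make_classification and not on the drug200 dataset, so the accuracies it prints will differ from those in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 rows, 5 features, 3 classes.
x, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=100
)

results = {}
for depth in (1, 2, 4):  # hypothetical values for max_depth
    clf = DecisionTreeClassifier(
        criterion="entropy", random_state=5, max_depth=depth, min_samples_leaf=5
    )
    clf.fit(x_train, y_train)
    accuracy = round(accuracy_score(y_test, clf.predict(x_test)) * 100, 3)
    # Store the depth the tree actually reached and its testing accuracy.
    results[depth] = (clf.get_depth(), accuracy)

for depth, (actual_depth, accuracy) in results.items():
    print("max_depth =", depth, "actual depth =", actual_depth, "Accuracy is", accuracy)
```

A very shallow tree can ask only a few questions, so it usually cannot separate all classes, which is the effect to look for in the testing accuracy.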

Show Slide:

Assignment Solution

Show x img

After completing the assignment, the output should match the expected result.
Show Slide:

FOSSEE Forum

For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.
Show Slide:

Thank you

This is Harini Theiveegan, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.

Thanks for joining.

Contributors and Content Editors

Madhurig, Nirmala Venkat
