Difference between revisions of "Python-for-Machine-Learning/C2/K-Nearest-Neighbor-Classification/English"

Revision as of 14:21, 28 June 2025


Visual Cue Narration
Show Slide:

Welcome and Title Slide

Welcome to the Spoken Tutorial on “K Nearest Neighbors Classification”.
Show Slide:

Learning Objectives

In this tutorial, we will learn about
  • The fundamentals of KNN Algorithm
  • Implementing KNN for Classification using Iris dataset
  • Evaluating the performance of the trained model
Show Slide:

System Requirements

To record this tutorial, I am using
  • Ubuntu Linux OS version 24.04
  • Jupyter Notebook IDE
Show Slide:

Prerequisites

To follow this tutorial,
  • The learner must have basic knowledge of Python.
  • For prerequisite Python tutorials, please visit this website.
Show Slide:

Code Files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show Slide:

KNN

  • KNN stands for K Nearest Neighbors.
  • The Nearest Neighbor algorithm makes predictions using the closest, most similar training data points.
  • K indicates the number of neighboring points to be considered for prediction.
Show Slide:

KNN Classification

  • The features of a new data point are compared with the training samples to find its K most similar neighbors.
  • The new data point gets the most frequent class among its neighbors.
  • KNN is versatile and can effectively handle multi-class classification problems.
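
To make the voting idea concrete, here is a minimal sketch of KNN classification in plain Python. It is not part of the tutorial's notebook: the function name knn_predict and the toy data points are made up for illustration, and Euclidean distance is assumed as the similarity measure.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most frequent class among the k training points closest to query."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k points with the smallest distances
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two features per point, two classes
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(points, labels, query=(1.1, 1.0), k=3)
print(prediction)  # prints "a": all three nearest neighbors belong to class "a"
```

The same function works for any number of classes, because the vote simply counts whichever labels appear among the neighbors.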
Show Slide:

Iris Dataset irisflowers.png

Hover over setosa, versicolor and virginica images

Hover over sepal length, sepal width, petal length and petal width

In this tutorial, we are using the Iris plants dataset.

It has 3 distinct flower classes.

The classes are distinguished by four features: sepal length, sepal width, petal length and petal width.

Show image

Iris Dataset iris.png

The three flower classes appear as clusters in different colors on the graph.

We will use the four features of the flower to classify the three distinct classes.

A black dot represents a flower in the dataset that lacks a defined class.

Our goal is to predict the black dot’s class based on its nearest neighbors.

The closest points to the black dot are called its neighbors.

These neighbors determine the black dot’s class.

Hover over the files

Point to KNN classification.ipynb

I have created the required files for the demonstration of KNN classification.

KNN classification dot ipynb is the Python notebook file for this demonstration.

Press Ctrl+Alt+T keys

Type conda activate ml

Press Enter

Let us open the Linux terminal by pressing Ctrl, Alt and T keys together.

Activate the machine learning environment by typing

conda space activate space ml

Press Enter.

Type cd Downloads

Press Enter

Type jupyter notebook

Press Enter

I have saved my code file in the Downloads folder.

Please navigate to the respective folder of your code file location.

Type, jupyter space notebook and press Enter to open Jupyter Notebook.

Jupyter Notebook Home Page will be opened.

Click on KNNClassification.ipynb


We can see the homepage of the Jupyter notebook has opened in the web browser.

Locate the KNN classification dot ipynb file.

Open the file by clicking on it.

Note that each cell will have the output displayed in this file.

Let us see the implementation of the KNN classification model.

Highlight

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

We import these libraries for KNN Classification.

Please press Shift plus Enter to execute the code in each cell.

Highlight

iris = load_iris()

First, we load the dataset into a variable named iris.

The iris dataset is a built-in dataset available in the Scikit-learn library.

Only narration

Highlight iris.feature_names

Let us explore the dataset.

Highlight

We list out the feature names of the Iris dataset as shown here.

Highlight iris.target_names

Next let us list out the target names.

Highlight

The target classes represent the different species of the iris flower.

Setosa is shown as 0, versicolor as 1 and virginica as 2.
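
As a small sketch of this exploration step, the cell below lists the feature names and pairs each target name with its integer label. It assumes only that Scikit-learn is installed; the loop is added for illustration and is not taken from the tutorial's notebook.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The four measurements recorded for every flower
print(iris.feature_names)

# The position of each name in target_names is its integer class label
for label, name in enumerate(iris.target_names):
    print(label, name)
```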

Highlight iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

First, we create a DataFrame named iris underscore df.

We load the Iris feature data into the DataFrame.

At this point it holds only the features; the target class is added in a later step.

Highlight iris_df.head()

Highlight output

The head method displays the first few rows.

The default is 5 rows, but the value can be changed by specifying the argument.

Highlight iris_df['target'] = iris.target

iris_df.head()

We add a new target column with class labels to the dataframe.

Highlight iris_df.shape

The shape attribute gives the dimensions of the DataFrame as rows and columns.

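
The DataFrame steps above can be sketched in one cell as follows. Note that in pandas, shape is an attribute and is read without parentheses. The argument 3 passed to head is an arbitrary choice for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Build the DataFrame from the feature matrix, then add the class labels
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["target"] = iris.target

print(iris_df.head(3))  # pass a number to change how many rows are shown
print(iris_df.shape)    # (rows, columns); attribute access, no parentheses
```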
Only narration

Highlight

sns.pairplot(iris_df, hue='target', diag_kind='kde', palette='colorblind')

plt.show()

Next, we plot a pairplot to visualize the iris dataset.

It visualizes relationships between different features.

It creates scatterplots for each pair of features, colored by class labels.

plt dot show is used to display the generated plots.

Show output plots


We see the pairplots visualizing feature relationships.

Scatter plots compare two features and help to identify patterns.

Diagonal KDE plots show the distribution of each feature for different classes.

Different colors represent different target classes.

Overlapping clusters show which species are harder to tell apart.

Only Narration

Highlight X = iris_df.drop('target', axis=1)

Now, we split the dataset into X and y to prepare the data for training.

First, we drop the column named target.

Then copy the remaining features into the variable X.

Highlight y = iris_df['target']

Next, we assign the target column to y.

Highlight X

We see that X contains all the features but not the target column.

Highlight y

Highlight the output

We see that y contains the target classes.

It is the species of the iris flower.

Only narration.

Highlight train_test_split(X, y, test_size=0.4, random_state=42)

Next, we split the data into training and testing sets.

We use the train underscore test underscore split function.

The split ratio is adjustable through the test underscore size parameter.

We set the test underscore size as 0.4.

Here, we use 40 percent of the data for testing and 60 percent for training.

Setting random state equal to 42 ensures the split is reproducible.

It guarantees we get the same result across multiple executions.
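
Here is a minimal sketch of the split described above, rebuilt from the calls shown in the script. The variable X_train_again is hypothetical and is added only to show that repeating the split with the same random state selects the same rows.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["target"] = iris.target

# Features go into X, class labels into y
X = iris_df.drop("target", axis=1)
y = iris_df["target"]

# 40 percent of the 150 rows are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
print(X_train.shape, X_test.shape)  # 90 training rows and 60 test rows

# Repeating the call with the same random_state picks the same rows
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.4, random_state=42)
print(X_train.index.equals(X_train_again.index))
```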

Highlight X_train, X_test, y_train, y_test

We assign the split data into four variables.

X underscore train contains the features of the training data.

It is used for model training.

y underscore train contains the target values for the training data.

X underscore test contains features of the test data.

It is used for making predictions.

y underscore test contains the actual class labels for the test data.

It is used for evaluating the model performance.

Highlight knn = KNeighborsClassifier(n_neighbors=7)

Now, we create the KNN classifier using KNeighborsClassifier with 7 neighbors.

Highlight knn.fit(X_train, y_train)

We train the KNN classifier using the fit method on the training data.

For KNN, the fit method stores the training data so that the nearest neighbors can be looked up at prediction time.
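
The sketch below creates and fits the classifier, then uses the kneighbors method to look at the 7 training points nearest to one test flower. The kneighbors call is added for illustration and is not part of the tutorial's notebook. Loading the data with as_frame=True is a shortcut assumed here; it gives the same features and labels as the DataFrame built earlier.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Creating the object only records the settings; fit supplies the training data
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Positions (within the training set) of the 7 neighbors of the first test row
distances, indices = knn.kneighbors(X_test.iloc[[0]])
print(indices.shape)

# Class labels of those neighbors; the most frequent one becomes the prediction
print(y_train.iloc[indices[0]].tolist())
```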

Highlight y_train_pred = knn.predict(X_train)

We predict the labels for the training data.

Highlight training_accuracy = accuracy_score(y_train, y_train_pred)

print(f"Training Accuracy: {training_accuracy:.3f}")

We calculate and print the accuracy of the model.

Accuracy is the ratio of correct predictions to the total number of instances.

It helps to measure how well the model is performing.
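
As a worked example of the accuracy ratio, here is a short sketch in plain Python. The function name accuracy and the two label lists are made up for illustration; in the notebook this calculation is done by accuracy underscore score.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 6 of the 8 predictions match
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

print(f"Accuracy: {accuracy(y_true, y_pred):.3f}")  # prints Accuracy: 0.750
```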

Highlight

Training Accuracy: 0.956

The accuracy is 0.956, which is quite good.
Highlight

print("\nClassification Report:")

print(classification_report(y_train, y_train_pred))


Next, we print the classification report.

The classification report helps to evaluate how well the model is performing.

Precision tells how many of the positive predictions made by the model were correct.

Recall tells how many of the actual positive cases the model identified correctly.

F1 Score is the harmonic mean of precision and recall.

Support is the count of true instances of each class in the dataset.
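
These four quantities can be computed by hand for a single class, as in the sketch below. The function name precision_recall_f1 and the label lists are hypothetical; the classification report performs the same calculation for every class.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, F1 score and support for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)  # true positives
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # false positives
    fn = sum(1 for t, p in pairs if p != positive and t == positive)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = tp + fn  # number of true instances of the class
    return precision, recall, f1, support


# Hypothetical labels for three classes
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

precision, recall, f1, support = precision_recall_f1(y_true, y_pred, positive=2)
print(f"{precision:.3f} {recall:.3f} {f1:.3f} {support}")  # prints 0.667 0.667 0.667 3
```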

Show output table

Box for the data

From the report, we conclude that precision and recall are high for all classes.

F1-Score shows good overall performance across all classes.

The accuracy is 96 percent.

This means the model made correct predictions for 96 percent of the instances.

The macro and weighted averages reflect consistent performance across the classes.

Only narration

Next, we evaluate the model on the testing data.

First, we predict the class label for a single test sample.

Highlight sample_test_data =

We extract the 10th row of the test dataset and reshape it for prediction.

We use the predict method to predict the class label using the trained model.

We store the actual class label in the actual underscore class variable.

Then, we print the predicted and actual class labels of the data sample.
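
The script does not show the full cell, so the following is only a sketch of one way to predict a single test sample. The names sample_test_data and actual_class come from the script; predicted_class is a hypothetical name, and selecting the row with double brackets (which keeps it two-dimensional) is an assumption that stands in for the reshape step.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# The 10th row has position 9; double brackets keep it as a one-row table
sample_test_data = X_test.iloc[[9]]
predicted_class = knn.predict(sample_test_data)[0]
actual_class = y_test.iloc[9]

print(f"Predicted class: {predicted_class}")
print(f"Actual class: {actual_class}")
```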

Highlight output

The predicted class for unseen data is 2, which is virginica.

The actual class is also 2, indicating the prediction is correct.

Highlight accuracy = accuracy_score(y_test, y_pred)

print(f"Testing Accuracy: {accuracy:.3f}")

Now, we calculate and print the accuracy.

Highlight Testing Accuracy: 0.983

The accuracy is approximately 0.983.

We can conclude that the model is performing well on unseen data.
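
The script uses y underscore pred without showing the cell that creates it. The sketch below assumes it holds the predictions for the whole test set, obtained with the predict method.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Assumed step: predict a label for every row of the test set
y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Testing Accuracy: {accuracy:.3f}")
```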

Only Narration

Highlight y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

Next, we plot the precision-recall curve to visualize the performance for each class.

It calculates precision and recall.

It computes average precision and summarizes the precision recall curve into a single score.
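
Only part of this cell is visible in the script, so the following is a sketch of how the per-class curves could be computed. It assumes the class scores come from the predict underscore proba method; the names y_scores and average_precisions are hypothetical. Plotting is left out here; calling plt.plot(recall, precision) inside the loop would draw each curve.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# One column per class: 1 where the flower belongs to that class, else 0
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

# Assumed source of scores: the share of neighbors voting for each class
y_scores = knn.predict_proba(X_test)

average_precisions = {}
for i in range(3):
    # Treat class i as "positive" and the other two classes as "negative"
    precision, recall, _ = precision_recall_curve(y_test_bin[:, i], y_scores[:, i])
    average_precisions[i] = average_precision_score(y_test_bin[:, i], y_scores[:, i])
    print(f"Class {i}: AP = {average_precisions[i]:.3f}")
```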


Highlight plt.figure(figsize=(10, 6))

The plt dot plot function plots the precision recall curve.

Highlight plt.xlabel("Recall")

plt.ylabel("Precision")

plt.title("Precision-Recall

plt.show()

plt dot show displays the final precision recall curve.

Show output plot

The Precision-Recall curve shows the trade-off between precision and recall.

The KNN classifier achieved perfect precision-recall for all classes.

High AP indicates the model performs well in distinguishing all three classes.

Highlight print("\nClassification Report:")

print(classification_report(y_test, y_pred))

Finally, we evaluate the performance using the classification report.

The report offers a detailed assessment of the model’s performance.

Show Slide:

Summary

This brings us to the end of the tutorial. Let us summarize.
Show Slide:

Assignment

As an assignment, please do the following:
  • Use K as 7 and test size as 0.2.
  • Evaluate the model performance using the classification report.
Show Slide:

Assignment Solution

K=7, test_size = 0.2

Here is the classification report for K equals 7 with a test size of 0.2.

We will get an accuracy of 96 percent.

Show Slide:

FOSSEE Forum

For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.
Show Slide:

Thank You

This is Anvita Thadavoose Manjummel, a FOSSEE Summer Fellow 2025, IIT Bombay.

Thanks for joining.

Contributors and Content Editors

Madhurig, Nirmala Venkat