Difference between revisions of "Python-for-Machine-Learning/C2/K-Nearest-Neighbor-Classification/English"
Revision as of 14:21, 28 June 2025
| Visual Cue | Narration |
| Show Slide:
Welcome and Title Slide |
Welcome to the Spoken Tutorial on “K Nearest Neighbors Classification.” |
| Show Slide:
Learning Objectives |
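In this tutorial, we will learn about
* The fundamentals of the KNN algorithm
* Implementing KNN for classification using the Iris dataset
* Evaluating the performance of the trained model
|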
| Show Slide:
System Requirements |
To record this tutorial, I am using
* Ubuntu Linux OS version 24.04
* Jupyter Notebook IDE
|
| Show Slide:
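Prerequisites |
To follow this tutorial,
* The learner must have basic knowledge of Python.
* For prerequisite Python tutorials, please visit this website.
|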
| Show Slide:
Code Files |
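* The files used in this tutorial are provided in the Code files link.
* Please download and extract the files.
* Make a copy and then use them while practicing.
|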
| Show Slide:
KNN |
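* KNN stands for K Nearest Neighbors.
* The Nearest Neighbor algorithm predicts using the closest, most similar training data points.
* K indicates the number of neighboring points to be considered for prediction.
|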
| Show Slide:
KNN Classification |
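* Features of the K nearest samples are compared to determine similarity.
* The new data point gets the most frequent class among its neighbors.
* KNN is versatile and can effectively handle multi-class classification problems.
|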
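The idea on this slide can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's code: the helper name `knn_predict`, the toy 2D points and the labels "A" and "B" are all made up for the example, and Euclidean distance is assumed as the similarity measure.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most frequent label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up data: two well-separated groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(points, labels, query=(1.1, 1.0), k=3)
print(prediction)  # prints A
```

The query point sits inside the first group, so all three of its nearest neighbors vote for "A".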
| Show Slide:
Iris Dataset
irisflowers.png
Hover over setosa, versicolor and virginica images
Hover over sepal length, sepal width, petal length and petal width |
In this tutorial, we are using the Iris plants dataset.
It has 3 distinct flower classes. The classes are distinguished using four features: sepal length, sepal width, petal length and petal width. |
| Show image
Iris Dataset iris.png |
The three flower classes appear as clusters in different colors on the graph.
We will use the four features of the flower to classify the three distinct classes. A black dot represents a flower in the dataset that lacks a defined class. Our goal is to predict the black dot’s class based on its nearest neighbors. The closest points to the black dot are called its neighbors. These neighbors determine the black dot’s class. |
| Hover over the files
Point to KNNClassification.ipynb |
I have created the required files for the demonstration of KNN classification.
KNN classification dot ipynb is the Python notebook file for this demonstration. |
| Press Ctrl+Alt+T keys
Type conda activate ml Press Enter |
Let us open the Linux terminal by pressing Ctrl, Alt and T keys together.
Activate the machine learning environment by typing conda space activate space ml. Press Enter. |
| Type cd Downloads
Press Enter Type jupyter notebook Press Enter |
I have saved my code file in the Downloads folder.
Please navigate to the folder where your code file is saved. Type jupyter space notebook and press Enter to open Jupyter Notebook. |
| Jupyter Notebook Home Page will be opened.
Click on KNNClassification.ipynb
|
We can see the homepage of the Jupyter notebook has opened in the web browser.
Locate the KNN classification dot ipynb file. Open the file by clicking on it. Note that each cell will have the output displayed in this file. Let us see the implementation of the KNN classification model. |
| Highlight
import pandas as pd import matplotlib.pyplot as plt import seaborn as sns |
We import these libraries for KNN Classification.
Please press Shift plus Enter to execute the code in each cell. |
| Highlight
iris = load_iris() |
First, we load the dataset into a variable named iris.
The iris dataset is a built-in dataset available in the Scikit-learn library. |
| Only narration
Highlight iris.feature_names |
Let us explore the dataset. |
| Highlight | We list out the feature names of the Iris dataset as shown here. |
| Highlight iris.target_names | Next let us list out the target names. |
| Highlight | The target classes represent the different species of the iris flower.
Setosa is shown as 0, versicolor as 1 and virginica as 2. |
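As a quick check of the names described above, the dataset can be loaded and inspected as follows. The `class_codes` dictionary is a hypothetical helper added only to show the code-to-species mapping.

```python
from sklearn.datasets import load_iris

# Load the built-in Iris dataset
iris = load_iris()

# The four measured features
print(iris.feature_names)

# The three species names
print(iris.target_names)

# Hypothetical helper: map each numeric class code to its species name
class_codes = {code: str(name) for code, name in enumerate(iris.target_names)}
print(class_codes)  # prints {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
```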
| Highlight iris_df = pd.DataFrame(iris.data,columns=iris.feature_names) | First, we create a DataFrame named iris underscore df.
We load the Iris dataset into the dataframe. At this point it holds the features; the target class is added in a later step. |
| Highlight iris_df.head()
Highlight output |
Using the head method the first few rows are displayed.
The default is 5 rows, but the value can be changed by specifying the argument. |
| Highlight iris_df['target'] = iris.target
iris_df.head() |
We add a new target column with class labels to the dataframe. |
| Highlight iris_df.shape | The shape attribute gives the shape of the dataframe as rows and columns. |
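The dataframe steps above can be put together as a small sketch. Note that in pandas `shape` is an attribute, so it is written without parentheses; calling it as `iris_df.shape()` raises a TypeError.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Build the dataframe from the feature matrix
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add the class labels as a new column named 'target'
iris_df['target'] = iris.target

# First 5 rows by default; pass a number, e.g. head(10), for more
print(iris_df.head())

# shape is an attribute: (rows, columns)
print(iris_df.shape)  # prints (150, 5)
```

The 150 rows are the flower samples, and the 5 columns are the four features plus the target column.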
| Only narration
Highlight sns.pairplot( iris_df, hue='target', diag_kind='kde', palette='colorblind' ) plt.show() |
Next, we plot a pairplot to visualize the iris dataset.
It visualizes relationships between different features. It creates scatterplots for each pair of features, colored by class labels. plt dot show is used to display the generated plots. |
| Show output plots
|
We see the pairplots visualizing feature relationships.
Scatter plots compare two features and help to identify patterns. Diagonal KDE plots show the distribution of each feature for different classes. Different colors represent different target classes. The clusters show which species are well separated and which overlap. |
| Only Narration
Highlight X = iris_df.drop('target', axis=1) |
Now, we split the dataset into X and y to prepare the data for training.
First we remove the target column named target. Then copy the remaining features into the variable X. |
| Highlight y = iris_df['target'] | Next, we assign the target column to y. |
| Highlight X | We see that X contains all features except target species. |
| Highlight y
Highlight the output |
We see that y contains the target classes.
It is the species of the iris flower. |
| Only narration.
Highlight train_test_split(X, y, test_size=0.4, random_state=42) |
Next, we split the data into training and testing sets.
We use the train underscore test underscore split function. The split ratio is adjustable through the test underscore size parameter. We set the test underscore size to 0.4. Here, we use 40 percent of the data for testing and 60 percent for training. Setting random state equal to 42 ensures the split is reproducible. It guarantees we get the same result across multiple executions. |
| Highlight X_train, X_test, y_train, y_test | We assign the split data into four variables.
X underscore train contains the features of the training data. It is used for model training. y underscore train contains the target values for the training data. X underscore test contains features of the test data. It is used for making predictions. y underscore test contains the actual class labels for the test data. It is used for evaluating the model performance. |
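To see what the split produces, here is a minimal sketch that rebuilds X and y as described above and prints the sizes of the four parts.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target

# Features and target
X = iris_df.drop('target', axis=1)
y = iris_df['target']

# 40 percent for testing, 60 percent for training; fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

print(X_train.shape, X_test.shape)  # prints (90, 4) (60, 4)
print(y_train.shape, y_test.shape)  # prints (90,) (60,)
```

With 150 samples, a test size of 0.4 gives 60 test rows and 90 training rows.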
| Highlight knn = KNeighborsClassifier(n_neighbors=7) | Now, we create the KNN classifier using KNeighborsClassifier with 7 neighbors. |
| Highlight knn.fit(X_train, y_train) | We train the KNN classifier using the fit method on the training data.
For KNN, the fit method stores the training data so that the nearest neighbors can be looked up at prediction time. |
| Highlight y_train_pred = knn.predict(X_train) | We predict the labels for the training data. |
| Highlight training_accuracy = accuracy_score(y_train, y_train_pred)
print(f"Training Accuracy: {training_accuracy:.3f}") |
We calculate and print the accuracy of the model.
Accuracy is the ratio of correct predictions to the total number of instances. It helps to measure how well the model is performing. |
| Highlight
Training Accuracy: 0.956 |
The accuracy is 0.956 which is quite good. |
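The training steps above can be sketched end to end as follows. For brevity this sketch loads the features and labels directly with `load_iris(return_X_y=True)` instead of going through the dataframe, which is an assumption of the example and not how the notebook is described. The exact value printed may vary slightly across library versions.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: load features and labels directly as arrays
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Create the classifier with K = 7 and fit it on the training data
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Predict on the same training data and measure accuracy
y_train_pred = knn.predict(X_train)
training_accuracy = accuracy_score(y_train, y_train_pred)

# The f prefix is needed so the value is substituted into the string
print(f"Training Accuracy: {training_accuracy:.3f}")
```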
| Highlight
print("\nClassification Report:") print(classification_report(y_train, y_train_pred))
|
Next, we print the classification report.
The classification report helps to evaluate how well the model is performing. Precision tells how many of the positive predictions made by the model were correct. Recall shows how many of the actual positive cases the model detected. F1 Score is the harmonic mean of precision and recall, balancing the two. Support is the count of true instances of each class in the dataset. |
| Show output table
Box for the data |
From the report, we conclude that precision and recall are high for all classes.
F1-Score shows good overall performance across all classes. The accuracy is 96 percent. This means the model made correct predictions for 96 percent of the instances. The macro and weighted averages reflect consistent performance across the classes. |
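To make the report's columns concrete, here is a small worked example on made-up binary labels (not the Iris results). It counts the true positives, false positives and false negatives by hand and compares the results with scikit-learn's metric functions.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels: 4 actual positives (1) and 6 actual negatives (0)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Count outcomes for the positive class by hand
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 2
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 2

precision = tp / (tp + fp)                          # 2 / 3
recall = tp / (tp + fn)                             # 2 / 4
f1 = 2 * precision * recall / (precision + recall)  # 4 / 7
support = sum(y_true)                               # 4 true instances of class 1

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} support={support}")
# prints precision=0.667 recall=0.500 f1=0.571 support=4

# scikit-learn gives the same values for the positive class
sk_precision = precision_score(y_true, y_pred)
sk_recall = recall_score(y_true, y_pred)
sk_f1 = f1_score(y_true, y_pred)
```

In the multi-class Iris report, the same quantities are computed once per class, treating that class as the positive one.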
| Only narration | Next, we evaluate the model on the testing data.
First, we predict the class label for a single test sample. |
| Highlight sample_test_data = | We extract the 10th row of the test dataset and reshape it for prediction.
We use the predict method to predict the class label using the trained model. We store the actual class label in the actual underscore class variable. Then, we print the predicted and actual class labels of the data sample. |
| Highlight output | The predicted class for unseen data is 2, which is virginica.
The actual class is also 2, indicating the prediction is correct. |
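A single-sample prediction can be sketched as below. The highlighted cell is truncated in this script, so the selection `X_test.iloc[[9]]` is an assumption: the double brackets keep the 10th row as a one-row dataframe, which already has the two-dimensional shape that predict expects. The tutorial reports class 2 for its sample; this sketch does not assert a particular class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: load the data as a dataframe and series in one call
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# 10th row of the test set, kept two-dimensional: shape (1, 4)
sample_test_data = X_test.iloc[[9]]

predicted_class = knn.predict(sample_test_data)[0]
actual_class = y_test.iloc[9]

print(f"Predicted class: {predicted_class}")
print(f"Actual class: {actual_class}")
```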
| Highlight accuracy = accuracy_score(y_test, y_pred)
print(f"Testing Accuracy: {accuracy:.3f}") |
Now, we calculate and print the accuracy. |
| Highlight Testing Accuracy: 0.983 | The accuracy is approximately 0.983.
We can conclude that the model is performing well on unseen data. |
| Only Narration
Highlight y_test_bin = label_binarize(y_test, classes=[0, 1, 2]) |
Finally we plot the precision-recall curve to visualize the performance for each class.
It calculates precision and recall. It computes average precision and summarizes the precision recall curve into a single score. |
| Highlight plt.figure(figsize=(10, 6)) | plt dot plot function plots the precision recall curve. |
| Highlight plt.xlabel("Recall")
plt.ylabel("Precision") plt.title("Precision-Recall plt.show() |
plt dot show displays the final precision recall curve. |
| Show output plot | The Precision-Recall curve shows the trade-off between precision and recall.
KNN classifier achieved perfect precision-recall for all classes. High AP indicates the model performs well in distinguishing all three classes. |
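The per-class precision-recall computation can be sketched as follows. The script only shows the `label_binarize` line, so the rest is an assumption: class scores are taken from `predict_proba`, and each class is treated as one-versus-rest. The dictionary name `average_precisions` is made up for the example. Passing each `recall` and `precision` pair to `plt.plot` would draw the curves.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import label_binarize

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# One column per class: 1 where the sample belongs to that class, else 0
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

# Assumption: use the fraction of neighbors voting for each class as the score
y_score = knn.predict_proba(X_test)

average_precisions = {}
for class_index, class_name in enumerate(iris.target_names):
    # Precision and recall at every score threshold for this class
    precision, recall, _ = precision_recall_curve(
        y_test_bin[:, class_index], y_score[:, class_index]
    )
    # Average precision summarizes the curve into a single score
    average_precisions[str(class_name)] = average_precision_score(
        y_test_bin[:, class_index], y_score[:, class_index]
    )

for class_name, ap in average_precisions.items():
    print(f"{class_name}: AP = {ap:.3f}")
```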
| Highlight print("\nClassification Report:")
print(classification_report(y_test, y_pred)) |
Finally, we evaluate the performance using the classification report.
The report offers a detailed assessment of the model’s performance. |
| Show Slide:
Summary |
This brings us to the end of the tutorial. Let us summarize. |
| Show Slide:
Assignment |
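As an assignment, please do the following
* Use K as 7 and test size as 0.2.
* Evaluate the model performance using the classification report.
|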
| Show Slide:
Assignment Solution K=7, test_size = 0.2 |
Here is the classification report for K equals 7 with a test size of 0.2.
We will get an accuracy of 96 percent. |
| Show Slide:
FOSSEE Forum |
For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question. |
| Show Slide:
Thank You |
This is Anvita Thadavoose Manjummel, a FOSSEE Summer Fellow 2025, IIT Bombay.
Thanks for joining. |