Python-for-Machine-Learning/C2/K-Nearest-Neighbor-Regression/English


Visual Cue Narration
Show Slide

Welcome

Welcome to the Spoken Tutorial on K Nearest Neighbor Regression.
Show Slide:

Learning Objectives

In this tutorial, we will learn about
  • Distance metrics for nearest neighbor identification.
  • Applying KNN regression to predict the petal length of iris flowers.
  • Evaluation using MSE and Adjusted R squared score.
Show Slide:

System Requirements

  • Ubuntu Linux OS 24.04
  • Jupyter Notebook IDE
To record this tutorial, I am using
  • Ubuntu Linux OS version 24.04
  • Jupyter Notebook IDE
Show Slide:

Pre-requisites

To follow this tutorial,
  • The learner must have basic knowledge of Python.
  • For pre-requisite Python tutorials, please visit this website.
Show Slide:

Code files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show Slide

KNN Regression

  • KNN regression is an algorithm that predicts values using nearby data points.
  • The prediction is the average of the target values of the nearest neighboring points.
  • K indicates the number of neighboring points to be considered for prediction.
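As a quick illustration of this idea (not part of the tutorial's notebook), the sketch below fits a KNN regressor on a tiny made-up dataset; the prediction for a new point is simply the average target value of its K nearest neighbors.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny made-up dataset: one feature, one numeric target
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# With K = 2, the prediction for x = 2.5 is the mean target of its 2 closest points
toy_knn = KNeighborsRegressor(n_neighbors=2)
toy_knn.fit(X_toy, y_toy)
print(toy_knn.predict([[2.5]]))  # (1.9 + 3.2) / 2 = 2.55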
Show Slide:

Distance Metrics

The various distance metrics used in KNN to find the nearest neighbors are described below.
  • Euclidean distance measures the shortest straight-line path between two points in space.
  • The scikit-learn library uses Euclidean as the default distance metric.
Show Slide:

Distance Metrics

  • Manhattan distance is the sum of absolute differences between coordinates.
  • Minkowski and Chebyshev distances are other common distance metrics used in KNN.
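For reference, here is a small sketch (not from the tutorial's notebook) that computes these four distances between two sample points using SciPy.

from scipy.spatial import distance

p1, p2 = [1, 2], [4, 6]  # two sample points

print(distance.euclidean(p1, p2))       # 5.0  straight-line distance
print(distance.cityblock(p1, p2))       # 7    Manhattan: |1 - 4| + |2 - 6|
print(distance.chebyshev(p1, p2))       # 4    largest coordinate difference
print(distance.minkowski(p1, p2, p=2))  # 5.0  Minkowski with p = 2 equals Euclidean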
Hover over the file

I have created the required file for the demonstration of KNN regression.
Point to the KNNregression.ipynb

KNNregression dot ipynb is the IPython Notebook file for this demonstration.
Press Ctrl,Alt+T keys

Type conda activate ml

Press Enter

Let us open the Linux terminal by pressing Ctrl,Alt and T keys together.

Activate the machine learning environment as shown.

Go to the Downloads folder

Type cd Downloads

Press Enter

Type jupyter notebook

Press Enter

I have saved my code file in the Downloads folder.

Please navigate to the respective folder of your code file location.

Then type, jupyter space notebook and press Enter.

Show Jupyter Notebook Home page

Click on

KNNregression.ipynb file

We see the Jupyter Notebook Home page.

Let us open the KNNregression dot ipynb file by clicking it.

Note that each cell will have the output displayed in this file.
Highlight the lines:

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

These are the necessary libraries to be imported for KNN regression.

Please remember to Execute the cell by pressing Shift and Enter to get output.
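The highlighted cell shows only part of the import list. A typical set of imports for this demonstration would look like the sketch below; the notebook's exact list may differ.

# A typical import block for this demonstration (the notebook's exact list may differ)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score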

Highlight the line:

iris = load_iris()

Press Shift and Enter

First, we load the dataset into a variable named iris.

We are using the Iris dataset, loading it from the sklearn library.

Highlight the lines:

iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

iris_df['target'] = iris.target

Press Shift and Enter

We create a DataFrame with feature names as columns.

Then we create a new column target and assign class labels to it.

Highlight the lines:

iris_df

Press Shift and Enter

Now we display the DataFrame showing all the feature values and target labels.
Highlight the lines:

print("Length of Dataset:",

Press Shift and Enter

Then we print the dataset length, shape, and the names of all features.
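The highlighted line is truncated; one plausible version of this cell, assuming the iris and iris_df variables from the earlier cells, is sketched below.

# Plausible completion of the truncated cell (exact wording in the notebook may differ)
print("Length of Dataset:", len(iris_df))
print("Shape of Dataset:", iris_df.shape)
print("Feature names:", iris.feature_names)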
Highlight the lines:

target_feature = 'petal length (cm)'

Press Shift and Enter

After that we store petal length as the target feature name.
Highlight the lines:

X = iris_df.drop(columns=[target_feature, 'target'],axis=1)

Now we separate the features X and the target variable y.
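Only the X assignment is highlighted; a reasonable guess at the matching y assignment, given the chosen target feature, is shown in this sketch.

X = iris_df.drop(columns=[target_feature, 'target'], axis=1)
y = iris_df[target_feature]  # assumed: the petal length column becomes the target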
Highlight the line:

X

Press Shift and Enter

We see that feature set X contains all features except the petal length.
Highlight the line:

y

Press Shift and Enter

The target set y contains the target feature petal length.
Highlight the lines:

plt.figure(figsize=(10, 6))

sns.boxplot(data=iris_df.drop(columns=['target']))

Now we create a boxplot to visualize feature distributions before scaling.

The boxplot shows how the features vary, their range, and any unusual values.

Highlight the lines:

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

Press Shift and Enter

Next, we apply Standard Scaling to normalize features using StandardScaler.

X underscore scaled is the transformed data.

It contains the data with mean 0 and standard deviation 1.

Highlight the lines:

plt.figure(figsize=(10, 6))

Then, we plot a boxplot to show feature distributions after scaling.

This helps us visualize how standardization affects the data.

We can observe in the output that all features are scaled to a mean of 0.

The standard deviation of all the features is now 1.
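A quick check, not shown in the tutorial, confirms this: after StandardScaler, each column of X_scaled has a mean of approximately 0 and a standard deviation of approximately 1.

print(X_scaled.mean(axis=0).round(3))  # all values close to 0
print(X_scaled.std(axis=0).round(3))   # all values close to 1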

Highlight the lines:

Next, we split the data into training and testing sets.
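A typical split for this demonstration is sketched below; the test_size and random_state values here are assumptions, and the notebook may use different ones.

# Assumed split parameters (test_size and random_state are not shown in the tutorial)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)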
Highlight the lines:

mse_scores = []

K_range = range(1, 15)

Press Shift and Enter

We use the Elbow method to help identify the optimal number of neighbors.
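The Elbow method loops over candidate K values, fits a regressor for each, and records its error. The loop body is not fully shown here; a sketch, assuming the error is measured on the test set, is given below.

# Elbow method sketch: fit a KNN regressor for each K and record its MSE
mse_scores = []
K_range = range(1, 15)
for k in K_range:
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(X_train, y_train)
    mse_scores.append(mean_squared_error(y_test, model.predict(X_test)))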
Highlight the lines:

plt.figure(figsize=(10, 6))

Now, we visualize the Elbow method for KNN regression using a plot.
Highlight the output:

We can observe that the error decreases initially and then increases after a point.

In the plot, the lowest MSE values appear at K equals 2 and K equals 5.

So, we infer that the model performs best at K equals 2 or K equals 5.

After K equals 5, the MSE increases, suggesting no further improvement.

Highlight the lines:

knn = KNeighborsRegressor(n_neighbors=5)

Press Shift and Enter

Then we initialize the KNN Regressor using the KNeighborsRegressor function.

Our ideal K value is 5, so we initialize KNN regressor with 5 neighbors.

Highlight the lines:

knn.fit(X_train, y_train)

Press Shift and Enter

Further, we train the KNN regressor using the fit method on train data.
Highlight the lines:

y_train_pred = knn.predict(X_train)

Press Shift and Enter

Now, we predict labels for the training set.
Highlight the lines:

training_mse = mean_squared_error(y_train, y_train_pred)

Press Shift and Enter

We then calculate the Mean Squared Error for the training set.
Highlight the lines:

def adjusted_r2_score(y_true, y_pred, n, p):

Then, we define the function for Adjusted R squared score for regression.

It adjusts for the number of predictors.
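The notebook's implementation is not shown in full; a standard way to write this function, assuming r2_score is imported from sklearn.metrics, is sketched below.

def adjusted_r2_score(y_true, y_pred, n, p):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    # where n is the number of samples and p the number of predictors
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)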

Highlight the lines:

n_train = X_train.shape[0]

The variable training underscore adj underscore r2 stores the Adjusted R squared score for the training set.
Highlight the lines:

We print the MSE and Adjusted R squared score of the training set.
Highlight the output:

Training Mean Squared Error: 0.079

Training Adjusted R^2 Score: 0.973

Training MSE of 0.079 indicates the model has low error.

It implies good performance.

The training Adjusted R squared score is 0.973.

It means the model does a great job of predicting the data accurately.

Highlight the lines:

y_pred = knn.predict(X_test)

Press Shift and Enter

Let us predict labels for the test set.
Highlight the lines:

print(comparison_df)

Press Shift and Enter

We compare actual and predicted values to assess model accuracy.

This helps evaluate how well the model generalizes to new data.
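One plausible way to build comparison_df, with assumed column names, is sketched below.

# Assumed construction of the comparison table (column names are a guess)
comparison_df = pd.DataFrame({'Actual': y_test.values, 'Predicted': y_pred})
print(comparison_df)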

Highlight the lines:

plt.figure(figsize=(6, 4))

sns.scatterplot(x=y_test, y=y_pred)

plt.show()

Now, we plot a scatter plot which compares actual vs. predicted values.
Show the output:

We can observe in the output that most points align with the red dashed line.

The red dashed line represents a perfect prediction match.
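One way to draw this plot, including the red dashed reference line (a sketch, not necessarily the notebook's exact code), is shown below.

plt.figure(figsize=(6, 4))
sns.scatterplot(x=y_test, y=y_pred)
# Red dashed line: points on this line would be perfectly predicted
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel('Actual petal length (cm)')
plt.ylabel('Predicted petal length (cm)')
plt.show()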

Highlight the lines:

test_mse = mean_squared_error(y_test, y_pred)

Press Shift and Enter

Now we calculate Mean Squared Error of the regression model for the test set.
Highlight the lines:

n_test = X_test.shape[0]

p_test = X_test.shape[1]

Then we calculate the Adjusted R squared score for the test set.
Highlight the lines:

print("Test Mean Squared Error:", format(test_mse, ".3f"))

We print the MSE and Adjusted R square score of the test set.
Highlight the output lines:

Test Mean Squared Error: 0.105

Test Adjusted R² Score: 0.964

The test MSE score indicates the model has low error on test data.

It implies good generalization on test data.

We get the adjusted R squared score of 0.964 for test data.

The model fits the test data well, explaining 96.4 percent of its variance.

Show Slide:

Summary

This brings us to the end of the tutorial. Let us summarize.

In this tutorial, we have learnt about

  • Distance metrics for nearest neighbor identification.
  • Applying KNN regression to predict the petal length of iris flowers.
  • Evaluation using MSE and Adjusted R squared score.
Show Slide:

Assignment

As an assignment, please do the following:
  • Use Manhattan distance instead of the default Euclidean distance.
  • Modify knn = KNeighborsRegressor(n_neighbors=5, metric='manhattan')

Now observe the change in the MSE and Adjusted R squared score.
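A short sketch of the assignment, reusing the variables and the adjusted_r2_score function defined earlier, could look like this.

# Refit with Manhattan distance and recompute the test metrics
knn_manhattan = KNeighborsRegressor(n_neighbors=5, metric='manhattan')
knn_manhattan.fit(X_train, y_train)
y_pred_m = knn_manhattan.predict(X_test)
print("Test Mean Squared Error:", format(mean_squared_error(y_test, y_pred_m), ".3f"))
print("Test Adjusted R^2 Score:",
      format(adjusted_r2_score(y_test, y_pred_m, X_test.shape[0], X_test.shape[1]), ".3f"))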

Show Slide:

Assignment Solution

Show Man.JPG

After completing the assignment, the output should match the expected result.
Show Slide:

FOSSEE Forum

For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.
Show Slide:

Thank you

This is Harini Theiveegan, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.
Thanks for joining.

Contributors and Content Editors

Nirmala Venkat