Python-for-Machine-Learning/C2/K-Nearest-Neighbor-Regression/English
| Visual Cue | Narration |
| Show Slide
Welcome |
Welcome to the Spoken Tutorial on K Nearest Neighbor Regression. |
| Show Slide:
Learning Objectives |
In this tutorial, we will learn about K Nearest Neighbor Regression and how to implement it in Python using the Iris dataset.
|
| Show Slide:
System Requirements
|
To record this tutorial, I am using a Linux OS, the conda machine learning environment, and Jupyter Notebook.
|
| Show Slide:
Pre-requisites |
To follow this tutorial, the learner must be familiar with Python programming and basic machine learning concepts.
|
| Show Slide:
Code files |
The code files used in this tutorial are provided in the Code files link on this tutorial page. Please download and use them while practising. |
| Show Slide
KNN Regression |
K Nearest Neighbor regression predicts the value of a data point by averaging the values of its K nearest neighbors in the feature space. |
| Show Slide:
Distance Metrics |
The various distance metrics used in KNN for finding Nearest Neighbors are Euclidean distance, Manhattan distance, and Minkowski distance.
|
| Show Slide:
Distance Metrics |
Euclidean distance measures the straight line distance between two points. Manhattan distance sums the absolute differences of their coordinates. Minkowski distance generalizes both through an order parameter p. |
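For reference, here is a minimal sketch of how these three metrics can be computed with NumPy; the sample points and variable names are illustrative, not taken from the tutorial notebook.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 8.0])

# Euclidean distance: straight line distance between the two points
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.sum(np.abs(a - b))

# Minkowski distance of order p; p=1 gives Manhattan, p=2 gives Euclidean
p = 3
minkowski = np.sum(np.abs(a - b) ** p) ** (1 / p)

print(euclidean, manhattan, minkowski)
```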
| Hover over the file | I have created the required file for the demonstration of KNN regression. |
| Point to the KNNregression.ipynb | KNNregression dot ipynb is the ipython notebook file for this demonstration. |
| Press Ctrl+Alt+T keys
Type conda activate ml Press Enter |
Let us open the Linux terminal by pressing Ctrl, Alt and T keys together.
Activate the machine learning environment as shown. |
| Go to the Downloads folder
Type cd Downloads Press Enter Type jupyter notebook Press Enter |
I have saved my code file in the Downloads folder.
Please navigate to the folder where your code file is saved. Then type jupyter space notebook and press Enter. |
| Show Jupyter Notebook Home page
Click on KNNregression.ipynb file |
We see the Jupyter Notebook Home page.
Let us open the KNNregression dot ipynb file by clicking on it. Note that in this file, the output of each cell is already displayed.
|
| Highlight the lines:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns |
These are the necessary libraries to be imported for KNN regression.
Please remember to execute each cell by pressing Shift and Enter to get the output. |
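The cue shows only part of this cell. A plausible completion, inferred from the functions used later in the demonstration:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# scikit-learn pieces used in the rest of the notebook
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score
```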
| Highlight the line:
iris = load_iris() Press Shift and Enter |
First, we load the Iris dataset from the sklearn library into a variable named iris. |
| Highlight the lines:
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names) iris_df['target'] = iris.target Press Shift and Enter |
We create a DataFrame with feature names as columns.
Then we create a new column target and assign class labels to it. |
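These two steps together, as shown in the cues, with comments added:

```python
# Load the built-in Iris dataset from scikit-learn
iris = load_iris()

# Build a DataFrame with one column per feature name
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add the class labels as a new 'target' column
iris_df['target'] = iris.target
```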
| Highlight the line:
iris_df Press Shift and Enter |
Now we display the DataFrame showing all the feature values and target labels. |
| Highlight the lines:
print("Length of Dataset:", Press Shift and Enter |
Then we print the dataset length, shape, and the names of all features. |
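The cue truncates this cell. A plausible completion that prints the length, shape, and feature names:

```python
print("Length of Dataset:", len(iris_df))
print("Shape of Dataset:", iris_df.shape)
print("Feature Names:", iris.feature_names)
```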
| Highlight the lines:
target_feature = 'petal length (cm)' Press Shift and Enter |
After that we store petal length as the target feature name. |
| Highlight the lines:
X = iris_df.drop(columns=[target_feature, 'target'], axis=1) |
Now we separate the features X and the target variable y. |
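A sketch of this cell; the definition of y is not shown in the cue, so iris_df[target_feature] is an assumption based on the narration:

```python
# Name of the column we want to predict
target_feature = 'petal length (cm)'

# Features: every column except the target feature and the class label
X = iris_df.drop(columns=[target_feature, 'target'])

# Target: the petal length values (assumed from the narration)
y = iris_df[target_feature]
```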
| Highlight the line:
X Press Shift and Enter |
We see that feature set X contains all features except the petal length. |
| Highlight the line:
y Press Shift and Enter |
The target set y contains the target feature petal length. |
| Highlight the lines:
plt.figure(figsize=(10, 6)) sns.boxplot(data=iris_df.drop(columns=['target'])) |
Now we create a boxplot to visualize feature distributions before scaling.
The boxplot shows how the features vary, their range, and any unusual values. |
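The plotting cell as shown in the cue; the title line is an illustrative addition:

```python
plt.figure(figsize=(10, 6))
sns.boxplot(data=iris_df.drop(columns=['target']))
plt.title("Feature distributions before scaling")  # illustrative title
plt.show()
```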
| Highlight the lines:
scaler = StandardScaler() X_scaled = scaler.fit_transform(X) Press Shift and Enter |
Next, we apply Standard Scaling to normalize features using StandardScaler.
X underscore scaled is the transformed data. It contains the data with mean 0 and standard deviation 1. |
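The scaling cell as shown in the cue, with comments added:

```python
scaler = StandardScaler()

# Fit the scaler on X and transform it: each feature now has
# mean 0 and standard deviation 1
X_scaled = scaler.fit_transform(X)
```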
| Highlight the lines:
plt.figure(figsize=(10, 6)) |
Then, we plot a boxplot to show the feature distributions after scaling.
This helps us visualize how standardization affects the data. We can observe in the output that all features now have a mean of 0 and a standard deviation of 1. |
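A sketch of this cell. fit_transform returns a NumPy array, so wrapping it in a DataFrame to keep the feature names on the axis is an assumption:

```python
plt.figure(figsize=(10, 6))
# Wrap the scaled array in a DataFrame so the boxplot keeps feature names
sns.boxplot(data=pd.DataFrame(X_scaled, columns=X.columns))
plt.title("Feature distributions after scaling")  # illustrative title
plt.show()
```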
| Highlight the lines: | Next, we split the data into training and testing sets. |
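The cue does not show this cell. A typical split, where the test size and random_state are assumptions:

```python
# Hold out a portion of the data for testing; the 80/20 split and
# random_state value are assumptions, not taken from the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)
```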
| Highlight the lines:
mse_scores = [] K_range = range(1, 15) Press Shift and Enter |
We use the Elbow method to help identify the optimal number of neighbors. |
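A sketch of the Elbow loop. The cue shows only the first two lines; fitting on the training set and scoring on the test set is an assumed completion:

```python
mse_scores = []
K_range = range(1, 15)

# For each candidate K, fit a KNN regressor and record its error
for k in K_range:
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    mse_scores.append(mean_squared_error(y_test, preds))
```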
| Highlight the lines:
plt.figure(figsize=(10, 6)) |
Now, we visualize the Elbow method for KNN regression using a plot. |
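A sketch of the plotting cell; the axis labels and marker style are illustrative:

```python
plt.figure(figsize=(10, 6))
plt.plot(K_range, mse_scores, marker='o')
plt.xlabel("Number of Neighbors K")
plt.ylabel("Mean Squared Error")
plt.title("Elbow Method for KNN Regression")
plt.show()
```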
| Highlight the output: | We can observe that the error decreases initially and then increases after a point.
In the plot, the lowest MSE values appear at K equals 2 and K equals 5, so the model performs best at those values. Beyond K equals 5, the MSE increases, indicating no further improvement. |
| Highlight the lines:
knn = KNeighborsRegressor(n_neighbors=5) Press Shift and Enter |
Then we initialize the KNN regressor using the KNeighborsRegressor function.
Since our ideal K value is 5, we set the number of neighbors to 5. |
| Highlight the lines:
knn.fit(X_train, y_train) Press Shift and Enter |
Further, we train the KNN regressor using the fit method on train data. |
| Highlight the lines:
y_train_pred = knn.predict(X_train) Press Shift and Enter |
Now, we predict labels for the training set. |
| Highlight the lines:
training_mse = mean_squared_error(y_train, y_train_pred) Press Shift and Enter |
We then calculate the Mean Squared Error for the training set. |
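The last four steps together, as shown in the cues, with comments added:

```python
# Initialize the regressor with the K chosen from the elbow plot
knn = KNeighborsRegressor(n_neighbors=5)

# Train on the training set
knn.fit(X_train, y_train)

# Predict on the training set and measure the training error
y_train_pred = knn.predict(X_train)
training_mse = mean_squared_error(y_train, y_train_pred)
```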
| Highlight the lines:
def adjusted_r2_score(y_true, y_pred, n, p): |
Then, we define the function for Adjusted R squared score for regression.
It adjusts for the number of predictors. |
| Highlight the lines:
n_train = X_train.shape[0] |
Here, training underscore adj underscore r2 stores the adjusted R squared score for the training set. |
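The function body is not shown in the cue. A standard definition based on the usual adjusted R squared formula, followed by the training-set usage:

```python
def adjusted_r2_score(y_true, y_pred, n, p):
    # n: number of samples, p: number of predictors
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n_train = X_train.shape[0]   # number of training samples
p_train = X_train.shape[1]   # number of features
training_adj_r2 = adjusted_r2_score(y_train, y_train_pred, n_train, p_train)
```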
| Highlight | We print the MSE and Adjusted R squared score of the training set. |
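The print statements are not shown in the cue; a plausible form, matching the output format below:

```python
print("Training Mean Squared Error:", format(training_mse, ".3f"))
print("Training Adjusted R^2 Score:", format(training_adj_r2, ".3f"))
```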
| Highlight the output:
Training Mean Squared Error: 0.079 Training Adjusted R^2 Score: 0.973 |
A training MSE of 0.079 indicates the model has low error, implying good performance.
The training Adjusted R squared score is 0.973, meaning the model explains about 97.3 percent of the variance in the training data. |
| Highlight the lines:
y_pred = knn.predict(X_test) Press Shift and Enter |
Let us predict labels for the test set. |
| Highlight the lines:
print(comparison_df) Press Shift and Enter |
We compare actual and predicted values to assess model accuracy.
This helps evaluate how well the model generalizes to new data. |
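The construction of comparison_df is not shown in the cues. One natural way to build it, with assumed column names:

```python
y_pred = knn.predict(X_test)

# 'Actual' and 'Predicted' are assumed column names
comparison_df = pd.DataFrame({'Actual': y_test.values, 'Predicted': y_pred})
print(comparison_df)
```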
| Highlight the lines:
plt.figure(figsize=(6, 4)) sns.scatterplot(x=y_test, y=y_pred) plt.show() |
Now, we plot a scatter plot which compares actual vs. predicted values. |
| Show the output: | We can observe in the output that most points align with the red dashed line.
The red dashed line represents a perfect prediction match. |
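A sketch of the scatter plot cell. The red dashed reference line is mentioned in the narration but not shown in the cue; drawing it as the line y equals x is an assumption:

```python
plt.figure(figsize=(6, 4))
sns.scatterplot(x=y_test, y=y_pred)

# Red dashed line marking a perfect prediction (y = x); assumed construction
lims = [y_test.min(), y_test.max()]
plt.plot(lims, lims, 'r--')

plt.xlabel("Actual petal length (cm)")    # illustrative labels
plt.ylabel("Predicted petal length (cm)")
plt.show()
```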
| Highlight the lines:
test_mse = mean_squared_error(y_test, y_pred) Press Shift and Enter |
Now we calculate Mean Squared Error of the regression model for the test set. |
| Highlight the lines:
n_test = X_test.shape[0] p_test = X_test.shape[1] |
Then we calculate the Adjusted R squared score for the test set. |
| Highlight the lines:
print("Test Mean Squared Error:", format(test_mse, ".3f")) |
We print the MSE and Adjusted R square score of the test set. |
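The last three steps together, as shown in the cues, with comments added:

```python
# Mean Squared Error on the held-out test set
test_mse = mean_squared_error(y_test, y_pred)

# Adjusted R squared on the test set
n_test = X_test.shape[0]
p_test = X_test.shape[1]
test_adj_r2 = adjusted_r2_score(y_test, y_pred, n_test, p_test)

print("Test Mean Squared Error:", format(test_mse, ".3f"))
print("Test Adjusted R^2 Score:", format(test_adj_r2, ".3f"))
```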
| Highlight the output lines:
Test Mean Squared Error: 0.105 Test Adjusted R² Score: 0.964 |
The test MSE of 0.105 indicates the model has low error on unseen data.
It implies good generalization. The Adjusted R squared score of 0.964 means the model fits the test data well, explaining 96.4 percent of its variance. |
| Show Slide:
Summary |
This brings us to the end of the tutorial. Let us summarize.
In this tutorial, we have learnt about K Nearest Neighbor Regression, the distance metrics used to find nearest neighbors, choosing K with the Elbow method, and evaluating the model with MSE and the Adjusted R squared score.
|
| Show Slide:
Assignment |
As an assignment, please do the following:
Change the value of K in the KNN regressor and retrain the model. Then observe the change in MSE and Adjusted R squared score. |
| Show Slide:
Assignment Solution
Show Man.JPG |
After completing the assignment, your output should match the expected result shown here. |
| Show Slide:
FOSSEE Forum |
For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question. |
| Show Slide:
Thank you |
This is Harini Theiveegan, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.
Thanks for joining.
|