Python-for-Machine-Learning/C2/K-Nearest-Neighbor-Regression/English


Visual Cue Narration
Show Slide

Welcome

Welcome to the Spoken Tutorial on K Nearest Neighbor Regression.
Show Slide:

Learning Objectives

In this tutorial, we will learn about
  • Distance metrics for nearest neighbor identification.
  • Applying KNN regression to predict the petal length of iris flowers.
  • Evaluation using MSE and Adjusted R squared score.
Show Slide:

System Requirements

  • Ubuntu Linux OS 24.04
  • Jupyter Notebook IDE
To record this tutorial, I am using
  • Ubuntu Linux OS version 24.04
  • Jupyter Notebook IDE
Show Slide:

Pre-requisites

To follow this tutorial,
  • The learner must have basic knowledge of Python.
  • For pre-requisite Python tutorials, please visit this website.
Show Slide:

Code files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show Slide

KNN Regression

  • KNN regression is an algorithm that predicts values using nearby data points.
  • The prediction is the average of the target values of the nearest neighboring points.
  • K indicates the number of neighboring points to be considered for prediction.
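As a quick illustration of this idea (not part of the tutorial's notebook), the sketch below fits a KNN regressor on a tiny made-up dataset; the prediction for a new point is simply the average target value of its K nearest neighbors.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny made-up dataset: one feature, one numeric target
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# With K = 2, the prediction for x = 2.5 is the mean target of its 2 closest points
toy_knn = KNeighborsRegressor(n_neighbors=2)
toy_knn.fit(X_toy, y_toy)
print(toy_knn.predict([[2.5]]))  # (1.9 + 3.2) / 2 = 2.55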
Show Slide:

Distance Metrics

The various distance metrics used in KNN to find the nearest neighbors are described below.
  • Euclidean distance measures the shortest straight-line path between two points in space.
  • The scikit-learn library uses Euclidean as the default distance metric.
Show Slide:

Distance Metrics

  • Manhattan distance is the sum of absolute differences between coordinates.
  • Minkowski and Chebyshev distances are other common distance metrics used in KNN.
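For reference, here is a small sketch (not from the tutorial's notebook) that computes these four distances between two sample points using SciPy.

from scipy.spatial import distance

p1, p2 = [1, 2], [4, 6]  # two sample points

print(distance.euclidean(p1, p2))       # 5.0  straight-line distance
print(distance.cityblock(p1, p2))       # 7    Manhattan: |1 - 4| + |2 - 6|
print(distance.chebyshev(p1, p2))       # 4    largest coordinate difference
print(distance.minkowski(p1, p2, p=2))  # 5.0  Minkowski with p = 2 equals Euclidean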
Hover over the file

I have created the required file for the demonstration of KNN regression.
Point to the KNNregression.ipynb

KNNregression dot ipynb is the IPython Notebook file for this demonstration.
Press Ctrl,Alt+T keys

Type conda activate ml

Press Enter

Let us open the Linux terminal by pressing Ctrl,Alt and T keys together.

Activate the machine learning environment as shown.

Go to the Downloads folder

Type cd Downloads

Press Enter

Type jupyter notebook

Press Enter

I have saved my code file in the Downloads folder.

Please navigate to the respective folder of your code file location.

Then type, jupyter space notebook and press Enter.

Show Jupyter Notebook Home page

Click on

KNNregression.ipynb file

We see the Jupyter Notebook Home page.

Let us open the KNNregression dot ipynb file by clicking it.

Note that each cell will have the output displayed in this file.
Highlight the lines:

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

These are the necessary libraries to be imported for KNN regression.

Please remember to Execute the cell by pressing Shift and Enter to get output.
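The highlighted cell shows only part of the import list. A typical set of imports for this demonstration would look like the sketch below; the notebook's exact list may differ.

# A typical import block for this demonstration (the notebook's exact list may differ)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score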

Highlight the line:

iris = load_iris()

Press Shift and Enter

First, we load the dataset into a variable named iris.

We are using the Iris dataset, loading it from the sklearn library.

Highlight the lines:

iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

iris_df['target'] = iris.target

Press Shift and Enter

We create a DataFrame with feature names as columns.

Then we create a new column target and assign class labels to it.

Highlight the lines:

iris_df

Press Shift and Enter

Now we display the DataFrame showing all the feature values and target labels.
Highlight the lines:

print("Length of Dataset:",

Press Shift and Enter

Then we print the dataset length, shape, and the names of all features.
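The highlighted line is truncated; one plausible version of this cell, assuming the iris and iris_df variables from the earlier cells, is sketched below.

# Plausible completion of the truncated cell (exact wording in the notebook may differ)
print("Length of Dataset:", len(iris_df))
print("Shape of Dataset:", iris_df.shape)
print("Feature names:", iris.feature_names)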
Highlight the lines:

target_feature = 'petal length (cm)'

Press Shift and Enter

After that we store petal length as the target feature name.
Highlight the lines:

X = iris_df.drop(columns=[target_feature, 'target'],axis=1)

Now we separate the features X and the target variable y.
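Only the X assignment is highlighted; a reasonable guess at the matching y assignment, given the chosen target feature, is shown in this sketch.

X = iris_df.drop(columns=[target_feature, 'target'], axis=1)
y = iris_df[target_feature]  # assumed: the petal length column becomes the target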
Highlight the line:

X

Press Shift and Enter

We see that feature set X contains all features except the petal length.
Highlight the line:

y

Press Shift and Enter

The target set y contains the target feature petal length.
Highlight the lines:

plt.figure(figsize=(10, 6))

sns.boxplot(data=iris_df.drop(columns=['target']))

Now we create a boxplot to visualize feature distributions before scaling.

The boxplot shows how the features vary, their range, and any unusual values.

Highlight the lines:

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

Press Shift and Enter

Next, we apply Standard Scaling to normalize features using StandardScaler.

X underscore scaled is the transformed data.

It contains the data with mean 0 and standard deviation 1.

Highlight the lines:

plt.figure(figsize=(10, 6))

Then, we plot a boxplot to show feature distributions after scaling.

This helps us visualize how standardization affects the data.

We can observe in the output that all features are scaled to a mean of 0.

The standard deviation of all the features is now 1.
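A quick check, not shown in the tutorial, confirms this: after StandardScaler, each column of X_scaled has a mean of approximately 0 and a standard deviation of approximately 1.

print(X_scaled.mean(axis=0).round(3))  # all values close to 0
print(X_scaled.std(axis=0).round(3))   # all values close to 1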

Highlight the lines:

Next, we split the data into training and testing sets.
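A typical split for this demonstration is sketched below; the test_size and random_state values here are assumptions, and the notebook may use different ones.

# Assumed split parameters (test_size and random_state are not shown in the tutorial)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)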
Highlight the lines:

mse_scores = []

K_range = range(1, 15)

Press Shift and Enter

We use the Elbow method to help identify the optimal number of neighbors.
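The Elbow method loops over candidate K values, fits a regressor for each, and records its error. The loop body is not fully shown here; a sketch, assuming the error is measured on the test set, is given below.

# Elbow method sketch: fit a KNN regressor for each K and record its MSE
mse_scores = []
K_range = range(1, 15)
for k in K_range:
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(X_train, y_train)
    mse_scores.append(mean_squared_error(y_test, model.predict(X_test)))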
Highlight the lines:

plt.figure(figsize=(10, 6))

Now, we visualize the Elbow method for KNN regression using a plot.
Highlight the output:

We can observe that the error decreases initially and then increases after a point.

In the plot, the lowest MSE values appear at K equals 2 and K equals 5.

So, we infer that the model performs best at K equals 2 or K equals 5.

After K equals 5, the MSE increases, suggesting no further improvement.

Highlight the lines:

knn = KNeighborsRegressor(n_neighbors=5)

Press Shift and Enter

Then we initialize the KNN Regressor using the KNeighborsRegressor function.

Our ideal K value is 5, so we initialize KNN regressor with 5 neighbors.

Highlight the lines:

knn.fit(X_train, y_train)

Press Shift and Enter

Further, we train the KNN regressor using the fit method on train data.
Highlight the lines:

y_train_pred = knn.predict(X_train)

Press Shift and Enter

Now, we predict labels for the training set.
Highlight the lines:

training_mse = mean_squared_error(y_train, y_train_pred)

Press Shift and Enter

We then calculate the Mean Squared Error for the training set.
Highlight the lines:

def adjusted_r2_score(y_true, y_pred, n, p):

Then, we define the function for Adjusted R squared score for regression.

It adjusts for the number of predictors.
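The notebook's implementation is not shown in full; a standard way to write this function, assuming r2_score is imported from sklearn.metrics, is sketched below.

def adjusted_r2_score(y_true, y_pred, n, p):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    # where n is the number of samples and p the number of predictors
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)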

Highlight the lines:

n_train = X_train.shape[0]

The variable training underscore adj underscore r2 stores the Adjusted R squared score for the training set.
Highlight the lines:

We print the MSE and Adjusted R squared score of the training set.
Highlight the output:

Training Mean Squared Error: 0.079

Training Adjusted R^2 Score: 0.973

Training MSE of 0.079 indicates the model has low error.

It implies good performance.

The training Adjusted R squared score is 0.973.

It means the model does a great job of predicting the data accurately.

Highlight the lines:

y_pred = knn.predict(X_test)

Press Shift and Enter

Let us predict labels for the test set.
Highlight the lines:

print(comparison_df)

Press Shift and Enter

We compare actual and predicted values to assess model accuracy.

This helps evaluate how well the model generalizes to new data.
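One plausible way to build comparison_df, with assumed column names, is sketched below.

# Assumed construction of the comparison table (column names are a guess)
comparison_df = pd.DataFrame({'Actual': y_test.values, 'Predicted': y_pred})
print(comparison_df)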

Highlight the lines:

plt.figure(figsize=(6, 4))

sns.scatterplot(x=y_test, y=y_pred)

plt.show()

Now, we plot a scatter plot which compares actual vs. predicted values.
Show the output:

We can observe in the output that most points align with the red dashed line.

The red dashed line represents a perfect prediction match.
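One way to draw this plot, including the red dashed reference line (a sketch, not necessarily the notebook's exact code), is shown below.

plt.figure(figsize=(6, 4))
sns.scatterplot(x=y_test, y=y_pred)
# Red dashed line: points on this line would be perfectly predicted
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel('Actual petal length (cm)')
plt.ylabel('Predicted petal length (cm)')
plt.show()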

Highlight the lines:

test_mse = mean_squared_error(y_test, y_pred)

Press Shift and Enter

Now we calculate Mean Squared Error of the regression model for the test set.
Highlight the lines:

n_test = X_test.shape[0]

p_test = X_test.shape[1]

Then we calculate the Adjusted R squared score for the test set.
Highlight the lines:

print("Test Mean Squared Error:", format(test_mse, ".3f"))

We print the MSE and Adjusted R square score of the test set.
Highlight the output lines:

Test Mean Squared Error: 0.105

Test Adjusted R² Score: 0.964

The test MSE score indicates the model has low error on test data.

It implies good generalization on test data.

We get the adjusted R squared score of 0.964 for test data.

The model fits the test data well, explaining 96.4 percent of its variance.

Show Slide:

Summary

This brings us to the end of the tutorial. Let us summarize.

In this tutorial, we have learnt about

  • Distance metrics for nearest neighbor identification.
  • Applying KNN regression to predict the petal length of iris flowers.
  • Evaluation using MSE and Adjusted R squared score.
Show Slide:

Assignment

As an assignment, please do the following:
  • Use Manhattan distance instead of the default Euclidean distance.
  • Modify knn = KNeighborsRegressor(n_neighbors=5, metric='manhattan')

Now observe the change in the MSE and Adjusted R squared score.
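A short sketch of the assignment, reusing the variables and the adjusted_r2_score function defined earlier, could look like this.

# Refit with Manhattan distance and recompute the test metrics
knn_manhattan = KNeighborsRegressor(n_neighbors=5, metric='manhattan')
knn_manhattan.fit(X_train, y_train)
y_pred_m = knn_manhattan.predict(X_test)
print("Test Mean Squared Error:", format(mean_squared_error(y_test, y_pred_m), ".3f"))
print("Test Adjusted R^2 Score:",
      format(adjusted_r2_score(y_test, y_pred_m, X_test.shape[0], X_test.shape[1]), ".3f"))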

Show Slide:

Assignment Solution

Show Man.JPG

After completing the assignment, the output should match the expected result.
Show Slide:

FOSSEE Forum

For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.
Show Slide:

Thank you

This is Harini Theiveegan, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.
Thanks for joining.

Contributors and Content Editors

Nirmala Venkat