Difference between revisions of "Python-for-Machine-Learning/C2/Linear-Regression/English"

Revision as of 15:48, 4 July 2025

Visual Cue	Narration
Show slide: Welcome	Welcome to the Spoken Tutorial on Linear Regression.
Show Slide: Learning Objectives	In this tutorial, we will learn about Linear Regression, Simple Linear Regression, Multiple Linear Regression, and Evaluation Metrics.
Show Slide: System Requirements	To record this tutorial, I am using Ubuntu Linux operating system 24.04 and Jupyter Notebook IDE.
Show Slide: Prerequisite	To follow this tutorial, the learner must have basic knowledge of Python. For prerequisite Python tutorials, please visit this website.
Show Slide: Code files	The files used in this tutorial are provided in the Code files link. Please download and extract the files. Make a copy and then use them while practicing.
Show Slide: Linear Regression	Linear regression is a predictive technique used in machine learning. It models the relationship between a dependent variable and one or more independent variables. Linear regression is categorized into Simple and Multiple linear regression.
Show Slide: Simple Linear Regression	Simple Linear Regression is a way to find relationships between two variables. It studies how one independent variable affects one dependent variable.
Show Slide: Multiple Linear Regression	Multiple Linear Regression is an extension of simple linear regression. It examines how multiple factors influence a single outcome.
Show Slide: Evaluation Metrics	To assess the model’s performance, we use evaluation metrics. These metrics indicate how well the regression model fits the data. The two common metrics are Mean Absolute Error and R squared score.
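As a reference for the two metrics named above, their standard definitions can be written as follows. The symbols are chosen here for illustration: $y_i$ is an actual value, $\hat{y}_i$ the corresponding prediction, $\bar{y}$ the mean of the actual values, and $n$ the number of samples.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

A lower MAE means smaller average errors, while an R squared closer to 1 means more of the variance in the actual values is explained by the model.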
Hover over the files	I have created the required files for the demonstration of Linear Regression.
Open the file salaries.csv and point to the fields as per narration. Open the file salaries_mlr.csv and point to the fields as per narration.	To implement Simple Linear Regression, we use the salaries dot csv dataset. This dataset contains salaries based on years of experience. We use the salaries underscore mlr dot csv dataset for Multiple Linear Regression. This dataset contains multiple columns as shown.
Point to the LinearRegression.ipynb	LinearRegression dot ipynb is the Python notebook file for this demonstration.
Press Ctrl+Alt+T keys Type conda activate ml Press Enter	Let us open the Linux terminal. Press Ctrl, Alt and T keys together. First, we need to activate the machine learning environment. Run the command conda space activate space ml. Press Enter.
Go to the Downloads folder Type cd Downloads Press Enter Type jupyter notebook Press Enter	I have saved my code file in the Downloads folder. Please navigate to the respective folder of your code file location. Then type, jupyter space notebook and press Enter.
Show Jupyter Notebook Home page Click on LinearRegression.ipynb file	We see the Jupyter Notebook Home page. Click the LinearRegression dot ipynb file to open it. Note that each cell will have the output displayed in this file.
Highlight the lines: import numpy as np import pandas as pd Press Shift+Enter	We start by importing the required libraries for Simple Linear Regression. Make sure to press Shift and Enter to execute the code in each cell.
Highlight the lines: df_salary=pd.read_csv("salaries.csv")	Let us load the dataset into a variable called df underscore salary.
Highlight the lines: df_salary.head()	Next, we display the first few rows of the data.
Highlight the lines: df_salary.describe()	Now, we generate summary statistics for the numerical columns.
Highlight the lines: sns.heatmap(df_salary.corr(), annot=True, cmap="coolwarm") plt.show()	The correlation heatmap shows how attributes in the dataset are related.
Narration:	Correlation measures how two variables are related to each other. The correlation values range from -1 to 1.
Show the Correlation matrix output 4.47	Here, experience and income have a correlation of 0.97. This means that as experience increases, income also increases strongly. Let us understand the correlation value ranges.
Show Slide: Correlation Matrix	A value of 1 means a perfect positive correlation. A value of -1 means a perfect negative correlation. A value of 0 means no linear correlation.
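To make the value ranges concrete, here is a minimal sketch that computes a correlation matrix with pandas. The toy DataFrame and the `hours_idle` column are hypothetical and are not taken from salaries.csv.

```python
import pandas as pd

# Hypothetical toy data; not the values from salaries.csv
toy = pd.DataFrame({
    "experience": [1, 2, 3, 4, 5],
    "income": [30000, 35000, 40000, 45000, 50000],  # rises exactly linearly
    "hours_idle": [10, 8, 6, 4, 2],                 # falls exactly linearly
})

# DataFrame.corr() computes pairwise Pearson correlation by default
corr = toy.corr()
print(corr.round(2))
```

In this constructed case, experience and income are perfectly positively correlated, while experience and the hypothetical hours_idle column are perfectly negatively correlated.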
Highlight the lines: plt.figure(figsize=(6,4))	Now we create a boxplot to visualize the income distribution.
Show the output	This image is a boxplot of income before removing outliers. Outliers are extreme values that differ significantly from other data points. They are the small circles on the right side of the boxplot. Here, incomes around 60,000 to 65,000 are considered outliers. The line inside the box is the median.
Highlight the lines: Q1 = df_salary[['experience', 'income']].quantile(0.25) Q3 = df_salary[['experience', 'income']].quantile(0.75) IQR = Q3 - Q1	Next, we will remove these outliers using the Interquartile Range method. We calculate the first quartile Q1 and the third quartile Q3 for experience and income. Then, we compute the IQR and remove the outliers.
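The filtering step itself is not shown in the highlighted lines, so the following is only a sketch of one common way to do it. It assumes the conventional 1.5 times IQR rule and uses hypothetical toy data; the notebook may use a different threshold or approach.

```python
import pandas as pd

# Hypothetical toy data; the last income is deliberately extreme
df = pd.DataFrame({
    "experience": [1, 2, 3, 4, 5, 6, 7, 8],
    "income": [30000, 32000, 34000, 36000, 38000, 40000, 42000, 90000],
})

cols = ["experience", "income"]
Q1 = df[cols].quantile(0.25)
Q3 = df[cols].quantile(0.75)
IQR = Q3 - Q1

# Assumed rule: keep rows lying within [Q1 - 1.5*IQR, Q3 + 1.5*IQR] in every column
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
inside = ((df[cols] >= lower) & (df[cols] <= upper)).all(axis=1)
df_clean = df[inside]

print(len(df), "rows before,", len(df_clean), "rows after")
```

Only the row with the extreme income falls outside the bounds here, so it is the only one dropped.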
Highlight the lines: plt.figure(figsize=(6,4))	Now, we plot the income distribution after removing outliers.
Show the output	Observe that the small circles are gone, showing outliers were removed.
Highlight the lines: x=df_salary['experience'] y=df_salary['income']	Now, we define x as experience and y as income from the dataset.
Highlight the lines:	Then, we split the data into training and testing sets.
Highlight the lines: x_train=np.array(x_train).reshape(-1,1) x_test=np.array(x_test).reshape(-1,1)	We then reshape x underscore train into a 2D array with one column. The same is done for x underscore test for compatibility.
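The split and reshape steps can be sketched together as below. The toy series, the `test_size` value, and the `random_state` value are assumptions for this illustration, since the split cell for Simple Linear Regression is not shown in the script.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the experience and income columns
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], name="experience")
y = pd.Series([31, 33, 36, 38, 41, 43, 46, 48, 51, 53], name="income")

# test_size and random_state are assumed values for this sketch
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)

print("Before reshape:", np.array(x_train).shape)  # one-dimensional

# scikit-learn estimators expect features as (n_samples, n_features)
x_train = np.array(x_train).reshape(-1, 1)
x_test = np.array(x_test).reshape(-1, 1)
print("After reshape:", x_train.shape, x_test.shape)
```

The `-1` lets NumPy infer the number of rows, and the `1` fixes a single feature column.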
Highlight the lines: lr=LinearRegression() lr.fit(x_train,y_train)	Now, we initialize a Linear Regression model and train it using training data.
Highlight the lines: print("Intercept (W0):", lr.intercept_) print("Coefficient (W1):", lr.coef_)	Then, we print the intercept W0 and coefficient W1 of the model. These define the fitted line: W0 is the intercept, and W1 is the slope relating experience to income.
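To see what the intercept and coefficient mean, here is a small sketch on hypothetical data generated from a known line, income = 25000 + 2000 times experience. The numbers are invented for illustration and are not the tutorial's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from income = 25000 + 2000 * experience
x_train = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y_train = 25000 + 2000 * x_train.ravel()

lr = LinearRegression()
lr.fit(x_train, y_train)

print("Intercept (W0):", lr.intercept_)
print("Coefficient (W1):", lr.coef_)

# A prediction is W0 + W1 * experience; compare with lr.predict for 6 years
manual = lr.intercept_ + lr.coef_[0] * 6
print(manual, lr.predict(np.array([[6]]))[0])
```

Because the toy data lie exactly on a line, the model recovers the intercept and slope used to generate them, and the manual calculation matches `predict`.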
Highlight the lines: y_pred_train = lr.predict(x_train) y_pred_train = y_pred_train.round().astype(int) y_pred_train	Now, we use the trained model to make predictions on the training data. We round the predictions to whole numbers for better readability. Then, we display the rounded predictions.
Highlight the lines: mae_train = mean_absolute_error(y_train, y_pred_train) print("MAE (Training):", mae_train)	Next, we calculate the Mean Absolute Error on the training data. Mean Absolute Error is the average absolute difference between the actual and predicted values.
Highlight the lines: r2_score(y_train, y_pred_train)	Then, we compute the R squared score to evaluate the model’s performance. R squared score measures how well the model explains the variance in the data. A value closer to 1 indicates a stronger fit.
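The two metrics can be checked by hand on a tiny example. The sketch below uses hypothetical values and also illustrates that scikit-learn's `r2_score` takes the actual values first and the predictions second; swapping them generally changes the result.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([10, 20, 30, 40])
y_pred = np.array([12, 18, 33, 37])

mae = mean_absolute_error(y_true, y_pred)
manual_mae = np.mean(np.abs(y_true - y_pred))  # same quantity computed directly

r2 = r2_score(y_true, y_pred)          # actual values first, predictions second
r2_swapped = r2_score(y_pred, y_true)  # swapped order: a different number

print("MAE:", mae, "manual:", manual_mae)
print("R2:", r2, "swapped:", r2_swapped)
```

MAE is symmetric in its arguments, but R squared is not, because its denominator uses the variance of whichever array is passed first.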
Highlight the lines: y_pred_test = lr.predict(x_test) y_pred_test = y_pred_test.round().astype(int) y_pred_test	Now, we make predictions on the test data.
Highlight the lines: plt.scatter(x_test,y_test)	To visualize performance, we create a scatter plot of actual vs predicted values.
Show the output	In the output we can see that most points are close to the line. It shows a positive correlation.
Highlight the lines: mean_absolute_error(y_test,y_pred_test)	Now, we compute the Mean Absolute Error on the test data.
Highlight the lines: r2_score(y_test, y_pred_test)	Then, we calculate and display the R squared score.
Narration	The model has a Mean Absolute Error of 1626.41, meaning predictions deviate from the actual income by about 1626 on average. The R-squared score of 0.87 shows the model explains about 87 percent of the variance. Overall, the model performs well but has some prediction errors.
	Now let us see the implementation of Multiple Linear Regression.
Highlight the lines: df_salaries = pd.read_csv(r"salaries_mlr.csv")	First, we load the dataset for Multiple Linear Regression.
Highlight the lines: df_salaries.tail()	Then, we display the last five rows.
Highlight the lines: df_salaries.dtypes	Next, we check the data types of each column in the dataset.
Highlight the lines: df_salaries.isnull().sum()	We also check for any missing values in the dataset by summing them.
Highlight the lines: df_salaries['gender'] = df_salaries['gender'].map({'m': 1, 'f': 0})	Now, we convert the gender column to numeric values, 1 for Male and 0 for Female.
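One detail of `map` worth knowing is that any value missing from the dictionary becomes NaN. The sketch below uses hypothetical rows, including an unexpected code `'x'` that is not claimed to appear in the dataset, to show how to check for this after encoding.

```python
import pandas as pd

# Hypothetical rows; 'x' is an unexpected code added to show the pitfall
df = pd.DataFrame({"gender": ["m", "f", "f", "x"]})

# Values not present in the dictionary are mapped to NaN
df["gender"] = df["gender"].map({"m": 1, "f": 0})

print(df["gender"].tolist())
print("Unmapped values:", df["gender"].isnull().sum())
```

Re-running the missing value check after encoding is a cheap way to catch unexpected category labels before training.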
Highlight the lines: X = df_salaries.drop(columns='income') y = df_salaries['income']	Then, we separate the features X and the target variable y for prediction.
Highlight the lines: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)	Now, we split the data into training and testing sets.
Highlight the lines: model = LinearRegression() model.fit(X_train, y_train)	We initialize a Linear Regression model and train it using the training data.
Highlight the lines: coefficients = pd.DataFrame({'Feature': X.columns, 'Coefficient': model.coef_})	Next, we print the model's coefficients and intercept.
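Here is a minimal sketch of inspecting the coefficients and intercept of a model with several features. The feature names and values are hypothetical (the `age` column in particular is an assumption, not a documented column of salaries_mlr.csv), and the target is built from a known formula so the output is easy to interpret.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical features; income = 20000 + 2000*experience + 500*age exactly
X = pd.DataFrame({
    "experience": [1, 2, 3, 4, 5, 6],
    "age": [25, 24, 30, 28, 35, 31],
})
y = 20000 + 2000 * X["experience"] + 500 * X["age"]

model = LinearRegression()
model.fit(X, y)

# One coefficient per feature, in the same order as X.columns
coefficients = pd.DataFrame({"Feature": X.columns, "Coefficient": model.coef_})
print(coefficients)
print("Intercept:", model.intercept_)
```

Each coefficient is the change in predicted income for a one-unit increase in that feature while the other features are held fixed.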
Highlight the lines: y_train_pred = model.predict(X_train) y_train_pred = y_train_pred.round().astype(int) y_train_pred	Now, we make predictions on the training data.
Highlight the lines: mae_train = mean_absolute_error(y_train, y_train_pred) print(f'Training data MAE: {mae_train}')	Next, we compute the Mean Absolute Error for training data.
Highlight the lines: r2_train = r2_score(y_train, y_train_pred) n_train = len(y_train)	Then, we compute the R squared score to measure the model performance. After that, we compute and print the adjusted R squared score.
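The adjusted R squared calculation is cut off in the highlighted lines, so the sketch below uses the standard textbook formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of samples and k the number of features. The helper name `adjusted_r2` and the example numbers are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for n samples and k features (requires n > k + 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical numbers, not the tutorial's results
value = adjusted_r2(0.90, n=100, k=3)
print(value)
```

The adjustment penalizes extra features, so the adjusted score is never higher than the plain R squared for k of at least 1.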
Highlight the lines: y_test_pred = model.predict(X_test) y_test_pred = y_test_pred.round().astype(int)	Moving forward, we make predictions on the test data.
Highlight the lines: plt.scatter(y_test, y_test_pred, color='red', label='Predicted') plt.scatter(y_test, y_test, color='blue', alpha=0.5, label='Actual')	We compare actual vs predicted income using a scatter plot.
Highlight the lines: mae_test = mean_absolute_error(y_test, y_test_pred) print(f'Testing data MAE: {mae_test}')	Then, we compute the Mean Absolute Error for the test data.
Highlight the lines: r2_test = r2_score(y_test, y_test_pred) n_test = len(y_test) k_test = X_test.shape[1]	Next, we calculate the R squared and adjusted R squared scores for the test data.
Narration	The model has an MAE of 1700.15, showing the average prediction error in income. The Adjusted R squared score is 0.921. It indicates the model explains about 92.1 percent of income variance.
Highlight the lines: residuals = y_test - y_test_pred plt.show()	Now, we analyse the residuals to check model errors. We create a scatter plot of predicted values versus residuals.
Highlight the output	This is a residual plot for the regression model. The red dashed line represents zero residual. Points above the line mean predictions are lower than actual values. Points below the line mean predictions are higher than actual values. Most residuals are close to zero, meaning predictions are fairly accurate.
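The sign convention described above follows from defining the residual as actual minus predicted. A tiny sketch with hypothetical values makes this explicit.

```python
import numpy as np

# Hypothetical actual and predicted incomes
y_test = np.array([100, 200, 300])
y_test_pred = np.array([90, 210, 300])

# Residual = actual - predicted
residuals = y_test - y_test_pred

for actual, pred, res in zip(y_test, y_test_pred, residuals):
    if res > 0:
        note = "prediction is lower than actual"
    elif res < 0:
        note = "prediction is higher than actual"
    else:
        note = "prediction matches actual"
    print(actual, pred, res, note)
```

A positive residual plots above the zero line and a negative one below it, which matches how the residual plot is read.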
Narration	Thus, we successfully implemented Multiple Linear Regression.
Show slide: Summary	This brings us to the end of the tutorial. Let us summarize. In this tutorial, we have learnt about Linear Regression, Simple Linear Regression, Multiple Linear Regression, and Evaluation Metrics.
Show Slide: Assignment In the Multiple Linear Regression code, replace the test_size parameter as shown here.	In the Multiple Linear Regression code, replace the test_size parameter as shown here. Observe the change in MAE and Adjusted R squared score.
Show Slide: Assignment Solution Show s1 img file	After completing the assignment, the output should match the expected result.
Show Slide: FOSSEE Forum	For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.
Show Slide: Thank you	This is Harini Theiveegan, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off. Thanks for joining.

Contributors and Content Editors

Madhurig, Nirmala Venkat

@@ Line 1: / Line 1: @@
-<div style="margin-left:1.27cm;margin-right:0cm;"></div>
 {| border="1"
 |-
@@ Line 7: / Line 7: @@
 || '''Narration'''
 |-
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
-|| <div style="color:#000000;">Show slide:</div>
+|| Show slide:
-<div style="color:#000000;">'''Welcome'''</div>
+'''Welcome'''
 || Welcome to the Spoken Tutorial on''' Linear Regression'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
@@ Line 19: / Line 19: @@
 || In this tutorial, we will learn about
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="color:#000000;">'''Linear Regression'''</span></div>
+* '''Linear Regression'''
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="color:#000000;">'''Simple Linear Regression'''</span></div>
+* '''Simple Linear Regression'''
-* <div style="margin-left:1.27cm;margin-right:0cm;">'''Multiple Linear Regression'''</div>
+* '''Multiple Linear Regression'''
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="color:#000000;">'''Evaluation Metrics'''</span></div>
+* '''Evaluation Metrics'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
@@ Line 30: / Line 30: @@
 || To Record this tutorial, I am using
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="color:#000000;">'''Ubuntu Linux operating system 24.04'''</span></div>
+* '''Ubuntu Linux operating system 24.04'''
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;color:#000000;">'''Jupyter Notebook'''</span><span style="background-color:transparent;color:#000000;"> </span><span style="background-color:transparent;color:#000000;">'''IDE'''</span></div>
+* '''Jupyter Notebook''' '''IDE'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
 '''Prerequisite'''
 || To follow this tutorial,
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;color:#000000;">The learner must have basic knowledge of </span><span style="background-color:transparent;color:#000000;">'''Python.'''</span></div>
+* The learner must have basic knowledge of '''Python.'''
-* <div style="margin-left:1.27cm;margin-right:0cm;">For prerequisite '''Python''' tutorials, please visit this website.</div>
+* For prerequisite '''Python''' tutorials, please visit this website.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
 '''Code files'''
 ||
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;color:#000000;">The files used in this tutorial are provided in the </span><span style="background-color:transparent;color:#000000;">'''Code files '''</span><span style="background-color:transparent;color:#000000;">link.</span></div>
+* The files used in this tutorial are provided in the '''Code files '''link.
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;color:#252525;">Please download and extract the files.</span></div>
+* Please download and extract the files.
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;color:#252525;">Make a copy and then use them while practicing.</span></div>
+* Make a copy and then use them while practicing.
+|-
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
 || Show Slide:
@@ Line 55: / Line 54: @@
 ||
-* <div style="margin-left:1.27cm;margin-right:0cm;">'''Linear regression''' is a predictive technique used in machine learning. </div>
+* '''Linear regression''' is a predictive technique used in machine learning.
-* <div style="margin-left:1.27cm;margin-right:0cm;">It builds the relationship between a '''dependent''' and '''independent''' variable.</div>
+* It builds the relationship between a '''dependent''' and '''independent''' variable.
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;color:#000000;">Linear regression is categorized into </span><span style="background-color:transparent;color:#000000;">'''Simple'''</span><span style="background-color:transparent;color:#000000;"> and </span><span style="background-color:transparent;color:#000000;">'''Multiple linear regression'''</span><span style="background-color:transparent;color:#000000;">.</span></div>
+* Linear regression is categorized into '''Simple''' and '''Multiple linear regression'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
@@ Line 65: / Line 64: @@
 ||
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;">'''Simple Linear Regression '''</span><span style="background-color:transparent;">is a way to find </span>relationships<span style="background-color:transparent;"> between two variables.</span></div>
+* '''Simple Linear Regression '''is a way to find relationships between two variables.
-* <div style="margin-left:1.27cm;margin-right:0cm;">It studies how one '''independent variable''' affects one '''dependent variable'''.</div>
+* It studies how one '''independent variable''' affects one '''dependent variable'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
 '''Multiple Linear Regression'''
 ||
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;">'''Multiple linear Regression'''</span><span style="background-color:transparent;"> is an extension of simple linear regression.</span></div>
+* '''Multiple linear Regression''' is an extension of simple linear regression.
-* <div style="margin-left:1.27cm;margin-right:0cm;">It examines how multiple factors influence a single outcome.</div>
+* It examines how multiple factors influence a single outcome.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
 '''Evaluation Metrics'''
 ||
-* <div style="margin-left:1.27cm;margin-right:0cm;">To assess the model’s performance, we use '''evaluation metrics'''.</div>
+* To assess the model’s performance, we use '''evaluation metrics'''.
-* <div style="margin-left:1.27cm;margin-right:0cm;">These metrics indicate how well the '''regression model''' fits the data. </div>
+* These metrics indicate how well the '''regression model''' fits the data.
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;">The two common metrics are </span><span style="background-color:transparent;">'''Mean Absolute Error '''</span><span style="background-color:transparent;">and </span><span style="background-color:transparent;">'''R squared score.'''</span></div>
+* The two common metrics are '''Mean Absolute Error '''and '''R squared score.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Hover over the files
 || I have created required files for the demonstration of''' Linear Regression.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Open the file salaries.csv and point to the fields as per narration.
@@ Line 98: / Line 97: @@
 This dataset contains multiple columns as shown.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Point to the '''LinearRegression.ipynb '''
 || '''LinearRegression dot ipynb '''is the python notebook file for this demonstration.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Press '''Ctrl,Alt'''+'''T '''keys
@@ Line 115: / Line 114: @@
 Press '''Enter.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Go to the '''Downloads '''folder
@@ Line 131: / Line 130: @@
 Then type, '''jupyter space notebook and''' press''' Enter.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show '''Jupyter Notebook Home page'''
@@ Line 139: / Line 138: @@
 Click the '''LinearRegression dot ipynb''' file to open it.
-<div style="color:#000000;">Note that each cell will have the output displayed in this file.</div>
+Note that each cell will have the output displayed in this file.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 153: / Line 152: @@
 Make sure to Press''' Shift '''and''' Enter''' to execute the code in each cell.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 159: / Line 158: @@
 || Let us load the dataset into a variable called '''df underscore salary.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 165: / Line 164: @@
 || Next, we display the '''first few rows''' of the data.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 171: / Line 170: @@
 || Now, we generate '''summary statistics''' for the numerical columns.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 179: / Line 178: @@
 || '''Correlation heatmap''' shows how attributes in the dataset are related.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Narration:
 || '''Correlation''' measures how two variables are related to each other.
@@ Line 185: / Line 184: @@
 '''Correlation''' measures the relationship between two variables
-<div style="color:#000000;">The '''correlation''' '''values range from -1 to 1'''.</div>
+The '''correlation''' '''values range from -1 to 1'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show the Correlation matrix output 4.47
-|| <div style="color:#000000;">Here, experience and income have a correlation of '''0.97.'''This means that as '''experience increases''', '''income also increases''' strongly.</div>
+|| Here, experience and income have a correlation of '''0.97.'''This means that as '''experience increases''', '''income also increases''' strongly.
 Let us understand the correlation value ranges.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
 '''Correlation Matrix'''
 ||
-* <div style="margin-left:1.27cm;margin-right:0cm;">A value of '''1''' means a '''perfect''' '''positive correlation'''.</div>
+* A value of '''1''' means a '''perfect''' '''positive correlation'''.
-* <div style="margin-left:1.27cm;margin-right:0cm;">A value of '''-1''' means a '''perfect negative correlation'''.</div>
+* A value of '''-1''' means a '''perfect negative correlation'''.
-* <div style="margin-left:1.27cm;margin-right:0cm;">A value of '''0 '''means '''no correlation'''</div>
+* A value of '''0 '''means '''no correlation'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 208: / Line 207: @@
 || Now we create a '''boxplot''' to visualize the income distribution.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show the output
@@ Line 221: / Line 220: @@
 The line inside the box is the '''median'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 236: / Line 235: @@
 Then, we compute the''' IQR '''and remove the outliers.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 243: / Line 242: @@
 || Now, we plot the income distribution after '''removing outliers'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show the output
 || Observe that the small circles are gone, showing outliers were removed.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 255: / Line 254: @@
 '''y=df_salary['income'] '''
 || Now, we define '''x''' as '''experience''' and '''y''' as '''income''' from the dataset.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
 || Then, we split the data into '''training''' and '''testing''' '''sets'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 270: / Line 269: @@
 The same is done for '''x underscore test''' for '''compatibility.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 278: / Line 277: @@
 || Now, we initialize a '''Linear Regression model''' and train it using training data.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 289: / Line 288: @@
 These define the model’s '''slope and relationship''' between experience and income.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 303: / Line 302: @@
 Then, we display the rounded predictions.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 314: / Line 313: @@
 '''Mean Absolute Error''' measures '''prediction accuracy.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 324: / Line 323: @@
 A '''value closer to''' '''1''' indicates a '''stronger fit.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 334: / Line 333: @@
 || Now, we make predictions on the test data.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 341: / Line 340: @@
 || To visualize performance, we create a '''scatter plot of actual vs predicted values'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show the output
 || In the output we can see that most points are close to the line.
@@ Line 347: / Line 346: @@
 It shows a '''positive correlation.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 354: / Line 353: @@
 || Now, compute the '''Mean Absolute Error '''on the test data.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
 '''r2_score(y_pred_test, y_test) '''
 || Then, we calculate and display the '''R squared score'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Narration
@@ Line 368: / Line 367: @@
 Overall, the model performs well but has some prediction errors.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 ||
 || Now let us see the implementation of '''Multiple Linear Regression'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 379: / Line 378: @@
 || First, load the dataset for '''Multiple Linear Regression'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 385: / Line 384: @@
 || Then, we display the '''last five rows.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 391: / Line 390: @@
 || Next, we check the '''data types''' of each column in the dataset.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 397: / Line 396: @@
 || We also check for any '''missing values''' in the dataset by summing them.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 403: / Line 402: @@
 || Now, we convert '''gender column''' to numeric values, '''1 for Male''' and '''0 for Female'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 411: / Line 410: @@
 || Then, we separate the '''features X''' and the '''target variable y''' for prediction.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 417: / Line 416: @@
 || Now, we split the data into '''training and testing sets.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 425: / Line 424: @@
 || We initialize a''' Linear Regression model''' and train it using the training data.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 432: / Line 431: @@
 || Next, we print the model's '''coefficients and intercept'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 441: / Line 440: @@
 '''y_train_pred'''
 || Now, we make '''predictions on the training data.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 447: / Line 446: @@
 || Next, we compute the '''Mean Absolute Error for training data'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 457: / Line 456: @@
 After that, we compute and print the '''adjusted R squared '''score.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 465: / Line 464: @@
 || Moving forward, we make '''predictions on the test data.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 472: / Line 471: @@
 '''plt.scatter(y_test, y_test, color='blue', alpha=0.5, label='Actual') '''
 || We compare '''actual vs predicted income''' using a '''scatter plot.'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 478: / Line 477: @@
 || Then, we compute the '''Mean Absolute Error''' for the test data.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 487: / Line 486: @@
 '''k_test = X_test.shape[1] '''
 || Next, we calculate the '''R squared score''' for the test data.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Narration
 || The model has an '''MAE''' of '''1700.15''', showing the average prediction error in income. The '''Adjusted R squared score''' is '''0.921'''.
@@ Line 493: / Line 492: @@
 It indicates the model explains '''92.1 percent''' of income variance.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the lines:
@@ Line 503: / Line 502: @@
 We create a '''scatter plot''' of '''predicted values versus residuals.'''
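A residual is the actual value minus the predicted value. The short sketch below computes the residuals that such a plot would place on the vertical axis; the numbers are illustrative, and the plotting call itself is left out.

```python
# Illustrative values only, not the tutorial's data.
actual = [50000, 62000, 71000, 58000]
predicted = [51000, 60000, 71500, 58000]

# Residual = actual - predicted, one per sample.
residuals = [a - p for a, p in zip(actual, predicted)]

# Pair each prediction with its residual, as in a predicted vs residual plot.
points = list(zip(predicted, residuals))
print(points)
```

Residuals scattered around zero with no visible pattern suggest that a linear model is a reasonable fit.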
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Highlight the output
@@ Line 516: / Line 515: @@
 Most '''residuals''' are '''close to zero''', meaning predictions are fairly accurate.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Narration
 || Thus, we successfully implemented '''Multiple Linear Regression'''.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show slide:
@@ Line 527: / Line 526: @@
 In this tutorial, we have learnt about
-* <div style="margin-left:1.27cm;margin-right:0cm;">'''Linear Regression'''</div>
+* '''Linear Regression'''
-* <div style="margin-left:1.27cm;margin-right:0cm;">'''Simple Linear Regression'''</div>
+* '''Simple Linear Regression'''
-* <div style="margin-left:1.27cm;margin-right:0cm;">'''Multiple Linear Regression'''</div>
+* '''Multiple Linear Regression'''
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;color:#000000;">'''Evaluation Metrics'''</span></div>
+* '''Evaluation Metrics'''
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
@@ Line 538: / Line 537: @@
 In Multiple Linear Regression code,
-* <div style="margin-left:1.27cm;margin-right:0cm;"><span style="background-color:transparent;color:#000000;">Replace the test_size parameter as shown here.</span></div>
+* Replace the test_size parameter as shown here.
-<div style="margin-left:1.27cm;margin-right:0cm;"></div>
 || In Multiple Linear Regression code,
-* <div style="margin-left:1.27cm;margin-right:0cm;">R<span style="background-color:transparent;color:#000000;">eplace the </span><span style="background-color:transparent;color:#000000;">'''test_size parameter'''</span><span style="background-color:transparent;color:#000000;"> as shown here.</span></div>
+* Replace the '''test_size parameter''' as shown here.
-* <div style="margin-left:1.27cm;margin-right:0cm;">Ob<span style="background-color:transparent;color:#000000;">serve the change in </span><span style="background-color:transparent;color:#000000;">'''MAE '''</span><span style="background-color:transparent;color:#000000;">and </span><span style="background-color:transparent;color:#000000;">'''Adjusted R squared score'''</span><span style="background-color:transparent;color:#000000;">.</span></div>
+* Observe the change in '''MAE''' and '''Adjusted R squared score'''.
-<div style="color:#000000;"></div>
+|-
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
 || Show Slide:
@@ Line 554: / Line 553: @@
 Show s1 img file
 || After completing the assignment, the output should match the expected result.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
 '''FOSSEE Forum'''
-|| For any general or technical questions on <span style="background-color:#ffffff;">'''Python</span><span style="background-color:#ffffff;"> for Machine Learning'''</span>, visit the''' FOSSEE forum''' and post your question.
+|| For any general or technical questions on '''Python for Machine Learning''', visit the '''FOSSEE forum''' and post your question.
-|- style="border:0.5pt solid #000000;padding-top:0cm;padding-bottom:0cm;padding-left:0.191cm;padding-right:0.191cm;"
+|-
 || Show Slide:
 '''Thank you'''
-|| <div style="color:#000000;">This is '''Harini Theiveegan''', a FOSSEE Summer Fellow 2025, IIT Bombay signing off</div>
+|| This is '''Harini Theiveegan''', a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.
 Thanks for joining.