Python-for-Machine-Learning/C3/Support-Vector-Machine/English

Visual Cue Narration
Show slide:

Welcome

Welcome to the Spoken Tutorial on Support Vector Machine.
Show Slide:

Learning Objectives

In this tutorial, we will learn about
  • Support Vector Machine (SVM)
  • Linear SVM and
  • Non-Linear SVM
Show Slide:

System Requirements

To record this tutorial, I am using
  • Ubuntu Linux OS version 24.04
  • Jupyter Notebook IDE
Show Slide:

Prerequisite

To follow this tutorial,
  • The learner must have basic knowledge of Python.
  • For prerequisite Python tutorials, please visit this website.
Show Slide:

Code files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show Slide

SVM

  • SVM is a supervised learning algorithm used for classification and regression.
  • It finds the best boundary, called a hyperplane, to separate classes.
Show Slide

Hyperplane and Margin

Show margin.png

Narration

  • The best hyperplane is the one that leaves the largest gap between classes.
  • This gap is called the margin, and a larger margin reduces errors.
Narration Next, we will look at Linear SVM and Non-Linear SVM.
Show Slide

Linear SVM

  • If a straight-line hyperplane can separate the data, we use Linear SVM.
  • Linear SVM aims to find the hyperplane that maximizes the margin.
Show Slide

Non-Linear SVM

  • When data is not linearly separable, we use Non-Linear SVM.
  • Non-Linear SVM uses the kernel trick to transform the data.
  • Kernels help find decision boundaries for data that isn’t linearly separable.
Hover over the files I have created the required files for the demonstration of SVM.
Open the file californiahousing.csv and point to the fields as per narration. To implement the SVM model, we use the californiahousing dot csv dataset.

The columns in the dataset help to classify whether a house price is High or Low.

Point to the SVM.ipynb file. SVM dot ipynb is the Python notebook file for this demonstration.
Press Ctrl,Alt and T keys

Type conda activate ml

Press Enter

Let us open the Linux terminal. Press Ctrl, Alt and T keys together.

Activate the machine learning environment as shown.

Go to the Downloads folder

Type cd Downloads

Type jupyter notebook

Press Enter

I have saved my code file in the Downloads folder.

Please navigate to the directory of your respective code file location.

Then type, jupyter space notebook and press Enter.

Show Jupyter Notebook Home page:

Click on SVM.ipynb

We can see the Jupyter Notebook Home page has opened in the web browser.

Click the SVM dot ipynb file to open it.

Note that each cell will have the output displayed in this file.

Highlight the lines

import pandas as pd
import seaborn as sns
from sklearn.decomposition import PCA

Press Shift and Enter

We start by importing the required libraries for SVM classification.

Now, we will implement a Linear SVM model.

Make sure to Press Shift and Enter to execute the code in each cell.
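
The import cell is shown above only in part. A minimal import cell covering everything used in this demonstration might look as follows; the exact imports in SVM.ipynb may differ slightly.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split, learning_curve
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from sklearn.decomposition import PCA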

Highlight the lines:

housing_df = pd.read_csv('californiahousing.csv')

First, we load the dataset from a CSV file.
Highlight the lines:

housing_df.head()

Next, we display the first few rows using the head function.
Highlight the lines:

housing_df.shape

Then, we check the dataset’s shape to see the number of rows and columns.
Highlight the lines:

selected_features = ["MedInc", "HouseAge", "AveRooms", "AveBedrms", "Housing Price"]

Now, let’s visualize relationships between features using a pair plot.
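
The pair plot cell is not shown in full above. A minimal sketch, assuming the target column is named "Housing Price" as in the selected_features list, could be:

# Pair plot of the selected features, colored by the target class (assumed column name)
sns.pairplot(housing_df[selected_features], hue="Housing Price")
plt.show()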
Show the output Here is the output displaying feature relationships in the dataset.
Highlight the lines: Since our data has categories, we use Label Encoding to convert them.
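
The encoding cell is not shown above. A minimal sketch, assuming the categorical target column is "Housing Price", could be:

# Convert the categorical target (e.g. High/Low) into numeric labels
le = LabelEncoder()
housing_df["Housing Price"] = le.fit_transform(housing_df["Housing Price"])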
Highlight the lines: Next, we separate the features and target variable for model training.
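
The separation cell is not shown above. A minimal sketch, again assuming "Housing Price" is the target column, could be:

# Features: all columns except the target; target: the encoded class labels
X = housing_df.drop("Housing Price", axis=1)
y = housing_df["Housing Price"]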
Highlight the lines:

X

Then we print the feature set X.
Highlight the lines:

y

Similarly, we print target variable y.
Highlight the lines:

X_train, X_test, y_train, y_test =

Now, we split the data into training and testing sets.
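
The cell is truncated above. A likely completion, where the 80/20 split and the random state are assumptions, is:

# Hold out 20% of the data for testing (split ratio and seed are assumptions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)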
Highlight the lines:

scaler = MinMaxScaler()
X_train_scaled =

Following this, we apply Min Max Scaler to keep the data within a fixed range.
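
The cell is truncated above. The standard pattern is to fit the scaler on the training data only and reuse it on the test data:

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from the training data
X_test_scaled = scaler.transform(X_test)        # apply the same scaling to the test data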
Highlight the lines: Now, we train a Linear SVM model using the training data.

To set up a Linear SVM, we use the Linear kernel.
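
The training cell is not shown above. A minimal sketch, using the svc_linear name that appears in the prediction step below, could be:

# Linear-kernel SVM trained on the scaled features
svc_linear = SVC(kernel='linear')
svc_linear.fit(X_train_scaled, y_train)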

Highlight the lines:

y_train_pred_linear = svc_linear.predict(X_train_scaled)

Once trained, we make predictions on the training data.
Highlight the lines: Now, we check the training accuracy to evaluate model learning.
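
The accuracy cell is not shown above. A minimal sketch could be:

# Fraction of training samples classified correctly
train_accuracy = accuracy_score(y_train, y_train_pred_linear)
print(f"Training Accuracy: {train_accuracy:.3f}")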
Highlight the lines:

y_pred_linear =

Next, we predict target values for the test data.
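
The cell is truncated above. A likely completion, mirroring the RBF version later in the notebook, is:

y_pred_linear = svc_linear.predict(X_test_scaled)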
Highlight the lines: Then, we compare the actual target values with the predicted values.
Highlight the lines: We now calculate and display the accuracy of the Linear SVM model.
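
The cells for these two steps are not shown above. A minimal sketch could be:

# Side-by-side view of actual and predicted labels
comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred_linear})
print(comparison.head())

# Test accuracy of the Linear SVM
accuracy = accuracy_score(y_test, y_pred_linear)
print(f"Accuracy: {accuracy:.3f}")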
Highlight the output:

Accuracy: 0.840

We see the accuracy is 0.84, indicating strong model performance.
Highlight the lines: Now, we generate a classification report to evaluate model performance.
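
The report cell is not shown above. A minimal sketch could be:

# Precision, recall and F1-score for each class
print(classification_report(y_test, y_pred_linear))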
Highlight the lines:

train_sizes, train_scores, test_scores = learning_curve

Next, we plot a learning curve to see how accuracy changes with training size.
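
The cell is truncated above. A minimal sketch, where the cross-validation folds and training sizes are assumptions, could be:

# Accuracy at increasing training set sizes, averaged over 5 CV folds (assumed)
train_sizes, train_scores, test_scores = learning_curve(
    svc_linear, X_train_scaled, y_train,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
plt.plot(train_sizes, train_scores.mean(axis=1), 'o-', color='blue', label='Training accuracy')
plt.plot(train_sizes, test_scores.mean(axis=1), 'o-', color='red', label='Validation accuracy')
plt.xlabel('Training set size')
plt.ylabel('Accuracy')
plt.legend()
plt.show()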
Show the output

Hover over training accuracy line and validation accuracy line in the plot.

The plot shows how accuracy changes with different training sizes.

The blue and red lines show training and validation accuracy respectively.

The learning curve helps to analyze model performance before further tuning.

Narration Let’s move to Non-Linear SVM.
Highlight the lines:

svc_rbf = SVC(kernel='rbf', C=10,

To set up a Non Linear SVM, we use the Radial Basis Function kernel.

We set the regularization parameter C to 10 for better separation.

We also use class weighting to handle class imbalance.
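
The cell is truncated above. A likely completion, where class_weight='balanced' is an assumption based on the narration, is:

# RBF-kernel SVM; C=10 strengthens the fit, class_weight handles imbalance
svc_rbf = SVC(kernel='rbf', C=10, class_weight='balanced')
svc_rbf.fit(X_train_scaled, y_train)

The evaluation steps that follow mirror the Linear SVM steps above, with svc_rbf in place of svc_linear.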

Highlight the lines:

y_train_pred_rbf = svc_rbf.predict(X_train_scaled)

Now, we predict the training labels using the trained Non Linear SVM model.
Highlight the lines: Next, we calculate and display the training accuracy.
Highlight the lines:

y_pred_rbf = svc_rbf.predict(X_test_scaled)

Now, we generate predictions on the test data.
Highlight the lines: Then we compare actual values with predicted values using a Dataframe.
Highlight the lines: We now check the model’s final accuracy.
Highlight the output

Accuracy: 0.840

With an accuracy of 84 percent, the model performs well.
Highlight the lines: Now, let's analyze it further with a classification report.
Highlight the lines:

pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

After evaluating the model, let's visualize how SVM separates the classes.

We now plot the support vectors, which help define the decision boundary.
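
The plotting cell is not shown above. A minimal sketch, relying on the fact that svc_rbf.support_ holds row indices into the training set and PCA preserves row order, could be:

# Training samples in 2D PCA space, colored by class
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1], c=y_train, cmap='coolwarm', s=20)
# Mark the trained model's support vectors with black X marks
sv = X_train_pca[svc_rbf.support_]
plt.scatter(sv[:, 0], sv[:, 1], marker='x', c='black', label='Support vectors')
plt.xlabel('PCA component 1')
plt.ylabel('PCA component 2')
plt.legend()
plt.show()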

Show the output This plot shows an SVM model trained with an RBF kernel.

Each point represents a data sample from the dataset. Red and blue colors indicate two different target classes. Black X marks represent the model's support vectors. Support vectors are the key points defining the decision boundary.

Thus, this is a 2D visualization of an originally 9D dataset.

Show Slide:

Summary

This brings us to the end of the tutorial. Let us summarize. In this tutorial, we learned about
  • Support Vector Machine (SVM)
  • Linear SVM and
  • Non-Linear SVM
Show Slide:

Assignment

As an assignment, in the Linear SVM code,
  • Change the value of C to 5 as shown here.
  • Observe the change in accuracy.
Show Slide:

Assignment Solution

Show Linear.PNG image file

After completing the assignment, the output should match the expected result.
Show Slide:

FOSSEE Forum

For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.
Show Slide:

Thank you

This is Harini Theiveegan, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.

Thanks for joining.
