Python-for-Machine-Learning/C3/Support-Vector-Machine/English

Visual Cue Narration
Show slide:

Welcome

Welcome to the Spoken Tutorial on Support Vector Machine.
Show Slide:

Learning Objectives

In this tutorial, we will learn about
  • Support Vector Machine (SVM)
  • Linear SVM and
  • Non-Linear SVM
Show Slide:

System Requirements

To record this tutorial, I am using
  • Ubuntu Linux OS version 24.04
  • Jupyter Notebook IDE
Show Slide:

Prerequisite

To follow this tutorial,
  • The learner must have basic knowledge of Python.
  • For prerequisite Python tutorials, please visit this website.
Show Slide:

Code files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show Slide

SVM

  • SVM is a supervised learning algorithm used for classification and regression.
  • It finds the best boundary, called a hyperplane, to separate classes.
Show Slide

Hyperplane and Margin

Show margin.png

Narration

  • The best hyperplane is the one that leaves the largest gap between classes.
  • This gap is called the margin, and a larger margin reduces errors.
Narration Next, we will look at Linear SVM and Non-Linear SVM.
Show Slide

Linear SVM

  • If a straight-line hyperplane can separate the data, we use Linear SVM.
  • Linear SVM aims to find the hyperplane that maximizes the margin.
Show Slide

Non-Linear SVM

  • When data is not linearly separable, we use Non-Linear SVM.
  • Non-Linear SVM uses the kernel trick to transform the data.
  • Kernels help find decision boundaries for data that isn’t linearly separable.
Hover over the files I have created the required files for the demonstration of SVM.
Open the file californiahousing.csv and point to the fields as per narration. To implement the SVM model, we use the californiahousing dot csv dataset.

The columns in the dataset help to classify whether a house price is High or Low.

Point to the SVM.ipynb file. SVM dot ipynb is the Python notebook file for this demonstration.
Press Ctrl,Alt and T keys

Type conda activate ml

Press Enter

Let us open the Linux terminal. Press Ctrl, Alt and T keys together.

Activate the machine learning environment as shown.

Go to the Downloads folder

Type cd Downloads

Type jupyter notebook

Press Enter

I have saved my code file in the Downloads folder.

Please navigate to the directory of your respective code file location.

Then type, jupyter space notebook and press Enter.

Show Jupyter Notebook Home page:

Click on SVM.ipynb

We can see the Jupyter Notebook Home page has opened in the web browser.

Click the SVM dot ipynb file to open it.

Note that each cell will have the output displayed in this file.

Highlight the lines

import pandas as pd
import seaborn as sns
from sklearn.decomposition import PCA

Press Shift and Enter

We start by importing the required libraries for SVM classification.

Now, we will implement a Linear SVM model.

Make sure to Press Shift and Enter to execute the code in each cell.
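
The import cell is shown above only in part. A minimal import cell covering everything used in this demonstration might look as follows; the exact imports in SVM.ipynb may differ slightly.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split, learning_curve
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from sklearn.decomposition import PCA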

Highlight the lines:

housing_df = pd.read_csv('californiahousing.csv')

First, we load the dataset from a CSV file.
Highlight the lines:

housing_df.head()

Next, we display the first few rows using the head function.
Highlight the lines:

housing_df.shape

Then, we check the dataset’s shape to see the number of rows and columns.
Highlight the lines:

selected_features = ["MedInc", "HouseAge", "AveRooms", "AveBedrms", "Housing Price"]

Now, let’s visualize relationships between features using a pair plot.
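
The pair plot cell is not shown in full above. A minimal sketch, assuming the target column is named "Housing Price" as in the selected_features list, could be:

# Pair plot of the selected features, colored by the target class (assumed column name)
sns.pairplot(housing_df[selected_features], hue="Housing Price")
plt.show()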
Show the output Here is the output displaying feature relationships in the dataset.
Highlight the lines: Since our data has categories, we use Label Encoding to convert them.
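
The encoding cell is not shown above. A minimal sketch, assuming the categorical target column is "Housing Price", could be:

# Convert the categorical target (e.g. High/Low) into numeric labels
le = LabelEncoder()
housing_df["Housing Price"] = le.fit_transform(housing_df["Housing Price"])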
Highlight the lines: Next, we separate the features and target variable for model training.
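
The separation cell is not shown above. A minimal sketch, again assuming "Housing Price" is the target column, could be:

# Features: all columns except the target; target: the encoded class labels
X = housing_df.drop("Housing Price", axis=1)
y = housing_df["Housing Price"]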
Highlight the lines:

X

Then we print the feature set X.
Highlight the lines:

y

Similarly, we print target variable y.
Highlight the lines:

X_train, X_test, y_train, y_test =

Now, we split the data into training and testing sets.
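
The cell is truncated above. A likely completion, where the 80/20 split and the random state are assumptions, is:

# Hold out 20% of the data for testing (split ratio and seed are assumptions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)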
Highlight the lines:

scaler = MinMaxScaler()
X_train_scaled =

Following this, we apply Min Max Scaler to keep the data within a fixed range.
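
The cell is truncated above. The standard pattern is to fit the scaler on the training data only and reuse it on the test data:

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from the training data
X_test_scaled = scaler.transform(X_test)        # apply the same scaling to the test data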
Highlight the lines: Now, we train a Linear SVM model using the training data.

To set up a Linear SVM, we use the Linear kernel.
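
The training cell is not shown above. A minimal sketch, using the svc_linear name that appears in the prediction step below, could be:

# Linear-kernel SVM trained on the scaled features
svc_linear = SVC(kernel='linear')
svc_linear.fit(X_train_scaled, y_train)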

Highlight the lines:

y_train_pred_linear = svc_linear.predict(X_train_scaled)

Once trained, we make predictions on the training data.
Highlight the lines: Now, we check the training accuracy to evaluate model learning.
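
The accuracy cell is not shown above. A minimal sketch could be:

# Fraction of training samples classified correctly
train_accuracy = accuracy_score(y_train, y_train_pred_linear)
print(f"Training Accuracy: {train_accuracy:.3f}")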
Highlight the lines:

y_pred_linear =

Next, we predict target values for the test data.
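
The cell is truncated above. A likely completion, mirroring the RBF version later in the notebook, is:

y_pred_linear = svc_linear.predict(X_test_scaled)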
Highlight the lines: Then, we compare the actual target values with the predicted values.
Highlight the lines: We now calculate and display the accuracy of the Linear SVM model.
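
The cells for these two steps are not shown above. A minimal sketch could be:

# Side-by-side view of actual and predicted labels
comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred_linear})
print(comparison.head())

# Test accuracy of the Linear SVM
accuracy = accuracy_score(y_test, y_pred_linear)
print(f"Accuracy: {accuracy:.3f}")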
Highlight the output:

Accuracy: 0.840

We see the accuracy is 0.84, indicating strong model performance.
Highlight the lines: Now, we generate a classification report to evaluate model performance.
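
The report cell is not shown above. A minimal sketch could be:

# Precision, recall and F1-score for each class
print(classification_report(y_test, y_pred_linear))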
Highlight the lines:

train_sizes, train_scores, test_scores = learning_curve

Next, we plot a learning curve to see how accuracy changes with training size.
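
The cell is truncated above. A minimal sketch, where the cross-validation folds and training sizes are assumptions, could be:

# Accuracy at increasing training set sizes, averaged over 5 CV folds (assumed)
train_sizes, train_scores, test_scores = learning_curve(
    svc_linear, X_train_scaled, y_train,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
plt.plot(train_sizes, train_scores.mean(axis=1), 'o-', color='blue', label='Training accuracy')
plt.plot(train_sizes, test_scores.mean(axis=1), 'o-', color='red', label='Validation accuracy')
plt.xlabel('Training set size')
plt.ylabel('Accuracy')
plt.legend()
plt.show()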
Show the output

Hover over training accuracy line and validation accuracy line in the plot.

The plot shows how accuracy changes with different training sizes.

The blue and red lines show training and validation accuracy respectively.

The learning curve helps to analyze model performance before further tuning.

Narration Let’s move to Non-Linear SVM.
Highlight the lines:

svc_rbf = SVC(kernel='rbf', C=10,

To set up a Non Linear SVM, we use the Radial Basis Function kernel.

We set the regularization parameter C to 10 for better separation.

We also use class weighting to handle class imbalance.
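
The cell is truncated above. A likely completion, where class_weight='balanced' is an assumption based on the narration, is:

# RBF-kernel SVM; C=10 strengthens the fit, class_weight handles imbalance
svc_rbf = SVC(kernel='rbf', C=10, class_weight='balanced')
svc_rbf.fit(X_train_scaled, y_train)

The evaluation steps that follow mirror the Linear SVM steps above, with svc_rbf in place of svc_linear.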

Highlight the lines:

y_train_pred_rbf = svc_rbf.predict(X_train_scaled)

Now, we predict the training labels using the trained Non Linear SVM model.
Highlight the lines: Next, we calculate and display the training accuracy.
Highlight the lines:

y_pred_rbf = svc_rbf.predict(X_test_scaled)

Now, we generate predictions on the test data.
Highlight the lines: Then we compare actual values with predicted values using a Dataframe.
Highlight the lines: We now check the model’s final accuracy.
Highlight the output

Accuracy: 0.840

With an accuracy of 84 percent, the model performs well.
Highlight the lines: Now, let's analyze it further with a classification report.
Highlight the lines:

pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

After evaluating the model, let's visualize how SVM separates the classes.

We now plot the support vectors, which help define the decision boundary.
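
The plotting cell is not shown above. A minimal sketch, relying on the fact that svc_rbf.support_ holds row indices into the training set and PCA preserves row order, could be:

# Training samples in 2D PCA space, colored by class
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1], c=y_train, cmap='coolwarm', s=20)
# Mark the trained model's support vectors with black X marks
sv = X_train_pca[svc_rbf.support_]
plt.scatter(sv[:, 0], sv[:, 1], marker='x', c='black', label='Support vectors')
plt.xlabel('PCA component 1')
plt.ylabel('PCA component 2')
plt.legend()
plt.show()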

Show the output This plot shows an SVM model trained with an RBF kernel.

Each point represents a data sample from the dataset. Red and blue colors indicate two different target classes. Black X marks represent the model's support vectors. Support vectors are the key points defining the decision boundary.

Thus, this is a 2D visualization of an originally 9D dataset.

Show Slide:

Summary

This brings us to the end of the tutorial. Let us summarize. In this tutorial, we learned about
  • Support Vector Machine (SVM)
  • Linear SVM and
  • Non-Linear SVM
Show Slide:

Assignment

As an assignment, in the Linear SVM code,
  • Change the value of C to 5 as shown here.
  • Observe the change in accuracy.
Show Slide:

Assignment Solution

Show Linear.PNG image file

After completing the assignment, the output should match the expected result.
Show Slide:

FOSSEE Forum

For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.
Show Slide:

Thank you

This is Harini Theiveegan, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.

Thanks for joining.
