Python-for-Machine-Learning/C3/Support-Vector-Machine/English
{| border="1"
|-
|| '''Visual Cue'''
|| '''Narration'''
|-
|| Show Slide:
'''Welcome'''
|| Welcome to the Spoken Tutorial on '''Support Vector Machine'''.
|-
|| Show Slide:
'''Learning Objectives'''
|| In this tutorial, we will learn about
* '''Support Vector Machine (SVM)'''
* '''Linear SVM''' and
* '''Non-Linear SVM'''
|-
|| Show Slide:
'''System Requirements'''
|| To record this tutorial, I am using
* '''Ubuntu Linux OS version 24.04'''
* '''Jupyter Notebook IDE'''
|-
|| Show Slide:
'''Prerequisite'''
|| To follow this tutorial,
* The learner must have basic knowledge of '''Python'''.
* For prerequisite '''Python''' tutorials, please visit this website.
|-
|| Show Slide:
'''Code files'''
||
* The files used in this tutorial are provided in the '''Code files''' link.
* Please download and extract the files.
* Make a copy and then use them while practicing.
|-
|| Show Slide:
'''SVM'''
||
* '''SVM''' is a '''supervised learning algorithm''' used for classification and regression.
* It finds the best boundary, called a '''hyperplane''', to separate classes.
|-
|| Show Slide:
'''Hyperplane and Margin'''

Show margin.png
||
* The best '''hyperplane''' is the one that leaves the largest gap between classes.
* This gap is called the '''margin''', and a larger margin reduces errors.
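|-
|| Reference note:

Margin formula
|| For readers who want the underlying math (not part of the recorded narration): a hyperplane can be written as <math>w \cdot x + b = 0</math>. With class labels <math>y_i \in \{-1, +1\}</math> and supporting hyperplanes <math>w \cdot x + b = \pm 1</math>, the margin width is <math>\frac{2}{\lVert w \rVert}</math>. Maximizing the margin is therefore equivalent to minimizing <math>\frac{1}{2}\lVert w \rVert^2</math> subject to <math>y_i (w \cdot x_i + b) \ge 1</math>.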
|-
|| Narration
|| Next, we will learn about '''Linear SVM''' and '''Non-Linear SVM'''.
|-
|| Show Slide:
'''Linear SVM'''
||
* If a straight-line hyperplane can separate the data, we use '''Linear SVM'''.
* '''Linear SVM''' aims to find the hyperplane that maximizes the margin.
|-
|| Show Slide:
'''Non-Linear SVM'''
||
* When data is not linearly separable, we use '''Non-Linear SVM'''.
* '''Non-Linear SVM''' uses the '''kernel trick''' to transform the data.
* '''Kernels''' help find decision boundaries for data that isn’t linearly separable.
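|-
|| Reference note:

RBF kernel formula
|| For readers who want the underlying math (not part of the recorded narration): the '''Radial Basis Function''' kernel used later in this tutorial is <math>K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)</math>, where <math>\gamma</math> controls how far the influence of a single training sample reaches. The kernel trick lets '''SVM''' compute these similarities without explicitly transforming the data.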
|-
|| Hover over the files
|| I have created the required files for the demonstration of '''SVM'''.
|-
|| Open the file californiahousing.csv and point to the fields as per the narration.
|| To implement the '''SVM''' model, we use the '''californiahousing dot csv''' dataset.
The columns in the dataset help to classify whether a house price is High or Low.
|-
|| Point to '''SVM.ipynb'''
|| '''SVM dot ipynb''' is the Python notebook file for this demonstration.
|-
|| Press '''Ctrl, Alt and T''' keys

Type '''conda activate ml'''

Press '''Enter'''
|| Let us open the Linux terminal. Press '''Ctrl, Alt and T''' keys together.
Activate the machine learning environment as shown.
|-
|| Go to the '''Downloads''' folder

Type '''cd Downloads'''

Type '''jupyter notebook'''

Press '''Enter'''
|| I have saved my code file in the '''Downloads''' folder.
Please navigate to the directory of your respective '''code file''' location.
Then type '''jupyter space notebook''' and press '''Enter'''.
|-
|| Show the Jupyter Notebook Home page:

Click on '''SVM.ipynb'''
|| We can see the '''Jupyter Notebook Home page''' has opened in the web browser.
Click the '''SVM dot ipynb''' file to open it.
Note that each cell will have its output displayed in this file.
|-
|| Highlight the lines:
'''import pandas as pd'''

'''import seaborn as sns'''

'''from sklearn.decomposition import PCA'''

Press '''Shift''' and '''Enter'''
|| We start by importing the required libraries for '''SVM classification'''.
Now, we will implement a '''Linear SVM''' model.
Make sure to press '''Shift''' and '''Enter''' to execute the code in each cell.
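|-
|| Reference code sketch
|| A minimal sketch of the import cell. Only the three highlighted imports are visible on screen; the remaining imports are assumptions based on the cells that follow.
<pre>
# Imports for the SVM demonstration (only pandas, seaborn and PCA
# are visible on screen; the rest are assumed from later cells).
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.model_selection import train_test_split, learning_curve
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
</pre>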
|-
|| Highlight the lines:
'''housing_df = pd.read_csv('californiahousing.csv')'''
|| First, we '''load the dataset''' from a CSV file.
|-
|| Highlight the lines:
'''housing_df.head()'''
|| Next, we display the first few rows using the '''head function'''.
|-
|| Highlight the lines:
'''housing_df.shape'''
|| Then, we check the '''dataset’s shape''' to see the number of rows and columns.
|-
|| Highlight the lines:
'''selected_features = ["MedInc", "HouseAge", "AveRooms", "AveBedrms", "Housing Price"]'''
|| Now, let’s visualize relationships between features using a '''pair plot'''.
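|-
|| Reference code sketch
|| A minimal sketch of the pair plot cell, continuing from the cells above. Coloring by the Housing Price column is an assumption; the exact seaborn call is not visible on screen.
<pre>
# Pairwise relationships among the selected features,
# colored by the categorical target (assumed to be "Housing Price").
selected_features = ["MedInc", "HouseAge", "AveRooms",
                     "AveBedrms", "Housing Price"]
sns.pairplot(housing_df[selected_features], hue="Housing Price")
plt.show()
</pre>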
|-
|| Show the output
|| Here is the output displaying feature relationships in the dataset.
|-
|| Highlight the lines:
|| Since our data has categories, we use '''Label Encoding''' to convert them.
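|-
|| Reference code sketch
|| The encoding cell is not shown on screen; a minimal sketch assuming scikit-learn's LabelEncoder and a categorical target column named "Housing Price" with High/Low values.
<pre>
# Encode the categorical target (e.g. "High"/"Low") as integers.
le = LabelEncoder()
housing_df["Housing Price"] = le.fit_transform(housing_df["Housing Price"])
</pre>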
|-
|| Highlight the lines:
|| Next, we separate the '''features''' and '''target''' variable for model training.
|-
|| Highlight the lines:
'''X'''
|| Then we print the '''feature set X'''.
|-
|| Highlight the lines:
'''y'''
|| Similarly, we print the '''target variable y'''.
|-
|| Highlight the lines:
'''X_train, X_test, y_train, y_test ='''
|| Now, we split the data into '''training''' and '''testing sets'''.
|-
|| Highlight the lines:
'''scaler = MinMaxScaler()'''

'''X_train_scaled ='''
|| Following this, we apply '''Min Max Scaler''' to keep the data within a fixed range.
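|-
|| Reference code sketch
|| A minimal sketch of the feature/target separation, split and scaling cells, continuing from the cells above. The target column name and the split parameters are not visible on screen and are hypothetical here.
<pre>
# Separate features and target (column name assumed).
X = housing_df.drop(columns=["Housing Price"])
y = housing_df["Housing Price"]

# Split into training and testing sets (parameters assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale every feature to the [0, 1] range; fit on training data only
# to avoid leaking information from the test set.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
</pre>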
|-
|| Highlight the lines:
|| Now, we train a '''Linear SVM''' model using the training data.
To set up a '''Linear SVM''', we use the '''linear kernel'''.
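|-
|| Reference code sketch
|| A minimal sketch of the training cell, continuing from the cells above. Only the kernel choice is confirmed by the narration; any other SVC arguments are not visible.
<pre>
# Train a Linear SVM on the scaled training data.
svc_linear = SVC(kernel='linear')
svc_linear.fit(X_train_scaled, y_train)
</pre>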
|-
|| Highlight the lines:
'''y_train_pred_linear = svc_linear.predict(X_train_scaled)'''
|| Once trained, we make predictions on the training data.
|-
|| Highlight the lines:
|| Now, we check the training '''accuracy''' to evaluate model learning.
|-
|| Highlight the lines:
'''y_pred_linear ='''
|| Next, we predict target values for the test data.
|-
|| Highlight the lines:
|| Then, we compare the actual target values with the predicted values.
|-
|| Highlight the lines:
|| We now calculate and display the '''accuracy''' of the '''Linear SVM''' model.
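|-
|| Reference code sketch
|| A minimal sketch of the prediction and evaluation cells, continuing from the cells above and assuming accuracy_score from sklearn.metrics.
<pre>
# Predict on the test set and compare actual vs. predicted labels.
y_pred_linear = svc_linear.predict(X_test_scaled)
comparison = pd.DataFrame({"Actual": y_test,
                           "Predicted": y_pred_linear})
print(comparison.head())

# Overall accuracy of the Linear SVM on unseen data.
print(f"Accuracy: {accuracy_score(y_test, y_pred_linear):.3f}")
</pre>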
|-
|| Highlight the output:
'''Accuracy: 0.840'''
|| We see the accuracy is '''0.84''', indicating strong model performance.
|-
|| Highlight the lines:
|| Now, we generate a '''classification report''' to evaluate model performance.
|-
|| Highlight the lines:
'''train_sizes, train_scores, test_scores = learning_curve'''
|| Next, we plot a '''learning curve''' to see how accuracy changes with training size.
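|-
|| Reference code sketch
|| A minimal sketch of the learning-curve cell, continuing from the cells above. Only the call to learning_curve is visible; the cross-validation settings and plot styling here are assumptions.
<pre>
# Measure accuracy at increasing training-set sizes.
train_sizes, train_scores, test_scores = learning_curve(
    svc_linear, X_train_scaled, y_train,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)  # hypothetical arguments

# Average the cross-validation folds and plot both curves.
plt.plot(train_sizes, train_scores.mean(axis=1), 'o-',
         color='blue', label='Training accuracy')
plt.plot(train_sizes, test_scores.mean(axis=1), 'o-',
         color='red', label='Validation accuracy')
plt.xlabel('Training size')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
</pre>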
|-
|| Show the output

Hover over the training accuracy line and the validation accuracy line in the plot.
|| The plot shows how '''accuracy''' changes with different training sizes.
The blue and red lines show '''training''' and '''validation accuracy''' respectively.
The learning curve helps to analyze model performance before further tuning.
|-
|| Narration
|| Let’s move to '''Non-Linear SVM'''.
|-
|| Highlight the lines:
'''svc_rbf = SVC(kernel='rbf', C=10,'''
|| To set up a '''Non-Linear SVM''', we use the '''Radial Basis Function kernel'''.
We set the '''regularization parameter C''' to 10 for better separation.
We also use '''class weighting''' to handle '''class imbalance'''.
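|-
|| Reference code sketch
|| A minimal sketch of the Non-Linear SVM cell. The arguments kernel='rbf' and C=10 are visible on screen; class_weight='balanced' is an assumption based on the narration about class weighting.
<pre>
# Non-Linear SVM with an RBF kernel.
svc_rbf = SVC(kernel='rbf', C=10,
              class_weight='balanced')  # class_weight is assumed
svc_rbf.fit(X_train_scaled, y_train)
</pre>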
|-
|| Highlight the lines:
'''y_train_pred_rbf = svc_rbf.predict(X_train_scaled)'''
|| Now, we predict the training labels using the trained '''Non-Linear SVM''' model.
|-
|| Highlight the lines:
|| Next, we calculate and display the '''training accuracy'''.
|-
|| Highlight the lines:
'''y_pred_rbf = svc_rbf.predict(X_test_scaled)'''
|| Now, we generate predictions on the test data.
|-
|| Highlight the lines:
|| Then we compare actual values with predicted values using a '''DataFrame'''.
|-
|| Highlight the lines:
|| We now check the model’s final '''accuracy'''.
|-
|| Highlight the output:
'''Accuracy: 0.840'''
|| With an accuracy of '''84 percent''', the model performs well.
|-
|| Highlight the lines:
|| Now, let's analyze it further with a '''classification report'''.
|-
|| Highlight the lines:
'''pca = PCA(n_components=2)'''

'''X_train_pca = pca.fit_transform(X_train_scaled)'''

'''X_test_pca = pca.transform(X_test_scaled)'''
|| After evaluating the model, let's visualize how '''SVM''' separates the classes.
We now plot the '''support vectors''', which help define the '''decision boundary'''.
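|-
|| Reference code sketch
|| A minimal sketch of the visualization, continuing from the cells above. The plotting code is not shown on screen; this sketch assumes the RBF model is retrained on the two PCA components so that its support vectors can be drawn in 2D.
<pre>
# Project the scaled features onto 2 principal components.
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# Retrain the RBF SVM in PCA space (assumed step) so that its
# support vectors can be plotted in two dimensions.
svc_2d = SVC(kernel='rbf', C=10, class_weight='balanced')
svc_2d.fit(X_train_pca, y_train)

# Samples colored by class (blue/red); support vectors as black X.
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1],
            c=y_train, cmap='bwr', s=10, alpha=0.5)
plt.scatter(svc_2d.support_vectors_[:, 0],
            svc_2d.support_vectors_[:, 1],
            c='black', marker='x', label='Support vectors')
plt.xlabel('PCA component 1')
plt.ylabel('PCA component 2')
plt.legend()
plt.show()
</pre>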
|-
|| Show the output
|| This plot shows an '''SVM model''' trained with an '''RBF kernel'''.
Each point represents a data sample from the dataset.
Red and blue colors indicate two different target classes.
Black X marks represent the model's support vectors.
Support vectors are the key points defining the decision boundary.
Thus, this is a '''2D visualization''' of an originally 9D dataset.
|-
|| Show Slide:
'''Summary'''
|| This brings us to the end of the tutorial. Let us summarize.
|-
|| Show Slide:
'''Assignment'''
|| In the Linear SVM code,
* Change the '''value of C to 5''' as shown here.
* Observe the change in '''accuracy'''.
|-
|| Show Slide:
'''Assignment Solution'''

Show the Linear.PNG image file
|| After completing the assignment, the output should match the expected result.
|-
|| Show Slide:
'''FOSSEE Forum'''
|| For any general or technical questions on '''Python for Machine Learning''', visit the '''FOSSEE forum''' and post your question.
|-
|| Show Slide:
'''Thank you'''
|| This is '''Harini Theiveegan''', a FOSSEE Summer Fellow 2025 at IIT Bombay, signing off.
Thanks for joining.
|}