Python-for-Machine-Learning/C2/Logistic-Regression-MultiClass-Classification/English


Visual Cue | Narration
Show slide:

Welcome

Welcome to the Spoken Tutorial on Logistic Regression - Multiclass Classification.
Show slide:

Learning Objectives

In this tutorial, we will learn about
  • Multiclass Classification for Logistic Regression
Show slide:

System Requirements

To record this tutorial, I am using
  • Ubuntu Linux OS version 24.04
  • Jupyter Notebook IDE
Show slide:

Prerequisite

To follow this tutorial,
  • The learner must have basic knowledge of Python.
  • For pre-requisite Python tutorials, please visit this website.
Show slide:

Code files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show slide:

Iris flower classification

To implement the multiclass classification model, we will
  • Use the Iris dataset to classify iris flowers.
  • To know more about the Iris dataset, please watch the earlier tutorials.
Point to the LR_Multiclass.ipynb LR_Multiclass dot ipynb is the IPython notebook file created for this demonstration.
Press Ctrl+Alt+T keys

Type conda activate ml Press Enter Highlight: (ml)

Let us open the Linux terminal by pressing Ctrl, Alt and T keys together.

Activate the machine learning environment as shown.

Type cd Downloads

Type jupyter notebook

Press Enter

I have saved my code file in the Downloads folder.

Please navigate to the respective folder of your code file location.

Then type, jupyter space notebook and press Enter.

Show Jupyter Notebook Home page:

Double Click on LR_Multiclass.ipynb file

We can see the Jupyter Notebook Home page has opened in the web browser.

Double-click on the LR underscore Multiclass dot ipynb file to open it.

Note that each cell will have the output displayed in this file.

Let us see the implementation of multiclass logistic regression.

Highlight import pandas as pd These are the necessary libraries to be imported for Multiclass classification.
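Only as a sketch, here is the kind of import cell such a notebook typically starts with; the exact list in LR_Multiclass.ipynb may differ.

# Assumed import cell; the notebook's exact list may differ
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss, roc_curve, auc, confusion_matrix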

Highlight: iris = load_iris()

iris.data[:5]

We first load the Iris dataset using the load underscore iris function.

The dataset is stored in the variable iris.

Then we display the first five rows of the feature array using slicing.

Highlight Data Preprocessing Now, let us prepare the data for training.
Highlight

X = iris.data

We create variable X and assign all feature columns to it.
Highlight Y = iris.target Next, we assign the target column to the variable Y.
Highlight df = pd.DataFrame(X, columns=iris.feature_names)

df['target'] = Y

To analyze the data better, we create a DataFrame df using pd dot DataFrame.
Highlight corr_matrix = df[iris.feature_names].corr() We compute correlation values between features of the Iris dataset using df dot corr.

Now, we visualize this correlation using a heatmap.

The heatmap shows how features relate to one another.
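For reference, a minimal sketch of this step; the seaborn heatmap call and its styling are assumptions about how the notebook draws the plot.

# Correlation heatmap sketch; sns.heatmap styling is an assumption
import seaborn as sns
import matplotlib.pyplot as plt
corr_matrix = df[iris.feature_names].corr()        # pairwise feature correlations
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Iris Feature Correlations')
plt.show()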

Highlight Train and Test Split Next, we split the data into training and testing sets.
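A minimal sketch of the split; the test_size and random_state values below are assumptions, not taken from the notebook.

# Hold out a test set; test_size and random_state are assumed values
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42)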
Highlight Model Instantiation of Multiclass Classification and Model training Let us now build a multiclass classification model.
Highlight mlr = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)

mlr.fit(X_train, Y_train)

We create an instance of LogisticRegression from the sklearn library.

Set multi underscore class equals multinomial and solver equals lbfgs.

We also set max underscore iter equals 1000 to ensure convergence.

Now we train the model using the fit method on the training data.

Ignore the warning in the output cell, if any.

Highlight

Y_train_pred = mlr.predict(X_train)

Now, we calculate and print the training accuracy.
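A sketch of this step, assuming accuracy_score from sklearn dot metrics:

# Training accuracy sketch
from sklearn.metrics import accuracy_score
Y_train_pred = mlr.predict(X_train)
train_accuracy = accuracy_score(Y_train, Y_train_pred)
print(f"Training Accuracy: {train_accuracy:.3f}")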
Highlight Training Accuracy: 0.981 The training accuracy is approximately 0.981, which is quite good.
Highlight

Train Log Loss: 0.1308

Next, we calculate the cross-entropy loss for the training data.

A loss of 0.1308 shows the model is making accurate predictions.
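A sketch of the loss computation; log_loss works on predicted class probabilities, which we assume come from predict_proba.

# Cross-entropy (log) loss on the training data
from sklearn.metrics import log_loss
Y_train_pred_proba = mlr.predict_proba(X_train)    # shape (n_samples, 3)
train_loss = log_loss(Y_train, Y_train_pred_proba)
print(f"Train Log Loss: {train_loss:.4f}")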

Highlight

plt.figure(figsize=(8, 6))

for i in range(Y_train_pred_proba.shape[1]):  # Iterate over each class

    fpr, tpr, _ = roc_curve(Y_train == i, Y_train_pred_proba[:, i])  # One-vs-rest for each class

    roc_auc = auc(fpr, tpr)

Let us now plot the ROC curve and calculate the ROC-AUC score.

The ROC curve shows TPR vs FPR at various threshold values.

TPR stands for True Positive Rate, that is, recall.

It is the fraction of actual positives correctly identified.

FPR stands for False Positive Rate.

It is the fraction of actual negatives wrongly classified as positives.
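A runnable sketch that completes the highlighted loop; the plot styling and labels are assumptions about what the notebook draws.

# One-vs-rest ROC curve per class (plot details assumed)
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
plt.figure(figsize=(8, 6))
for i in range(Y_train_pred_proba.shape[1]):       # iterate over each class
    fpr, tpr, _ = roc_curve(Y_train == i, Y_train_pred_proba[:, i])
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, label=f"Class {i} (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle='--')           # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()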

Show output plot
The ROC curve shows near-perfect classification.
The curves stay close to the top-left corner.
All three classes achieve an AUC of 1.00.
This indicates the model effectively distinguishes the classes.
Highlight: Predictions for Test Data Further, we predict labels for X underscore test.
Highlight test_data = X_test[15].reshape(1, -1)

predicted_class = mlr.predict(test_data)

We test the model on a single sample, similar to binary classification.

We compare the predicted class with the actual test class.

Highlight: Predicted class: 0, Actual class: 0 The predicted value is 0, which is Setosa.

The actual value is also 0, hence prediction is correct.
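A sketch of this comparison; the exact print format is an assumption.

# Predict one held-out sample and compare with its true label
test_data = X_test[15].reshape(1, -1)              # a single sample as a 2-D array
predicted_class = mlr.predict(test_data)[0]
actual_class = Y_test[15]
print(f"Predicted class: {predicted_class}, Actual class: {actual_class}")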

Highlight Y_pred = mlr.predict(X_test) Y underscore pred stores the predicted labels for all test samples.
Highlight: print("Multiclass classification - Actual vs Predicted:") We compare the actual class labels with the predicted labels.
Highlight: Multiclass Logistic Regression - Actual vs Predicted: The output shows both actual and predicted label arrays.
Highlight: test_accuracy = accuracy_score(Y_test, Y_pred)

print(f"Test Accuracy: {test_accuracy:.3f}")

Now we calculate the test accuracy.
Highlight: Test Accuracy: 0.978 We get an accuracy of approximately 0.978, which is pretty good.
Highlight

# Predict probabilities for test set

Y_test_pred_proba = mlr.predict_proba(X_test)

We also compute ROC-AUC score and cross-entropy loss for test data.
Highlight

Test ROC-AUC Score (OvR): 0.9968

Test Log Loss: 0.1616

ROC-AUC score of 0.9968 indicates excellent performance.

Cross-entropy loss of 0.1616 shows the predictions are accurate.
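A sketch of how these two metrics can be computed, assuming roc_auc_score with multi_class set to 'ovr':

# Test-set ROC-AUC (one-vs-rest) and log loss
from sklearn.metrics import roc_auc_score, log_loss
Y_test_pred_proba = mlr.predict_proba(X_test)
test_auc = roc_auc_score(Y_test, Y_test_pred_proba, multi_class='ovr')
test_loss = log_loss(Y_test, Y_test_pred_proba)
print(f"Test ROC-AUC Score (OvR): {test_auc:.4f}")
print(f"Test Log Loss: {test_loss:.4f}")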

Highlight

conf_matrix = confusion_matrix(Y_test, Y_pred)

Let us visualize the confusion matrix of the model.

It shows how well the model classifies each class.

Show output plot This matrix has three classes: 0, 1, and 2.

The diagonal values represent correct predictions.

One sample from Class 1 was incorrectly predicted as Class 2.

The absence of other misclassified values indicates that the model performs well.

A strong diagonal pattern suggests high classification accuracy.
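A sketch of the visualization; the seaborn styling is an assumption.

# Confusion matrix heatmap (styling assumed)
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
conf_matrix = confusion_matrix(Y_test, Y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted class')
plt.ylabel('Actual class')
plt.show()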

Only narration Now we have successfully classified different Iris flower classes.

This brings us to the end of the tutorial. Let us summarize.

Show slide:

Summary

In this tutorial, we have learnt about
  • Multiclass Classification for Logistic Regression
Show slide:

Assignment

As an assignment, please do the following:
  • Generate the classification report of the model using the sklearn method, as sketched after this list.
  • Use classification_report from sklearn dot metrics to display the results.
  • The report shows precision, recall, f1-score, and support for each class.
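A minimal sketch of the call the assignment needs; passing target_names for readable labels is an optional assumption.

# Per-class precision, recall, f1-score, and support
from sklearn.metrics import classification_report
print(classification_report(Y_test, Y_pred, target_names=iris.target_names))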
Show slide: After completing the assignment, the output should match the expected result.
Show Slide:
FOSSEE Forum
For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.

Show Slide:
Thank You
This is Anvita Thadavoose Manjummel, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.
Thanks for joining.
