Python-for-Machine-Learning/C2/Logistic-Regression-MultiClass-Classification/English
{| border="1" | {| border="1" | ||
|- | |- | ||
|| '''Visual Cue''' | || '''Visual Cue''' | ||
|| '''Narration''' | || '''Narration''' | ||
| − | + | ||
| − | |- | + | |- |
| − | || | + | || Show slide: |
'''Welcome'''
|| Welcome to the Spoken Tutorial on '''Logistic Regression - Multiclass Classification'''.
|-
|| Show slide:
'''Learning Objectives'''
|| In this tutorial, we will learn about
* Multiclass classification using '''Logistic Regression'''
|-
|| Show slide:
'''System Requirements'''
|| To record this tutorial, I am using
* '''Ubuntu Linux''' OS version '''24.04'''
* '''Jupyter Notebook''' IDE
|-
|| Show slide:
'''Prerequisite'''
|| To follow this tutorial,
* The learner must have basic knowledge of '''Python'''.
* For prerequisite '''Python''' tutorials, please visit this website.
|-
|| Show slide:
'''Code files'''
||
* The files used in this tutorial are provided in the '''Code files''' link.
* Please download and extract the files.
* Make a copy and then use them while practicing.
|-
|| Show slide:
'''Iris flower classification'''
|| To implement the '''multiclass classification model''', we will
* Use the '''iris''' dataset to classify the '''iris''' flower.
* To know more about the '''iris''' dataset, please watch the earlier tutorials.
|-
|| Point to the '''LR_Multiclass.ipynb''' file
|| '''LR_Multiclass dot ipynb''' is the IPython Notebook file created for this demonstration.
|-
|| Press '''Ctrl+Alt+T''' keys
Type '''conda activate ml'''
Press '''Enter'''
Highlight: '''(ml)'''
|| Let us open the Linux terminal by pressing '''Ctrl''', '''Alt''' and '''T''' keys together.
Activate the machine learning environment as shown.
|-
|| Type '''cd Downloads'''
Type '''jupyter notebook'''
Press '''Enter'''
|| I have saved my code file in the '''Downloads''' folder.
Please navigate to the folder where your code file is saved.
Then type '''jupyter space notebook''' and press '''Enter'''.
|-
|| Show Jupyter Notebook Home page:
Double click on the '''LR_Multiclass.ipynb''' file
|| We can see that the Jupyter Notebook Home page has opened in the web browser.
Click on the '''LR underscore Multiclass dot ipynb''' file to open it.
Note that each cell in this file already has its output displayed.
Let us see the implementation of '''multiclass logistic regression'''.
|-
|| Highlight '''import pandas as pd'''
|| These are the necessary libraries to be imported for '''multiclass classification'''.
|-
|| Only narration
Highlight:
'''iris = load_iris()'''
'''iris.data[:5]'''
|| We first load the '''Iris''' dataset using the '''load underscore iris''' method.
The dataset is stored in the variable '''iris'''.
Then we display the first five rows of the feature data.
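Below is a minimal sketch of this step, assuming '''scikit-learn''' is installed; the '''print''' call is only for illustration:
<pre>
# Load the built-in Iris dataset from scikit-learn
from sklearn.datasets import load_iris

iris = load_iris()

# Display the first five rows of the feature matrix
print(iris.data[:5])
</pre>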
|-
|| Highlight '''Data Preprocessing'''
|| Now, let us prepare the data for training.
|-
|| Highlight
'''X = iris.data'''
|| We create a variable '''X''' and assign all feature columns to it.
|-
|| Highlight '''Y = iris.target'''
|| Next, we assign the target column to the variable '''Y'''.
|-
|| Highlight '''df = pd.DataFrame(X, columns=iris.feature_names)'''
'''df['target'] = Y'''
|| To analyze the data better, we create a DataFrame '''df''' using '''pd dot DataFrame'''.
We then add the target labels to '''df''' as a new column.
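A minimal sketch of this preprocessing step, assuming the '''iris''' object from the earlier cell:
<pre>
import pandas as pd

# Feature matrix and target vector
X = iris.data
Y = iris.target

# Build a DataFrame for easier analysis and attach the labels as a column
df = pd.DataFrame(X, columns=iris.feature_names)
df['target'] = Y
print(df.head())
</pre>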
|-
|| Highlight '''corr_matrix = df[iris.feature_names].corr()'''
|| We compute '''correlation''' values between the features of the '''Iris''' dataset using '''df dot corr'''.
Now, we visualize this correlation using a '''heatmap'''.
The '''heatmap''' shows how features relate to one another.
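A sketch of the correlation heatmap, assuming '''seaborn''' and '''matplotlib''' are the plotting libraries; the colormap is illustrative:
<pre>
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise correlation between the four feature columns
corr_matrix = df[iris.feature_names].corr()

# Annotated heatmap of the correlation matrix
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
</pre>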
|-
|| Highlight '''Train and Test Split'''
|| Next, we split the data into training and testing sets.
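A minimal sketch of the split; the '''test_size''' and '''random_state''' values here are assumptions for illustration:
<pre>
from sklearn.model_selection import train_test_split

# Hold out a portion of the samples for testing (illustrative values)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42)
</pre>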
|-
|| Highlight '''Model Instantiation of Multiclass Classification and Model training'''
|| Let us now build a multiclass classification model.
|-
|| Highlight '''mlr = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)'''
'''mlr.fit(X_train, Y_train)'''
|| We create an instance of '''LogisticRegression''' from the '''sklearn''' library.
We set '''multi underscore class''' to '''multinomial''' and '''solver''' to '''lbfgs'''.
We also set '''max underscore iter''' to '''1000''' to ensure convergence.
Now we train the model using the '''fit''' method on the training data.
Ignore the warning in the output cell, if any.
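A runnable sketch of this cell, assuming the split from the previous step:
<pre>
from sklearn.linear_model import LogisticRegression

# Multinomial (softmax) logistic regression; the lbfgs solver supports
# the multinomial loss, and max_iter=1000 gives it room to converge
mlr = LogisticRegression(multi_class='multinomial', solver='lbfgs',
                         max_iter=1000)
mlr.fit(X_train, Y_train)
</pre>
Recent '''scikit-learn''' versions may emit a deprecation warning for the '''multi_class''' parameter; this is likely the warning referred to above.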
|-
|| Highlight
'''Y_train_pred = mlr.predict(X_train)'''
|| Now, we calculate and print the '''training accuracy'''.
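A sketch of the accuracy computation, assuming the trained model '''mlr''':
<pre>
from sklearn.metrics import accuracy_score

# Fraction of training samples whose predicted label matches the true label
Y_train_pred = mlr.predict(X_train)
train_accuracy = accuracy_score(Y_train, Y_train_pred)
print(f"Training Accuracy: {train_accuracy:.3f}")
</pre>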
|-
|| Highlight '''Training Accuracy: 0.981'''
|| The '''training accuracy''' is approximately '''0.981''', which is quite good.
|-
|| Highlight
'''Train Log Loss: 0.1308'''
|| Next, we calculate the '''cross-entropy loss''' for the training data.
A loss of '''0.1308''' shows the model is making accurate predictions.
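A sketch of the loss computation; note that '''log_loss''' needs predicted class probabilities rather than hard labels:
<pre>
from sklearn.metrics import log_loss

# Cross-entropy (log) loss over the predicted class probabilities
Y_train_pred_proba = mlr.predict_proba(X_train)
train_loss = log_loss(Y_train, Y_train_pred_proba)
print(f"Train Log Loss: {train_loss:.4f}")
</pre>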
|-
|| Highlight
'''plt.figure(figsize=(8, 6))'''
'''for i in range(Y_train_pred_proba.shape[1]):  # Iterate over each class'''
'''fpr, tpr, _ = roc_curve(Y_train == i, Y_train_pred_proba[:, i])  # One-vs-rest for each class'''
'''roc_auc = auc(fpr, tpr)'''
|| Let us now plot the '''ROC curve''' and calculate the '''ROC-AUC score'''.
The '''ROC curve''' shows '''TPR''' versus '''FPR''' at various threshold values.
'''TPR''' stands for '''True Positive Rate''', that is, '''recall'''.
It is the fraction of actual positives correctly identified.
'''FPR''' stands for '''False Positive Rate'''.
It is the fraction of actual negatives wrongly classified as positives.
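A runnable sketch of the one-vs-rest ROC plot; the labels and figure size are illustrative:
<pre>
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

plt.figure(figsize=(8, 6))
for i in range(Y_train_pred_proba.shape[1]):  # one curve per class
    # Treat class i as positive and the other two classes as negative
    fpr, tpr, _ = roc_curve(Y_train == i, Y_train_pred_proba[:, i])
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, label=f"Class {i} (AUC = {roc_auc:.2f})")

plt.plot([0, 1], [0, 1], linestyle='--')  # chance-level reference line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
</pre>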
|-
|| Show output plot
|| The '''ROC curve''' shows near-perfect classification.
The curves stay close to the '''top-left corner'''.
All three classes achieve an '''AUC''' of '''1.00'''.
This indicates the model effectively distinguishes the classes.
|-
|| Highlight: '''Predictions for Test Data'''
|| Further, we predict labels for '''X underscore test'''.
|-
|| Highlight '''test_data = X_test[15].reshape(1, -1)'''
'''predicted_class = mlr.predict(test_data)'''
|| We test the model on a single sample, similar to binary classification.
We compare the predicted class with the actual test class.
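A sketch of this single-sample check; index '''15''' follows the code shown above:
<pre>
# A single sample must be reshaped to a 2D array of shape (1, n_features)
test_data = X_test[15].reshape(1, -1)

predicted_class = mlr.predict(test_data)[0]
actual_class = Y_test[15]
print(f"Predicted class: {predicted_class}, Actual class: {actual_class}")
</pre>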
|-
|| Highlight: '''Predicted class: 0, Actual class: 0'''
|| The predicted value is 0, which is '''Setosa'''.
The actual value is also 0; hence, the prediction is correct.
|-
|| Highlight '''Y_pred = mlr.predict(X_test)'''
|| '''Y underscore pred''' stores the predicted labels for all test samples.
|-
|| Highlight: '''print("Multiclass classification - Actual vs Predicted:")'''
|| We compare the actual class labels with the predicted labels.
|-
|| Highlight: '''Multiclass Logistic Regression - Actual vs Predicted:'''
|| The output shows both the actual and predicted label arrays.
|-
|| Highlight: '''test_accuracy = accuracy_score(Y_test, Y_pred)'''
'''print(f"Test Accuracy: {test_accuracy:.3f}")'''
|| Now we calculate the '''test accuracy'''.
|-
|| Highlight: '''Test Accuracy: 0.978'''
|| We get an accuracy of approximately '''0.978''', which is pretty good.
|-
|| Highlight
'''# Predict probabilities for test set'''
'''Y_test_pred_proba = mlr.predict_proba(X_test)'''
|| We also compute the '''ROC-AUC score''' and '''cross-entropy loss''' for the test data.
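A sketch of the test-set metrics; '''roc_auc_score''' with '''multi_class''' set to '''ovr''' averages the one-vs-rest AUC over the three classes:
<pre>
from sklearn.metrics import roc_auc_score, log_loss

# Predict probabilities for test set
Y_test_pred_proba = mlr.predict_proba(X_test)

test_roc_auc = roc_auc_score(Y_test, Y_test_pred_proba, multi_class='ovr')
test_loss = log_loss(Y_test, Y_test_pred_proba)
print(f"Test ROC-AUC Score (OvR): {test_roc_auc:.4f}")
print(f"Test Log Loss: {test_loss:.4f}")
</pre>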
|-
|| Highlight
'''Test ROC-AUC Score (OvR): 0.9968'''
'''Test Log Loss: 0.1616'''
|| A '''ROC-AUC score''' of '''0.9968''' indicates excellent performance.
A '''cross-entropy loss''' of '''0.1616''' shows the predictions are accurate.
|-
|| Highlight
'''conf_matrix = confusion_matrix(Y_test, Y_pred)'''
|| Let us visualize the '''confusion matrix''' of the model.
It shows how well the model classifies each class.
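A sketch of the confusion-matrix plot, assuming '''Y_pred''' from the earlier cell; the '''seaborn''' heatmap styling is an assumption:
<pre>
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Rows are actual classes, columns are predicted classes
conf_matrix = confusion_matrix(Y_test, Y_pred)

sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted class")
plt.ylabel("Actual class")
plt.show()
</pre>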
|-
|| Show output plot
|| This matrix has three classes: 0, 1, and 2.
The diagonal values represent correct predictions.
One sample from class 1 was incorrectly predicted as class 2.
The absence of other misclassified values indicates that the model performs well.
A '''strong diagonal pattern''' suggests '''high classification accuracy'''.
|-
|| Only narration
|| We have now successfully classified the different Iris flower classes.
This brings us to the end of the tutorial. Let us summarize.
|-
|| Show slide:
'''Summary'''
|| In this tutorial, we have learnt about
* Multiclass classification using '''Logistic Regression'''
|-
|| Show slide:
'''Assignment'''
|| As an assignment, please do the following:
* Generate a '''classification report''' for the test data.
* This shows '''precision''', '''recall''', '''f1-score''', and '''support''' for each class.
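A sketch for this assignment, assuming the trained model and the test split from this tutorial; '''classification_report''' is the standard '''sklearn''' helper for these per-class metrics:
<pre>
from sklearn.metrics import classification_report

# Per-class precision, recall, f1-score, and support for the test set
Y_pred = mlr.predict(X_test)
print(classification_report(Y_test, Y_pred))
</pre>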
|-
|| Show slide:
|| After completing the assignment, the output should match the expected result.
|-
|| Show slide:
'''FOSSEE Forum'''
|| For any general or technical questions on '''Python for Machine Learning''', visit the '''FOSSEE forum''' and post your question.
|-
|| Show slide:
'''Thank You'''
|| This is '''Anvita Thadavoose Manjummel''', a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.
Thanks for joining.
|}