Python-for-Machine-Learning/C2/Logistic-Regression-MultiClass-Classification/English


Visual Cue | Narration
Show slide:

Welcome

Welcome to the Spoken Tutorial on Logistic Regression - Multiclass Classification.
Show slide:

Learning Objectives

In this tutorial, we will learn about
  • Multiclass Classification for Logistic Regression
Show slide:

System Requirements

To record this tutorial, I am using
  • Ubuntu Linux OS version 24.04
  • Jupyter Notebook IDE
Show slide:

Prerequisite

To follow this tutorial,
  • The learner must have basic knowledge of Python.
  • For pre-requisite Python tutorials, please visit this website.
Show slide:

Code files

  • The files used in this tutorial are provided in the Code files link.
  • Please download and extract the files.
  • Make a copy and then use them while practicing.
Show slide:

Iris flower classification

To implement the multiclass classification model, we will
  • Use the Iris dataset to classify iris flowers.
  • To know more about the Iris dataset, please watch the earlier tutorials.
Point to the LR_Multiclass.ipynb LR_Multiclass dot ipynb is the IPython notebook file created for this demonstration.
Press Ctrl+Alt+T keys

Type conda activate ml Press Enter Highlight: (ml)

Let us open the Linux terminal by pressing Ctrl, Alt and T keys together.

Activate the machine learning environment as shown.

Type cd Downloads

Type jupyter notebook

Press Enter

I have saved my code file in the Downloads folder.

Please navigate to the respective folder of your code file location.

Then type, jupyter space notebook and press Enter.

Show Jupyter Notebook Home page:

Double Click on LR_Multiclass.ipynb file

We can see the Jupyter Notebook Home page has opened in the web browser.

Double-click on the LR underscore Multiclass dot ipynb file to open it.

Note that each cell will have the output displayed in this file.

Let us see the implementation of multiclass logistic regression.

Highlight import pandas as pd These are the necessary libraries to be imported for Multiclass classification.
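Only as a sketch, here is the kind of import cell such a notebook typically starts with; the exact list in LR_Multiclass.ipynb may differ.

# Assumed import cell; the notebook's exact list may differ
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss, roc_curve, auc, confusion_matrix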

Highlight: iris = load_iris()

iris.data[:5]

We first load the Iris dataset using the load underscore iris function.

The dataset is stored in the variable iris.

Then we display the first five rows of the feature array using slicing.

Highlight Data Preprocessing Now, let us prepare the data for training.
Highlight

X = iris.data

We create variable X and assign all feature columns to it.
Highlight Y = iris.target Next, we assign the target column to the variable Y.
Highlight df = pd.DataFrame(X, columns=iris.feature_names)

df['target'] = Y

To analyze the data better, we create a DataFrame df using pd dot DataFrame.
Highlight corr_matrix = df[iris.feature_names].corr() We compute correlation values between features of the Iris dataset using df dot corr.

Now, we visualize this correlation using a heatmap.

The heatmap shows how features relate to one another.
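For reference, a minimal sketch of this step; the seaborn heatmap call and its styling are assumptions about how the notebook draws the plot.

# Correlation heatmap sketch; sns.heatmap styling is an assumption
import seaborn as sns
import matplotlib.pyplot as plt
corr_matrix = df[iris.feature_names].corr()        # pairwise feature correlations
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Iris Feature Correlations')
plt.show()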

Highlight Train and Test Split Next, we split the data into training and testing sets.
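A minimal sketch of the split; the test_size and random_state values below are assumptions, not taken from the notebook.

# Hold out a test set; test_size and random_state are assumed values
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42)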
Highlight Model Instantiation of Multiclass Classification and Model training Let us now build a multiclass classification model.
Highlight mlr = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)

mlr.fit(X_train, Y_train)

We create an instance of LogisticRegression from the sklearn library.

Set multi underscore class equals multinomial and solver equals lbfgs.

We also set max underscore iter equals 1000 to ensure convergence.

Now we train the model using the fit method on the training data.

Ignore the warning in the output cell, if any.

Highlight

Y_train_pred = mlr.predict(X_train)

Now, we calculate and print the training accuracy.
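A sketch of this step, assuming accuracy_score from sklearn dot metrics:

# Training accuracy sketch
from sklearn.metrics import accuracy_score
Y_train_pred = mlr.predict(X_train)
train_accuracy = accuracy_score(Y_train, Y_train_pred)
print(f"Training Accuracy: {train_accuracy:.3f}")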
Highlight Training Accuracy: 0.981 The training accuracy is approximately 0.981, which is quite good.
Highlight

Train Log Loss: 0.1308

Next, we calculate the cross-entropy loss for the training data.

A loss of 0.1308 shows the model is making accurate predictions.
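A sketch of the loss computation; log_loss works on predicted class probabilities, which we assume come from predict_proba.

# Cross-entropy (log) loss on the training data
from sklearn.metrics import log_loss
Y_train_pred_proba = mlr.predict_proba(X_train)    # shape (n_samples, 3)
train_loss = log_loss(Y_train, Y_train_pred_proba)
print(f"Train Log Loss: {train_loss:.4f}")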

Highlight

plt.figure(figsize=(8, 6))

for i in range(Y_train_pred_proba.shape[1]):  # Iterate over each class

    fpr, tpr, _ = roc_curve(Y_train == i, Y_train_pred_proba[:, i])  # One-vs-rest for each class

    roc_auc = auc(fpr, tpr)

Let us now plot the ROC curve and calculate the ROC-AUC score.

The ROC curve shows TPR vs FPR at various threshold values.

TPR stands for True Positive Rate, that is, recall.

It is the fraction of actual positives correctly identified.

FPR stands for False Positive Rate.

It is the fraction of actual negatives wrongly classified as positives.
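A runnable sketch that completes the highlighted loop; the plot styling and labels are assumptions about what the notebook draws.

# One-vs-rest ROC curve per class (plot details assumed)
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
plt.figure(figsize=(8, 6))
for i in range(Y_train_pred_proba.shape[1]):       # iterate over each class
    fpr, tpr, _ = roc_curve(Y_train == i, Y_train_pred_proba[:, i])
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, label=f"Class {i} (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle='--')           # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()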

Show output plot
The ROC curve shows near-perfect classification.
The curves stay close to the top-left corner.
All three classes achieve an AUC of 1.00.
This indicates the model effectively distinguishes the classes.
Highlight: Predictions for Test Data Further, we predict labels for X underscore test.
Highlight test_data = X_test[15].reshape(1, -1)

predicted_class = mlr.predict(test_data)

We test the model on a single sample, similar to binary classification.

We compare the predicted class with the actual test class.

Highlight: Predicted class: 0, Actual class: 0 The predicted value is 0, which is Setosa.

The actual value is also 0, hence prediction is correct.
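A sketch of this comparison; the exact print format is an assumption.

# Predict one held-out sample and compare with its true label
test_data = X_test[15].reshape(1, -1)              # a single sample as a 2-D array
predicted_class = mlr.predict(test_data)[0]
actual_class = Y_test[15]
print(f"Predicted class: {predicted_class}, Actual class: {actual_class}")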

Highlight Y_pred = mlr.predict(X_test) Y underscore pred stores the predicted labels for all test samples.
Highlight: print("Multiclass classification - Actual vs Predicted:") We compare the actual class labels with the predicted labels.
Highlight: Multiclass Logistic Regression - Actual vs Predicted: The output shows both actual and predicted label arrays.
Highlight: test_accuracy = accuracy_score(Y_test, Y_pred)

print(f"Test Accuracy: {test_accuracy:.3f}")

Now we calculate the test accuracy.
Highlight: Test Accuracy: 0.978 We get an accuracy of approximately 0.978, which is pretty good.
Highlight

# Predict probabilities for test set

Y_test_pred_proba = mlr.predict_proba(X_test)

We also compute ROC-AUC score and cross-entropy loss for test data.
Highlight

Test ROC-AUC Score (OvR): 0.9968

Test Log Loss: 0.1616

ROC-AUC score of 0.9968 indicates excellent performance.

Cross-entropy loss of 0.1616 shows the predictions are accurate.
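A sketch of how these two metrics can be computed, assuming roc_auc_score with multi_class set to 'ovr':

# Test-set ROC-AUC (one-vs-rest) and log loss
from sklearn.metrics import roc_auc_score, log_loss
Y_test_pred_proba = mlr.predict_proba(X_test)
test_auc = roc_auc_score(Y_test, Y_test_pred_proba, multi_class='ovr')
test_loss = log_loss(Y_test, Y_test_pred_proba)
print(f"Test ROC-AUC Score (OvR): {test_auc:.4f}")
print(f"Test Log Loss: {test_loss:.4f}")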

Highlight

conf_matrix = confusion_matrix(Y_test, Y_pred)

Let us visualize the confusion matrix of the model.

It shows how well the model classifies each class.

Show output plot This matrix has three classes: 0, 1, and 2.

The diagonal values represent correct predictions.

One sample from Class 1 was incorrectly predicted as Class 2.

The absence of other misclassified values indicates that the model performs well.

A strong diagonal pattern suggests high classification accuracy.
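A sketch of the visualization; the seaborn styling is an assumption.

# Confusion matrix heatmap (styling assumed)
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
conf_matrix = confusion_matrix(Y_test, Y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted class')
plt.ylabel('Actual class')
plt.show()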

Only narration Now we have successfully classified different Iris flower classes.

This brings us to the end of the tutorial. Let us summarize.

Show slide:

Summary

In this tutorial, we have learnt about
  • Multiclass Classification for Logistic Regression
Show slide:

Assignment

As an assignment, please do the following:
  • Generate the classification report of the model using the sklearn method, as sketched after this list.
  • Use classification_report from sklearn dot metrics to display the results.
  • The report shows precision, recall, f1-score, and support for each class.
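A minimal sketch of the call the assignment needs; passing target_names for readable labels is an optional assumption.

# Per-class precision, recall, f1-score, and support
from sklearn.metrics import classification_report
print(classification_report(Y_test, Y_pred, target_names=iris.target_names))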
Show slide: After completing the assignment, the output should match the expected result.
Show Slide:
FOSSEE Forum
For any general or technical questions on Python for Machine Learning, visit the FOSSEE forum and post your question.

Show Slide:
Thank You
This is Anvita Thadavoose Manjummel, a FOSSEE Summer Fellow 2025, IIT Bombay, signing off.
Thanks for joining.
