This is the second blog post on Logistic Regression; the first post is available here. That post discussed the formulation of Logistic Regression and the estimation of its parameters using Maximum Likelihood Estimation. In this post, I shall discuss some other aspects of Logistic Regression.
Handling Multi-class problems
Logistic regression is, by default, a binary classifier, but with some modifications it can also be adapted to multiclass classification problems. The most commonly used methods are:
- One versus Rest method
- Multinomial Logistic Regression (with deviance loss)
One versus Rest method
This method is intuitive in nature. As the name suggests, one class is treated as the positive class and all the remaining classes are grouped into the negative class, making the problem binomial. Hence, if there are K classes, K logistic regression models are built. When an input is given, it is passed through all the models, and each model predicts the probability of its corresponding class. Finally, the class with the highest of the K probabilities is chosen. Mathematically, if for the $i$-th data point the predicted class is $k$, then

$$k = \arg\max_{j \in \{1, \dots, K\}} P(y_i = j \mid x_i)$$
Correspondingly, the associated probability is calculated by the $j$-th binary model using the sigmoid function, as per the equation given below.

$$P(y_i = j \mid x_i) = \frac{1}{1 + e^{-(w_j^T x_i + b_j)}}$$
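To make this concrete, below is a minimal hand-rolled sketch of One versus Rest: K binary models are fitted, and the class with the highest predicted probability wins. The dataset and its parameters are illustrative assumptions, not from a real problem.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic dataset with 3 classes
X, y = make_classification(n_samples=500, n_features=5, n_informative=5,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

K = len(np.unique(y))
models = []
for k in range(K):
    # Relabel: class k is positive (1), all other classes negative (0)
    y_k = (y == k).astype(int)
    models.append(LogisticRegression().fit(X, y_k))

# P(class k | x) from each of the K binary models
probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
y_pred = np.argmax(probs, axis=1)  # choose the most probable class
print("Training accuracy:", (y_pred == y).mean())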
Multinomial Logistic Regression (with deviance loss)
Another method of performing multiclass classification with logistic regression is to apply a numerical optimisation algorithm to a single loss function. In this approach, K separate models are not built for K classes. Instead, all classes are modelled jointly using the softmax function. Suppose $W \in \mathbb{R}^{K \times d}$ is the weight matrix ($d$ being the dimension of the data) and $b \in \mathbb{R}^K$ is the bias. In that case, a score function is defined for every class as

$$z = Wx + b$$

If $w_k$ denotes the parameters associated with the $k$-th class, the score for that class is $z_k = w_k^T x + b_k$. These scores are converted to probabilities using the softmax function, as shown below.

$$P(y = k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$$
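Before looking at the full example, a minimal numerical sketch of this score-to-probability step may help. The W, b, and x below are made-up illustrative values, not fitted parameters.

import numpy as np

# 3 classes, 4-dimensional input (illustrative sizes)
K, d = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(K, d))      # weight matrix, one row w_k per class
b = rng.normal(size=K)           # bias vector
x = rng.normal(size=d)           # a single input

z = W @ x + b                    # scores z_k = w_k^T x + b_k
z = z - z.max()                  # subtract the max for numerical stability
p = np.exp(z) / np.exp(z).sum()  # softmax: P(y = k | x)
print(p, p.sum())                # the probabilities sum to 1

The full example below fits both OvR and multinomial (softmax) logistic regression with scikit-learn on a synthetic dataset and visualizes the learned coefficients as heat maps.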
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=5,
                           n_redundant=0, n_classes=4,
                           n_clusters_per_class=1, random_state=42)

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fit One-vs-Rest Logistic Regression
ovr_model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=1000)
ovr_model.fit(X_scaled, y)

# Fit Softmax (Multinomial) Logistic Regression
softmax_model = LogisticRegression(multi_class='multinomial', solver='lbfgs',
                                   max_iter=1000)
softmax_model.fit(X_scaled, y)

# Plot heatmap of feature importance for OvR
plt.figure(figsize=(10, 4))
sns.heatmap(ovr_model.coef_, annot=True, cmap='coolwarm', center=0,
            xticklabels=[f'Feature {i}' for i in range(X.shape[1])],
            yticklabels=[f'Class {i}' for i in range(ovr_model.coef_.shape[0])])
plt.title("Feature Importance - One-vs-Rest Logistic Regression")
plt.xlabel("Features")
plt.ylabel("Classes")
plt.tight_layout()
plt.show()

# Plot heatmap of feature importance for Softmax
plt.figure(figsize=(10, 4))
sns.heatmap(softmax_model.coef_, annot=True, cmap='coolwarm', center=0,
            xticklabels=[f'Feature {i}' for i in range(X.shape[1])],
            yticklabels=[f'Class {i}' for i in range(softmax_model.coef_.shape[0])])
plt.title("Feature Importance - Softmax (Multinomial) Logistic Regression")
plt.xlabel("Features")
plt.ylabel("Classes")
plt.tight_layout()
plt.show()
The corresponding heat maps are shown below.
As per the above plots, if feature 0 or feature 4 is increased by one unit, the log odds for class 2 increase by 0.92 and 0.77, respectively, when softmax logistic regression is used. The other coefficients can be interpreted in the same way for the different classes, for both OvR and softmax logistic regression.
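As a quick sanity check of this interpretation, exponentiating a coefficient gives the multiplicative change in the odds. Here is a minimal sketch using the 0.92 value quoted above:

import numpy as np

# The coefficient is the additive change in log odds per unit increase
# of the feature; exponentiating gives the multiplicative change in odds.
coef_feature0_class2 = 0.92          # coefficient quoted in the text above
print(np.exp(coef_feature0_class2))  # odds multiply by ~2.51 per unit increase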
Conclusion
This post discussed the theoretical aspects of multiclass classification using logistic regression, along with Python code and feature importance plots for better understanding. In the next post, I shall discuss the metrics used to measure a model's performance.