Logistic Regression
Logistic regression is a regression algorithm that predicts a probability. It applies a linear function of input features to generate raw predictions and then passes them through a sigmoid function to produce values between 0 and 1.
We can represent a logistic regression model with the following equation:
y' = 1 / (1 + e^(-z))
where:
- y' is the output of the logistic regression model (a probability between 0 and 1)
- z is a linear combination of the input features, b + w1x1 + ... + wnxn
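As a rough illustration (not part of the original page), the sketch below computes this by hand in plain Python; the bias b, weights w, and feature values x are made-up numbers chosen only for the example:
import math

def sigmoid(z):
    # Squash the raw linear prediction into the range (0, 1).
    return 1 / (1 + math.exp(-z))

# Hypothetical bias, weights, and feature values for one example.
b = -1.5
w = [0.8, 2.0]
x = [1.0, 0.5]

# z = b + w1*x1 + ... + wn*xn
z = b + sum(wi * xi for wi, xi in zip(w, x))
probability = sigmoid(z)  # roughly 0.57 for these made-up numbers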
Though logistic regression is a regression model, it's commonly used for classification tasks, wherein its output represents the probability that an example belongs to a given class. For instance, in binary classification, the output of a logistic regression model represents the probability that an example belongs to the positive class.
To perform binary classification, we choose a classification threshold (e.g., 0.5) and convert the model's output to a positive or negative label based on that threshold. If the model's output is greater than or equal to the threshold (e.g., 0.75), we classify the example as positive; if it is less than the threshold (e.g., 0.25), we classify the example as negative.
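Continuing the sketch above (again with made-up probability values, not from the original page), converting probabilities to class labels with a 0.5 threshold could look like this:
# Hypothetical model outputs: probabilities of the positive class.
probabilities = [0.75, 0.25, 0.50]

threshold = 0.5
labels = [1 if p >= threshold else 0 for p in probabilities]
print(labels)  # [1, 0, 1] -- 0.50 meets the "greater than or equal" rule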
Logistic regression in scikit-learn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset with one informative feature.
X, y = make_classification(
    random_state=4, n_features=1, n_redundant=0, n_informative=1,
    n_clusters_per_class=1)

# Hold out 20% of the examples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Fit the logistic regression model and report mean accuracy on the test set.
clf = LogisticRegression().fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(f"Mean accuracy: {score:.2f}")
Portions of this page are reproduced from work created and shared by Google and used according to terms described in the Creative Commons 4.0 Attribution License.