Logistic Regression
Logistic regression is a regression algorithm that predicts a probability. It applies a linear function of input features to generate raw predictions and then passes them through a sigmoid function to produce values between 0 and 1.
We can represent a logistic regression model with the following equation:
y' = 1 / (1 + e^(-z))
where:
- y' is the output of the logistic regression model (a probability between 0 and 1)
- z is a linear combination of the input features, b + w1x1 + ... + wnxn
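As a rough illustration (not part of the original page), the sketch below computes this by hand in plain Python; the bias b, weights w, and feature values x are made-up numbers chosen only for the example:
import math

def sigmoid(z):
    # Squash the raw linear prediction into the range (0, 1).
    return 1 / (1 + math.exp(-z))

# Hypothetical bias, weights, and feature values for one example.
b = -1.5
w = [0.8, 2.0]
x = [1.0, 0.5]

# z = b + w1*x1 + ... + wn*xn
z = b + sum(wi * xi for wi, xi in zip(w, x))
probability = sigmoid(z)  # roughly 0.57 for these made-up numbers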
Though logistic regression is a regression model, it's commonly used for classification tasks, wherein its output represents the probability that an example belongs to a given class. For instance, in binary classification, the output of a logistic regression model represents the probability that an example belongs to the positive class.
To perform binary classification, we choose a classification threshold (e.g., 0.5) and convert the model's output to a positive or negative label based on that threshold. If the model's output is greater than or equal to the threshold (e.g., 0.75), we classify the example as positive; if it is less than the threshold (e.g., 0.25), we classify the example as negative.
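Continuing the sketch above (again with made-up probability values, not from the original page), converting probabilities to class labels with a 0.5 threshold could look like this:
# Hypothetical model outputs: probabilities of the positive class.
probabilities = [0.75, 0.25, 0.50]

threshold = 0.5
labels = [1 if p >= threshold else 0 for p in probabilities]
print(labels)  # [1, 0, 1] -- 0.50 meets the "greater than or equal" rule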
Logistic regression in scikit-learn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset with one informative feature.
X, y = make_classification(
    random_state=4, n_features=1, n_redundant=0, n_informative=1,
    n_clusters_per_class=1)

# Hold out 20% of the examples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Fit the logistic regression model and report mean accuracy on the test set.
clf = LogisticRegression().fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(f"Mean accuracy: {score:.2f}")
Portions of this page are reproduced from work created and shared by Google and used according to terms described in the Creative Commons 4.0 Attribution License.