F-score
The F-score is a metric used to evaluate classification models. It’s calculated using a model’s precision and recall. The F-score ranges from 0 (the worst performance) to 1 (the best performance).
F1 score
The F1 score is the harmonic mean (a type of average) of precision and recall.
The F1 score gives equal weight to precision and recall.
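For instance, with a hypothetical precision of 0.75 and recall of 0.60 (values made up purely for illustration), the harmonic mean can be computed directly:

# Hypothetical precision and recall values, chosen only for illustration
precision = 0.75
recall = 0.60

# Harmonic mean of two numbers a and b: 2ab / (a + b)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.667, slightly below the arithmetic mean of 0.675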
Fβ score
The Fβ score is a generalized F-score that lets you choose how much weight to give to precision versus recall using a parameter, β.
In the Fβ score, recall is β times as important as precision. The larger the β, the more importance we give to recall and the less importance we give to precision. Two common values for β are 2, which gives recall twice as much weight as precision, and 0.5, which gives recall half as much weight as precision.
The F1 score, which gives equal weight to precision and recall, is the Fβ score with a β of 1.
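As a rough sketch of how β shifts the balance (reusing the same made-up precision of 0.75 and recall of 0.60 from the F1 example above), the general formula can be written out directly:

# The same hypothetical precision and recall as in the F1 example above
precision = 0.75
recall = 0.60

def f_beta(precision, recall, beta):
    # General F-beta formula: (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(round(f_beta(precision, recall, beta=1), 3))    # 0.667, identical to the F1 score
print(round(f_beta(precision, recall, beta=2), 3))    # 0.625, pulled toward recall (0.60)
print(round(f_beta(precision, recall, beta=0.5), 3))  # 0.714, pulled toward precision (0.75)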
Computing F-scores in scikit-learn
import numpy as np
from sklearn.metrics import f1_score, fbeta_score

# Ground-truth labels and the model's predictions for 20 examples
y_true = np.array([1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1])

# F1 weights precision and recall equally
f1 = f1_score(y_true, y_pred)
# F0.5 weights recall half as much as precision
f_half = fbeta_score(y_true, y_pred, beta=0.5)
# F2 weights recall twice as much as precision
f2 = fbeta_score(y_true, y_pred, beta=2)

print(f"F1 score: {f1:.2f}")
print(f"F0.5 score: {f_half:.2f}")
print(f"F2 score: {f2:.2f}")