F-score
The F-score is a metric used to evaluate classification models. It’s calculated using a model’s precision and recall. The F-score ranges from 0 (the worst performance) to 1 (the best performance).
F1 score
The F1 score is the harmonic mean (a type of average) of precision and recall.
The F1 score gives equal weight to precision and recall.
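For instance, with a hypothetical precision of 0.75 and recall of 0.60 (values made up purely for illustration), the harmonic mean can be computed directly:

# Hypothetical precision and recall values, chosen only for illustration
precision = 0.75
recall = 0.60

# Harmonic mean of two numbers a and b: 2ab / (a + b)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.667, slightly below the arithmetic mean of 0.675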
Fβ score
The Fβ score is a generalized F-score that lets you choose how much weight to give to precision versus recall using a parameter, β.
In the Fβ score, recall is β times as important as precision. The larger the β, the more importance we give to recall and the less importance we give to precision. Two common values for β are 2, which gives recall twice as much weight as precision, and 0.5, which gives recall half as much weight as precision.
The F1 score, which gives equal weight to precision and recall, is the Fβ score with a β of 1.
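As a rough sketch of how β shifts the balance (reusing the same made-up precision of 0.75 and recall of 0.60 from the F1 example above), the general formula can be written out directly:

# The same hypothetical precision and recall as in the F1 example above
precision = 0.75
recall = 0.60

def f_beta(precision, recall, beta):
    # General F-beta formula: (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(round(f_beta(precision, recall, beta=1), 3))    # 0.667, identical to the F1 score
print(round(f_beta(precision, recall, beta=2), 3))    # 0.625, pulled toward recall (0.60)
print(round(f_beta(precision, recall, beta=0.5), 3))  # 0.714, pulled toward precision (0.75)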
Computing F-scores in scikit-learn
import numpy as np
from sklearn.metrics import f1_score, fbeta_score

# Ground-truth labels and the model's predictions for 20 examples
y_true = np.array([1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1])

# F1 weights precision and recall equally
f1 = f1_score(y_true, y_pred)
# F0.5 weights recall half as much as precision
f_half = fbeta_score(y_true, y_pred, beta=0.5)
# F2 weights recall twice as much as precision
f2 = fbeta_score(y_true, y_pred, beta=2)

print(f"F1 score: {f1:.2f}")
print(f"F0.5 score: {f_half:.2f}")
print(f"F2 score: {f2:.2f}")