F-score

The F-score is a metric used to evaluate classification models. It’s calculated using a model’s precision and recall. The F-score ranges from 0 (the worst performance) to 1 (the best performance).

F1 score

The F1 score is the harmonic mean (a type of average) of precision and recall.

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

The F1 score gives equal weight to precision and recall.
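
As an illustrative example, a model with precision 1.0 but recall 0.5 gets

$$F_1 = 2 \cdot \frac{1.0 \cdot 0.5}{1.0 + 0.5} \approx 0.67$$

which is noticeably lower than the arithmetic mean of 0.75: the harmonic mean is pulled toward the smaller value, so a model can't earn a high F1 score by doing well on only one of precision and recall.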

Fβ score

The Fβ score is a generic F-score that lets you choose how much weight to give to precision and recall using a β parameter.

$$F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{(\beta^2 \cdot \text{precision}) + \text{recall}}$$

In the Fβ score, recall is β times as important as precision. The larger the β, the more importance we give to recall and the less importance we give to precision. Two common values for β are 2, which gives recall twice as much weight as precision, and 0.5, which gives recall half as much weight as precision.

The F1 score, which gives equal weight to precision and recall, is the Fβ score with a β of 1.
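
Here is a minimal sketch of the formula in plain Python; the fbeta helper and the precision/recall values are made up for illustration and are not part of scikit-learn:

def fbeta(precision, recall, beta):
    # Weighted harmonic mean of precision and recall:
    # recall counts beta times as much as precision.
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

precision, recall = 0.8, 0.6  # illustrative values

print(fbeta(precision, recall, beta=0.5))  # ~0.75, leans toward the higher precision
print(fbeta(precision, recall, beta=1))    # ~0.69, equal weight: the F1 score
print(fbeta(precision, recall, beta=2))    # ~0.63, leans toward the lower recall

With precision above recall, raising β (weighting recall more) lowers the score, and lowering β raises it.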

Computing F-scores in scikit-learn

import numpy as np
from sklearn.metrics import f1_score, fbeta_score

y_true = np.array([1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1])

f1 = f1_score(y_true, y_pred)
f_half = fbeta_score(y_true, y_pred, beta=0.5)
f2 = fbeta_score(y_true, y_pred, beta=2)

print(f"F1 score: {f1:.2f}")
print(f"F0.5 score: {f_half:.2f}")
print(f"F2 score: {f2:.2f}")
F1 score: 0.90
F0.5 score: 0.96
F2 score: 0.85
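
To see where these numbers come from, it helps to look at precision and recall for the same predictions, using scikit-learn's precision_score and recall_score:

from sklearn.metrics import precision_score, recall_score

precision = precision_score(y_true, y_pred)  # 1.00: every predicted positive is correct
recall = recall_score(y_true, y_pred)        # ~0.82: 2 of the 11 actual positives are missed

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")

Because precision is higher than recall here, the precision-weighted F0.5 score (0.96) lands above the F1 score (0.90), and the recall-weighted F2 score (0.85) lands below it.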
