60 changes: 42 additions & 18 deletions ignite/metrics/cohen_kappa.py
@@ -7,10 +7,11 @@


class CohenKappa(EpochMetric):
"""Compute different types of Cohen's Kappa: Non-Wieghted, Linear, Quadratic.
Accumulating predictions and the ground-truth during an epoch and applying
`sklearn.metrics.cohen_kappa_score <https://scikit-learn.org/stable/modules/
generated/sklearn.metrics.cohen_kappa_score.html>`_ .
"""Compute different types of Cohen's Kappa: Non-Weighted, Linear, Quadratic.
Accumulating predictions and the ground-truth during an epoch and computing
Cohen's Kappa using native PyTorch operations via the formula:
κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e
is the expected agreement computed from marginal probabilities.

Args:
output_transform: a callable that is used to transform the
@@ -19,10 +20,9 @@ class CohenKappa(EpochMetric):
you want to compute the metric with respect to one of the outputs.
weights: a string is used to define the type of Cohen's Kappa whether Non-Weighted or Linear
or Quadratic. Default, None.
check_compute_fn: Default False. If True, `cohen_kappa_score
<https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html>`_
is run on the first batch of data to ensure there are
no issues. User will be warned in case there are any issues computing the function.
check_compute_fn: Default False. If True, the compute function is run on the first batch
of data to ensure there are no issues. User will be warned in case there are any issues
computing the function.
device: optional device specification for internal storage.
skip_unrolling: specifies whether output should be unrolled before being fed to update method. Should be
true for multi-output model, for example, if ``y_pred`` contains multi-output as ``(y_pred_a, y_pred_b)``
@@ -31,7 +31,7 @@ class CohenKappa(EpochMetric):
Examples:
To use with ``Engine`` and ``process_function``, simply attach the metric instance to the engine.
The output of the engine's ``process_function`` needs to be in the format of
``(y_pred, y)`` or ``{'y_pred': y_pred, 'y': y, ...}``. If not, ``output_tranform`` can be added
``(y_pred, y)`` or ``{'y_pred': y_pred, 'y': y, ...}``. If not, ``output_transform`` can be added
to the metric to transform the output into the form expected by the metric.

.. include:: defaults.rst
@@ -52,6 +52,9 @@ class CohenKappa(EpochMetric):

.. versionchanged:: 0.5.1
``skip_unrolling`` argument is added.

.. versionchanged:: 0.6.0
Replaced scikit-learn dependency with a native PyTorch implementation.
"""

def __init__(
@@ -62,10 +65,6 @@ def __init__(
device: str | torch.device = torch.device("cpu"),
skip_unrolling: bool = False,
):
try:
from sklearn.metrics import cohen_kappa_score # noqa: F401
except ImportError:
raise ModuleNotFoundError("This contrib module requires scikit-learn to be installed.")
if weights not in (None, "linear", "quadratic"):
raise ValueError("Kappa Weighting type must be None or linear or quadratic.")

@@ -81,8 +80,33 @@ def __init__(
)

def _cohen_kappa_score(self, y_targets: torch.Tensor, y_preds: torch.Tensor) -> float:
from sklearn.metrics import cohen_kappa_score

y_true = y_targets.cpu().numpy()
y_pred = y_preds.cpu().numpy()
return cohen_kappa_score(y_true, y_pred, weights=self.weights)
if y_targets.ndim > 1 or y_preds.ndim > 1:
raise ValueError("multilabel-indicator is not supported")
n_classes = int(max(y_targets.max().item(), y_preds.max().item())) + 1

indices = y_targets * n_classes + y_preds
conf = torch.bincount(indices, minlength=n_classes * n_classes).reshape(n_classes, n_classes).double()
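The bincount line above relies on flattening each (true, pred) pair into a single bin index `true * n_classes + pred`, so one histogram pass over the flattened indices yields the full confusion matrix. A framework-free sketch of the same idea (the helper name is illustrative, plain Python standing in for the torch ops):

```python
from collections import Counter

def confusion_via_flat_index(y_true, y_pred, n_classes):
    # Flatten each (true, pred) pair into one bin: true * n_classes + pred.
    counts = Counter(t * n_classes + p for t, p in zip(y_true, y_pred))
    # Unflatten the histogram back into an n_classes x n_classes matrix,
    # rows = ground truth, columns = predictions.
    return [[counts.get(i * n_classes + j, 0) for j in range(n_classes)]
            for i in range(n_classes)]
```

For example, `confusion_via_flat_index([0, 0, 1, 1], [0, 0, 1, 0], 2)` gives `[[2, 0], [1, 1]]`: two correct class-0 samples, one class-1 sample predicted as 0, one correct class-1 sample.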
vfdev-5 (Collaborator) commented on Apr 21, 2026:

Why can't we use ignite's ConfusionMatrix class to compute this metric?

If we use ConfusionMatrix, we would need to provide num_classes as an input argument. In order to keep backward compatibility, we can add a num_classes arg to the constructor as an optional kwarg. We can have two private Cohen's Kappa implementations: one using EpochMetric (the current one) and a second using ConfusionMatrix. The public CohenKappa can route between them depending on the num_classes arg.

In the private implementation using EpochMetric, we should still use ConfusionMatrix to compute the confusion matrix in the compute() method instead of doing that manually (as currently), to avoid bugs.

Author replied:

I used bincount because I was familiar with it and planned the implementation that way from the start. At the beginning of this PR the ConfusionMatrix route was not intuitive to me, since it needs num_classes at init, whereas the current implementation infers it dynamically from the data. But as you mention, it is indeed possible.

Author replied:

Also, this method is only better when the user knows num_classes; if not, we would need a fallback that infers it from the data on its own. Both methods have the same time complexity; the advantage of using ConfusionMatrix is better edge-case coverage over the current approach. I'd love to hear what the better approach would be!

vfdev-5 (Collaborator) replied:

The benefit of using ConfusionMatrix is that we do not store the lists of y_preds and y_true in RAM, unlike the current implementation using EpochMetric. The drawback of using ConfusionMatrix is that we would have to specify num_classes in the metric constructor, which is also a backward-compatibility break.

OK, the suggestion for this PR: let's use ConfusionMatrix in _cohen_kappa_score in order to avoid the manual confusion matrix computation. We can instantiate a ConfusionMatrix object with the number of classes, do a single update, and call compute.
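Whichever way the confusion matrix is produced (ignite's ConfusionMatrix as suggested above, or the bincount code in this PR), unweighted kappa then reduces to a few sums over that matrix. A rough plain-Python sketch of that final step (the function name is illustrative, not part of ignite):

```python
def kappa_from_confusion(conf):
    """Unweighted Cohen's kappa from a square confusion matrix (rows = truth)."""
    k = len(conf)
    n = sum(sum(row) for row in conf)
    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(conf[i][i] for i in range(k)) / n
    rows = [sum(row) for row in conf]
    cols = [sum(conf[i][j] for i in range(k)) for j in range(k)]
    # Expected agreement under independent marginal distributions.
    p_e = sum(r * c for r, c in zip(rows, cols)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For `conf = [[2, 0], [1, 1]]`: n = 4, p_o = 3/4, marginals give p_e = (2*3 + 2*1)/16 = 0.5, so kappa = (0.75 - 0.5) / 0.5 = 0.5.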

n = conf.sum()

if self.weights is None:
p_o = conf.trace() / n
row = conf.sum(dim=1)
col = conf.sum(dim=0)
p_e = (row * col).sum() / (n * n)

else:
idx = torch.arange(n_classes, device=y_targets.device)
if self.weights == "linear":
w = torch.abs(idx.unsqueeze(0) - idx.unsqueeze(1)).double()
else:
w = ((idx.unsqueeze(0) - idx.unsqueeze(1)) ** 2).double()

w = w / w.max()
p_o = 1 - (w * conf).sum() / n
row = conf.sum(dim=1)
col = conf.sum(dim=0)
expected = row.unsqueeze(1) * col.unsqueeze(0) / n
p_e = 1 - (w * expected).sum() / n

kappa = (p_o - p_e) / (1 - p_e)
return kappa.item()
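The weighted branch above can be checked by hand on a small example. The sketch below mirrors its logic in plain Python (an illustrative standalone function, not the PR's torch code): build the confusion matrix, build a disagreement-weight matrix normalized to a maximum of 1, then compute weighted observed and expected agreement.

```python
def weighted_kappa(y_true, y_pred, n_classes, power=2):
    """Weighted Cohen's kappa; power=1 gives linear weights, power=2 quadratic."""
    # Confusion matrix, rows = ground truth, columns = predictions.
    conf = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        conf[t][p] += 1.0
    n = float(len(y_true))
    # Disagreement weights |i - j|**power, normalized so the maximum is 1.
    max_w = float((n_classes - 1) ** power)
    w = [[abs(i - j) ** power / max_w for j in range(n_classes)]
         for i in range(n_classes)]
    rows = [sum(r) for r in conf]
    cols = [sum(conf[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Weighted observed and expected agreement, as in the diff's weighted branch.
    p_o = 1 - sum(w[i][j] * conf[i][j]
                  for i in range(n_classes) for j in range(n_classes)) / n
    p_e = 1 - sum(w[i][j] * rows[i] * cols[j] / n
                  for i in range(n_classes) for j in range(n_classes)) / n
    return (p_o - p_e) / (1 - p_e)
```

Worked example with quadratic weights: for `y_true = [0, 1, 2, 1]`, `y_pred = [0, 2, 2, 1]`, `n_classes = 3`, the only error (true 1, predicted 2) carries weight 1/4, giving p_o = 0.9375 and p_e = 0.6875, so kappa = 0.25 / 0.3125 = 0.8.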
13 changes: 0 additions & 13 deletions tests/ignite/metrics/test_cohen_kappa.py
@@ -1,8 +1,6 @@
import os
from unittest.mock import patch

import pytest
import sklearn
import torch
from sklearn.metrics import cohen_kappa_score

@@ -14,17 +12,6 @@
torch.manual_seed(12)


@pytest.fixture()
def mock_no_sklearn():
with patch.dict("sys.modules", {"sklearn.metrics": None}):
yield sklearn


def test_no_sklearn(mock_no_sklearn):
with pytest.raises(ModuleNotFoundError, match=r"This contrib module requires scikit-learn to be installed."):
CohenKappa()


def test_no_update():
ck = CohenKappa()
