
Remove sklearn dependency from CohenKappa score calculation logic, apply custom calculation, and update tests #3731

Open
avishkarsonni wants to merge 7 commits into pytorch:master from avishkarsonni:master

Conversation


@avishkarsonni avishkarsonni commented Apr 14, 2026

Fixes #3701

Description:

Remove scikit-learn dependency from CohenKappa by implementing a native PyTorch version.

  • Replaced sklearn.metrics.cohen_kappa_score with a pure PyTorch implementation (see the sketch below)
  • Removed the forced GPU→CPU transfer (.cpu().numpy()) — the metric now runs fully on the configured device
  • Built the confusion matrix with torch.bincount (a single vectorised kernel) instead of a Python loop
  • Used float64 throughout to match sklearn's numerical precision
  • Added explicit multilabel input validation (previously handled implicitly by sklearn)
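
For reference, a minimal sketch of the approach (simplified and hypothetical, not the exact PR code; the function name cohen_kappa is illustrative): the confusion matrix is built with a single torch.bincount call and the standard kappa formula is applied.

```python
import torch

def cohen_kappa(y_targets: torch.Tensor, y_preds: torch.Tensor) -> float:
    # Infer the number of classes dynamically from the data.
    n_classes = int(max(y_targets.max().item(), y_preds.max().item())) + 1
    # Encode each (target, pred) pair as one flat index, then histogram it:
    # a single vectorised bincount yields the full confusion matrix.
    indices = y_targets.long() * n_classes + y_preds.long()
    conf = torch.bincount(indices, minlength=n_classes * n_classes)
    conf = conf.reshape(n_classes, n_classes).double()
    n = conf.sum()
    p_o = conf.diagonal().sum() / n                    # observed agreement
    p_e = (conf.sum(0) * conf.sum(1)).sum() / (n * n)  # chance agreement
    return ((p_o - p_e) / (1 - p_e)).item()
```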

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@github-actions github-actions bot added the module: metrics (Metrics module) label Apr 14, 2026
@aaishwarymishra
Collaborator

Thanks for the PR! This metric inherited from EpochMetric because we needed to send all the values to scikit-learn. Now that we are removing that dependency, I think we should inherit from Metric (or a subclass of it) and override its update method to efficiently accumulate

po: relative observed agreement among raters
pe: hypothetical probability of chance agreement

instead of storing raw tensor values for the final computation in compute.
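
A minimal sketch of this suggestion, assuming a fixed num_classes (the class name and structure are illustrative, not the final PR code; ignite's distributed-synchronisation decorators are omitted for brevity): update() only accumulates confusion-matrix counts, and compute() derives po and pe from them.

```python
import torch
from ignite.metrics import Metric

class CohenKappaSketch(Metric):
    def __init__(self, num_classes: int, device="cpu"):
        self.num_classes = num_classes
        super().__init__(device=device)

    def reset(self) -> None:
        # Running confusion-matrix counts; no raw tensors are stored.
        self._conf = torch.zeros(
            self.num_classes, self.num_classes, dtype=torch.long, device=self._device
        )

    def update(self, output) -> None:
        y_pred, y = output
        indices = y.long() * self.num_classes + y_pred.long()
        self._conf += torch.bincount(
            indices, minlength=self.num_classes**2
        ).reshape(self.num_classes, self.num_classes).to(self._device)

    def compute(self) -> float:
        conf = self._conf.double()
        n = conf.sum()
        p_o = conf.diagonal().sum() / n                    # po
        p_e = (conf.sum(0) * conf.sum(1)).sum() / (n * n)  # pe
        return ((p_o - p_e) / (1 - p_e)).item()
```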

@avishkarsonni
Author

So there is no need to bring all the raw tensors into memory, because there is no scikit-learn now. Got it.

@aaishwarymishra
Collaborator

aaishwarymishra commented Apr 16, 2026

@avishkarsonni yes, instead of storing raw tensors in update we can just update the variables used for the calculation of po and pe; in compute we derive these values from those variables and calculate the metric.
This saves memory, as we are not storing tensors, and compute, as we are not performing one big calculation on huge tensors at the end.
You can look at other metrics for a reference on how this is done without EpochMetric.

@Prathamesh8989
Contributor

Nice implementation! Using torch.bincount for the confusion matrix is a good approach and keeps the computation on GPU.

I had a couple of small observations:

  • It may be safer to guard against the case where p_e == 1, since (1 - p_e) would cause a division by zero.
  • If n == 0 (an empty batch), the computation would also fail with a division by zero.
  • It might be worth explicitly ensuring the confusion matrix tensor stays on the same device as the inputs.

Overall the implementation looks clean and matches the expected Cohen’s Kappa formulation.
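
A hedged sketch of guards for those two edge cases (conf is assumed to be the confusion-matrix tensor from the snippets above; returning 0.0 when p_e == 1 is one possible convention, not necessarily sklearn's behaviour):

```python
import torch

def safe_kappa(conf: torch.Tensor) -> float:
    n = conf.sum()
    if n == 0:
        # Empty input: kappa is undefined with no samples.
        raise ValueError("CohenKappa must have at least one example before it can be computed.")
    p_o = conf.diagonal().sum() / n
    p_e = (conf.sum(0) * conf.sum(1)).sum() / (n * n)
    if p_e == 1:
        # Degenerate marginals (all targets and all predictions fall in a
        # single class each): (1 - p_e) is zero, so return 0.0 by convention.
        return 0.0
    return ((p_o - p_e) / (1 - p_e)).item()
```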

Comment thread ignite/metrics/cohen_kappa.py Outdated
```python
n_classes = int(max(y_targets.max().item(), y_preds.max().item())) + 1

indices = y_targets * n_classes + y_preds
conf = torch.bincount(indices, minlength=n_classes * n_classes).reshape(n_classes, n_classes).double()
```
Collaborator

@vfdev-5 vfdev-5 Apr 21, 2026

Why can't we use ignite's ConfusionMatrix class to compute this metric?

If we use ConfusionMatrix, we have to provide num_classes as an input argument... In order to keep backward compatibility, we can add num_classes to the constructor as an optional kwarg.
We can have two private Cohen kappa implementations: one using EpochMetric (the current one) and a second using ConfusionMatrix. The public CohenKappa can route between them depending on the num_classes arg.

In the private implementation using EpochMetric we should still use ConfusionMatrix to compute the confusion matrix in the compute() method instead of doing it manually (as currently) to avoid bugs.
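
A hypothetical sketch of that routing (the private names _CohenKappaEpochMetric and _CohenKappaConfusionMatrix are illustrative placeholders, not from the PR):

```python
def CohenKappa(output_transform=lambda x: x, num_classes=None, device="cpu"):
    if num_classes is None:
        # Backward-compatible path: EpochMetric-based, stores all outputs
        # and infers the number of classes from the data at compute time.
        return _CohenKappaEpochMetric(output_transform=output_transform, device=device)
    # Memory-efficient path: accumulates an ignite ConfusionMatrix incrementally.
    return _CohenKappaConfusionMatrix(
        num_classes=num_classes, output_transform=output_transform, device=device
    )
```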

Author

I used bincount because I was familiar with it and had planned it that way from the start. At the beginning of this PR using ConfusionMatrix was not intuitive to me, since it needs num_classes at init while the current impl infers it dynamically from the data, but it is indeed possible.

Author

And again, this method is better only when the user knows num_classes; if not, we will need a fallback that infers it from the data. Both methods have the same time complexity; the advantage of using ConfusionMatrix is edge-case coverage over the current approach. Would love to hear what would be the better approach!

Collaborator

The benefit of using ConfusionMatrix is that we do not store the lists of y_preds and y_true in RAM, versus the current implementation using EpochMetric. The drawback is that we have to specify num_classes in the metric constructor, which is also a backward-compatibility break.

OK, the suggestion for this PR: let's use ConfusionMatrix in _cohen_kappa_score in order to avoid the manual confusion-matrix computation. We can instantiate a ConfusionMatrix object with the number of classes, do a single update, and call compute.

Comment thread ignite/metrics/cohen_kappa.py Outdated
```python
cm = ConfusionMatrix(num_classes=num_classes, device=y_pred.device)
y_pred_oh = F.one_hot(y_pred.long(), num_classes).float()
cm.update((y_pred_oh, y.long()))
conf = cm.compute().double()
```
Collaborator

Casting to double like this won't work on MPS.
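
For context (an assumption based on general PyTorch behaviour, not output from this PR's CI): MPS tensors cannot be cast to float64, so a call like this fails on Apple silicon.

```python
import torch

if torch.backends.mps.is_available():
    t = torch.ones(4, device="mps")
    # Raises a TypeError: the MPS backend does not support float64.
    t = t.double()
```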

Author

But then the precision issue comes up, since MPS needs float32 while CUDA goes for float64. I've been picking at this issue for some time. Can I use torch.promote_types to pick the highest supported precision, or would that create overhead? Or is there another way, like a try/except that falls back to .float() if .double() is rejected?

Collaborator

@aaishwarymishra aaishwarymishra May 1, 2026

Having different precisions for MPS and CUDA will be weird; either we can stick with .float(), or convert the GPU tensors to CPU tensors and then call .double().

Author

The second one is what I am going with; it does well with the better precision.

Collaborator

I suggest using different precisions, double for cpu/cuda and float on mps (as it does not support double), via the metric's _double_dtype attribute:

```python
self._double_dtype = torch.float64
if self._device.type == "mps":
    self._double_dtype = torch.float32
```

Collaborator

@vfdev-5 vfdev-5 left a comment

@avishkarsonni thanks a lot for updating the code using the suggested approach.
However, I don't totally agree with Aaishwarya about passing to CPU from an accelerator to perform the computation in double. If the device is CUDA, which supports double, passing to CPU is not optimal.
I think we can use _double_dtype for that:

```python
self._double_dtype = torch.float64
if self._device.type == "mps":
    self._double_dtype = torch.float32
```

@aaishwarymishra
Collaborator

aaishwarymishra commented May 5, 2026

> @avishkarsonni thanks a lot for updating the code using the suggested approach. However, I don't totally agree with Aaishwarya about passing to CPU from an accelerator to perform the computation in double. If the device is CUDA, which supports double, passing to CPU is not optimal. I think we can use _double_dtype for that:
>
> ```python
> self._double_dtype = torch.float64
> if self._device.type == "mps":
>     self._double_dtype = torch.float32
> ```

I am not sure what you mean exactly.
If you mean using .double() when CUDA is available and falling back to CPU for MPS, I think it's a better approach than my initial one.

@vfdev-5
Collaborator

vfdev-5 commented May 5, 2026

The suggestion is to use self._double_dtype in places where we need to cast a tensor to double dtype, like here:

```python
conf = self.confusion_matrix.to(dtype=self._double_dtype)
sums = conf.sum(dim=(1, 2))
```

@avishkarsonni
Author

I too agree that wherever we can use CUDA acceleration we should use it, but it again creates a situation where we have to treat MPS and CUDA differently and cannot make the code device-agnostic. I don't know which is the better approach.

@vfdev-5
Collaborator

vfdev-5 commented May 5, 2026

It is just a precision difference; we do not treat them essentially differently, so it is fine to use my suggestion.
Once MPS supports double, we can easily make it work.


Labels

module: metrics (Metrics module)


Development

Successfully merging this pull request may close these issues.

Remove scikit-learn dependency from CohenKappa metric by implementing a native PyTorch version

4 participants