
Remove sklearn dependency from CohenKappa score calculation logic, apply custom calculation, and update tests #3731

Open
avishkarsonni wants to merge 7 commits into pytorch:master from avishkarsonni:master

Conversation


@avishkarsonni avishkarsonni commented Apr 14, 2026

Fixes #3701

Description:

Remove scikit-learn dependency from CohenKappa by implementing a native PyTorch version.

  • Replaced sklearn.metrics.cohen_kappa_score with a pure PyTorch implementation (see the sketch below)
  • Removed the forced GPU→CPU transfer (.cpu().numpy()) — the metric now runs fully on the configured device
  • Built the confusion matrix with torch.bincount (a single vectorised kernel) instead of a Python loop
  • Used float64 throughout to match sklearn's numerical precision
  • Added explicit multilabel input validation (previously handled implicitly by sklearn)
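
For reference, a minimal sketch of the approach (simplified and hypothetical, not the exact PR code; the function name cohen_kappa is illustrative): the confusion matrix is built with a single torch.bincount call and the standard kappa formula is applied.

```python
import torch

def cohen_kappa(y_targets: torch.Tensor, y_preds: torch.Tensor) -> float:
    # Infer the number of classes dynamically from the data.
    n_classes = int(max(y_targets.max().item(), y_preds.max().item())) + 1
    # Encode each (target, pred) pair as one flat index, then histogram it:
    # a single vectorised bincount yields the full confusion matrix.
    indices = y_targets.long() * n_classes + y_preds.long()
    conf = torch.bincount(indices, minlength=n_classes * n_classes)
    conf = conf.reshape(n_classes, n_classes).double()
    n = conf.sum()
    p_o = conf.diagonal().sum() / n                    # observed agreement
    p_e = (conf.sum(0) * conf.sum(1)).sum() / (n * n)  # chance agreement
    return ((p_o - p_e) / (1 - p_e)).item()
```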

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@github-actions github-actions bot added the module: metrics (Metrics module) label Apr 14, 2026
@aaishwarymishra
Collaborator

Thanks for the PR! This metric inherited from EpochMetric because we needed to send all the values to scikit-learn. Now that we are removing that dependency, I think we should inherit from Metric (or a subclass of it) and override its update method to efficiently accumulate

po: relative observed agreement among raters
pe: hypothetical probability of chance agreement

instead of storing raw tensor values for the final computation in compute.
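
A minimal sketch of this suggestion, assuming a fixed num_classes (the class name and structure are illustrative, not the final PR code; ignite's distributed-synchronisation decorators are omitted for brevity): update() only accumulates confusion-matrix counts, and compute() derives po and pe from them.

```python
import torch
from ignite.metrics import Metric

class CohenKappaSketch(Metric):
    def __init__(self, num_classes: int, device="cpu"):
        self.num_classes = num_classes
        super().__init__(device=device)

    def reset(self) -> None:
        # Running confusion-matrix counts; no raw tensors are stored.
        self._conf = torch.zeros(
            self.num_classes, self.num_classes, dtype=torch.long, device=self._device
        )

    def update(self, output) -> None:
        y_pred, y = output
        indices = y.long() * self.num_classes + y_pred.long()
        self._conf += torch.bincount(
            indices, minlength=self.num_classes**2
        ).reshape(self.num_classes, self.num_classes).to(self._device)

    def compute(self) -> float:
        conf = self._conf.double()
        n = conf.sum()
        p_o = conf.diagonal().sum() / n                    # po
        p_e = (conf.sum(0) * conf.sum(1)).sum() / (n * n)  # pe
        return ((p_o - p_e) / (1 - p_e)).item()
```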

@avishkarsonni
Author

So there is no need to bring all the raw tensors into memory, because there is no scikit-learn now. Got it.

@aaishwarymishra
Collaborator

aaishwarymishra commented Apr 16, 2026

@avishkarsonni yes, instead of storing raw tensors in update we can just update the variables used for the calculation of po and pe; in compute we derive these values from those variables and calculate the metric.
This saves memory, as we are not storing tensors, and compute, as we are not performing one big calculation on huge tensors at the end.
You can look at other metrics for a reference on how this is done without EpochMetric.

@Prathamesh8989
Contributor

Nice implementation! Using torch.bincount for the confusion matrix is a good approach and keeps the computation on GPU.

I had a couple of small observations:

  • It may be safer to guard against the case where p_e == 1, since (1 - p_e) would cause a division by zero.
  • If n == 0 (an empty batch), the computation would also fail with a division by zero.
  • It might be worth explicitly ensuring the confusion matrix tensor stays on the same device as the inputs.

Overall the implementation looks clean and matches the expected Cohen’s Kappa formulation.
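
A hedged sketch of guards for those two edge cases (conf is assumed to be the confusion-matrix tensor from the snippets above; returning 0.0 when p_e == 1 is one possible convention, not necessarily sklearn's behaviour):

```python
import torch

def safe_kappa(conf: torch.Tensor) -> float:
    n = conf.sum()
    if n == 0:
        # Empty input: kappa is undefined with no samples.
        raise ValueError("CohenKappa must have at least one example before it can be computed.")
    p_o = conf.diagonal().sum() / n
    p_e = (conf.sum(0) * conf.sum(1)).sum() / (n * n)
    if p_e == 1:
        # Degenerate marginals (all targets and all predictions fall in a
        # single class each): (1 - p_e) is zero, so return 0.0 by convention.
        return 0.0
    return ((p_o - p_e) / (1 - p_e)).item()
```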

Comment thread ignite/metrics/cohen_kappa.py Outdated
```python
n_classes = int(max(y_targets.max().item(), y_preds.max().item())) + 1

indices = y_targets * n_classes + y_preds
conf = torch.bincount(indices, minlength=n_classes * n_classes).reshape(n_classes, n_classes).double()
```
Collaborator

@vfdev-5 vfdev-5 Apr 21, 2026

Why can't we use ignite's ConfusionMatrix class to compute this metric?

If we use ConfusionMatrix, we have to provide num_classes as an input argument... In order to keep backward compatibility, we can add num_classes to the constructor as an optional kwarg.
We can have two private Cohen kappa implementations: one using EpochMetric (the current one) and a second using ConfusionMatrix. The public CohenKappa can route between them depending on the num_classes arg.

In the private implementation using EpochMetric we should still use ConfusionMatrix to compute the confusion matrix in the compute() method instead of doing it manually (as currently) to avoid bugs.
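
A hypothetical sketch of that routing (the private names _CohenKappaEpochMetric and _CohenKappaConfusionMatrix are illustrative placeholders, not from the PR):

```python
def CohenKappa(output_transform=lambda x: x, num_classes=None, device="cpu"):
    if num_classes is None:
        # Backward-compatible path: EpochMetric-based, stores all outputs
        # and infers the number of classes from the data at compute time.
        return _CohenKappaEpochMetric(output_transform=output_transform, device=device)
    # Memory-efficient path: accumulates an ignite ConfusionMatrix incrementally.
    return _CohenKappaConfusionMatrix(
        num_classes=num_classes, output_transform=output_transform, device=device
    )
```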

Author

I used bincount because I was familiar with it and had planned it that way from the start. At the beginning of this PR using ConfusionMatrix was not intuitive to me, since it needs num_classes at init while the current impl infers it dynamically from the data, but it is indeed possible.

Author

And again, this method is better only when the user knows num_classes; if not, we will need a fallback that infers it from the data. Both methods have the same time complexity; the advantage of using ConfusionMatrix is edge-case coverage over the current approach. Would love to hear what would be the better approach!

Collaborator

The benefit of using ConfusionMatrix is that we do not store the lists of y_preds and y_true in RAM, versus the current implementation using EpochMetric. The drawback is that we have to specify num_classes in the metric constructor, which is also a backward-compatibility break.

OK, the suggestion for this PR: let's use ConfusionMatrix in _cohen_kappa_score in order to avoid the manual confusion-matrix computation. We can instantiate a ConfusionMatrix object with the number of classes, do a single update, and call compute.

Comment thread ignite/metrics/cohen_kappa.py Outdated
```python
cm = ConfusionMatrix(num_classes=num_classes, device=y_pred.device)
y_pred_oh = F.one_hot(y_pred.long(), num_classes).float()
cm.update((y_pred_oh, y.long()))
conf = cm.compute().double()
```
Collaborator

Casting to double like this won't work on MPS.
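
For context (an assumption based on general PyTorch behaviour, not output from this PR's CI): MPS tensors cannot be cast to float64, so a call like this fails on Apple silicon.

```python
import torch

if torch.backends.mps.is_available():
    t = torch.ones(4, device="mps")
    # Raises a TypeError: the MPS backend does not support float64.
    t = t.double()
```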

Author

But then the precision issue comes up, since MPS needs float32 while CUDA goes for float64. I've been picking at this issue for some time. Can I use torch.promote_types to pick the highest supported precision, or would that create overhead? Or is there another way, like a try/except that falls back to .float() if .double() is rejected?

Collaborator

@aaishwarymishra aaishwarymishra May 1, 2026

Having different precisions for MPS and CUDA will be weird; either we can stick with .float(), or convert the GPU tensors to CPU tensors and then call .double().

Author

The second one is what I am going with; it does well with the better precision.

Collaborator

I suggest using different precisions, double for cpu/cuda and float on mps (as it does not support double), via the metric's _double_dtype attribute:

```python
self._double_dtype = torch.float64
if self._device.type == "mps":
    self._double_dtype = torch.float32
```

Collaborator

@vfdev-5 vfdev-5 left a comment

@avishkarsonni thanks a lot for updating the code using the suggested approach.
However, I don't totally agree with Aaishwarya about passing to CPU from an accelerator to perform the computation in double. If the device is CUDA, which supports double, passing to CPU is not optimal.
I think we can use _double_dtype for that:

```python
self._double_dtype = torch.float64
if self._device.type == "mps":
    self._double_dtype = torch.float32
```

@aaishwarymishra
Collaborator

aaishwarymishra commented May 5, 2026

> @avishkarsonni thanks a lot for updating the code using the suggested approach. However, I don't totally agree with Aaishwarya about passing to CPU from an accelerator to perform the computation in double. If the device is CUDA, which supports double, passing to CPU is not optimal. I think we can use _double_dtype for that:
>
> ```python
> self._double_dtype = torch.float64
> if self._device.type == "mps":
>     self._double_dtype = torch.float32
> ```

I am not sure what you mean exactly.
If you mean using .double() when CUDA is available and falling back to CPU for MPS, I think it's a better approach than my initial one.

@vfdev-5
Collaborator

vfdev-5 commented May 5, 2026

The suggestion is to use self._double_dtype in places where we need to cast a tensor to double dtype, like here:

```python
conf = self.confusion_matrix.to(dtype=self._double_dtype)
sums = conf.sum(dim=(1, 2))
```

@avishkarsonni
Author

I too agree that wherever we can use CUDA acceleration we should use it, but it again creates a situation where we have to treat MPS and CUDA differently and cannot make the code device-agnostic. I don't know which is the better approach.

@vfdev-5
Collaborator

vfdev-5 commented May 5, 2026

It is just a precision difference; we do not treat them essentially differently, so it is fine to use my suggestion.
Once MPS supports double, we can easily make it work.


Labels

module: metrics (Metrics module)


Development

Successfully merging this pull request may close these issues.

Remove scikit-learn dependency from CohenKappa metric by implementing a native PyTorch version

4 participants