
feat(metrics): add Perplexity metric to ignite.metrics.nlp #3743

Open

steaphenai wants to merge 13 commits into pytorch:master from steaphenai:feat/perplexity-metric-pr

Conversation

@steaphenai
Contributor

Closes #3742

Summary

  • Add a new Perplexity metric implementation in ignite.metrics.nlp.perplexity.
  • Export Perplexity from both ignite.metrics.nlp and top-level ignite.metrics.
  • Add dedicated tests for correctness, accumulation behavior, reset behavior, return type, and invalid inputs.

Test plan

  • python -m pytest tests/ignite/metrics/nlp/test_perplexity.py -v
  • Smoke test:
    • python -c "from ignite.metrics.nlp import Perplexity; import torch; ppl = Perplexity(); ppl.reset(); ppl.update((torch.randn(2,5,3), torch.randint(0,5,(2,3)))); print('PPL =', ppl.compute())"
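For readability, here is the same smoke test expanded into a short script. Reading the shapes as (batch=2, vocab=5, seq_len=3) is my inference from the target shape, following the torch cross-entropy convention:

import torch
from ignite.metrics.nlp import Perplexity

# Logits (2, 5, 3) and integer targets (2, 3), i.e. (batch, vocab, seq_len)
# and (batch, seq_len) in cross-entropy convention.
logits = torch.randn(2, 5, 3)
targets = torch.randint(0, 5, (2, 3))

ppl = Perplexity()
ppl.reset()
ppl.update((logits, targets))
print("PPL =", ppl.compute())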

Files changed

  • ignite/metrics/nlp/perplexity.py
  • ignite/metrics/nlp/__init__.py
  • ignite/metrics/__init__.py
  • tests/ignite/metrics/nlp/test_perplexity.py

@github-actions bot added the module: metrics label on Apr 20, 2026
Expose a new token-level Perplexity metric in ignite.metrics.nlp and top-level ignite.metrics, with dedicated unit tests to validate correctness and behavior.
@steaphenai steaphenai force-pushed the feat/perplexity-metric-pr branch from 8453d4e to fa394ca on April 20, 2026 10:33
@Prathamesh8989
Contributor

Nice addition! Perplexity is definitely a useful metric for language modeling and it fits well under ignite.metrics.nlp.

The test coverage looks solid — especially the token-weighted accumulation test, which ensures correctness across batches with different sequence lengths.

One small suggestion: it might be useful to add a GPU test to ensure the metric behaves correctly when tensors are on CUDA devices, since many language modeling workloads run on GPU.

Something like:

import pytest
import torch

def test_gpu_support():
    if not torch.cuda.is_available():
        pytest.skip("CUDA not available")
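A fuller version of that skeleton could also assert CPU/GPU agreement. The test name here is hypothetical, and the device= constructor argument is assumed from Ignite's standard Metric signature:

import pytest
import torch
from ignite.metrics.nlp import Perplexity

def test_gpu_matches_cpu():
    if not torch.cuda.is_available():
        pytest.skip("CUDA not available")
    logits = torch.randn(2, 5, 3)
    targets = torch.randint(0, 5, (2, 3))

    # The same data through a CPU metric and a CUDA metric should agree.
    ppl_cpu = Perplexity()
    ppl_cpu.update((logits, targets))

    ppl_gpu = Perplexity(device="cuda")  # device= assumed from the Metric base class
    ppl_gpu.update((logits.cuda(), targets.cuda()))

    assert ppl_gpu.compute() == pytest.approx(ppl_cpu.compute(), rel=1e-5)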

Overall the implementation and tests look clean and consistent with existing Ignite metrics.

@steaphenai
Contributor Author

Good point, thanks. I’d like to keep this PR scoped to the Perplexity implementation and core correctness tests. We can add a dedicated CUDA test if maintainers want explicit GPU coverage.
@vfdev-5 thoughts?

@vfdev-5
Collaborator

vfdev-5 left a comment

@steaphenai thanks for the PR, I made a quick pass and left a few comments.

The tests look shallow, and there is no reference implementation that we test against.
I suggest checking what we can use as a reference implementation.
For testing on accelerators, check other tests like test_accuracy.py for inspiration.
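For context, a minimal reference implementation of token-level perplexity (a sketch, not code from this PR) can be built directly on torch.nn.functional.cross_entropy, since perplexity is the exponential of the mean per-token negative log-likelihood:

import torch
import torch.nn.functional as F

def reference_perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    # logits: (batch, vocab, seq_len); targets: (batch, seq_len).
    # Perplexity = exp(mean per-token NLL).
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return torch.exp(nll).item()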

[Review thread: ignite/metrics/nlp/perplexity.py]
[Review thread: tests/ignite/metrics/nlp/test_perplexity.py (outdated)]
@steaphenai
Contributor Author

steaphenai commented Apr 21, 2026

> @steaphenai thanks for the PR, I made a quick pass and left a few comments.
>
> The tests look shallow, and there is no reference implementation that we test against. I suggest checking what we can use as a reference implementation. For testing on accelerators, check other tests like test_accuracy.py for inspiration.

Thanks for the quick review, @vfdev-5.
I addressed the two code comments in the latest push:

  • detached tensors from the grad graph in Perplexity.update()
  • removed test_returns_float

I’ll also use existing metric tests (e.g., test_accuracy.py) as a reference for accelerator-oriented test patterns as needed.
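For illustration, the detach-plus-accumulate pattern described above looks roughly like this. The accumulator names _sum_of_nll and _num_tokens appear later in this thread; the rest is a sketch, not the PR's exact code:

import torch
import torch.nn.functional as F

class PerplexitySketch:
    # Minimal sketch of the streaming pattern; the real metric derives from
    # ignite.metrics.Metric and handles devices and distributed reduction.

    def reset(self) -> None:
        self._sum_of_nll = torch.tensor(0.0)
        self._num_tokens = torch.tensor(0)

    def update(self, output) -> None:
        y_pred, y = output
        # Detach so metric updates never keep the autograd graph alive.
        y_pred, y = y_pred.detach(), y.detach()
        self._sum_of_nll += F.cross_entropy(y_pred, y, reduction="sum")
        self._num_tokens += y.numel()

    def compute(self) -> float:
        # Token-weighted: total NLL over total token count, then exp.
        return torch.exp(self._sum_of_nll / self._num_tokens).item()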

@steaphenai
Contributor Author

steaphenai commented Apr 21, 2026

I added an explicit reference implementation check (_reference_perplexity) and validated both single-batch and multi-batch token-weighted accumulation against it. I also aligned the test structure with existing Ignite metric patterns (as in test_accuracy.py): available_device parametrization, device assertions, and distributed-marked test layout.
Local run: python -m pytest tests/ignite/metrics/nlp/test_perplexity.py -m "not distributed" -v -> 10 passed, 12 skipped (CUDA/MPS tests skipped when unavailable).
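For reference, the available_device pattern mentioned above might look roughly like this (assuming Ignite's available_device test fixture as used in test_accuracy.py, and the _device attribute from the Metric base class):

import pytest
import torch
import torch.nn.functional as F
from ignite.metrics.nlp import Perplexity

def test_compute(available_device):
    # available_device is assumed to be the Ignite test fixture
    # yielding "cpu", "cuda", or "mps" depending on the machine.
    ppl = Perplexity(device=available_device)
    assert ppl._device == torch.device(available_device)

    logits = torch.randn(2, 5, 3)
    targets = torch.randint(0, 5, (2, 3))
    ppl.update((logits, targets))

    # For a single batch, token-weighted perplexity reduces to exp(mean NLL).
    expected = torch.exp(F.cross_entropy(logits, targets, reduction="mean")).item()
    assert ppl.compute() == pytest.approx(expected, rel=1e-4)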


[Review thread: tests/ignite/metrics/nlp/test_perplexity.py (outdated)]
@vfdev-5 vfdev-5 marked this pull request as draft April 21, 2026 12:37
@steaphenai steaphenai marked this pull request as ready for review April 21, 2026 14:25
@vfdev-5 vfdev-5 marked this pull request as draft April 21, 2026 15:38
[Review thread: tests/ignite/metrics/nlp/test_perplexity.py (outdated)]
@steaphenai steaphenai marked this pull request as ready for review April 22, 2026 06:27
[Review thread: tests/ignite/metrics/nlp/test_perplexity.py (outdated)]
Co-authored-by: vfdev <vfdev.5@gmail.com>
[Review thread: tests/ignite/metrics/nlp/test_perplexity.py (outdated)]
[Review thread: ignite/metrics/nlp/__init__.py]
[Review thread: ignite/metrics/nlp/perplexity.py]
@github-actions bot added the docs label on Apr 23, 2026
@vfdev-5
Collaborator

vfdev-5 commented Apr 23, 2026

@steaphenai
Contributor Author

@vfdev-5 Could you approve the workflows to run? The required checks are pending approval.

@vfdev-5
Collaborator

vfdev-5 commented Apr 24, 2026

@steaphenai this failure is real: https://github.com/pytorch/ignite/actions/runs/24847148226/job/72879291595?pr=3743
Check how other metrics handle the double dtype.

@steaphenai
Contributor Author

@vfdev-5 I checked the other metrics in ignite.metrics.nlp and found that BLEU and ROUGE don't force dtype=torch.double on their accumulators. I've removed the explicit dtype=torch.double and dtype=torch.long from _sum_of_nll and _num_tokens in reset(), and removed dtype=torch.double from the .to() call in update(), to match that pattern and fix MPS compatibility.
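For illustration, reset() after that change might look like this (a sketch; self._device coming from the Metric base class is an assumption):

def reset(self) -> None:
    # Default dtypes: torch.tensor(0.0) is float32 and torch.tensor(0) is int64.
    # Avoiding an explicit float64 keeps the metric MPS-compatible,
    # matching how BLEU and ROUGE handle their accumulators.
    self._sum_of_nll = torch.tensor(0.0, device=self._device)
    self._num_tokens = torch.tensor(0, device=self._device)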


Labels

module: metrics, docs

Development

Successfully merging this pull request may close these issues.

Feature: Add Perplexity metric to ignite.metrics.nlp

3 participants