PERF: restore libjoin fastpath for CategoricalIndex intersection/union#65176

Open
jbrockmendel wants to merge 2 commits into pandas-dev:main from jbrockmendel:perf-cat-intersection

@jbrockmendel
Member

Summary

  • GH#64951 fixed incorrect results for CategoricalIndex.union/intersection with mismatched category order (GH#55335) by blanket-disabling _can_use_libjoin for unordered CategoricalIndex. This caused a ~10x regression in the categoricals.Indexing.time_intersection ASV benchmark.
  • Instead of disabling libjoin entirely, override _intersection and _union in CategoricalIndex to call reorder_categories when the category order doesn't match, then delegate to the base-class libjoin fastpath. Index.join already uses the same approach.
  • When the categories already match (the common case), super() is called directly, so no reordering overhead is added.
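A minimal sketch of the reorder-then-fastpath idea, using only public pandas API (CategoricalIndex.reorder_categories, CategoricalIndex.codes, Categorical.from_codes). The helpers align_categories, inner_join_sorted and codes_intersection are hypothetical names, not the PR's code, and the two-pointer loop is a simplified stand-in for pandas' Cython libjoin routines. It illustrates why both sides must share one category order before a code-based join: with mismatched orders, equal codes can denote different values.

```python
import numpy as np
import pandas as pd


def align_categories(left: pd.CategoricalIndex, right: pd.CategoricalIndex) -> pd.CategoricalIndex:
    # Hypothetical helper: recode `right` so its integer codes follow `left`'s category order.
    # Assumes both indexes share the same *set* of categories (unordered dtype).
    if right.categories.equals(left.categories):
        return right  # common case: codes are already comparable, no extra work
    return right.reorder_categories(left.categories)


def inner_join_sorted(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Two-pointer walk over sorted, unique integer arrays: a simplified stand-in
    # for a libjoin-style inner join, not pandas' actual Cython implementation.
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return np.asarray(out, dtype=np.int64)


def codes_intersection(left: pd.CategoricalIndex, right: pd.CategoricalIndex) -> pd.CategoricalIndex:
    # Hypothetical sketch: align category order first, then join on the integer codes.
    right = align_categories(left, right)
    for idx in (left, right):
        # The code-based walk is only valid for monotonic, unique, non-null inputs.
        if not (idx.is_monotonic_increasing and idx.is_unique) or idx.hasnans:
            raise ValueError("sketch only handles monotonic, unique, non-null inputs")
    codes = inner_join_sorted(np.asarray(left.codes), np.asarray(right.codes))
    return pd.CategoricalIndex(pd.Categorical.from_codes(codes, dtype=left.dtype))


left = pd.CategoricalIndex(["a", "b"], categories=["a", "b", "c"])
right = pd.CategoricalIndex(["b", "c"], categories=["b", "c", "a"])

# Both sides have raw codes [0, 1], so a join on unaligned codes wrongly keeps both.
print(inner_join_sorted(np.asarray(left.codes), np.asarray(right.codes)))  # [0 1]
# After aligning, `right` has codes [1, 2] and only "b" survives.
print(list(codes_intersection(left, right)))  # ['b']
```

Reordering only the other side keeps the caller's categories, and therefore the result dtype, unchanged, while the categories.equals check lets the matching-order case skip the reorder entirely.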

Test plan

  • Added test_setop_matching_category_order covering both union and intersection through the libjoin fastpath (monotonic + unique + matching categories)
  • Existing test_setop_mismatched_category_order (GH#55335) still passes
  • Full categorical index test suite passes (157 tests)
  • Full setops test suite passes (2026 tests)
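A hypothetical pytest-style test in the spirit of test_setop_matching_category_order, written against the public API only; it is not the test added in this PR. Both indexes share one CategoricalDtype, so their categories match in order, and both are monotonic and unique, which are the conditions the test plan lists for the libjoin fastpath. The assertions check results only; they do not prove which internal code path ran.

```python
import pandas as pd


def test_setop_matching_category_order_sketch():
    # Hypothetical test: same categories in the same order on both sides.
    dtype = pd.CategoricalDtype(["a", "b", "c", "d"])
    left = pd.CategoricalIndex(["a", "b", "c"], dtype=dtype)
    right = pd.CategoricalIndex(["b", "c", "d"], dtype=dtype)

    # Preconditions for the fastpath described in the test plan.
    assert left.is_monotonic_increasing and left.is_unique
    assert right.is_monotonic_increasing and right.is_unique

    pd.testing.assert_index_equal(
        left.intersection(right), pd.CategoricalIndex(["b", "c"], dtype=dtype)
    )
    pd.testing.assert_index_equal(
        left.union(right), pd.CategoricalIndex(["a", "b", "c", "d"], dtype=dtype)
    )
```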

🤖 Generated with Claude Code

GH#64951 fixed incorrect results for CategoricalIndex.union/intersection
with mismatched category order (GH#55335) by blanket-disabling libjoin
for unordered CategoricalIndex. This caused a ~10x regression in the
categoricals.Indexing.time_intersection ASV benchmark.

Instead of disabling libjoin entirely, override _intersection and _union
in CategoricalIndex to call reorder_categories when the category order
doesn't match, then delegate to the base class libjoin fastpath.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jbrockmendel added the Performance label (Memory or execution speed performance) Apr 11, 2026

Call reorder_categories on the underlying Categorical (._data) instead
of on CategoricalIndex directly, since mypy cannot see the method added
by the @inherit_names decorator.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
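A toy illustration of the mypy issue. delegate_names below is a hypothetical, much-simplified analogue of pandas' internal @inherit_names decorator (the real one is more involved), and Payload/Wrapper stand in for Categorical/CategoricalIndex. Because the method is attached with setattr when the class is created, it exists at runtime but not in the class body a static checker analyses; calling it through the typed `_data` attribute sidesteps that.

```python
from __future__ import annotations


class Payload:
    """Stand-in for Categorical: the class that actually defines the method."""

    def __init__(self, items: list[str]) -> None:
        self.items = list(items)

    def reversed_copy(self) -> Payload:
        return Payload(self.items[::-1])


def delegate_names(*names: str):
    """Hypothetical, simplified analogue of an @inherit_names-style decorator.

    Methods are attached at class-creation time, so they exist at runtime but
    are invisible to a static type checker reading the class body.
    """

    def decorator(cls):
        for name in names:
            # `_name=name` binds the current loop value into each method.
            def method(self, *args, _name=name, **kwargs):
                # Forward the call to the wrapped object stored in `_data`.
                return getattr(self._data, _name)(*args, **kwargs)

            setattr(cls, name, method)
        return cls

    return decorator


@delegate_names("reversed_copy")
class Wrapper:
    def __init__(self, data: Payload) -> None:
        self._data = data


w = Wrapper(Payload(["a", "b", "c"]))
print(w.reversed_copy().items)        # works at runtime; mypy flags the attribute as unknown
print(w._data.reversed_copy().items)  # statically visible: `_data` is declared as Payload
```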

jbrockmendel marked this pull request as ready for review April 11, 2026 22:12