PERF: restore libjoin fastpath for CategoricalIndex intersection/union #65176
Open
jbrockmendel wants to merge 2 commits into pandas-dev:main from
GH#64951 fixed incorrect results for CategoricalIndex.union/intersection with mismatched category order (GH#55335) by blanket-disabling libjoin for unordered CategoricalIndex. This caused a ~10x regression in the categoricals.Indexing.time_intersection ASV benchmark. Instead of disabling libjoin entirely, override _intersection and _union in CategoricalIndex to call reorder_categories when the category order doesn't match, then delegate to the base class libjoin fastpath.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Call reorder_categories on the underlying Categorical (._data) instead of on CategoricalIndex directly, since mypy cannot see the method added by the @inherit_names decorator.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
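
The reordering step described in the commit messages above can be illustrated with public API only. This is a minimal sketch, not the code in this PR: `align_categories` is a hypothetical helper, and the sketch assumes the fastpath compares integer codes, which is why both operands need the same category order before delegating.

```python
import pandas as pd


def align_categories(
    left: pd.CategoricalIndex, right: pd.CategoricalIndex
) -> pd.CategoricalIndex:
    """Hypothetical helper: return `right` with categories in `left`'s order."""
    same_set = set(left.categories) == set(right.categories)
    same_order = left.categories.equals(right.categories)
    if same_set and not same_order:
        # Only the category order changes; the values stay the same,
        # but the integer codes are remapped to the new order.
        return right.reorder_categories(left.categories)
    # Order already matches (or the category sets differ): nothing to do here.
    return right


left = pd.CategoricalIndex(["a", "b", "c"], categories=["a", "b", "c"])
right = pd.CategoricalIndex(["b", "c"], categories=["c", "b", "a"])

aligned = align_categories(left, right)
print(right.codes.tolist(), aligned.codes.tolist())
```

Before alignment the same value maps to different codes in `left` and `right`; after alignment the codes are directly comparable, which is the precondition a code-based join needs.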
Summary
- GH#64951 fixed incorrect results for `CategoricalIndex.union`/`intersection` with mismatched category order (GH#55335) by blanket-disabling `_can_use_libjoin` for unordered `CategoricalIndex`. This caused a ~10x regression in the `categoricals.Indexing.time_intersection` ASV benchmark.
- Instead of disabling libjoin entirely, override `_intersection` and `_union` in `CategoricalIndex` to call `reorder_categories` when category order doesn't match, then delegate to the base class libjoin fastpath. This is the same approach `Index.join` already uses.
- When the category order already matches, `super()` is called directly with no overhead.

Test plan
- `test_setop_matching_category_order` covering both `union` and `intersection` through the libjoin fastpath (monotonic + unique + matching categories)
- `test_setop_mismatched_category_order` (GH#55335) still passes

🤖 Generated with Claude Code
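
For reviewers who want to poke at the matching-category case by hand, the conditions named in the test plan (monotonic, unique, matching categories) can be set up roughly as follows. This is an illustrative sketch with made-up data, not the body of the test added in this PR.

```python
import pandas as pd

# Both operands share the same categories in the same order.
cats = ["a", "b", "c", "d"]
left = pd.CategoricalIndex(["a", "b", "c"], categories=cats)
right = pd.CategoricalIndex(["b", "c", "d"], categories=cats)

# Preconditions named in the test plan.
assert left.is_monotonic_increasing and right.is_monotonic_increasing
assert left.is_unique and right.is_unique
assert left.categories.equals(right.categories)

union = left.union(right)
intersection = left.intersection(right)

print(list(union))         # ['a', 'b', 'c', 'd']
print(list(intersection))  # ['b', 'c']
```

Whether a given call actually takes the libjoin fastpath is an internal detail, so the sketch only checks the public preconditions and the results.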