PERF: vectorize _range_from_fields and _assemble_from_unit_mappings #65195
Draft
jbrockmendel wants to merge 1 commit into pandas-dev:main from
Add period_ordinals_from_fields Cython function that converts arrays of year/month/day/hour/minute/second fields to period ordinals in a single C-level loop, replacing the Python-space list-append loop in _range_from_fields.

Reuse the same function in to_datetime's _assemble_from_unit_mappings with freq=FR_US to construct datetime64[us] directly from field arrays, avoiding the object-dtype round-trip through ensure_object + array_strptime with format="%Y%m%d".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
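To illustrate the idea of building datetimes straight from integer field arrays, here is a minimal NumPy-only sketch. It is not the Cython function from this PR: the helper name `fields_to_datetime64` and its `errors` parameter are hypothetical, and it handles only year/month/day. It counts months since 1970-01, adds the day offset, and flags any row whose day rolled over into another month as invalid.

```python
import numpy as np


def fields_to_datetime64(years, months, days, errors="raise"):
    """Hypothetical helper: build datetime64[us] from integer field arrays."""
    years = np.asarray(years, dtype="int64")
    months = np.asarray(months, dtype="int64")
    days = np.asarray(days, dtype="int64")

    # Months elapsed since 1970-01 identify the first day of each (year, month).
    month_start = ((years - 1970) * 12 + (months - 1)).astype("datetime64[M]")
    dates = month_start.astype("datetime64[D]") + (days - 1).astype("timedelta64[D]")

    # A row is valid when the month is 1..12, the day is >= 1, and adding
    # the day offset did not spill into a different month (e.g. Feb 30).
    valid = (
        (months >= 1)
        & (months <= 12)
        & (days >= 1)
        & (dates.astype("datetime64[M]") == month_start)
    )
    if not valid.all():
        if errors == "raise":
            raise ValueError("invalid date fields")
        # Any other value of `errors` is treated as "coerce" in this sketch.
        dates = dates.copy()
        dates[~valid] = np.datetime64("NaT")
    return dates.astype("datetime64[us]")
```

No string formatting or parsing happens per element, which is the property the commit message attributes to the new path.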
Summary
- Add `period_ordinals_from_fields`, a Cython function that converts arrays of date/time fields to period ordinals in a single C-level loop, with optional date validation.
- Update `_range_from_fields` to call the new Cython function instead of looping in Python-space and appending to a list; vectorize the quarter-to-calendar-month conversion with numpy ops.
- Reuse the function in `_assemble_from_unit_mappings` with `freq=FR_US` to construct `datetime64[us]` directly from field arrays, avoiding the object-dtype round-trip through `ensure_object` + `array_strptime` with `format="%Y%m%d"`.

Benchmarked cases:
- `PeriodIndex.from_fields` (2k monthly)
- `PeriodIndex.from_fields` (100k monthly)
- `to_datetime(DataFrame)`, 100k unique dates
- `to_datetime(DataFrame)`, 100k repeated dates

The old `to_datetime(DataFrame)` path relied on `_maybe_cache` for repeated values but degraded to ~15ms with unique dates due to per-element `str()` + strptime. The new path is uniformly fast.

Test plan
- `pandas/tests/indexes/period/test_constructors.py` (108 + 3 new tests pass)
- `pandas/tests/tools/test_to_datetime.py` (939 + 9 new tests pass)
- `pandas/tests/indexes/period/` (466 tests pass)
- `pandas/tests/arrays/period/` (40 tests pass)

New tests cover: non-DEC quarter fiscal year, all-6-field hourly periods, empty arrays, leap-year Feb 29 validation, invalid day-of-month (raise + coerce), fractional float coerce, empty DataFrame, UTC with time fields.
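For readers unfamiliar with the non-DEC quarter case, the sketch below shows one way the quarter-to-calendar-month conversion can be expressed with array operations. It assumes the scalar rule `month = (mnum + (quarter - 1) * 3) % 12 + 1`, with the year decremented when `month > mnum`, where `mnum` is the month number in which the fiscal year ends. The function name and the `fiscal_end_month` parameter are hypothetical and are not taken from this PR's diff.

```python
import numpy as np


def quarters_to_year_month(years, quarters, fiscal_end_month=12):
    """Hypothetical helper: first calendar (year, month) of each fiscal quarter."""
    years = np.asarray(years, dtype="int64")
    quarters = np.asarray(quarters, dtype="int64")
    if ((quarters < 1) | (quarters > 4)).any():
        raise ValueError("Quarter must be 1 <= q <= 4")

    # First calendar month of each quarter, given the fiscal year-end month.
    months = (fiscal_end_month + (quarters - 1) * 3) % 12 + 1
    # Quarters that begin after the fiscal year-end month start in the
    # previous calendar year.
    cal_years = years - (months > fiscal_end_month).astype("int64")
    return cal_years, months
```

With a fiscal year ending in November, fiscal 2024 Q1 starts in December 2023, while with a December year-end it starts in January 2024.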
🤖 Generated with Claude Code