PERF: vectorize _range_from_fields and _assemble_from_unit_mappings#65195

Draft
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:perf-range_from_fields

@jbrockmendel
Member

Summary

  • Add period_ordinals_from_fields Cython function that converts arrays of date/time fields to period ordinals in a single C-level loop, with optional date validation
  • Vectorize _range_from_fields to call the new Cython function instead of looping in Python-space and appending to a list; vectorize the quarter-to-calendar-month conversion with numpy ops
  • Reuse the same function in _assemble_from_unit_mappings with freq=FR_US to construct datetime64[us] directly from field arrays, avoiding the object-dtype round-trip through ensure_object + array_strptime with format="%Y%m%d"
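As an illustration of the quarter-to-calendar-month step, here is a minimal NumPy sketch. It is not the code from this PR: the function names and the `fiscal_end_month` parameter are hypothetical, and the arithmetic assumes the usual convention that a fiscal year is labelled by the calendar year in which it ends.

```python
import numpy as np


def quarter_to_calendar_scalar(year, quarter, fiscal_end_month):
    """Scalar reference: first calendar (year, month) of one fiscal quarter."""
    month = (fiscal_end_month + (quarter - 1) * 3) % 12 + 1
    if month > fiscal_end_month:
        # Months after the fiscal year end fall in the previous calendar year.
        year -= 1
    return year, month


def quarter_to_calendar(year, quarter, fiscal_end_month):
    """Array version of the same arithmetic (hypothetical helper).

    fiscal_end_month is 1..12, e.g. 12 for a December year end.
    """
    year = np.asarray(year, dtype=np.int64)
    quarter = np.asarray(quarter, dtype=np.int64)
    if ((quarter < 1) | (quarter > 4)).any():
        raise ValueError("Quarter must be 1 <= q <= 4")
    month = (fiscal_end_month + (quarter - 1) * 3) % 12 + 1
    cal_year = np.where(month > fiscal_end_month, year - 1, year)
    return cal_year, month


years = np.full(4, 2020)
quarters = np.array([1, 2, 3, 4])

# December year end: fiscal quarters line up with calendar quarters.
dec_year, dec_month = quarter_to_calendar(years, quarters, 12)

# March year end: Q1-Q3 of fiscal 2020 start in calendar 2019.
mar_year, mar_month = quarter_to_calendar(years, quarters, 3)
```

The array version replaces the per-element branch with `np.where`, which is the kind of change the second bullet describes; the non-DEC case is the one where the year adjustment matters.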

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| PeriodIndex.from_fields (2k monthly) | 0.47 ms | 0.03 ms | 16x |
| PeriodIndex.from_fields (100k monthly) | 22.7 ms | 0.92 ms | 25x |
| to_datetime(DataFrame) 100k unique dates | 14.9 ms | 1.7 ms | 9x |
| to_datetime(DataFrame) 100k repeated dates | 1.6 ms | 1.7 ms | ~1x (parity) |

The old to_datetime(DataFrame) path relied on _maybe_cache for repeated values but degraded to ~15 ms with unique dates because of the per-element str() + strptime work. The new path is uniformly fast.
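To make the difference between the two routes concrete, here is a small sketch using only NumPy and the standard library. It is not the pandas implementation: the string route below stands in for the per-element str() + strptime work, and the direct route uses plain datetime64 arithmetic as an assumed stand-in for the new Cython loop. Unlike the PR's function, the direct route here does no date validation.

```python
from datetime import datetime

import numpy as np

years = np.array([2021, 2022, 2024], dtype=np.int64)
months = np.array([1, 12, 2], dtype=np.int64)
days = np.array([15, 31, 29], dtype=np.int64)

# String route: format every row as text, then parse it back, one element at a time.
via_strings = np.array(
    [
        datetime.strptime(f"{y:04d}{m:02d}{d:02d}", "%Y%m%d")
        for y, m, d in zip(years.tolist(), months.tolist(), days.tolist())
    ],
    dtype="datetime64[us]",
)

# Direct route: build the dates from the integer fields with array arithmetic.
month_start = (years - 1970).astype("datetime64[Y]").astype("datetime64[M]") + (
    months - 1
).astype("timedelta64[M]")
direct = (
    month_start.astype("datetime64[D]") + (days - 1).astype("timedelta64[D]")
).astype("datetime64[us]")
```

Both arrays hold the same datetime64[us] values. The string route does work proportional to the number of distinct rows, which is why caching only helps when dates repeat.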

Test plan

  • pandas/tests/indexes/period/test_constructors.py (108 + 3 new tests pass)
  • pandas/tests/tools/test_to_datetime.py (939 + 9 new tests pass)
  • pandas/tests/indexes/period/ (466 tests pass)
  • pandas/tests/arrays/period/ (40 tests pass)
  • mypy, pyright, pre-commit all clean

New tests cover: non-DEC quarter fiscal year, all-6-field hourly periods, empty arrays, leap-year Feb 29 validation, invalid day-of-month (raise + coerce), fractional float coerce, empty DataFrame, UTC with time fields.
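For reference, the leap-year and invalid-day cases can be exercised through the public API roughly as follows. This is a sketch of the user-facing behaviour, not a copy of the tests added in this PR; the variable names are illustrative.

```python
import pandas as pd

# Feb 29 exists in 2024 (leap year) but not in 2023.
fields = pd.DataFrame({"year": [2024, 2023], "month": [2, 2], "day": [29, 29]})

# errors="coerce" turns the invalid row into NaT and keeps the valid one.
coerced = pd.to_datetime(fields, errors="coerce")

# The default errors="raise" rejects the whole input.
try:
    pd.to_datetime(fields)
    raised = False
except ValueError:
    raised = True
```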

🤖 Generated with Claude Code

Add period_ordinals_from_fields Cython function that converts arrays
of year/month/day/hour/minute/second fields to period ordinals in a
single C-level loop, replacing the Python-space list-append loop in
_range_from_fields.

Reuse the same function in to_datetime's _assemble_from_unit_mappings
with freq=FR_US to construct datetime64[us] directly from field arrays,
avoiding the object-dtype round-trip through ensure_object + array_strptime
with format="%Y%m%d".
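The list-append pattern mentioned in the first paragraph of the commit message, and its array replacement, can be sketched for the monthly case as below. This is an illustration only: it covers a single frequency in NumPy, whereas the PR's function is a Cython loop that handles the other frequencies and optional validation. It assumes monthly ordinals count months since 1970-01, which the last line cross-checks against NumPy's own month-resolution datetimes.

```python
import numpy as np

years = np.array([1970, 1999, 2024], dtype=np.int64)
months = np.array([1, 12, 3], dtype=np.int64)

# Python-space version: one interpreter iteration and one list append per element.
ordinals_loop = []
for y, m in zip(years.tolist(), months.tolist()):
    ordinals_loop.append((y - 1970) * 12 + (m - 1))
ordinals_loop = np.array(ordinals_loop, dtype=np.int64)

# Array version: the same arithmetic applied to whole arrays at once.
ordinals_vec = (years - 1970) * 12 + (months - 1)

# Cross-check against NumPy's month-resolution datetimes (months since 1970-01).
check = np.array(["1970-01", "1999-12", "2024-03"], dtype="datetime64[M]").astype(np.int64)
```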

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jbrockmendel added the Performance label (Memory or execution speed performance) on Apr 12, 2026