Model transform: chaining warm up #1105

Draft
pengyu-hou wants to merge 1 commit into main from pengyu--chaining-warm-up

@pengyu-hou
Collaborator

Summary

This PR adds warm-up methods to JoinSourceRunner for the model transform pipelines. We are seeing timeout errors and traffic spikes during the first ten minutes of the streaming job.

Why / Goal

Resolve the timeout issue for model transform

Test Plan

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested

Checklist

  • Documentation update

Reviewers

…t latency

Pre-initializes lazy components before streaming starts and uses the first N
real requests to warm up KV store connections, CatalystUtil.session,
Janino codegen, and the derivation UDF on executor JVMs, eliminating the
5-10 minute timeout spike on new deploys.
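
A minimal sketch of the "first N real requests" idea, including the timeout and fallback behaviour this commit describes. The names `WarmupSketch`, `warmUp`, `fetch`, and `derive` are hypothetical stand-ins for illustration; they are not the signatures of fetchBaseJoin or deriveFunc in this PR.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.FiniteDuration
import scala.util.{Failure, Success, Try}

// Hypothetical stand-ins: `fetch` plays the role of the KV-store-backed lookup,
// `derive` plays the role of the derivation function.
object WarmupSketch {
  type Row = Map[String, Any]

  /** Runs the first `n` rows through fetch + derive once and discards the results.
    * Returns true when the fetch completed within `timeout`, false otherwise. */
  def warmUp(rows: Seq[Row], n: Int, timeout: FiniteDuration)(
      fetch: Seq[Row] => Future[Seq[Row]],
      derive: Row => Row): Boolean = {
    val sample = rows.take(n)
    Try(Await.result(fetch(sample), timeout)) match {
      case Success(fetched) =>
        // Warm the derivation path with real fetched values; outputs are thrown away.
        fetched.foreach(r => Try(derive(r)))
        true
      case Failure(_) =>
        // Fallback: the fetch timed out or failed, so still touch the derive path once.
        sample.headOption.foreach(r => Try(derive(r)))
        false
    }
  }
}
```

After `warmUp` returns, the caller would process every row normally, including the sampled ones, since the warm-up results are discarded.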

- Add PoolMap.warmup() and PooledCatalystUtil.warmup() to pre-populate
  the CatalystUtil pool beyond the default initialSize=2
- Add driver-side warmupDriver(): forces initialization of CatalystUtil.session,
  the TTLCaches (GroupByServingInfo, JoinCodec), deriveFunc, and the CatalystUtil pool
- Add executor-side warm-up in enrichBaseJoin: runs first N real rows
  through fetchBaseJoin (60s timeout) then invokes deriveFunc with real
  base values to warm up UDF lazy state and JIT; results discarded and
  all rows re-processed normally
- Fallback: if fetchBaseJoin times out, still force deriveFunc init so
  enrichModelTransforms avoids CatalystUtil cold-start timeout

Config (spark.chronon.stream.chain.*):
  warmup.enabled=true, warmup.request_count=10,
  warmup.timeout_seconds=60, warmup.pool_size=4
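
One way these settings could be read from a string map, as a sketch only. `WarmupConfig` and `fromMap` are hypothetical names, and treating the listed values as defaults is an assumption; the PR's actual parsing code is not shown here.

```scala
// Hypothetical holder for the four settings listed above.
final case class WarmupConfig(enabled: Boolean, requestCount: Int, timeoutSeconds: Int, poolSize: Int)

object WarmupConfig {
  private val Prefix = "spark.chronon.stream.chain.warmup."

  /** Reads each key under the prefix, falling back to the value listed in the PR description. */
  def fromMap(conf: Map[String, String]): WarmupConfig = {
    def get(key: String, default: String): String =
      conf.getOrElse(Prefix + key, default).trim
    WarmupConfig(
      enabled = get("enabled", "true").toBoolean,
      requestCount = get("request_count", "10").toInt,
      timeoutSeconds = get("timeout_seconds", "60").toInt,
      poolSize = get("pool_size", "4").toInt
    )
  }
}
```

In a Spark job the map could come from something like `spark.conf.getAll`; an empty map yields `WarmupConfig(true, 10, 60, 4)`.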
pengyu-hou force-pushed the pengyu--chaining-warm-up branch from 2cc35e2 to e848f48 on April 1, 2026 21:46