feat: introduce concurrency env for udf invocation#3317

Merged
vigith merged 15 commits into main from concurrency
May 26, 2026

Contributor

@yhl25 yhl25 commented Mar 17, 2026

Summary

  • Adds a first-class Concurrency field to MonoVertexLimits, PipelineLimits, and VertexLimits, replacing the undocumented NUMAFLOW_UDF_CONCURRENCY and MAX_ACK_PENDING env vars.
  • Splits the two knobs cleanly: readBatchSize controls only the size of one read; concurrency caps how many messages may be in-flight (read but not yet acked).
  • Promotes READ_AHEAD to a controller-injected env var with sensible defaults — false for source vertices and MonoVertex (cheap re-reads, source ordering preserved), true for Map/Sink/Reduce (keeps ISBs full). Operators can override on the container template.
  • Wires both knobs through the Rust dataplane (monovertex + pipeline forwarders, ISB reader, source streaming loop).
  • Ordered processing now auto-forces concurrency = 1 on Map/Sink vertices in the controller, so users no longer need to remember to set readBatchSize: 1 to get FIFO semantics.
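The `READ_AHEAD` defaults listed in the summary can be sketched as a small lookup. This is a minimal illustration, not the controller's code: the `vertexKind` type, its constants, and the two helper functions are hypothetical names; only the `READ_AHEAD` variable name and the per-vertex defaults come from the description above.

```go
package main

import (
	"fmt"
	"strconv"
)

// vertexKind is a hypothetical enum used only in this sketch; the real
// controller derives the vertex type from the spec.
type vertexKind int

const (
	kindSource vertexKind = iota
	kindMonoVertex
	kindMap
	kindSink
	kindReduce
)

// defaultReadAhead restates the defaults from the PR summary:
// false for source vertices and MonoVertex, true for Map/Sink/Reduce.
func defaultReadAhead(k vertexKind) bool {
	switch k {
	case kindSource, kindMonoVertex:
		return false
	default:
		return true
	}
}

// readAheadEnv renders the default as the NAME=value string an env var
// would carry (the rendering itself is an assumption of this sketch).
func readAheadEnv(k vertexKind) string {
	return "READ_AHEAD=" + strconv.FormatBool(defaultReadAhead(k))
}

func main() {
	fmt.Println(readAheadEnv(kindSource))
	fmt.Println(readAheadEnv(kindMap))
}
```

Per the summary, an operator can still override the injected value on the container template; the sketch covers only the defaults.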

Behavior

With read-ahead enabled, the maximum in-flight count per vertex is concurrency + readBatchSize (the data plane keeps reading until in-flight hits concurrency, then may pre-fetch one more batch). With read-ahead disabled, it's min(concurrency, readBatchSize).

To force strictly sequential processing, set concurrency: 1 (and disable READ_AHEAD for non-source vertices, or rely on the source-vertex default).
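The two bounds in the Behavior section can be written out as one function. This is a sketch that restates the description, not the data plane's implementation; `maxInFlight` and its parameter names are hypothetical. A review comment further down this thread notes that source vertices behave differently with read-ahead enabled, which this sketch does not model.

```go
package main

import "fmt"

// maxInFlight returns the upper bound on read-but-not-acked messages
// as stated in the PR description. Hypothetical helper for illustration.
func maxInFlight(concurrency, readBatchSize uint64, readAhead bool) uint64 {
	if readAhead {
		// Reads continue until in-flight reaches concurrency, after which
		// one more batch may be pre-fetched.
		return concurrency + readBatchSize
	}
	// Without read-ahead the smaller of the two knobs is the cap.
	if concurrency < readBatchSize {
		return concurrency
	}
	return readBatchSize
}

func main() {
	fmt.Println(maxInFlight(100, 500, true))
	fmt.Println(maxInFlight(100, 500, false))
	fmt.Println(maxInFlight(1, 500, false))
}
```

With `concurrency: 1` and read-ahead disabled the bound is 1, which matches the strictly sequential setup described above.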

Signed-off-by: Yashash Lokesh <yashashhl25@gmail.com>

codecov Bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 74.74747% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.55%. Comparing base (a77bc25) to head (ebb67b3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/reconciler/pipeline/controller.go | 41.66% | 6 Missing and 1 partial ⚠️ |
| rust/numaflow-core/src/shared/create_components.rs | 33.33% | 6 Missing ⚠️ |
| pkg/apis/numaflow/v1alpha1/mono_vertex_types.go | 42.85% | 4 Missing ⚠️ |
| pkg/apis/numaflow/v1alpha1/vertex_types.go | 60.00% | 4 Missing ⚠️ |
| pkg/apis/numaflow/v1alpha1/pipeline_types.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| rust/numaflow-core/src/config/monovertex.rs | 91.66% | 1 Missing ⚠️ |
| rust/numaflow-core/src/config/pipeline.rs | 96.42% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3317      +/-   ##
==========================================
- Coverage   82.63%   82.55%   -0.09%     
==========================================
  Files         307      307              
  Lines       77544    77618      +74     
==========================================
- Hits        64079    64076       -3     
- Misses      12907    12985      +78     
+ Partials      558      557       -1     


@yhl25 yhl25 marked this pull request as ready for review May 8, 2026 05:03
@yhl25 yhl25 requested review from vigith and whynowy as code owners May 8, 2026 05:03
Signed-off-by: Yashash Lokesh <yashashhl25@gmail.com>
@yhl25 yhl25 marked this pull request as draft May 8, 2026 06:17
yhl25 added 3 commits May 8, 2026 10:16
Signed-off-by: Yashash Lokesh <yashashhl25@gmail.com>
Signed-off-by: Yashash Lokesh <yashashhl25@gmail.com>
Comment on lines -473 to +483
- true => MAX_ACK_PENDING / self.read_batch_size,
+ true => std::cmp::max(1, self.concurrency / self.read_batch_size.max(1)),
Contributor

This drop is significant. We're effectively dropping max ack pending from 10000 -> 500. It should be clearly noted in release notes.
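The size of the drop can be checked with a little arithmetic. The sketch below transcribes the removed and added expressions from the diff into Go; `oldArm` and `newArm` are hypothetical names, and the excerpt does not show what the computed value is used for. The inputs 10000 and 500 come from the review comment above; treating 500 as both the read batch size and the unset-concurrency default is an assumption.

```go
package main

import "fmt"

// oldArm transcribes the removed expression:
// MAX_ACK_PENDING / self.read_batch_size.
func oldArm(maxAckPending, readBatchSize uint64) uint64 {
	return maxAckPending / readBatchSize
}

// newArm transcribes the added expression:
// max(1, concurrency / max(read_batch_size, 1)).
func newArm(concurrency, readBatchSize uint64) uint64 {
	if readBatchSize < 1 {
		readBatchSize = 1 // guards against division by zero
	}
	n := concurrency / readBatchSize
	if n < 1 {
		n = 1 // never drop below one
	}
	return n
}

func main() {
	const batch = 500 // assumed read batch size
	fmt.Println(oldArm(10000, batch), oldArm(10000, batch)*batch)
	fmt.Println(newArm(500, batch), newArm(500, batch)*batch)
}
```

Under these assumptions the old expression yields 20 batches (10000 messages) and the new one yields 1 batch (500 messages), which is the 10000 -> 500 change the reviewer asks to call out in the release notes.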

Comment on lines -566 to +577
- max_ack_pending,
+ // Cap inflight messages on this ISB reader to the vertex's `concurrency`. The
+ // reader holds a semaphore of this size so that, even with read-ahead, we never
+ // have more than `concurrency + read_batch_size` messages in flight.
+ max_ack_pending: concurrency,
Contributor

@vaibhavtiwari33 vaibhavtiwari33 May 14, 2026

We're dropping the default max ack pending from 25k -> 500 for ISB reader. This should also be noted as part of release notes since the change is quite significant.
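The code comment in the diff says the reader holds a semaphore sized to `concurrency`. A minimal sketch of that idea, assuming a counting semaphore built on a buffered channel: the actual reader is Rust and its semaphore type is not shown in this excerpt, so `inflightLimiter` and its methods are hypothetical.

```go
package main

import "fmt"

// inflightLimiter is a counting semaphore backed by a buffered channel.
// Each occupied slot stands for one read-but-not-acked message.
type inflightLimiter struct {
	slots chan struct{}
}

func newInflightLimiter(concurrency int) *inflightLimiter {
	return &inflightLimiter{slots: make(chan struct{}, concurrency)}
}

// tryAcquire claims one slot without blocking; false means the cap is hit.
func (l *inflightLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees one slot (for example when a message is acked).
// It reports false if no slot was held.
func (l *inflightLimiter) release() bool {
	select {
	case <-l.slots:
		return true
	default:
		return false
	}
}

// inFlight reports how many slots are currently held.
func (l *inflightLimiter) inFlight() int { return len(l.slots) }

func main() {
	l := newInflightLimiter(2)
	fmt.Println(l.tryAcquire(), l.tryAcquire(), l.tryAcquire())
	l.release()
	fmt.Println(l.tryAcquire(), l.inFlight())
}
```

A real reader would more likely block on acquire than poll; the non-blocking form is used here only to keep the example deterministic.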

Comment on lines +668 to +673
if vCopy.IsOrdered() && (vCopy.IsMapUDF() || vCopy.IsASink()) {
	if vCopy.Limits == nil {
		vCopy.Limits = &dfv1.VertexLimits{}
	}
	vCopy.Limits.Concurrency = ptr.To[uint64](1)
}
Contributor

@vaibhavtiwari33 vaibhavtiwari33 May 14, 2026

Should we throw an error when concurrency > 1 for ordered processing instead of overwriting?
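The two options under discussion, overwriting versus rejecting, can be compared side by side. This is a sketch under stated assumptions: `vertex` is a hypothetical stand-in for the real spec type, `forceSequential` mirrors the overwrite in the diff, and `validateOrdered` is one possible shape for the reviewer's suggestion, not code from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// vertex is a hypothetical, simplified stand-in for the real vertex spec.
type vertex struct {
	Ordered     bool
	MapOrSink   bool
	Concurrency *uint64
}

// forceSequential mirrors the diff: for ordered Map/Sink vertices the
// concurrency is silently overwritten with 1.
func forceSequential(v *vertex) {
	if v.Ordered && v.MapOrSink {
		one := uint64(1)
		v.Concurrency = &one
	}
}

var errOrderedConcurrency = errors.New("ordered processing requires concurrency = 1")

// validateOrdered sketches the alternative raised in review: reject a spec
// that asks for ordered processing with concurrency > 1.
func validateOrdered(v vertex) error {
	if v.Ordered && v.MapOrSink && v.Concurrency != nil && *v.Concurrency > 1 {
		return errOrderedConcurrency
	}
	return nil
}

func main() {
	eight := uint64(8)
	v := vertex{Ordered: true, MapOrSink: true, Concurrency: &eight}
	fmt.Println(validateOrdered(v)) // rejected before any overwrite
	forceSequential(&v)
	fmt.Println(*v.Concurrency, validateOrdered(v))
}
```

The trade-off is visibility: overwriting keeps existing specs working, while validation surfaces a conflicting setting to the user instead of hiding it.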

value: "true"
```

With read-ahead enabled the upper bound on in-flight messages becomes **`concurrency + readBatchSize`** (the data plane keeps reading until the in-flight count hits `concurrency`, then may pre-fetch one more batch).
Contributor

I don't think this is correct. On a source with read-ahead enabled, this implementation caps in-flight messages at min(concurrency, readBatchSize).

Contributor

Updated

Comment on lines +525 to +534
// GetConcurrency returns the maximum number of in-flight (read-but-not-acked) messages allowed
// at any time. It defaults to the read batch size when unset, which preserves the historical
// behavior where concurrency was implicitly bounded by the batch size.
func (v VertexLimits) GetConcurrency() uint64 {
	if v.Concurrency != nil {
		return *v.Concurrency
	}
	return v.GetReadBatchSize()
}

Contributor

nit: is this getting used anywhere?

Contributor Author

Not used, but still want to keep it because of the default value.
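The default the author wants to preserve is easiest to see in a self-contained version of the getter. The sketch below uses a hypothetical lowercase `vertexLimits` type rather than the real `VertexLimits`, and the fallback read batch size of 500 is an assumption, not something shown in this excerpt.

```go
package main

import "fmt"

// defaultReadBatchSize is an assumed fallback for this sketch.
const defaultReadBatchSize uint64 = 500

// vertexLimits is a simplified, hypothetical stand-in for VertexLimits.
type vertexLimits struct {
	ReadBatchSize *uint64
	Concurrency   *uint64
}

// getReadBatchSize returns the configured batch size or the assumed default.
func (v vertexLimits) getReadBatchSize() uint64 {
	if v.ReadBatchSize != nil {
		return *v.ReadBatchSize
	}
	return defaultReadBatchSize
}

// getConcurrency follows the getter in the diff: an explicit value wins,
// otherwise concurrency falls back to the read batch size.
func (v vertexLimits) getConcurrency() uint64 {
	if v.Concurrency != nil {
		return *v.Concurrency
	}
	return v.getReadBatchSize()
}

func main() {
	batch, conc := uint64(100), uint64(8)
	fmt.Println(vertexLimits{}.getConcurrency())
	fmt.Println(vertexLimits{ReadBatchSize: &batch}.getConcurrency())
	fmt.Println(vertexLimits{ReadBatchSize: &batch, Concurrency: &conc}.getConcurrency())
}
```

Because the fallback chains through the batch size, a spec that sets only `readBatchSize` keeps the historical behavior where the batch size implicitly bounded concurrency.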

Signed-off-by: Vaibhav Tiwari <vaibhav.tiwari33@gmail.com>
@yhl25 yhl25 marked this pull request as ready for review May 25, 2026 17:40
vigith added 2 commits May 26, 2026 09:58
Signed-off-by: Vigith Maurice <vigith@gmail.com>
Signed-off-by: Vigith Maurice <vigith@gmail.com>
@yhl25 yhl25 enabled auto-merge (squash) May 26, 2026 17:20
@vigith vigith disabled auto-merge May 26, 2026 17:26
@vigith vigith changed the title from "chore: introduce concurrency env for udf invocation" to "feat: introduce concurrency env for udf invocation" May 26, 2026
@vigith vigith enabled auto-merge (squash) May 26, 2026 17:27
@vigith vigith merged commit 42722ea into main May 26, 2026
27 checks passed
@vigith vigith deleted the concurrency branch May 26, 2026 17:44
3 participants