2026 05 27 update#6785
NOUPSTREAM gitlab CI See merge request cloudflare/ew/workerd!1
We have a use-after-free bug regarding SqliteDatabase::Regulator lifetimes. Specifically, SqlStorage inherits from SqliteDatabase::Regulator, and then passes references to itself into SqliteDatabase calls that construct things, like Statements and Queries.
Because SqliteDatabase::Regulator is basically a small logic options class, it might make sense that downstream things only hold a reference to it. Indeed, many uses of SqliteDatabase::Regulator are constexpr. However, in the case of SqlStorage, the SqliteDatabase::Regulator is dynamic (it is the SqlStorage object itself). Because the .storage field in JS land is a LAZY_INSTANCE_PROPERTY field, it can be overwritten, and GC can be triggered such that SqlStorage is garbage collected and released, even if there are live SqliteStatement types still using SqlStorage as a Regulator. So, that's a use-after-free mistake, and the ASan report agrees with that assessment.
So how do we fix it? This approach is to recognize that the Regulator is a bundle of completely static things, and we never have a case where a Regulator has some dynamic policy that can't last the lifetime of the process. So, this change simply requires that all Regulators used by SqliteDatabase are statically allocated, thus eliminating this class of use-after-free. As a consequence, SqlStorage is no longer a Regulator. A use-after-free test is also added.
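A minimal sketch of the idea, not the actual workerd code: the classes below are hypothetical stand-ins, and the `_cf_` name check is only an illustrative policy. The point is that a statement keeps a reference to a process-lifetime policy object rather than to the garbage-collected storage object that created it.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for SqliteDatabase::Regulator: a stateless policy.
class Regulator {
public:
  virtual ~Regulator() = default;
  virtual bool isAllowedName(std::string_view name) const = 0;
};

// Illustrative policy: reject names that start with a reserved prefix.
class UserSqlRegulator final : public Regulator {
public:
  bool isAllowedName(std::string_view name) const override {
    return name.substr(0, 4) != "_cf_";
  }
};

// Single statically allocated instance that lives until process exit.
inline const UserSqlRegulator& userSqlRegulator() {
  static const UserSqlRegulator instance;
  return instance;
}

// Stand-in for a prepared statement. Holding a bare reference is safe here
// only because every Regulator it can be given has static storage duration.
class Statement {
public:
  explicit Statement(const Regulator& regulator) : regulator(regulator) {}
  bool mayTouch(std::string_view name) const {
    assert(!name.empty());
    return regulator.isAllowedName(name);
  }
private:
  const Regulator& regulator;
};

// Stand-in for SqlStorage: it is no longer a Regulator itself, so a
// Statement can outlive it without dangling.
class SqlStorage {
public:
  Statement prepare() const { return Statement(userSqlRegulator()); }
};
```

With this shape, `SqlStorage().prepare()` returns a statement that stays valid after the temporary storage object is gone.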
https://jira.cfdata.org/browse/VULN-127735
When reader.read() triggers the pull() callback (through ConsumerImpl::read() -> handleRead -> onConsumerWantsData -> pull), and the pull() callback synchronously calls reader.cancel(), the consumer is destroyed mid-read:
- ByteReadable::cancel() at standard.c++:2163 sets state = kj::none, immediately freeing the ConsumerImpl<ByteQueue>
- Control returns to ConsumerImpl::read() at queue.h:471 which calls maybeDrainAndSetState() on the freed this
ValueReadable already had a reading flag to prevent this (standard.c++:1849-1858, 1905), but ByteReadable was missing the equivalent guard.
Fix (two layers of defense)
1. queue.h - ConsumerImpl::read(): Use the existing selfRef weak ref to guard maybeDrainAndSetState(). After handleRead() returns, runIfAlive() checks whether the consumer was destroyed before accessing it. This is defense-in-depth that protects against any path that could destroy the consumer during handleRead.
2. standard.c++ - ByteReadable: Add a reading flag (matching ValueReadable's existing pattern) that prevents cancel() from immediately setting state = kj::none. Instead, cancel() sets pendingCancel = true, and the destruction is deferred until after read() completes. This is the same pattern ValueReadable already uses.
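The second layer (the reading flag with a deferred cancel) can be sketched roughly as follows. This is a simplified illustration with hypothetical `Readable` and `Consumer` types, not the real ByteReadable; it ignores exceptions thrown by the callback and does not show the selfRef weak-reference guard from the first layer.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for the consumer that read() operates on.
struct Consumer {
  int completedReads = 0;
  void finishRead() { ++completedReads; }
};

class Readable {
public:
  Readable() : state(std::make_unique<Consumer>()) {}

  // `pull` models the user callback, which may synchronously call cancel().
  void read(const std::function<void()>& pull) {
    if (!state) return;      // already canceled
    assert(!reading);        // this sketch does not support nested reads
    reading = true;
    pull();
    state->finishRead();     // safe: cancel() could not free state above
    reading = false;
    if (pendingCancel) {     // perform the deferred destruction now
      pendingCancel = false;
      state.reset();
    }
  }

  void cancel() {
    if (reading) {           // mid-read: only record the request
      pendingCancel = true;
      return;
    }
    state.reset();
  }

  bool isCanceled() const { return state == nullptr; }

private:
  std::unique_ptr<Consumer> state;
  bool reading = false;
  bool pendingCancel = false;
};
```

Calling `cancel()` from inside the callback leaves the consumer alive until `read()` has finished with it, after which the object reports itself as canceled.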
Take a strong reference to prevent GC from freeing the target port during serialization. Serialization can run arbitrary user code via custom getters.
Apply edgeworker patches See merge request cloudflare/ew/workerd!68
capnproto/capnproto#2501 introduced a source-breaking change: schema::Value::Reader::getStruct() now returns capnp::AnyStruct::Reader (with as<T>()) instead of capnp::AnyPointer::Reader (with getAs<T>()). Bump capnp-cpp past it and update the two getStruct().getAs<T>() callers in compatibility-date.{c++,-test.c++} to use as<T>(). Assisted-by: OpenCode:claude-opus-4.7
Bump capnp-cpp past AnyStruct schema change and fix compatibility-date See merge request cloudflare/ew/workerd!70
…ninitialized memory.
Additionally, make SequentialSpanSubmitter use entropy-based span IDs outside predictable mode. This is especially important for correct trace hierarchy in local dev now that USER_SPAN_CONTEXT_PROPAGATION makes multiple workers emit a combined trace.
Make wd_tests run in predictable mode by default See merge request cloudflare/ew/workerd!74
Use Vector::add() in X509Certificate::getKeyUsage() to avoid use of uninitialized memory. See merge request cloudflare/ew/workerd!72
NOUPSTREAM asan build See merge request cloudflare/ew/workerd!67
…t() inner .then() continuation
The inner jsg::Promise::then() continuation in WorkerLoader::get() at
worker-loader.c++:71 captured IoContext by raw C++ reference (&ioctx) into a
V8-heap-rooted promise reaction whose lifetime is decoupled from the IoContext.
When the originating IoContext was destroyed before the user's getCode() promise
resolved, and the promise was later resolved from a different IoContext on the
same isolate (possible when handle_cross_request_promise_resolution is disabled),
the lambda would dereference freed memory through toDynamicWorkerSource() →
getIoChannelFactory() → getCurrentIncomingRequest(), leading to a heap
use-after-free with a virtual call through pointers derived from the freed
712-byte IoContext allocation.
The fix replaces the raw [&ioctx] capture with a kj::Own<IoContext::WeakRef>
obtained via ioctx.getWeakRef(). The inner lambda now calls
weakIoctx->tryGet() and throws a clean JS error ("The request which initiated
this dynamic worker load has already completed.") if the IoContext has been
destroyed, converting the UAF into a safe, catchable exception regardless of
the handle_cross_request_promise_resolution setting. The outer
makeReentryCallback wrapper already uses getWeakRef() for its own guard, but
the inner .then() lambda bypassed that safety by capturing &ioctx directly.
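As a rough analogy for the capture change described above, the sketch below uses std::weak_ptr in place of kj::Own<IoContext::WeakRef>, and a hypothetical `IoContext` struct and `makeContinuation` helper. One difference to keep in mind: `weak_ptr::lock()` keeps the object alive while the continuation runs, which is an assumption of this sketch and not necessarily how `tryGet()` behaves.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the per-request context.
struct IoContext {
  std::string name;
};

// Builds a continuation that may run long after the request has ended.
// It captures a weak reference instead of a raw IoContext&.
std::function<std::string()> makeContinuation(const std::shared_ptr<IoContext>& ctx) {
  assert(ctx != nullptr);
  std::weak_ptr<IoContext> weak = ctx;
  return [weak]() -> std::string {
    std::shared_ptr<IoContext> alive = weak.lock();
    if (!alive) {
      // The context is gone: fail cleanly instead of touching freed memory.
      throw std::runtime_error(
          "The request which initiated this dynamic worker load has already completed.");
    }
    return alive->name;
  };
}
```

While the context is alive the continuation behaves normally; once the last owner releases it, invoking the continuation throws a catchable error.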
The regression test (regressionDeadIoContextGetCode) exercises the patched code
path by making a sub-request that calls env.loader.get() with a pending getCode
promise, returning to drain the sub-request's IoContext, then resolving the
promise from the test's IoContext. Post-patch, the WeakRef check fires and the
clean error message is logged; pre-patch, the UAF would silently dereference
freed memory (observable as a crash under ASAN).
Test validation: VALIDATED LOCALLY
Pre-patch run: PASS (bazel test //src/workerd/api/tests:worker-loader-test@)
Post-patch run: PASS (bazel test //src/workerd/api/tests:worker-loader-test@)
Refs: AUTOVULN-CLOUDFLARE-WORKERD-256
VULN-136585: fix(worker-loader): replace raw IoContext& capture with WeakRef in get() inner .then() continuation See merge request cloudflare/ew/workerd!23
Adds four new fields to type the RFC 9440 mTLS certificate properties now exposed on `request.cf.tlsClientAuth`: `certRFC9440`, `certRFC9440TooLarge`, `certChainRFC9440`, and `certChainRFC9440TooLarge`. Matching placeholder values are added to `IncomingRequestCfPropertiesTLSClientAuthPlaceholder`. See the [RFC 9440 mTLS fields changelog post][changelog]. [changelog]: https://developers.cloudflare.com/changelog/post/2026-03-27-rfc9440-mtls-fields/
Add RFC 9440 mTLS fields to `IncomingRequestCfPropertiesTLSClientAuth` See merge request cloudflare/ew/workerd!76
Trigger internal CI on workerd MRs See merge request cloudflare/ew/workerd!71
The slow path of the sync zlib convenience methods (`{ info: true }`)
constructs a JSG-bound CompressionStream wrapper per call. The wrapper
holds a jsg::Function writeCallback that captures the JS handle (see
internal_zlib_base.ts), forming a JS<->C++ reference cycle. Without a
visitForGc, V8 cannot trace through the C++->JS edge, so the cycle is
uncollectable and every CompressionStream becomes immortal.
Reproducer: 20k iterations of inflateSync(input, { info: true }) leaks
~128 MB.
Adds visitForGc() to CompressionStream covering writeCallback,
writeResult, and errorHandler. Also clears these refs eagerly in
close() so callers that explicitly destroy don't have to wait on the
cycle collector.
The fast path (zlibUtil.zlibSync) is unaffected: it does the whole
compression in C++ without exposing a CompressionStream wrapper to JS.
Adds zlib-leak-nodejs-test asserting that engines returned via
{ info: true } are reclaimed after GC, using WeakRef and --expose-gc.
Add visitForGc to CompressionStream to fix zlib slow-path leak See merge request cloudflare/ew/workerd!69
…ent stack overflow
JsObject::getPrototype() in src/workerd/jsg/jsvalue.c++ recursed directly
into the Proxy target when no getPrototypeOf trap was present (line 154).
An attacker-supplied chain of ~1M nested `new Proxy(prev, {})` wrappers
drove unbounded native C++ recursion, overrunning the stack guard page and
crashing the workerd process with SIGSEGV. This affected all callers:
processEntrypointClass, collectMethodsFromPrototypeChain, and RPC paths.
The fix replaces the self-recursion with an iterative loop and a hard depth
limit of 100,000 (matching V8's internal JSProxy::kMaxIterationLimit),
throwing a RangeError when exceeded.
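The recursion-to-loop change can be illustrated with a small model. The `Node` type, `getPrototypeId` function and use of std::range_error are assumptions made for this sketch; only the limit of 100,000 comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical model: a node with a non-null target acts like a Proxy that
// has no getPrototypeOf trap and therefore forwards to its target.
struct Node {
  const Node* target = nullptr;
  int prototypeId = 0;
};

constexpr std::size_t kMaxProxyDepth = 100000;

// Walks the forwarding chain with a loop, so native stack usage stays
// constant regardless of how deep the chain is.
int getPrototypeId(const Node* obj) {
  assert(obj != nullptr);
  for (std::size_t depth = 0; obj->target != nullptr; ++depth) {
    if (depth >= kMaxProxyDepth) {
      throw std::range_error("Proxy chain exceeds maximum depth");
    }
    obj = obj->target;
  }
  return obj->prototypeId;
}
```

A short chain resolves to the innermost object's prototype, while a chain longer than the limit (or a cycle, in this model) produces an exception instead of exhausting the stack.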
The regression test in jsvalue-test.c++ creates a 200,000-deep Proxy chain
and calls checkProxyPrototype(), asserting that a RangeError is thrown
instead of crashing. AUTOVULN-CLOUDFLARE-WORKERD-143.
Test validation: VALIDATED LOCALLY
Pre-patch run: FAIL (bazel test //src/workerd/jsg:jsvalue-test@)
Post-patch run: PASS (bazel test //src/workerd/jsg:jsvalue-test@)
Refs: AUTOVULN-CLOUDFLARE-WORKERD-143
Use Gitlab job ID as run_id for workerd-robot See merge request cloudflare/ew/workerd!79
With the goal of preventing tens of thousands of these from being accumulated by individual isolates without GC kicking in, holding open outbound network connections unnecessarily.
VULN-136618: fix(worker-loader): copy data/wasm module bytes before async compilation See merge request cloudflare/ew/workerd!55
…in handlePush
ByteQueue::handlePush() in queue.c++ called bufferData(0) when a partially consumed entry could not satisfy the next pending BYOB readAtLeast() request. This re-buffered the entire entry from offset 0 instead of from the current entryOffset, duplicating already-consumed bytes and inflating queueTotalSize. On the next enqueue, the KJ_REQUIRE at line 1110 (state.queueTotalSize < pending.pullInto.atLeast) would fail because the duplicated bytes made queueTotalSize exceed atLeast.
The fix changes bufferData(0) to bufferData(entryOffset) so only the unconsumed tail is buffered.
The regression test creates two concurrent readAtLeast(5) BYOB reads with 5-byte views, enqueues 7 bytes (partially consumed by read #1, leaving 2 bytes for read #2's buffer), then enqueues 4 more bytes. Pre-patch this triggers the assertion failure; post-patch both reads complete correctly.
Test validation: VALIDATED LOCALLY
Pre-patch run: FAIL (bazel test //src/workerd/api/tests:streams-byob-concurrent-readatleast-test@)
Post-patch run: PASS (bazel test //src/workerd/api/tests:streams-byob-concurrent-readatleast-test@)
Refs: AUTOVULN-CLOUDFLARE-WORKERD-18
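The accounting error is easy to see in a reduced model. `ByteQueueSketch` below is a hypothetical simplification, not the real ByteQueue: it only tracks buffered bytes and the running total.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, reduced model of the queue's buffering step.
struct ByteQueueSketch {
  std::vector<std::uint8_t> buffered;
  std::size_t queueTotalSize = 0;

  // Buffers only the part of `entry` that starts at `offset`, i.e. the
  // bytes that earlier reads have not consumed yet.
  void bufferData(const std::vector<std::uint8_t>& entry, std::size_t offset) {
    assert(offset <= entry.size());
    buffered.insert(buffered.end(),
                    entry.begin() + static_cast<std::ptrdiff_t>(offset),
                    entry.end());
    queueTotalSize += entry.size() - offset;
  }
};
```

For the 7-byte entry in the regression test, of which the first read consumed 5 bytes, passing the entry offset buffers 2 bytes. Passing 0 would buffer all 7, which already exceeds the second read's atLeast of 5 and so violates the queueTotalSize < atLeast requirement on the next enqueue.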
…ayPtr<T>()
The const overload of BackingStore::asArrayPtr<T>() in buffersource.h computed the returned pointer as static_cast<T*>(backingStore->Data()) + byteOffset, which treats byteOffset (a byte count) as an element count. For multi-byte T (e.g. uint32_t), this advances by byteOffset * sizeof(T) bytes instead of byteOffset bytes, producing an out-of-bounds pointer. The non-const overload was already correct: it casts to kj::byte* first, adds byteOffset, then reinterprets to T*.
The fix makes the const overload mirror the non-const overload and adds a byteOffset alignment assertion.
The regression test creates a Uint8Array view at byteOffset=4 over a 12-byte ArrayBuffer, writes known byte patterns, then calls the const overload of asArrayPtr<uint32_t>() and asserts the returned pointer reads the correct uint32_t values. Before the fix, the const overload advanced by 16 bytes (4 * sizeof(uint32_t)) instead of 4 bytes, reading zeroed memory.
Test validation: VALIDATED LOCALLY
Pre-patch run: FAIL (bazel test //src/workerd/jsg:buffersource-test@)
Post-patch run: PASS (bazel test //src/workerd/jsg:buffersource-test@)
Refs: AUTOVULN-CLOUDFLARE-WORKERD-17
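The corrected arithmetic can be shown with a standalone helper. `asTypedPtr` is a hypothetical free function written for this sketch, and the exact form of the alignment assertion in the real patch is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns a typed pointer that starts `byteOffset` bytes into `data`.
// The offset is applied while the pointer is still a byte pointer, so it is
// never scaled by sizeof(T).
template <typename T>
const T* asTypedPtr(const void* data, std::size_t byteOffset) {
  assert(byteOffset % alignof(T) == 0);  // assumed form of the alignment check
  const auto* bytes = static_cast<const unsigned char*>(data);
  return reinterpret_cast<const T*>(bytes + byteOffset);
}
```

With a buffer of three uint32_t values and byteOffset = 4, the helper points at the second element. The buggy form, `static_cast<const T*>(data) + byteOffset`, would instead point 16 bytes in, past the end of the 12-byte buffer.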
Guard IoContext::current() in memory-cache eviction path. See merge request cloudflare/ew/workerd!109
fix(jsg): correct pointer arithmetic in const BackingStore::asArrayPtr<T>() See merge request cloudflare/ew/workerd!93
[ci] Use 16 CPU runner See merge request cloudflare/ew/workerd!113
several fields were missing Refs: AUTOVULN-CLOUDFLARE-WORKERD-44
VULN-136583: fix(streams): preserve entry offset when buffering partial BYOB data in handlePush See merge request cloudflare/ew/workerd!20
fix EventSource memory tracking See merge request cloudflare/ew/workerd!86
[build] silence protobuf warning See merge request cloudflare/ew/workerd!114
This mostly reverts commit 0d86b66. This removes the new `debugContext` string that was being passed around to distinguish params from results. Now that we've debugged the issue, this is more noise than it is worth. We do keep the `cap.debugInfo()` debug log on failures, since that's not so invasive and is more useful anyway.
DO NOT MERGE until the autogate has been rolled to all of production!
This allows ExternalPusher methods to continue to be invoked after the top-level RPC call(). (DO NOT MERGE until jsrpc-session-handle autogate is rolled out in prod.)
There are cases where it is difficult to acquire the channel token for a SubrequestChannel or ActorClassChannel synchronously, but until now we have needed to do so in order to serialize `Fetcher`s and `DurableObjectClass`es. We can't make serialization itself be async, because this would mess up e-order: A call that needs to wait for something while serializing params might end up being delayed until after some subsequent call which didn't wait, and so would be delivered out-of-order. To avoid this, we make it possible for a call to be sent with an IOU for the channel tokens. This uses `ExternalPusher`. The call embeds an external which is a promise capability. Later, the caller invokes the callee's `ExternalPusher` to push the channel token to it, and resolves the IOU promise to the resulting object. The callee can then unwrap the promise to get their token. (Opus 4.7 wrote the new test cases in channel-token-test but the rest of the code was by hand.)
This makes it so `getSubrequestChannel()` and similar methods of `IoChannelFactory` make sure that the contents of a `props` cap table are fully resolved before forwarding on to the `IoChannelFactory` implementation. This means that the underlying implementation of `getSubrequestChannelResolved()` et al doesn't need to change to start calling `getResolved()` before trying to downcast channel objects to implementation-specific subclasses. This otherwise would have been really annoying to do in the internal codebase. Relatedly, this adds an `ensureAllResolved()` method to `DynamicWorkerSource`, for resolving channels there.
…yi/2026-05-27-upstream
This is a large security-focused upstream merge. After thorough review, the changes are predominantly fixes for multiple CVEs and the fixes themselves are well-implemented. I found no high-severity issues in the new code. The few observations I have are all low severity, and I don't believe any of them warrant blocking the PR. Let me verify there are no actionable issues worth posting as GitHub suggestions:
All findings are either correct fixes, or very low-severity observations that don't warrant review comments. LGTM
Codecov Report
@@ Coverage Diff @@
## main #6785 +/- ##
==========================================
- Coverage 66.61% 66.46% -0.15%
==========================================
Files 402 404 +2
Lines 115914 116468 +554
Branches 19425 19512 +87
==========================================
+ Hits 77212 77414 +202
- Misses 27112 27416 +304
- Partials 11590 11638 +48
Hey @mikea - I have been requested to review this PR but I have no context of what this is doing and the motivation. Can you help me out here?
git merge 2809a74