Bump medida and batch#5338
Open
graydon wants to merge 3 commits into
Open
Conversation
SimpleTimer::Update is called from the parallel apply threads (notably the bucket point-load timers), where the per-update mutex around max tracking was a cross-thread contention point. The sum/count counters were already atomic; with a compare-exchange max the whole timer is now lock-free. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
With metrics enabled, every InvokeHostFunction op marked ~20 process-wide meters and updated ~7 histograms/timers from the apply threads (~25 contended lock/cache-line operations per op), and the parallel path also performed a registry lookup of the ledger.operation.apply timer per transaction. On the apply-load benchmark (sac, TX=6000, T=8) this made close time 282ms with metrics vs 107ms without, with p99 spikes from CKMS compaction; DISABLE_SOROBAN_METRICS_FOR_TESTING existed to hide this. Instead, record all per-op/per-tx metric updates into a per-thread SorobanMetrics::ApplyMetricsBatch (meter increments as plain sums, sample streams for percentile-bearing histograms/timers as raw value vectors, one brief uncontended lock per record) and drain all batches into the underlying medida metrics once per ledger in publishAndResetLedgerWideMetrics(), using the new UpdateMany bulk APIs. Observable metric values are preserved: meter counts are identical sums, EWMA rates tick at 5s granularity either way, and histograms receive the exact same sample stream, just at close time. The sequential apply path still updates medida directly (main thread, cheap after the medida fixes). Combined with those fixes, metrics-on close time drops to ~106.5ms vs ~104.1ms metrics-off (200-ledger run), i.e. metric overhead goes from ~164% to ~2%. Also adds a regression test asserting the batched metrics are visible after the ledger close that applied the transaction. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates Stellar Core’s metrics collection to take advantage of newer medida batching APIs, reducing contention and per-op overhead in Soroban/parallel-apply hot paths. It also modernizes SimpleTimer max tracking to be lock-free.
Changes:
- Replace
SimpleTimer’s mutex-protected max tracking with an atomic, lock-free max and atomic reset onsyncMax(). - Add per-thread
SorobanMetrics::ApplyMetricsBatchbuffers plus RAII timers (BatchedTimerScope,ScopedNsecsTimer) and publish batched apply-path metrics once per ledger close. - Update parallel/sequential Soroban apply paths to record into per-thread batches and add a test ensuring batched metrics are visible after ledger close.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/util/SimpleTimer.h | Switch max tracking to atomic; header include changes. |
| src/util/SimpleTimer.cpp | Implement lock-free max update + atomic exchange on sync. |
| src/transactions/TransactionFrame.cpp | Batch per-tx Soroban metrics and use BatchedTimerScope for op apply timing in parallel apply. |
| src/ledger/LedgerManagerImpl.cpp | Record per-tx apply timing into Soroban per-thread batches in apply threads. |
| src/ledger/SorobanMetrics.h | Introduce per-thread apply metrics batching + new RAII timer helpers. |
| src/ledger/SorobanMetrics.cpp | Implement thread-local batch acquisition, draining, and bulk publishing via UpdateMany. |
| src/transactions/InvokeHostFunctionOpFrame.cpp | Record host-fn op metrics into per-thread batch and batch exec-time samples. |
| src/transactions/ExtendFootprintTTLOpFrame.cpp | Record op metrics into per-thread batch and batch exec-time samples. |
| src/transactions/RestoreFootprintOpFrame.cpp | Record op metrics into per-thread batch and batch exec-time samples. |
| src/transactions/test/InvokeHostFunctionTests.cpp | Add coverage verifying batched Soroban metrics are published at ledger close. |
Comment on lines
+3
to
6
| #include <atomic> | ||
| #include <chrono> | ||
| #include <medida/counter.h> | ||
| #include <medida/metric_name.h> |
Comment on lines
+214
to
+218
| // One entry per (thread, SorobanMetrics instance): tests can run several | ||
| // applications in one process and threads outlive applications, so the | ||
| // weak_ptr is re-validated on every first-use (the registry owns the | ||
| // batches; a stale entry from a destroyed instance whose address got | ||
| // reused expires together with its registry). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bump medida to take stellar/medida#38 (speeding up metrics and adding batch interfaces) and then cherry-picking in a couple commits @dmkozh did to further optimize metrics usage on top.