Skip to content

Bump medida and batch#5338

Open
graydon wants to merge 3 commits into
stellar:masterfrom
graydon:bump-medida-and-batch
Open

Bump medida and batch#5338
graydon wants to merge 3 commits into
stellar:masterfrom
graydon:bump-medida-and-batch

Conversation

@graydon

@graydon graydon commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Bump medida to take stellar/medida#38 (speeding up metrics and adding batch interfaces) and then cherry-picking in a couple commits @dmkozh did to further optimize metrics usage on top.

graydon and others added 3 commits July 2, 2026 18:07
SimpleTimer::Update is called from the parallel apply threads (notably the
bucket point-load timers), where the per-update mutex around max tracking
was a cross-thread contention point. The sum/count counters were already
atomic; with a compare-exchange max the whole timer is now lock-free.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
With metrics enabled, every InvokeHostFunction op marked ~20 process-wide
meters and updated ~7 histograms/timers from the apply threads (~25
contended lock/cache-line operations per op), and the parallel path also
performed a registry lookup of the ledger.operation.apply timer per
transaction. On the apply-load benchmark (sac, TX=6000, T=8) this made
close time 282ms with metrics vs 107ms without, with p99 spikes from CKMS
compaction; DISABLE_SOROBAN_METRICS_FOR_TESTING existed to hide this.

Instead, record all per-op/per-tx metric updates into a per-thread
SorobanMetrics::ApplyMetricsBatch (meter increments as plain sums, sample
streams for percentile-bearing histograms/timers as raw value vectors,
one brief uncontended lock per record) and drain all batches into the
underlying medida metrics once per ledger in
publishAndResetLedgerWideMetrics(), using the new UpdateMany bulk APIs.
Observable metric values are preserved: meter counts are identical sums,
EWMA rates tick at 5s granularity either way, and histograms receive the
exact same sample stream, just at close time.

The sequential apply path still updates medida directly (main thread,
cheap after the medida fixes). Combined with those fixes, metrics-on
close time drops to ~106.5ms vs ~104.1ms metrics-off (200-ledger run),
i.e. metric overhead goes from ~164% to ~2%.

Also adds a regression test asserting the batched metrics are visible
after the ledger close that applied the transaction.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 3, 2026 04:43
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Stellar Core’s metrics collection to take advantage of newer medida batching APIs, reducing contention and per-op overhead in Soroban/parallel-apply hot paths. It also modernizes SimpleTimer max tracking to be lock-free.

Changes:

  • Replace SimpleTimer’s mutex-protected max tracking with an atomic, lock-free max and atomic reset on syncMax().
  • Add per-thread SorobanMetrics::ApplyMetricsBatch buffers plus RAII timers (BatchedTimerScope, ScopedNsecsTimer) and publish batched apply-path metrics once per ledger close.
  • Update parallel/sequential Soroban apply paths to record into per-thread batches and add a test ensuring batched metrics are visible after ledger close.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/util/SimpleTimer.h Switch max tracking to atomic; header include changes.
src/util/SimpleTimer.cpp Implement lock-free max update + atomic exchange on sync.
src/transactions/TransactionFrame.cpp Batch per-tx Soroban metrics and use BatchedTimerScope for op apply timing in parallel apply.
src/ledger/LedgerManagerImpl.cpp Record per-tx apply timing into Soroban per-thread batches in apply threads.
src/ledger/SorobanMetrics.h Introduce per-thread apply metrics batching + new RAII timer helpers.
src/ledger/SorobanMetrics.cpp Implement thread-local batch acquisition, draining, and bulk publishing via UpdateMany.
src/transactions/InvokeHostFunctionOpFrame.cpp Record host-fn op metrics into per-thread batch and batch exec-time samples.
src/transactions/ExtendFootprintTTLOpFrame.cpp Record op metrics into per-thread batch and batch exec-time samples.
src/transactions/RestoreFootprintOpFrame.cpp Record op metrics into per-thread batch and batch exec-time samples.
src/transactions/test/InvokeHostFunctionTests.cpp Add coverage verifying batched Soroban metrics are published at ledger close.

Comment thread src/util/SimpleTimer.h
Comment on lines +3 to 6
#include <atomic>
#include <chrono>
#include <medida/counter.h>
#include <medida/metric_name.h>
Comment on lines +214 to +218
// One entry per (thread, SorobanMetrics instance): tests can run several
// applications in one process and threads outlive applications, so the
// weak_ptr is re-validated on every first-use (the registry owns the
// batches; a stale entry from a destroyed instance whose address got
// reused expires together with its registry).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants