perf(quality-gates): Add tbt metrics to global startup and flow timer gates #41963
Calibrated from a 14-day Sentry baseline (`ci.branch:main`, 2026-04-07 to 2026-04-20):

| preset | p75 observed | p95 observed |
|----------------------|---------|---------|
| startupStandardHome | 313 ms | 351 ms |
| startupPowerUserHome | 1189 ms | 1367 ms |
| onboardingImport | 485 ms | 559 ms |
| onboardingNew | 326 ms | 377 ms |
| importSrpHome | 3354 ms | 3552 ms |
| swap | 438 ms | 483 ms |
| sendTransactions | 151 ms | 151 ms |
| assetDetails | 95 ms | 127 ms |
| solanaAssetDetails | ~1 ms | ~1 ms |

Thresholds: warn = p × 1.2, fail = p × 1.3, `ciMultiplier` = 1.5. `solanaAssetDetails` uses a 50/75/100 ms floor (near-zero baseline). `dappPageLoad` is excluded: the flow does not call `buildLongTaskTimerResults`.

Closes MetaMask/MetaMask-planning#7194
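As a rough illustration of the calibration rule above, the following TypeScript sketch derives warn/fail limits from an observed percentile. It is hypothetical: `deriveLimits`, `derivePreset`, and the `GateLimits` shape are not taken from `thresholds.ts`, rounding to the nearest millisecond is an assumption, and because the mapping of the 50/75/100 ms floor onto individual limits is not spelled out, the floor is modelled as a generic per-limit minimum.

```typescript
// Hypothetical sketch of the calibration rule: warn = p × 1.2, fail = p × 1.3.
// Names, shapes, and rounding are illustrative, not the repository's code.

interface GateLimits {
  warn: number; // ms
  fail: number; // ms
}

interface PresetThresholds {
  p75: GateLimits;
  p95: GateLimits;
  ciMultiplier: number;
}

const WARN_FACTOR = 1.2;
const FAIL_FACTOR = 1.3;
const CI_MULTIPLIER = 1.5;

// Derives limits for one percentile. `floor`, when given, sets a minimum for
// each limit so that near-zero baselines are not tripped by scheduler noise.
function deriveLimits(observedMs: number, floor?: GateLimits): GateLimits {
  const warn = Math.round(observedMs * WARN_FACTOR);
  const fail = Math.round(observedMs * FAIL_FACTOR);
  if (floor === undefined) {
    return { warn, fail };
  }
  return { warn: Math.max(warn, floor.warn), fail: Math.max(fail, floor.fail) };
}

function derivePreset(p75Ms: number, p95Ms: number): PresetThresholds {
  return {
    p75: deriveLimits(p75Ms),
    p95: deriveLimits(p95Ms),
    ciMultiplier: CI_MULTIPLIER,
  };
}

// startupStandardHome, using the p75/p95 values from the table above.
const standardHome = derivePreset(313, 351);
// p75 → { warn: 376, fail: 407 }, p95 → { warn: 421, fail: 456 }

// Near-zero baseline with illustrative floor values (not necessarily the
// ones committed in thresholds.ts).
const floored = deriveLimits(1, { warn: 50, fail: 75 });
// → { warn: 50, fail: 75 }
```

The committed values appear to be rounded to convenient numbers rather than computed exactly: the STANDARD_HOME p75 fail limit in the review snippet is 1100 ms, whereas 850 ms × 1.3 = 1105 ms.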
CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.
Builds ready [88d36eb] [reused from 1b1b67a]
⚡ Performance Benchmarks (Total: 🟢 1 pass · 🟡 1 warn · 🔴 0 fail)
Bundle size diffs
Builds ready [cd85d26] [reused from 9594df6]
⚡ Performance Benchmarks (Total: 🟢 4 pass · 🟡 7 warn · 🔴 2 fail)
Bundle size diffs
Builds ready [b1f6aee]
⚡ Performance Benchmarks (Total: 🟢 6 pass · 🟡 8 warn · 🔴 1 fail)
Bundle size diffs
Builds ready [f1965d7]
⚡ Performance Benchmarks (Total: 🟢 7 pass · 🟡 8 warn · 🔴 0 fail)
Bundle size diffs
Branch force-pushed from a9e976b to f1965d7.
Builds ready [36d90a6]
⚡ Performance Benchmarks (Total: 🟢 7 pass · 🟡 8 warn · 🔴 0 fail)
Bundle size diffs [🚀 Bundle size reduced!]
✨ Files requiring CODEOWNER review ✨
👨‍🔧 @MetaMask/extension-platform (1 file, +54 -0)
🧪 @MetaMask/qa (1 file, +54 -0)
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 707f906.
```ts
  p75: { warn: 1020, fail: 1100 },
  p95: { warn: 1225, fail: 1325 },
  ciMultiplier: DEFAULT_CI_MULTIPLIER,
},
```
STANDARD_HOME TBT thresholds ~2.7x higher than documented
Medium Severity
The STANDARD_HOME tbt baseline comment says p75≈850ms / p95≈1021ms (from 2026-04-22, chrome-browserify), producing thresholds of warn=1020/fail=1100. But the PR description table documents a p75=313ms baseline with thresholds of warn=375/fail=425 — about 2.7x tighter. All other presets match their documented baselines. This discrepancy means the quality gate for startupStandardHome TBT is far more permissive than intended, potentially masking regressions.
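The arithmetic behind this finding can be reproduced with a small check. The sketch below is hypothetical (`checkDrift` and `DriftReport` are not part of the repository): it divides a committed limit by the limit the calibration rule would produce from a given baseline. Against the 850 ms baseline in the code comment, the p75 warn limit matches the 1.2 factor; against the 313 ms baseline in the PR description, it is about 2.7 times looser.

```typescript
// Hypothetical consistency check: compares a committed limit against the
// limit the calibration rule would derive from a documented baseline.

interface DriftReport {
  expectedMs: number; // baselineMs × factor
  drift: number; // committed / expected; 1.0 means fully consistent
  ok: boolean; // true when drift is within the tolerance
}

function checkDrift(
  committedMs: number,
  baselineMs: number,
  factor: number,
  tolerance = 0.05,
): DriftReport {
  const expectedMs = baselineMs * factor;
  const drift = committedMs / expectedMs;
  return { expectedMs, drift, ok: Math.abs(drift - 1) <= tolerance };
}

// STANDARD_HOME p75 warn limit from the review snippet (1020 ms).
const vsCodeComment = checkDrift(1020, 850, 1.2); // drift ≈ 1.0, ok
const vsPrTable = checkDrift(1020, 313, 1.2); // drift ≈ 2.72, not ok
```

A tolerance is used instead of exact equality because committed limits may be rounded (1100 ms against 850 × 1.3 = 1105 ms still passes at 5%).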

Description
Adds a `tbt` (Total Blocking Time) quality gate for all startup and user-journey benchmark presets by adding corresponding threshold entries to `THRESHOLD_REGISTRY`.

This commit builds on #39715, which shipped TBT instrumentation via `buildLongTaskTimerResults` in `long-task-helper.ts`; that helper emits `{ id: 'tbt', value, unit: 'ms' }` per benchmark run.

Baselines were pulled from the CI benchmark Sentry project (`metamask-performance`) using the 2026-04-07 → 2026-04-20 window (`ci.branch:main`), querying `avg(tags[benchmark.p75.tbt])` / `avg(tags[benchmark.p95.tbt])` for startup presets and `avg(tags[performance.p75.tbt])` / `avg(tags[performance.p95.tbt])` for user-journey presets.

Calibration rule: `warn = observed_p × 1.20`, `fail = observed_p × 1.30`, `ciMultiplier: 1.5`.

`solanaAssetDetails` observed near-zero TBT, so a 50 ms floor was applied to avoid noise-triggered false positives.

`dappPageLoad` is excluded: the dapp-page-load flow does not call `buildLongTaskTimerResults`, so no TBT data exists in Sentry for that preset.

Interaction benchmarks (`confirmTx`, `loadNewAccount`, `bridgeUserActions`) are also excluded: they are single-step flows where TBT is dominated by one outlier long task.

No logic changes are needed.
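For readers less familiar with the metric, TBT sums, for every long task (a main-thread task over 50 ms), the portion of its duration beyond 50 ms. The sketch below computes that sum and wraps it in the `{ id: 'tbt', value, unit: 'ms' }` shape the description attributes to `buildLongTaskTimerResults`. It is not that helper's implementation: `LongTask`, `computeTbt`, and `toTbtResult` are hypothetical, and where the standard definition only counts tasks between First Contentful Paint and Time to Interactive, this sketch simply sums every task it is given. It also shows why a flow with no long tasks reports zero and why a single outlier task can dominate the total.

```typescript
// Hypothetical sketch of deriving TBT from long-task durations.
// Not the code of buildLongTaskTimerResults.

interface LongTask {
  duration: number; // ms
}

interface TimerResult {
  id: string;
  value: number;
  unit: 'ms';
}

// A task is "long" above 50 ms; only the excess counts as blocking time.
const BLOCKING_THRESHOLD_MS = 50;

function computeTbt(tasks: readonly LongTask[]): number {
  return tasks.reduce(
    (total, task) => total + Math.max(0, task.duration - BLOCKING_THRESHOLD_MS),
    0,
  );
}

// Wraps the value in the result shape described for the TBT instrumentation.
function toTbtResult(tasks: readonly LongTask[]): TimerResult {
  return { id: 'tbt', value: computeTbt(tasks), unit: 'ms' };
}

const result = toTbtResult([
  { duration: 120 }, // contributes 70 ms
  { duration: 40 }, // not a long task, contributes 0 ms
  { duration: 75 }, // contributes 25 ms
]);
// result → { id: 'tbt', value: 95, unit: 'ms' }
```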
`compare-benchmarks.ts` reads `THRESHOLD_REGISTRY` generically; adding the `tbt` key is sufficient to activate enforcement.

Changelog
CHANGELOG entry: null
Related issues
Fixes: https://github.com/MetaMask/MetaMask-planning/issues/7194
Manual testing steps
1. Run `yarn jest development/metamaskbot-build-announce/compare-benchmarks.test.ts`.
2. Confirm that `tbt` violations route through the same warn/fail path as other metrics.
3. Run `compare-benchmarks.ts` locally and confirm `tbt` is evaluated for startup/journey presets.

Screenshots/Recordings
Pre-merge author checklist
Pre-merge reviewer checklist
Note
Medium Risk
Changes CI performance quality gates by introducing new `tbt` (Total Blocking Time) thresholds across multiple benchmark presets, which could cause new CI failures or flaky gating if baselines or multipliers are miscalibrated. No production or runtime logic is modified.

Overview
Adds `tbt` (Total Blocking Time) to the benchmark threshold registry for startup and multi-step user-journey presets by introducing per-preset `p75`/`p95` warn/fail limits (with baseline comments) in `test/e2e/benchmarks/utils/thresholds.ts`.

These new thresholds activate TBT enforcement for onboarding/import, swap/send, asset-details, and startup home benchmarks (using the default CI multiplier), while leaving interaction-only and dapp-page-load presets unchanged.
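Why adding a registry key is enough to turn enforcement on can be illustrated with a sketch of a registry lookup. Everything here is hypothetical: the `Registry` shape, `evaluateMetric`, the rule that the worse of the two percentile verdicts wins, and the assumption that `ciMultiplier` scales both limits when running in CI are illustrative, not taken from `compare-benchmarks.ts`. The limits are the STANDARD_HOME values quoted in the review comment, and the multiplier is the 1.5 stated in the PR description.

```typescript
// Hypothetical sketch of registry-driven gating. The registry shape, the
// verdict precedence, and how ciMultiplier is applied are assumptions.

type Verdict = 'pass' | 'warn' | 'fail';

interface GateLimits {
  warn: number; // ms
  fail: number; // ms
}

interface MetricThresholds {
  p75: GateLimits;
  p95: GateLimits;
  ciMultiplier: number;
}

// preset name → metric id → thresholds
type Registry = Record<string, Record<string, MetricThresholds | undefined> | undefined>;

function gate(valueMs: number, limits: GateLimits, multiplier: number): Verdict {
  if (valueMs > limits.fail * multiplier) return 'fail';
  if (valueMs > limits.warn * multiplier) return 'warn';
  return 'pass';
}

// Returns undefined when the preset has no entry for the metric, i.e. the
// metric may still be reported but is not enforced.
function evaluateMetric(
  registry: Registry,
  preset: string,
  metricId: string,
  observed: { p75: number; p95: number },
  inCi: boolean,
): Verdict | undefined {
  const thresholds = registry[preset]?.[metricId];
  if (thresholds === undefined) return undefined;
  const multiplier = inCi ? thresholds.ciMultiplier : 1;
  const verdicts: Verdict[] = [
    gate(observed.p75, thresholds.p75, multiplier),
    gate(observed.p95, thresholds.p95, multiplier),
  ];
  if (verdicts.some((v) => v === 'fail')) return 'fail';
  if (verdicts.some((v) => v === 'warn')) return 'warn';
  return 'pass';
}

const registry: Registry = {
  startupStandardHome: {
    tbt: {
      p75: { warn: 1020, fail: 1100 },
      p95: { warn: 1225, fail: 1325 },
      ciMultiplier: 1.5,
    },
  },
};

evaluateMetric(registry, 'startupStandardHome', 'tbt', { p75: 1050, p95: 1000 }, false); // 'warn'
evaluateMetric(registry, 'startupStandardHome', 'tbt', { p75: 1050, p95: 1000 }, true); // 'pass'
evaluateMetric(registry, 'dappPageLoad', 'tbt', { p75: 1050, p95: 1000 }, true); // undefined
```

Because the lookup is keyed by preset and metric, a preset without a `tbt` entry (such as `dappPageLoad` or the interaction presets) simply yields no verdict, which matches the PR's claim that no comparison logic has to change.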