Shakedown

Verify a new Mac before your return window closes.

A verification harness for new Macs, Apple Silicon and Intel. Built for cases where you can't easily return the unit: bought abroad, narrow return window, expensive config, or you just don't want to discover a defect three months from now.

A single command runs the automated phases (system inventory, battery health, CPU variance benchmark, sustained thermal load) end-to-end. A separate runbook walks you through the manual phases (display dead-pixel test, hinge / keyboard / speaker / port inspection, Apple Diagnostics).

Why "shakedown"? Borrowed from engineering: a shakedown run is the first-run stress test of new machinery before it's commissioned. Same idea, applied to your new Mac.

Why this exists

Some Mac generations ship with batch-level defects that don't show up in a quick boot test. The 2026 M5 Max line, for example, had units showing up to 41.5% multi-core performance variance between identical benchmark runs. You can only see that by running repeated load tests on a thermally-saturated chassis. A 30-second smoke test on store Wi-Fi will not catch it.

Shakedown is the procedure for catching those.

Status (v0.1): the methodology has not yet been validated against a confirmed-defective unit. Thresholds are derived from public reports and engineering reasoning. Expect them to tighten as crowd-sourced submissions land. Treat current results as advisory, not authoritative; if a verdict is borderline, rerun before deciding.

Phase 4 uses parallel SHA-256, which is hardware-accelerated on Apple Silicon and Coffee Lake+ Intel. The variance methodology transfers cleanly to any sustained workload (the SHA choice is for zero-install portability), but the test doesn't probe integer pipelines or memory bandwidth as deeply as Cinebench / Geekbench would. A non-accelerated workload pass now ships as opt-in Phase 4b (./run --noaccel, BLAKE2b), and memory bandwidth has its own phase (Phase 12).

Quick start

Install Xcode Command Line Tools (provides git, 5–10 min one-time; a GUI dialog pops up):
```
xcode-select -p >/dev/null 2>&1 || xcode-select --install
```

Clone and run.

git clone https://github.com/ugglr/mac-shakedown ~/mac-shakedown && cd ~/mac-shakedown
./run

On a terminal with no flags, ./run opens a short guided picker (verify a new Mac vs quick check, plus strict and the AI load), builds the command, and runs it, so you don't have to remember flags. Pass any flag (or SHAKEDOWN_YES=1, or pipe it) and the picker is skipped, the run proceeds directly.

Either way, ./run auto-selects the matching preset from your hardware (chip, memory, model, screen size), so it bins against the right thresholds and baseline with nothing to type. Pass --target mbp-16-m5-max-64 (see targets/) to assert an expected SKU, or to override the match; if no preset matches, it auto-detects chassis class from system_profiler and skips the inventory asserts.

The orchestrator runs the automated phases (preflight → inventory → battery → race benchmark → SSD test → memory bandwidth → CPU variance → thermal load) end-to-end, asks for sudo once upfront (Phase 5 and the SSD page-cache drop need it), and writes a SCHEMA-compliant report to Reports/local/ plus a sanitized PR-able copy to Reports/submissions/. Opt-in flags add heavier passes: --noaccel (a non-accelerated BLAKE2b variance pass), --gpu (a Metal GPU compute pass), and --llama (clones and builds llama.cpp for a combined CPU+GPU+memory AI load). --store bundles the thorough profile for verifying a new unit. Runtime ~20 min on Intel, ~27 min on Air, ~47 min on MacBook Pro.

Verifying a new purchase, especially in a store you can't easily return to? Follow the Store Day Checklist, a top-to-bottom run-this-then-that guide for verifying your machine in one sitting at the store. The Benchmark Reference behind it has the sourced per-generation expected scores and live-lookup links, and Verification/scripts/compare-reports.sh reference.json yours.json diffs your unit against a known-good sibling.

./run --no-sudo skips the 10-min thermal phase (the only phase that needs sudo) for a half-runtime no-password variance-only pass.

Keep the charger plugged in for all phases.

For the manual phases (display test, hinge / keyboard / speaker / port inspection, Apple Diagnostics, optional 30-min idle drain), follow Verification/Runbook.md phases 6–9 by hand.

What a run looks like

See examples/sample-report-illustrative/ for an annotated example PASS report on a 16" M5 Max: Markdown render and the underlying JSON. (Illustrative, not a real run, replaced when crowd-sourced submissions land.)

What gets checked

Phase	What	How
0	Pre-flight & quiet-system check	`uptime`, `top`
1	Hardware identity vs. target	`system_profiler` + `sysctl` → `inventory.sh`
2	Battery health (cycles, capacity, condition)	`ioreg AppleSmartBattery` → `battery.sh`
3	Sensor & port inventory	(uses Phase 1 output)
10	Race benchmark (cold)	`xz -9 -T<P>` of a 200 MB random blob → `race-bench.sh`
11	SSD sequential read/write (cold)	2 GB incompressible random data, page cache dropped between → `ssd-test.sh`
12	Memory bandwidth (cold)	multi-threaded STREAM triad (vendored C, clang at runtime) with a `memmove` fallback → `memory-bandwidth.sh`
4	CPU performance variance	5 s burst + chassis-class-aware warmup (300 s on Pro, 60 s on Air) + 5 × 60 s timed iterations, parallel SHA-256 → `cpu-variance.sh`
4b	Non-accelerated CPU variance (opt-in `--noaccel`)	same methodology with BLAKE2b (no crypto instruction, hits the integer pipelines) → `cpu-variance.sh`
5	Sustained thermal load (chassis-class-aware thresholds)	10 min continuous + `powermetrics` sampling → `thermal-load.sh`
13	GPU compute variance (opt-in `--gpu`)	sustained Metal FMA load, compiled with `swiftc` at runtime → `gpu-variance.sh`
14	Combined CPU+GPU+memory (opt-in `--llama`)	clones + builds llama.cpp, runs `llama-bench` (AI inference load) → `llama-bench.sh`
6	Display visual inspection	fullscreen color cycle in Safari → `display-test.sh`
7	Manual physical inspection	hinge, keyboard, speakers, ports, Touch ID, etc. (runbook checklist)
8	Apple Diagnostics	reboot + Cmd-D
9	Optional idle drain	30 min sleep, measure %/30 min

Phases 10 (race), 11 (SSD), and 12 (memory bandwidth) run before the heavy phases so the chassis is cold. They produce calibration numbers (verdict "info"); pass/fail thresholds land in schema v0.3 once the submission corpus is non-empty (Phase 11 carries a conservative warn-only floor in the meantime). Unlike Phase 4 (SHA-256, hardware-accelerated), Phase 10 (LZMA) is fair across chassis families since it doesn't hit any crypto coprocessor.

Specifying your target

Two ways:

No target (the usual way). Auto-selects the matching preset from the detected hardware (chip, memory, model family, and screen size), so the right thresholds, asserts, and baseline apply with nothing to type:
```
./run
```
If exactly one preset matches it is used (and printed); if none match, it falls back to auto-detecting chassis class, including the 14"/16" sub-class from the built-in display, and skips the SKU asserts. targets/ ships presets across the M1-M5 generations (Air, 14"/16" Pro/Max) plus Intel; see targets/README.md.
Pin a preset. Pass --target to assert an expected SKU (hard-fails on a chip / RAM mismatch, useful when verifying you got the SKU you paid for) or to override the auto-match:
```
./run --target mbp-16-m5-max-64
```

(Don't see your SKU? targets/README.md has the schema. Open a PR adding a preset.)

Repo layout

mac-shakedown/
├── README.md
├── CONTRIBUTING.md
├── CHANGELOG.md
├── SECURITY.md
├── LICENSE
├── run                             # convenience entry point, execs the orchestrator
├── Shakedown Brain.md              # Obsidian map-of-content (optional, for vault users)
├── .github/                        # issue + PR templates, CI lint workflow
├── Verification/                   # generation-agnostic test machinery
│   ├── Runbook.md                  # the procedure
│   ├── Pass-Fail Criteria.md       # thresholds (parameterized by chassis class)
│   ├── Benchmark Reference.md      # install/run/expected-score + in-store protocol
│   ├── Store Day Checklist.md      # the one-command in-store flow
│   ├── Production QA.md            # gap analysis: toward factory-grade QA
│   ├── per-core-pinning.md         # why macOS has no per-core affinity (methodology note)
│   └── scripts/
│       ├── run-shakedown.sh        # the orchestrator (`./run` execs this)
│       ├── inventory.sh            # system_profiler + sysctl → JSON
│       ├── battery.sh              # ioreg battery health → JSON
│       ├── race-bench.sh           # cold xz race → JSON
│       ├── ssd-test.sh             # sequential SSD read/write → JSON (sudo for purge)
│       ├── memory-bandwidth.sh     # STREAM triad (stream-triad.c) or memmove fallback → JSON
│       ├── stream-triad.c          # vendored STREAM-style triad, compiled at runtime
│       ├── cpu-variance.sh         # burst + warmup + 5×60s timed iters → JSON (WORKLOAD=blake2b for 4b)
│       ├── thermal-load.sh         # 10-min sustained + powermetrics → JSON (sudo)
│       ├── gpu-variance.sh         # opt-in Metal compute variance, swiftc at runtime → JSON
│       ├── llama-bench.sh          # opt-in llama.cpp combined load (clone+build) → JSON
│       ├── compare-reports.sh      # diff two reports (unit vs known-good sibling)
│       ├── make-baseline.sh        # build baselines/<preset>.json from known-good reports
│       ├── repeatability.sh        # MSA: is our gage error small vs the tolerance?
│       └── display-test.sh         # fullscreen color cycle (HTML)
├── baselines/                      # calibrated golden limits per SKU (mostly empty; built from known-good runs)
├── targets/                        # preset SKU configs
│   ├── README.md
│   ├── mbp-16-m5-max-64.json
│   ├── mbp-14-m5-pro-24.json
│   ├── macbook-air-m5-16.json
│   └── mbp-16-intel-2019.json
├── examples/                       # generation-specific calibrations + sample reports
│   ├── m5-2026/                    # the 2026 M5 generation defect landscape
│   │   ├── README.md
│   │   ├── M5 Quality Issues.md
│   │   ├── Issues/
│   │   └── Sources.md
│   └── sample-report-illustrative/ # what a real run looks like (illustrative, not from a real unit)
└── Reports/                        # output artifacts
    ├── SCHEMA.md                   # JSON report schema (v1.0)
    ├── local/                      # full output, gitignored
    └── submissions/                # sanitized, PR-able copies

Adding a new generation

When a new chip line ships, copy examples/m5-2026/ to examples/<generation>-<year>/, document the issues there, and add a target preset pointing to it. See CONTRIBUTING.md.

Running individual scripts (advanced)

./run orchestrates the script-driven phases. If you want to rerun a single phase, say to confirm a borderline variance warn without redoing the whole pass, call the scripts directly:

export CHASSIS_CLASS=active-cooled-pro-16  # or -14 | fanless | desktop | intel-laptop | intel-desktop

./Verification/scripts/inventory.sh    > Reports/inventory.json
./Verification/scripts/battery.sh      > Reports/battery.json
./Verification/scripts/cpu-variance.sh > Reports/variance.json
sudo CHASSIS_CLASS="$CHASSIS_CLASS" ./Verification/scripts/thermal-load.sh > Reports/thermal.json
./Verification/scripts/display-test.sh

The inline sudo CHASSIS_CLASS=... form preserves the env var across the privilege boundary regardless of sudoers env_keep config. Read the values against Pass-Fail Criteria.

Submit a calibration report

The v0.1 thresholds need real-world data to calibrate. If you ran the harness (PASS, WARN, or FAIL), please consider submitting your report. Known-good machines from someone you trust are the most valuable submissions, since that's what the methodology currently lacks.

./run already writes the sanitized copy to Reports/submissions/<filename>.json. To submit it:

Skim the JSON for any leftover PII (free-form notes, store name, etc.).
git checkout -b submit-<your-date> && git add Reports/submissions/<filename>.json && git commit -m "submission: <chassis> <chip> <verdict>"
Open a PR. CI runs the submission audit and rejects _raw_* leakage, plaintext serials, malformed schema, or off-pattern filenames.

The manual phases (display, physical inspection, Apple Diagnostics, idle drain) land as skipped placeholders. If you ran any of them, hand-edit Reports/local/<filename>.json to fill in the results, re-sanitize, and overwrite the submission copy before the PR.

Roadmap

Wave 7 shipped most of the original roadmap: the non-accelerated variance pass (Phase 4b, BLAKE2b), GPU compute variance (Phase 13, Metal), memory bandwidth (Phase 12), a conservative SSD floor, the active-cooled-pro-14 / active-cooled-pro-16 thermal sub-classes, and M1-M4 plus Intel-era generation calibrations.

Wave 8 added the in-store toolkit: the --store thorough profile, compare-reports.sh (diff your unit against a known-good sibling), a sourced Benchmark Reference, a vendored STREAM triad in Phase 12 (real copy / scale / add / triad), and an opt-in --llama phase that builds llama.cpp for a combined CPU+GPU+memory load.

Since then: a one-command verdict banner on ./run, a single-sitting Store Day Checklist, and the first leg of production-grade QA, calibrated golden baselines (make-baseline.sh builds baselines/<preset>.json from known-good runs; ./run then bins against it for a calibrated verdict). Verification/Production QA.md is the honest gap analysis to a factory-grade screen.

Still open:

Hosted aggregator. Eventually, submission via API to a public site so reports aren't reviewed by hand. Until then, the PR-submission flow above is the aggregator. Slower, but no infra, and PR review catches PII before merge.
A corpus to calibrate against. The mechanism now exists: ./run bins against a golden baseline when baselines/<preset>.json is present (make-baseline.sh builds one from known-good reports), turning the verdict calibrated. What is still missing is the population of known-good units to characterize. The Measurement System Analysis harness now exists (repeatability.sh reports whether our gage error is small relative to the tolerance it decides against), and --strict hard-gates the preconditions we can see before a run (on AC, quiet system) so a scored run is comparable. Coverage of the remaining defect holes and SPC are next. Verification/Production QA.md lays out the full path to a production-grade screen.
Per-core pinning. macOS lacks public CPU affinity APIs, so we can't pin workers to specific cores; a defective single core gets averaged across N P-cores. Reporting worker_imbalance_pct_per_iter is the partial mitigation. The investigation is written up in Verification/per-core-pinning.md: the short version is that THREAD_AFFINITY_POLICY is ignored on Apple Silicon and QoS hints only steer between the P and E clusters, so there's no per-core pinning to be had today.

Origin

Built originally to vet a 16" M5 Max purchase abroad, where returning a defective unit isn't practical. The research that informs the M5 Max thresholds is in examples/m5-2026/, kept as a worked example of what a generation calibration looks like.

License

MIT. See LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Shakedown

Why this exists

Quick start

What a run looks like

What gets checked

Specifying your target

Repo layout

Adding a new generation

Running individual scripts (advanced)

Submit a calibration report

Roadmap

Origin

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.github		.github
.obsidian		.obsidian
Reports		Reports
Verification		Verification
baselines		baselines
examples		examples
targets		targets
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
Shakedown Brain.md		Shakedown Brain.md
run		run

Folders and files

Latest commit

History

Repository files navigation

Shakedown

Why this exists

Quick start

What a run looks like

What gets checked

Specifying your target

Repo layout

Adding a new generation

Running individual scripts (advanced)

Submit a calibration report

Roadmap

Origin

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages