perf(inference): eliminate ORT thread-pool spin-wait + synthetic test source by TCVinNYC · Pull Request #120 · AutoPTZ/autoptz

TCVinNYC · 2026-06-26T03:41:52Z

Problem

With multiple cameras the app pegged CPU (~930% / 66% system on 4 real cameras) with bursty variance and frame drops. Profiling the live session (macOS `sample`) showed the #1 consumer was ORT thread-pool spin-wait (`ThreadPoolTempl::WorkerLoop`, 10,105 leaf samples; 88 ORT worker threads) — threads busy-spinning between intermittent inference runs, not actual compute. insightface's SCRFD+ArcFace sessions made it worse: `get_model()` forwards only `providers`, so they ran cores-wide + spinning, bypassing the per-camera thread cap.

Fix

`inference.py` — `_apply_low_idle_threading()` disables ORT intra/inter-op spinning (`session.intra_op.allow_spinning=0`) and forces a single sequential inter-op pool on every `make_session()` session (detector + pose). Pure scheduling change; identical inference results.
`identify.py` — `_capped_insightface_sessions()` scopes a patch of `onnxruntime.InferenceSession.__init__` to inject a capped, non-spinning `SessionOptions` during `FaceAnalysis` construction (the only hook insightface leaves). Guarded by a module lock since the degraded per-worker face-build path isn't otherwise serialised across camera threads.
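The two changes above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names (`apply_low_idle_threading`, `make_low_idle_options`, `capped_sessions`), the default thread cap of 2, and the "only inject when no options were passed" rule are all assumptions. The session config keys and the `InferenceSession.__init__(path, sess_options, ...)` argument order are the ones ONNX Runtime documents. `onnxruntime` is imported lazily so the patching logic can be exercised without it.

```python
import contextlib
import functools
import threading

# Module-level lock: the patch mutates a shared class attribute, so only one
# thread may have it installed at a time.
_PATCH_LOCK = threading.Lock()


def apply_low_idle_threading(opts, intra_threads):
    """Configure a SessionOptions-like object so idle workers park, not spin."""
    # ONNX Runtime session config keys; values are passed as strings.
    opts.add_session_config_entry("session.intra_op.allow_spinning", "0")
    opts.add_session_config_entry("session.inter_op.allow_spinning", "0")
    opts.intra_op_num_threads = intra_threads  # per-camera cap (assumed value)
    opts.inter_op_num_threads = 1  # single inter-op pool
    return opts


def make_low_idle_options(intra_threads=2):
    """Build real ORT options; the default cap of 2 is a hypothetical choice."""
    import onnxruntime as ort  # lazy import, only needed for real sessions

    opts = ort.SessionOptions()
    opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
    return apply_low_idle_threading(opts, intra_threads)


@contextlib.contextmanager
def capped_sessions(session_cls, make_opts):
    """Temporarily wrap session_cls.__init__ to inject options when none are given."""
    with _PATCH_LOCK:
        original = session_cls.__init__

        @functools.wraps(original)
        def patched(self, path, sess_options=None, **kwargs):
            if sess_options is None:
                sess_options = make_opts()
            original(self, path, sess_options, **kwargs)

        session_cls.__init__ = patched
        try:
            yield
        finally:
            # Always restore, even if model construction raises.
            session_cls.__init__ = original
```

Under these assumptions, the face pipeline would be built inside `with capped_sessions(onnxruntime.InferenceSession, make_low_idle_options):`, so that sessions created with only `providers` still receive the capped options. Holding the lock for the whole `with` body serialises face-model construction across threads, which is the trade-off the PR description accepts.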

Results (validated)

| Scenario | old | fixed |
| --- | --- | --- |
| Inference pool, headless A/B (4 cams, yolo11m, all services) | 514% | 273% |
| Real 4 cameras, full pipeline (PyCharm) | ~930% / 66% sys | ~411% / 29% sys |

A 2.3× reduction in total CPU on the real 4-camera run, with system CPU at 29% (under the 30% system target) and the CPU variance gone. Live re-profile confirms `WorkerLoop` is eliminated from the hotspots.
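For reference, the reduction factors follow directly from the reported totals. A small sketch of the arithmetic, using only the CPU figures quoted in the results (the `reduction` helper is illustrative):

```python
def reduction(old_cpu, new_cpu):
    """Ratio of old to new CPU usage, both in percent of one core."""
    return old_cpu / new_cpu


# Total CPU figures as reported in the results above.
runs = {
    "headless A/B": (514, 273),
    "real 4 cameras": (930, 411),
}

for name, (old_cpu, new_cpu) in runs.items():
    print(f"{name}: {reduction(old_cpu, new_cpu):.2f}x")
```

The 2.3× figure corresponds to the real 4-camera row; the headless A/B row works out to a somewhat smaller factor.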

Test / scaling infrastructure (enables headless multi-camera CPU validation, no camera permission)

`SyntheticAdapter` (`source type: synthetic`) — procedural / image / video synthetic source that pans the scene so detection/tracking/ego all run; effective-fps telemetry (`AUTOPTZ_SYNTH_DEBUG`).
`AUTOPTZ_DB_PATH` — run against an isolated config profile.
`AUTOPTZ_SKIP_CAMERA_PREFLIGHT` — start the engine with no local camera (NDI/RTSP/synthetic-only or headless), instead of gating all engine start on macOS camera permission.
`tests/test_synthetic_source.py` (9 tests).
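To illustrate what "pans the scene" could mean for a synthetic source, here is a minimal sketch of a ping-pong viewport offset. It is not taken from `SyntheticAdapter`; the function name, the `step` parameter, and the sweep pattern are assumptions chosen for illustration.

```python
def pan_offset(frame_idx, scene_width, view_width, step=4):
    """Left edge of the viewport for a frame, sweeping right and then back left."""
    span = scene_width - view_width
    if span <= 0:
        # Scene is no wider than the viewport, so there is nothing to pan.
        return 0
    period = 2 * span
    pos = (frame_idx * step) % period
    # First half of the period moves right, second half mirrors back.
    return pos if pos <= span else period - pos
```

A source built this way would crop each output frame at columns `off` to `off + view_width` of a wider scene image, so that the apparent camera motion keeps detection, tracking, and ego-motion estimation busy even with static content.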

All 1336 tests pass; ruff + mypy + selftest green.

🤖 Generated with Claude Code

…ace pools

The dominant CPU consumer with multiple cameras was ORT thread-pool spin-wait, not inference compute. A profiler (macOS sample) of a live 4-camera session showed ThreadPoolTempl::WorkerLoop as the #1 leaf (10105 samples) with 88 ORT worker threads — each session's intra-op pool busy-spins ~200ms before parking, so several intermittently-run sessions (detect every Nth frame, pose/face a few Hz) turn into a wall of idle CPU.

Fixes:
- inference.py: _apply_low_idle_threading() disables intra/inter-op spinning and forces a single sequential inter-op pool on every make_session() session (detector + pose). Pure scheduling change; no effect on results.
- identify.py: insightface's get_model() forwards only providers (no SessionOptions), so its SCRFD+ArcFace sessions ran cores-wide + spinning, bypassing the cap. _capped_insightface_sessions() scopes a patch of InferenceSession.__init__ to inject a capped, non-spinning SessionOptions during FaceAnalysis construction, guarded by a module lock (the degraded per-worker face-build path is not otherwise serialised across camera threads).

Headless A/B (4 cams, yolo11m, all services): inference pool 514% -> 273% CPU. Live-GUI profile confirms WorkerLoop eliminated from the hotspots.

Test/scaling infra (enables headless multi-camera CPU validation, no camera perm):
- ingest.py: SyntheticAdapter — procedural / image / video synthetic source that pans the scene so detection/tracking/ego all run; effective-fps telemetry.
- frame_source.py + models.py: wire source type "synthetic".
- store.py: AUTOPTZ_DB_PATH override to run against an isolated profile.
- app.py: AUTOPTZ_SKIP_CAMERA_PREFLIGHT to start the engine without a local camera (NDI/RTSP/synthetic-only or headless runs).
- tests/test_synthetic_source.py.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

TCVinNYC and others added 2 commits June 25, 2026 23:41

Merge main (CI determinism fix) into perf/cpu-multicam

0fc2941

TCVinNYC merged commit 418bf66 into main Jun 26, 2026
3 checks passed

TCVinNYC merged 2 commits into main from perf/cpu-multicam
