 Title: Frame Pointers Everywhere: Enabling System-Level Observability for Python
 Author: Pablo Galindo Salgado <pablogsal@python.org>,
         Ken Jin <kenjin@python.org>,
-        Savannah Ostrowski <savannahostrowski@gmail.com>,
+        Savannah Ostrowski <savannah@python.org>,
         Diego Russo <diego.russo@arm.com>
 Discussions-To:
 Status: Draft
@@ -15,7 +15,7 @@ Post-History:
 Abstract
 ========
 
-This PEP does two things:
+This PEP proposes two things:
 
 1. **Build CPython with frame pointers by default on platforms that support
    them.** The default build configuration is changed to compile the
@@ -51,9 +51,9 @@ Motivation
 
 Python's observability story (profiling, debugging, and system-level tracing)
 is fundamentally limited by the absence of frame pointers. The core motivation
-of this PEP is to make Python observable by default: profilers faster and more
-accurate, debuggers more reliable, and eBPF-based tools functional without
-workarounds.
+of this PEP is to make Python observable by default, so that profilers are
+faster and more accurate, debuggers are more reliable, and eBPF-based tools
+are functional without workarounds.
 
 Today, users who want to profile CPython with system tools must rebuild the
 interpreter with special compiler flags, a step that most users cannot or will
@@ -405,30 +405,25 @@ The JIT Compiler Needs Frame Pointers to Be Debuggable
 ------------------------------------------------------
 
 CPython's copy-and-patch JIT (:pep:`744`) generates native machine code at
-runtime. Without frame pointers in that generated code, stack unwinding
-through JIT frames is broken for virtually every tool in the ecosystem: GDB,
-LLDB, libunwind, libdw (elfutils), py-spy, Austin, pystack, memray, ``perf``,
-and all eBPF-based profilers.
-
-The investigation in issue `#126910`_ found that compiling the JIT stencils
-with ``-fno-omit-frame-pointer`` and ``-mno-omit-leaf-frame-pointer`` is a
-two-line change that would make most existing debuggers and profilers work with
-JIT-compiled code immediately. The measured overhead is approximately 2% on
-x86-64 and even lower on AArch64 (which has a dedicated link register). This
-is a remarkably good outcome: other JIT compilers (V8, LuaJIT, .NET CoreCLR,
-Julia, LLVM's ORC JIT) typically require hundreds to thousands of lines of code
-to implement custom DWARF ``.eh_frame`` generation, GDB JIT interface support
+runtime. Without frame pointers in the interpreter, stack unwinding through
+JIT frames is broken for virtually every tool in the ecosystem: GDB, LLDB,
+libunwind, libdw (elfutils), py-spy, Austin, pystack, memray, ``perf``, and
+all eBPF-based profilers. Ensuring full-stack observability for JIT-compiled
+code is a prerequisite for the JIT to be considered production-ready.
+
+Individual JIT stencils do not need frame-pointer prologues; the entire JIT
+region can be treated as a single frameless region for unwinding purposes.
+What matters is that the interpreter itself is built with frame pointers, so
+that the frame-pointer register (``%rbp`` on x86-64, ``x29`` on AArch64) is
+reserved and not clobbered by stencil code. With frame pointers in the
+interpreter, unwinders can walk through JIT regions without needing to inspect
+individual stencils. This is a remarkably good outcome compared to other
+JIT compilers (V8, LuaJIT, .NET CoreCLR, Julia, LLVM's ORC JIT), which
+typically require hundreds to thousands of lines of code to implement custom
+DWARF ``.eh_frame`` generation, GDB JIT interface support
 (``__jit_debug_register_code``), and per-unwinder registration APIs
-(``_U_dyn_register``, ``__register_frame``). CPython's JIT may get most of the
-benefit from frame pointers alone if that follow-up change is adopted.
-
-Critically, for JIT frame pointers to produce useful results, the interpreter
-itself must also have frame pointers. A JIT-compiled function calls back into
-the interpreter for many operations; if the interpreter frames lack frame
-pointers, the unwinder hits a gap and the stack trace is truncated. This PEP
-addresses that interpreter-side gap. JIT stencil flags (issue `#126910`_) are
-a complementary follow-up needed for complete stack unwinding in the presence
-of the JIT.
+(``_U_dyn_register``, ``__register_frame``). See issue `#126910`_ for
+further discussion of frame pointers and the JIT.
 
 The Ecosystem Has Already Adopted Frame Pointers
 ------------------------------------------------
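The JIT hunk above argues that an unwinder can walk through JIT code without per-stencil metadata, provided the interpreter maintains a frame-pointer chain and stencil code leaves the frame-pointer register alone. Below is a minimal, illustrative sketch of such a walk over a simulated stack. It is not the code of CPython, ``perf``, or any other unwinder. The sketch assumes the common frame-record layout used on x86-64 and AArch64 (saved frame pointer followed by return address) and a downward-growing stack. The names ``struct frame_record``, ``struct code_region``, ``in_region``, and ``walk_frame_pointers`` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A frame record as laid down by a frame-pointer prologue: the caller's
 * frame pointer followed by the return address. This matches the usual
 * layout on x86-64 (saved %rbp, then the return address pushed by `call`)
 * and AArch64 (saved x29, then x30). */
struct frame_record {
    const struct frame_record *prev; /* caller's saved frame pointer */
    uintptr_t return_addr;           /* where the caller resumes */
};

/* Hypothetical description of the JIT's executable memory region. */
struct code_region {
    uintptr_t start;
    uintptr_t end; /* exclusive */
};

/* Returns nonzero if pc falls inside the region. */
int in_region(const struct code_region *region, uintptr_t pc)
{
    return pc >= region->start && pc < region->end;
}

/* Walks the frame-pointer chain from a sampled (pc, fp) pair and writes up
 * to `max` program counters into `out`, innermost first. Nothing here looks
 * at JIT stencils: if stencil code leaves the frame-pointer register alone,
 * a return address pointing into the JIT region is simply recorded, and the
 * walk continues through the saved frame pointer of the interpreter frame
 * that entered the JIT. Returns the number of entries written. */
size_t walk_frame_pointers(uintptr_t pc, const struct frame_record *fp,
                           uintptr_t *out, size_t max)
{
    assert(out != NULL || max == 0);
    if (max == 0) {
        return 0;
    }
    size_t n = 0;
    out[n++] = pc; /* innermost frame: wherever the sample landed */
    while (fp != NULL && n < max) {
        out[n++] = fp->return_addr;
        const struct frame_record *next = fp->prev;
        /* The stack grows downwards, so callers' records must sit at higher
         * addresses; anything else indicates a corrupt or foreign chain. */
        if (next != NULL && (uintptr_t)next <= (uintptr_t)fp) {
            break;
        }
        fp = next;
    }
    return n;
}
```

Consider a simulated chain in which a sampled C helper was called from JIT code. The second reported address is the helper's return address into the JIT region, and the walk then continues with the record of the interpreter frame that entered the JIT. In this simplified model, the return address that the interpreter left on the stack when entering the frameless region is not part of the chain, so it is not reported. Real unwinders may handle that case differently.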
@@ -836,8 +831,21 @@ incorrectly.
 Performance
 -----------
 
-.. TODO: Insert full pyperformance results here once data collection
-   is complete.
+Full pyperformance results comparing the frame-pointer build against an
+identical build without frame pointers (geometric mean across 108
+benchmarks):
+
+===================================== =======================
+Machine                               Geometric mean overhead
+===================================== =======================
+Apple M2 Mac Mini (arm64, macOS)      1.01x slower
+Intel Xeon Platinum 8480 (x86-64)     1.01x slower
+AMD EPYC 9654 (x86-64)                1.01x slower
+AWS Graviton c7g.16xlarge (aarch64)   1.02x slower
+Ampere Altra Max (aarch64)            1.01x slower
+Raspberry Pi (aarch64)                +X.X%
+macOS M3 (arm64)                      +X.X%
+===================================== =======================
 
 This overhead applies to both the interpreter and to C extensions that inherit
 the flags via ``sysconfig``. Detailed microarchitectural analysis shows the
@@ -892,10 +900,15 @@ information not already available through CPython's existing interfaces.
 How to Teach This
 =================
 
-No teaching is required. This change is invisible to Python users: no APIs
-change, no behaviour changes, and no user action is needed. The only observable
-effect is that profilers, debuggers, and system-level tracing tools produce
-more complete and more reliable results out of the box.
+For Python users and application developers, this change is invisible: no APIs
+change, no behaviour changes, and no user action is needed. The only
+observable effect is that profilers, debuggers, and system-level tracing tools
+produce more complete and more reliable results out of the box.
+
+Though extensions should see negligible overhead, extension authors who observe a
+measurable regression in a specific module can opt out as described in
+`Extension Build Impact`_. The ``--without-frame-pointers`` configure flag is
+documented in `Opt-Out Configure Flag`_.
 
 
 Reference Implementation
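The Performance hunk summarizes each machine with a single geometric mean across 108 benchmarks. This note assumes that summary follows the common convention of a geometric mean of per-benchmark time ratios. The notation is introduced here and is not taken from the PEP: t_i^fp is the time for benchmark i on the frame-pointer build, and t_i^base is the time on the baseline build. Under that assumption, the summary figure is:

```latex
\mathrm{GM}
  = \left( \prod_{i=1}^{n} \frac{t_i^{\mathrm{fp}}}{t_i^{\mathrm{base}}} \right)^{1/n}
  = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n}
      \ln \frac{t_i^{\mathrm{fp}}}{t_i^{\mathrm{base}}} \right),
  \qquad n = 108,
\qquad
\text{overhead} \approx (\mathrm{GM} - 1) \times 100\%
```

Under this reading, "1.01x slower" means GM = 1.01 (about 1% overhead) and "1.02x slower" means about 2%. This is also how the table's "+X.X%" entries would translate into the same units.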