Commit 193136f

Merge pull request #5 from savannahostrowski/frame-pointers
Frame pointer PEP edits
2 parents bc602f8 + ab32362 commit 193136f

1 file changed

Lines changed: 47 additions & 34 deletions

peps/pep-0830.rst

@@ -2,7 +2,7 @@ PEP: 830
 Title: Frame Pointers Everywhere: Enabling System-Level Observability for Python
 Author: Pablo Galindo Salgado <pablogsal@python.org>,
         Ken Jin <kenjin@python.org>,
-        Savannah Ostrowski <savannahostrowski@gmail.com>,
+        Savannah Ostrowski <savannah@python.org>,
         Diego Russo <diego.russo@arm.com>
 Discussions-To:
 Status: Draft
@@ -15,7 +15,7 @@ Post-History:
 Abstract
 ========
 
-This PEP does two things:
+This PEP proposes two things:
 
 1. **Build CPython with frame pointers by default on platforms that support
    them.** The default build configuration is changed to compile the
@@ -51,9 +51,9 @@ Motivation
 
 Python's observability story (profiling, debugging, and system-level tracing)
 is fundamentally limited by the absence of frame pointers. The core motivation
-of this PEP is to make Python observable by default: profilers faster and more
-accurate, debuggers more reliable, and eBPF-based tools functional without
-workarounds.
+of this PEP is to make Python observable by default, so that profilers are
+faster and more accurate, debuggers are more reliable, and eBPF-based tools
+are functional without workarounds.
 
 Today, users who want to profile CPython with system tools must rebuild the
 interpreter with special compiler flags, a step that most users cannot or will
@@ -405,30 +405,25 @@ The JIT Compiler Needs Frame Pointers to Be Debuggable
 ------------------------------------------------------
 
 CPython's copy-and-patch JIT (:pep:`744`) generates native machine code at
-runtime. Without frame pointers in that generated code, stack unwinding
-through JIT frames is broken for virtually every tool in the ecosystem: GDB,
-LLDB, libunwind, libdw (elfutils), py-spy, Austin, pystack, memray, ``perf``,
-and all eBPF-based profilers.
-
-The investigation in issue `#126910`_ found that compiling the JIT stencils
-with ``-fno-omit-frame-pointer`` and ``-mno-omit-leaf-frame-pointer`` is a
-two-line change that would make most existing debuggers and profilers work with
-JIT-compiled code immediately. The measured overhead is approximately 2% on
-x86-64 and even lower on AArch64 (which has a dedicated link register). This
-is a remarkably good outcome: other JIT compilers (V8, LuaJIT, .NET CoreCLR,
-Julia, LLVM's ORC JIT) typically require hundreds to thousands of lines of code
-to implement custom DWARF ``.eh_frame`` generation, GDB JIT interface support
+runtime. Without frame pointers in the interpreter, stack unwinding through
+JIT frames is broken for virtually every tool in the ecosystem: GDB, LLDB,
+libunwind, libdw (elfutils), py-spy, Austin, pystack, memray, ``perf``, and
+all eBPF-based profilers. Ensuring full-stack observability for JIT-compiled
+code is a prerequisite for the JIT to be considered production-ready.
+
+Individual JIT stencils do not need frame-pointer prologues; the entire JIT
+region can be treated as a single frameless region for unwinding purposes.
+What matters is that the interpreter itself is built with frame pointers, so
+that the frame-pointer register (``%rbp`` on x86-64, ``x29`` on AArch64) is
+reserved and not clobbered by stencil code. With frame pointers in the
+interpreter, unwinders can walk through JIT regions without needing to inspect
+individual stencils. This is a remarkably good outcome compared to other
+JIT compilers (V8, LuaJIT, .NET CoreCLR, Julia, LLVM's ORC JIT), which
+typically require hundreds to thousands of lines of code to implement custom
+DWARF ``.eh_frame`` generation, GDB JIT interface support
 (``__jit_debug_register_code``), and per-unwinder registration APIs
-(``_U_dyn_register``, ``__register_frame``). CPython's JIT may get most of the
-benefit from frame pointers alone if that follow-up change is adopted.
-
-Critically, for JIT frame pointers to produce useful results, the interpreter
-itself must also have frame pointers. A JIT-compiled function calls back into
-the interpreter for many operations; if the interpreter frames lack frame
-pointers, the unwinder hits a gap and the stack trace is truncated. This PEP
-addresses that interpreter-side gap. JIT stencil flags (issue `#126910`_) are
-a complementary follow-up needed for complete stack unwinding in the presence
-of the JIT.
+(``_U_dyn_register``, ``__register_frame``). See issue `#126910`_ for
+further discussion of frame pointers and the JIT.
 
 The Ecosystem Has Already Adopted Frame Pointers
 ------------------------------------------------
@@ -836,8 +831,21 @@ incorrectly.
 Performance
 -----------
 
-.. TODO: Insert full pyperformance results here once data collection
-   is complete.
+Full pyperformance results comparing the frame-pointer build against an
+identical build without frame pointers (geometric mean across 108
+benchmarks):
+
+===================================== =======================
+Machine                               Geometric mean overhead
+===================================== =======================
+Apple M2 Mac Mini (arm64, macOS)      1.01x slower
+Intel Xeon Platinum 8480 (x86-64)     1.01x slower
+AMD EPYC 9654 (x86-64)                1.01x slower
+AWS Graviton c7g.16xlarge (aarch64)   1.02x slower
+Ampere Altra Max (aarch64)            1.01x slower
+Raspberry Pi (aarch64)                +X.X%
+macOS M3 (arm64)                      +X.X%
+===================================== =======================
 
 This overhead applies to both the interpreter and to C extensions that inherit
 the flags via ``sysconfig``. Detailed microarchitectural analysis shows the
@@ -892,10 +900,15 @@ information not already available through CPython's existing interfaces.
 How to Teach This
 =================
 
-No teaching is required. This change is invisible to Python users: no APIs
-change, no behaviour changes, and no user action is needed. The only observable
-effect is that profilers, debuggers, and system-level tracing tools produce
-more complete and more reliable results out of the box.
+For Python users and application developers, this change is invisible: no APIs
+change, no behaviour changes, and no user action is needed. The only
+observable effect is that profilers, debuggers, and system-level tracing tools
+produce more complete and more reliable results out of the box.
+
+Though extensions should see negligible overhead, extension authors who observe a
+measurable regression in a specific module can opt out as described in
+`Extension Build Impact`_. The ``--without-frame-pointers`` configure flag is
+documented in `Opt-Out Configure Flag`_.
 
 
 Reference Implementation
