Commit fd3ea6c

Merge pull request #8 from Fidget-Spinner/frame-pointers-kj2
Add perf figures, tighten JIT claims, temporarily remove Diego
2 parents c69f2c2 + 24e90ad commit fd3ea6c

3 files changed: +42370 -13 lines changed

peps/pep-0830.rst

Lines changed: 25 additions & 13 deletions
@@ -3,7 +3,6 @@ Title: Frame Pointers Everywhere: Enabling System-Level Observability for Python
 Author: Pablo Galindo Salgado <[email protected]>,
 Ken Jin <[email protected]>,
 Savannah Ostrowski <[email protected]>,
-Diego Russo <[email protected]>
 Discussions-To:
 Status: Draft
 Type: Standards Track
@@ -405,18 +404,18 @@ The JIT Compiler Needs Frame Pointers to Be Debuggable
 ------------------------------------------------------

 CPython's copy-and-patch JIT (:pep:`744`) generates native machine code at
-runtime. Without frame pointers in the interpreter, stack unwinding through
+runtime. Without reserved frame pointers in the JIT code, stack unwinding through
 JIT frames is broken for virtually every tool in the ecosystem: GDB, LLDB,
 libunwind, libdw (elfutils), py-spy, Austin, pystack, memray, ``perf``, and
 all eBPF-based profilers. Ensuring full-stack observability for JIT-compiled
 code is a prerequisite for the JIT to be considered production-ready.

 Individual JIT stencils do not need frame-pointer prologues; the entire JIT
 region can be treated as a single frameless region for unwinding purposes.
-What matters is that the interpreter itself is built with frame pointers, so
+What matters is that the JIT itself must reserve frame pointers, so
 that the frame-pointer register (``%rbp`` on x86-64, ``x29`` on AArch64) is
 reserved and not clobbered by stencil code. With frame pointers in the
-interpreter, unwinders can walk through JIT regions without needing to inspect
+JIT, most unwinders can walk through JIT regions without needing to inspect
 individual stencils. This is a remarkably good outcome compared to other
 JIT compilers (V8, LuaJIT, .NET CoreCLR, Julia, LLVM's ORC JIT), which
 typically require hundreds to thousands of lines of code to implement custom
@@ -840,14 +839,14 @@ pyperformance JSON files can be found in
 ===================================== =======================
 Machine                               Geometric mean overhead
 ===================================== =======================
-Apple M2 Mac Mini (arm64)             1.01x slower
-Intel Xeon Platinum 8480 (x86-64)     1.01x slower
-AMD EPYC 9654 (x86-64)                1.01x slower
-AWS Graviton c7g.16xlarge (aarch64)   1.02x slower
-Ampere Altra Max (aarch64)            1.01x slower
-Raspberry Pi (aarch64).               1.00x slower
-macOS M3 Pro (arm64)                  1.00x slower
-Intel i7 12700H (x86-64)              1.02x slower
+Apple M2 Mac Mini (arm64)             1.006x slower
+macOS M3 Pro (arm64)                  1.001x slower
+Raspberry Pi (aarch64).               1.002x slower
+Ampere Altra Max (aarch64)            1.020x slower
+AWS Graviton c7g.16xlarge (aarch64)   1.027x slower
+Intel i7 12700H (x86-64)              1.019x slower
+AMD EPYC 9654 (x86-64)                1.008x slower
+Intel Xeon Platinum 8480 (x86-64)     1.006x slower
 ===================================== =======================

 This overhead applies to both the interpreter and to C extensions that inherit
@@ -1048,7 +1047,20 @@ Footnotes
 Appendix
 ========

-# TODO: KJ, once we have Diego's results.
+For all graphs below, the green dots are geometric means of the
+individual benchmark's median, while orange lines are the median of our data points.
+Hollow circles represent outliers.
+
+The first graph is the overall effect on pyperformance seen on each system.
+Apart from the Ubuntu AWS Graviton system, all system configurations have below 2%
+geometric mean and median slowdown:
+
+.. image:: pep-0830_perf_over_baseline.svg
+
+For individual benchmark results, see the following:
+
+.. image:: pep-0830_perf_over_baseline_indiv.svg
+

 Copyright
 =========
