gh-146073: Add fitness/exit quality mechanism for JIT trace frontend by cocolato · Pull Request #147966 · python/cpython

cocolato · 2026-04-01T13:24:31Z

Background

See: #146073

Introduced a preliminary fitness/exit quality mechanism for JIT trace frontend has, enabling the tracer to:

Stop proactively: Stop at the appropriate time without waiting for the buffer to fill up or hitting a dead end
Stop at good locations: Prioritize stopping at the entry points of existing executors (ENTER_EXECUTOR) to avoid stopping on instructions that can be optimized
Control frame depth: Apply a penalty that increases with depth for function call inlining

Issue: Improving trace quality by tracking "fitness" and "exit quality" #146073

Include/internal/pycore_optimizer.h

Fidget-Spinner

Great work. Thanks for doing this!

Python/optimizer.c

Include/internal/pycore_optimizer.h

Fidget-Spinner · 2026-04-01T14:08:40Z

I also forgot to mention: but we should adjust for the following:

Penalize frame depth underflow more so than normal frame push/pop.
Stop tracing when we hit MAX_ABSTRACT_FRAME_DEPTH, as we can't optimize it anyways.

markshannon

This looks good, and thanks for doing this.

Overall, the approach looks good.

I originally had in a mind that the fitness would reduce geometrically, not arithmetically, but your arithmetic approach looks easier to reason about and at least as good in terms of trace quality.

We need to reduce the number of configurable parameters to just one or two.
Then we can make sure that fitness and penalties are set such that we are guaranteed not to overflow the trace or optimizer buffers.

I'm not sure that rewinding is worth it. As long as "good" exits have a much higher score than "bad" exits, then we should (almost) always end up at a good exit.

Include/internal/pycore_interp_structs.h

markshannon · 2026-04-01T18:32:18Z

Include/internal/pycore_optimizer.h

+/* Default fitness configuration values for trace quality control.
+ * These can be overridden via PYTHON_JIT_FITNESS_* environment variables. */
+#define FITNESS_PER_INSTRUCTION        2
+#define FITNESS_INITIAL             (UOP_MAX_TRACE_LENGTH * FITNESS_PER_INSTRUCTION)


I'd stick with 2000. We really don't need traces of over 1000 uops; it is far more important to find a good stopping point than to have very long traces

@markshannon 1000 uops is only roughly 300 uops after optimization which corresponds to roughly only 60 bytecode instructions. That's too low. A reminder that the trace recording spews out a lot of uops that is only optimized away by the optimizer.

@cocolato I suspect richards perf regression might be due to too low trace length, I experimented with richards before and it's quite sensitive to how well the optimizer performs

Yes, and I tried reducing the branch/frame penalty, which also improve richards' performance.

Python/optimizer.c

markshannon · 2026-04-01T18:48:13Z

Python/optimizer.c

    trace->end -= needs_guard_ip;

    int space_needed = expansion->nuops + needs_guard_ip + 2 + (!OPCODE_HAS_NO_SAVE_IP(opcode));
    if (uop_buffer_remaining_space(trace) < space_needed) {


Suggested change

assert (uop_buffer_remaining_space(trace) > space_needed);

If we choose the fitness and exit values correctly, we can't run out of space.

I'm not sure if the current parameters allow us to remove the check here, so I think it's better to keep it for now.

Python/optimizer.c

bedevere-app · 2026-04-01T19:04:39Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

cocolato · 2026-04-02T13:38:57Z

I’ll try to optimize the relevant parameters further, as performance in benchmarks like Richards has declined, and I’ll add some debug logs to make it easier to track fitness-related metrics later on.

Python/optimizer.c

markshannon · 2026-04-02T15:21:57Z

It might be worth thinking about a few scenarios to help choose parameters.
For example:

A loop should only be unrolled once, but we do want to unroll short loops.
There shouldn't be at most ~4 unbiased branches in a trace

cocolato · 2026-04-03T12:09:13Z

Python/optimizer.c

+static inline int32_t
+compute_frame_penalty(const _PyOptimizationConfig *cfg)
+{
+    return (int32_t)cfg->fitness_initial / 10 + 1;
+}


The newest result with lower frame penalty:

+----------------------+----------+-----------------------+ | Benchmark | baseline | fitness | +======================+==========+=======================+ | raytrace | 262 ms | 212 ms: 1.24x faster | +----------------------+----------+-----------------------+ | pickle_pure_python | 277 us | 254 us: 1.09x faster | +----------------------+----------+-----------------------+ | go | 83.7 ms | 78.8 ms: 1.06x faster | +----------------------+----------+-----------------------+ | xml_etree_iterparse | 74.1 ms | 69.9 ms: 1.06x faster | +----------------------+----------+-----------------------+ | xml_etree_process | 50.4 ms | 48.3 ms: 1.04x faster | +----------------------+----------+-----------------------+ | xml_etree_generate | 77.3 ms | 74.1 ms: 1.04x faster | +----------------------+----------+-----------------------+ | xml_etree_parse | 122 ms | 119 ms: 1.03x faster | +----------------------+----------+-----------------------+ | regex_compile | 103 ms | 99.7 ms: 1.03x faster | +----------------------+----------+-----------------------+ | deltablue | 2.19 ms | 2.14 ms: 1.03x faster | +----------------------+----------+-----------------------+ | unpickle_pure_python | 171 us | 167 us: 1.02x faster | +----------------------+----------+-----------------------+ | regex_effbot | 2.16 ms | 2.12 ms: 1.02x faster | +----------------------+----------+-----------------------+ | fannkuch | 253 ms | 249 ms: 1.02x faster | +----------------------+----------+-----------------------+ | json_loads | 19.3 us | 19.1 us: 1.01x faster | +----------------------+----------+-----------------------+ | json_dumps | 7.60 ms | 7.66 ms: 1.01x slower | +----------------------+----------+-----------------------+ | pidigits | 136 ms | 138 ms: 1.01x slower | +----------------------+----------+-----------------------+ | pyflate | 267 ms | 273 ms: 1.02x slower | +----------------------+----------+-----------------------+ | float | 45.2 ms | 46.2 ms: 1.02x slower | +----------------------+----------+-----------------------+ | richards | 16.5 ms | 17.3 ms: 1.05x slower | +----------------------+----------+-----------------------+ | Geometric mean | (ref) | 1.03x faster | +----------------------+----------+-----------------------+

cocolato · 2026-04-03T12:41:47Z

debug.log

By analyzing specific trace paths in the richards, I found that the frame penalty has the greatest impact on fitness: _PUSH_FRAME: depth=1, penalty=-201 (per_frame=201) -> fitness=281

Fidget-Spinner · 2026-04-03T13:32:43Z

Very impressive, on fastmark, I see a roughly 1% speedup on my x86_64 Linux system, and on raytrace specifically it's quite a big bump:

Mean +- std dev: [raytrace_main] 290 ms +- 4 ms -> [raytrace_fitness] 268 ms +- 3 ms: 1.08x faster

Note: my system is kind of old now (i7-12700H)

Fidget-Spinner

LGTM. just one comment

Fidget-Spinner · 2026-04-03T14:53:26Z

Python/optimizer.c

+                        // so there's no point continuing the trace.
+                        DPRINTF(2, "Unsupported: frame depth %d >= MAX_ABSTRACT_FRAME_DEPTH\n",
+                                ts_depth->frame_depth);
+                        goto unsupported;


This should goto somewhere that we rewind and insert _EXIT_TRACE. The current unsupported instead inserts _DEOPT.

Perhaps try assigning int end_trace_opcode = _DEOPT, and when it hits this branch, write end_trace_opcode = _EXIT_TRACE then assign curr->opcode = end_trace_opcode;.

cocolato · 2026-04-03T14:54:02Z

I have made the requested changes; please review again.

bedevere-app · 2026-04-03T14:54:08Z

Thanks for making the requested changes!

@markshannon: please review the changes made to this pull request.

cocolato · 2026-04-03T14:54:38Z

Thank you all for taking the time to review this! Do the current parameters meet our expectations?

Changes have been addressed. Any minor tweaks can be done as follow-up PRs. Thanks Mark!

Fidget-Spinner

I'll take responsibility for this PR and fix/revert anything if it breaks stuff. Thanks again Hai Zhu and Mark for the reviews.

Any minor tweaks can be follow-up PRs.

Fidget-Spinner · 2026-04-03T15:53:37Z

I see a slight (~8%) slowdown in Richards on my system. However, I think that's acceptable considering most other things sped up.

cocolato and others added 2 commits April 1, 2026 00:57

add fitness && exit quality mechanism

1bfa176

Rewrite the code structure

2f9438a

cocolato requested review from FFY00, Fidget-Spinner, ZeroIntensity, ericsnowcurrently and markshannon as code owners April 1, 2026 13:24

bedevere-app bot added the awaiting review label Apr 1, 2026

bedevere-app bot mentioned this pull request Apr 1, 2026

Improving trace quality by tracking "fitness" and "exit quality" #146073

Open

cocolato added the skip news label Apr 1, 2026

cocolato commented Apr 1, 2026

View reviewed changes

Include/internal/pycore_optimizer.h Outdated Show resolved Hide resolved

Fidget-Spinner reviewed Apr 1, 2026

View reviewed changes

Python/optimizer.c Outdated Show resolved Hide resolved

Include/internal/pycore_optimizer.h Outdated Show resolved Hide resolved

address review

709c0a1

markshannon previously requested changes Apr 1, 2026

View reviewed changes

bedevere-app bot added awaiting changes and removed awaiting review labels Apr 1, 2026

cocolato and others added 2 commits April 2, 2026 07:38

address many reviews

ef6ac24

Merge branch 'main' into jit-tracer-fitness

21f7122

cocolato marked this pull request as draft April 2, 2026 13:36

bedevere-app bot removed the awaiting changes label Apr 2, 2026

markshannon reviewed Apr 2, 2026

View reviewed changes

Python/optimizer.c Outdated Show resolved Hide resolved

optimize some constants

b99fe61

cocolato added 3 commits April 2, 2026 23:22

fix comment

d09afb5

fix constent

c9957c3

reduce frame penalty

9447546

cocolato commented Apr 3, 2026

View reviewed changes

add debug log

7d3e4c4

cocolato marked this pull request as ready for review April 3, 2026 14:02

bedevere-app bot added the awaiting review label Apr 3, 2026

Fidget-Spinner reviewed Apr 3, 2026

View reviewed changes

bedevere-app bot added awaiting change review and removed awaiting review labels Apr 3, 2026

bedevere-app bot requested a review from markshannon April 3, 2026 14:54

cocolato added 2 commits April 3, 2026 22:59

address review

2c1b5e0

address review

2409b2f

bedevere-app bot added awaiting review and removed awaiting change review labels Apr 3, 2026

Fidget-Spinner approved these changes Apr 3, 2026

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting review labels Apr 3, 2026

Fidget-Spinner merged commit 198b04b into python:main Apr 3, 2026
74 of 77 checks passed

bedevere-app bot removed the awaiting merge label Apr 3, 2026

Uh oh!

Conversation

cocolato commented Apr 1, 2026 • edited by markshannon Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

Uh oh!

Uh oh!

Fidget-Spinner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Fidget-Spinner commented Apr 1, 2026

Uh oh!

markshannon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

markshannon Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Fidget-Spinner Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Fidget-Spinner Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

cocolato Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

markshannon Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

cocolato Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

bedevere-app bot commented Apr 1, 2026

Uh oh!

cocolato commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

markshannon commented Apr 2, 2026

Uh oh!

cocolato Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

cocolato commented Apr 3, 2026

Uh oh!

Fidget-Spinner commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Fidget-Spinner left a comment

Choose a reason for hiding this comment

Uh oh!

Fidget-Spinner Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cocolato commented Apr 3, 2026

Uh oh!

bedevere-app bot commented Apr 3, 2026

Uh oh!

cocolato commented Apr 3, 2026

Uh oh!

Fidget-Spinner left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Fidget-Spinner commented Apr 3, 2026

Uh oh!

Uh oh!

cocolato commented Apr 1, 2026 •

edited by markshannon

Loading

cocolato Apr 3, 2026 •

edited

Loading

cocolato commented Apr 2, 2026 •

edited

Loading

Fidget-Spinner commented Apr 3, 2026 •

edited

Loading

Fidget-Spinner Apr 3, 2026 •

edited

Loading

Fidget-Spinner left a comment •

edited

Loading