
JIT x Free-Threading #141594

@Fidget-Spinner

Feature or enhancement

Proposal:

This proposes an alternative design to #133171. I have a branch where all tests, including the free-threading (FT) tests, pass.

Phase 1:
For the first and simplest phase, we could add a watcher/callback on thread creation. When more than one thread exists, we simply invalidate all executors, so their _CHECK_VALIDITY checks fail.

This means single-threaded code gets all the future benefits of the JIT, while multi-threaded code loses all of them. However, both can coexist in the same build at the same time. We will finally get JIT + FT, albeit in a limited form.
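A minimal sketch of the phase 1 idea, written as a standalone C11 model rather than CPython code: a hypothetical `on_thread_created` watcher bumps a thread count and, on the first transition to more than one thread, disables the JIT and clears every executor's validity flag, so a model of `_CHECK_VALIDITY` starts failing. All names here (`executor_t`, `register_executor`, `on_thread_created`, `check_validity`, `jit_enabled`) are assumptions for illustration, not CPython APIs. The sketch assumes executors are only registered while the process is single-threaded, which holds because the JIT is switched off before a second thread exists.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical executor model: only the validity flag matters here. */
typedef struct executor {
    atomic_bool valid;
    struct executor *next;   /* intrusive list of all live executors */
} executor_t;

static executor_t *all_executors = NULL;  /* head of the executor list */
static atomic_int live_threads = 1;       /* starts with the main thread */
static atomic_bool jit_enabled = true;

/* Register an executor. Assumed to run only while single-threaded. */
static void register_executor(executor_t *ex) {
    atomic_init(&ex->valid, true);
    ex->next = all_executors;
    all_executors = ex;
}

/* Mark every executor invalid. Nothing is freed here. */
static void invalidate_all_executors(void) {
    for (executor_t *ex = all_executors; ex != NULL; ex = ex->next) {
        atomic_store(&ex->valid, false);
    }
}

/* Hypothetical watcher, invoked whenever a new thread is created. */
static void on_thread_created(void) {
    int now = atomic_fetch_add(&live_threads, 1) + 1;
    /* atomic_exchange returns the old value, so only the first
       transition to multi-threaded performs the global invalidation. */
    if (now > 1 && atomic_exchange(&jit_enabled, false)) {
        invalidate_all_executors();
    }
}

/* Model of the _CHECK_VALIDITY guard at the start of a trace. */
static bool check_validity(executor_t *ex) {
    return atomic_load(&ex->valid);
}
```

Because invalidation only flips flags, a thread that is already inside a trace simply fails its next validity check and falls out of the executor; no memory is released at this point.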

Phase 2:
If we detect that multiple threads are running, we turn off the non-thread-safe optimizations and re-trace with those optimizations disabled. If we detect only a single thread, we follow phase 1 and run until we are invalidated. This lets single-threaded code run faster, while multi-threaded code runs a little slower. Note that all the optimizations I've proposed for the JIT over the past 6 months, and those planned for the future, are FT-safe (in theory; whether they are implemented that way in practice is another matter).
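One way to picture phase 2, as a sketch under stated assumptions: each trace records which optimization set it was built with, the set is chosen from the current thread count, and a trace built with single-thread-only optimizations is rebuilt with the FT-safe subset once a second thread appears. The flag names (`OPT_FT_SAFE`, `OPT_SINGLE_ONLY`) and helpers (`select_opt_flags`, `build_trace`, `retrace_if_needed`) are hypothetical; the proposal does not say which concrete optimizations fall into each bucket.

```c
#include <assert.h>

/* Hypothetical optimization buckets. */
enum {
    OPT_FT_SAFE     = 1u << 0,  /* safe with concurrent threads */
    OPT_SINGLE_ONLY = 1u << 1,  /* assumes no other thread mutates state */
};

typedef struct {
    unsigned opt_flags;  /* optimizations this trace was built with */
} trace_t;

/* Pick the optimization set for a new trace from the thread count. */
static unsigned select_opt_flags(int live_threads) {
    if (live_threads > 1) {
        return OPT_FT_SAFE;  /* drop the non-thread-safe passes */
    }
    return OPT_FT_SAFE | OPT_SINGLE_ONLY;  /* phase 1 behaviour: all on */
}

/* Build (or rebuild) a trace with the flags appropriate right now. */
static trace_t build_trace(int live_threads) {
    trace_t t = { select_opt_flags(live_threads) };
    return t;
}

/* If a trace relies on single-thread assumptions but multiple threads
   now exist, discard it and re-trace with only FT-safe optimizations.
   (In the proposal the old executor would be invalidated and freed
   later by cold cleanup; that part is not modelled here.) */
static trace_t retrace_if_needed(trace_t t, int live_threads) {
    if (live_threads > 1 && (t.opt_flags & OPT_SINGLE_ONLY)) {
        return build_trace(live_threads);
    }
    return t;
}
```

Traces that were already built with the FT-safe subset are left alone, so multi-threaded code pays the re-trace cost at most once per trace.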

Design:

  1. Move all important JIT state into the thread state.
  2. Remove reference counting from executors and make them effectively immortal.
    a. There's a good reason for this: executors already behave like immortal objects, since we only clear them when they are invalidated or go cold. So we should remove refcounting and manage their lifetime ourselves. This also makes them more thread-safe (see item 3).
  3. Use the chain of executors only for invalidation. No freeing!
  4. Defer freeing executors to the cold-executor cleanup instead of doing it during invalidation. Freeing an executor can trigger arbitrary code, which may cause deadlocks.
  5. Lock the code object when inserting executors into it. Lock the runtime when inserting executors into the chain.
  6. Creating more than one thread causes global invalidation of all executors and disables the JIT.
  7. Remove all locks from executor cases, since they aren't needed. The only atomic check required is _CHECK_VALIDITY.
  8. Reads of opcodes need to be atomic because they can race with instrumentation. Reads of cache entries do not need to be atomic.
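Items 2 to 5 and 7 fit together roughly as in the following standalone C11 sketch, which is a model of the lifetime rules rather than CPython code. Executors carry no reference count; inserting into the global chain happens under a lock; invalidation only flips an atomic flag that a `_CHECK_VALIDITY` model reads; and freeing happens only in a cold-cleanup pass that unlinks victims under the lock and frees them after releasing it, so any code run by freeing cannot deadlock on that lock. The `atomic_flag` spinlock is a stand-in for the runtime lock, and every name (`executor_create`, `invalidate_all`, `free_cold_executors`, the `cold` field) is hypothetical. The sketch also assumes no thread is still executing an executor when cleanup runs; a real implementation has to guarantee that separately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical executor: no refcount, lives until cold cleanup frees it. */
typedef struct executor {
    atomic_bool valid;       /* the only state read atomically by traces */
    bool cold;               /* set by some warmth heuristic (not shown) */
    struct executor *next;   /* global chain, walked for invalidation */
} executor_t;

static executor_t *chain = NULL;
static atomic_flag runtime_lock = ATOMIC_FLAG_INIT;  /* stand-in lock */

static void lock_runtime(void) {
    while (atomic_flag_test_and_set_explicit(&runtime_lock,
                                             memory_order_acquire)) {
        /* spin */
    }
}

static void unlock_runtime(void) {
    atomic_flag_clear_explicit(&runtime_lock, memory_order_release);
}

/* Create an executor and insert it into the chain under the lock. */
static executor_t *executor_create(void) {
    executor_t *ex = malloc(sizeof *ex);
    if (ex == NULL) {
        return NULL;
    }
    atomic_init(&ex->valid, true);
    ex->cold = false;
    lock_runtime();
    ex->next = chain;
    chain = ex;
    unlock_runtime();
    return ex;
}

/* Invalidation only flips flags: it never frees, so it runs no user code. */
static void invalidate_all(void) {
    lock_runtime();
    for (executor_t *ex = chain; ex != NULL; ex = ex->next) {
        atomic_store_explicit(&ex->valid, false, memory_order_release);
    }
    unlock_runtime();
}

/* Model of _CHECK_VALIDITY: a single atomic load, no lock. */
static bool check_validity(executor_t *ex) {
    return atomic_load_explicit(&ex->valid, memory_order_acquire);
}

/* Cold cleanup: the one place executors are freed. Returns the count. */
static int free_cold_executors(void) {
    executor_t *doomed = NULL;
    lock_runtime();
    executor_t **link = &chain;
    while (*link != NULL) {
        executor_t *ex = *link;
        if (ex->cold || !atomic_load(&ex->valid)) {
            *link = ex->next;   /* unlink from the chain */
            ex->next = doomed;  /* move onto a private list */
            doomed = ex;
        } else {
            link = &ex->next;
        }
    }
    unlock_runtime();
    /* Free outside the lock: in CPython, freeing may run arbitrary code. */
    int freed = 0;
    while (doomed != NULL) {
        executor_t *next = doomed->next;
        free(doomed);
        doomed = next;
        freed++;
    }
    return freed;
}
```

Splitting "unlink under the lock" from "free after unlocking" is what lets item 4 avoid deadlocks, while invalidation (items 3 and 6) stays cheap enough to run from a thread-creation hook.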

Follow-up after this:

  • Re-enable the reverse type cache in _PyType_LookupByVersion and set_version_unlocked, and re-enable the corresponding optimizer tests.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

#133171
