Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experimental jit causes infinite loops on 3.13 branch #126127

Open
tacaswell opened this issue Oct 29, 2024 · 12 comments
Open

experimental jit causes infinite loops on 3.13 branch #126127

tacaswell opened this issue Oct 29, 2024 · 12 comments
Labels
3.13 bugs and security fixes topic-JIT type-bug An unexpected behavior, bug, or error

Comments

@tacaswell
Copy link
Contributor

tacaswell commented Oct 29, 2024

Bug report

Bug description:

Attempting to install cython on the 3.13 branch fails with what looks like an infinite loop that sits there and burns CPU forever.

I have bisected this to #124266 which makes no sense to me, but I'm pretty confident about it as observed behavior.

bisect log:

git bisect log git bisect start # status: waiting for both good and bad commits # good: [60403a5] Python 3.13.0 git bisect good 60403a5 # status: waiting for bad commit, 1 good commit known # bad: [ff044ed] [3.13] gh-124295: Add translation tests for argparse (GH-124803) (GH-126046) git bisect bad ff044ed # bad: [1279be6] [3.13] gh-123133: clarify p=0 case for "f" and "e" formatting types (GH-125426) (#125428) git bisect bad 1279be6 # bad: [7bc99dd] [3.13] Tee of tee was not producing n independent iterators (gh-123884) (gh-125081) git bisect bad 7bc99dd # bad: [db3ccd8] [3.13] gh-53780: Ignore the first "--" (double dash) between an option and command in argparse (GH-124275) (GH-125073) git bisect bad db3ccd8 # bad: [11d4b54] [3.13] gh-116850: Fix argparse for namespaces with not directly writable dict (GH-124667) (GH-124757) git bisect bad 11d4b54 # bad: [167d8d2] [3.13] gh-63143: Fix parsing mutually exclusive arguments in argparse (GH-124307) (GH-124418) git bisect bad 167d8d2 # good: [80ba17a] [3.13] gh-112804: Clamping timeout value for _PySemaphore_PlatformWait (gh-124914) (gh-124991) git bisect good 80ba17a # bad: [6387016] [3.13] gh-81691: Fix handling of multiple "--" (double dashes) in argparse (GH-124233) (GH-124266) git bisect bad 6387016 # good: [6fe746d] [3.13] gh-122392: IDLE - Fix overlapping lines in browsers (GH-122392) (GH-124975) (#125061) git bisect good 6fe746d # first bad commit: [6387016] [3.13] gh-81691: Fix handling of multiple "--" (double dashes) in argparse (GH-124233) (GH-124266)

The Python invocation it is failing on from setuptools is :

/home/tcaswell/.virtualenvs/bisect/bin/python3 -u -c
exec(compile('''
# This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
#
# - It imports setuptools before invoking setup.py, to enable projects that directly
#   import from `distutils.core` to work with newer packaging standards.
# - It provides a clear error message when setuptools is not installed.
# - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
#   setuptools doesn't think the script is `-c`. This avoids the following warning:
#     manifest_maker: standard file '-c' not found".
# - It generates a shim setup.py, for handling setup.cfg-only projects.
import os, sys, tokenize

try:
    import setuptools
except ImportError as error:
    print(
        "ERROR: Can not execute `setup.py` since setuptools is not available in "
        "the build environment.",
        file=sys.stderr,
    )
    sys.exit(1)

__file__ = %r
sys.argv[0] = __file__

if os.path.exists(__file__):
    filename = __file__
    with tokenize.open(__file__) as f:
        setup_py_code = f.read()
else:
    filename = "<auto-generated setuptools caller>"
    setup_py_code = "from setuptools import setup; setup()"

exec(compile(setup_py_code, filename, "exec"))
''' % ('/home/tcaswell/source/p/cython/cython/setup.py',), "<pip-setuptools-caller>", "exec")) bdist_wheel -d /tmp/pip-wheel-l3necg1_~

I have not succeeded in getting traceback out of this yet.

  • I'm pretty confident in the bisect on the 3.13 branch, turning the jit off works fine on the both the tip of the branch and 6387016
  • the initial commit on main that was backported fails to build cython for other reasons (see GH-125868: Fix STORE_ATTR_WITH_HINT specialization #125876)
  • the current main branch works fine both with and without the jit

attn @da-woods @scoder

CPython versions tested on:

3.13, CPython main branch

Operating systems tested on:

Linux

@tacaswell tacaswell added the type-bug An unexpected behavior, bug, or error label Oct 29, 2024
@tacaswell
Copy link
Contributor Author

I tried reverting 6387016 but it did not revert cleanly (it looks like there was a run of backports for argparser from @serhiy-storchaka ) and I was not sure how to correctly resolve the conflicts.

@skirpichev skirpichev added 3.13 bugs and security fixes topic-JIT labels Oct 29, 2024
@mdboom
Copy link
Contributor

mdboom commented Oct 29, 2024

I was just going to report this issue (with a different reproducer). The 2to3 benchmark in pyperformance is hanging due to this -- same first bad commit of 6387016. I see the same behavior with the Tier 2 interpreter as the full JIT.

To reproduce (with a checkout of pyperformance):

python -m lib2to3 -f all benchmarks/bm_2to3/vendor
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/__main__.py", line 4, in <module>
    sys.exit(main("lib2to3.fixes"))
             ~~~~^^^^^^^^^^^^^^^^^
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/main.py", line 263, in main
    rt.refactor(args, options.write, options.doctests_only,
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                options.processes)
                ^^^^^^^^^^^^^^^^^^
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/refactor.py", line 690, in refactor
    return super(MultiprocessRefactoringTool, self).refactor(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        items, write, doctests_only)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/refactor.py", line 284, in refactor
    self.refactor_dir(dir_or_file, write, doctests_only)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/refactor.py", line 304, in refactor_dir
    self.refactor_file(fullname, write, doctests_only)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/refactor.py", line 731, in refactor_file
    return super(MultiprocessRefactoringTool, self).refactor_file(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        *args, **kwargs)
        ^^^^^^^^^^^^^^^^
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/refactor.py", line 339, in refactor_file
    tree = self.refactor_string(input, filename)
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/refactor.py", line 371, in refactor_string
    self.refactor_tree(tree, name)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/refactor.py", line 414, in refactor_tree
    match_set = self.BM.run(tree.leaves())
  File "/home/mdboom/Work/builds/pyperformance/venv/lib/python3.13/site-packages/lib2to3/btm_matcher.py", line 103, in run
    while current_ast_node:
          ^^^^^^^^^^^^^^^^
KeyboardInterrupt

Cc: @brandtbucher, @markshannon

@brandtbucher
Copy link
Member

brandtbucher commented Oct 29, 2024

Both of these reproducers do some argument parsing, so I wouldn’t be surprised if the argparse change created some new code path that the JIT doesn’t reason about properly.

Hm. I wonder if backporting GH-125935 could fix it. I’ll try reproducing with and without that commit.

@brandtbucher
Copy link
Member

Hm, nevermind. 3.13 doesn't even support jitting BINARY_OP_INPLACE_ADD_UNICODE.

@brandtbucher
Copy link
Member

I can reproduce the pyperformance hang. Setting PYTHON_JIT=0 fixes it, but PYTHON_UOPS_OPTIMIZE=0 doesn't. So I don't think this is in the optimizer, it's probably either something to do with trace projection/translation or stitching.

@brandtbucher
Copy link
Member

Last few traces compiled before the hang support the theory that this could indeed be triggered by the argparse change, and possibly string-related:

Created a proto-trace for ArgumentParser._parse_known_args2 (/home/brandtbucher/cpython/Lib/argparse.py:1901) at byte offset 120 -- length 77
Created a proto-trace for ArgumentParser._get_positional_actions (/home/brandtbucher/cpython/Lib/argparse.py:1880) at byte offset 36 -- length 21
Created a proto-trace for ArgumentParser._parse_known_args (/home/brandtbucher/cpython/Lib/argparse.py:1939) at byte offset 1664 -- length 57
Created a proto-trace for realpath (/home/brandtbucher/cpython/Lib/posixpath.py:390) at byte offset 290 -- length 53
Created a proto-trace for _parse (/home/brandtbucher/cpython/Lib/re/_parser.py:511) at byte offset 280 -- length 41
Created a proto-trace for SubPattern.getwidth (/home/brandtbucher/cpython/Lib/re/_parser.py:177) at byte offset 924 -- length 43
Created a proto-trace for join (/home/brandtbucher/cpython/Lib/posixpath.py:71) at byte offset 78 -- length 68
Created a proto-trace for _parse (/home/brandtbucher/cpython/Lib/re/_parser.py:511) at byte offset 168 -- length 41

@brandtbucher
Copy link
Member

Digging a bit deeper, what I'm seeing during the hang is the same trace being compiled over and over again:

Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57
Created a proto-trace for BottomMatcher.run (/home/brandtbucher/cpython/env/lib/python3.13/site-packages/lib2to3/btm_matcher.py:83) at byte offset 90 -- length 57

I really wish we had a smaller reproducer, because any change I make to the loop being compiled moves the hang to another file...

@brandtbucher
Copy link
Member

It's really hard to create a minimal reproducer for this, but here's what I think you need to do to hit this:

  • Create a loop that spans at least 256 instructions. This means that the loop will end with EXTENDED_ARG; JUMP_BACKWARD.
  • During the body of the loop, allocate lots of GC objects, or raise a signal, or have another thread request a GIL drop... anything to set the eval breaker. But make sure it isn't checked until the bottom of the loop (so no calls or anything). In the case of the pyperformance reproducer with lib2to3, this is a scheduled GC run.

So here's what appears to be happening:

  1. The loop warms up, a trace is projected, and an executor is installed at the bottom of the loop. This executor actually lives on top of the EXTENDED_ARG instruction, not the JUMP_BACKWARD instruction. This is by design, since executors can only sit on top of an 8-bit oparg.
  2. We enter the executor from JUMP_BACKWARD immediately after creating it.
  3. The executor deopts immediately due to the eval breaker being set, in _TIER2_RESUME_CHECK.
  4. We deopt to the EXTENDED_ARG, which has the new ENTER_EXECUTOR sitting on top of it.
  5. We execute the ENTER_EXECUTOR instruction, which sees that the eval breaker bit is set and doesn't begin executing the trace. Instead, it reloads the original instruction underneath it.
  6. This is the EXTENDED_ARG. We execute it. The eval breaker still hasn't been handled.
  7. We execute the following JUMP_BACKWARD. The eval breaker is checked and handled, and since the warmup counter is still at 0, we project a new trace, overwriting the old one. During this, the eval breaker is set again.
  8. Repeat from step 2, with the new trace.

Basically, our forward progress guarantees are being broken.

@brandtbucher
Copy link
Member

It's the allocation of the new executor that's re-scheduling GC (and re-setting the eval breaker).

@brandtbucher
Copy link
Member

brandtbucher commented Oct 29, 2024

Okay, this is pretty contrived (in order to keep the GC in a state that triggers the bug), but here's a reproducer:

import gc

THRESHOLD = gc.get_threshold()[0]
REFS = []

def create_gc_objects_during_gc(phase, info):
    # Simulate the creation of a full young generation of GC objects in a dying
    # object's finalizer:
    if phase == "stop":
        for _ in range(THRESHOLD):
            REFS.append([])

def f():
    for _ in range(17):  # Just enough times to JIT.
        # Loop size of at least 256 instructions to force an EXTENDED_ARG jump.
        # This is 128 * (LOAD_FAST + POP_TOP):
        _;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_
        _;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_
        _;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_
        _;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_;_
        # The young generation is currently full. Allocating the executor will
        # schedule a collection, which will trigger the bug.


gc.callbacks.append(create_gc_objects_during_gc)
gc.collect()  # Clean slate, with a full young generation.

f()  # Hang!

@brandtbucher
Copy link
Member

brandtbucher commented Oct 29, 2024

ENTER_EXECUTOR already has an eval breaker check, with a comment explaining it's to avoid infinite loops exactly like this one. We should probably just add a similar check to the other places where GOTO_TIER_TWO is called.

@mdboom
Copy link
Contributor

mdboom commented Oct 30, 2024

@brandtbucher: Do you have any insight into why this doesn't happen on main? Is it the incremental gc?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes topic-JIT type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants