Slow subprocess performance on Python 3.13 #49

Open
stefan6419846 opened this issue Aug 10, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@stefan6419846
Owner

The test suite is currently much slower on Python 3.13: about 4:30 min, compared to about 1:15 min on Python < 3.13.

It seems to be related to tests.test_main.MainTestCase.test_retrieval. On Python 3.13, this test can no longer run in parallel: the joblib workers fail with a scancode.lockfile.LockTimeout (360) while loading the license index cache:

joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/parallel.py", line 598, in __call__
    return [func(*args, **kwargs)
            ~~~~^^^^^^^^^^^^^^^^^
  File "/home/runner/work/license_tools/license_tools/license_tools/retrieval.py", line 214, in run_on_file
    return FileResults(
        path=path, short_path=short_path, retrieve_licenses=True,
    ...<3 lines>...
        retrieve_file_info=retrieval_kwargs["retrieve_file_info"],
    )
  File "<string>", line 15, in __init__
  File "/home/runner/work/license_tools/license_tools/license_tools/tools/scancode_tools.py", line 522, in __post_init__
    self.licenses = Licenses(**api.get_licenses(path_str))
                               ~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/scancode/api.py", line 200, in get_licenses
    for detection in detections:
                     ^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/licensedcode/detection.py", line 1924, in detect_licenses
    index = cache.get_index()
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/licensedcode/cache.py", line 459, in get_index
    return get_cache(
           ~~~~~~~~~^
        only_builtin=only_builtin,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        additional_directory=additional_directory
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ).index
    ^
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/licensedcode/cache.py", line 399, in get_cache
    return populate_cache(
        only_builtin=only_builtin,
    ...<2 lines>...
        additional_directory=additional_directory,
    )
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/licensedcode/cache.py", line 419, in populate_cache
    _LICENSE_CACHE = LicenseCache.load_or_build(
        only_builtin=only_builtin,
    ...<6 lines>...
        additional_directory=additional_directory,
    )
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/licensedcode/cache.py", line 136, in load_or_build
    with lockfile.FileLock(lock_file).locked(timeout=timeout):
    ...<55 lines>...
        return license_cache
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/contextlib.py", line 141, in __enter__
    return next(self.gen)
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/scancode/lockfile.py", line 29, in locked
    raise LockTimeout(timeout)
scancode.lockfile.LockTimeout: 360
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/runner/work/license_tools/license_tools/license_tools/__main__.py", line 203, in <module>
    main()
    ~~~~^^
  File "/home/runner/work/license_tools/license_tools/license_tools/__main__.py", line 182, in main
    retrieval.run(
    ~~~~~~~~~~~~~^
        directory=arguments.directory,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<14 lines>...
        retrieve_cargo_metadata=arguments.retrieve_cargo_metadata,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/runner/work/license_tools/license_tools/license_tools/retrieval.py", line 438, in run
    results = list(
        run_on_downloaded_package_file(
    ...<5 lines>...
        )
    )
  File "/home/runner/work/license_tools/license_tools/license_tools/retrieval.py", line 360, in run_on_downloaded_package_file
    yield from run_on_package_archive_file(
    ...<3 lines>...
    )
  File "/home/runner/work/license_tools/license_tools/license_tools/retrieval.py", line 296, in run_on_package_archive_file
    yield from run_on_directory(
    ...<3 lines>...
    )
  File "/home/runner/work/license_tools/license_tools/license_tools/retrieval.py", line 243, in run_on_directory
    results = Parallel(n_jobs=job_count)(
        delayed(run_on_file)(
    ...<4 lines>...
        for path, short_path in files
    )
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/parallel.py", line 2007, in __call__
    return output if self.return_generator else list(output)
                                                ~~~~^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
    ~~~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/parallel.py", line 745, in get_result
    return self._return_or_raise()
           ~~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/hostedtoolcache/Python/3.13.0-beta.3/x64/lib/python3.13/site-packages/joblib/parallel.py", line 763, in _return_or_raise
    raise self._result
scancode.lockfile.LockTimeout: 360

Sequential processing works, but it is also slow for no apparent reason. Isolated tests that use bash-based parallel processing instead of joblib did not show this behavior: https://github.com/stefan6419846/scancode-tests/blob/main/.github/workflows/run.yml
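
One way to narrow this down might be to time the one-off index load separately from the per-file work. This is only a sketch under assumptions, not something verified against this issue. The helper `warm_up_then_map` and the stand-in callables are hypothetical. In the real setup, the warm-up would presumably be `licensedcode.cache.get_index` (the call seen in the traceback) and the worker would be `run_on_file`. If the warm-up dominates the runtime, the workers in the traceback were probably waiting on the lock while the index was being built.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def warm_up_then_map(
    warm_up: Callable[[], object],
    worker: Callable[[T], R],
    items: Iterable[T],
) -> Tuple[float, List[Tuple[R, float]]]:
    """Run `warm_up` once, then `worker` per item, timing both phases."""
    start = time.perf_counter()
    # Assumption: in license_tools this would be licensedcode.cache.get_index.
    warm_up()
    warm_up_seconds = time.perf_counter() - start

    timed_results: List[Tuple[R, float]] = []
    for item in items:
        start = time.perf_counter()
        result = worker(item)
        timed_results.append((result, time.perf_counter() - start))
    return warm_up_seconds, timed_results


if __name__ == "__main__":
    # Stand-ins for illustration only; they do not touch ScanCode.
    seconds, results = warm_up_then_map(
        warm_up=lambda: time.sleep(0.01),
        worker=lambda path: path.upper(),
        items=["a.txt", "b.txt"],
    )
    print(f"warm-up: {seconds:.3f}s")
    for value, elapsed in results:
        print(f"{value}: {elapsed:.6f}s")
```

Running the warm-up in the parent process before dispatching to joblib would also show whether the parallel run still hits the lock timeout once the cache already exists.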

@stefan6419846 stefan6419846 added the bug Something isn't working label Aug 10, 2024