Skip to content

Commit

Permalink
gh-124694: Add concurrent.futures.InterpreterPoolExecutor (gh-124548)
Browse files Browse the repository at this point in the history
This is an implementation of InterpreterPoolExecutor that builds on ThreadPoolExecutor.

(Note that this is not tied to PEP 734, which is strictly about adding a new stdlib module.)

Possible future improvements:

* support passing a script for the initializer or to submit()
* support passing (most) arbitrary functions without pickling
* support passing closures
* optionally exec functions against __main__ instead of the their original module
  • Loading branch information
ericsnowcurrently authored Oct 16, 2024
1 parent a38fef4 commit a5a7f5e
Show file tree
Hide file tree
Showing 12 changed files with 826 additions and 38 deletions.
6 changes: 4 additions & 2 deletions Doc/library/asyncio-dev.rst
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,8 @@ To handle signals the event loop must be
run in the main thread.

The :meth:`loop.run_in_executor` method can be used with a
:class:`concurrent.futures.ThreadPoolExecutor` to execute
:class:`concurrent.futures.ThreadPoolExecutor` or
:class:`~concurrent.futures.InterpreterPoolExecutor` to execute
blocking code in a different OS thread without blocking the OS thread
that the event loop runs in.

Expand All @@ -128,7 +129,8 @@ if a function performs a CPU-intensive calculation for 1 second,
all concurrent asyncio Tasks and IO operations would be delayed
by 1 second.

An executor can be used to run a task in a different thread or even in
An executor can be used to run a task in a different thread,
including in a different interpreter, or even in
a different process to avoid blocking the OS thread with the
event loop. See the :meth:`loop.run_in_executor` method for more
details.
Expand Down
9 changes: 8 additions & 1 deletion Doc/library/asyncio-eventloop.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1305,6 +1305,12 @@ Executing code in thread or process pools
pool, cpu_bound)
print('custom process pool', result)

# 4. Run in a custom interpreter pool:
with concurrent.futures.InterpreterPoolExecutor() as pool:
result = await loop.run_in_executor(
pool, cpu_bound)
print('custom interpreter pool', result)

if __name__ == '__main__':
asyncio.run(main())

Expand All @@ -1329,7 +1335,8 @@ Executing code in thread or process pools

Set *executor* as the default executor used by :meth:`run_in_executor`.
*executor* must be an instance of
:class:`~concurrent.futures.ThreadPoolExecutor`.
:class:`~concurrent.futures.ThreadPoolExecutor`, which includes
:class:`~concurrent.futures.InterpreterPoolExecutor`.

.. versionchanged:: 3.11
*executor* must be an instance of
Expand Down
2 changes: 1 addition & 1 deletion Doc/library/asyncio-llapi-index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ See also the main documentation section about the
- Invoke a callback *at* the given time.


.. rubric:: Thread/Process Pool
.. rubric:: Thread/Interpreter/Process Pool
.. list-table::
:widths: 50 50
:class: full-width-table
Expand Down
135 changes: 131 additions & 4 deletions Doc/library/concurrent.futures.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,10 @@ The :mod:`concurrent.futures` module provides a high-level interface for
asynchronously executing callables.

The asynchronous execution can be performed with threads, using
:class:`ThreadPoolExecutor`, or separate processes, using
:class:`ProcessPoolExecutor`. Both implement the same interface, which is
defined by the abstract :class:`Executor` class.
:class:`ThreadPoolExecutor` or :class:`InterpreterPoolExecutor`,
or separate processes, using :class:`ProcessPoolExecutor`.
Each implements the same interface, which is defined
by the abstract :class:`Executor` class.

.. include:: ../includes/wasm-notavail.rst

Expand Down Expand Up @@ -63,7 +64,8 @@ Executor Objects
setting *chunksize* to a positive integer. For very long iterables,
using a large value for *chunksize* can significantly improve
performance compared to the default size of 1. With
:class:`ThreadPoolExecutor`, *chunksize* has no effect.
:class:`ThreadPoolExecutor` and :class:`InterpreterPoolExecutor`,
*chunksize* has no effect.

.. versionchanged:: 3.5
Added the *chunksize* argument.
Expand Down Expand Up @@ -227,6 +229,111 @@ ThreadPoolExecutor Example
print('%r page is %d bytes' % (url, len(data)))


InterpreterPoolExecutor
-----------------------

The :class:`InterpreterPoolExecutor` class uses a pool of interpreters
to execute calls asynchronously. It is a :class:`ThreadPoolExecutor`
subclass, which means each worker is running in its own thread.
The difference here is that each worker has its own interpreter,
and runs each task using that interpreter.

The biggest benefit to using interpreters instead of only threads
is true multi-core parallelism. Each interpreter has its own
:term:`Global Interpreter Lock <global interpreter lock>`, so code
running in one interpreter can run on one CPU core, while code in
another interpreter runs unblocked on a different core.

The tradeoff is that writing concurrent code for use with multiple
interpreters can take extra effort. However, this is because it
forces you to be deliberate about how and when interpreters interact,
and to be explicit about what data is shared between interpreters.
This results in several benefits that help balance the extra effort,
including true multi-core parallelism, For example, code written
this way can make it easier to reason about concurrency. Another
major benefit is that you don't have to deal with several of the
big pain points of using threads, like nrace conditions.

Each worker's interpreter is isolated from all the other interpreters.
"Isolated" means each interpreter has its own runtime state and
operates completely independently. For example, if you redirect
:data:`sys.stdout` in one interpreter, it will not be automatically
redirected any other interpreter. If you import a module in one
interpreter, it is not automatically imported in any other. You
would need to import the module separately in interpreter where
you need it. In fact, each module imported in an interpreter is
a completely separate object from the same module in a different
interpreter, including :mod:`sys`, :mod:`builtins`,
and even ``__main__``.

Isolation means a mutable object, or other data, cannot be used
by more than one interpreter at the same time. That effectively means
interpreters cannot actually share such objects or data. Instead,
each interpreter must have its own copy, and you will have to
synchronize any changes between the copies manually. Immutable
objects and data, like the builtin singletons, strings, and tuples
of immutable objects, don't have these limitations.

Communicating and synchronizing between interpreters is most effectively
done using dedicated tools, like those proposed in :pep:`734`. One less
efficient alternative is to serialize with :mod:`pickle` and then send
the bytes over a shared :mod:`socket <socket>` or
:func:`pipe <os.pipe>`.

.. class:: InterpreterPoolExecutor(max_workers=None, thread_name_prefix='', initializer=None, initargs=(), shared=None)

A :class:`ThreadPoolExecutor` subclass that executes calls asynchronously
using a pool of at most *max_workers* threads. Each thread runs
tasks in its own interpreter. The worker interpreters are isolated
from each other, which means each has its own runtime state and that
they can't share any mutable objects or other data. Each interpreter
has its own :term:`Global Interpreter Lock <global interpreter lock>`,
which means code run with this executor has true multi-core parallelism.

The optional *initializer* and *initargs* arguments have the same
meaning as for :class:`!ThreadPoolExecutor`: the initializer is run
when each worker is created, though in this case it is run.in
the worker's interpreter. The executor serializes the *initializer*
and *initargs* using :mod:`pickle` when sending them to the worker's
interpreter.

.. note::
Functions defined in the ``__main__`` module cannot be pickled
and thus cannot be used.

.. note::
The executor may replace uncaught exceptions from *initializer*
with :class:`~concurrent.futures.interpreter.ExecutionFailed`.

The optional *shared* argument is a :class:`dict` of objects that all
interpreters in the pool share. The *shared* items are added to each
interpreter's ``__main__`` module. Not all objects are shareable.
Shareable objects include the builtin singletons, :class:`str`
and :class:`bytes`, and :class:`memoryview`. See :pep:`734`
for more info.

Other caveats from parent :class:`ThreadPoolExecutor` apply here.

:meth:`~Executor.submit` and :meth:`~Executor.map` work like normal,
except the worker serializes the callable and arguments using
:mod:`pickle` when sending them to its interpreter. The worker
likewise serializes the return value when sending it back.

.. note::
Functions defined in the ``__main__`` module cannot be pickled
and thus cannot be used.

When a worker's current task raises an uncaught exception, the worker
always tries to preserve the exception as-is. If that is successful
then it also sets the ``__cause__`` to a corresponding
:class:`~concurrent.futures.interpreter.ExecutionFailed`
instance, which contains a summary of the original exception.
In the uncommon case that the worker is not able to preserve the
original as-is then it directly preserves the corresponding
:class:`~concurrent.futures.interpreter.ExecutionFailed`
instance instead.


ProcessPoolExecutor
-------------------

Expand Down Expand Up @@ -574,6 +681,26 @@ Exception classes

.. versionadded:: 3.7

.. currentmodule:: concurrent.futures.interpreter

.. exception:: BrokenInterpreterPool

Derived from :exc:`~concurrent.futures.thread.BrokenThreadPool`,
this exception class is raised when one of the workers
of a :class:`~concurrent.futures.InterpreterPoolExecutor`
has failed initializing.

.. versionadded:: next

.. exception:: ExecutionFailed

Raised from :class:`~concurrent.futures.InterpreterPoolExecutor` when
the given initializer fails or from
:meth:`~concurrent.futures.Executor.submit` when there's an uncaught
exception from the submitted task.

.. versionadded:: next

.. currentmodule:: concurrent.futures.process

.. exception:: BrokenProcessPool
Expand Down
8 changes: 8 additions & 0 deletions Doc/whatsnew/3.14.rst
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,14 @@ ast
* The ``repr()`` output for AST nodes now includes more information.
(Contributed by Tomas R in :gh:`116022`.)

concurrent.futures
------------------

* Add :class:`~concurrent.futures.InterpreterPoolExecutor`,
which exposes "subinterpreters (multiple Python interpreters in the
same process) to Python code. This is separate from the proposed API
in :pep:`734`.
(Contributed by Eric Snow in :gh:`124548`.)

ctypes
------
Expand Down
12 changes: 11 additions & 1 deletion Lib/concurrent/futures/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
'Executor',
'wait',
'as_completed',
'InterpreterPoolExecutor',
'ProcessPoolExecutor',
'ThreadPoolExecutor',
)
Expand All @@ -39,7 +40,7 @@ def __dir__():


def __getattr__(name):
global ProcessPoolExecutor, ThreadPoolExecutor
global ProcessPoolExecutor, ThreadPoolExecutor, InterpreterPoolExecutor

if name == 'ProcessPoolExecutor':
from .process import ProcessPoolExecutor as pe
Expand All @@ -51,4 +52,13 @@ def __getattr__(name):
ThreadPoolExecutor = te
return te

if name == 'InterpreterPoolExecutor':
try:
from .interpreter import InterpreterPoolExecutor as ie
except ModuleNotFoundError:
ie = InterpreterPoolExecutor = None
else:
InterpreterPoolExecutor = ie
return ie

raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
Loading

0 comments on commit a5a7f5e

Please sign in to comment.