TestExecutorDeadLock.test_crash_races[13] crash #279
Comments
Hum, it's probably |
The same problem happened in PR #278, so this failure may be more frequent than expected. Again, it's on Python 3.8 on Windows.
In #280, we added more debug info, which revealed that this is a silent worker process crash with an error code
In #281, @tomMoral could trigger a seemingly related exception when trying to write a reproducer that only imports numpy. This time, instead of the
[INFO/SpawnProcess-5] process exiting with exitcode 1
Traceback (most recent call last):
File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\concurrent\futures\process.py", line 237, in _process_worker
call_item = call_queue.get(block=True)
File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\queues.py", line 122, in get
return _ForkingPickler.loads(res)
File "D:\a\1\s\tests\test_minimal_reproducer.py", line 4, in <module>
import numpy as np # noqa: F401
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\__init__.py", line 143, in <module>
from . import lib
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\lib\__init__.py", line 25, in <module>
from .index_tricks import *
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\lib\index_tricks.py", line 11, in <module>
import numpy.matrixlib as matrixlib
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\matrixlib\__init__.py", line 4, in <module>
from .defmatrix import *
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\matrixlib\defmatrix.py", line 11, in <module>
from numpy.linalg import matrix_power
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\linalg\__init__.py", line 73, in <module>
from .linalg import *
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 786, in exec_module
File "<frozen importlib._bootstrap_external>", line 881, in get_code
File "<frozen importlib._bootstrap_external>", line 980, in get_data
MemoryError
[INFO/SpawnProcess-19] process exiting with exitcode 1
-------------------------- Captured stderr teardown ---------------------------
Traceback (most recent call last):
File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\concurrent\futures\process.py", line 237, in _process_worker
call_item = call_queue.get(block=True)
File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\queues.py", line 122, in get
return _ForkingPickler.loads(res)
File "D:\a\1\s\tests\test_minimal_reproducer.py", line 4, in <module>
import numpy as np # noqa: F401
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\__init__.py", line 143, in <module>
from . import lib
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\lib\__init__.py", line 25, in <module>
from .index_tricks import *
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\lib\index_tricks.py", line 11, in <module>
import numpy.matrixlib as matrixlib
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\matrixlib\__init__.py", line 4, in <module>
from .defmatrix import *
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\matrixlib\defmatrix.py", line 11, in <module>
from numpy.linalg import matrix_power
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\linalg\__init__.py", line 73, in <module>
from .linalg import *
File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\linalg\linalg.py", line 33, in <module>
from numpy.linalg import lapack_lite, _umath_linalg
ImportError: DLL load failed while importing _umath_linalg: The paging file is too small for this operation to complete.
[INFO/SpawnProcess-11] process exiting with exitcode 1
So it seems that the paging file size configuration is too small on the CI instances.
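When triaging logs like the ones above, it can help to pull out which workers exited with a non-zero code. This is only a minimal sketch: the function name `find_failed_workers` is hypothetical, and the pattern assumes the `[INFO/<name>] process exiting with exitcode <n>` line format quoted in this thread.

```python
import re

# Matches lines such as "[INFO/SpawnProcess-5] process exiting with exitcode 1".
# The format is taken from the log output quoted in this issue.
EXIT_LINE = re.compile(
    r"\[INFO/(?P<name>[^\]]+)\] process exiting with exitcode (?P<code>-?\d+)"
)


def find_failed_workers(log_text):
    """Return (process_name, exitcode) pairs for workers with a non-zero exit code.

    Hypothetical helper for log triage, not part of any library.
    """
    failed = []
    for match in EXIT_LINE.finditer(log_text):
        code = int(match.group("code"))
        if code != 0:
            failed.append((match.group("name"), code))
    return failed
```

Running it over the captured stderr of a failing CI job would list the crashed workers, which can then be matched against the tracebacks.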
One possible solution would be to reuse the |
In the meantime, #282 is a workaround (it adds logging and skips the failing test when nprocs is too large on Windows).
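For readers who have not looked at #282, the general shape of such a skip guard might look like the sketch below. It is not the code from #282: the helper name `too_many_workers_for_windows` and the threshold `MAX_WINDOWS_WORKERS` are assumptions made up for illustration.

```python
import sys

# Hypothetical threshold, chosen for illustration only.
# It is NOT the value used in #282.
MAX_WINDOWS_WORKERS = 8


def too_many_workers_for_windows(n_workers, platform=None, limit=MAX_WINDOWS_WORKERS):
    """Return True when a test using n_workers processes should be skipped.

    Only Windows is affected here, since the paging file errors in this
    thread were observed on the Windows CI instances.
    """
    if platform is None:
        platform = sys.platform
    return platform == "win32" and n_workers > limit
```

In a pytest test, the guard would typically be used as `if too_many_workers_for_windows(n_proc): pytest.skip("...")` before the executor is created.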
#282 was merged, but let's keep this issue open in case a volunteer would like to implement a proper fix as described above.
As observed on #276, we get a rare random worker process crash when calling `executor.map` with `check_pids_exist_then_sleep` on `2 * max_workers`.
The logs show that there is an intentionally terminated process that is one of the newly created processes. However, I am not sure whether this process termination is a cause or a consequence of the `TerminatedWorkerError`.
If it's a consequence, I don't understand why we do not see similar messages for the other workers in the pool. If it's the cause, I do not understand what is calling the `kill_workers` method that logs the `terminate process` message.