Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestExecutorDeadLock.test_crash_races[13] crash #279

Open
ogrisel opened this issue Jan 7, 2021 · 6 comments
Open

TestExecutorDeadLock.test_crash_races[13] crash #279

ogrisel opened this issue Jan 7, 2021 · 6 comments

Comments

@ogrisel
Copy link
Collaborator

ogrisel commented Jan 7, 2021

As observed on #276, we get a rare random worker process crash when calling executor.map with check_pids_exist_then_sleep on 2 * max_workers.

The logs show that there is an intentionnally terminated process that is one of the newly created process. However I am not sure if this process termination is a cause or a consequence of the TerminatedWorkerError.

If it's a consequence, I don't understand why we do not see similar messages for the other workers in the pool. If it's the cause, I do not understand what is calling the kill_workers method that logs the terminate process message.

2021-01-07T09:23:33.9919960Z ================================== FAILURES ===================================
2021-01-07T09:23:33.9921864Z __________________ TestExecutorDeadLock.test_crash_races[13] __________________
2021-01-07T09:23:33.9922589Z 
2021-01-07T09:23:33.9923140Z self = <tests.test_reusable_executor.TestExecutorDeadLock object at 0x000001C4FEAD0760>
2021-01-07T09:23:33.9924038Z n_proc = 13
2021-01-07T09:23:33.9926439Z 
2021-01-07T09:23:33.9927498Z     @pytest.mark.parametrize("n_proc", [1, 2, 5, 13])
2021-01-07T09:23:33.9928264Z     def test_crash_races(self, n_proc):
2021-01-07T09:23:33.9929740Z         """Test the race conditions in reusable_executor crash handling"""
2021-01-07T09:23:33.9932132Z         # Test for external crash signal comming from neighbor
2021-01-07T09:23:33.9933018Z         # with various race setup
2021-01-07T09:23:33.9933938Z         executor = get_reusable_executor(max_workers=n_proc, timeout=None)
2021-01-07T09:23:33.9934875Z         executor.map(id, range(n_proc))  # trigger the creation of the workers
2021-01-07T09:23:33.9935642Z         pids = list(executor._processes.keys())
2021-01-07T09:23:33.9936192Z         assert len(pids) == n_proc
2021-01-07T09:23:33.9937150Z         assert None not in pids
2021-01-07T09:23:33.9938234Z         res = executor.map(check_pids_exist_then_sleep,
2021-01-07T09:23:33.9939272Z                            [(.0001 * (j // 2), pids)
2021-01-07T09:23:33.9940288Z                             for j in range(2 * n_proc)])
2021-01-07T09:23:33.9941141Z >       assert all(list(res))
2021-01-07T09:23:33.9941768Z 
2021-01-07T09:23:33.9942305Z executor   = <loky.reusable_executor._ReusablePoolExecutor object at 0x000001C4FEA161C0>
2021-01-07T09:23:33.9948141Z n_proc     = 13
2021-01-07T09:23:33.9949095Z pids       = [4856, 2592, 1880, 5896, 5964, 1260, ...]
2021-01-07T09:23:33.9952635Z res        = <generator object _chain_from_iterable_of_lists at 0x000001C4FEB0B6D0>
2021-01-07T09:23:33.9953643Z self       = <tests.test_reusable_executor.TestExecutorDeadLock object at 0x000001C4FEAD0760>
2021-01-07T09:23:33.9954326Z 
2021-01-07T09:23:33.9957403Z tests\test_reusable_executor.py:358: 
2021-01-07T09:23:33.9959052Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2021-01-07T09:23:33.9960084Z loky\process_executor.py:865: in _chain_from_iterable_of_lists
2021-01-07T09:23:33.9960987Z     for element in iterable:
2021-01-07T09:23:33.9961891Z         element    = []
2021-01-07T09:23:33.9962838Z         iterable   = <generator object Executor.map.<locals>.result_iterator at 0x000001C4FEB0BA50>
2021-01-07T09:23:33.9963964Z c:\hostedtoolcache\windows\python\3.8.6\x64\lib\concurrent\futures\_base.py:611: in result_iterator
2021-01-07T09:23:33.9967707Z     yield fs.pop().result()
2021-01-07T09:23:33.9969194Z         fs         = [<Future at 0x1c4fea84550 state=finished returned list>, <Future at 0x1c4fea84520 state=finished returned list>, <Futu...>, <Future at 0x1c4fea84610 state=finished returned list>, <Future at 0x1c4fea84ee0 state=finished returned list>, ...]
2021-01-07T09:23:33.9972769Z         future     = <Future at 0x1c4feafa580 state=finished raised TerminatedWorkerError>
2021-01-07T09:23:33.9977720Z         timeout    = None
2021-01-07T09:23:33.9978661Z c:\hostedtoolcache\windows\python\3.8.6\x64\lib\concurrent\futures\_base.py:439: in result
2021-01-07T09:23:33.9979828Z     return self.__get_result()
2021-01-07T09:23:33.9981070Z         self       = <Future at 0x1c4feafa910 state=finished raised TerminatedWorkerError>
2021-01-07T09:23:33.9983655Z         timeout    = None
2021-01-07T09:23:33.9986323Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2021-01-07T09:23:34.0007552Z 
2021-01-07T09:23:34.0008310Z self = <Future at 0x1c4feafa910 state=finished raised TerminatedWorkerError>
2021-01-07T09:23:34.0008888Z 
2021-01-07T09:23:34.0009379Z     def __get_result(self):
2021-01-07T09:23:34.0041477Z         if self._exception:
2021-01-07T09:23:34.0042160Z >           raise self._exception
2021-01-07T09:23:34.0044096Z E           loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
2021-01-07T09:23:34.0045403Z 
2021-01-07T09:23:34.0046079Z self       = <Future at 0x1c4feafa910 state=finished raised TerminatedWorkerError>
2021-01-07T09:23:34.0046634Z 
2021-01-07T09:23:34.0047888Z c:\hostedtoolcache\windows\python\3.8.6\x64\lib\concurrent\futures\_base.py:388: TerminatedWorkerError
2021-01-07T09:23:34.0048997Z ---------------------------- Captured stderr setup ----------------------------
2021-01-07T09:23:34.0050080Z [DEBUG:MainProcess:MainThread] Reusing existing executor with max_workers=2.
2021-01-07T09:23:34.0050809Z 
2021-01-07T09:23:34.0051363Z ---------------------------- Captured stderr call -----------------------------
2021-01-07T09:23:34.0052759Z [DEBUG:MainProcess:MainThread] Creating a new executor with max_workers=13 as the previous instance cannot be reused (arguments have changed).
2021-01-07T09:23:34.0053476Z 
2021-01-07T09:23:34.0054147Z [DEBUG:MainProcess:MainThread] shutting down executor <loky.reusable_executor._ReusablePoolExecutor object at 0x000001C4FEAF7EE0>
2021-01-07T09:23:34.0056112Z 
2021-01-07T09:23:34.0056683Z [DEBUG:MainProcess:Thread-461] closing call_queue
2021-01-07T09:23:34.0057252Z 
2021-01-07T09:23:34.0058136Z [DEBUG:MainProcess:Thread-461] telling queue thread to quit
2021-01-07T09:23:34.0058933Z 
2021-01-07T09:23:34.0059552Z [DEBUG:MainProcess:Thread-461] Queue.join_thread()
2021-01-07T09:23:34.0060085Z 
2021-01-07T09:23:34.0060621Z [DEBUG:MainProcess:Thread-461] closing result_queue
2021-01-07T09:23:34.0061231Z 
2021-01-07T09:23:34.0061770Z [DEBUG:MainProcess:Thread-461] closing thread_wakeup
2021-01-07T09:23:34.0062304Z 
2021-01-07T09:23:34.0063189Z [DEBUG:MainProcess:Thread-461] joining processes
2021-01-07T09:23:34.0063866Z 
2021-01-07T09:23:34.0064439Z [INFO:LokyProcess-703:MainThread] Shutting down worker on sentinel
2021-01-07T09:23:34.0065031Z 
2021-01-07T09:23:34.0066464Z [DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
2021-01-07T09:23:34.0067033Z 
2021-01-07T09:23:34.0067975Z [INFO:LokyProcess-703:MainThread] process shutting down
2021-01-07T09:23:34.0068617Z 
2021-01-07T09:23:34.0069312Z [DEBUG:LokyProcess-703:MainThread] running all "atexit" finalizers with priority >= 0
2021-01-07T09:23:34.0069909Z 
2021-01-07T09:23:34.0070511Z [DEBUG:LokyProcess-703:MainThread] running the remaining "atexit" finalizers
2021-01-07T09:23:34.0071201Z 
2021-01-07T09:23:34.0071777Z [INFO:LokyProcess-703:MainThread] process exiting with exitcode 0
2021-01-07T09:23:34.0072330Z 
2021-01-07T09:23:34.0073246Z [DEBUG:LokyProcess-704:MainThread] recreated blocker with handle 48
2021-01-07T09:23:34.0073960Z 
2021-01-07T09:23:34.0074716Z [DEBUG:LokyProcess-704:MainThread] recreated blocker with handle 52
2021-01-07T09:23:34.0075941Z 
2021-01-07T09:23:34.0077136Z [DEBUG:LokyProcess-704:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0077762Z 
2021-01-07T09:23:34.0078369Z [DEBUG:LokyProcess-704:MainThread] recreated blocker with handle 72
2021-01-07T09:23:34.0078903Z 
2021-01-07T09:23:34.0079793Z [DEBUG:LokyProcess-704:MainThread] recreated blocker with handle 76
2021-01-07T09:23:34.0080548Z 
2021-01-07T09:23:34.0081103Z [DEBUG:LokyProcess-704:MainThread] recreated blocker with handle 80
2021-01-07T09:23:34.0081719Z 
2021-01-07T09:23:34.0082282Z [INFO:LokyProcess-704:MainThread] child process calling self.run()
2021-01-07T09:23:34.0082898Z 
2021-01-07T09:23:34.0083674Z [DEBUG:LokyProcess-704:MainThread] Worker started with timeout=10
2021-01-07T09:23:34.0084292Z 
2021-01-07T09:23:34.0084911Z [INFO:LokyProcess-704:MainThread] Shutting down worker on sentinel
2021-01-07T09:23:34.0086332Z 
2021-01-07T09:23:34.0087077Z [INFO:LokyProcess-704:MainThread] process shutting down
2021-01-07T09:23:34.0087712Z 
2021-01-07T09:23:34.0088374Z [DEBUG:LokyProcess-704:MainThread] running all "atexit" finalizers with priority >= 0
2021-01-07T09:23:34.0089082Z 
2021-01-07T09:23:34.0089786Z [DEBUG:LokyProcess-704:MainThread] running the remaining "atexit" finalizers
2021-01-07T09:23:34.0090272Z 
2021-01-07T09:23:34.0090939Z [INFO:LokyProcess-704:MainThread] process exiting with exitcode 0
2021-01-07T09:23:34.0091490Z 
2021-01-07T09:23:34.0092162Z [DEBUG:MainProcess:Thread-461] executor management thread clean shutdown of worker processes: [7440, 3112]
2021-01-07T09:23:34.0092880Z 
2021-01-07T09:23:34.0093458Z [DEBUG:MainProcess:MainThread] Create a executor with max_workers=13.
2021-01-07T09:23:34.0094022Z 
2021-01-07T09:23:34.0094949Z [DEBUG:MainProcess:MainThread] created semlock with handle 10036
2021-01-07T09:23:34.0096531Z 
2021-01-07T09:23:34.0097654Z [DEBUG:MainProcess:MainThread] created semlock with handle 9548
2021-01-07T09:23:34.0098316Z 
2021-01-07T09:23:34.0099019Z [DEBUG:MainProcess:MainThread] created semlock with handle 6336
2021-01-07T09:23:34.0099642Z 
2021-01-07T09:23:34.0100158Z [DEBUG:MainProcess:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0100855Z 
2021-01-07T09:23:34.0101788Z [DEBUG:MainProcess:MainThread] created semlock with handle 10060
2021-01-07T09:23:34.0102454Z 
2021-01-07T09:23:34.0103020Z [DEBUG:MainProcess:MainThread] ProcessPoolExecutor is setup
2021-01-07T09:23:34.0103676Z 
2021-01-07T09:23:34.0104663Z [DEBUG:MainProcess:MainThread] created semlock with handle 4952
2021-01-07T09:23:34.0106619Z 
2021-01-07T09:23:34.0107616Z [DEBUG:MainProcess:MainThread] created semlock with handle 6264
2021-01-07T09:23:34.0108385Z 
2021-01-07T09:23:34.0109582Z [DEBUG:MainProcess:MainThread] created semlock with handle 7232
2021-01-07T09:23:34.0110202Z 
2021-01-07T09:23:34.0110968Z [DEBUG:MainProcess:MainThread] created semlock with handle 6392
2021-01-07T09:23:34.0111653Z 
2021-01-07T09:23:34.0112425Z [DEBUG:MainProcess:MainThread] created semlock with handle 5024
2021-01-07T09:23:34.0113062Z 
2021-01-07T09:23:34.0114939Z [DEBUG:MainProcess:MainThread] created semlock with handle 6332
2021-01-07T09:23:34.0116325Z 
2021-01-07T09:23:34.0117293Z [DEBUG:MainProcess:MainThread] created semlock with handle 4516
2021-01-07T09:23:34.0117977Z 
2021-01-07T09:23:34.0118506Z [DEBUG:MainProcess:MainThread] created semlock with handle 9172
2021-01-07T09:23:34.0118853Z 
2021-01-07T09:23:34.0119145Z [DEBUG:MainProcess:MainThread] created semlock with handle 10128
2021-01-07T09:23:34.0119587Z 
2021-01-07T09:23:34.0120542Z [DEBUG:MainProcess:MainThread] created semlock with handle 6548
2021-01-07T09:23:34.0121258Z 
2021-01-07T09:23:34.0122053Z [DEBUG:MainProcess:MainThread] created semlock with handle 7340
2021-01-07T09:23:34.0122781Z 
2021-01-07T09:23:34.0123718Z [DEBUG:MainProcess:MainThread] created semlock with handle 6048
2021-01-07T09:23:34.0124448Z 
2021-01-07T09:23:34.0126299Z [DEBUG:MainProcess:MainThread] created semlock with handle 8768
2021-01-07T09:23:34.0127325Z 
2021-01-07T09:23:34.0130156Z [DEBUG:MainProcess:MainThread] Adjust process count : {4856: <LokyProcess name='LokyProcess-705' pid=4856 parent=7968 started>, 2592: <LokyProcess name='LokyProcess-706' pid=2592 parent=7968 started>, 1880: <LokyProcess name='LokyProcess-707' pid=1880 parent=7968 started>, 5896: <LokyProcess name='LokyProcess-708' pid=5896 parent=7968 started>, 5964: <LokyProcess name='LokyProcess-709' pid=5964 parent=7968 started>, 1260: <LokyProcess name='LokyProcess-710' pid=1260 parent=7968 started>, 5676: <LokyProcess name='LokyProcess-711' pid=5676 parent=7968 started>, 7928: <LokyProcess name='LokyProcess-712' pid=7928 parent=7968 started>, 2308: <LokyProcess name='LokyProcess-713' pid=2308 parent=7968 started>, 6020: <LokyProcess name='LokyProcess-714' pid=6020 parent=7968 started>, 3492: <LokyProcess name='LokyProcess-715' pid=3492 parent=7968 started>, 6108: <LokyProcess name='LokyProcess-716' pid=6108 parent=7968 started>, 7644: <LokyProcess name='LokyProcess-717' pid=7644 parent=7968 started>}
2021-01-07T09:23:34.0132902Z 
2021-01-07T09:23:34.0133625Z [DEBUG:MainProcess:MainThread] _start_executor_manager_thread called
2021-01-07T09:23:34.0134392Z 
2021-01-07T09:23:34.0136184Z [DEBUG:MainProcess:Thread-463] Queue._start_thread()
2021-01-07T09:23:34.0136996Z 
2021-01-07T09:23:34.0138510Z [DEBUG:MainProcess:Thread-463] doing self._thread.start()
2021-01-07T09:23:34.0139281Z 
2021-01-07T09:23:34.0140097Z [DEBUG:MainProcess:QueueFeederThread] starting thread to feed data to pipe
2021-01-07T09:23:34.0140750Z 
2021-01-07T09:23:34.0141685Z [DEBUG:MainProcess:Thread-463] ... done self._thread.start()
2021-01-07T09:23:34.0143706Z 
2021-01-07T09:23:34.0145720Z [DEBUG:LokyProcess-707:MainThread] recreated blocker with handle 56
2021-01-07T09:23:34.0146396Z 
2021-01-07T09:23:34.0147198Z [DEBUG:LokyProcess-707:MainThread] recreated blocker with handle 64
2021-01-07T09:23:34.0147879Z 
2021-01-07T09:23:34.0148733Z [DEBUG:LokyProcess-707:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0149448Z 
2021-01-07T09:23:34.0150347Z [DEBUG:LokyProcess-707:MainThread] recreated blocker with handle 72
2021-01-07T09:23:34.0150964Z 
2021-01-07T09:23:34.0151770Z [DEBUG:LokyProcess-707:MainThread] recreated blocker with handle 80
2021-01-07T09:23:34.0152518Z 
2021-01-07T09:23:34.0153332Z [DEBUG:LokyProcess-707:MainThread] recreated blocker with handle 84
2021-01-07T09:23:34.0153951Z 
2021-01-07T09:23:34.0154862Z [INFO:LokyProcess-707:MainThread] child process calling self.run()
2021-01-07T09:23:34.0156433Z 
2021-01-07T09:23:34.0157897Z [DEBUG:LokyProcess-707:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0158569Z 
2021-01-07T09:23:34.0159348Z [DEBUG:LokyProcess-706:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0160130Z 
2021-01-07T09:23:34.0160925Z [DEBUG:LokyProcess-706:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0161585Z 
2021-01-07T09:23:34.0162525Z [DEBUG:LokyProcess-706:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0163213Z 
2021-01-07T09:23:34.0164285Z [DEBUG:LokyProcess-706:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0164969Z 
2021-01-07T09:23:34.0165944Z [DEBUG:LokyProcess-706:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0166891Z 
2021-01-07T09:23:34.0167492Z [DEBUG:LokyProcess-706:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0168010Z 
2021-01-07T09:23:34.0168974Z [INFO:LokyProcess-706:MainThread] child process calling self.run()
2021-01-07T09:23:34.0169680Z 
2021-01-07T09:23:34.0170500Z [DEBUG:LokyProcess-706:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0171253Z 
2021-01-07T09:23:34.0172082Z [DEBUG:LokyProcess-708:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0172815Z 
2021-01-07T09:23:34.0173654Z [DEBUG:LokyProcess-708:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0174319Z 
2021-01-07T09:23:34.0175188Z [DEBUG:LokyProcess-708:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0175813Z 
2021-01-07T09:23:34.0176771Z [DEBUG:LokyProcess-708:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0177891Z 
2021-01-07T09:23:34.0178734Z [DEBUG:LokyProcess-708:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0179359Z 
2021-01-07T09:23:34.0180380Z [DEBUG:LokyProcess-708:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0181233Z 
2021-01-07T09:23:34.0182126Z [DEBUG:LokyProcess-712:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0182791Z 
2021-01-07T09:23:34.0183607Z [DEBUG:LokyProcess-712:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0184803Z 
2021-01-07T09:23:34.0185665Z [DEBUG:LokyProcess-712:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0186328Z 
2021-01-07T09:23:34.0187354Z [DEBUG:LokyProcess-712:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0188076Z 
2021-01-07T09:23:34.0189091Z [DEBUG:LokyProcess-712:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0189813Z 
2021-01-07T09:23:34.0190637Z [DEBUG:LokyProcess-712:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0191380Z 
2021-01-07T09:23:34.0192284Z [INFO:LokyProcess-708:MainThread] child process calling self.run()
2021-01-07T09:23:34.0192968Z 
2021-01-07T09:23:34.0193864Z [DEBUG:LokyProcess-708:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0194508Z 
2021-01-07T09:23:34.0195403Z [INFO:LokyProcess-712:MainThread] child process calling self.run()
2021-01-07T09:23:34.0196467Z 
2021-01-07T09:23:34.0197459Z [DEBUG:LokyProcess-712:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0198134Z 
2021-01-07T09:23:34.0199069Z [DEBUG:LokyProcess-713:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0199937Z 
2021-01-07T09:23:34.0200579Z [DEBUG:LokyProcess-713:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0201256Z 
2021-01-07T09:23:34.0202062Z [DEBUG:LokyProcess-713:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0202738Z 
2021-01-07T09:23:34.0203587Z [DEBUG:LokyProcess-713:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0204306Z 
2021-01-07T09:23:34.0205731Z [DEBUG:LokyProcess-713:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0206418Z 
2021-01-07T09:23:34.0207349Z [DEBUG:LokyProcess-713:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0208062Z 
2021-01-07T09:23:34.0208844Z [INFO:LokyProcess-713:MainThread] child process calling self.run()
2021-01-07T09:23:34.0209790Z 
2021-01-07T09:23:34.0211029Z [DEBUG:LokyProcess-713:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0211853Z 
2021-01-07T09:23:34.0212748Z [DEBUG:LokyProcess-715:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0213385Z 
2021-01-07T09:23:34.0214191Z [DEBUG:LokyProcess-715:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0214862Z 
2021-01-07T09:23:34.0215679Z [DEBUG:LokyProcess-715:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0216314Z 
2021-01-07T09:23:34.0217386Z [DEBUG:LokyProcess-715:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0218159Z 
2021-01-07T09:23:34.0219051Z [DEBUG:LokyProcess-715:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0219914Z 
2021-01-07T09:23:34.0220722Z [DEBUG:LokyProcess-715:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0221486Z 
2021-01-07T09:23:34.0222071Z [INFO:LokyProcess-715:MainThread] child process calling self.run()
2021-01-07T09:23:34.0222582Z 
2021-01-07T09:23:34.0223608Z [DEBUG:LokyProcess-715:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0224321Z 
2021-01-07T09:23:34.0225232Z [DEBUG:LokyProcess-710:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0225889Z 
2021-01-07T09:23:34.0226896Z [DEBUG:LokyProcess-710:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0227553Z 
2021-01-07T09:23:34.0228371Z [DEBUG:LokyProcess-710:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0229001Z 
2021-01-07T09:23:34.0229974Z [DEBUG:LokyProcess-710:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0230641Z 
2021-01-07T09:23:34.0231495Z [DEBUG:LokyProcess-710:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0232185Z 
2021-01-07T09:23:34.0232996Z [DEBUG:LokyProcess-710:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0233842Z 
2021-01-07T09:23:34.0234671Z [INFO:LokyProcess-710:MainThread] child process calling self.run()
2021-01-07T09:23:34.0235349Z 
2021-01-07T09:23:34.0236247Z [DEBUG:LokyProcess-710:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0237097Z 
2021-01-07T09:23:34.0238016Z [DEBUG:LokyProcess-705:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0238750Z 
2021-01-07T09:23:34.0239533Z [DEBUG:LokyProcess-705:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0240334Z 
2021-01-07T09:23:34.0241242Z [DEBUG:LokyProcess-705:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0241922Z 
2021-01-07T09:23:34.0242801Z [DEBUG:LokyProcess-705:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0243501Z 
2021-01-07T09:23:34.0244556Z [DEBUG:LokyProcess-705:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0249877Z 
2021-01-07T09:23:34.0250791Z [DEBUG:LokyProcess-705:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0251538Z 
2021-01-07T09:23:34.0252521Z [INFO:LokyProcess-705:MainThread] child process calling self.run()
2021-01-07T09:23:34.0253213Z 
2021-01-07T09:23:34.0254029Z [DEBUG:LokyProcess-705:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0254824Z 
2021-01-07T09:23:34.0255819Z [DEBUG:LokyProcess-714:MainThread] recreated blocker with handle 44
2021-01-07T09:23:34.0256754Z 
2021-01-07T09:23:34.0257533Z [DEBUG:LokyProcess-714:MainThread] recreated blocker with handle 48
2021-01-07T09:23:34.0258290Z 
2021-01-07T09:23:34.0259366Z [DEBUG:LokyProcess-714:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0259963Z 
2021-01-07T09:23:34.0260779Z [DEBUG:LokyProcess-714:MainThread] recreated blocker with handle 72
2021-01-07T09:23:34.0261518Z 
2021-01-07T09:23:34.0262102Z [DEBUG:LokyProcess-714:MainThread] recreated blocker with handle 76
2021-01-07T09:23:34.0264237Z 
2021-01-07T09:23:34.0265327Z [DEBUG:LokyProcess-714:MainThread] recreated blocker with handle 80
2021-01-07T09:23:34.0267006Z 
2021-01-07T09:23:34.0267925Z [INFO:LokyProcess-714:MainThread] child process calling self.run()
2021-01-07T09:23:34.0268603Z 
2021-01-07T09:23:34.0269426Z [DEBUG:LokyProcess-714:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0270332Z 
2021-01-07T09:23:34.0271563Z [DEBUG:LokyProcess-717:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0272085Z 
2021-01-07T09:23:34.0273413Z [DEBUG:LokyProcess-717:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0273991Z 
2021-01-07T09:23:34.0274809Z [DEBUG:LokyProcess-717:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0275529Z 
2021-01-07T09:23:34.0276357Z [DEBUG:LokyProcess-717:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0277481Z 
2021-01-07T09:23:34.0278336Z [DEBUG:LokyProcess-717:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0279171Z 
2021-01-07T09:23:34.0280075Z [DEBUG:LokyProcess-717:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0280651Z 
2021-01-07T09:23:34.0281490Z [DEBUG:LokyProcess-709:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0282162Z 
2021-01-07T09:23:34.0283177Z [DEBUG:LokyProcess-709:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0283799Z 
2021-01-07T09:23:34.0284757Z [DEBUG:LokyProcess-709:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0285429Z 
2021-01-07T09:23:34.0286355Z [DEBUG:LokyProcess-709:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0287300Z 
2021-01-07T09:23:34.0288118Z [DEBUG:LokyProcess-709:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0288816Z 
2021-01-07T09:23:34.0289687Z [DEBUG:LokyProcess-709:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0290300Z 
2021-01-07T09:23:34.0291247Z [DEBUG:LokyProcess-716:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0291990Z 
2021-01-07T09:23:34.0292899Z [DEBUG:LokyProcess-716:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0293694Z 
2021-01-07T09:23:34.0296893Z [DEBUG:LokyProcess-716:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0297609Z 
2021-01-07T09:23:34.0298544Z [DEBUG:LokyProcess-716:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0299298Z 
2021-01-07T09:23:34.0301646Z [DEBUG:LokyProcess-716:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0303909Z 
2021-01-07T09:23:34.0304729Z [DEBUG:LokyProcess-716:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0305485Z 
2021-01-07T09:23:34.0306341Z [DEBUG:LokyProcess-711:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0307414Z 
2021-01-07T09:23:34.0308914Z [DEBUG:LokyProcess-711:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0309457Z 
2021-01-07T09:23:34.0311541Z [DEBUG:LokyProcess-711:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0312234Z 
2021-01-07T09:23:34.0313105Z [DEBUG:LokyProcess-711:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0313832Z 
2021-01-07T09:23:34.0314675Z [DEBUG:LokyProcess-711:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0315291Z 
2021-01-07T09:23:34.0316362Z [DEBUG:LokyProcess-711:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0317358Z 
2021-01-07T09:23:34.0318275Z [INFO:LokyProcess-709:MainThread] child process calling self.run()
2021-01-07T09:23:34.0318901Z 
2021-01-07T09:23:34.0321065Z [DEBUG:LokyProcess-709:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0321784Z 
2021-01-07T09:23:34.0322617Z [INFO:LokyProcess-717:MainThread] child process calling self.run()
2021-01-07T09:23:34.0323274Z 
2021-01-07T09:23:34.0325383Z [DEBUG:LokyProcess-717:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0326148Z 
2021-01-07T09:23:34.0326957Z [INFO:LokyProcess-716:MainThread] child process calling self.run()
2021-01-07T09:23:34.0327606Z 
2021-01-07T09:23:34.0328510Z [DEBUG:LokyProcess-716:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0329300Z 
2021-01-07T09:23:34.0330994Z [INFO:LokyProcess-711:MainThread] child process calling self.run()
2021-01-07T09:23:34.0331616Z 
2021-01-07T09:23:34.0332548Z [DEBUG:LokyProcess-711:MainThread] Worker started with timeout=None
2021-01-07T09:23:34.0334108Z 
2021-01-07T09:23:34.0334937Z [DEBUG:MainProcess:Thread-463] terminate process LokyProcess-717
2021-01-07T09:23:34.0335673Z 
2021-01-07T09:23:34.0336520Z -------------------------- Captured stderr teardown ---------------------------
2021-01-07T09:23:34.0338083Z [DEBUG:MainProcess:MainThread] Creating a new executor with max_workers=2 as the previous instance cannot be reused (broken).
2021-01-07T09:23:34.0338802Z 
2021-01-07T09:23:34.0339776Z [DEBUG:MainProcess:MainThread] shutting down executor <loky.reusable_executor._ReusablePoolExecutor object at 0x000001C4FEA161C0>
2021-01-07T09:23:34.0341273Z 
2021-01-07T09:23:34.0341900Z [DEBUG:MainProcess:MainThread] Create a executor with max_workers=2.
2021-01-07T09:23:34.0342591Z 
2021-01-07T09:23:34.0343461Z [DEBUG:MainProcess:MainThread] created semlock with handle 10036
2021-01-07T09:23:34.0344148Z 
2021-01-07T09:23:34.0363474Z [DEBUG:MainProcess:MainThread] created semlock with handle 9548
2021-01-07T09:23:34.0378628Z 
2021-01-07T09:23:34.0379509Z [DEBUG:MainProcess:MainThread] created semlock with handle 7492
2021-01-07T09:23:34.0395919Z 
2021-01-07T09:23:34.0397101Z [DEBUG:MainProcess:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0397760Z 
2021-01-07T09:23:34.0398714Z [DEBUG:MainProcess:MainThread] created semlock with handle 6260
2021-01-07T09:23:34.0399372Z 
2021-01-07T09:23:34.0400620Z [DEBUG:MainProcess:MainThread] ProcessPoolExecutor is setup
2021-01-07T09:23:34.0401445Z 
2021-01-07T09:23:34.0402269Z [DEBUG:MainProcess:MainThread] created semlock with handle 5768
2021-01-07T09:23:34.0403031Z 
2021-01-07T09:23:34.0403866Z [DEBUG:MainProcess:MainThread] created semlock with handle 9980
2021-01-07T09:23:34.0404491Z 
2021-01-07T09:23:34.0405981Z [DEBUG:MainProcess:MainThread] Adjust process count : {4916: <LokyProcess name='LokyProcess-718' pid=4916 parent=7968 started>, 5540: <LokyProcess name='LokyProcess-719' pid=5540 parent=7968 started>}
2021-01-07T09:23:34.0407490Z 
2021-01-07T09:23:34.0408466Z [DEBUG:MainProcess:MainThread] _start_executor_manager_thread called
2021-01-07T09:23:34.0409350Z 
2021-01-07T09:23:34.0410559Z [DEBUG:MainProcess:Thread-464] Queue._start_thread()
2021-01-07T09:23:34.0411164Z 
2021-01-07T09:23:34.0411985Z [DEBUG:MainProcess:Thread-464] doing self._thread.start()
2021-01-07T09:23:34.0412704Z 
2021-01-07T09:23:34.0414728Z [DEBUG:MainProcess:QueueFeederThread] starting thread to feed data to pipe
2021-01-07T09:23:34.0415479Z 
2021-01-07T09:23:34.0416453Z [DEBUG:MainProcess:Thread-464] ... done self._thread.start()
2021-01-07T09:23:34.0417435Z 
2021-01-07T09:23:34.0418315Z [DEBUG:LokyProcess-719:MainThread] recreated blocker with handle 8
2021-01-07T09:23:34.0419036Z 
2021-01-07T09:23:34.0420027Z [DEBUG:LokyProcess-719:MainThread] recreated blocker with handle 12
2021-01-07T09:23:34.0420773Z 
2021-01-07T09:23:34.0421581Z [DEBUG:LokyProcess-719:MainThread] Queue._after_fork()
2021-01-07T09:23:34.0422497Z 
2021-01-07T09:23:34.0423368Z [DEBUG:LokyProcess-719:MainThread] recreated blocker with handle 16
2021-01-07T09:23:34.0424140Z 
2021-01-07T09:23:34.0425156Z [DEBUG:LokyProcess-719:MainThread] recreated blocker with handle 20
2021-01-07T09:23:34.0425876Z 
2021-01-07T09:23:34.0427446Z [DEBUG:LokyProcess-719:MainThread] recreated blocker with handle 24
2021-01-07T09:23:34.0428016Z 
2021-01-07T09:23:34.0429001Z [INFO:LokyProcess-719:MainThread] child process calling self.run()
2021-01-07T09:23:34.0429671Z 
2021-01-07T09:23:34.0430596Z [DEBUG:LokyProcess-719:MainThread] Worker started with timeout=10
2021-01-07T09:23:34.0431183Z 
2021-01-07T09:23:34.0432063Z ============================== warnings summary ===============================
2021-01-07T09:23:34.0433122Z tests/test_process_executor_loky.py::TestsProcessPoolLokyExecutor::test_worker_timeout
2021-01-07T09:23:34.0434189Z tests/test_process_executor_spawn.py::TestsProcessPoolSpawnExecutor::test_worker_timeout
2021-01-07T09:23:34.0435651Z   D:\a\1\s\loky\process_executor.py:688: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
2021-01-07T09:23:34.0437176Z     warnings.warn(
2021-01-07T09:23:34.0437799Z 
2021-01-07T09:23:34.0438651Z tests/test_reusable_executor.py::TestExecutorDeadLock::test_queue_full_deadlock
2021-01-07T09:23:34.0439959Z   D:\a\1\s\loky\reusable_executor.py:218: UserWarning: Trying to resize an executor with running jobs: waiting for jobs completion before resizing.
2021-01-07T09:23:34.0441271Z     warnings.warn("Trying to resize an executor with running jobs: "
2021-01-07T09:23:34.0441970Z 
2021-01-07T09:23:34.0442904Z -- Docs: https://docs.pytest.org/en/stable/warnings.html
2021-01-07T09:23:34.0443871Z ----------------- generated xml file: D:\a\1\s\test-data.xml ------------------
2021-01-07T09:23:34.0445891Z =========================== short test summary info ===========================
2021-01-07T09:23:34.0447075Z FAILED tests/test_reusable_executor.py::TestExecutorDeadLock::test_crash_races[13]
2021-01-07T09:23:34.0448202Z ====== 1 failed, 230 passed, 16 skipped, 3 warnings in 454.87s (0:07:34) ======
2021-01-07T09:23:34.0449100Z [INFO:MainProcess:MainThread] process shutting down
2021-01-07T09:23:34.0450075Z [DEBUG:MainProcess:MainThread] running all "atexit" finalizers with priority >= 0
2021-01-07T09:23:34.0451834Z [DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up executor_manager_thread []
2021-01-07T09:23:34.0453269Z [DEBUG:MainProcess:MainThread] running the remaining "atexit" finalizers
2021-01-07T09:23:34.1638785Z ERROR: InvocationError for command 'D:\a\1\s\.tox\py38\Scripts\py.test.EXE' -vl --timeout=60 --maxfail=5 --skip-high-memory --junitxml=test-data.xml (exited with code 1)
2021-01-07T09:23:34.1653029Z ___________________________________ summary ___________________________________
2021-01-07T09:23:34.1655921Z ERROR:   py38: commands failed
2021-01-07T09:23:34.3747817Z ##[error]Cmd.exe exited with code '1'.
2021-01-07T09:23:34.4586395Z ##[section]Finishing: Test loky
@ogrisel
Copy link
Collaborator Author

ogrisel commented Jan 7, 2021

Hum, it's probably terminate_broken that calls kill_workers on the only worker that could not be shutdown cleanly after detecting the problem on the pool. Maybe we have a race condition when spawning the worker?

@ogrisel
Copy link
Collaborator Author

ogrisel commented Jan 7, 2021

The same problem happened in the #278 PR. So the frequency of this problem might be higher than expected.

Again it's on Python 3.8 on windows.

@ogrisel
Copy link
Collaborator Author

ogrisel commented Jan 8, 2021

In #280, we added more debug info to reveal that this is a silent worker process crash with an error code 3221225477 or 0xc0000005 in hexadecimal which is the Windows code for an access violation when calling the _ForkingPickler.loads to unpickle the function to execute on the worker. The unpickling step is triggering a bunch of imports from the loky tests.test_reusable_executor module (which is normal). Those concurrent imports apparently cause the crash of the worker process on this windows machine.

In #281 @tomMoral could trigger a seemingly related exception when trying to write a reproducer that only imports numpy. This time, instead of the 3221225477 exit code, one gets the following exceptions (also when calling _ForkingPickler.loads to unpickle the function to execute on the worker) :

[INFO/SpawnProcess-5] process exiting with exitcode 1
Traceback (most recent call last):
  File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\concurrent\futures\process.py", line 237, in _process_worker
    call_item = call_queue.get(block=True)
  File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "D:\a\1\s\tests\test_minimal_reproducer.py", line 4, in <module>
    import numpy as np  # noqa: F401
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\__init__.py", line 143, in <module>
    from . import lib
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\lib\__init__.py", line 25, in <module>
    from .index_tricks import *
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\lib\index_tricks.py", line 11, in <module>
    import numpy.matrixlib as matrixlib
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\matrixlib\__init__.py", line 4, in <module>
    from .defmatrix import *
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\matrixlib\defmatrix.py", line 11, in <module>
    from numpy.linalg import matrix_power
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\linalg\__init__.py", line 73, in <module>
    from .linalg import *
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 786, in exec_module
  File "<frozen importlib._bootstrap_external>", line 881, in get_code
  File "<frozen importlib._bootstrap_external>", line 980, in get_data
MemoryError
[INFO/SpawnProcess-19] process exiting with exitcode 1
-------------------------- Captured stderr teardown ---------------------------
Traceback (most recent call last):
  File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\concurrent\futures\process.py", line 237, in _process_worker
    call_item = call_queue.get(block=True)
  File "C:\hostedtoolcache\windows\Python\3.9.1\x64\lib\multiprocessing\queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "D:\a\1\s\tests\test_minimal_reproducer.py", line 4, in <module>
    import numpy as np  # noqa: F401
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\__init__.py", line 143, in <module>
    from . import lib
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\lib\__init__.py", line 25, in <module>
    from .index_tricks import *
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\lib\index_tricks.py", line 11, in <module>
    import numpy.matrixlib as matrixlib
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\matrixlib\__init__.py", line 4, in <module>
    from .defmatrix import *
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\matrixlib\defmatrix.py", line 11, in <module>
    from numpy.linalg import matrix_power
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\linalg\__init__.py", line 73, in <module>
    from .linalg import *
  File "d:\a\1\s\.tox\py39\lib\site-packages\numpy\linalg\linalg.py", line 33, in <module>
    from numpy.linalg import lapack_lite, _umath_linalg
ImportError: DLL load failed while importing _umath_linalg: The paging file is too small for this operation to complete.
[INFO/SpawnProcess-11] process exiting with exitcode 1

So it seems that the paging file size configation is too small on the CI instances.

@ogrisel
Copy link
Collaborator Author

ogrisel commented Jan 8, 2021

One possible solution would be to reuse the scripts/SetPageFileSize.ps1 powershell script of https://github.com/al-cheb/configure-pagefile-action to increase the paging file size in our Azure Pipelines CI config.

@ogrisel
Copy link
Collaborator Author

ogrisel commented Jan 8, 2021

In the mean time #282 is a workaround (add log + skip failing test when nprocs is too large on windows).

@ogrisel
Copy link
Collaborator Author

ogrisel commented Jan 8, 2021

#282 was merged but let's keep this issue open if a volunteer would like to implement a proper fix as described above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant