Add pause/resume/context to workers #101
@petrhosek - I added a new dependency, psutil, here.
So it turns out this just sends SIGSTOP to the Python worker and not to the clang subprocess as well. I could either:
Alternatively, I could figure out a way to maintain a record of the spawned clang processes' PIDs, do the "cancel and requeue" version, or accept that some clang processes will run to completion when asked to stop. I'll probably get back to this next week.
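A minimal sketch of one way to make the stop signal reach the compiler subprocess as well, assuming a POSIX system: start each worker as the leader of its own process group, so that one `os.killpg` call signals the worker and everything it spawned. This uses only the standard library rather than psutil, and it is not taken from the PR; `STAND_IN_WORKER`, `spawn_worker`, `pause_group`, `resume_group` and `kill_group` are hypothetical names chosen for illustration.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a worker that launches a long-running compiler:
# a Python process that starts a sleeping child and waits for it.
STAND_IN_WORKER = [
    sys.executable, "-c",
    "import subprocess, sys; "
    "subprocess.run([sys.executable, '-c', 'import time; time.sleep(30)'])",
]


def spawn_worker(cmd):
    """Start cmd in a new session, which makes it a process group leader."""
    # Children spawned by the worker inherit its process group unless they
    # explicitly move to another one.
    return subprocess.Popen(cmd, start_new_session=True)


def pause_group(proc):
    """Send SIGSTOP to every process in the worker's process group."""
    os.killpg(os.getpgid(proc.pid), signal.SIGSTOP)


def resume_group(proc):
    """Send SIGCONT to every process in the worker's process group."""
    os.killpg(os.getpgid(proc.pid), signal.SIGCONT)


def kill_group(proc):
    """Send SIGKILL to the whole group; this also works on stopped processes."""
    os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
```

The trade-off is that this relies on subprocesses staying in the worker's group; a subprocess that starts its own session would not be reached, which is where keeping an explicit record of PIDs would still be needed.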
 from absl import logging
 # pylint: disable=unused-import
 from compiler_opt.distributed.worker import Worker

 from contextlib import AbstractContextManager
 from multiprocessing import connection
-from typing import Any, Callable, Dict, Optional
+from typing import Any, Callable, Dict, Optional, List
nit: keep the imported names in alphabetical order (List goes before Optional)
(same further below)
compiler_opt/distributed/worker.py
Outdated
ContextAwareWorker can check for this with isinstance(obj, ContextAwareWorker)
"""

def set_context(self, local: bool) -> None:
ContextAwareWorker is used nowhere, remove it for now.
@@ -102,13 +104,22 @@ class WorkerCancellationManager:
managing resources.
"""

def __init__(self):
@dataclasses.dataclass
class ProcData:
A thought: the motivating scenario here is the validator. For validation, we're actually OK with letting compilation run longer than x seconds, because the goal is to get a thorough idea of what would happen if this model were shipped. So, how about:
- no ProcData
- just have the validator use a very large timeout, like 20 minutes (i.e. ~half of that in real compilation time)
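The suggestion above could be sketched roughly as follows, assuming the validator launches each compilation as a subprocess. `run_with_timeout`, `VALIDATOR_TIMEOUT_SECONDS` and `STAND_IN_COMPILE` are hypothetical names, not identifiers from the project; only the 20-minute figure comes from the comment.

```python
import subprocess
import sys
from typing import List, Optional

# The "very large timeout" from the review comment: 20 minutes.
VALIDATOR_TIMEOUT_SECONDS = 20 * 60

# Hypothetical stand-in for a real compile command.
STAND_IN_COMPILE = [sys.executable, "-c", "pass"]


def run_with_timeout(cmd: List[str], timeout_seconds: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it exceeded the timeout."""
    try:
        # subprocess.run kills the child and waits for it when the timeout
        # expires, then raises TimeoutExpired.
        completed = subprocess.run(
            cmd, timeout=timeout_seconds, capture_output=True, check=False)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

With this shape, the validator would call `run_with_timeout(cmd, VALIDATOR_TIMEOUT_SECONDS)` and no per-process bookkeeping such as ProcData would be needed for pausing.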
- Allows a user to start/stop processes at will, via OS signals SIGSTOP and SIGCONT.
- Allows a user to bind processes to specific CPUs.
- Allows local_worker_pool to be used outside of a context manager
- Switch workers to be Protocol based, so Workers are effectively duck-typed (i.e. anything that has the required methods passes as a Worker)

Part of google#96
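To illustrate what "Protocol based" duck typing means in Python, here is a small sketch using `typing.Protocol`. The method names `pause` and `resume` and the classes `PausableWorker`, `InProcessWorker` and `NotAWorker` are assumptions made up for this example; the project's actual Worker protocol may require different methods.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class PausableWorker(Protocol):
    """Structural type: any object with these methods passes as a worker."""

    def pause(self) -> None:
        ...

    def resume(self) -> None:
        ...


class InProcessWorker:
    """Hypothetical worker; note that it does not inherit from PausableWorker."""

    def __init__(self) -> None:
        self.paused = False

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False


class NotAWorker:
    """Has pause() but no resume(), so it does not satisfy the protocol."""

    def pause(self) -> None:
        pass


def pause_all(workers: Iterable[PausableWorker]) -> None:
    """Pause every worker; accepts anything matching the protocol."""
    for worker in workers:
        worker.pause()
```

Note that `isinstance` against a `runtime_checkable` protocol only checks that the methods exist, not their signatures, so static type checking is still what catches signature mismatches.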
Part of #96