-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include stack trace on deadlock detection exception #595
Conversation
needs at least a cleanup before we pull but I do want to make sure the tid is set where it needs to be set, and secondarily if there are ways to make building the traceback easier/more readable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me. We should change the tests to make additional assertions about the newly upgraded stacks though 🙂
temporalio/worker/_workflow.py
Outdated
tb_frames.pop() | ||
|
||
tb = top_tb | ||
for f in tb_frames: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose the workflow is timing out because two coroutines are playing some sort of infinite ping-pong (e.g. the workflow in #527), will the stack trace be huge, and so we need to limit the number of frames? Or does that asyncio coroutine switching not contribute any stack frames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should set a size limit on this traceback. There is a server-side truncation for failure stack traces that are too large, but it's technically possible we could exceed the 4MB limit to even send the message. I'd truncate at something extreme like 200000 bytes (i.e. 200kb) maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor code organization at this point, otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and much cleaner. Minor/optional suggestions added. May want to wait and see if @dandavison has anything else to add.
May need to update with main which has updated sdk core (and rebuild to get lock file right)
Use the workflow tid to capture the workflow thread's frame at the time of deadlock.
Added
current_thread_id: Optional[int]
andget_worker_ident(self) -> Optional[int]
to interfaces/implementations.Currently I only set tid in
_run_once
for the standard implementation, if it's needed elsewhere please let me know. (This is mainly why it's marked as a draft)Also, readability around the code in
_workflow.py
is (imo) suboptimal so I'd appreciate feedback/ideas on that.