[Tutorial] Many nodes have a common parent node, but the node doesn't exist in PyTorch ET. #45
Comments
trace_link and et_converter have been updated by the following PRs. First, please use the updated tools. Second, make sure you collect traces with PyTorch nightly for the time being, because we rely on the latest PyTorch profiler features to correlate traces properly.
Thank you so much! The updated tools resolve the issue. I have one more question regarding the behavior of Q. The linking procedure seems to create edges between tensors. Do you know if this is expected? If so, what do they mean?
[Figures: Before, After]
To be clear: did you use the latest trace_link.py to plot it?
I cloned the Chakra repository this morning and used it to get the above figure, so I didn't. I just tried the latest one, which was updated an hour ago, but the result is still the same. There are edges between tensor nodes.
Let me share some comments. There are many downstream tools in Chakra. When replaying traces on a real system using the actual PyTorch framework, tensors are crucial. However, in simulation, the tools do not care about the tensors. trace_link is one of the simulation tools, and it disregards any side effects in tensors. Perhaps this is why you are observing additional edges.
I see. They will be ignored anyway in downstream tools. Thanks for all your answers. It really helps a lot.
FYI, the problem was in a fix I had made to the PyTorch visualization tool to make it work with previous versions of the Chakra tools. There were actually no edges between tensors in the collected traces :)
Describe the Bug
I was following the Chakra trace collection tutorial. I was able to collect both the PyTorch ET and the Kineto trace, but I couldn't link them using trace_link.py. trace_link.py emitted the following error:
I looked into the collected PyTorch ET to debug the issue further. I found that many nodes have a parent attribute with a value of 3, but there was no node with id 3 (please refer to the screenshot). I believe this caused the above error. Is my trace collection procedure wrong, or is this a known bug? If it's a known bug, is there any way to resolve this error? Any pointers or answers would be appreciated.
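The check described above (nodes whose parent value points at an id that no node has) can be scripted rather than done by eye. The following is only a minimal sketch: it assumes the ET file is JSON with a top-level "nodes" list and that each node carries integer "id" and "parent" fields, as the description suggests. The helper names and the exact layout are illustrative assumptions and may not match every ET schema version.

```python
import json
from collections import Counter


def find_missing_parents(nodes):
    """Count references to parent ids that no node in the list defines.

    Assumes each node is a dict with "id" and "parent" keys (an assumption
    based on the issue description, not a documented schema guarantee).
    """
    known_ids = {node["id"] for node in nodes if "id" in node}
    missing = Counter()
    for node in nodes:
        parent = node.get("parent")
        if parent is not None and parent not in known_ids:
            missing[parent] += 1
    return missing


def load_nodes(path):
    """Load the node list from an ET JSON file (assumed top-level "nodes" key)."""
    with open(path) as f:
        trace = json.load(f)
    return trace.get("nodes", [])


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        # Report every dangling parent id and how many nodes reference it.
        for parent_id, count in sorted(find_missing_parents(load_nodes(sys.argv[1])).items()):
            print(f"parent id {parent_id} is referenced by {count} node(s) but does not exist")
```

Saved under a hypothetical name such as check_parents.py, it could be run as `python3 check_parents.py matmul_et.json`; an empty output would mean every parent reference resolves to an existing node.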
Steps to Reproduce
Below is the PyTorch code that I used for the ET and Kineto trace collection:
trace_link.py is from the PARAM GitHub repository, and I executed it with the command below.
$ python3 trace_link.py --et-file matmul_et.json --kineto-file kineto_trace_matmul.json --exact-match
The PyTorch version is 2.1.2, as higher versions have some issues (related to #40).
Expected Behavior
I expected that PyTorch ET would be collected without missing dependencies so that the link procedure would succeed without an error.
Screenshots