
Print detailed error message on exception in get_start_timestamp_for_gpu_op #69

Merged · 1 commit · May 22, 2024

Conversation

@TaekyungHeo (Contributor) commented May 22, 2024

Summary

Print detailed error message on exception in get_start_timestamp_for_gpu_op

Test Plan

Previous Version

Traceback (most recent call last):
  File "/Users/theo/venv/bin/chakra_trace_link", line 8, in <module>
    sys.exit(main())
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_link.py", line 37, in main
    linker.link_traces()
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 473, in link_traces
    self.map_pytorch_to_kineto_ops()
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 542, in map_pytorch_to_kineto_ops
    cpu_ev_idx_to_gpu_ops_map = self.group_gpu_ops_by_cpu_launchers()
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 586, in group_gpu_ops_by_cpu_launchers
    parent_cpu_op = self.find_parent_cpu_op(gpu_op)
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 635, in find_parent_cpu_op
    kineto_gpu_op.timestamp = self.get_start_timestamp_for_gpu_op(kineto_gpu_op)
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 668, in get_start_timestamp_for_gpu_op
    raise RuntimeError(f"No valid timestamp found for GPU operator: {kineto_gpu_op.name}")
RuntimeError: No valid timestamp found for GPU operator: void at::native::vectorized_elementwise_kernel<4, at::native::BinaryFunctor<c10::BFloat16, c10::BFloat16, c10::BFloat16, at::native::binary_internal::MulFunctor<float> >, at::detail::Array<char*, 3> >(int, at::native::BinaryFunctor<c10::BFloat16, c10::BFloat16, c10::BFloat16, at::native::binary_internal::MulFunctor<float> >, at::detail::Array<char*, 3>)

Current Version

Traceback (most recent call last):
  File "/Users/theo/venv/bin/chakra_trace_link", line 8, in <module>
    sys.exit(main())
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_link.py", line 37, in main
    linker.link_traces()
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 414, in link_traces
    self.map_pytorch_to_kineto_ops()
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 480, in map_pytorch_to_kineto_ops
    cpu_ev_idx_to_gpu_ops_map = self.group_gpu_ops_by_cpu_launchers()
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 520, in group_gpu_ops_by_cpu_launchers
    parent_cpu_op = self.find_parent_cpu_op(gpu_op)
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 566, in find_parent_cpu_op
    kineto_gpu_op.timestamp = self.get_start_timestamp_for_gpu_op(kineto_gpu_op)
  File "/Users/theo/venv/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 597, in get_start_timestamp_for_gpu_op
    raise RuntimeError(f"No valid timestamp found for GPU operator: {kineto_gpu_op}")
RuntimeError: No valid timestamp found for GPU operator: KinetoOperator(id=None, category=kernel, name=void at::native::vectorized_elementwise_kernel<4, at::native::BinaryFunctor<c10::BFloat16, c10::BFloat16, c10::BFloat16, at::native::binary_internal::MulFunctor<float> >, at::detail::Array<char*, 3> >(int, at::native::BinaryFunctor<c10::BFloat16, c10::BFloat16, c10::BFloat16, at::native::binary_internal::MulFunctor<float> >, at::detail::Array<char*, 3>), phase=X, inclusive_dur=131, exclusive_dur=131, timestamp=1716386233636001, external_id=148356, ev_idx=-1, tid=2398109, parent_pytorch_op_id=None, inter_thread_dep=None, stream=7, rf_id=None, correlation=1213416)
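The effective change is visible in the last frame of each traceback: the f-string now interpolates the whole `kineto_gpu_op` object rather than only `kineto_gpu_op.name`, so the operator's full field listing (category, timestamps, stream, correlation, and so on) lands in the error message. A minimal sketch of that pattern follows; `KinetoOperator` here is a simplified stand-in with only a few of the fields shown in the traceback, and the timestamp-lookup logic is hypothetical, not the actual chakra implementation.

```python
from dataclasses import dataclass
from typing import Optional


# Simplified stand-in for chakra's KinetoOperator. The real class carries
# more fields (category, phase, inclusive_dur, external_id, ...), all of
# which show up in the dataclass-style repr in the new error message.
@dataclass
class KinetoOperator:
    name: str
    timestamp: Optional[int] = None
    stream: Optional[int] = None
    correlation: Optional[int] = None


def get_start_timestamp_for_gpu_op(kineto_gpu_op: KinetoOperator) -> int:
    # Hypothetical resolution logic; the real method consults linked
    # host-side launch events before giving up.
    if kineto_gpu_op.timestamp is not None:
        return kineto_gpu_op.timestamp
    # Before this PR the message interpolated only the operator's name:
    #   f"No valid timestamp found for GPU operator: {kineto_gpu_op.name}"
    # Interpolating the object itself pulls in the full repr, which is
    # far more useful when debugging an unlinkable GPU op:
    raise RuntimeError(f"No valid timestamp found for GPU operator: {kineto_gpu_op}")
```

Because `@dataclass` generates a `__repr__` that enumerates every field, no extra formatting code is needed to get the detailed message.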

@TaekyungHeo TaekyungHeo requested a review from a team as a code owner May 22, 2024 22:05

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@srinivas212 srinivas212 merged commit 6ed23d2 into main May 22, 2024
8 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators May 22, 2024
@TaekyungHeo TaekyungHeo deleted the detailed-error-msg branch May 22, 2024 23:36