Refactor trace_link for better maintainability and readability #131

Merged
srinivas212 merged 1 commit into main from refactor-trace-link on Jul 16, 2024

Conversation

@TaekyungHeo (Contributor) commented Jul 12, 2024

Summary

Refactor trace_link for better maintainability and readability

Test Plan

  1. CI passes.
  2. Ran the correlation test locally (output below).
$ pip install .
Processing /Users/theo/chakra-dev
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: protobuf==4.* in /Users/theo/venv/lib/python3.10/site-packages (from chakra==0.0.4) (4.23.4)
Requirement already satisfied: graphviz in /Users/theo/venv/lib/python3.10/site-packages (from chakra==0.0.4) (0.20.1)
Requirement already satisfied: networkx in /Users/theo/venv/lib/python3.10/site-packages (from chakra==0.0.4) (3.2.1)
Requirement already satisfied: pydot in /Users/theo/venv/lib/python3.10/site-packages (from chakra==0.0.4) (2.0.0)
Requirement already satisfied: pyparsing>=3 in /Users/theo/venv/lib/python3.10/site-packages (from pydot->chakra==0.0.4) (3.1.1)
Building wheels for collected packages: chakra
  Building wheel for chakra (pyproject.toml) ... done
  Created wheel for chakra: filename=chakra-0.0.4-py3-none-any.whl size=54789 sha256=77d0bdd1c3c5604cb899f9e6f9b76c8d01a28df9309d8d25d84da3d24dc3c867
  Stored in directory: /Users/theo/Library/Caches/pip/wheels/1f/cc/a0/f451e6630d3461090be1de9594059abe3c2f5be7ce264deca3
Successfully built chakra
Installing collected packages: chakra
  Attempting uninstall: chakra
    Found existing installation: chakra 0.0.4
    Uninstalling chakra-0.0.4:
      Successfully uninstalled chakra-0.0.4
Successfully installed chakra-0.0.4

$ python3 ci_tools/integration_tests.py --tgz_path tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05.tgz --num_ranks 8 --tolerance 0.05 --expected_times_ms 14597 14597 14968 14638 14649 14700 14677 14735
Extracting tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05.tgz to tests/data/1.0.2-chakra.0.0.4
Running command: chakra_trace_link --chakra-host-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_0.json --chakra-device-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_0.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_0.json
Running command: chakra_trace_link --chakra-host-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_1.json --chakra-device-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_1.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_1.json
Running command: chakra_trace_link --chakra-host-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_2.json --chakra-device-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_2.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_2.json
Running command: chakra_trace_link --chakra-host-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_3.json --chakra-device-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_3.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_3.json
Running command: chakra_trace_link --chakra-host-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_4.json --chakra-device-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_4.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_4.json
Running command: chakra_trace_link --chakra-host-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_5.json --chakra-device-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_5.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_5.json
Running command: chakra_trace_link --chakra-host-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_6.json --chakra-device-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_6.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_6.json
Running command: chakra_trace_link --chakra-host-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_host_et_7.json --chakra-device-trace tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/kineto_7.json --output-file tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_7.json
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_0.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_0.chakra --input_type PyTorch --log_filename /tmp/rank_0.log --simulate
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_1.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_1.chakra --input_type PyTorch --log_filename /tmp/rank_1.log --simulate
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_2.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_2.chakra --input_type PyTorch --log_filename /tmp/rank_2.log --simulate
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_3.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_3.chakra --input_type PyTorch --log_filename /tmp/rank_3.log --simulate
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_4.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_4.chakra --input_type PyTorch --log_filename /tmp/rank_4.log --simulate
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_5.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_5.chakra --input_type PyTorch --log_filename /tmp/rank_5.log --simulate
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_6.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_6.chakra --input_type PyTorch --log_filename /tmp/rank_6.log --simulate
Running command: chakra_converter --input_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_et_plus_7.json --output_filename tests/data/1.0.2-chakra.0.0.4/llama_pytorch24.05/chakra_final_7.chakra --input_type PyTorch --log_filename /tmp/rank_7.log --simulate

==> rank_0.log <==
INFO [07/12/2024 03:18:53 PM] GPU Node ID 301192 on stream 7 completed at 14488271us, tid: stream 7
INFO [07/12/2024 03:18:53 PM] Simulation of Chakra node execution completed.

==> rank_1.log <==
INFO [07/12/2024 03:20:58 PM] GPU Node ID 301192 on stream 7 completed at 14489195us, tid: stream 7
INFO [07/12/2024 03:20:58 PM] Simulation of Chakra node execution completed.

==> rank_2.log <==
INFO [07/12/2024 03:09:17 PM] GPU Node ID 301192 on stream 7 completed at 14550790us, tid: stream 7
INFO [07/12/2024 03:09:17 PM] Simulation of Chakra node execution completed.

==> rank_3.log <==
INFO [07/12/2024 03:19:21 PM] GPU Node ID 301192 on stream 7 completed at 14418327us, tid: stream 7
INFO [07/12/2024 03:19:21 PM] Simulation of Chakra node execution completed.

==> rank_4.log <==
INFO [07/12/2024 03:17:19 PM] GPU Node ID 301192 on stream 7 completed at 14500584us, tid: stream 7
INFO [07/12/2024 03:17:19 PM] Simulation of Chakra node execution completed.

==> rank_5.log <==
INFO [07/12/2024 03:15:02 PM] GPU Node ID 301192 on stream 7 completed at 14308678us, tid: stream 7
INFO [07/12/2024 03:15:02 PM] Simulation of Chakra node execution completed.

==> rank_6.log <==
INFO [07/12/2024 03:16:40 PM] GPU Node ID 301192 on stream 7 completed at 14385408us, tid: stream 7
INFO [07/12/2024 03:16:40 PM] Simulation of Chakra node execution completed.

==> rank_7.log <==
INFO [07/12/2024 03:12:03 PM] GPU Node ID 301192 on stream 7 completed at 14398107us, tid: stream 7
INFO [07/12/2024 03:12:03 PM] Simulation of Chakra node execution completed.
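
For readers who want to sanity-check the numbers above, here is a minimal sketch of how the final completion timestamps in the rank logs could be compared against the `--expected_times_ms` values with `--tolerance 0.05`. It assumes the tolerance is a relative bound on the expected time; that is a guess about the check, not a description of what `ci_tools/integration_tests.py` actually does. The helper names `last_completion_ms` and `within_tolerance` are hypothetical and do not come from the repository.

```python
import re

# Matches the "completed at <N>us" fragment seen in the rank logs above.
COMPLETED_RE = re.compile(r"completed at (\d+)us")


def last_completion_ms(log_text: str) -> float:
    """Return the last completion timestamp in a log, converted from us to ms."""
    matches = COMPLETED_RE.findall(log_text)
    if not matches:
        raise ValueError("no completion timestamp found in log text")
    return int(matches[-1]) / 1000.0


def within_tolerance(actual_ms: float, expected_ms: float, tolerance: float) -> bool:
    """Assumed check: relative deviation from the expected time is at most `tolerance`."""
    return abs(actual_ms - expected_ms) <= tolerance * expected_ms


# Values copied from the command line and the rank logs in this PR description.
expected_ms = [14597, 14597, 14968, 14638, 14649, 14700, 14677, 14735]
completed_us = [
    14488271, 14489195, 14550790, 14418327,
    14500584, 14308678, 14385408, 14398107,
]

results = [
    within_tolerance(us / 1000.0, exp, 0.05)
    for us, exp in zip(completed_us, expected_ms)
]

for rank, (us, exp, ok) in enumerate(zip(completed_us, expected_ms, results)):
    deviation = abs(us / 1000.0 - exp) / exp
    print(f"rank {rank}: {us / 1000.0:.3f} ms vs {exp} ms ({deviation:.2%}) ok={ok}")
```

Under this assumed definition, all eight ranks land within the 5% bound, with the largest deviation on rank 2 at just under 3%.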

@TaekyungHeo TaekyungHeo requested a review from a team as a code owner July 12, 2024 18:02

github-actions bot commented Jul 12, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@TaekyungHeo TaekyungHeo force-pushed the refactor-trace-link branch from 3cf1c8b to f0f5772 Compare July 12, 2024 18:05
@TaekyungHeo TaekyungHeo added the enhancement New feature or request label Jul 12, 2024
@TaekyungHeo TaekyungHeo force-pushed the refactor-trace-link branch 3 times, most recently from d50225b to a36213f Compare July 13, 2024 11:15
@srinivas212 srinivas212 merged commit 472c72a into main Jul 16, 2024
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 16, 2024
@TaekyungHeo TaekyungHeo deleted the refactor-trace-link branch July 17, 2024 10:21