Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#0: Hoist SubDeviceManager/Lock-Step Allocator to MeshDevice #16878

Merged
merged 1 commit into from
Jan 24, 2025

Conversation

cfjchu
Copy link
Collaborator

@cfjchu cfjchu commented Jan 18, 2025

Ticket

Link to Github Issue

Problem description

As part of the TT-Mesh scope of work, we want to natively virtualize the devices in the mesh. Part of this virtualization process involves deferring to a single set of allocators (global, per-subdevice) at the MeshDevice level, instead of issuing repeated allocations on each of the per-device allocators.

What's changed

Checklist

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

tests/tt_metal/distributed/test_mesh_allocator.cpp Outdated Show resolved Hide resolved
@cfjchu cfjchu force-pushed the jchu/mesh-allocator branch from 92a2dd9 to d91d8e5 Compare January 18, 2025 02:44
Copy link
Contributor

@omilyutin-tt omilyutin-tt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some high level questions for now. In general, I think it is worth exploring a better "allocator" abstraction that is independent of details of mesh / single device / sub device.

tt_metal/distributed/mesh_buffer.cpp Show resolved Hide resolved
tt_metal/distributed/mesh_device.cpp Show resolved Hide resolved
tt_metal/distributed/mesh_device.cpp Show resolved Hide resolved
tt_metal/distributed/mesh_device.cpp Show resolved Hide resolved
tt_metal/impl/sub_device/sub_device_manager.cpp Outdated Show resolved Hide resolved
tt_metal/impl/sub_device/sub_device_manager.cpp Outdated Show resolved Hide resolved
tt_metal/distributed/mesh_device.cpp Outdated Show resolved Hide resolved
tt_metal/distributed/mesh_buffer.cpp Outdated Show resolved Hide resolved
tt_metal/distributed/mesh_device.cpp Show resolved Hide resolved
tt_metal/distributed/mesh_device.cpp Show resolved Hide resolved
tt_metal/distributed/mesh_device.cpp Show resolved Hide resolved
tt_metal/impl/sub_device/sub_device_manager.cpp Outdated Show resolved Hide resolved
@cfjchu cfjchu force-pushed the jchu/mesh-allocator branch 2 times, most recently from 5968a64 to 2f3c743 Compare January 21, 2025 23:59
@cfjchu cfjchu requested a review from davorchap as a code owner January 21, 2025 23:59
@cfjchu cfjchu force-pushed the jchu/mesh-allocator branch 4 times, most recently from 89a2fd9 to a04ce94 Compare January 23, 2025 21:05
Copy link
Contributor

@tt-aho tt-aho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sub-Device related changes look okay to me. Cleanup of how we want to expose these apis/interfaces instead of having a bunch of wrapping fns is a different discussion/issue from this pr.

@cfjchu cfjchu force-pushed the jchu/mesh-allocator branch from a04ce94 to 6558bf9 Compare January 24, 2025 06:12
@cfjchu cfjchu force-pushed the jchu/mesh-allocator branch from 6558bf9 to 86d6a3b Compare January 24, 2025 06:14

void initialize();
std::unique_ptr<SubDeviceManagerTracker> sub_device_manager_tracker_;
std::unique_ptr<WorkExecutor> work_executor_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, isn't this new?

Copy link
Collaborator Author

@cfjchu cfjchu Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this yesterday... This is being added to provide implementations for IDevice interface for executor methods like push_work. Right now I'm defaulting executor to work in synchronous mode. This was added so that Buffer::create -> calls push_work -> requires executor methods on MeshDevice.

Separate discussion needs to be had about removing this from IDevice interface but my goal is to unify the APIs, as-is and to do so incrementally to prepare for TT-NN integration.

@cfjchu cfjchu merged commit 2c2110c into main Jan 24, 2025
218 of 220 checks passed
@cfjchu cfjchu deleted the jchu/mesh-allocator branch January 24, 2025 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants