-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uplift dram and l1 allocators to use dram/l1 specific alignment #17122
base: main
Are you sure you want to change the base?
Conversation
…_noc_coord_descriptor
…rm_with_block_sharded_v2_8x8_grid
…name for l1 alignment
@@ -1061,7 +1061,7 @@ conv_op_l1_usage conv2d::calculate_L1_usage( | |||
} else if (output_dtype == DataType::FLOAT32) { | |||
per_core_out_width_aligned *= 4; | |||
} | |||
output_size = round_up(per_core_out_width_aligned, 32) * pconfig.per_core_out_matrix_height; | |||
output_size = round_up(per_core_out_width_aligned, 16) * pconfig.per_core_out_matrix_height; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have defines/constexpr for these magic numbers (16 in this case, 32 before).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is a function just used for L1 we can use HAL api: get_alignment(HalMemType::L1)
@@ -108,7 +108,7 @@ def test_unet_trace( | |||
|
|||
@skip_for_grayskull("UNet not currently supported on GS") | |||
@pytest.mark.parametrize( | |||
"device_params", [{"l1_small_size": 68864, "trace_region_size": 442368, "num_command_queues": 2}], indirect=True | |||
"device_params", [{"l1_small_size": 68864, "trace_region_size": 917504, "num_command_queues": 2}], indirect=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why so much larger? Seems odd we now need to double the trace region size.
@@ -343,7 +343,7 @@ def test_unet_trace_2cq_multi_device( | |||
|
|||
@skip_for_grayskull("UNet not currently supported on GS") | |||
@pytest.mark.parametrize( | |||
"device_params", [{"l1_small_size": 68864, "trace_region_size": 424960, "num_command_queues": 2}], indirect=True | |||
"device_params", [{"l1_small_size": 68864, "trace_region_size": 1376256, "num_command_queues": 2}], indirect=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - why such a large increase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm if hardcoding is removed
@@ -1164,7 +1164,7 @@ conv_op_l1_usage conv2d::calculate_L1_usage( | |||
} else if (output_dtype == DataType::FLOAT32) { | |||
per_core_out_width_aligned *= 4; | |||
} | |||
output_size = round_up(per_core_out_width_aligned, 32) * pconfig.per_core_out_matrix_height; | |||
output_size = round_up(per_core_out_width_aligned, 16) * pconfig.per_core_out_matrix_height; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also not be hardcoded to 16
uint32_t l1_alignment = tt::tt_metal::hal.get_alignment(tt::tt_metal::HalMemType::L1); | ||
uint32_t per_core_N_bytes_padded = tt::round_up(per_core_N * datum_size_bytes, l1_alignment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to query a's buffer alignment here instead of querying the hal?
Ticket
#13609
Problem description
Using the max of DRAM and L1 alignment for both DRAM and L1 buffers was causing pcc mismatches in i2s and s2i.
What's changed
Use L1/DRAM specific alignment for respective allocations. This will require some ops to be uplifted to handle re-alignment
Checklist