Uplift dram and l1 allocators to use dram/l1 specific alignment #17122

Open · wants to merge 74 commits into base: main

Conversation

@llongTT (Contributor) commented Jan 26, 2025

Ticket

#13609

Problem description

Using the max of the DRAM and L1 alignments for both DRAM and L1 buffers was causing PCC mismatches in interleaved-to-sharded (i2s) and sharded-to-interleaved (s2i).

What's changed

Use L1/DRAM-specific alignment for the respective allocations. This requires some ops to be uplifted to handle re-alignment.
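
As a rough sketch of the direction (a hypothetical helper, not code from this PR), the idea is to derive the alignment from the buffer's memory type via the HAL instead of taking the max of the DRAM and L1 alignments:

// Hypothetical helper, for illustration only.
uint32_t alignment_for(tt::tt_metal::BufferType buffer_type) {
    // DRAM buffers get the DRAM alignment; everything else falls back to the L1 alignment here.
    auto mem_type = (buffer_type == tt::tt_metal::BufferType::DRAM) ? tt::tt_metal::HalMemType::DRAM
                                                                    : tt::tt_metal::HalMemType::L1;
    return tt::tt_metal::hal.get_alignment(mem_type);
}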

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • (For models and ops writers) Full new model tests pass
  • New/Existing tests provide coverage for changes

abhullar-tt and others added 30 commits December 4, 2024 00:40
@@ -1061,7 +1061,7 @@ conv_op_l1_usage conv2d::calculate_L1_usage(
} else if (output_dtype == DataType::FLOAT32) {
per_core_out_width_aligned *= 4;
}
-output_size = round_up(per_core_out_width_aligned, 32) * pconfig.per_core_out_matrix_height;
+output_size = round_up(per_core_out_width_aligned, 16) * pconfig.per_core_out_matrix_height;
Contributor:

It would be nice to have defines/constexpr for these magic numbers (16 in this case, 32 before).
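
For example, a named constant could replace the literal (a hypothetical name, shown only to illustrate the suggestion):

// Hypothetical constant name; the value mirrors the hardcoded 16 in this diff.
constexpr uint32_t l1_alignment_bytes = 16;
output_size = round_up(per_core_out_width_aligned, l1_alignment_bytes) * pconfig.per_core_out_matrix_height;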

Contributor:

If this is a function used just for L1, we can use the HAL API: get_alignment(HalMemType::L1)
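
A minimal sketch of that suggestion, assuming the hal object is reachable from this function (it is used that way in the snippet commented on further down):

// Query the L1 alignment from the HAL instead of hardcoding 16.
uint32_t l1_alignment = tt::tt_metal::hal.get_alignment(tt::tt_metal::HalMemType::L1);
output_size = round_up(per_core_out_width_aligned, l1_alignment) * pconfig.per_core_out_matrix_height;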

@@ -108,7 +108,7 @@ def test_unet_trace(

@skip_for_grayskull("UNet not currently supported on GS")
@pytest.mark.parametrize(
"device_params", [{"l1_small_size": 68864, "trace_region_size": 442368, "num_command_queues": 2}], indirect=True
"device_params", [{"l1_small_size": 68864, "trace_region_size": 917504, "num_command_queues": 2}], indirect=True
Contributor:

Why so much larger? Seems odd we now need to double the trace region size.

@@ -343,7 +343,7 @@ def test_unet_trace_2cq_multi_device(

@skip_for_grayskull("UNet not currently supported on GS")
@pytest.mark.parametrize(
"device_params", [{"l1_small_size": 68864, "trace_region_size": 424960, "num_command_queues": 2}], indirect=True
"device_params", [{"l1_small_size": 68864, "trace_region_size": 1376256, "num_command_queues": 2}], indirect=True
Contributor:

Same here - why such a large increase?

@tt-aho (Contributor) left a comment:

lgtm if hardcoding is removed

@@ -1164,7 +1164,7 @@ conv_op_l1_usage conv2d::calculate_L1_usage(
} else if (output_dtype == DataType::FLOAT32) {
per_core_out_width_aligned *= 4;
}
-output_size = round_up(per_core_out_width_aligned, 32) * pconfig.per_core_out_matrix_height;
+output_size = round_up(per_core_out_width_aligned, 16) * pconfig.per_core_out_matrix_height;
Contributor:

Should also not be hardcoded to 16

Comment on lines +174 to +175
uint32_t l1_alignment = tt::tt_metal::hal.get_alignment(tt::tt_metal::HalMemType::L1);
uint32_t per_core_N_bytes_padded = tt::round_up(per_core_N * datum_size_bytes, l1_alignment);
Contributor:

Would it make more sense to query a's buffer alignment here instead of querying the hal?
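
A rough sketch of that alternative, assuming the input tensor a exposes its underlying buffer and that Buffer provides an alignment() accessor (neither is shown in this diff):

// Hedged sketch: take the alignment from a's buffer rather than querying the HAL directly.
uint32_t l1_alignment = a.buffer()->alignment();
uint32_t per_core_N_bytes_padded = tt::round_up(per_core_N * datum_size_bytes, l1_alignment);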
