Some tests failing on M4 #13

dougallj · 2024-06-01T09:13:26Z

I had a quick look at the M4 – the tests for EXTRX, EXTRY, VECINT and VECFP are failing.

EXTRX and EXTRY can be fixed with the following change to each:

         if ((AMX_VER >= AMX_VER_M2) && (operand & (1ull << 31))) {
             operand &=~ (0x1ffull << 32);
             z_step = z_col & 32 ? 16 : 32;
         }
+        if ((AMX_VER >= AMX_VER_M4) && (operand & (1ull << 31))) {
+            dst_offset &= ~0x3F;
+        }
         store_enable &= parse_writemask(operand >> 32, xybytes, 9);
     } else if (operand & EXTR_BETWEEN_XY) {
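
For anyone reading along, here is a minimal standalone sketch of what the added mask does: clearing the low six bits rounds a byte offset down to a multiple of 64. The helper name `align_down_64` and the `uint64_t` type are assumptions for illustration only; the emulator's actual `dst_offset` declaration isn't shown here.

```c
#include <stdint.h>

/* Hypothetical helper (not from the emulator source): mirrors the
   `dst_offset &= ~0x3F` line in the diff by clearing bits 0-5, which
   rounds the offset down to the nearest multiple of 64. */
static uint64_t align_down_64(uint64_t offset) {
    return offset & ~(uint64_t)0x3F;
}
```

With this helper, an offset of 100 would become 64, while an offset that is already a multiple of 64 is left unchanged.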

VECINT and VECFP seem to have similar changes – if I only test operands of the form rand_next() & ~(0x3F | (0x3F<<10)) the tests pass. I was able to fix a simple test case by zeroing those bits if bit 31 was set, but that broke indexed-loads. Trying to fix indexed-loads didn't go well, and other experiments imply that that wouldn't be the end of it either. I might be able to work through it, but I figured I'd leave this here in case it's helpful.
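
A rough sketch of the two masking variants described above, in case it helps someone reproduce the experiment. The function and macro names are made up for illustration. The second variant is the one that fixed a simple test case but broke indexed-loads, so it should be read as a record of the experiment rather than a fix.

```c
#include <stdint.h>

/* Bits 0-5 and 10-15 of the operand: the two 6-bit fields that are
   zeroed in the operands for which the VECINT/VECFP tests pass.
   (Name is hypothetical.) */
#define LOW6_FIELDS_MASK ((0x3Full) | (0x3Full << 10))

/* Test-side restriction: always clear both fields, as in
   rand_next() & ~(0x3F | (0x3F<<10)). */
static uint64_t clear_low6_fields(uint64_t operand) {
    return operand & ~LOW6_FIELDS_MASK;
}

/* Emulator-side experiment: clear the fields only when bit 31 is set.
   Known to be incomplete: this broke indexed-loads. */
static uint64_t clear_low6_fields_if_bit31(uint64_t operand) {
    if (operand & (1ull << 31)) {
        return clear_low6_fields(operand);
    }
    return operand;
}
```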

Edit: Also, entirely unsurprisingly, Streaming-SVE mode (SME) and AMX are mutually exclusive – if either is enabled, trying to enable the other gives EXC_BAD_INSTRUCTION.

corsix · 2024-06-01T11:01:40Z

I'm mildly surprised that AMX instructions are still present at all, given the introduction of SME.

I don't have any M4 hardware to test against at the moment, though I might pick up an M4 MBP when they come out.

dougallj · 2024-06-02T03:58:12Z

Yeah, I was surprised too – my initial theory was that it was just for software compatibility (within Apple), but I think we're also seeing worse f16/bf16 throughput with SME, because the spec'd SME operations map less directly to what AMX can do at that size. I might be misremembering the AMX behaviour, or misusing SME, but I've measured single-core SME f16 FLOPS ≈ single-core SME f32 FLOPS (as did someone else: https://mastodon.social/@[email protected]/112528651326649755).
