Yeah, I was surprised too – my initial theory was it was just for software compatibility (within Apple), but I think we're also seeing worse f16/bf16 throughput with SME, because the spec'd SME operations map less directly to what AMX can do at that size. I might be misremembering the AMX behaviour, or misusing SME, but I've measured single-core SME f16 FLOPS ≈ single-core SME f32 FLOPS (as did someone else: https://mastodon.social/@[email protected]/112528651326649755).
I had a quick look at the M4 – the tests for EXTRX, EXTRY, VECINT and VECFP are failing.
EXTRX and EXTRY can be fixed with the following change to each:
VECINT and VECFP seem to have similar changes – if I only test operands of the form rand_next() & ~(0x3F | (0x3F<<10)) the tests pass. I was able to fix a simple test case by zeroing those bits if bit 31 was set, but that broke indexed-loads. Trying to fix indexed-loads didn't go well, and other experiments imply that that wouldn't be the end of it either. I might be able to work through it, but I figured I'd leave this here in case it's helpful.
Edit: Also, entirely unsurprisingly, Streaming-SVE mode (SME) and AMX are mutually exclusive – if either is enabled, trying to enable the other gives EXC_BAD_INSTRUCTION.
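For anyone who wants to reproduce the operand restriction described above, here is a minimal sketch in C of what the mask does. It assumes 64-bit operands. The helper names (clear_fields, clear_fields_if_bit31) and the macro names are made up for illustration and are not taken from the test suite. The mask clears two 6-bit fields, bits 0-5 and bits 10-15; what those fields mean on the M4 is not established here.

```c
#include <stdint.h>

/* The two 6-bit fields cleared by the mask in the text:
 * bits 0-5 and bits 10-15. Names are hypothetical. */
#define LOW_FIELD  (UINT64_C(0x3F))
#define HIGH_FIELD (UINT64_C(0x3F) << 10)
#define FIELD_MASK (LOW_FIELD | HIGH_FIELD) /* 0xFC3F */

/* Restrict an operand to the form for which the tests pass:
 * both fields forced to zero, all other bits untouched. */
static uint64_t clear_fields(uint64_t operand) {
    return operand & ~FIELD_MASK;
}

/* The attempted fix from the text: zero the fields only when
 * bit 31 of the operand is set, otherwise leave it unchanged. */
static uint64_t clear_fields_if_bit31(uint64_t operand) {
    if (operand & (UINT64_C(1) << 31)) {
        return operand & ~FIELD_MASK;
    }
    return operand;
}
```

In a test loop this would be used as clear_fields(rand_next()). As noted above, the bit-31 variant fixed one simple case but broke indexed-loads, so it should be read as a record of the experiment rather than a correct emulation rule.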