Different results for the same model on different hardware / compute device #260

vade opened this issue Nov 12, 2024 · 2 comments

vade commented Nov 12, 2024

Hi there,

First, thank you for WhisperKit. It's awesome and nice to work with; y'all have done a fantastic job.

Question: Should I expect to see different transcription results from WhisperKit depending on the compute hardware used? I'm aware that the ANE is float16-only while the GPU can run float32, and that in theory this could produce different output predictions for the same input. Based on your testing, should that be expected?
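(To make the precision point concrete, here's a toy example, not WhisperKit code: two logits that are distinct in float32 can round to the same float16 value, so a greedy argmax over them could flip depending on the compute precision.)

```swift
// Toy illustration (not WhisperKit code), runs on Apple silicon where
// Float16 is available: two float32 logits that differ become equal after
// rounding to float16, so a greedy argmax could pick a different token on
// the ANE (float16) than on the GPU (float32).
let logitA: Float = 1.0001
let logitB: Float = 1.0002

print(logitA < logitB)                   // true: distinct in float32
print(Float16(logitA) < Float16(logitB)) // false: both round to 1.0 in float16
```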

Secondly, across different hardware revisions (M1 vs. M2, for example), should one expect numerical stability assuming the same compute hardware is chosen? I.e., do the M1 GPU and M2 GPU produce the same result? Same for the ANE on M1 vs. M2?

I ask because I'm seeing some confusing transcription results while trying to choose a model / device config for a shipping app.

If the above is expected, can you kindly let me know what sort of variance to expect?

atiorh (Contributor) commented Nov 12, 2024

@vade Great question!

One major reason for variance in transcription results is the original Whisper decoding algorithm's temperature fallback mechanism (described in Section 4.5 of the Whisper paper). If an audio input triggers the temperature fallback conditions, all correct Whisper implementations that follow the OpenAI reference algorithm are expected to return non-deterministic results for that input, because decoding switches from greedy decoding to sampling at a non-zero temperature.
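For reference, here's a minimal sketch of that fallback loop. The temperature schedule and thresholds match the openai/whisper defaults; the `decode` stub and `DecodingResult` type are hypothetical stand-ins, not WhisperKit API:

```swift
// Sketch of the OpenAI reference temperature-fallback decoding loop.
struct DecodingResult {
    let text: String
    let avgLogProb: Double
    let compressionRatio: Double
}

func decode(_ audioFeatures: [Float], temperature: Double) -> DecodingResult {
    // Placeholder: a real implementation runs the Whisper text decoder here.
    // At temperature == 0 it decodes greedily; at temperature > 0 it samples
    // tokens, which is where the non-determinism enters.
    return DecodingResult(text: "", avgLogProb: 0.0, compressionRatio: 1.0)
}

func decodeWithFallback(
    _ audioFeatures: [Float],
    temperatures: [Double] = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    compressionRatioThreshold: Double = 2.4,
    logProbThreshold: Double = -1.0
) -> DecodingResult {
    var result = decode(audioFeatures, temperature: temperatures[0])
    for temperature in temperatures.dropFirst() {
        let tooRepetitive = result.compressionRatio > compressionRatioThreshold
        let lowConfidence = result.avgLogProb < logProbThreshold
        if !tooRepetitive && !lowConfidence { break } // accept this pass
        // Fallback triggered: retry at the next, higher temperature.
        result = decode(audioFeatures, temperature: temperature)
    }
    // If all temperatures fail the checks, the last result is used anyway.
    return result
}
```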

atiorh (Contributor) commented Nov 12, 2024

Notes on hardware variance (which I don't think is a major factor in this case):

  • In an internal version of WhisperKit Benchmarks, we test and monitor the cross-hardware variance of WER for the same model and WhisperKit config, and we cap the disparity at 20% relative WER, e.g. <0.12 is a pass for a reference WER of 0.1 (see the sketch after this list).

  • Most implementations of parallel floating-point computation are not fully deterministic unless specifically configured for determinism (i.e., avoiding certain non-deterministic kernels). There is no Apple-hardware-specific reason I'm aware of for WhisperKit to have additional non-determinism.
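To make the first bullet concrete, here is a minimal sketch of that pass criterion (the function name is just for illustration; the actual benchmark harness is internal):

```swift
// Sketch of the 20% relative-WER pass criterion described above.
// A candidate hardware/compute-unit combo passes if its WER is within
// 20% (relative) of the reference WER for the same model and config.
func passesCrossHardwareCheck(
    candidateWER: Double,
    referenceWER: Double,
    maxRelativeDisparity: Double = 0.2
) -> Bool {
    candidateWER <= referenceWER * (1.0 + maxRelativeDisparity)
}

// Example from above: with a reference WER of 0.1, anything up to 0.12 passes.
print(passesCrossHardwareCheck(candidateWER: 0.115, referenceWER: 0.1)) // true
print(passesCrossHardwareCheck(candidateWER: 0.125, referenceWER: 0.1)) // false
```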
