
Properly init WhisperKit regarding Model per Device management #254

Open
Alonnasi opened this issue Nov 5, 2024 · 3 comments
@Alonnasi

Alonnasi commented Nov 5, 2024

Hello everybody 😇

Hoping I will get some guidance here 🙏

I'm trying to manage the models in terms of downloading, storing, and per-device model selection, and I have a few issues/questions that I hope will help me and others understand this awesome Kit better and get the most out of it.

  1. In order to begin the model download as early as possible and cut down the waiting time, I'm performing the init in the AppDelegate and storing the Kit in a manager class scope:

[Screenshot: WhisperKit init in the AppDelegate]
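
Roughly, the setup looks like this (a minimal sketch; the manager name and model choice are placeholders for what the screenshot showed, assuming a recent WhisperKit initializer):

```swift
import WhisperKit

// Illustrative manager holding the Kit at class scope so init can start early.
// `TranscriptionManager` and the "base" model are assumptions, not the original code.
final class TranscriptionManager {
    static let shared = TranscriptionManager()
    private(set) var whisperKit: WhisperKit?

    // Called from the AppDelegate's didFinishLaunching to kick off the
    // model download and load as soon as the app starts.
    func initializeKit() {
        Task {
            do {
                self.whisperKit = try await WhisperKit(model: "base")
            } catch {
                print("WhisperKit init failed: \(error)")
            }
        }
    }
}
```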

But I'm hitting an error from the Kit's logger right after the model finishes downloading.
The error is intermittent: it can appear when initializing the same model that previously worked, and with different models too:

[Screenshot: intermittent error logged after the model download]

What am I doing wrong?

  2. In cases where the Kit has initialized with no errors, when trying to transcribe a simple audio file I'm getting dozens of ">>" segment results. Am I abusing the system? Am I using the wrong model?
    I'm adding a screenshot of the transcribe func:

[Screenshot: the transcribe function]
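
In rough form, the call looks like this (a sketch standing in for the screenshot; the decoding options and the manager from the earlier sketch are assumptions):

```swift
import WhisperKit

// Transcribe a file with the already-initialized Kit and join the segments.
func transcribe(audioPath: String) async throws -> String? {
    guard let kit = TranscriptionManager.shared.whisperKit else { return nil }
    // wordTimestamps is assumed here; other DecodingOptions stay at their defaults.
    let options = DecodingOptions(wordTimestamps: true)
    let results = try await kit.transcribe(audioPath: audioPath, decodeOptions: options)
    // Each result is made of many segments; join the text for the full transcript.
    return results.map(\.text).joined(separator: " ")
}
```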

  3. Final question, if I may.
    Is there a way for me to ship the model files inside the app project and init them locally, with no need to download any model at runtime? And is there a model that can calculate word timestamps AND run on all devices?

Thanks so much for any help 🙌

@atiorh
Contributor

atiorh commented Nov 7, 2024

Hi @Alonnasi!

In cases where the Kit has initialized with no errors, when trying to transcribe a simple audio file I'm getting dozens of ">>" segment results. Am I abusing the system? Am I using the wrong model?

It depends on the file (feel free to share a link), but dozens of transcription segments are not out of the ordinary. You can always cross-reference the results from our TestFlight app with what you are observing in your project as a sanity check.

Final question, if I may.
Is there a way for me to ship the model files inside the app project and init them locally, with no need to download any model at runtime? And is there a model that can calculate word timestamps AND run on all devices?

You can always pre-download and bundle the models, but your app download size will bloat, so the trade-off is yours to make. Word timestamps are supported on all models. The tiny and base variants are supported on all Apple Silicon Macs plus iPhone XS and newer. Please use the GPU for the AudioEncoder model on iPhone XS, XR, and 11. We will make these presets available soon, so you shouldn't have to hard-code device-specific defaults on your side.
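
For the bundling route, a minimal sketch (assuming the modelFolder and download parameters of the WhisperKit initializer, and that a model folder named "openai_whisper-base" was added to the app bundle; the folder name is illustrative):

```swift
import Foundation
import WhisperKit

// Load a model that ships inside the app bundle instead of downloading it.
func loadBundledModel() async throws -> WhisperKit {
    // "openai_whisper-base" is the assumed name of the bundled model folder.
    guard let folder = Bundle.main.path(forResource: "openai_whisper-base", ofType: nil) else {
        throw CocoaError(.fileNoSuchFile)
    }
    // Pointing at a local folder and disabling download skips the network step entirely.
    return try await WhisperKit(modelFolder: folder, download: false)
}
```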

@Alonnasi
Author

Alonnasi commented Nov 7, 2024

Thank you so much for the quick response! 😇

I've managed to run tests on various devices, trying almost every model on each device (to compare results):

  • Is there a specific model or WhisperKit init configuration needed for the Simulator? The behavior is inconsistent: sometimes it works, and other times it won't work at all.

  • I haven't found ANY model that runs and works on iPhone 11 / XS / XR. I've tried base, tiny, small, large-v2, large-v3 & turbo.

  • Can you please elaborate on the AudioEncoder usage? Will it help with running transcribe tasks on iPhone 11 / XS / XR?

Thank you again 🙌

@atiorh
Contributor

atiorh commented Nov 7, 2024

I haven't found ANY model that runs and works on iPhone 11 / XS / XR. I've tried base, tiny, small, large-v2, large-v3 & turbo.

Can you please elaborate on the AudioEncoder usage? Will it help with running transcribe tasks on iPhone 11 / XS / XR?

For WhisperKit's computeOptions, you will need to set audioEncoderCompute to be cpuAndGPU.
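
A minimal sketch of that configuration (ModelComputeOptions and MLComputeUnits.cpuAndGPU are the relevant types; the model name is an assumption):

```swift
import CoreML
import WhisperKit

// Route the AudioEncoder to CPU+GPU (instead of the default Neural Engine)
// for older devices like iPhone XS / XR / 11.
func makeKitForOlderiPhones() async throws -> WhisperKit {
    let computeOptions = ModelComputeOptions(
        audioEncoderCompute: .cpuAndGPU
    )
    return try await WhisperKit(model: "base", computeOptions: computeOptions)
}
```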
