Fixed tokenizer and audio processing logic #214
base: main
@@ -22,16 +22,14 @@ public class AudioEncoder: AudioEncoding, WhisperMLModel {
guard let inputDescription = model?.modelDescription.outputDescriptionsByName["encoder_output_embeds"] else { return nil }
guard inputDescription.type == .multiArray else { return nil }
guard let shapeConstraint = inputDescription.multiArrayConstraint else { return nil }
let shape = shapeConstraint.shape.map { $0.intValue }
return shape[1]
return shapeConstraint.shape[0].intValue
shape[0]: Batch size
shape[1]: Sequence length
shape[2]: Embedding dimension
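As a minimal sketch of the layout described in this comment, the three axes could be given names rather than read by bare index. `EncoderOutputShape` is a hypothetical helper, not part of WhisperKit, and the sketch assumes the shape really is `[batch, sequence length, embedding dimension]` as stated above; the actual model output may be laid out differently.

```swift
// Hypothetical helper (not a WhisperKit type): names the axes of a shape
// assumed to be laid out as [batch, sequence length, embedding dimension].
struct EncoderOutputShape {
    let batchSize: Int
    let sequenceLength: Int
    let embedSize: Int

    /// Returns nil unless the shape has exactly three dimensions.
    init?(_ shape: [Int]) {
        guard shape.count == 3 else { return nil }
        batchSize = shape[0]
        sequenceLength = shape[1]
        embedSize = shape[2]
    }
}

// Made-up values for illustration; real dimensions depend on the model.
let example = EncoderOutputShape([1, 1500, 384])
```

With a Core ML constraint, the `[Int]` argument would come from something like `shapeConstraint.shape.map { $0.intValue }`, as in the diff above. Returning `nil` for an unexpected rank avoids an out-of-bounds crash if the model's output has a different number of axes.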
Hi @1amageek, I'm happy to see your enthusiasm to contribute to WhisperKit! I'd like to make a few recommendations to help guide your work in the future, since this PR has grown quite large.
With this in mind, I would recommend closing this PR and splitting the changes that add new features or fix existing bugs into separate PRs, each with details on why it is necessary and which open issue listed in https://github.com/argmaxinc/WhisperKit/issues it solves. Thanks again for your contributions so far; I'm open to any discussion on these topics.
This PR addresses the following changes:
- WhisperTokenizerWrapper
- startIndex validation in AudioChunker.swift and AudioProcessor.swift
- embedSize and sequenceLength calculations in AudioEncoder.swift
- applyChatTemplate and encode methods
- Package.swift