assert t2i_input_embedding.shape[1] == self.img_token_num #56
The t2i_input_embedding is the output hidden states of the LLM that are used for the later diffusion process. I just ran all the code on a new machine, and the demo works well. I'm not sure why its shape would be 1 in your case, since this is handled inside the prepare_inputs_for_generation function of the MiniGPT5 class (see MiniGPT-5/minigpt4/models/mini_gpt5.py, line 327 at commit 2121c74):
If len(special_token_index) is not zero, the first output image token has been generated. In other words, during LLM generation, new_token_ids == self.output_img_id should be True at some point, and all_img_tokens should be appended after this first image token, which makes t2i_input_embedding the same length as img_token_num.
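A minimal sketch of that appending logic, with hypothetical token ids (the actual implementation is in prepare_inputs_for_generation in mini_gpt5.py; this only illustrates the shape bookkeeping described above):

```python
import torch

img_token_num = 8                     # IMG_TOKEN_NUM from the constants file
output_img_id = 32000                 # hypothetical id of the first image token
all_img_tokens = torch.arange(output_img_id, output_img_id + img_token_num)

# Suppose the LLM has just generated the first output image token.
new_token_ids = torch.tensor([[1, 5, 9, output_img_id]])

# Locate the first output image token in the generated ids.
special_token_index = (new_token_ids == output_img_id).nonzero()

if len(special_token_index) != 0:
    # Append the remaining image tokens directly after the first one, so the
    # hidden states gathered at these positions (t2i_input_embedding) end up
    # with length img_token_num.
    col = special_token_index[0, 1]
    new_token_ids = torch.cat(
        [new_token_ids[:, : col + 1], all_img_tokens[1:].unsqueeze(0)],
        dim=1,
    )
    # The tail of the sequence now holds all img_token_num image tokens.
    assert new_token_ids.shape[1] - col == img_token_num
```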
Thank you for your reply. I would like to ask what IMG_TOKEN_NUM in Constants.py represents. I noticed it is set to 8.
I would also like to ask: since I want to use MiniGPT-5 to batch-generate data, which parameters in the model do I need to reset when starting to process a new sample? I look forward to your reply.
I have encountered the same problem. Have you resolved it?
I hope you're doing well! I’ve been working with your code, and I’ve encountered an issue when executing the following assertion:
assert t2i_input_embedding.shape[1] == self.img_token_num
The error occurs because t2i_input_embedding.shape[1] is 1, but self.img_token_num is set to 8, causing the assertion to fail. I’m not sure why the second dimension of t2i_input_embedding would be 1 when I expect it to match self.img_token_num.
Could you help clarify the intended shape of t2i_input_embedding here? Is there any specific preprocessing step or reshaping operation I might have missed that would result in this discrepancy?
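A quick check just before the assertion (a hedged debugging sketch; the variable names are assumed from mini_gpt5.py, and these lines would be pasted inside the model method right before the failing assert) should show whether the output image token was ever generated, which would explain the second dimension being 1:

```python
# Hedged debugging sketch: paste immediately before the assertion in the
# MiniGPT5 code. Variable names (t2i_input_embedding, new_token_ids,
# self.output_img_id, self.img_token_num) are assumed from mini_gpt5.py.
print("t2i_input_embedding shape:", t2i_input_embedding.shape)
print("expected img_token_num:", self.img_token_num)
print("output_img_id generated:",
      bool((new_token_ids == self.output_img_id).any()))
```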