assert t2i_input_embedding.shape[1] == self.img_token_num #56

Open
gsfsdv opened this issue Dec 12, 2024 · 4 comments

Comments

gsfsdv commented Dec 12, 2024

I hope you're doing well! I’ve been working with your code, and I’ve encountered an issue when executing the following assertion:
assert t2i_input_embedding.shape[1] == self.img_token_num
The error occurs because t2i_input_embedding.shape[1] is 1, but self.img_token_num is set to 8, causing the assertion to fail. I’m not sure why the second dimension of t2i_input_embedding would be 1 when I expect it to match self.img_token_num.

Could you help clarify the intended shape of t2i_input_embedding here? Is there any specific preprocessing step or reshaping operation I might have missed that would result in this discrepancy?

Collaborator

KzZheng commented Dec 12, 2024

t2i_input_embedding holds the output hidden states of the LLM, which are used in the later diffusion process. I just ran all the code on a new machine, and the demo works well. I'm not sure why its shape is 1 in your case, since this should be handled inside the prepare_inputs_for_generation function of the MiniGPT5 class, at this check:

if new_token_ids == self.output_img_id:

If len(special_token_index) is not zero, it means the first output image token has been generated. In other words, during LLM generation, new_token_ids == self.output_img_id should be True, and all_img_tokens should be appended to this first image token, which makes t2i_input_embedding the same length as img_token_num.
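
The mechanism described above can be sketched roughly as follows. This is only an illustration in plain Python, not the repository's code: the token ids and the helper names extend_with_image_tokens and image_token_positions are hypothetical, and only the value 8 for IMG_TOKEN_NUM comes from this thread.

```python
IMG_TOKEN_NUM = 8  # value reported in this thread (Constants.py)

# Hypothetical token ids, chosen only for illustration.
ALL_IMG_TOKEN_IDS = list(range(1000, 1000 + IMG_TOKEN_NUM))
OUTPUT_IMG_ID = ALL_IMG_TOKEN_IDS[0]  # assumed: the first image token


def extend_with_image_tokens(generated_ids, new_token_id):
    """Append new_token_id; if it is the first image token,
    also append the remaining image tokens right after it."""
    out = list(generated_ids) + [new_token_id]
    if new_token_id == OUTPUT_IMG_ID:
        out.extend(ALL_IMG_TOKEN_IDS[1:])
    return out


def image_token_positions(ids):
    """Return the indices of every image token in the sequence."""
    img_set = set(ALL_IMG_TOKEN_IDS)
    return [i for i, t in enumerate(ids) if t in img_set]


# With the extension step, all image token positions are present.
ids = extend_with_image_tokens([5, 6, 7], OUTPUT_IMG_ID)
positions = image_token_positions(ids)
assert len(positions) == IMG_TOKEN_NUM

# Without it, only the first image token is present.
assert len(image_token_positions([5, 6, 7, OUTPUT_IMG_ID])) == 1
```

If the extension step is skipped for some reason, only one image token position exists, which would be consistent with the reported shape of 1. That is a guess from the description above, not a confirmed diagnosis.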

Author

gsfsdv commented Dec 16, 2024

Thank you for your reply. I would like to ask what the IMG_TOKEN_NUM in Constants.py represents. I noticed it is set to 8.

Author

gsfsdv commented Dec 16, 2024

I would also like to ask: I want to use MiniGPT-5 to batch-generate data. Which parameters in the model do I need to reset when starting to process a new sample? I look forward to your reply.

@hlchen23

I have run into the same problem. Have you resolved it?
