
How to pretrain the vision encoder? #124

Open
Chunchunwumu opened this issue Nov 16, 2024 · 1 comment

@Chunchunwumu

We need to continue unified structure learning on the DocOwl1.5-stage1 model with some private data, followed by LoRA fine-tuning for the Document Parsing task. Following the original paper, we modified the parameters in the 'finetune-docowl.sh' script, setting tune_vision2text=True, freeze_vision_model=False, and freeze_base_model=True to perform unified structure learning. After this stage, the model was able to run inference normally. We then fine-tuned the model for the 'Document Parsing' task with the 'finetune-docowl_lora.sh' script to further improve its performance. During this fine-tuning, the loss decreased as expected, but after applying LoRA the model's inference outputs became garbled. By contrast, we were able to achieve the desired results by applying LoRA fine-tuning directly to the DocOwl1.5-stage1 model.
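For reference, these are the three flags we changed in 'finetune-docowl.sh' for the unified structure learning stage (only these flags are from our change; the surrounding launcher command and other arguments are whatever the repo's script ships with):

```shell
# Unified structure learning stage: train the vision-to-text module and
# the vision encoder, keep the base language model frozen.
--tune_vision2text True \
--freeze_vision_model False \
--freeze_base_model True \
```

The subsequent LoRA stage used 'finetune-docowl_lora.sh' unmodified.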

We would appreciate any suggestions you might have regarding our experimental design to help us achieve the expected results.

@snow-like-kk

Hi, may I ask about your fine-tuning environment? I used CUDA 12.4, but it doesn't seem to work.
