
How to pretrain the vision encoder? #124

Open
Chunchunwumu opened this issue Nov 16, 2024 · 1 comment

@Chunchunwumu

We need to continue unified structure learning on the DocOwl1.5-stage1 model with some private data, followed by LoRA fine-tuning for the Document Parsing task. Following the original paper, we modified the parameters in the 'finetune-docowl.sh' script, setting tune_vision2text=True, freeze_vision_model=False, and freeze_base_model=True to perform unified structure learning. After this stage, the model was able to run inference normally. We then fine-tuned the model for the 'Document Parsing' task with the 'finetune-docowl_lora.sh' script to further improve its performance. During this fine-tuning, the loss decreased as expected, but after applying LoRA the model's inference outputs became garbled. By contrast, we were able to achieve the desired results by applying LoRA fine-tuning directly to the DocOwl1.5-stage1 model.
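For reference, these are the three flags we changed in 'finetune-docowl.sh' for the unified structure learning stage (only these flags are from our change; the surrounding launcher command and other arguments are whatever the repo's script ships with):

```shell
# Unified structure learning stage: train the vision-to-text module and
# the vision encoder, keep the base language model frozen.
--tune_vision2text True \
--freeze_vision_model False \
--freeze_base_model True \
```

The subsequent LoRA stage used 'finetune-docowl_lora.sh' unmodified.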

We would appreciate any suggestions you might have regarding our experimental design to help us achieve the expected results.

@snow-like-kk

Hi, may I ask about your fine-tuning environment? I used CUDA 12.4, but it doesn't seem to work.
