How to use models with timm's arch #685

chchch0109 · 2023-10-08T00:47:26Z

Hi, I want to use the whole CLIP model to build a zero-shot head and then convert the vision part to a timm model. How can I build the model, and are there any pretrained weights I can use?

Answered by rsomani95

rsomani95 · 2023-10-16T20:32:09Z

@ChChCh8 this is already doable. Look at any of the convnext_* models; they all come from timm. Here's an example config:

open_clip/src/open_clip/model_configs/convnext_base_w.json

Lines 1 to 19 in e7b39e4

    
           { 
        
               "embed_dim": 640, 
        
               "vision_cfg": { 
        
                   "timm_model_name": "convnext_base", 
        
                   "timm_model_pretrained": false, 
        
                   "timm_pool": "", 
        
                   "timm_proj": "linear", 
        
                   "timm_drop": 0.0, 
        
                   "timm_drop_path": 0.1, 
        
                   "image_size": 256 
        
               }, 
        
               "text_cfg": { 
        
                   "context_length": 77, 
        
                   "vocab_size": 49408, 
        
                   "width": 640, 
        
                   "heads": 10, 
        
                   "layers": 12 
        
               } 
        
           }
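
As a rough illustration of how to spot such configs, the sketch below parses the JSON above and reads the timm trunk name from it. The helper name `timm_trunk_name` is made up for this example and is not part of open_clip; it only assumes, as in the config shown, that timm-backed vision towers carry a `timm_model_name` key under `vision_cfg`.

```python
import json
from typing import Any, Mapping, Optional

# Copy of the convnext_base_w.json config quoted above.
CONFIG_TEXT = """
{
    "embed_dim": 640,
    "vision_cfg": {
        "timm_model_name": "convnext_base",
        "timm_model_pretrained": false,
        "timm_pool": "",
        "timm_proj": "linear",
        "timm_drop": 0.0,
        "timm_drop_path": 0.1,
        "image_size": 256
    },
    "text_cfg": {
        "context_length": 77,
        "vocab_size": 49408,
        "width": 640,
        "heads": 10,
        "layers": 12
    }
}
"""


def timm_trunk_name(cfg: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical helper: return the timm model name if the vision tower
    is timm-backed (has a `timm_model_name` key), otherwise None."""
    return cfg.get("vision_cfg", {}).get("timm_model_name")


cfg = json.loads(CONFIG_TEXT)
print(timm_trunk_name(cfg))  # prints: convnext_base
```

A config whose `vision_cfg` has no `timm_model_name` key makes the helper return `None`.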

rwightman · 2023-10-20T23:37:09Z

Maintainer

To add to @rsomani95's answer: all of these require a bit of massaging, but it's not much code.

Any model that uses a timm vision encoder is quite easy to load back into timm: if model.visual.trunk is removed from the state dict keys, they match the timm model.
The built-in ViT model is pretty easy to load as well. There is some code in the vision_transformer.py file in timm to help load OpenCLIP (same as OpenAI) checkpoint keys: https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/vision_transformer.py#L903C6-L903C7
Adding the zero-shot classifier is also not that difficult. If you use the fn here to build the classifier: https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/zero_shot_classifier.py#L21C23-L21C23 ... you can grab the weight (I think you need to transpose it to load it into an nn.Linear? I forget), init the bias as 0, and then map that into the state dict as the head of a timm classification model, merging it with the state dict from either path above.
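
The key remapping described above can be sketched with plain dictionaries. This is a minimal illustration, not code from open_clip or timm: the function names are hypothetical, the `visual.trunk.` prefix is assumed to be how the trunk keys appear in the OpenCLIP state dict, and `head.fc.` is an assumed name for the classifier layer (the real name depends on the timm architecture). Since nn.Linear stores its weight as (out_features, in_features), a classifier weight laid out as (embed_dim, num_classes) would need transposing before being passed in. The sketch also ignores any projection layer between the trunk and the embedding (the config's `timm_proj`), which would have to be accounted for separately.

```python
from collections import OrderedDict
from typing import Any, Mapping

# Assumption: trunk parameters appear under this prefix in the OpenCLIP state dict.
TRUNK_PREFIX = "visual.trunk."


def extract_timm_trunk(
    clip_state_dict: Mapping[str, Any], prefix: str = TRUNK_PREFIX
) -> "OrderedDict[str, Any]":
    """Keep only the trunk entries and strip the prefix from their keys,
    so the keys line up with a bare timm model."""
    return OrderedDict(
        (key[len(prefix):], value)
        for key, value in clip_state_dict.items()
        if key.startswith(prefix)
    )


def add_classifier_head(
    trunk_state_dict: Mapping[str, Any],
    weight: Any,
    bias: Any,
    head_prefix: str = "head.fc.",  # assumption: varies by timm architecture
) -> "OrderedDict[str, Any]":
    """Return a copy of the trunk state dict with head weight and bias added.
    The caller is responsible for giving `weight` the (num_classes, in_features)
    layout and for passing a zero-initialised `bias`."""
    merged = OrderedDict(trunk_state_dict)
    merged[head_prefix + "weight"] = weight
    merged[head_prefix + "bias"] = bias
    return merged


# Toy demonstration with placeholder values standing in for tensors.
clip_sd = {
    "logit_scale": "t0",
    "visual.trunk.stem.0.weight": "t1",
    "visual.head.proj.weight": "t2",
    "token_embedding.weight": "t3",
}
trunk_sd = extract_timm_trunk(clip_sd)
full_sd = add_classifier_head(trunk_sd, weight="W", bias="b")
print(list(full_sd))  # prints: ['stem.0.weight', 'head.fc.weight', 'head.fc.bias']
```

With real models, `clip_sd` would be the state dict of the OpenCLIP model, and the merged result would be loaded into a timm model created with a matching number of classes.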

rwightman · 2023-10-20T23:41:00Z

Maintainer

I will point out that I found it better to fine-tune the headless model (at least with ImageNet as the target) than to use a zero-shot head. Also, starting the fine-tune from the zero-shot head did not seem to help much: training was shorter, but I found it difficult to get a better end result. However, that could change with smaller target datasets...

1 reply

chchch0109 Oct 21, 2023
Author

Thanks for your detailed reply! Yes, I know it's better to fine-tune the headless model, but I need to keep the dimension of the fine-tuned model the same as the zero-shot model, so I have to fine-tune with a zero-shot head.
