MeZo Forward Pass Implementation #601
This implementation is very interesting IMO and fits the goals of the PEFT library. Does anyone want to give it a try? |
I would say it's better as a third-party showcase. I don't think it makes sense to add it as a core feature of the library yet. Maybe, based on adoption and community response, we could revisit this. |
Hi @younesbelkada @sayakpaul! I am also interested in this change. From my perspective, there are two ways to implement it: 1) it would be nice to add a showcase example of how to combine peft with their trainer, e.g. in the |
Hello everyone, I have read the paper today. I concur with Sayak that before investing effort into a custom Trainer for it, we need to gauge the interest in and performance of MeZO. I like approach 1 suggested by @Bearnardd as a starting point. |
I completely agree with your thoughts, @pacman100. Nowadays, numerous new ideas and papers emerge daily, making it impractical to dedicate time to including each one without first assessing the community's interest and the actual performance of a particular solution. A simple example is a pretty cheap alternative :) |
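For context, the core of MeZO is a zeroth-order update built from two forward passes: perturb the trainable parameters along a random direction, measure the loss difference, and step along that direction. Below is a minimal sketch of that step; `mezo_step`, `zo_perturb`, and the hyperparameter values are illustrative, not an existing PEFT or MeZO API, and the reference implementation lives in the linked `trainer.py`.

```python
# Minimal sketch of a MeZO-style zeroth-order step (two forward passes, no backward).
# Names and hyperparameters are illustrative; see princeton-nlp/MeZO for the real trainer.
import torch

def zo_perturb(params, seed, eps, scale):
    # Regenerate the same Gaussian direction z from the seed so z is never stored.
    torch.manual_seed(seed)
    for p in params:
        z = torch.randn(p.shape, device=p.device, dtype=p.dtype)
        p.data.add_(scale * eps * z)

def mezo_step(model, inputs, loss_fn, lr=1e-6, eps=1e-3):
    params = [p for p in model.parameters() if p.requires_grad]
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    zo_perturb(params, seed, eps, +1.0)              # theta + eps * z
    with torch.no_grad():
        loss_plus = loss_fn(model, inputs)
    zo_perturb(params, seed, eps, -2.0)              # theta - eps * z
    with torch.no_grad():
        loss_minus = loss_fn(model, inputs)
    zo_perturb(params, seed, eps, +1.0)              # restore theta

    grad_est = (loss_plus - loss_minus) / (2 * eps)  # projected gradient estimate
    torch.manual_seed(seed)                          # regenerate the same z for the update
    for p in params:
        z = torch.randn(p.shape, device=p.device, dtype=p.dtype)
        p.data.add_(-lr * grad_est * z)              # SGD step along the noise direction
    return loss_plus
```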
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
Aw, please implement MeZO, guys. This could allow near-lossless, compressed, fully fine-tuned adapters at true inference cost. Poor man's training would no longer have an inferior connotation. Please @sayakpaul @pacman100 @Bearnardd ? |
Yes please |
I got this to work with bnb: training a 3B model with 5 GB of RAM using forward passes (effectively speeding up my training, since there's no need to do backward passes). I think it's worthwhile to integrate this, especially since it works alongside bnb. |
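For anyone wanting to reproduce a setup like the one described above, here is a hedged sketch of combining a bitsandbytes 4-bit base model with a PEFT LoRA adapter, so that only the non-quantized adapter weights are trainable; those are the parameters a MeZO-style step would perturb. The model id and LoRA hyperparameters are placeholders.

```python
# Sketch: 4-bit bitsandbytes base model + LoRA adapter from PEFT.
# Only the LoRA weights require grad, so a zeroth-order step touches just those tensors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "openlm-research/open_llama_3b"  # placeholder ~3B causal LM
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable

trainable = [p for p in model.parameters() if p.requires_grad]  # what MeZO would perturb
```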
From 90 it/s to 60 it/s with MeZO |
Nice! @thistleknot When you say "bnb", do you mean 4-bit? And when you say "from 90 it/s to 60 it/s with MeZO", is 90 referring to single-forward-pass speeds without MeZO? |
Yes and yes
|
@thistleknot would the finetuned model export as nf4? Would MeZo's predictions be made in nf4 too? How does that work? |
Yes and yes. The MeZO function is simply a feed-forward (I'm paraphrasing) compute_loss function, but it's plug-and-play with the LoRA setup, with no integration qualms that I can see. I was trying to get it to work with GPTQ but was unable to get a working setup, and I'm reading that even if I did, it would still be limited to a LoRA adapter. So the best you can get with this without peft/LoRA is simply faster training. Shame ReLoRA isn't extended to anything other than llama. |
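The "plug and play" point can be illustrated as a custom Trainer whose training step runs only forward passes. This is a rough sketch using the `mezo_step` helper sketched earlier (not a Transformers or PEFT API), in the spirit of the custom trainer in the MeZO repo; the real MeZO trainer also bypasses the regular optimizer/backward machinery, which this sketch glosses over.

```python
# Rough sketch: override Trainer.training_step to do a forward-only MeZO-style update.
# `mezo_step` is the illustrative helper above; the training_step signature may differ
# slightly across transformers versions, and the real MeZO trainer also disables the
# standard optimizer/backward path.
from transformers import Trainer

class MeZOTrainer(Trainer):
    def training_step(self, model, inputs):
        model.eval()  # no autograd graph needed; we only run forward passes
        inputs = self._prepare_inputs(inputs)

        def loss_fn(m, batch):
            return m(**batch).loss

        loss = mezo_step(model, inputs, loss_fn)  # two forward passes + in-place update
        return loss.detach()
```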
Great work, man, @thistleknot. Shame indeed, ReLoRA has so much potential in distributing dense weights. Just wondering, do you think MeZO could fully fine-tune in fp8? I'm not even sure the HF format allows fp8 safetensors. If so, 2x less memory would be nice with little to no loss. I've already got Flash Attention 2 working :) So close to fully fine-tuning larger models on consumer hardware. |
You can fine-tune with a lower fp, but... you can't save the model! Hence, I'm stuck with LoRA. I tried auto-gptq but couldn't get it to work, and I don't see any instructions on how to save a GPTQ model; Transformers doesn't support it yet. If you know the process, I'm all ears, because I'd much rather not use a LoRA adapter, but it is what it is. I'm using layers of training to accommodate the fact that I have to use LoRA.
|
I think it would be quite tedious to change and save GPTQ models on the fly. |
bigscience-workshop/petals#273 I was reminded of this PR, which managed to save 8-bit models :) and learned that PyTorch is working on e5m2 and e4m3 FP8 implementations, which should be better suited to transformer models. |
Got to wait for it to be merged. For now, I'm using a fancy layering setup. I was thinking I would make LoRA optional (controlled by a parameter) once I figure out the 'best' solution (I would prefer to quantize weights, use MeZO, and save quantized weights, but as you know that isn't fully implemented yet). In the meantime, for the sake of my sanity and testing, I'm simply using 1 document + 1 use case (SQuAD) with LoRA (16GB P5200 in an m7730, best $600 I ever spent). Then maybe one day a magically better method will come along, like QLoRA or saving GPTQ models. |
Feature request
https://github.com/princeton-nlp/MeZO/blob/main/large_models/trainer.py
Motivation
Memory efficiency while training
Your contribution
Willing to test, train, and write up bug reports.