-
It's in there, just try it out. What it does is compute the norm over all parameter gradients and, if it exceeds the threshold, scale them down. You can think of it as a kind of dynamic learning rate: whenever the model gets excited and wants to change something, the learning rate is scaled down. This prevents the model from navigating through an unstable loss landscape. A max grad norm of 0.01 sounds extremely low to me (as the norm is, as far as I understand, applied BEFORE the learning rate). So if the learning rate is, say, 1e-4 and max_grad_norm is 1e-2, then the maximum a parameter could change per step is 1e-6 (realistically, it's more like 1e-8). To me it sounds like nothing would happen anymore (in particular if the parameters are low-precision bf16). On the other hand: I absolutely agree that Flux training is extremely unstable. So maybe it helps, who knows. If you try it, please share your results/experiences!
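For anyone wondering what this looks like mechanically, here is a minimal, hypothetical PyTorch training-loop sketch (toy model, optimizer, and data, not the trainer's actual code) showing where gradient-norm clipping sits relative to the optimizer step:

```python
import torch

# Toy stand-ins so the snippet runs; the real trainer's model/optimizer differ.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
max_grad_norm = 0.01  # the value discussed above

for _ in range(10):
    x = torch.randn(8, 16)
    loss = model(x).pow(2).mean()
    loss.backward()

    # clip_grad_norm_ computes the global L2 norm over all gradients and, if it
    # exceeds max_grad_norm, rescales every gradient by max_grad_norm / total_norm.
    # This happens BEFORE optimizer.step(), i.e. before the learning rate is applied.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

    optimizer.step()
    optimizer.zero_grad()
```

The key point is that clipping rescales gradients before the learning rate is applied, which is why a very small `max_grad_norm` combined with a small learning rate can shrink per-step updates to nearly nothing.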
-
I changed the default loss style for the flow-matching models to match Huawei's method, which overlaps with the minRF and x-flux implementations. If you could, I would suggest redoing a run with the same settings, plus ^ this change.
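For reference, here is a rough, generic sketch of a rectified-flow / flow-matching style loss of the kind used by minRF-style implementations (velocity prediction on a linear interpolation between data and noise). The model signature and the toy usage are assumptions for illustration, not the trainer's exact code:

```python
import torch

def flow_matching_loss(model, x0, noise=None):
    """Rectified-flow style loss sketch: predict the velocity (noise - data) at a
    point sampled linearly between data and noise. Generic illustration only."""
    if noise is None:
        noise = torch.randn_like(x0)
    # Sample timesteps uniformly in [0, 1], broadcastable over the data dims.
    t = torch.rand(x0.shape[0], device=x0.device).view(-1, *([1] * (x0.dim() - 1)))
    # Linear interpolation between data (t=0) and noise (t=1).
    x_t = (1 - t) * x0 + t * noise
    target = noise - x0          # the velocity field the model should predict
    pred = model(x_t, t)         # hypothetical model signature
    return torch.nn.functional.mse_loss(pred, target)

# Toy usage with a stand-in model so the sketch runs end to end.
toy = lambda x, t: x * 0.0
print(flow_matching_loss(toy, torch.randn(4, 3, 8, 8)))
```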
-
The run with … I'll share my config just for funsies.

We might want to mark this thread as NSFW if you want to see any validation images, or I can post them somewhere else. I've been documenting some of the progress on Civitai (NSFW), but the images are pure nightmare fuel.

I will say that it got to basically the exact same validation images with rank-8 at 9k steps as with rank-4 at 3k steps, so I suppose I will continue with the lower rank. It's just that it makes the shape of the appendage with no further detail being added, but the added flag does seem to avoid the ball of limbs, from what I can see in my limited high-step-count tests.

I need to slow down though; I've spent a few hundred so far on running all these tests, so I might just wait for someone else to bankroll a good LoRA. Godspeed.
-
Hello, I have been training Flux LoRAs for a few days and I came across a discussion about using the flag `--max_grad_norm=0.01` for stabilizing training with some models. Is there a layman's explanation of what it does?

For some context, I'm trying to train new anatomical features in Flux (i.e. extra limbs/appendages), and while they are appearing in the validation images to a certain extent, they are always small and deformed. I was curious whether this stabilization could be applied to Flux as well, or whether it would simply prevent the model from learning. At rank 4, 8, and 16, up to 10k steps without this arg, the results are simply not very close, and past that I just get a ball of limbs. So perhaps this is just not possible. I would imagine gradient clipping (?) would make the training take longer, but I am not very well versed in the technical details.
Any help or guidance is appreciated :)