Hello. I have a request to separate the two operations (backward and optimizer step). Right now, merging them prevents using a DeepSpeed feature for accessing gradients in a model. See this issue: microsoft/DeepSpeed#5199 (comment).
My specific use case is fine-tuning a subset of weights in the embedding and lm_head modules when initializing new tokens. I wanted to implement this logic in the on_pre_optimizer_step callback in the transformers library, which runs right between backward and step.
There I want to use safe_get_full_grad from DeepSpeed to read the gradients. It must be called after backward and before step. But since accelerator.backward (usage link in transformers.trainer training loop) merges these two operations (here) when used with DeepSpeed, calling safe_get_full_grad is impossible.
Here is a small example to reproduce this issue. Please also see the DeepSpeed issue linked above for a discussion about it.
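For illustration, here is a minimal sketch of the kind of callback I want to write (not the original reproduction script; the module names and `new_token_ids` are hypothetical, and it assumes the model is passed to the callback via kwargs as the Trainer's CallbackHandler does):

```python
from deepspeed.utils import safe_get_full_grad
from transformers import TrainerCallback


class NewTokenGradCallback(TrainerCallback):
    """Inspect gradients of newly initialized token rows right before optimizer.step()."""

    def __init__(self, new_token_ids):
        # hypothetical: indices of the freshly added tokens
        self.new_token_ids = new_token_ids

    def on_pre_optimizer_step(self, args, state, control, **kwargs):
        model = kwargs["model"]
        for name, param in model.named_parameters():
            # module names are architecture-dependent; adjust for your model
            if "embed_tokens" in name or "lm_head" in name:
                # safe_get_full_grad must be called after backward() and before step();
                # with accelerator.backward() under DeepSpeed both happen inside one
                # call, so there is no point in the loop where this works today.
                grad = safe_get_full_grad(param)
                if grad is not None:
                    print(name, grad[self.new_token_ids].norm())
```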
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.