Hello. I have a request to separate the two operations (backward and optimizer step). Right now, merging them prevents using a DeepSpeed feature for accessing gradients in a model. See this issue: microsoft/DeepSpeed#5199 (comment).
My specific use case is fine-tuning a subset of weights in the embedding and lm_head modules when initializing new tokens. I wanted to implement this logic in the on_pre_optimizer_step callback in the transformers library, which runs right between backward and step.
There I want to use safe_get_full_grad from DeepSpeed to read the gradients. It must be called after backward and before step. But since accelerator.backward (usage link in transformers.trainer training loop) merges these two operations (here) when used with DeepSpeed, calling safe_get_full_grad is impossible.
Here is a small example to reproduce this issue. Please also see the DeepSpeed issue linked above for a discussion about it.
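For illustration, here is a minimal sketch of the kind of callback I want to write (not the original reproduction script; the module names and `new_token_ids` are hypothetical, and it assumes the model is passed to the callback via kwargs as the Trainer's CallbackHandler does):

```python
from deepspeed.utils import safe_get_full_grad
from transformers import TrainerCallback


class NewTokenGradCallback(TrainerCallback):
    """Inspect gradients of newly initialized token rows right before optimizer.step()."""

    def __init__(self, new_token_ids):
        # hypothetical: indices of the freshly added tokens
        self.new_token_ids = new_token_ids

    def on_pre_optimizer_step(self, args, state, control, **kwargs):
        model = kwargs["model"]
        for name, param in model.named_parameters():
            # module names are architecture-dependent; adjust for your model
            if "embed_tokens" in name or "lm_head" in name:
                # safe_get_full_grad must be called after backward() and before step();
                # with accelerator.backward() under DeepSpeed both happen inside one
                # call, so there is no point in the loop where this works today.
                grad = safe_get_full_grad(param)
                if grad is not None:
                    print(name, grad[self.new_token_ids].norm())
```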
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.