
modify_load_save_model #6626

Open · wants to merge 2 commits into master

Conversation

@ssklzx commented Oct 15, 2024

```python
accelerator = Accelerator()
model, optimizer, data = accelerator.prepare(model, optimizer, data)
device_map = {}
model = accelerate.dispatch_model(model, device_map=device_map)
accelerator.save_state(save_path)
```

When I call `accelerate.dispatch_model` after `accelerator.prepare`, an error is raised when saving the model.
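A minimal self-contained version of the repro, for reference (the model, optimizer, `device_map` contents, and save path are illustrative placeholders, not taken from my actual script):

```python
import torch
from accelerate import Accelerator, dispatch_model

# Toy model/optimizer as stand-ins for the real training objects.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

accelerator = Accelerator()
model, optimizer = accelerator.prepare(model, optimizer)

# Illustrative device_map: relocate the whole module after prepare().
device_map = {"": "cuda:1"}
model = dispatch_model(model, device_map=device_map)

# With the DeepSpeed plugin enabled, this call fails, since the engine
# still references the pre-dispatch parameter objects.
accelerator.save_state("checkpoint/")
```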

@tjruwase (Contributor)
@ssklzx, thanks for creating this PR. However, I think you misunderstood my response in #6620 (comment).

What I meant is that we need to debug further to understand why some parameters are missing from `self.param_names`. Are you able to provide a full repro?

@ssklzx (Author) commented Oct 16, 2024

> @ssklzx, thanks for creating this PR. However, I think you misunderstood my response in #6620 (comment).
>
> What I meant is that we need to debug further to understand why some parameters are missing from `self.param_names`. Are you able to provide a full repro?

Because after `self.param_names` is initialized, I change the placement of the parameters, for example moving them from `cuda:0` to `cuda:1`, so the moved parameters can no longer be found in `self.param_names`.

For example:

```python
model, optimizer, data = accelerator.prepare(model, optimizer, data)  # initializes self.param_names
model = accelerate.dispatch_model(model, device_map=device_map)       # changes parameter placement
accelerator.save_state(save_path)                                     # raises the error
```
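A sketch of why the lookup breaks, assuming `self.param_names` is a dict keyed by parameter object identity (a simplification of DeepSpeed's actual bookkeeping):

```python
import torch

model = torch.nn.Linear(4, 4)

# DeepSpeed-style bookkeeping: map each parameter object to its name.
param_names = {param: name for name, param in model.named_parameters()}

# Relocating a parameter produces a new tensor object; here a clone()
# stands in for a cross-device move such as cuda:0 -> cuda:1.
model.weight = torch.nn.Parameter(model.weight.data.clone())

for name, param in model.named_parameters():
    if param not in param_names:
        print(f"{name} is no longer in param_names")  # prints for 'weight'
```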

@tjruwase (Contributor)

> Because after `self.param_names` is initialized, I change the placement of the parameters, for example moving them from `cuda:0` to `cuda:1`, so the moved parameters can no longer be found in `self.param_names`.

@ssklzx, thanks for the clarification. I think the correct solution here is for accelerate and DeepSpeed to coordinate to ensure that DeepSpeed is aware of new parameter locations, including updating `self.param_names`.
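As a rough sketch of what that coordination could look like (`refresh_param_names` is a hypothetical helper, not an existing DeepSpeed or accelerate API):

```python
def refresh_param_names(engine, model):
    # Hypothetical helper: rebuild the engine's parameter-to-name map
    # after parameters have been moved or replaced (e.g. by dispatch_model).
    engine.param_names = {param: name for name, param in model.named_parameters()}

# Usage sketch:
#   model = dispatch_model(model, device_map=device_map)
#   refresh_param_names(engine, model)
```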
