
Add support for model-parallel training #28

Open
davmacario opened this issue Apr 7, 2024 · 1 comment
Labels
enhancement — New feature or request
extras — Not directly related to the thesis, low priority
feature request — Ask for a new feature to be implemented

Comments

@davmacario
Owner

The main limitation of LLMs is their huge model size; moreover, during training, the VRAM/RAM needed to store the model plus the gradients and optimizer state is much higher than during inference.
As a result, it is possible to run inference with GPT-2 XL on a single Nvidia GTX 1080 Ti (11 GB of VRAM), but not to train it.
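As a rough back-of-the-envelope estimate (assuming fp32 weights and Adam): GPT-2 XL has about 1.5 B parameters, so the weights alone take ~6 GB; adding gradients (~6 GB) and the two Adam moment buffers (~12 GB) already gives ~24 GB before activations, well beyond the 11 GB of a GTX 1080 Ti, whereas inference only needs the ~6 GB of weights plus activations.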

On a multi-GPU system, DistributedDataParallel does not solve the issue: since only the data is parallelized, each device still has to fit the whole model.

To use model parallelism, a possible working approach is to create a new class (ModelParallelGPT) that inherits from the original model class (GPT) but assigns different pieces of the model to different devices on the host (a rough sketch of the idea is below).
This trivial partition is far less efficient than MDI at the inference stage (it does not allow pipelining, so only one GPU is active at a time), but it is the only way to train the model.
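A minimal sketch of the idea, assuming a nanoGPT-style model whose blocks take only the hidden states as input; the class name `TwoGPUSketch`, its constructor arguments, and the fixed `cuda:0`/`cuda:1` split are placeholders, not the actual ModelParallelGPT implementation:

```python
import torch
import torch.nn as nn


class TwoGPUSketch(nn.Module):
    """Naive model parallelism: first half of the blocks on cuda:0, rest on cuda:1."""

    def __init__(self, wte, wpe, blocks, ln_f, lm_head):
        super().__init__()
        n = len(blocks) // 2
        # Token/position embeddings and the first half of the blocks on GPU 0.
        self.wte = wte.to("cuda:0")
        self.wpe = wpe.to("cuda:0")
        self.first_half = nn.Sequential(*blocks[:n]).to("cuda:0")
        # Remaining blocks, final layer norm and LM head on GPU 1.
        self.second_half = nn.Sequential(*blocks[n:]).to("cuda:1")
        self.ln_f = ln_f.to("cuda:1")
        self.lm_head = lm_head.to("cuda:1")

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device="cuda:0")
        x = self.wte(idx.to("cuda:0")) + self.wpe(pos)
        x = self.first_half(x)
        # The only cross-device step: activations hop from GPU 0 to GPU 1.
        x = self.second_half(x.to("cuda:1"))
        return self.lm_head(self.ln_f(x))
```

The training loop stays a plain single-process loop: the optimizer is built from `model.parameters()` as usual (the parameters simply live on two devices), the targets are moved to `cuda:1` before computing the loss, and autograd routes the backward pass across the device boundary automatically. As noted above, only one GPU is busy at any given time, so this buys memory, not speed.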

Another possible approach would be to check out PiPPy 👀.

davmacario added the enhancement, feature request, and extras labels on Apr 7, 2024
@davmacario
Owner Author

See model-parallel branch.
