Add support for model-parallel training #28
Labels:
- enhancement: New feature or request
- extras: Not directly related to the thesis, low priority
- feature request: Ask for a new feature to be implemented
The main limitation of LLMs is their huge model size; moreover, during training, the VRAM/RAM required to store the model plus the backpropagation state (gradients, optimizer parameters) is much higher than during inference.
As a result, it is possible to run inference with GPT-2 XL on a single Nvidia GTX 1080 Ti (11 GB VRAM), but not to train it.
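A rough back-of-the-envelope estimate (assuming ~1.5B parameters for GPT-2 XL, FP32 everywhere, a plain Adam optimizer, and ignoring activations/buffers) illustrates the gap:

```python
# Rough memory estimate for GPT-2 XL (~1.5B parameters), FP32, Adam.
# Approximate figures only; activations, buffers and fragmentation are ignored.
n_params = 1.5e9
bytes_per_param = 4  # FP32

weights = n_params * bytes_per_param          # ~6 GB
gradients = n_params * bytes_per_param        # ~6 GB
adam_states = 2 * n_params * bytes_per_param  # ~12 GB (momentum + variance)

inference_gb = weights / 1e9                             # ~6 GB  -> fits in 11 GB
training_gb = (weights + gradients + adam_states) / 1e9  # ~24 GB -> does not fit
print(f"inference ~{inference_gb:.0f} GB, training ~{training_gb:.0f} GB")
```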
On a multi-GPU system, DistributedDataParallel does not solve the issue, since each device still needs to fit the whole model; only the data is parallelized.
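For reference, a minimal DDP sketch (with a hypothetical stand-in model of roughly GPT-2 XL shape, launched via torchrun) makes the limitation explicit: every rank builds and holds the full model before wrapping it.

```python
# Minimal DDP sketch. The nn.Sequential below is a hypothetical stand-in for the
# full GPT, sized roughly like GPT-2 XL (48 layers, hidden size 1600). Each rank
# must hold the WHOLE model on its own GPU; DDP only parallelizes data/gradients.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")  # assumes launch with torchrun
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

full_model = nn.Sequential(*[nn.Linear(1600, 1600) for _ in range(48)])
full_model = full_model.to(local_rank)                # entire model on ONE GPU
ddp_model = DDP(full_model, device_ids=[local_rank])  # replicas sync gradients
```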
To use model parallelism, a possible working approach is to create a new class (`ModelParallelGPT`) that inherits from the original model class (`GPT`) but assigns a different piece of the model to a different device on the host (a minimal sketch follows below). This trivial partition is far less efficient than MDI at the inference stage (it does not allow for pipelining: only one GPU is active at a time), but it is the only way to train the model.
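A minimal sketch of the idea, assuming (not verified against this repo) that `GPT` is importable from `model.py` and exposes nanoGPT-style attributes (`transformer.wte/wpe/h/ln_f`, `lm_head`); the names are illustrative. If `lm_head.weight` is tied to `wte.weight`, the tying would need to be re-handled after the split.

```python
# Sketch of naive (non-pipelined) model parallelism for training: transformer
# blocks are split across the available GPUs and activations are moved between
# devices in forward(). Attribute names and the import path are assumptions.
import torch
from model import GPT  # assumed import path

class ModelParallelGPT(GPT):
    def __init__(self, config, devices=("cuda:0", "cuda:1")):
        super().__init__(config)  # model is first built on CPU, then scattered
        self.devices = [torch.device(d) for d in devices]
        n_blocks = len(self.transformer.h)
        per_dev = -(-n_blocks // len(self.devices))  # ceil division
        # Assign a contiguous slice of transformer blocks to each device.
        self.block_devices = []
        for i, block in enumerate(self.transformer.h):
            dev = self.devices[min(i // per_dev, len(self.devices) - 1)]
            block.to(dev)
            self.block_devices.append(dev)
        # Embeddings on the first device, final norm + head on the last one.
        self.transformer.wte.to(self.devices[0])
        self.transformer.wpe.to(self.devices[0])
        self.transformer.ln_f.to(self.devices[-1])
        self.lm_head.to(self.devices[-1])

    def forward(self, idx):
        idx = idx.to(self.devices[0])
        pos = torch.arange(idx.size(1), device=self.devices[0])
        x = self.transformer.wte(idx) + self.transformer.wpe(pos)
        # Hop the activations from GPU to GPU: only one device is active at a
        # time, which is why this is slower than MDI/pipelining at inference.
        for block, dev in zip(self.transformer.h, self.block_devices):
            x = block(x.to(dev))
        x = self.transformer.ln_f(x.to(self.devices[-1]))
        return self.lm_head(x)
```

Training would then proceed as usual (compute the loss on the last device, call `loss.backward()`, step the optimizer): autograd propagates gradients back across devices automatically.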
Another possible approach would be to check out PiPPy 👀.