While attempting to fine-tune Llama 3 70B I ran into an OOM error in the merge step. This left me in a situation where the LoRA adapter was trained and saved on the Modal volume but not yet merged, so I couldn't use the inference script. I don't want to rerun the training. A similar situation occurs when `no-merge-lora` is used during training.
How about a new argument to `launch` in `train.py`, called `merge_only`, that merges the adapter of a resumed run? This setting would only make sense if a `run_to_resume` is also passed.
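A rough sketch of what that flag could look like (the names, signature, and output path here are hypothetical; the real `launch` entrypoint in `train.py` takes more arguments, and `merge_remote` stands in for the actual `merge.remote` call):

```python
# Hypothetical sketch of a merge_only flag for train.py's launch entrypoint.

def merge_remote(run_folder, output_dir):
    # Stand-in for merge.remote(...) in train.py.
    return f"merged {run_folder} -> {output_dir}"

def launch(run_to_resume=None, merge_only=False):
    if merge_only:
        if run_to_resume is None:
            raise ValueError("merge_only requires run_to_resume")
        # Skip training entirely; just merge the adapter already
        # saved on the volume for the resumed run.
        return merge_remote(f"/runs/{run_to_resume}", "lora-out")
    # ... normal training path (elided) ...
    return "training"
```

With this in place, `launch(run_to_resume="my-run", merge_only=True)` would go straight to the merge step and skip training.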
Alternatively, an explanation in the README of how to call the `merge` function separately from `train.py` would help others in this situation.
This worked for me:
`merge.py`:

```python
from .common import app
from .train import merge


@app.local_entrypoint()
def run_merge(run_folder, output_dir):
    merge.remote(run_folder, output_dir)
```
and to run:

```shell
modal run src.merge --run-folder /runs/<run-name> --output-dir lora-out
```
On the merge OOM error: solved by setting `CUDA_VISIBLE_DEVICES=""` so the merge runs in system RAM instead of GPU memory.
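For anyone wanting to apply the same workaround, a minimal sketch of the idea: an empty `CUDA_VISIBLE_DEVICES` hides all GPUs from CUDA, so frameworks like PyTorch fall back to CPU and the merged weights are held in (usually much larger) system RAM. The variable must be set before the framework initializes CUDA:

```python
import os

# Hide all GPUs from CUDA so the merge falls back to CPU / system RAM.
# This must happen before torch (or peft/transformers) touches CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

Setting it in the shell before launching (`CUDA_VISIBLE_DEVICES="" python ...`) works just as well.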