Suggestion: merge-only option #66

Open
psimm opened this issue Jun 23, 2024 · 0 comments

While attempting to fine-tune Llama 3 70B, I ran into an out-of-memory (OOM) error in the merge step. That left me with a LoRA adapter that was trained and saved on the Modal volume but not yet merged, so I couldn't use the inference script, and I don't want to run the training again. A similar situation occurs when no-merge-lora is used during training.

How about adding a new argument to launch in train.py, called merge_only, that merges the adapter of a resumed run? This setting would only make sense when run_to_resume is passed.
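A rough sketch of what that could look like (hypothetical; the actual launch signature, merge call, and volume paths in train.py may differ):

from .common import app
from .train import merge


@app.local_entrypoint()
def launch(run_to_resume: str = "", merge_only: bool = False):
    if merge_only:
        # Only meaningful when resuming: the adapter must already exist on the volume.
        assert run_to_resume, "merge_only requires run_to_resume"
        # Assumed path layout, matching the /runs/<run-name> folders on the volume.
        merge.remote(f"/runs/{run_to_resume}", "lora-out")
        return
    # ... existing training logic would go here ...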

Alternatively, an explanation in the readme of how to call the merge function separately from train.py could help others in the same situation.

This worked for me:

merge.py

# merge.py — standalone entrypoint to merge a saved LoRA adapter without retraining
from .common import app
from .train import merge


@app.local_entrypoint()
def run_merge(run_folder: str, output_dir: str):
    # Reuse the existing merge Modal function from train.py on the saved run folder
    merge.remote(run_folder, output_dir)

and to run it:

modal run src.merge --run-folder /runs/<run-name> --output-dir lora-out

On the merge OOM error: I solved it by setting CUDA_VISIBLE_DEVICES="" so the merge runs on the CPU and uses system RAM instead of GPU memory.
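For example (a minimal sketch; where exactly to set the variable depends on how the merge step is launched), hiding the GPUs before CUDA is initialized forces the merge onto the CPU:

import os

# Setting this before torch initializes CUDA makes no GPUs visible,
# so the adapter merge runs on CPU and uses system RAM.
os.environ["CUDA_VISIBLE_DEVICES"] = ""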
