Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Need LoRA model in .gguf format #243

Open
bioinformatist opened this issue Sep 20, 2024 · 2 comments
Open

[Feature Request]: Need LoRA model in .gguf format #243

bioinformatist opened this issue Sep 20, 2024 · 2 comments
Labels
feature New features

Comments

@bioinformatist
Copy link

bioinformatist commented Sep 20, 2024

Feature request / 功能建议

Similar to #231, but useful

Hey my dear bros, we're building an RAG application (especially for one of our products) using MiniCPM3. Below is our stack:

Type Component
LLM MiniCPM3
Web server Shuttle | Axum
OpenAI-compatible API server llama.cpp
Vector database qdrant

It's almost done.

As MiniCPM3 comes with an RAG suite, we'd like to use the LoRA adapter for better performance, just like:

# Suppose we already have downloaded MiniCPM3-4B and MiniCPM3-RAG-LoRA-GGUF models in current directory
docker run --rm -it -p 8080:8080 -v $PWD/MiniCPM3-4B-GGUF:/models -v $PWD/MiniCPM3-RAG-LoRA-GGUF:/lora --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda -m models/minicpm3-4b-q4_k_m.gguf --host 0.0.0.0 --port 8080 --n-gpu-layers 99 -v -ub 1024 -b 4096 --lora lora/lora-adapter-fp16.gguf

And the LoRA model cannot be converted to .gguf format now as the ggerganov/llama.cpp#9396 haven't be merged:

# As ditto
docker run -it --rm --entrypoint /app/convert_lora_to_gguf.py -v $PWD/MiniCPM3-4B:/models -v $PWD/MiniCPM3-RAG-LoRA:/lora ghcr.io/ggerganov/llama.cpp:full --outtype q8_0 --base /models /lora

It said:

The repository for /models contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//models.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Or could you give us some tips for converting? Thanks a lot!

MiniCPM3 is, de facto, an ideal edge-side LLM for small companies.

@bioinformatist bioinformatist added the feature New features label Sep 20, 2024
@LDLINGLINGLING
Copy link
Contributor

Hello, I think the best solution at present is to merge the original weights of lora and minicpm3, and then start your process

@bioinformatist
Copy link
Author

Got. Let me have a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New features
Projects
None yet
Development

No branches or pull requests

2 participants