title | author |
---|---|
Semantic Search and RAG on a FOSS stack | Robert Timm |
Wikimedia Hackathon Tallinn 2024
Robert Timm
Semantic search: given a query, find texts with a similar meaning.
Retrieval-Augmented Generation (RAG): create texts based on information retrieved from external sources.
FOSS stack: all software components are released under OSI-approved licenses.
- ⛔ NVIDIA CUDA has a proprietary license
- ✅ AMD ROCm stack is MIT licensed
- `amdgpu` driver is in the Linux kernel mainline
- Encode semantics ▶ Embeddings
- Find semantically similar objects ▶ Vector Database
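Both steps can be illustrated without any server. The sketch below is a toy, not how a vector database is implemented: it compares a query embedding against every stored embedding using cosine similarity (an assumption here; the SQL example further down orders by pgvecto.rs's `<->` distance operator instead) and returns the `k` closest texts. The 3-dimensional vectors and the names `cosine_similarity` and `top_k` are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 5) -> list[str]:
    """Brute-force search: score every stored vector, keep the k most similar texts."""
    ranked = sorted(vectors, key=lambda key: cosine_similarity(query, vectors[key]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
store = {
    "cats purr": [0.9, 0.1, 0.0],
    "kittens meow": [0.8, 0.2, 0.1],
    "stock prices fell": [0.0, 0.1, 0.9],
}
print(top_k([0.85, 0.15, 0.05], store, k=2))  # ['cats purr', 'kittens meow']
```

A vector database performs the same ranking, but uses indexes so that it does not have to score every row for each query.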
Model | Dims | OSI License | Pre-train Data | Fine-tune Data |
---|---|---|---|---|
all-MiniLM-L6-v2 | 384 | ✅ Apache-2.0 | ✅ | ✅ |
nomic-embed-text-v1 | 768 | ✅ Apache-2.0 | ✅ | ✅ |
bge-large-en-v1.5 | 1024 | ✅ MIT | ⛔ | ⛔ |
mxbai-embed-large-v1 | 1024 | ✅ Apache-2.0 | ⛔ | ⛔ |
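The Dims column translates directly into storage. A rough estimate, assuming each dimension is stored as a 32-bit float and ignoring index and row overhead (`embedding_bytes` is a hypothetical helper):

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as 32-bit floats


def embedding_bytes(dims: int, count: int = 1) -> int:
    """Raw storage for `count` embeddings with `dims` float32 values each."""
    return dims * BYTES_PER_FLOAT32 * count


models = {
    "all-MiniLM-L6-v2": 384,
    "nomic-embed-text-v1": 768,
    "bge-large-en-v1.5": 1024,
    "mxbai-embed-large-v1": 1024,
}
for name, dims in models.items():
    mib = embedding_bytes(dims, count=1_000_000) / 2**20
    print(f"{name}: {dims} dims -> {mib:,.0f} MiB per million chunks")
```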
Ollama: get up and running with large language models.
- Inference engine based on llama.cpp
- Supports AMD GPU via ROCm
- CPU support (AVX, AVX2, AVX512, Apple Silicon)
- Quantization
- Model Library
- One model, one inference at a time
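Quantization stores model weights with fewer bits. The sketch shows the idea with simple symmetric 8-bit quantization of one block of values. llama.cpp's actual formats (for example its block-wise 4-bit and 8-bit types) are more elaborate, and the function names here are hypothetical.

```python
def quantize_int8(values: list[float]) -> tuple[float, list[int]]:
    """Symmetric absmax quantization: one shared scale plus integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return scale, [round(v / scale) for v in values]


def dequantize_int8(scale: float, q: list[int]) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [scale * x for x in q]


weights = [0.12, -0.5, 0.33, 0.01]
scale, q = quantize_int8(weights)  # 1 byte per value instead of 4 (float32), plus one scale
restored = dequantize_int8(scale, q)
print(q)  # [30, -127, 84, 3]
print(max(abs(a - b) for a, b in zip(weights, restored)))  # at most scale / 2
```

The trade-off is memory and speed against a small rounding error per weight, which is why quantized models fit on consumer GPUs and CPUs.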
Engine | OSI License | ROCm Support | Production |
---|---|---|---|
Ollama | ✅ MIT | ✅ | ⛔ |
llama.cpp | ✅ MIT | ✅ | ⛔ |
HF Text Embeddings Inference | ✅ Apache-2.0 | ⛔ | ✅ |
Infinity | ✅ MIT | ✅ | ✅ |
```console
$ ollama serve &
$ ollama pull nomic-embed-text
```

```python
import ollama  # pip install ollama

res = ollama.embeddings(
    model="nomic-embed-text",
    prompt="This string",
)
res["embedding"]  # [0.33, 0.62, 0.19, ...]
```
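The table further down stores `chunks`, so longer documents have to be split before each piece is embedded. One simple approach, assumed here and not taken from the talk, is a sliding window over words with some overlap. `chunk_words` and its default sizes are hypothetical, and real embedding models limit their input by tokens, not words.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; neighbouring windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window reached the end of the text
            break
    return chunks


print(chunk_words("a b c d e f g", size=3, overlap=1))  # ['a b c', 'c d e', 'e f g']
```

Each chunk could then be embedded with the call above, e.g. `[ollama.embeddings(model="nomic-embed-text", prompt=c)["embedding"] for c in chunk_words(doc)]`, where `doc` stands for your own text.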
🐘 Postgres can do it 🎉
pgvecto.rs: scalable, low-latency and hybrid-enabled vector search in Postgres.
A PostgreSQL extension written in Rust.
Component | OSI License |
---|---|
PostgreSQL | ✅ PostgreSQL License |
pgvecto.rs | ✅ Apache-2.0 |
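```sql
CREATE EXTENSION vectors;

CREATE TABLE chunks (
    text      TEXT        NOT NULL,
    embedding VECTOR(768) NOT NULL  -- nomic-embed-text-v1 has 768 dimensions
);

-- the 5 chunks closest to the query embedding
SELECT text FROM chunks
ORDER BY embedding <-> '[0.33, 0.62, 0.19, ...]'
LIMIT 5;
```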
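In an application, the query embedding comes from Ollama rather than from a hand-written literal. A small helper, hypothetical and assuming the `'[x,y,z]'` text form used in the query, turns the Python list into that string and checks that it matches the column's 768 dimensions:

```python
EXPECTED_DIMS = 768  # must match VECTOR(768) in the chunks table


def to_vector_literal(embedding: list[float], dims: int = EXPECTED_DIMS) -> str:
    """Format an embedding as a '[x,y,z]' string for use in a vector query."""
    if len(embedding) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(embedding)}")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


print(to_vector_literal([0.33, 0.62, 0.19], dims=3))  # [0.33,0.62,0.19]
```

With a driver such as psycopg, the string would be passed as a query parameter, for example `ORDER BY embedding <-> %s::vector`, instead of being pasted into the SQL text.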
- Find matching sources ▶ Semantic Search
- Generate Response ▶ Large Language Model (LLM) Inference
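The two steps can be wired together as plain functions. In this sketch, `retrieve` and `generate` are placeholders you would supply (for instance the SQL query above and the `ollama.chat` call shown later); the prompt wording and all names are assumptions, not a fixed recipe.

```python
from collections.abc import Callable


def build_rag_prompt(question: str, sources: list[str]) -> str:
    """Number the retrieved sources and put them in front of the question."""
    context = "\n\n".join(f"[{i}] {src}" for i, src in enumerate(sources, start=1))
    return (
        "Answer the question using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


def rag_answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 5,
) -> str:
    """Step 1: semantic search for k sources. Step 2: let the LLM answer from them."""
    sources = retrieve(question, k)
    return generate(build_rag_prompt(question, sources))


# Stand-ins for the database query and the LLM call.
def fake_retrieve(question: str, k: int) -> list[str]:
    return ["Tallinn is the capital of Estonia."][:k]


def echo_generate(prompt: str) -> str:
    return prompt  # a real implementation would send the prompt to an LLM


print(rag_answer("What is the capital of Estonia?", fake_retrieve, echo_generate))
```

Passing the two functions in keeps the pipeline testable without a running database or model server.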
- Ollama runs LLMs too 😀 (actually its core use case)
- Great model library 📚
Engine | OSI License | ROCm Support | Production |
---|---|---|---|
Ollama | ✅ MIT | ✅ | ⛔ |
llama.cpp | ✅ MIT | ✅ | ⛔ |
vLLM | ✅ Apache-2.0 | ✅ | ✅ |
HF Text Generation Inference | ✅ Apache-2.0 | ✅ | ✅ |
Generate text based on a prompt
```python
import ollama  # pip install ollama

text = "..."  # the text to summarize, e.g. sources found by semantic search

res = ollama.chat(
    model="zephyr:7b-beta",
    messages=[{"role": "user", "content": f"Summarize this text: {text}"}],
    stream=False,
)
res["message"]["content"]  # "The given text..."
```
- Weights (binary)
- Pre-training data (source)
- Fine-tuning data (source)
- Training code (build scripts)
- OSI Open Source AI Definition: intends to define open source models
- Defines which parts need to have OSD-compliant licenses
- Draft; release planned for October 2024
- Latest draft (April 2024)
- Marks training data sets as optional
- But requires data characteristics, labeling procedures, etc.
Model | Weights License | PT Data | FT Data | Code |
---|---|---|---|---|
Mistral 0.2 7b | ✅ Apache-2.0 | ⛔ | ⛔ | ⛔ |
HF Zephyr 7b beta | ✅ MIT | ⛔ | ✅ | ✅ |
Microsoft Phi-3 Mini 3.8b | ✅ MIT | ⛔ | ⛔ | ⛔ |
Apple OpenELM 3b | ⛔❓ASCL | ✅ | ✅ | ✅ |
Meta Llama 3 8b | ⛔ Custom | ⛔ | ⛔ | ⛔ |
Google Gemma 1.1 7b | ⛔ Custom | ⛔ | ⛔ | ⛔ |
PT: pre-training, FT: fine-tuning
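The table can also be read as data. The sketch encodes it in Python (⛔❓ is counted as not open) and applies two example criteria: every component openly licensed or published, and the looser "open weights plus code". Both rules are illustrative assumptions, not the OSI definition, which is still a draft.

```python
# Table rows: True = ✅, False = ⛔ (including ⛔❓).
MODELS = {
    "Mistral 0.2 7b": {"weights": True, "pt_data": False, "ft_data": False, "code": False},
    "HF Zephyr 7b beta": {"weights": True, "pt_data": False, "ft_data": True, "code": True},
    "Microsoft Phi-3 Mini 3.8b": {"weights": True, "pt_data": False, "ft_data": False, "code": False},
    "Apple OpenELM 3b": {"weights": False, "pt_data": True, "ft_data": True, "code": True},
    "Meta Llama 3 8b": {"weights": False, "pt_data": False, "ft_data": False, "code": False},
    "Google Gemma 1.1 7b": {"weights": False, "pt_data": False, "ft_data": False, "code": False},
}


def fully_open(models: dict[str, dict[str, bool]]) -> list[str]:
    """Models whose weights, training data and code are all open."""
    return [name for name, parts in models.items() if all(parts.values())]


def open_weights_and_code(models: dict[str, dict[str, bool]]) -> list[str]:
    """Looser rule: OSI-licensed weights plus published training code, data ignored."""
    return [name for name, p in models.items() if p["weights"] and p["code"]]


print(fully_open(MODELS))  # []
print(open_weights_and_code(MODELS))  # ['HF Zephyr 7b beta']
```

No model in the table passes the strict rule, which is one way to see why identifying truly open source models is hard.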
- Bloom (2022): BigScience RAIL License v1.0, not SOTA, not OSD-compliant
- Allen AI OLMo, based on the Dolma dataset
- LumiOpen Viking, trained on the LUMI supercomputer
- Hugging Face StarChat2, focused on code
- OpenGPT-X, EU funded
- ✅ Almost all software components are available with OSI approved licenses
- ✅ ROCm works and people are using it
- ❓ Definition of open source models unclear
- 🤔 Identifying truly open source models is complicated
- ⏳ Interesting developments ongoing