title: Semantic Search and RAG on a FOSS stack
author: Robert Timm

Semantic Search and RAG on a FOSS stack

Wikimedia Hackathon Tallinn 2024

Robert Timm


Semantic Search

Given a query, find texts with a similar meaning.

Retrieval Augmented Generation (RAG)

Generate text based on information retrieved from external sources.

Free and Open Source Software (FOSS) Stack

All software components are released under OSI-approved licenses.


Demo Time


GPU stacks

  • ⛔ NVIDIA CUDA has a proprietary license
  • ✅ AMD ROCm stack is MIT licensed
    • amdgpu driver in kernel mainline

Components for Semantic Search

  • Encode semantics ▶ Embeddings
  • Find semantically similar objects ▶ Vector Database
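
To make the two components concrete, here is a minimal sketch in pure Python. Toy 3-dimensional vectors stand in for real embeddings, and a linear scan stands in for the vector database. The vectors, the texts, and the helper names `cosine_similarity` and `most_similar` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vec, corpus, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        corpus,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "embeddings" (hypothetical values, not produced by a real model).
corpus = [
    ("cats are small felines", [0.9, 0.1, 0.0]),
    ("dogs are loyal pets", [0.7, 0.3, 0.1]),
    ("the stock market fell", [0.0, 0.2, 0.9]),
]

query = [0.8, 0.2, 0.0]  # pretend this vector encodes "pets"
print(most_similar(query, corpus))
```

A real vector database replaces the linear scan with an index, so the lookup stays fast with millions of vectors.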

Embedding Models

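| Model | Dims | OSI | License | Pre Train Data | Fine Tune Data |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | ✅ | Apache-2.0 | | |
| nomic-embed-text-v1 | 768 | ✅ | Apache-2.0 | | |
| bge-large-en-v1.5 | 1024 | ✅ | MIT | | |
| mxbai-embed-large-v1 | 1024 | ✅ | Apache-2.0 | | |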
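
The dimension count drives storage cost. A rough back-of-the-envelope sketch, assuming each dimension is stored as one 32-bit float and ignoring index and row overhead (both assumptions, not figures from the talk):

```python
# Dimensions as listed for the embedding models in this deck.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "nomic-embed-text-v1": 768,
    "bge-large-en-v1.5": 1024,
    "mxbai-embed-large-v1": 1024,
}

BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per dimension


def raw_vector_bytes(dims, n_vectors=1):
    """Raw size of n_vectors embeddings, without any database overhead."""
    return dims * BYTES_PER_FLOAT32 * n_vectors


for name, dims in MODEL_DIMS.items():
    mib = raw_vector_bytes(dims, 1_000_000) / 2**20
    print(f"{name}: {raw_vector_bytes(dims)} B per vector, "
          f"about {mib:.0f} MiB per million vectors")
```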

Embedding Inference

Ollama: Get up and running with large language models.

  • Inference engine based on llama.cpp
  • Supports AMD GPU via ROCm
  • CPU support (AVX, AVX2, AVX512, Apple Silicon)
  • Quantization
  • Model Library
  • One model, one inference at a time
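
What quantization means, as a toy sketch: store weights as small integers plus a scale factor instead of full floats. This is a simplified symmetric 8-bit scheme for illustration only. It is not the block-wise format llama.cpp actually uses, and the function names and weight values are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.4, -1.0, 0.25, 0.0]  # made-up values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)         # small integers, 1 byte each instead of 4
print(restored)  # close to the original weights, but not identical
```

The trade-off is memory against precision: each weight loses at most half a quantization step.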

Embedding Inference

| Engine | OSI | License | ROCm Support | Production |
|---|---|---|---|---|
| Ollama | ✅ | MIT | | |
| llama.cpp | ✅ | MIT | | |
| HF Text Embeddings Inference | ✅ | Apache-2.0 | | |
| Infinity | ✅ | MIT | | |

Embedding Implementation

Start ollama

$ ollama serve &
$ ollama pull nomic-embed-text-v1

Generate embedding

import ollama # pip install ollama

res = ollama.embeddings(
    model="nomic-embed-text-v1",
    prompt="This string")

res["embedding"] # [0.33, 0.62, 0.19, ...]
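
Longer documents are usually split before embedding (the database table later in this deck is called `chunks`). A minimal sketch of one possible strategy, overlapping word windows. The function name `chunk_words` and the default sizes are assumptions, not the method used in the demo.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", size=4, overlap=2))
```

Each chunk can then be passed as `prompt` to `ollama.embeddings(...)` as shown above.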

Vector Database

🐘 Postgres can do it 🎉

pgvecto.rs: Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres.

A PostgreSQL extension written in Rust.


Vector Database Licensing

| Component | OSI | License |
|---|---|---|
| PostgreSQL | ✅ | PostgreSQL License |
| pgvecto.rs | ✅ | Apache-2.0 |

Vector Database Implementation

Create PostgreSQL table using pgvecto.rs

CREATE EXTENSION vectors;

CREATE TABLE chunks (
  text TEXT NOT NULL,
  embedding VECTOR( 768 ) NOT NULL
);

Find most similar chunks

SELECT text FROM chunks
  ORDER BY embedding <-> '[0.33, 0.62, 0.19, ...]'
  LIMIT 5;
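
From Python, the same statements can be issued with bound parameters. A hedged sketch, assuming a psycopg-style DB-API cursor with `%s` placeholders and that the vector is passed in its text form and cast with `::vector`. The helper names are hypothetical and the code has not been run against a live database here.

```python
def to_vector_literal(embedding):
    """Format a list of numbers as the text form '[x1,x2,...]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


INSERT_SQL = "INSERT INTO chunks (text, embedding) VALUES (%s, %s::vector)"
SEARCH_SQL = (
    "SELECT text FROM chunks "
    "ORDER BY embedding <-> %s::vector "
    "LIMIT %s"
)


def insert_chunk(cursor, text, embedding):
    """Store one chunk together with its embedding."""
    cursor.execute(INSERT_SQL, (text, to_vector_literal(embedding)))


def search_chunks(cursor, query_embedding, k=5):
    """Return the texts of the k chunks nearest to query_embedding."""
    cursor.execute(SEARCH_SQL, (to_vector_literal(query_embedding), k))
    return [row[0] for row in cursor.fetchall()]


print(to_vector_literal([0.33, 0.62, 0.19]))
```

Binding parameters instead of formatting values into the SQL string avoids quoting mistakes and SQL injection.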

Components for Retrieval Augmented Generation (RAG)

  • Find matching sources ▶ Semantic Search
  • Generate Response ▶ Large Language Model (LLM) Inference
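
How the two components fit together, as a minimal sketch. The three steps (embed, search, generate) are passed in as plain functions so the example runs without a model or database. The prompt wording and the names `build_prompt` and `rag_answer` are assumptions for illustration.

```python
def build_prompt(question, sources):
    """Put the retrieved sources and the question into one prompt."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )


def rag_answer(question, embed, search, generate, k=5):
    """Retrieve the k best sources for the question, then generate an answer."""
    query_vec = embed(question)      # text -> vector
    sources = search(query_vec, k)   # vector -> list of source texts
    return generate(build_prompt(question, sources))


# Stub components (hypothetical) so the sketch runs offline.
answer = rag_answer(
    "Where was the hackathon?",
    embed=lambda text: [float(len(text))],
    search=lambda vec, k: ["The Wikimedia Hackathon 2024 took place in Tallinn."][:k],
    generate=lambda prompt: f"(model output for a {len(prompt)}-character prompt)",
)
print(answer)
```

In a real setup, `embed` and `generate` would call the inference engine and `search` would query the vector database.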

LLM Inference

  • Ollama runs LLMs too 😀 Actually, that is its core use case
  • Great model library 📚

LLM Inference

| Engine | OSI | License | ROCm Support | Production |
|---|---|---|---|---|
| Ollama | ✅ | MIT | | |
| llama.cpp | ✅ | MIT | | |
| vllm | ✅ | Apache-2.0 | | |
| HF Text Generation Inference | ✅ | Apache-2.0 | | |

LLM Inference Implementation

Generate a text based on a prompt

import ollama # pip install ollama

res = ollama.chat(
    model="zephyr:7b-beta",
    messages=[{"role": "user", "content": f"Summarize this text: {text}"}],
    stream=False,
)
res["message"]["content"] # "The given text..."
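
With `stream=True`, `ollama.chat` yields the response in parts, each carrying a fragment under `["message"]["content"]`. A small sketch of collecting such a stream. The helper `collect_stream` is hypothetical, and a hand-written iterator stands in for the real call so the example runs without a server.

```python
def collect_stream(parts, on_token=None):
    """Join the content fragments of a streamed chat response."""
    pieces = []
    for part in parts:
        token = part["message"]["content"]
        if on_token is not None:
            on_token(token)  # e.g. print the fragment as it arrives
        pieces.append(token)
    return "".join(pieces)


# Stand-in for ollama.chat(..., stream=True); the fragments are made up.
fake_stream = iter([
    {"message": {"content": "The given "}},
    {"message": {"content": "text..."}},
])
print(collect_stream(fake_stream))
```

Streaming lets a UI show text while the model is still generating, which matters on slower hardware.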

Large Language Model Building Blocks

  • Weights (binary)
  • Pre-training data (source)
  • Fine-tuning data (source)
  • Training code (build scripts)

OSI: Open Source AI Definition

  • Intends to define Open Source models
  • Defines which parts need to have OSD-compliant licenses
  • Still a draft; release planned for October 2024
  • Latest draft (April 2024)
    • Marks training data sets as optional
    • But requires data characteristics, labeling procedures, etc.

LLMs with Openly Licensed Weights

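| Model | OSI | Weights | PT Data | FT Data | Code |
|---|---|---|---|---|---|
| Mistral 0.2 7b | ✅ | Apache-2.0 | | | |
| HF Zephyr 7b beta | ✅ | MIT | | | |
| Microsoft Phi-3 Mini 3.8b | ✅ | MIT | | | |
| Apple ELM 3b | ⛔❓ | ASCL | | | |
| Meta Llama 3 8b | ⛔ | Custom | | | |
| Google Gemma 1.1 7b | ⛔ | Custom | | | |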

PT: pre-training - FT: fine tuning


Open Source LLM Projects


Conclusion

  • ✅ Almost all software components are available under OSI-approved licenses
  • ✅ ROCm works and people are using it
  • ❓ Definition of open source models unclear
  • 🤔 Identifying truly open source models is complicated
  • ⏳ Interesting developments ongoing