
# 🎯 Example Use Cases

| Example notebook | Link |
|:-----------------|:-----|
| RAG pipeline with LLMs loaded from Hugging Face | 📔 |
| RAG pipeline with FiD generator | 📔 |
| RAG pipeline with REPLUG-based generator | 📔 |
| RAG pipeline with LLMs running on Gaudi2 | 📔 |
| RAG pipeline with quantized LLMs running on an ONNX backend | 📔 |
| RAG pipeline with LLMs running on a Llama-CPP backend | 📔 |
| Optimized and quantized embedding models for retrieval and ranking | 📔 |
| RAG pipeline with PLAID index and ColBERT Ranker | 📔 |
| RAG pipeline with Qdrant index | 📔 |
| RAG pipeline for summarization of multiple documents | 📔 |

## Efficient Open Domain Question-Answering with Quantized Components

Generate answers to questions that can be answered using a corpus of knowledge.

- **Retrieval** with fast lexical retrieval (BM25) or late-interaction dense retrieval (PLAID)
- **Ranking** with Sentence Transformers or ColBERT; we also support highly optimized, quantized re-rankers for fast inference. See how to get your own here.
- **Generation** with Fusion-in-Decoder (see the sketch after the diagram below)

```mermaid
flowchart LR
    id1[(Elastic<br>/PLAID)] <--> id2(BM25<br>/ColBERT) --> id3(ST<br>/ColBERT) --> id4(FiD)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
```
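As a rough illustration of the retrieve-and-rank stages above, here is a minimal sketch using Haystack 1.x components (which fastRAG builds on). The local Elasticsearch instance, index name, and cross-encoder model are assumptions for illustration, and the FiD generation stage is covered in the notebooks below.

```python
from haystack import Pipeline
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import BM25Retriever, SentenceTransformersRanker

# Assumption: an Elasticsearch instance with an indexed corpus at localhost:9200.
document_store = ElasticsearchDocumentStore(host="localhost", index="documents")
retriever = BM25Retriever(document_store=document_store)
# Cross-encoder re-ranker; swap in an optimized/quantized model for faster inference.
ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2")

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=ranker, name="Ranker", inputs=["Retriever"])

# Retrieve broadly with BM25, then re-rank down to the top 5 passages.
result = pipe.run(
    query="Who wrote 'Pride and Prejudice'?",
    params={"Retriever": {"top_k": 100}, "Ranker": {"top_k": 5}},
)
for doc in result["documents"]:
    print(doc.score, doc.content[:80])
```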

- 📓 Efficient and fast ODQA with PLAID, ColBERT and FiD
- 📓 Quantized Retrievers and Rankers using bi-encoders

## Retrieval-Augmented Generation with an LLM

To enhance the generations of a Large Language Model (LLM) with retrieval augmentation, follow these steps:

  1. Define a retrieval flow: This involves creating a store that holds the relevant information and one or more retrievers/rankers to retrieve the most relevant documents or passages.

  2. Define a prompt template: Design a template that includes a suitable context or instruction, along with placeholders for the query and information retrieved by the pipeline. These placeholders will be filled in dynamically during generation.

  3. Request token generation from the LLM: Utilize the prompt template and pass it to the LLM, allowing it to generate tokens based on the provided context, query, and retrieved information.

Most Hugging Face decoder LLMs are supported.

See a complete example in our RAG with LLMs 📓 notebook, and the minimal sketch after the diagram below.

```mermaid
flowchart LR
    id1[(Index)] <--> id2(.. Retrieval pipeline ..) --> id3(Prompt Template) --> id4(LLM)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id3 fill:#F3CECC,stroke:#B25450
    style id4 fill:#D5E8D4,stroke:#82B366
```
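Below is a self-contained sketch of steps 2 and 3, assuming step 1 (retrieval) has already produced a list of passages. The toy documents, model choice, and template wording are illustrative, not fastRAG's API.

```python
from transformers import pipeline

# Passages as produced by an upstream retrieval pipeline (stand-ins here).
retrieved_docs = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "It was designed by the engineering firm of Gustave Eiffel.",
]
query = "When was the Eiffel Tower completed?"

# Step 2: a prompt template with placeholders for the retrieved context and the query.
template = (
    "Answer the question using only the context below.\n"
    "Context: {context}\n"
    "Question: {query}\n"
    "Answer:"
)
prompt = template.format(context=" ".join(retrieved_docs), query=query)

# Step 3: request token generation from the LLM (model choice is illustrative).
generator = pipeline("text-generation", model="gpt2")
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```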

## Using REPLUG for Parallel Document Reading with LLMs

We use the algorithm introduced in REPLUG: Retrieval-Augmented Black-Box Language Models to read multiple retrieved documents in parallel and generate an answer to a given question.

📓 Using REPLUG for Parallel Document Reading with LLMs

```mermaid
flowchart LR
    id1[(Index)] <--> id2(.. Retrieval pipeline ..) -- "in parallel" --> id4(Doc 1 ...\nDoc 2 ...\nDoc 3 ...)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
```
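The heart of REPLUG is an ensemble over output distributions: the LM is run once per retrieved document, and the resulting next-token distributions are averaged with weights derived from the retrieval scores. Here is a minimal single-step sketch in plain transformers (toy documents and scores; a full decoder repeats this at every generation step):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

query = "Question: Who painted the Mona Lisa? Answer:"
# Retrieved documents and their retrieval scores (illustrative values).
docs = ["Leonardo da Vinci painted the Mona Lisa.", "The Mona Lisa hangs in the Louvre."]
scores = torch.tensor([0.9, 0.4])

# Run the LM once per document, each prompt carrying a different context...
next_token_dists = []
for doc in docs:
    ids = tok(doc + "\n" + query, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # next-token logits
    next_token_dists.append(logits.softmax(-1))

# ...then ensemble the output distributions, weighted by the
# softmax-normalized retrieval scores (the lambda weights in the paper).
weights = scores.softmax(-1)
ensembled = (weights[:, None] * torch.stack(next_token_dists)).sum(0)
print(tok.decode(ensembled.argmax().item()))
```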

## Fusion-in-Decoder

We support the Fusion-in-Decoder (FiD) model, which reads multiple documents in parallel and generates an answer that fuses the knowledge from all retrieved documents.

📓 Fusion-in-Decoder (FiD)

```mermaid
flowchart LR
    id1[(Index)] <--> id2(.. Retrieval pipeline ..) --> id4(FiD)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
```
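Conceptually, FiD encodes each (question, passage) pair independently and lets the decoder attend over the concatenation of all encoder states. A minimal sketch with a vanilla t5-small checkpoint, which is not FiD-trained, so the output is illustrative only; use a trained FiD checkpoint, as in the notebook, for real answers:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "Where is the Eiffel Tower?"
passages = [
    "The Eiffel Tower is a wrought-iron tower in Paris, France.",
    "Paris is the capital of France.",
]

# Encode each (question, passage) pair independently...
inputs = [f"question: {question} context: {p}" for p in passages]
enc = tok(inputs, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    enc_out = model.encoder(input_ids=enc.input_ids, attention_mask=enc.attention_mask)

# ...then fuse by concatenating all encoder states along the sequence
# axis, so the decoder attends over every passage at once.
n, seq, dim = enc_out.last_hidden_state.shape
fused = BaseModelOutput(last_hidden_state=enc_out.last_hidden_state.reshape(1, n * seq, dim))
mask = enc.attention_mask.reshape(1, n * seq)

out = model.generate(encoder_outputs=fused, attention_mask=mask, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```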

## Open Domain Summarization

Summarize topics given free-text input and a corpus of knowledge.

- **Retrieval** with BM25 or other retrievers
- **Ranking** with Sentence Transformers or other rankers
- **Generation** with FLAN-T5, using a "summarize: " prompt over all retrieved documents concatenated (see the sketch after the diagram below)

📓 Open Domain Summarization

```mermaid
flowchart LR
    id1[(Elastic)] <--> id2(BM25) --> id3(SentenceTransformer) -- summarize --> id4(FLAN-T5)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
```
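A minimal sketch of the generation stage, assuming the retrieval and ranking stages have already produced the documents (the toy documents below are stand-ins):

```python
from transformers import pipeline

# Documents as returned by the retrieval/ranking stages (stand-ins here).
docs = [
    "The James Webb Space Telescope launched in December 2021.",
    "It observes in the infrared and orbits the Sun at the L2 point.",
]

# Concatenate all documents behind a "summarize: " prompt, as in the
# flowchart above, and let FLAN-T5 generate the summary.
summarizer = pipeline("text2text-generation", model="google/flan-t5-base")
prompt = "summarize: " + " ".join(docs)
print(summarizer(prompt, max_new_tokens=48)[0]["generated_text"])
```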