In this module, we will learn what LLM and RAG are and implement a simple RAG pipeline to answer questions about the FAQ documents from our Zoomcamp courses.
What we will do:
- Index Zoomcamp FAQ documents
- Create a Q&A system for answering questions about these documents
- LLM
- RAG
- RAG architecture
- Course outcome
- Installing libraries
- Alternative: installing anaconda or miniconda
```bash
pip install tqdm notebook==7.1.2 openai elasticsearch pandas scikit-learn
```
- We will use the search engine we built in the build-your-own-search-engine workshop: minsearch
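As a reminder of how minsearch is used, here is a hedged sketch. The field names match the FAQ document structure; the helper names and boost values are illustrative, not part of the workshop code, and the import is deferred so the sketch doesn't require minsearch to be installed just to load.

```python
# Sketch: indexing and searching the FAQ documents with minsearch.
# Field names (question/text/section/course) match the FAQ records;
# function names and boost values are illustrative assumptions.

def build_index(documents):
    # Imported here so the sketch loads even without minsearch installed.
    import minsearch

    index = minsearch.Index(
        text_fields=["question", "text", "section"],
        keyword_fields=["course"],
    )
    index.fit(documents)  # fit the index over the FAQ documents
    return index

def search(index, query, course="data-engineering-zoomcamp", num_results=5):
    # Boost matches in the question field; restrict to one course.
    return index.search(
        query,
        filter_dict={"course": course},
        boost_dict={"question": 3.0, "section": 0.5},
        num_results=num_results,
    )
```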
- Indexing the documents
- Performing the search
- Invoking OpenAI API
- Building the prompt
- Getting the answer
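The steps above (search, building the prompt, getting the answer) come together in a small pipeline. Below is a hedged sketch of the prompt-building step: the template wording and record fields (section/question/text) are assumptions based on the FAQ document structure, not a prescribed implementation.

```python
# Sketch of the "building the prompt" step: stitch the retrieved FAQ
# records into a context block and wrap it in an instruction template.
# Template wording and field names are illustrative assumptions.

def build_prompt(query, search_results):
    prompt_template = """You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}""".strip()

    context = "\n\n".join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    )
    return prompt_template.format(question=query, context=context)
```

The resulting string is what gets sent to the LLM, e.g. as the user message in a chat-completions call.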
If you don't want to use a service, you can run an LLM locally; refer to module 2 for more details.
In particular, check "2.7 Ollama - Running LLMs on a CPU" - it can work with the OpenAI API, so to make the example from 1.4 work locally, you only need to change a few lines of code.
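Those few changed lines look roughly like this. This is a sketch assuming Ollama is running locally on its default port with a model already pulled; the model name is an assumption.

```python
# Sketch: point the OpenAI client at a local Ollama server instead of
# OpenAI's hosted API. Assumes Ollama is running on the default port
# 11434 and a model such as "llama3" has been pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # any non-empty string; Ollama doesn't check it
)

# The rest of the code stays the same; only the model name changes, e.g.:
# response = client.chat.completions.create(model="llama3", messages=messages)
```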
- Cleaning the code we wrote so far
- Making it modular
- Run Elasticsearch with Docker
- Index the documents
- Replace minsearch with Elasticsearch
Running Elasticsearch:

```bash
docker run -it \
    --rm \
    --name elasticsearch \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.4.3
```
Index settings:

```json
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"}
        }
    }
}
```
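In Python, these settings can be passed to the Elasticsearch client when creating the index, and the documents indexed one by one. This is a sketch: the index name `course-questions` and the function names are illustrative, and it assumes an `Elasticsearch` client connected to `http://localhost:9200`.

```python
# Sketch: create the index with the settings above, then index the
# FAQ documents. The index name "course-questions" is illustrative.

index_settings = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"},
        }
    },
}

def create_index(es_client, index_name="course-questions"):
    # es_client is an elasticsearch.Elasticsearch instance,
    # e.g. Elasticsearch("http://localhost:9200")
    es_client.indices.create(index=index_name, body=index_settings)

def index_documents(es_client, documents, index_name="course-questions"):
    for doc in documents:
        es_client.index(index=index_name, document=doc)
```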
Query:

```python
{
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": query,
                    "fields": ["question^3", "text", "section"],
                    "type": "best_fields"
                }
            },
            "filter": {
                "term": {
                    "course": "data-engineering-zoomcamp"
                }
            }
        }
    }
}
```
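Wrapped as functions, the query above can be built per user question and executed with the client's `search` method. This is a sketch: the function names, the default course filter, and the index name are illustrative assumptions.

```python
# Sketch: build the search request for a given user query and run it.
# The course filter value and index name are illustrative.

def build_search_query(query, course="data-engineering-zoomcamp", size=5):
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^3", "text", "section"],
                        "type": "best_fields",
                    }
                },
                "filter": {"term": {"course": course}},
            }
        },
    }

def elastic_search(es_client, query, index_name="course-questions"):
    # Execute the query and unwrap the documents from the hit envelope.
    response = es_client.search(index=index_name, body=build_search_query(query))
    return [hit["_source"] for hit in response["hits"]["hits"]]
```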
- Replace it with a link
- Did you take notes? Add them above this line (Send a PR with links to your notes)