
Use RAG to make a chatbot for the drias-climat website


meteofrance/rag_drias


RAG DRIAS

Our goal is to build a Retrieval Augmented Generation (RAG) system for the DRIAS portal.

LLMs used in specialized domains may hallucinate because they lack domain-specific knowledge. RAG mitigates this problem by retrieving relevant documents from an external knowledge base and adding them to the model's prompt as context.
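The retrieval step can be illustrated with a minimal, self-contained sketch. It assumes that documents and queries are compared as vectors; here a toy bag-of-words vector stands in for a real sentence embedding model such as sentence-camembert-large, and the function names (`embed`, `cosine`, `retrieve`) are hypothetical, not part of rag_drias.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real sentence embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "DRIAS provides climate projections for France",
    "Data can be downloaded in NetCDF format",
    "The website has a contact page",
]
print(retrieve("which format can data be downloaded in", docs, k=1))
# → ['Data can be downloaded in NetCDF format']
```

The retrieved passages are then inserted into the LLM prompt, so the model answers from the portal's own content rather than from memory alone.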


Repository Structure

rag_drias
├─── docs
├─── rag_drias
│   ├─── data.py          # text data management
│   ├─── embedding.py     # wrapper for embedding models
│   ├─── crawl.py         # website crawling tools
│   └─── settings.py      # settings (paths, model names,...)
└─── main.py              # Main python script

Documentation

Full code documentation of Rag_drias can be found here.

Install

  1. Clone the repository:

    git clone https://github.com/meteofrance/rag_drias.git

  2. Build the conda environment:

    conda env create --file environment.yaml
    conda activate ragdrias

  3. Change BASE_PATH in rag_drias/settings.py. This is where all your data and models will be saved.

  4. Download the different models manually:

If needed, see install instructions for git-lfs.

If needed, set up your Hugging Face access token (required for Llama-3.2-3B-Instruct, which is a gated model).

    cd <BASE_PATH>
    git lfs install   # (should return `Git LFS initialized.`)
    git clone https://huggingface.co/dangvantuan/sentence-camembert-large  # embedding model
    git clone https://huggingface.co/jpacifico/Chocolatine-14B-Instruct-4k-DPO  # LLM
    git clone https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct  # optional (smaller LLM)
    git clone https://huggingface.co/BAAI/bge-reranker-v2-m3  # optional (reranker)
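Because each model is cloned directly into `<BASE_PATH>`, its local path is presumably `BASE_PATH` joined with the repository name. The hypothetical sketch below shows what such a settings module might look like and how to check that the downloads are in place. Only `BASE_PATH` is named in these instructions; the other constant names, the placeholder path, and the `model_path` / `missing_models` helpers are assumptions for illustration.

```python
from pathlib import Path

# Root directory for all data and models: edit this to a location of your choice.
BASE_PATH = Path("/path/to/rag_drias_data")  # hypothetical placeholder

# Hypothetical constant names; each value is the folder created by `git clone`.
EMBEDDING_MODEL = "sentence-camembert-large"
LLM_MODEL = "Chocolatine-14B-Instruct-4k-DPO"
RERANKER_MODEL = "bge-reranker-v2-m3"


def model_path(name: str) -> Path:
    """Return the local directory of a model cloned into BASE_PATH."""
    return BASE_PATH / name


def missing_models(names: list[str]) -> list[str]:
    """Return the models whose directory does not exist under BASE_PATH."""
    return [name for name in names if not model_path(name).is_dir()]


missing = missing_models([EMBEDDING_MODEL, LLM_MODEL])
if missing:
    print("Models not found under", BASE_PATH, ":", missing)
```

Checking for missing folders up front gives a clearer error than a failure deep inside model loading.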

Usage

  1. Crawl the website:

    python main.py crawl

  2. Prepare the vector database:

    python main.py prepare-database

  3. Make a query (here: "Which data formats are available for download on DRIAS?") and retrieve the most relevant chunks:

    python main.py query "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?"

  4. Make a query and generate an answer with the LLM:

    python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?"
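Before crawled pages can be embedded into a vector database, long texts are usually split into smaller, overlapping chunks, so that each fits the embedding model's input and retrieval returns focused passages. The sketch below shows word-based chunking with overlap; the `chunk_text` name, the chunk size, and the overlap are assumptions, not values taken from rag_drias.

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of `chunk_size` words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_text(text, chunk_size=4, overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk; each chunk would then be embedded and stored in the database.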

To use a reranker model with the answer command:

    python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?" --reranker bge-reranker-v2-m3
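A reranker rescores the candidates returned by the fast embedding search with a more accurate model and keeps only the best ones. bge-reranker-v2-m3 is a cross-encoder, which reads the query and each passage together. The sketch below is generic: `rerank` and the `score` callable are hypothetical, and the toy word-overlap scorer only stands in for a real cross-encoder.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Reorder retrieved candidates by a (query, passage) relevance score and keep top_n."""
    scored = [(score(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep retrieval order
    return [passage for _, passage in scored[:top_n]]


def overlap_score(query: str, passage: str) -> float:
    """Toy scorer: fraction of query words that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


candidates = ["DRIAS homepage", "download formats NetCDF CSV", "formats available for download"]
print(rerank("download formats", candidates, overlap_score, top_n=2))
# → ['download formats NetCDF CSV', 'formats available for download']
```

Reranking is usually applied to a larger candidate set than is finally sent to the LLM, because the cross-encoder is slower than the vector search but more precise.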
To see what the LLM would answer without the retrieved chunks:

    python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?" --no-use-rag
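The difference between the two modes comes down to the prompt sent to the LLM: with RAG, the retrieved chunks are inserted as context before the question; without it, the model sees only the question. The template below is a hypothetical illustration written in English for readability; `build_prompt` and its wording are assumptions, and the actual prompt used by rag_drias may differ.

```python
from typing import Optional


def build_prompt(question: str, chunks: Optional[list[str]] = None) -> str:
    """Build an LLM prompt, optionally grounded in retrieved chunks."""
    if not chunks:
        # No retrieval: the LLM must rely on its own (possibly outdated) knowledge.
        return f"Question: {question}\nAnswer:"
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


question = "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?"
print(build_prompt(question))  # prompt without RAG
print(build_prompt(question, ["Data can be downloaded in NetCDF format."]))  # prompt with RAG
```

Comparing the two answers is a quick way to see how much the retrieved DRIAS content changes the model's output.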

Use --help to see all available options in the main.py script.
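The command-line interface exposes the subcommands `crawl`, `prepare-database`, `query` and `answer`, plus options such as `--reranker` and `--no-use-rag`, which suggests a paired on/off boolean flag. The following hypothetical argparse sketch reproduces that shape with the standard library only; it is not the actual implementation of main.py, and the defaults (RAG on, no reranker) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the subcommands shown in the Usage section."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("crawl", help="crawl the website")
    sub.add_parser("prepare-database", help="build the vector database")
    query = sub.add_parser("query", help="retrieve the most relevant chunks")
    query.add_argument("question")
    answer = sub.add_parser("answer", help="answer a question with the LLM")
    answer.add_argument("question")
    answer.add_argument("--reranker", default=None, help="name of an optional reranker model")
    # BooleanOptionalAction (Python 3.9+) creates both --use-rag and --no-use-rag.
    answer.add_argument("--use-rag", action=argparse.BooleanOptionalAction, default=True)
    return parser


args = build_parser().parse_args(["answer", "Quels formats ?", "--no-use-rag"])
print(args.command, args.use_rag, args.reranker)  # → answer False None
```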
