Our goal is to build a Retrieval Augmented Generation (RAG) system for the DRIAS portal.
LLMs used in specialized fields may hallucinate because they lack domain knowledge. RAG mitigates this problem by retrieving relevant documents from an external knowledge base and passing them to the LLM as context.
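The retrieve-then-generate idea can be sketched in a few lines of Python. This is a toy illustration, not the rag_drias implementation: the word-overlap `retrieve` function is a hypothetical stand-in for the embedding-based search the project performs, and `build_prompt` is an assumed prompt template. The returned string is what would be sent to the LLM instead of the bare question.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (toy stand-in for a real embedding model)."""
    return re.findall(r"\w+", text.lower())


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    query_counts = Counter(tokenize(query))

    def overlap(doc: str) -> int:
        # Number of query words (with multiplicity) also present in the document.
        return sum((query_counts & Counter(tokenize(doc))).values())

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the question with retrieved passages so the answer can be grounded."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )
```

In rag_drias the retrieval step relies on an embedding model (sentence-camembert-large is among the models to download) rather than word overlap, but the flow is the same: retrieve, augment the prompt, then generate.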
rag_drias
├─── docs
├─── rag_drias
│    ├─── data.py        # text data management
│    ├─── embedding.py   # wrapper for embedding models
│    ├─── crawl.py       # website crawling tools
│    └─── settings.py    # settings (paths, model names, ...)
└─── main.py             # main Python script
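To illustrate what a settings module of this kind centralizes, here is a minimal sketch. Apart from `BASE_PATH`, the variable names and the `model_path` helper are hypothetical; check the actual `rag_drias/settings.py` for the real names and values.

```python
from pathlib import Path

# Hypothetical example value: point BASE_PATH to a disk with enough space
# for the project data and the model weights.
BASE_PATH = Path("/path/to/rag_drias_base")

# Hypothetical names: each model is assumed to live in a sub-directory of
# BASE_PATH named after the Hugging Face repository it was cloned from.
EMBEDDING_MODEL_NAME = "sentence-camembert-large"
LLM_NAME = "Chocolatine-14B-Instruct-4k-DPO"


def model_path(name: str) -> Path:
    """Local directory of a downloaded model (hypothetical helper)."""
    return BASE_PATH / name
```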
The full code documentation of rag_drias can be found here.
- Clone the repository:
git clone https://github.com/meteofrance/rag_drias.git
- Build the conda environment:
conda env create --file environment.yaml
conda activate ragdrias
- Change `BASE_PATH` in `rag_drias/settings.py`. This is where all your data and models will be saved.
- Download the different models manually:
If needed, see the installation instructions for git-lfs.
If needed, set up your Hugging Face access token (required for Llama-3.2-3B-Instruct).
cd <BASE_PATH>
git lfs install # (should return `Git LFS initialized.`)
git clone https://huggingface.co/dangvantuan/sentence-camembert-large
git clone https://huggingface.co/jpacifico/Chocolatine-14B-Instruct-4k-DPO
git clone https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct # optional
git clone https://huggingface.co/BAAI/bge-reranker-v2-m3 # optional
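After the downloads, a quick sanity check can catch a common mistake: cloning a model repository without Git LFS, which leaves small text pointer files in place of the actual weights. The helpers below (`missing_models`, `unresolved_lfs_files`) are hypothetical and not part of rag_drias; they assume the directory names produced by the `git clone` commands above, and they treat the models not marked optional as required.

```python
from pathlib import Path

# Directory names created by the `git clone` commands (git names a clone
# after the last component of the repository URL). Assumption: the models
# not marked "optional" in the instructions are required.
REQUIRED_MODELS = ["sentence-camembert-large", "Chocolatine-14B-Instruct-4k-DPO"]
OPTIONAL_MODELS = ["Llama-3.2-3B-Instruct", "bge-reranker-v2-m3"]

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def missing_models(base_path: Path, names: list[str]) -> list[str]:
    """Names from `names` that have no directory under base_path."""
    return [name for name in names if not (base_path / name).is_dir()]


def unresolved_lfs_files(model_dir: Path) -> list[Path]:
    """Files that are still LFS pointers, i.e. whose content was never downloaded."""
    pointers = []
    for f in sorted(model_dir.rglob("*")):
        # Skip git's internal files and directories.
        if ".git" in f.relative_to(model_dir).parts or not f.is_file():
            continue
        if f.stat().st_size > 1024:  # pointer files are only ~130 bytes
            continue
        with f.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(f)
    return pointers
```

For example, `missing_models(Path("<BASE_PATH>"), REQUIRED_MODELS)` should return an empty list once the required models are in place, and `unresolved_lfs_files` should return nothing for each model directory if `git lfs install` was run before cloning.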
- Crawl the website:
python main.py crawl
- Prepare the vector database:
python main.py prepare-database
- Make a query and retrieve the most relevant chunks (the example question, in French, asks which data formats are available for download on DRIAS):
python main.py query "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?"
- Make a query and generate an answer with the LLM:
python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?"
- Add a reranker model to reorder the retrieved chunks:
python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?" --reranker bge-reranker-v2-m3
- To see what the LLM would answer without the retrieved chunks:
python main.py answer "Quels formats de données sont disponibles pour le téléchargement sur DRIAS ?" --no-use-rag
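The `prepare-database`, `query` and `--reranker` steps follow a common two-stage retrieval pattern, sketched below under stated assumptions: documents are split into overlapping word windows (the window sizes are arbitrary), chunks are ranked by cosine similarity between precomputed embedding vectors, and an optional `rerank` callable rescores only the best candidates. In rag_drias the vectors would come from an embedding model such as sentence-camembert-large and the scores from a reranker such as bge-reranker-v2-m3; here both are plain Python inputs, and all function names are hypothetical.

```python
import math
from typing import Callable, Optional


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("expected 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once a window would add no words beyond the previous one.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int) -> list[int]:
    """Indices of the k chunks most similar to the query, best first."""
    order = sorted(range(len(chunk_vecs)), key=lambda i: cosine(query_vec, chunk_vecs[i]), reverse=True)
    return order[:k]


def search(
    query: str,
    query_vec: list[float],
    chunks: list[str],
    chunk_vecs: list[list[float]],
    k: int = 3,
    rerank: Optional[Callable[[str, str], float]] = None,
    n_candidates: int = 10,
) -> list[str]:
    """Stage 1: vector search. Stage 2 (optional): rerank the candidates and keep k."""
    candidates = top_k(query_vec, chunk_vecs, n_candidates if rerank is not None else k)
    if rerank is not None:
        # The reranker scores (query, chunk) pairs; only stage-1 candidates are rescored.
        candidates.sort(key=lambda i: rerank(query, chunks[i]), reverse=True)
    return [chunks[i] for i in candidates[:k]]
```

Reranking only a short candidate list is the usual trade-off: a cross-encoder reranker reads the query and each chunk together, which tends to be more accurate but is much slower than comparing precomputed vectors.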
Use `--help` to see all available options of the `main.py` script.
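For readers extending the script, a command-line interface exposing the commands and flags documented above can be sketched with the standard `argparse` module. This is a hypothetical reconstruction, not the project's `main.py` (which may use a different CLI library). Here `argparse.BooleanOptionalAction` (Python 3.9+) generates the `--use-rag`/`--no-use-rag` pair, and `--help` is added automatically to the main parser and to every sub-command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the documented commands of main.py."""
    parser = argparse.ArgumentParser(prog="main.py", description="RAG on the DRIAS portal (sketch).")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("crawl", help="crawl the website")
    sub.add_parser("prepare-database", help="prepare the vector database")

    query = sub.add_parser("query", help="retrieve the most relevant chunks")
    query.add_argument("text", help="question to search for")

    answer = sub.add_parser("answer", help="generate an answer with the LLM")
    answer.add_argument("text", help="question to answer")
    answer.add_argument("--reranker", default=None, help="reranker model name, e.g. bge-reranker-v2-m3")
    # Creates both --use-rag and --no-use-rag; RAG is enabled by default.
    answer.add_argument("--use-rag", action=argparse.BooleanOptionalAction, default=True,
                        help="include the retrieved chunks in the prompt")
    return parser
```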