A custom implementation of the langchain Refine chain for generative question answering.
Technologies used:
- Llama2-13b
- ctranslate2
- chromadb
- HuggingFace
- langchain (some non-LLM components)
Ran on an Nvidia RTX A6000 GPU.
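For context, a minimal sketch of how the retrieval side can be wired together from the non-LLM langchain components (HuggingFace embeddings + chromadb). The embedding model name, chunking parameters, and placeholder page text are assumptions, not necessarily what this repo uses:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# `pages` stands in for the scraped cancer.org text; placeholder content here.
pages = ["Lung cancer symptoms often include a persistent cough, chest pain, ..."]
docs = [Document(page_content=p) for p in pages]

# Chunk size/overlap are assumed values, not the exact ones used in this repo.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")

# At query time, the top-k chunks are fed into the Refine loop one by one.
retrieved = store.similarity_search("What are common symptoms of lung cancer?", k=4)
```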
Inference is faster than with the vanilla HuggingFace pipeline.
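The speedup comes from running a converted checkpoint through ctranslate2 instead of transformers. A hedged sketch, assuming the model was converted offline with `ct2-transformers-converter --model meta-llama/Llama-2-13b-chat-hf --output_dir llama2-13b-ct2 --quantization int8_float16` (the checkpoint name, output directory, and quantization level are assumptions):

```python
import ctranslate2
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
generator = ctranslate2.Generator("llama2-13b-ct2", device="cuda")

def generate(prompt: str, max_tokens: int = 512) -> str:
    # ctranslate2 consumes token *strings*, not ids.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    result = generator.generate_batch(
        [tokens], max_length=max_tokens, include_prompt_in_result=False
    )[0]
    return tokenizer.decode(result.sequences_ids[0])
```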
Custom prompts lead to better answer quality than the base langchain implementation.
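A minimal sketch of the Refine loop itself, building on the `generate` and `retrieved` pieces above; the prompt wording is illustrative only, not the exact custom prompts from this repo:

```python
# Illustrative prompts, not the repo's actual ones.
INITIAL_PROMPT = (
    "Answer the question using only the context below. "
    "If the context is insufficient, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
REFINE_PROMPT = (
    "You have an existing answer:\n{answer}\n\n"
    "Refine it with the additional context below, but only if the context "
    "adds relevant facts; otherwise return the existing answer unchanged.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nRefined answer:"
)

def refine_qa(question: str, chunks: list) -> str:
    # Answer from the first chunk, then refine with each remaining chunk.
    answer = generate(INITIAL_PROMPT.format(context=chunks[0].page_content,
                                            question=question))
    for chunk in chunks[1:]:
        answer = generate(REFINE_PROMPT.format(answer=answer,
                                               context=chunk.page_content,
                                               question=question))
    return answer

print(refine_qa("What are common symptoms of lung cancer?", retrieved))
```

Letting the refine prompt return the existing answer unchanged when a chunk is irrelevant is one lever against the hallucinations mentioned in the TODO below.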
TODO:
- Scrape more data from the American Cancer Society website
- Try a bigger model, e.g. Llama2-70b
- Implement automatic detection of the GPU count to parallelize computation (see the sketch after this list)
- Perform further prompt engineering to fight hallucinations
- Try a different embedding model
- Experiment with context-aware chunking
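A possible starting point for the GPU-count item above, using ctranslate2's own device query; this helper is a sketch and does not exist in the repo:

```python
import ctranslate2

def make_generator(model_dir: str) -> ctranslate2.Generator:
    n_gpus = ctranslate2.get_cuda_device_count()  # detect available GPUs
    if n_gpus == 0:
        return ctranslate2.Generator(model_dir, device="cpu")
    # One model replica per GPU; concurrent generate_batch calls are then
    # dispatched across the replicas.
    return ctranslate2.Generator(
        model_dir, device="cuda", device_index=list(range(n_gpus))
    )
```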