This project demonstrates a retrieval-augmented question-answering (QA) system for sales data stored in Excel files. The system combines ChromaDB for efficient document retrieval, HuggingFace embeddings for text encoding, and GPT-J-6B for natural language responses. This setup allows users to query sales data in plain English and receive accurate, context-aware answers.
- Load and preprocess Excel files for sales data analysis.
- Create a vector store using ChromaDB for fast document retrieval.
- Generate human-like responses with GPT-J-6B.
- Modular and scalable Python architecture.
- Example interactions to demonstrate capabilities.
git clone https://github.com/abdulvahapmutlu/sales-data-qa.git
cd sales-data-qa
Ensure you have Python 3.8+ installed. Then, install the required libraries:
pip install -r requirements.txt
Place your Excel files in the data/
directory.
Execute the following command to start the application:
python src/app.py
You can interact with the chatbot by asking questions about your sales data, such as:
- "What is the total sales in the sales data?"
- "Can you provide a breakdown of sales by region?"
- The
data_loader.py
script loads and preprocesses Excel files to create a unified text column for retrieval.
chromadb_manager.py
manages the creation of vector stores and the addition of documents in batches.
model_initializer.py
initializes the GPT-J-6B model for generating responses.
retrieval_qa.py
defines the question-answering chain using LangChain components.
app.py
orchestrates all components and provides a user-friendly interface.
This project is licensed under the MIT License.
Feel free to submit pull requests or open issues for any suggestions or improvements.
- ChromaDB: For vector storage.
- HuggingFace: For providing state-of-the-art language models.
- LangChain: For seamless integration of retrieval-augmented generation pipelines.