Sales Data Question-Answering with GPT-J and ChromaDB

This project demonstrates a retrieval-augmented question-answering (QA) system for sales data stored in Excel files. The system combines ChromaDB for document retrieval, HuggingFace embeddings for text encoding, and GPT-J-6B for generating natural language responses. Users can ask questions about the sales data in plain English and receive answers grounded in the retrieved records.

Features

Load and preprocess Excel files for sales data analysis.
Create a vector store using ChromaDB for fast document retrieval.
Generate human-like responses with GPT-J-6B.
Modular and scalable Python architecture.
Example interactions to demonstrate capabilities.

Getting Started

1. Clone the Repository

git clone https://github.com/abdulvahapmutlu/sales-data-qa.git
cd sales-data-qa

2. Install Dependencies

Ensure you have Python 3.8+ installed. Then, install the required libraries:

pip install -r requirements.txt

3. Add Your Sales Data

Place your Excel files in the data/ directory.

Usage

Run the Application

Execute the following command to start the application:

python src/app.py

Ask Questions

You can interact with the chatbot by asking questions about your sales data, such as:

"What are the total sales in the sales data?"
"Can you provide a breakdown of sales by region?"

Components

Data Preprocessing

The data_loader.py script loads and preprocesses Excel files to create a unified text column for retrieval.
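
As a rough illustration of what a unified text column might look like, the sketch below joins each row's fields into one string. It uses plain dictionaries instead of a DataFrame so that it runs without extra dependencies. The function names `row_to_text` and `add_text_column` and the column names are hypothetical; they are not taken from data_loader.py, which may format rows differently.

```python
from typing import Any, Dict, List


def row_to_text(row: Dict[str, Any]) -> str:
    """Join one record's fields into a single 'column: value' string.

    Hypothetical helper: the real data_loader.py may format rows differently.
    """
    parts = []
    for column, value in row.items():
        # Skip empty cells so they do not add noise to the retrieval text.
        if value is None or value == "":
            continue
        parts.append(f"{column}: {value}")
    return " | ".join(parts)


def add_text_column(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the rows with an extra 'text' field for retrieval."""
    return [{**row, "text": row_to_text(row)} for row in rows]


# Illustrative rows; the column names are made up for this example.
rows = [
    {"Region": "North", "Product": "Widget", "Sales": 1200},
    {"Region": "South", "Product": "Gadget", "Sales": None},
]
for record in add_text_column(rows):
    print(record["text"])
```

Keeping the column name next to each value lets the embedding model see which field a number belongs to, which is one plausible reason to build the text this way.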

ChromaDB Integration

chromadb_manager.py manages the creation of vector stores and the addition of documents in batches.
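
The batching idea can be sketched independently of ChromaDB. In the minimal example below, `add_fn` is a stand-in for whatever call actually writes documents to the vector store; the helper names and the default batch size of 100 are assumptions for illustration, not values from chromadb_manager.py.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def add_in_batches(
    texts: Sequence[str],
    add_fn: Callable[[List[str]], None],
    batch_size: int = 100,  # assumed default, not taken from the project
) -> int:
    """Pass `texts` to `add_fn` one batch at a time and return the number added."""
    total = 0
    for batch in iter_batches(texts, batch_size):
        add_fn(batch)  # in the real project this would write to the vector store
        total += len(batch)
    return total


# Usage with an in-memory list standing in for the vector store.
stored: List[str] = []
count = add_in_batches([f"row {i}" for i in range(5)], stored.extend, batch_size=2)
print(count, len(stored))
```

Adding documents in bounded batches keeps memory use predictable and avoids sending one very large request to the store when the Excel files contain many rows.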

Language Model Initialization

model_initializer.py initializes the GPT-J-6B model for generating responses.

Retrieval QA

retrieval_qa.py defines the question-answering chain using LangChain components.
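
Conceptually, a retrieval QA chain places the retrieved records into a prompt together with the user's question before calling the language model. The sketch below shows that prompt-assembly step in plain Python. It does not use LangChain's API; the function name `build_prompt`, the prompt wording, and the `max_contexts` limit are assumptions made for illustration.

```python
from typing import List


def build_prompt(question: str, contexts: List[str], max_contexts: int = 3) -> str:
    """Combine retrieved records and a question into one prompt string.

    Hypothetical helper: the chain in retrieval_qa.py may use a different template.
    """
    # Keep only the first few records so the prompt stays within the model's context window.
    selected = contexts[:max_contexts]
    context_block = "\n".join(f"- {text}" for text in selected)
    return (
        "Use the following sales records to answer the question.\n"
        f"{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Illustrative records, as they might come back from the vector store.
retrieved = [
    "Region: North | Product: Widget | Sales: 1200",
    "Region: South | Product: Gadget | Sales: 800",
]
print(build_prompt("Which region sold more?", retrieved))
```

The resulting string is what would be passed to the language model; the model's completion after "Answer:" is then returned to the user.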

Main Application

app.py orchestrates all components and provides a user-friendly interface.

License

This project is licensed under the MIT License.

Contributing

Feel free to submit pull requests or open issues for any suggestions or improvements.

Acknowledgments

ChromaDB: For vector storage.
HuggingFace: For providing state-of-the-art language models.
LangChain: For seamless integration of retrieval-augmented generation pipelines.
