GitHub - mswornavidhya/HelpMateAI_RAG_GenAI: This project aims to build an intelligent semantic search system, incorporating optimized PDF document processing, strategic vector database searches with cache implementation, and coherent answer generation for efficient information extraction from policy documents.

Semantic Search System for Policy Documents

Overview: This project focuses on creating a semantic search system tailored for policy documents, integrating advanced techniques for document processing, vector embeddings, and coherent answer generation. The system comprises three key layers: embedding, searching, and generation, each optimized for enhanced performance.

Project Structure:

Embedding Layer: Explore various PDF document processing and chunking strategies. Choose between OpenAI's embedding model or SentenceTransformers for vector representations.
Search Layer: Design three diverse queries reflecting potential user questions in policy documents. Implement vector database searches against ChromaDB, incorporating a cache mechanism. Enhance search results with a re-ranking block using cross-encoding models from HuggingFace.

Rerank after cross encoding provided

Generation Layer: Design an exhaustive and instructive prompt for the Language Model (LM) to ensure coherent answer generation.Provide a few-shot example in the prompt to improve LM output.

Performance Evaluation: Thoroughly assess the impact of different strategies, models, and components on system performance.Consider scalability by addressing potential increases in document numbers or user queries. Getting Started:

Install the required libraries: pip install pdfplumber tiktoken openai chromaDB sentence-transformers -q.
Mount Google Drive: from google.colab import drive; drive.mount('/content/drive', force_remount=True).
Follow step-by-step instructions in the codebase to implement the Embedding, Search, and Generation layers.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
Final Generated Answer from the Generation Layer - Q1.png		Final Generated Answer from the Generation Layer - Q1.png
Final Generated Answer from the Generation Layer - Q2.png		Final Generated Answer from the Generation Layer - Q2.png
Final Generated Answer from the Generation Layer - Q3.png		Final Generated Answer from the Generation Layer - Q3.png
Generative_Search_with_SentenceTransformers_and_Chroma.ipynb		Generative_Search_with_SentenceTransformers_and_Chroma.ipynb
Project Report.docx		Project Report.docx
README.md		README.md
Screenshot.docx		Screenshot.docx
Top 3 Results from the Search Layer - Q1.png		Top 3 Results from the Search Layer - Q1.png
Top 3 Results from the Search Layer - Q2.png		Top 3 Results from the Search Layer - Q2.png
Top 3 Results from the Search Layer - Q3.png		Top 3 Results from the Search Layer - Q3.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

mswornavidhya/HelpMateAI_RAG_GenAI

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages