A benchmarked document QA system for PDF and DOCX files that combines hybrid retrieval, document structure intelligence, reranking, and citation-aware grounded answers.
Problem
Long documents such as policies, theses, and research papers are difficult to query reliably. Keyword search misses semantic meaning, while naive LLM chat over documents often produces ungrounded answers without clear evidence. Broad summary questions are especially hard because relevant information is spread across multiple sections rather than one exact chunk.
Solution
HelpmateAI is a grounded long-document QA system that indexes PDF and DOCX files, builds persistent local retrieval artifacts, and answers user questions with visible evidence and citations. The system combines dense retrieval, lexical search, fusion, optional reranking, deterministic retrieval planning, and bounded answer generation so responses stay tied to source material rather than model memory.
How it works
1. Document Ingestion
Accepts PDF and DOCX uploads and extracts structured content, including page labels, section headings, clause IDs where available, section paths, section kinds, and document-style hints.
2. Structure & Index Build
Builds metadata-rich chunks, section records, deterministic section synopses, and lightweight topology artifacts. The index is schema-versioned and keyed by document fingerprint so documents can be reused without unnecessary rebuilds.
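The fingerprint-keyed, schema-versioned index can be sketched as follows. This is a hedged illustration of the idea, assuming a content hash as the fingerprint; the constant `SCHEMA_VERSION` and the helper names are hypothetical, not the project's real identifiers.

```python
import hashlib

SCHEMA_VERSION = 2  # assumed: bumped whenever the chunking/metadata schema changes

def document_fingerprint(file_bytes: bytes) -> str:
    """Content hash of the raw file: identical bytes always map to the same
    fingerprint, so re-uploading an unchanged document reuses its artifacts."""
    return hashlib.sha256(file_bytes).hexdigest()[:16]

def index_key(file_bytes: bytes) -> str:
    """Artifacts are keyed by schema version + fingerprint, so a schema bump
    invalidates stale indexes while unchanged documents skip a rebuild."""
    return f"v{SCHEMA_VERSION}-{document_fingerprint(file_bytes)}"
```

The version prefix is the part that prevents silently reading artifacts built under an older chunking scheme.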
3. Retrieval Planning
Analyzes the user’s question and produces a deterministic retrieval plan. Depending on query shape, the system chooses among chunk, synopsis, summary, section, and hybrid routes. A lightweight LLM route refinement is used only when deterministic confidence is low.
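A deterministic planner of this kind can be sketched as a keyword-and-shape heuristic that returns a route plus a confidence score; low confidence is what would trigger the optional LLM refinement. The cue words and thresholds below are assumptions for illustration, not the system's actual rules.

```python
def plan_route(question: str) -> tuple[str, float]:
    """Map query shape to a retrieval route deterministically.
    Returns (route, confidence); a caller would invoke LLM refinement
    only when confidence falls below some threshold (assumed here)."""
    q = question.lower()
    if any(w in q for w in ("summarize", "summary", "overview", "main points")):
        return "summary", 0.9        # broad question: answer from synopses
    if any(w in q for w in ("section", "chapter", "clause")):
        return "section", 0.8        # user is pointing at document structure
    if len(q.split()) <= 4:
        return "chunk", 0.5          # short query: ambiguous, low confidence
    return "hybrid", 0.7             # default: fuse dense + lexical evidence
```

Because the planner is pure and deterministic, the same question always produces the same plan, which keeps retrieval behavior benchmarkable.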
4. Hybrid Retrieval
Runs dense retrieval, TF-IDF lexical retrieval, fusion, metadata-aware ranking, and optional reranking. It also applies soft structural guidance and a global fallback so that broad or distributed questions are not answered from a single narrow chunk while the supporting evidence sits elsewhere.
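One common way to fuse dense and lexical rankings is reciprocal rank fusion (RRF); the source does not state which fusion method HelpmateAI uses, so the sketch below is one plausible choice, not the project's confirmed implementation.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked chunk-id lists (e.g. one from dense retrieval, one from
    TF-IDF). Each list contributes 1/(k + rank) per id, so a chunk ranked
    highly by either retriever surfaces even if the other missed it."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The constant k (60 is the value commonly used in the RRF literature) damps the gap between adjacent ranks so one retriever's top hit cannot dominate the fused list.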
Challenges
- Handling noisy academic and journal PDFs with weak section structure
- Making retrieval behavior more measurable and benchmarkable rather than relying on intuition
- Balancing architecture complexity, latency, and retrieval gains through ablations and threshold calibration
What I learned
This project pushed me beyond building a simple RAG demo. I learned how retrieval quality depends on structure, routing, evidence selection, and evaluation, not just on embeddings or prompting. I also learned to treat benchmarking as part of the architecture itself, using retrieval metrics, abstention checks, baseline comparisons, and ragas to justify design decisions.
Tech Stack
- Frameworks: Next.js (app), FastAPI (API)
- Vector Store: ChromaDB
- LLM / Evaluation: OpenAI, ragas
- Deployment path: Framer for marketing, Next.js for the app, FastAPI for the API, with Supabase and Chroma Cloud persistence