Leander Antony | AI Engineer, RAG & Agentic Systems

A benchmarked document QA system for PDF and DOCX files that combines hybrid retrieval, document structure intelligence, reranking, and citation-aware grounded answers. On a 150-question held-out evaluation suite, it produced zero false-support claims and correctly abstained on every unanswerable question, compared with false-support rates of 6.7% for OpenAI File Search and 20% for Vectara.

Architecture

Problem

Long documents such as policies, theses, and research papers are difficult to query reliably. Keyword search misses semantic meaning, while naive LLM chat over documents often produces ungrounded answers without clear evidence. Broad summary questions are especially hard because relevant information is spread across multiple sections rather than one exact chunk.

Solution

HelpmateAI is a grounded long-document QA system that indexes PDF and DOCX files, builds persistent local retrieval artifacts, and answers user questions with visible evidence and citations. It combines dense and lexical retrieval, fusion, reranking, an LLM retrieval orchestrator for explicit scope, indexing-time chunk semantics and document landmarks, and a strict support verifier, so responses stay tied to source material and hallucinations are reduced.

Workspace

How it works

1. Document Ingestion

Uploads PDF and DOCX files and extracts structured content via pypdf, python-docx, and a selective pdfplumber pass for table-heavy pages. Captures page labels, section headings, clause IDs, section paths, section kinds, and document style hints. An indexing-time chunk-semantics layer classifies suspicious candidates as metadata, definition, or table evidence (or noise), and a document landmarks pass identifies title pages, forewords, abstracts, executive summaries, glossaries, and volume boundaries.
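The landmarks pass could look roughly like the sketch below. It is a minimal illustration rather than the project's implementation: the `LANDMARK_PATTERNS` table, the `find_landmarks` name, and the "first three non-empty lines" heuristic are all assumptions, and page text is assumed to be already extracted.

```python
import re

# Hypothetical landmark labels and the heading patterns that signal them.
LANDMARK_PATTERNS = {
    "abstract": re.compile(r"^\s*abstract\b", re.IGNORECASE),
    "foreword": re.compile(r"^\s*foreword\b", re.IGNORECASE),
    "executive_summary": re.compile(r"^\s*executive\s+summary\b", re.IGNORECASE),
    "glossary": re.compile(r"^\s*glossary\b", re.IGNORECASE),
}


def find_landmarks(pages: list[str]) -> dict[str, int]:
    """Map each landmark label to the first 1-based page whose opening lines match."""
    found: dict[str, int] = {}
    for page_number, text in enumerate(pages, start=1):
        # Assumption: headings sit in the first few non-empty lines of a page.
        head = [line for line in text.splitlines() if line.strip()][:3]
        for label, pattern in LANDMARK_PATTERNS.items():
            if label not in found and any(pattern.match(line) for line in head):
                found[label] = page_number
    return found
```

Anchoring the patterns at the start of a line keeps body text that merely mentions "abstract" or "glossary" from being tagged as a landmark.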

2. Structure & Index Build

Builds metadata-rich chunks, section records, deterministic section synopses, and lightweight topology artifacts. Sections are enriched with document-aware profile metadata (chapter numbers, section roles, page ranges, and scope aliases) so locally scoped questions can stay inside the requested chapter. The index is schema-versioned and keyed by document fingerprint so documents can be reused without unnecessary rebuilds.
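One way to key an index by schema version and document fingerprint is sketched below. The SHA-256 content hash, the `SCHEMA_VERSION` value, the directory layout, and the function names are assumptions for illustration; the source does not say how the fingerprint is computed.

```python
import hashlib
from pathlib import Path

SCHEMA_VERSION = "v1"  # hypothetical; bump when the artifact layout changes


def document_fingerprint(data: bytes) -> str:
    """Content hash of the raw file bytes (assumed fingerprint scheme)."""
    return hashlib.sha256(data).hexdigest()


def index_dir(root: Path, data: bytes, schema_version: str = SCHEMA_VERSION) -> Path:
    """Directory holding the artifacts for this document under this schema."""
    return root / schema_version / document_fingerprint(data)[:16]


def needs_rebuild(root: Path, data: bytes, schema_version: str = SCHEMA_VERSION) -> bool:
    """Rebuild only when no artifacts exist for this (schema, fingerprint) pair."""
    return not index_dir(root, data, schema_version).exists()
```

Under this layout, re-uploading the same file resolves to the same directory and skips the build, while a schema bump moves lookups to a fresh directory and forces one.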

3. Retrieval Planning

Analyzes the question and produces a retrieval plan. An LLM retrieval orchestrator runs first on a compact document map and can resolve explicit local scope to validated section IDs; deterministic code then enforces safety checks and routes through chunk, synopsis, summary, section, or hybrid retrieval. Low-confidence orchestrator output is ignored; the deterministic layer never trusts unbounded LLM scope.
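The deterministic guard around the orchestrator might be sketched as follows. The dataclasses, the `MIN_CONFIDENCE` and `MAX_SCOPED_SECTIONS` values, and the choice of hybrid retrieval as the fallback route are assumptions, not the project's actual settings.

```python
from dataclasses import dataclass, field

MIN_CONFIDENCE = 0.6     # hypothetical threshold
MAX_SCOPED_SECTIONS = 5  # hypothetical bound on how much scope the LLM may claim
ALLOWED_ROUTES = {"chunk", "synopsis", "summary", "section", "hybrid"}


@dataclass
class OrchestratorOutput:
    route: str
    section_ids: list[str] = field(default_factory=list)
    confidence: float = 0.0


@dataclass
class RetrievalPlan:
    route: str
    section_ids: list[str]
    source: str  # "orchestrator" or "deterministic"


def build_plan(output: OrchestratorOutput, known_section_ids: set[str]) -> RetrievalPlan:
    """Accept the LLM's plan only if it passes every deterministic check."""
    fallback = RetrievalPlan("hybrid", [], "deterministic")
    if output.confidence < MIN_CONFIDENCE or output.route not in ALLOWED_ROUTES:
        return fallback
    valid = [s for s in output.section_ids if s in known_section_ids]
    # Reject plans that cite unknown sections or claim too broad a scope.
    if len(valid) != len(output.section_ids) or len(valid) > MAX_SCOPED_SECTIONS:
        return fallback
    return RetrievalPlan(output.route, valid, "orchestrator")
```

The point of the design is that the LLM can only narrow retrieval to sections the index already knows about; anything else silently degrades to the unscoped route.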

4. Hybrid Retrieval

Runs dense retrieval, TF-IDF lexical retrieval, fusion, metadata-aware ranking, and optional reranking. It also uses soft structural guidance and global fallback so broader or distributed questions do not collapse into narrow evidence misses.
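The fusion step can be illustrated with reciprocal rank fusion (RRF). The source does not name the fusion method, so RRF, the constant `k = 60`, and the function name are assumptions chosen because RRF needs only ranks and not comparable scores.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it contains.
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk ID for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


dense = ["c1", "c2", "c3"]    # toy dense-retrieval ranking
lexical = ["c3", "c1", "c4"]  # toy TF-IDF ranking
fused = reciprocal_rank_fusion([dense, lexical])
```

Chunks that appear high in both lists (here `c1` and `c3`) rise above chunks found by only one retriever.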

5. Evidence Selection & Answering

Grades evidence as strong, weak, or unsupported. For weak, middle-band cases, retrieval can adapt without model-based query rewriting. A spread-triggered, reorder-only evidence selector promotes stronger evidence among the top candidates without pruning support. Final answers carry citations, retrieval notes, and explicit support status. A strict support verifier can recover a refused answer to full support only when grounded facts are visible and no missing facts or hedging language remain. This is the mechanism behind the zero false-support evaluation result.
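A minimal sketch of the grading bands and the spread-triggered, reorder-only selector follows. All thresholds, the `Candidate` fields, and the reading of "spread" as the gap between the best and worst evidence score in the top N are assumptions made for illustration.

```python
from dataclasses import dataclass

STRONG_THRESHOLD = 0.75  # hypothetical band edges
WEAK_THRESHOLD = 0.40
SPREAD_TRIGGER = 0.15    # hypothetical minimum spread before reordering
TOP_N = 3                # hypothetical window the selector may reorder


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    retrieval_score: float
    evidence_score: float


def grade(best_score: float) -> str:
    """Bucket the best evidence score into strong, weak, or unsupported."""
    if best_score >= STRONG_THRESHOLD:
        return "strong"
    if best_score >= WEAK_THRESHOLD:
        return "weak"
    return "unsupported"


def reorder_top(candidates: list[Candidate]) -> list[Candidate]:
    """Reorder the top window by evidence score; never drop a candidate."""
    top, rest = candidates[:TOP_N], candidates[TOP_N:]
    if not top:
        return list(candidates)
    scores = [c.evidence_score for c in top]
    if max(scores) - min(scores) < SPREAD_TRIGGER:
        return list(candidates)  # scores too close: keep retrieval order
    # sorted() is stable, so equal evidence scores keep retrieval order.
    return sorted(top, key=lambda c: -c.evidence_score) + rest
```

Because the selector only permutes the top window, every retrieved chunk is still available to the answerer and to the support verifier.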

Evaluation

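| Metrics                     | HelpmateAI | OpenAI File Search | Vectara |
|-----------------------------|------------|--------------------|---------|
| False Support Rate          | 0%         | 6.7%               | 20%     |
| Unsupported abstention      | 100%       | 93%                | 80%     |
| Strict fully supported rate | 90%        | 94%                | 96%     |
| RAGAS faithfulness          | 92%        | 96%                | 72%     |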

The suite consists of 150 held-out questions across the NIST AI RMF, an arXiv climate ML paper, a public UPenn thesis, FOMC minutes, and the IRENA World Energy Transitions Outlook. Vendors were run in their native answer modes.
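The two abstention-related metrics could be computed along these lines. The definitions here are assumptions: both rates are taken over the unanswerable questions only, which makes them complements of each other. The `Outcome` fields and the toy data are hypothetical and are not the project's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    answerable: bool       # gold label: does the document support an answer?
    claimed_support: bool  # did the system present its answer as supported?


def false_support_rate(outcomes: list[Outcome]) -> float:
    """Share of unanswerable questions where the system still claimed support."""
    unanswerable = [o for o in outcomes if not o.answerable]
    if not unanswerable:
        return 0.0
    return sum(o.claimed_support for o in unanswerable) / len(unanswerable)


def unsupported_abstention_rate(outcomes: list[Outcome]) -> float:
    """Share of unanswerable questions where the system abstained."""
    unanswerable = [o for o in outcomes if not o.answerable]
    if not unanswerable:
        return 0.0
    return sum(not o.claimed_support for o in unanswerable) / len(unanswerable)


# Toy data: 3 answerable questions, 4 unanswerable with one false-support claim.
toy = [Outcome(True, True)] * 3 + [Outcome(False, False)] * 3 + [Outcome(False, True)]
```

On the toy data the functions return 0.25 and 0.75 respectively.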

[Figure: bar charts comparing HelpmateAI, OpenAI File Search, and Vectara on false support rate, unsupported abstention, fully supported rate, and RAGAS faithfulness.]

Challenges

- Improving broad summary and synthesis-style questions without hurting strong factual retrieval

- Handling noisy academic and journal PDFs with weak section structure

- Making retrieval behavior more measurable and benchmarkable rather than relying on intuition

- Removing model-based query rewriting in favor of simpler, more predictable deterministic recovery and guardrails

- Balancing architecture complexity, latency, and retrieval gains through ablations and threshold calibration

What I learned

This project pushed me beyond building a simple RAG demo. I learned how retrieval quality depends on structure, routing, evidence selection, and evaluation, not just embeddings or prompting. I also learned to treat benchmarking as part of the architecture itself, using retrieval metrics, abstention checks, baseline comparisons, and Ragas to justify design decisions.

Tech Stack

- Next.js, FastAPI, Caddy + Docker on a VPS

- Document parsing: pypdf, python-docx, pdfplumber for table enrichment

- Vector Store: ChromaDB (local + optional cloud persistence)

- LLM / Evaluation: OpenAI, Ragas, Vectara

- Retrieval / ML: scikit-learn, sentence-transformers, Python core logic
