Amplify the Expert: A Philosophy for Building Enterprise RAG

Part of Enterprise Document Intelligence, a series that builds an enterprise RAG system from four bricks: parsing, question parsing, retrieval, and generation.

Amplify the expert: the thesis behind every architectural choice in the series.

Where this article fits in the series: a manifesto alongside the numbered spine – Image by author

If you have to remember one idea from this series, it is this: enterprise RAG amplifies the expert. It does not replace them. This piece sets down the thesis up front, before the techniques start, because every later article derives from it.

Most architectural mistakes in production RAG follow from forgetting this. Once you accept it, the rest of the series stops being a catalog of techniques and starts looking like a coherent argument.

1. The thesis in one sentence

This series is about building RAG systems that amplify enterprise experts working with their own documents, not about building general-purpose document intelligence that replaces them.

The premise sounds modest but it changes most architectural choices. The system’s job is to scale judgment that already exists in human form: the lawyer who has read a thousand contracts, the underwriter who reaches for the deductible clause on reflex, the compliance officer who knows which sentence the auditor will ask about. Those people are the source of truth. The system handles volume, finds passages in seconds, compares documents systematically. It does not pretend to be the expert.

Every other position the series defends derives from this thesis. Vector stores are a fallback because the expert already knows the keywords. Deterministic dispatchers beat autonomous agents because the expert needs to audit what happened. Expert dictionaries beat fine-tuned embeddings because the expert’s vocabulary is richer than any IDF formula or vector space could capture.

2. The gap between two camps

Most enterprises run two parallel realities on the same documents: an opaque vector-store pipeline the IT camp built, and an expert who still searches with Ctrl+F because nothing the IT camp shipped earned their trust. The series sits on the bridge between the two.

The two camps and the bridge the series sits on – Image by author

On the IT side is the camp that vendors and conference talks have told to chunk every document, push it into a vector store, embed every query, and trust that cosine similarity will find the right passage. They build the system, they run it, and if you ask them precisely why a given chunk came back, very few can answer. The architecture is opaque even to the people who deployed it.

On the expert side, decades of accumulated reading. Lawyers who have read a thousand contracts. Underwriters who have priced ten thousand policies. Compliance officers who can name the clause an auditor will ask about before the auditor walks in. Ask them how they search a document. The honest answer is almost always the same. They open the PDF, hit Ctrl+F, type a keyword they know works in their corpus, find the passage. If the keyword misses, they go to the table of contents, locate the right section, scan it line by line. That is the retrieval method that decades of expertise has converged on.

The gap is not benign. The IT-camp system is opaque even to the people who built it; the expert-camp method is precise but does not scale. The series’s natural move is to bring them together: take the method the expert already trusts (keyword search anchored on real vocabulary, then TOC navigation when keywords miss) and use the LLM to scale it. LLMs are now strong enough that the retrieval stage no longer has to be clever to compensate. The 2022-era reflex of stacking embedding tricks on top of a weak generation model was solving a problem that no longer exists at the same intensity. Retrieval can stay close to the expert’s natural workflow without losing answer quality.
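The expert's two-step method (keyword first, table of contents when the keyword misses) can be sketched in a few lines. This is a minimal illustration, not the series's implementation: the `Section` type, the `locate` function, and the sample policy text are all hypothetical names and data introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    title: str
    start: int  # first line index of the section (0-based, inclusive)
    end: int    # line index just past the section (exclusive)


def keyword_hits(lines, keywords):
    """Ctrl+F step: indices of lines containing any keyword, case-insensitive."""
    lowered = [k.lower() for k in keywords]
    return [i for i, line in enumerate(lines) if any(k in line.lower() for k in lowered)]


def toc_fallback(sections, topic_words):
    """TOC step: sections whose title mentions any topic word."""
    lowered = [w.lower() for w in topic_words]
    return [s for s in sections if any(w in s.title.lower() for w in lowered)]


def locate(lines, sections, keywords, topic_words):
    """Try keywords first; only when they miss, fall back to whole TOC sections."""
    hits = keyword_hits(lines, keywords)
    if hits:
        return {"method": "keyword", "lines": hits}
    matched = toc_fallback(sections, topic_words)
    return {"method": "toc", "lines": [i for s in matched for i in range(s.start, s.end)]}


# Hypothetical sample document and its section list.
lines = [
    "1. Definitions",
    "The Insured means the party named in the schedule.",
    "2. Deductible",
    "The first 500 EUR of each claim is borne by the Insured.",
    "3. Exclusions",
    "War and nuclear risks are not covered.",
]
sections = [
    Section("1. Definitions", 0, 2),
    Section("2. Deductible", 2, 4),
    Section("3. Exclusions", 4, 6),
]

by_keyword = locate(lines, sections, ["deductible"], ["deductible"])
by_toc = locate(lines, sections, ["excess"], ["deductible"])
```

In this sample, the keyword "deductible" lands directly on the heading line, while "excess" appears nowhere in the text, so the second call falls back to the section whose title matches and returns every line in it for a line-by-line scan. The returned `method` field records which path was taken, which is the kind of trace an expert can audit.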

Underneath the two camps sits a distinction worth stating plainly. There are two ways to answer a question, and they are not the same operation:

Enterprise work is the second case, and the rest of the series keeps the two phases apart.

Mirroring the expert’s method this closely is not cosmetic. The point is not that vector stores are wrong everywhere; the point is that adopting a method the expert cannot recognize, on documents the expert knows by heart, is the fastest way to lose their trust. Without trust, the system does not get used, and a system that is not used has zero value regardless of how impressive its benchmarks look.

3. The historical parallel: machine learning ten years ago

RAG is repeating the enterprise ML wave of 2015 to 2020 almost step for step. The same vendor-copying reflex, the same generic templates, the same failure modes. What worked then, and what will work for RAG now, is domain-specific work anchored on existing expertise.

The two enterprise waves share the same shape ten years apart – Image by author

Between 2015 and 2020, enterprises tried to build ML systems by copying Google, DeepMind, and Facebook. “Build a model that learns” was the slogan. Most enterprise ML projects from that era failed to reach production. Gartner put the figure at around 85% in 2019, and the practitioners who lived through the wave cite numbers in the same range. They failed for the same reasons every time. Enterprise companies do not have Google’s data scale. They do not have research teams. They do not have unlimited compute budgets. They do not have the open-ended use cases that justify general approaches.

What ended up working in enterprise ML was domain-specific work. Actuarial forecasting tuned to insurance. Document classification calibrated on internal vocabulary. Risk scoring that exploited the variables domain experts had already identified as predictive. The systems that delivered value were the ones that built on existing expertise, not the ones that tried to learn it from scratch.

RAG is repeating this exact pattern. Enterprises copy OpenAI, plug their data into generic managed RAG products, vectorize everything by default. The failure modes are the same as the ML decade: too much generality, not enough domain anchoring, no answer for the cases the benchmarks did not cover. The alternative is the same answer that worked ten years ago. Domain-specific RAG. Codify the expertise that already exists. Use the structure of documents the team already knows. Amplify the expert instead of bypassing them.

This parallel matters for two reasons. It gives the argument historical depth (we have seen this movie). And it gives the argument constructive framing (we are not against OpenAI; we are saying the trajectory is known and the alternative is to build for our context, not theirs).

4. Where this applies (and where it does not)

The thesis is not universal. Four context properties decide whether this series is your guide. When all four hold, the architecture earns its place; when one is missing, a different stance fits better.

The four conditions for this architecture and the verdict either way – Image by author

The four properties are:

These four hold for most enterprise document intelligence work. Insurance brokers, law firms, hospitals, banks, government agencies, any organization where experts work with structured documents under regulatory scrutiny.

Where it does not. Open-domain QA over the web, consumer chat, exploration of a corpus where no expert exists, settings where the questions are unbounded. There, general-purpose retrieval and autonomous agents make more sense. The trade-off shifts: you sacrifice audit and reproducibility, but you also do not have an expert who would have used either. Those are different problems and the architecture should be different. The series’s stance is defensible precisely because it admits where it does not apply.

5. The three founding principles

Amplifying the expert turns into code under three disciplines: choose techniques the expert recognizes, build a pyramidal architecture a new engineer can trace in one sitting, and use relational tables (not strings) at every brick junction.

The three disciplines and the design test each one carries – Image by author

These three are not features. They are the discipline that makes the series’s specific architectural choices defensible across many years and many contributors.

6. The four bricks, through this philosophy

The four bricks (parsing, question parsing, retrieval, generation) are common to most RAG architectures. What is specific here is that each one mirrors something the expert does mentally and amplifies it on the axes a manual workflow cannot reach. Every later article in the series develops one of these four ideas in code.

Each brick mirrors an expert action and amplifies on one axis – Image by author

Parsing mirrors how the expert scans a document on first read: grasp the topic, find the section list, spot where the numbers live. The parser does that scan once and keeps the result. Everything missed here cannot be recovered downstream, which makes parsing the most important choice in the pipeline.
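As a rough sketch of that first-read scan, the snippet below records, for each non-empty line, its enclosing section and whether it carries a number. The heading pattern (a dotted number, a period, then a title) and the `ParsedLine` fields are assumptions made for illustration; real documents need a parser matched to their actual layout.

```python
import re
from dataclasses import dataclass

# Assumed heading convention: "2. Deductible" or "2.1. Scope".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.\s+(\S.*)$")
DIGIT = re.compile(r"\d")


@dataclass(frozen=True)
class ParsedLine:
    line_no: int      # 1-based line number in the source text
    section: str      # number of the enclosing section, "" before the first heading
    is_heading: bool
    has_number: bool  # True when the line body (heading number excluded) contains a digit
    text: str


def parse_document(text):
    """Scan once, keep one structured row per non-empty line."""
    rows = []
    current = ""
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            current = match.group(1)
        body = match.group(2) if match else line
        rows.append(ParsedLine(line_no, current, bool(match), bool(DIGIT.search(body)), line))
    return rows


# Hypothetical sample text.
DOC = """1. Definitions
The Insured means the party named in the schedule.

2. Deductible
The first 500 EUR of each claim is borne by the Insured.
"""

rows = parse_document(DOC)
section_list = [r.text for r in rows if r.is_heading]
number_lines = [(r.line_no, r.section) for r in rows if r.has_number]
```

The section list and the "where the numbers live" index both fall out of the same stored rows, so later bricks can query the scan instead of re-reading the document.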

Question parsing mirrors the Ctrl+F reflex: the expert starts by typing two or three keywords. The brick keeps that and amplifies it on two axes Ctrl+F can’t reach (co-occurrence and expert-dictionary expansion), then splits the question into a retrieval brief and a generation brief that downstream bricks consume separately.
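One way to picture the two briefs is shown below. Everything in it is an assumption made for illustration: the `EXPERT_DICTIONARY` entries, the stopword list, the field names, and the reading of "co-occurrence" as "every keyword, or one of its dictionary synonyms, appears on the same line".

```python
from dataclasses import dataclass

# Hypothetical expert dictionary: corpus vocabulary the expert knows is equivalent.
EXPERT_DICTIONARY = {"deductible": ("excess", "retention")}
STOPWORDS = {"what", "is", "the", "for", "a", "an", "of", "in", "to"}


@dataclass(frozen=True)
class RetrievalBrief:
    keywords: tuple[str, ...]
    term_groups: tuple[tuple[str, ...], ...]  # each keyword with its dictionary expansions


@dataclass(frozen=True)
class GenerationBrief:
    question: str
    expected_form: str


def parse_question(question):
    """Split one question into a brief for retrieval and a brief for generation."""
    tokens = [w.strip("?.,;:!\"'()").lower() for w in question.split()]
    keywords = tuple(dict.fromkeys(t for t in tokens if t and t not in STOPWORDS))
    term_groups = tuple((k, *EXPERT_DICTIONARY.get(k, ())) for k in keywords)
    return (
        RetrievalBrief(keywords, term_groups),
        GenerationBrief(question, "short answer with line citations"),
    )


def cooccurring_lines(lines, brief):
    """Indices of lines where every term group is matched by at least one of its terms."""
    hits = []
    for i, line in enumerate(lines):
        low = line.lower()
        if all(any(term in low for term in group) for group in brief.term_groups):
            hits.append(i)
    return hits


# Hypothetical sample.
retrieval_brief, generation_brief = parse_question("What is the deductible for theft?")
policy_lines = [
    "Theft is covered up to 10,000 EUR.",
    "An excess of 500 EUR applies to theft claims.",
    "The deductible for fire is 1,000 EUR.",
]
matches = cooccurring_lines(policy_lines, retrieval_brief)
```

A plain Ctrl+F on "deductible" would land on the fire clause and miss the theft clause entirely; the expanded, co-occurring search finds the one line that speaks about both.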

Retrieval mirrors the triage the expert does after Ctrl+F returns thirty hits: drop the off-topic ones, keep the few worth a second look. The brick does that at scale and keeps three things apart that “top-k chunks” collapses: the anchor (where the match lands), the scope (what goes to generation), and the context (the surrounding text the expert reads by reflex). The criterion is “the set worth a second human pass”, not “top-k by cosine”.
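The anchor, scope, and context distinction can be made concrete with a small record type. The sketch below assumes that the scope is the section enclosing the hit and that the context is a margin of neighboring lines on either side; both choices, and the names `Passage` and `build_passages`, are illustrative assumptions rather than the series's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    anchor: int               # line index where the match landed
    scope: tuple[int, ...]    # lines handed to generation
    context: tuple[int, ...]  # neighboring lines kept for the human reader only


def build_passages(hits, sections, n_lines, margin=1):
    """Turn raw hit lines into passages.

    sections: list of (start, end) half-open line ranges.
    Hits that fall outside every section are dropped.
    """
    passages = []
    for anchor in hits:
        for start, end in sections:
            if start <= anchor < end:
                scope = tuple(range(start, end))
                lo, hi = max(0, start - margin), min(n_lines, end + margin)
                context = tuple(i for i in range(lo, hi) if not start <= i < end)
                passages.append(Passage(anchor, scope, context))
                break
    return passages


# Hypothetical layout: three two-line sections in a six-line document.
passages = build_passages(hits=[3], sections=[(0, 2), (2, 4), (4, 6)], n_lines=6)
```

Keeping the three fields separate means a reviewer can later ask which line triggered the passage, what the model actually saw, and what a human would have glanced at around it, as three different questions.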

Generation is where the discipline against fabrication lives: a faithful restatement of what the retrieved scope says plus the citation to verify it, never a paraphrase that drifts. The LLM fills a typed Pydantic schema (answer, line citations, answer_found, confidence, caveats) that the expert controls by writing the schema and the prompt.

  • Article 8a (the answer contract): the typed answer with citations and self-checks
  • Article 8b (prompt assembly): prompt + schema + trace from a parsed question
  • Article 8c (validation): the validator and the feedback loop that closes the pipeline
  • Article 9 (the upgraded pipeline): the Article 1 (minimal RAG) baseline, upgraded brick by brick
  • Article 13 (the workflow pipeline): wire the four upgraded bricks into one
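The answer contract named in the generation paragraph can be sketched as a typed record plus a validator. The series uses a Pydantic schema; the version below swaps in a standard-library dataclass so it runs without dependencies. The five field names come from the text above, while the validation rules (confidence range, citations required and confined to the retrieved scope) are assumptions about what such a check might enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    answer: str
    line_citations: tuple[int, ...]
    answer_found: bool
    confidence: float
    caveats: tuple[str, ...] = ()


def validate(ans, scope):
    """Return a list of problems; an empty list means the answer passes.

    scope: the line numbers that were actually sent to generation.
    """
    problems = []
    if not 0.0 <= ans.confidence <= 1.0:
        problems.append("confidence must be between 0 and 1")
    if ans.answer_found and not ans.line_citations:
        problems.append("answer marked as found but carries no citation")
    outside = [n for n in ans.line_citations if n not in scope]
    if outside:
        problems.append(f"citations outside retrieved scope: {outside}")
    return problems


# Hypothetical examples.
scope = (41, 42, 43)
good = Answer("The deductible is 500 EUR.", (42,), True, 0.9)
drifting = Answer("The deductible is 500 EUR.", (99,), True, 0.9)
not_found = Answer("", (), False, 0.0, ("No deductible clause in the retrieved scope.",))

good_problems = validate(good, scope)
drifting_problems = validate(drifting, scope)
not_found_problems = validate(not_found, scope)
```

A citation that points outside the retrieved scope is the mechanical signature of fabrication, so it is rejected before the expert ever sees the answer; an honest "not found" with a caveat passes.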

Every brick respects the same discipline: structured input, structured output, no string-soup at any junction. That makes the system queryable, auditable, replayable, and joinable across years of accumulated questions and answers. Part IV (Article 14 the corpus problem, 15 preparing the corpus, 16 ontology, 17 querying the corpus) shows what the same four bricks become at corpus scale, with a SQL-shaped corpus_index, an ontology in five relational tables, and corpus-level QA. Part V (Article 18 code architecture, 19 storage, 20 evaluation, 21 cost & latency, 22 security) makes the architecture operable across years.
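To show what "joinable" buys in practice, here is a minimal in-memory SQLite sketch. The three tables and their columns are hypothetical, chosen only to illustrate the idea; they are not the series's `corpus_index` or ontology schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE questions (
        question_id INTEGER PRIMARY KEY,
        text        TEXT NOT NULL
    );
    CREATE TABLE answers (
        answer_id    INTEGER PRIMARY KEY,
        question_id  INTEGER NOT NULL REFERENCES questions(question_id),
        answer       TEXT NOT NULL,
        answer_found INTEGER NOT NULL
    );
    CREATE TABLE citations (
        answer_id INTEGER NOT NULL REFERENCES answers(answer_id),
        doc_id    TEXT NOT NULL,
        line_no   INTEGER NOT NULL
    );
    """
)

# Hypothetical rows: one question, one answer, two cited lines.
conn.execute("INSERT INTO questions VALUES (1, 'What is the deductible for theft?')")
conn.execute("INSERT INTO answers VALUES (1, 1, '500 EUR', 1)")
conn.executemany(
    "INSERT INTO citations VALUES (?, ?, ?)",
    [(1, "policy_A", 42), (1, "policy_A", 43)],
)

# Audit query: every answer with the exact lines it rests on.
audit_rows = conn.execute(
    """
    SELECT q.text, a.answer, c.doc_id, c.line_no
    FROM questions AS q
    JOIN answers   AS a ON a.question_id = q.question_id
    JOIN citations AS c ON c.answer_id = a.answer_id
    ORDER BY c.line_no
    """
).fetchall()
```

Because each junction writes rows rather than free text, the question "which lines of which document backed this answer?" becomes a join instead of a log-parsing exercise.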

7. What follows from the thesis

The series defends six counter-positions against the mainstream RAG playbook. They are not stylistic choices: each one follows mechanically from the thesis once the four context properties hold.

Six condition-to-consequence rows derived from the four context properties – Image by author

The point of this piece is that these positions are not independent. They are one argument with six visible consequences.

8. Sources and further reading

This piece is the philosophical anchor of the series. The framing of expert judgment as a renewable resource comes from Tetlock and Gardner (Superforecasting, 2015). The tool-as-amplifier philosophy that maps directly to RAG architecture is from Norman (The Design of Everyday Things, 1988). Anthropic’s Building Effective Agents (Dec 2024) is the industry framing of when workflows win over agents. The classic short paper behind the amplify-the-expert tiebreaker is Bainbridge’s Ironies of Automation (1983): the more advanced the automation, the more the human contribution matters. Agentic patterns where the agent still uses the audited bricks the expert curated are follow-up work.

Same direction as this piece:

  • Tetlock & Gardner, Superforecasting: The Art and Science of Prediction, 2015. Expert judgment as a renewable resource; the amplify-the-expert thesis treats domain experts the way Tetlock treats superforecasters.
  • Norman, The Design of Everyday Things, 1988/2013. Tool-as-amplifier rather than tool-as-replacement; the philosophy applies to RAG architecture the same way it applies to door handles.
  • Anthropic, Building Effective Agents, December 2024. When LLM agents work and when deterministic workflows win; the decision matrix matches this series’s philosophy.
  • Carr, The Glass Cage: How Our Computers Are Changing Us, W.W. Norton 2014. Cautionary book on automation that bypasses expert judgment; the broker-corpus stories in the series are concrete instances of Carr’s concerns.
  • Bainbridge, Ironies of Automation, Automatica 1983. Classic short paper: the more advanced the automation, the more the human contribution matters. The philosophical backing for the amplify-the-expert tiebreaker.

Different angle, different context:

  • Bostrom, Superintelligence: Paths, Dangers, Strategies, Oxford University Press 2014. The strongest philosophical case for systems that aim past expert amplification toward full autonomy. The context is long-term AGI; this piece handles enterprise document work where experts are accessible and audit is required.
  • Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023 (arXiv:2210.03629). Agent reasons and acts without human-curated routing. The context is general-purpose tool-picking; developing this line on top of the audited bricks the expert maintains is follow-up work.

Earlier in the series:

Part I: What works, what breaks

Part II: The four bricks


