Harness Engineering / Long form

Documentation you can't query is a graveyard

On this page

You have four hundred markdown files. Plans, ADRs, runbooks, meeting notes, session logs, that one decision doc explaining why the auth flow looks unhinged. You wrote all of it so future-you — and your agents — could use it later. Quick question: when did you last actually find the right one at the moment you needed it?

If the honest answer is "I grepped, gave up, and rewrote a doc that already existed," you don't have a knowledge base. You have a write-only graveyard.

My position: documentation you can't query is just disk that makes you feel organized. Accumulated knowledge becomes leverage only when you — and the agents next to you — can retrieve the right slice on demand, in plain language, locally. Retrieval is not a nice-to-have you bolt on later. It's a named component of the harness — the scaffolding of retrieval, tests, review, rollback, and ownership that keeps AI-native work standing. I run one over my own corpus, and I'll tell you what to look for. I won't tell you how to wire it up. That part has a price tag.

Markdown remembers. Retrieval is what lets you ask.

There's a separate, true idea floating around: markdown is where the repo remembers — durable decisions live in plain text, not in a chat that scrolls away. Fine. But memory you can't retrieve isn't memory. It's storage. The graveyard isn't a format problem; it's a retrieval problem. You don't lose the ADR because it was the wrong file type. You lose it because nothing answers "where did we decide X, and why?" without you remembering the filename first.

Grep finds the word you remember. It can't find the idea you forgot you wrote down. That gap is the whole game.

Two axes, not one: lexical and semantic

A real local knowledge base searches on two axes at once. Lexical search (BM25 and its relatives) nails exact terms — the error code, the function name, the client's name. Semantic search uses embeddings to find things by meaning, so "how do we handle a customer who churns mid-contract" surfaces the doc titled "involuntary downgrade policy" even though they share no keywords.

You want both, because you query in both modes depending on the day. This isn't a hot take — it's how serious retrieval teams build. Anthropic's writeup on Contextual Retrieval combines embeddings with BM25 precisely because neither alone is enough (www.anthropic.com); the pattern traces back to the paper that named retrieval-augmented generation (arxiv.org). The difference now: you can run a competent version locally, over your own files, without shipping a byte to anyone's cloud.

The criteria that matter more than the tool

Tool worship is the trap. The question is never "which retrieval tool is best." It's "which one can I own, audit, and abandon without pain." My criteria, in order:

Local-first. Your corpus is plans, client context, half-formed strategy. That doesn't belong in a third party's index by default. Local retrieval keeps the private graph private.
CLI before SDK before MCP. If a retrieval tool gives me a real command-line surface, I can script it, schedule it, and inspect it. CLI → SDK → MCP is my adoption order for exactly this reason — the further down that list you go, the more lock-in and the more fragile the sandbox.
Retrieval points, it doesn't rule. The index is a pointer to a relevant document, not the source of truth. Current reality gets confirmed by reading the file, the git history, the tests, the live system. Treat the index as gospel and it will confidently hand your agent a decision you reversed three months ago.

Working isn't the bar. Surviving is the bar. A retrieval setup that dazzles in a demo and rots in a month was never infrastructure.

The failure modes nobody puts on the landing page

Owning retrieval means owning its failure modes. Three that bite:

Staleness. An index that hasn't re-synced lies with total confidence. It returns the old runbook, the agent acts on it, and you debug a problem that was already fixed. A knowledge base is only as trustworthy as it is fresh — and freshness is ongoing work, not a one-time setup.

Noise drowning signal. The instinct is to index everything. Don't. Pour in every session log and throwaway note and your durable docs get buried — retrieval quality drops, embedding cost climbs, leak risk rises. There's a real tipping point where ephemeral chatter becomes the majority of the collection and the good stuff stops surfacing. "Less is better" isn't minimalism for its own sake; it's retrieval hygiene.

"Just use a giant context window." The fashionable objection: models read a million tokens now, so why retrieve? Because context is a budget, not a hard drive. Stuffing everything in on every call is slower, pricier, less auditable — and ships your private corpus to a model every single time. Retrieval keeps you precise and local. Long context complements it; it doesn't retire it.

Position

A local, queryable knowledge base — lexical and semantic, owned and kept fresh — is minimum infrastructure for anyone working alongside agents at any real scale. Not because it's trendy. Because the alternative is a graveyard of docs you'll rewrite from scratch and an agent planning against context it can't actually reach.

The tool I use today is replaceable. The discipline is not: own it, keep it fresh, index for signal over noise, and treat it as a pointer rather than gospel. Working isn't the bar — surviving is.

Here you get the why. The how — standing this up over your corpus, with the domain boundaries, freshness, and contribution perimeter your case actually needs — is where it gets specific to you. Need hands, not a map? That's Carevia mentorship.