archaeo — git blame tells you who. archaeo tells you why.

You open a file you didn’t write. There’s a line that looks wrong — a retry count of 5, a 30-second timeout, a guard clause nobody remembers. You want to change it, but first: why is this here, and what breaks if I touch it?

So you run git blame. It points at a commit from three weeks ago: “refactor: tidy up imports.” Useless. The line’s real origin — the PR where someone argued for it, the incident that caused it — is buried under years of renames, moves, and formatting passes. You spend twenty minutes spelunking through GitHub and give up.

Doing that spelunking for you is the entire job of archaeo. The terminal above is a real run against kubernetes/kubernetes: in about six seconds it found the PR that introduced the line and surfaced the actual design debate from code review, the thing you’d otherwise spend an afternoon digging out.

The one rule

There are a hundred “chat with your codebase” tools, and most of them hallucinate. archaeo has exactly one rule that makes it different:

The LLM never answers from its own knowledge. It only summarizes retrieved evidence, and every claim cites a concrete artifact. If the evidence isn’t there, it says “no recorded decision found.” A confident guess is a defect, not a feature.

Trust is the whole game. A tool that sometimes invents a plausible reason is worse than no tool, because you can’t tell the good answers from the bad ones. archaeo would rather tell you it doesn’t know.
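The rule can be read as a post-check on whatever the summarizer returns. The sketch below is one way to express it, not archaeo's actual code: the Evidence and Claim shapes, the id format, and the function names are all hypothetical.

```typescript
// Hypothetical shapes; archaeo's real types are not shown in this document.
interface Evidence {
  id: string; // e.g. "pr:1" (illustrative id format)
  kind: "commit" | "pr" | "issue" | "review";
  text: string;
}

interface Claim {
  text: string;
  cites: string[]; // ids of the evidence this claim rests on
}

const NO_DECISION = "no recorded decision found";

// Keep only claims that cite at least one artifact, and only retrieved ones.
function groundedClaims(claims: Claim[], evidence: Evidence[]): Claim[] {
  const known: { [id: string]: boolean } = {};
  for (const e of evidence) {
    known[e.id] = true;
  }
  return claims.filter(
    (c) => c.cites.length > 0 && c.cites.every((id) => known[id] === true)
  );
}

// Render the answer, refusing outright when no claim survives the check.
function renderAnswer(claims: Claim[], evidence: Evidence[]): string {
  const kept = groundedClaims(claims, evidence);
  if (kept.length === 0) {
    return NO_DECISION;
  }
  return kept.map((c) => c.text + " [" + c.cites.join(", ") + "]").join("\n");
}
```

An uncited claim, or one citing an id that was never retrieved, is dropped rather than softened, so a guess cannot reach the output.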

The hard part: blame-through-time

This is where almost every tool is mediocre, and where archaeo spends its entire complexity budget. git blame shows the last commit that touched a line — usually cosmetic. archaeo follows the line backward, skipping the cosmetic strata, to the change that introduced the behavior:

git blame  →  last touching commit            (rename / format / move — useless)

archaeo    →  renames → moves → refactors → squash → cherry-picks
                                ↓
                  the commit that introduced the BEHAVIOR

The moat: a cross-file behavioral-origin trace, not a single git blame.

It uses git log -L to trace the line through its in-file history, detects the “file-introduction wall” (where code was moved in from elsewhere), then uses git’s pickaxe across all history to jump files and find where the logic originally entered the repo — even if it was first written somewhere else. A deterministic classifier skips the cosmetic commits. Then it recovers the chain: commit → merged PR → linked issue → the review comments that argued about it, handling squash-merges and cherry-picks along the way.
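As a rough illustration of the pieces named above, the sketch below builds the two git invocations and adds two small helpers. The git flags (-L, -S, --all, --reverse, --format) and GitHub's default merge and squash subject formats are real; the function names are hypothetical, and the whitespace check is only one crude signal a cosmetic-commit classifier might use, not archaeo's actual classifier.

```typescript
// Arguments for `git log -L<start>,<end>:<file>`: the in-file history of a line range.
function lineHistoryArgs(file: string, start: number, end: number): string[] {
  return ["log", "-L" + start + "," + end + ":" + file];
}

// Arguments for a pickaxe search over all refs, oldest commit first.
// -S matches commits that change how many times `needle` occurs, so the
// first hash printed is the earliest commit that added or removed the text.
function pickaxeArgs(needle: string): string[] {
  return ["log", "--all", "--reverse", "--format=%H", "-S" + needle];
}

// Crude cosmetic signal (an assumption): the line is unchanged once all
// whitespace is removed, as after a formatting pass.
function isWhitespaceOnly(before: string, after: string): boolean {
  const strip = (s: string) => s.replace(/\s+/g, "");
  return before !== after && strip(before) === strip(after);
}

// Recover a PR number from a commit subject, covering GitHub's default
// merge-commit ("Merge pull request #N from ...") and squash ("... (#N)") forms.
function prNumberFromSubject(subject: string): number | null {
  const merge = /^Merge pull request #(\d+) /.exec(subject);
  if (merge) {
    return Number(merge[1]);
  }
  const squash = /\(#(\d+)\)\s*$/.exec(subject);
  if (squash) {
    return Number(squash[1]);
  }
  return null;
}
```

For example, lineHistoryArgs("pkg/a.go", 65, 65) yields the arguments for git log -L65,65:pkg/a.go (the path is illustrative). A subject with no PR reference returns null, which is the case where the chain has to stop at the commit.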

Does it actually work? 150+ real runs.

A demo on a toy repo proves nothing. So I ran 151 real queries across kubernetes, react, cognee and others — target lines sampled programmatically, not hand-picked. The summary:

Repo	Queries	Found the real PR	Notes
kubernetes/kubernetes	30	28 (93%)	6 HIGH · median 6s
topoteretes/cognee	57	57 (100%)	3.1s avg
facebook/react	3	2 + 1 honest LOW	traced the 5ms scheduler slice
PR-driven, combined	87	85 (97.7%)	6 HIGH · 58 MED · 22 LOW
a direct-push repo	19	0 — all honest LOW	refusing to fabricate = working

A few rows from the line-level results (every row reproducible — full set on the evidence page):

Target	Confidence	PR	Time (s)
…/handlers/responsewriters/compression.go:65	HIGH	#139482	5
…/scheduling/workload_aware_preemption.go:143	HIGH	#139375	6
…/retrieval/agentic_retriever.py:234	MEDIUM	#2726	3
…/loaders/core/text_loader.py:51	MEDIUM	#1240	2
…/legacy direct-commit line	LOW	—	1

HIGH is deliberately rare — it needs a clear winning commit plus a PR plus a linked issue or a substantive human review comment. Most real PRs earn MEDIUM. The model never inflates its certainty; that’s the point.
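The HIGH rule is concrete enough to sketch. In the version below, the HIGH condition follows the sentence above; the MEDIUM and LOW boundaries (any recovered PR earns MEDIUM, everything else is LOW) are assumptions inferred from the results tables, and the Signals field names are hypothetical.

```typescript
type Confidence = "HIGH" | "MEDIUM" | "LOW";

// Hypothetical signal names; each is a yes/no fact about the retrieved evidence.
interface Signals {
  clearWinningCommit: boolean;
  hasPullRequest: boolean;
  hasLinkedIssue: boolean;
  hasSubstantiveReview: boolean;
}

function confidence(s: Signals): { level: Confidence; reasons: string[] } {
  // Collect the reasons first so they can be shown next to the tier.
  const reasons: string[] = [];
  if (s.clearWinningCommit) reasons.push("clear winning commit");
  if (s.hasPullRequest) reasons.push("merged PR found");
  if (s.hasLinkedIssue) reasons.push("linked issue found");
  if (s.hasSubstantiveReview) reasons.push("substantive human review comment");

  // HIGH: winning commit + PR + (linked issue or substantive review).
  const corroborated = s.hasLinkedIssue || s.hasSubstantiveReview;
  if (s.clearWinningCommit && s.hasPullRequest && corroborated) {
    return { level: "HIGH", reasons };
  }
  // MEDIUM (assumed boundary): a PR was recovered but HIGH was not met.
  if (s.hasPullRequest) {
    return { level: "MEDIUM", reasons };
  }
  // LOW: no PR, e.g. a direct push.
  if (reasons.length === 0) reasons.push("no PR or discussion recovered");
  return { level: "LOW", reasons };
}
```

Because the tier is computed from booleans rather than asked of the model, the same evidence always yields the same confidence.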

How it’s built

local & git-only

Runs on your machine against a repo you’ve cloned. No server, no SaaS, no telemetry. Apart from calls to the LLM provider you configure, the only network traffic fetches PR/issue text from GitHub with your token, cached in SQLite.

bring your own key

Anthropic / OpenAI / Gemini — or run fully offline with a deterministic summarizer. No inference bill on us, no vendor lock-in.

honest by construction

Summarize-only LLM layer that cannot invent evidence; three-tier confidence with the reasons shown.

self-hostable, MIT

Enterprises won’t hand their git history to someone’s cloud. They don’t have to. Node 22+, zero native builds.

Where this goes

V1 is intentionally narrow: why, risk, and explain-commit, GitHub-only. It widens from there — but only where it stays evidence-grounded.

V1 · shipped

Why a line exists

archaeo why path:line — the behavioral-origin trace
archaeo risk path — 0–10 blast-radius from churn, coupling, incidents
archaeo explain-commit sha
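This page does not say how the 0 to 10 risk score is computed. Purely to illustrate how three signals could fold into one bounded number, here is a sketch with made-up weights and inputs assumed to be pre-normalized to the range 0 to 1; none of the names or numbers come from archaeo.

```typescript
// Hypothetical inputs, each assumed to be normalized to [0, 1] upstream.
interface RiskInputs {
  churn: number; // how often the file changes
  coupling: number; // how often it changes together with other files
  incidents: number; // how often it appears in incident-related fixes
}

// Clamp a value into [0, 1] so one bad input cannot push the score out of range.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Illustrative weights that sum to 10, so the result is already on a 0 to 10 scale.
function riskScore(r: RiskInputs): number {
  const raw =
    4 * clamp01(r.churn) + 3 * clamp01(r.coupling) + 3 * clamp01(r.incidents);
  return Math.round(raw * 10) / 10; // one decimal place
}
```

With these weights, a file at maximum churn but no coupling or incident history scores 4.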

V2 · next

The team’s memory

who / expert — who actually knows this code, from authorship + reviews
why <service> — the business purpose of a module, synthesized from its PRs
Onboarding mode — “how does auth work here?”, fully cited
Discovery (search / ask), plus GitLab & Bitbucket

V3 · later

Impact & dependencies

impact <service> — “what breaks if I change this?” via a real dependency graph
Multi-hop expertise & dependency reasoning — where a graph engine finally earns its place

The honest part

No tool earns trust by hiding its seams. archaeo is git-history-only: if your team’s “why” lives in Slack and Jira, it won’t see it, and it’ll tell you so rather than guess. A repo full of “fix stuff” commits has the historical value of a burnt library; archaeo surfaces “this part of your history is undocumented,” which is itself useful. It’s GitHub-only for now, and slower on partial/shallow clones (it warns you).

If you’ve ever been scared to touch code you didn’t write, point it at your own repo and tell me whether it finds something a senior engineer couldn’t in 30 seconds. That’s the bar.

git blame tells you who. archaeo tells you why.