archaeo
Open source · MIT · git-history-only · BYO LLM key

git blame tells you who.
archaeo tells you why.

A local CLI that traces any line of code back through every move, rename, and refactor — to the commit, PR, issue, and review comment that introduced the logic. With an honest confidence score, and a flat “no recorded decision found” when the history can’t support an answer.

npm i -g git-archaeo
blame ↓ through time
the line
reformat / lint [cosmetic]
│ skip
rename file [cosmetic]
│ skip
move → util/ [move]
│ stitch across files
introduced the logic [origin]
  • 151 real queries on real repos
  • 97.7% resolved to the real PR (PR-driven repos, n=87)
  • 6 HIGH, incl. the reviewer’s own critique
  • 1–8s per query on a full clone

You open a file you didn’t write. There’s a line that looks wrong — a retry count of 5, a 30-second timeout, a guard clause nobody remembers. You want to change it, but first: why is this here, and what breaks if I touch it?

So you run git blame. It points at a commit from three weeks ago: “refactor: tidy up imports.” Useless. The line’s real origin — the PR where someone argued for it, the incident that caused it — is buried under years of renames, moves, and formatting passes. You spend twenty minutes spelunking through GitHub and give up.

Sparing you that spelunking is the entire job of archaeo. The terminal above is a real run against kubernetes/kubernetes: in about six seconds it found the PR that introduced the line and surfaced the actual design debate from code review, the thing you’d otherwise spend an afternoon digging out.

The one rule

There are a hundred “chat with your codebase” tools, and most of them hallucinate. archaeo has exactly one rule that makes it different:

The LLM never answers from its own knowledge. It only summarizes retrieved evidence, and every claim cites a concrete artifact. If the evidence isn’t there, it says “no recorded decision found.” A confident guess is a defect, not a feature.

Trust is the whole game. A tool that sometimes invents a plausible reason is worse than no tool, because you can’t tell the good answers from the bad ones. archaeo would rather tell you it doesn’t know.
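
A minimal sketch of how a rule like this could be enforced after the model responds. The bracketed citation format, the `grounded_answer` name, and the evidence IDs are assumptions made for illustration, not archaeo's actual internals:

```python
import re
from typing import Dict, List

NO_DECISION = "no recorded decision found"

# Assumed citation syntax: each claim references evidence as [some-id].
CITATION = re.compile(r"\[([^\[\]]+)\]")


def grounded_answer(claims: List[str], evidence: Dict[str, str]) -> str:
    """Return the claims only if every one cites retrieved evidence."""
    if not claims or not evidence:
        return NO_DECISION
    for claim in claims:
        cited = CITATION.findall(claim)
        # A claim with no citation, or one citing an artifact that was
        # never retrieved, is treated as a guess and rejected outright.
        if not cited or any(ref not in evidence for ref in cited):
            return NO_DECISION
    return "\n".join(claims)
```

The design choice worth noting is that one ungrounded claim rejects the whole answer, so a partly invented summary never reaches the user.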

The hard part: blame-through-time

This is where almost every tool is mediocre, and where archaeo spends its entire complexity budget. git blame shows the last commit that touched a line — usually cosmetic. archaeo follows the line backward, skipping the cosmetic strata, to the change that introduced the behavior:

git blame  →  last touching commit            (rename / format / move — useless)

archaeo    →  renames → moves → refactors → squash → cherry-picks
                                
                          ↓
                  the commit that introduced the BEHAVIOR

The moat: a cross-file behavioral-origin trace, not a single git blame.

It uses git log -L to trace the line through its in-file history, detects the “file-introduction wall” (where code was moved in from elsewhere), then uses git’s pickaxe across all history to jump files and find where the logic originally entered the repo — even if it was first written somewhere else. A deterministic classifier skips the cosmetic commits. Then it recovers the chain: commit → merged PR → linked issue → the review comments that argued about it, handling squash-merges and cherry-picks along the way.
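
The two git primitives named above, plus one possible cosmetic check, can be sketched as follows. `git log -L` and the `-S` pickaxe are real git options; the helper names and the whitespace-only rule for "cosmetic" are assumptions for illustration, and the real classifier is presumably more thorough:

```python
import re
import subprocess
from typing import List


def line_history_cmd(path: str, line: int) -> List[str]:
    """git log -L: the history of a single line within one file."""
    return ["git", "log", f"-L{line},{line}:{path}", "--format=%H %s"]


def pickaxe_cmd(snippet: str) -> List[str]:
    """git log -S: commits on any ref that changed how often `snippet` occurs.

    --reverse lists the oldest commit first, which is the candidate for
    where the logic originally entered the repo.
    """
    return ["git", "log", "--all", "--reverse", "--format=%H %s", f"-S{snippet}"]


def _squash(lines: List[str]) -> List[str]:
    # Drop blank lines and all whitespace inside the remaining ones.
    return [re.sub(r"\s+", "", text) for text in lines if text.strip()]


def is_cosmetic(removed: List[str], added: List[str]) -> bool:
    """Hypothetical rule: a hunk is cosmetic if only whitespace changed."""
    return _squash(removed) == _squash(added)


def run(cmd: List[str], repo: str = ".") -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(cmd, cwd=repo, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run(pickaxe_cmd("maxRetries = 5"), repo)` would list, oldest first, the commits that added or removed that exact string anywhere in history; the snippet here is a made-up placeholder.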

Does it actually work? 150+ real runs.

A demo on a toy repo proves nothing. So I ran 151 real queries across kubernetes, react, cognee and others — target lines sampled programmatically, not hand-picked. The summary:

| Repo | Queries | Found the real PR | Notes |
| --- | --- | --- | --- |
| kubernetes/kubernetes | 30 | 28 (93%) | 6 HIGH · median 6s |
| topoteretes/cognee | 57 | 57 (100%) | 3.1s avg |
| facebook/react | 3 | 2 + 1 honest LOW | traced the 5ms scheduler slice |
| PR-driven, combined | 87 | 85 (97.7%) | 6 HIGH · 58 MED · 22 LOW |
| a direct-push repo | 19 | 0 (all honest LOW) | refusing to fabricate = working |

A few rows from the line-level results (every row reproducible — full set on the evidence page):

| Target | Confidence | PR | Time (s) |
| --- | --- | --- | --- |
| …/handlers/responsewriters/compression.go:65 | HIGH | #139482 | 5 |
| …/scheduling/workload_aware_preemption.go:143 | HIGH | #139375 | 6 |
| …/retrieval/agentic_retriever.py:234 | MEDIUM | #2726 | 3 |
| …/loaders/core/text_loader.py:51 | MEDIUM | #1240 | 2 |
| …/legacy direct-commit line | LOW | - | 1 |

HIGH is deliberately rare — it needs a clear winning commit plus a PR plus a linked issue or a substantive human review comment. Most real PRs earn MEDIUM. The model never inflates its certainty; that’s the point.
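
The HIGH rule above maps directly onto a small deterministic function. In this sketch only the HIGH condition comes from the text; the MEDIUM and LOW thresholds, the `Evidence` type, and its field names are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    clear_winning_commit: bool
    has_pr: bool
    has_linked_issue: bool
    has_substantive_review: bool


def confidence(e: Evidence) -> str:
    """Three-tier confidence computed from what was actually retrieved."""
    # HIGH, as stated in the text: a clear winning commit, plus a PR,
    # plus a linked issue or a substantive human review comment.
    if e.clear_winning_commit and e.has_pr and (
        e.has_linked_issue or e.has_substantive_review
    ):
        return "HIGH"
    # Assumption: any recovered PR earns MEDIUM. The real threshold is
    # not spelled out in the text and is likely stricter.
    if e.has_pr:
        return "MEDIUM"
    return "LOW"
```

Because the tier depends only on which artifacts exist, the same evidence always yields the same score.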

How it’s built

local & git-only

Runs on your machine against a repo you’ve cloned. No server, no SaaS, no telemetry. The only network call fetches PR/issue text from GitHub with your token, cached in SQLite.
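
A rough sketch of what a SQLite-backed cache for PR/issue text might look like. The table layout, the key format, and the `fetch` callback are all assumptions; the network call itself is left to the caller:

```python
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache database. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS artifacts ("
        "key TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return db


def cached_fetch(db: sqlite3.Connection, key: str, fetch: Callable[[str], str]) -> str:
    """Return the cached body for `key`, calling `fetch` only on a miss."""
    row = db.execute("SELECT body FROM artifacts WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]
    body = fetch(key)  # the only place a network request would happen
    db.execute("INSERT INTO artifacts (key, body) VALUES (?, ?)", (key, body))
    db.commit()
    return body
```

With a cache like this, repeated queries that resolve to the same PR cost one request in total.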

bring your own key

Anthropic / OpenAI / Gemini — or run fully offline with a deterministic summarizer. No inference bill on us, no vendor lock-in.

honest by construction

Summarize-only LLM layer that cannot invent evidence; three-tier confidence with the reasons shown.

self-hostable, MIT

Enterprises won’t hand their git history to someone’s cloud. They don’t have to. Node 22+, zero native builds.

Where this goes

V1 is intentionally narrow: why, risk, and explain-commit, GitHub-only. It widens from there — but only where it stays evidence-grounded.

V1 · shipped

Why a line exists

  • archaeo why path:line — the behavioral-origin trace
  • archaeo risk path — 0–10 blast-radius from churn, coupling, incidents
  • archaeo explain-commit sha

V2 · next

The team’s memory

  • who / expert — who actually knows this code, from authorship + reviews
  • why <service> — the business purpose of a module, synthesized from its PRs
  • Onboarding mode — “how does auth work here?”, fully cited
  • Discovery (search / ask), plus GitLab & Bitbucket

V3 · later

Impact & dependencies

  • impact <service> — “what breaks if I change this?” via a real dependency graph
  • Multi-hop expertise & dependency reasoning — where a graph engine finally earns its place

The honest part

No tool earns trust by hiding its seams. archaeo is git-history-only: if your team’s “why” lives in Slack and Jira, it won’t see it, and it’ll tell you so rather than guess. A repo full of “fix stuff” commits has the historical value of a burnt library; archaeo surfaces “this part of your history is undocumented,” which is itself useful. It’s GitHub-only for now, and slower on partial/shallow clones (it warns you).
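
One way a tool could detect the shallow-clone case before warning about it is git's own `rev-parse --is-shallow-repository` (available since git 2.15). This is a sketch with assumed helper names, and it does not cover partial clones:

```python
import subprocess


def parse_is_shallow(stdout: str) -> bool:
    """git prints the literal word 'true' or 'false'."""
    return stdout.strip() == "true"


def is_shallow_clone(repo: str = ".") -> bool:
    """Ask git whether `repo` was cloned with truncated history."""
    result = subprocess.run(
        ["git", "rev-parse", "--is-shallow-repository"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_is_shallow(result.stdout)
```

If this returns True, running `git fetch --unshallow` restores the full history that a backward trace depends on.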

If you’ve ever been scared to touch code you didn’t write, point it at your own repo and tell me whether it finds something a senior engineer couldn’t in 30 seconds. That’s the bar.