Methodology
Goal
Produce an evidence-backed deprecation recommendation for every
driver-bearing subdirectory under drivers/ in the Linux kernel,
without burying the operator in cost.
The output is a structured dossier per directory. Each dossier:
- assigns one of six recommendations: keep, keep-annotate, deprecate, remove, unsure, or not-a-driver
- carries a confidence in [0, 1]
- cites at least one source for every non-trivial fact
- names the chipset family and replacement driver (if any)
- explicitly notes which tool produced which fact
Together, the dossiers form a structured corpus. Anyone (a maintainer, a distro packager, a researcher) can browse it, sort it, filter it, challenge individual verdicts, and rebuild the whole thing from scratch with a different kernel SHA or a different model.
The three phases
Phase 1 — git-log dormancy ranker (no LLM)
Walks drivers/ and produces the following for each leaf directory containing at least one .c file with a real driver entry-point macro:
- commits_5y (raw commit count, last 5 years)
- substantive_commits_5y (raw count minus mechanical sweeps)
- first_touch_ts, last_touch_ts, last_substantive_touch_ts
- unique_authors_5y, top_author, top_author_commits
- a single scalar dormancy_score derived from the above
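The entry-point filter can be approximated with a regular expression over a leaf directory's .c files. The sketch below is a hypothetical stand-in for the phase-1 check: the macro list (the bus-specific module_*_driver helpers) and the name has_driver_entry_point are assumptions, and a real filter would also need to recognise drivers that register by hand from module_init().

```python
import re
from pathlib import Path

# Hypothetical macro list: bus-specific registration helpers from the kernel.
DRIVER_ENTRY_MACROS = (
    "module_platform_driver",
    "module_pci_driver",
    "module_usb_driver",
    "module_i2c_driver",
    "module_spi_driver",
)

# A macro name at the start of a line (after optional indentation), then "(".
_ENTRY_RE = re.compile(
    r"^\s*(?:" + "|".join(map(re.escape, DRIVER_ENTRY_MACROS)) + r")\s*\(",
    re.MULTILINE,
)


def has_driver_entry_point(directory: Path) -> bool:
    """True if any .c file directly inside `directory` invokes one of the macros."""
    return any(
        _ENTRY_RE.search(c_file.read_text(errors="replace"))
        for c_file in directory.glob("*.c")  # non-recursive: leaf directory only
    )
```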
A directory’s dormancy score is high when the directory is old, has had few substantive commits recently, and was last substantively touched long ago. The score is zero for:
- directories younger than 5 years (recent re-organisations don’t count as dormant)
- mega-subsystem leaves (any directory under a prefix in the MEGA_SUBSYSTEM_PREFIXES lookup table)
- non-driver content (an asset suffix such as /tests, /include, /dt, …, or no driver entry-point macro found)
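A minimal sketch of how the zeroing rules and a score could fit together. The actual formula lives in ranking.md and is not reproduced here: the stale_years / (1 + substantive_commits_5y) shape, the DirFeatures record, and the helper under_prefix are hypothetical; the prefix and suffix tuples hold only the examples this document names; and has_entry_point stands in for the driver-macro check.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Only the prefixes named in this document; the real table in phase1_rank.py is longer.
MEGA_SUBSYSTEM_PREFIXES = ("drivers/gpu/drm/amd", "drivers/net/ethernet/intel")
# Partial, illustrative suffix list.
ASSET_SUFFIXES = ("/tests", "/include", "/dt")


@dataclass
class DirFeatures:
    path: str
    first_touch_ts: float             # Unix seconds
    last_substantive_touch_ts: float  # Unix seconds
    substantive_commits_5y: int
    has_entry_point: bool             # result of the driver-macro check


def under_prefix(path: str, prefix: str) -> bool:
    """Path-boundary test: 'a/b/c' is under 'a/b', but 'a/bc' is not."""
    return path == prefix or path.startswith(prefix + "/")


def dormancy_score(f: DirFeatures, now_ts: float) -> float:
    # Zeroing rules, in the order listed above.
    if (now_ts - f.first_touch_ts) / SECONDS_PER_YEAR < 5:
        return 0.0
    if any(under_prefix(f.path, p) for p in MEGA_SUBSYSTEM_PREFIXES):
        return 0.0
    if f.path.endswith(ASSET_SUFFIXES) or not f.has_entry_point:
        return 0.0
    # Hypothetical shape: years since the last substantive touch,
    # damped by the amount of recent substantive activity.
    stale_years = (now_ts - f.last_substantive_touch_ts) / SECONDS_PER_YEAR
    return stale_years / (1 + f.substantive_commits_5y)
```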
Phase 1 runs in about three minutes on the full kernel tree, costs
nothing, and emits a ranked top-N shortlist with parent-subsumption
applied (if drivers/foo/bar is shortlisted, no descendant of
bar/ is added separately).
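Parent subsumption can be sketched as a single pass over the ranked paths. This is an illustration, not the phase-1 code: apply_parent_subsumption is a hypothetical name, and letting a later-ranked ancestor replace a descendant kept earlier is one possible policy among several. Comparing against parent + "/" keeps drivers/foo/barn from being mistaken for a child of drivers/foo/bar.

```python
def apply_parent_subsumption(ranked_paths: list[str], top_n: int) -> list[str]:
    """Return at most top_n paths in rank order, with no path nested in another."""
    kept: list[str] = []
    for path in ranked_paths:
        # Skip anything that sits under a directory that is already shortlisted.
        if any(path.startswith(parent + "/") for parent in kept):
            continue
        # A later-ranked ancestor subsumes descendants kept earlier.
        kept = [p for p in kept if not p.startswith(path + "/")]
        kept.append(path)
        if len(kept) == top_n:
            break
    return kept
```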
Full design + parameters: ranking.md.
Phase 2 — codex CLI dossier probe
For each shortlisted directory, phase 2 makes one call to OpenAI’s codex CLI:
codex exec \
--ephemeral --ignore-rules --skip-git-repo-check \
-c model="gpt-5.4" \
-c model_reasoning_effort="medium" \
-s workspace-write \
-C "$KERNEL_ROOT" \
--add-dir data/dossiers \
--add-dir "/run/user/$(id -u)" \
--output-schema data/schema.v1.json \
-o data/dossiers/<path>/dossier.json \
--json \
"<prompt>"
The prompt:
- explains the role and the available tools
- carries the phase-1 static features so the model can reason about age + churn without re-reading git
- defines an “early exit” rule: if the directory is clearly not a driver, return recommendation_hint=not-a-driver immediately, with no tool calls
- lists tool budgets (3-5 total) and which fallback to use when primary tools fail
The model can use:
- a lore.kernel.org MCP server (lore_activity, lore_file_timeline, lore_search, lore_regex, etc.)
- a shell, including lei q against the local public-inbox mirror
- native web_search
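A driver script would typically build the codex exec command above as an argument list rather than a shell string, so the long prompt needs no quoting. This sketch mirrors the flags shown and adds none; codex_argv and run_probe are hypothetical names, and treating <path> as the directory’s path under drivers/ is an assumption.

```python
import os
import subprocess
from pathlib import Path


def codex_argv(kernel_root: str, driver_path: str, prompt: str) -> list[str]:
    """Return the argv of the codex exec command for one directory."""
    out = Path("data/dossiers") / driver_path / "dossier.json"
    return [
        "codex", "exec",
        "--ephemeral", "--ignore-rules", "--skip-git-repo-check",
        # The shell strips the double quotes in the original command,
        # so these are the literal strings it passes to codex.
        "-c", "model=gpt-5.4",
        "-c", "model_reasoning_effort=medium",
        "-s", "workspace-write",
        "-C", kernel_root,
        "--add-dir", "data/dossiers",
        "--add-dir", f"/run/user/{os.getuid()}",  # same as "$(id -u)"
        "--output-schema", "data/schema.v1.json",
        "-o", str(out),
        "--json",
        prompt,  # one argv element, so no shell quoting is needed
    ]


def run_probe(kernel_root: str, driver_path: str, prompt: str) -> subprocess.CompletedProcess:
    # --json makes codex print machine-readable events on stdout; capture them as text.
    return subprocess.run(codex_argv(kernel_root, driver_path, prompt),
                          capture_output=True, text=True, check=True)
```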
--output-schema is non-negotiable: without it the model drifts into free text around the enum values, and we lose the discipline that makes the corpus uniform. Every property in data/schema.v1.json is listed in required, additionalProperties: false is set at every level, and every enum is closed.
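The first two schema rules can be checked mechanically before a batch run. The hypothetical helper below walks a schema and reports object nodes whose properties are not all listed in required, or that lack additionalProperties: false; as far as we know, these match the constraints OpenAI documents for strict structured outputs. It only follows properties and array items, so $ref, $defs, and anyOf branches would need extra handling, and it does not inspect enums.

```python
def strict_schema_violations(schema: dict, path: str = "$") -> list[str]:
    """Report object schemas that are not fully required and closed."""
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties is not false")
        for name, sub in props.items():
            problems += strict_schema_violations(sub, f"{path}.{name}")
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems += strict_schema_violations(schema["items"], f"{path}[]")
    return problems
```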
Per-probe cost: ~75s wall clock, ~170k input tokens (~78% cached across a batch), ~3k output tokens, ~$0.20-$0.30 at current gpt-5.4 pricing.
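The per-probe figures turn into a dollar estimate with simple arithmetic. In this sketch the three per-million-token prices are parameters to be filled in from the provider’s current price list; probe_cost_usd is a hypothetical name and no gpt-5.4 prices are assumed.

```python
def probe_cost_usd(input_tokens: int, cached_fraction: float, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_cached_input: float,
                   usd_per_m_output: float) -> float:
    """Estimate one probe's cost; all prices are USD per million tokens."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be in [0, 1]")
    cached = input_tokens * cached_fraction  # billed at the cached-input rate
    uncached = input_tokens - cached         # billed at the full input rate
    return (uncached * usd_per_m_input
            + cached * usd_per_m_cached_input
            + output_tokens * usd_per_m_output) / 1_000_000
```

Calling probe_cost_usd(170_000, 0.78, 3_000, ...) with current prices gives a per-probe figure; multiplying by the number of shortlisted directories gives a rough sweep estimate.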
Full design + the structured-output gotchas: pipeline.md.
Phase 3 — validation
Two checks, both runnable any time:
# Structural: schema + required files + driver_path consistency
uv run --script scripts/validate_dossiers.py
# URL resolution: HEAD-check every cited source
uv run --script scripts/spot_check.py data/dossiers
The structural validator confirms that every per-driver directory
has the same set of files, that dossier.json validates against
the schema, that summary.json and meta.json carry the expected
keys, and that driver_path matches the directory layout. On the
current corpus it reports zero issues across 864 dossiers.
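A stripped-down version of the structural checks, assuming each dossier directory must contain exactly dossier.json, summary.json, and meta.json, and that driver_path equals the directory’s path relative to data/dossiers. The function name is hypothetical, and JSON Schema validation is left out because it would need a third-party validator such as the jsonschema package.

```python
import json
from pathlib import Path

# Assumed per-directory file set; the real validator may expect more files.
EXPECTED_FILES = {"dossier.json", "summary.json", "meta.json"}


def structural_issues(root: Path) -> list[str]:
    """Check the file set and driver_path for every dossier under root."""
    issues: list[str] = []
    for dossier in sorted(root.rglob("dossier.json")):
        d = dossier.parent
        rel = d.relative_to(root).as_posix()
        present = {p.name for p in d.iterdir() if p.is_file()}
        if present != EXPECTED_FILES:
            issues.append(f"{rel}: files are {sorted(present)}")
        data = json.loads(dossier.read_text())
        if data.get("driver_path") != rel:
            issues.append(f"{rel}: driver_path is {data.get('driver_path')!r}")
    return issues
```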
The URL resolver issues curl -L HEAD requests in parallel and classifies each cited source as 2xx (real and reachable), 4xx (broken), or 5xx (transient). Bot-blocked URLs (Anubis on lore, Cloudflare on vendor sites) are the exception: they return 403 or 429, which fall in the 4xx range, yet the pages behind them are real, not fabrications. Across the spot-checked deprecate/remove subset, 0 of 159 URLs were genuine fabrications.
Why this shape
A few decisions are worth defending explicitly:
- JSON Schema, not free-text. Asking the model “should this driver be deprecated?” is a different question depending on who is asking. Asking “fill out this 10-field form” is the same question every time. The schema discipline makes the corpus comparable across drivers and re-runnable.
- Citations or confidence-cap. The prompt says: if you can’t cite real evidence, return sources: [] and cap confidence at 0.3. That nudges the model away from confident-sounding hot takes when web search hasn’t returned anything useful.
- Six categories, not two. “Keep vs deprecate” loses too much information. keep-annotate (a real category; most legacy drivers belong here) means “leave the code in place but document the niche”. remove means “patches are already in flight”; deprecate means “candidate for the next removal series”; unsure is an escape valve for the model. Six categories let us distinguish active ongoing cleanup from latent deprecation candidates.
- One dir, one dossier. A driver-file-level pipeline would be more accurate but burns more codex calls and produces redundant output for tightly related files. The dir-level coarsening is a deliberate trade. See limitations.md.
- Mega-subsystem blocklist. Drivers under drivers/gpu/drm/amd or drivers/net/ethernet/intel are essentially all keep. The blocklist prevents wasted codex calls on code known in advance to be active. The list is in phase1_rank.py::MEGA_SUBSYSTEM_PREFIXES.
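The closed category set and the citation rule can also be enforced after the fact. A minimal sketch, assuming the dossier exposes recommendation, confidence, and sources fields (only sources is named in this document; the other two field names are assumptions), with the 0.3 cap from the prompt rule:

```python
RECOMMENDATIONS = ("keep", "keep-annotate", "deprecate", "remove",
                   "unsure", "not-a-driver")
UNCITED_CONFIDENCE_CAP = 0.3


def verdict_problems(dossier: dict) -> list[str]:
    """Flag unknown categories, out-of-range confidence, and uncited overconfidence."""
    problems: list[str] = []
    rec = dossier.get("recommendation")
    if rec not in RECOMMENDATIONS:
        problems.append(f"unknown recommendation {rec!r}")
    conf = dossier.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        problems.append(f"confidence {conf!r} outside [0, 1]")
    elif not dossier.get("sources") and conf > UNCITED_CONFIDENCE_CAP:
        problems.append(f"no sources but confidence {conf} > {UNCITED_CONFIDENCE_CAP}")
    return problems
```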
Re-running
The pipeline is reproducible given:
- a Linux kernel checkout (any branch or SHA)
- the OpenAI codex CLI configured with a working mcp_servers.lore-http entry in ~/.codex/config.toml
- uv (Python package manager) installed
- ~$140 of API budget for the full top-1000 sweep
To bump the kernel snapshot, change --ref and --since on the phase-1 invocation; the dormancy formula and filters carry forward. Phase 2 is idempotent: it skips any directory in the new ranking whose dossier.json already exists. Pass --force to re-probe everything.
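The skip-unless-forced behaviour amounts to a filter over the shortlist. A sketch with a hypothetical function name, assuming dossiers live at data/dossiers/<path>/dossier.json as in the phase-2 command:

```python
from pathlib import Path


def paths_to_probe(shortlist: list[str], dossier_root: Path, force: bool = False) -> list[str]:
    """Keep ranking order; drop directories that already have a dossier unless forced."""
    if force:
        return list(shortlist)
    return [p for p in shortlist if not (dossier_root / p / "dossier.json").is_file()]
```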
To change the model or the prompt: edit phase2_probe.py and
re-run with --force on the affected paths. Each run records
its model + reasoning effort in meta.json, so you can mix
versions in one corpus and audit later.
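Auditing a mixed corpus then means grouping dossiers by what their meta.json records. This sketch assumes meta.json has top-level model and reasoning_effort keys; those key names are guesses, not the documented layout, and model_mix is a hypothetical name.

```python
import json
from collections import Counter
from pathlib import Path


def model_mix(dossier_root: Path) -> Counter:
    """Count dossiers per (model, reasoning effort) pair recorded in meta.json."""
    mix: Counter = Counter()
    for meta_file in sorted(dossier_root.rglob("meta.json")):
        meta = json.loads(meta_file.read_text())
        # Assumed key names; adjust to the real meta.json layout.
        mix[(meta.get("model"), meta.get("reasoning_effort"))] += 1
    return mix
```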
What this is not
- It is not a recommendation that any specific patch should land. The corpus surfaces evidence; humans decide.
- It is not stable across re-runs in a strict sense — model outputs are deterministic-ish but not bit-identical. Re-running the same prompt with the same model often produces a different source list but a similar verdict.
- It is not a replacement for upstream review. A remove verdict here is a conversation starter, not a pre-approved patch.