Repository Evidence Graph Queries

src/queries/insights.ts is the shared adapter for dashboard-grade evidence graph queries. CLI, TUI, and integration tests should reuse these builders instead of embedding ad hoc SurrealQL that can drift from the schema.

Example commands:

axctl insights
axctl insights schema
axctl insights repositories --limit=25
axctl insights checkouts --limit=25
axctl insights git --limit=25
axctl insights friction --limit=50
axctl insights tools --limit=20
axctl insights sessions --limit=20
axctl insights file-evidence --limit=20
axctl insights feedback-loops --limit=20
axctl insights feedback-language --limit=20
axctl insights message-signals --limit=20
axctl insights reactions --limit=20
axctl insights reaction-themes --limit=20
axctl insights reaction-events --limit=20
axctl insights reaction-event-themes --limit=20
axctl insights verification-gaps --limit=20
axctl insights user-language --limit=20
axctl insights token-impact --limit=20
axctl insights cache-health --limit=20
axctl insights workflow-impact --limit=20
axctl insights codex-health --limit=20
axctl insights closure --limit=20
axctl insights post-feature-fixes --limit=20
axctl insights skill-candidates --limit=20
axctl insights graph-health --limit=10
axctl dashboard --limit=25
axctl costs summary --since=2
axctl costs for --query "live-traces" --limit=20
axctl costs for --terms "live trace,livetrace,live-traces" --since=2 --limit=50
axctl costs for --query "checkout bug" --since=7 --here
axctl costs for --commit 464c80b
axctl costs for --branch main --limit=20
axctl pricing --query gpt-5.5

Use --json on any insights view to print the raw query rows. The message-analysis views default to compact, scan-friendly output.

The builders target the current schema fields directly:

Cost And Pricing Queries

axctl costs is the graph-backed read surface for token spend. It reads session_token_usage rows produced by provider ingest and session-health, then groups by provider/model and reports estimated USD. Cost estimates use agent_model pricing rows imported by the pricing/models stage, with a built-in local catalog as a fallback for common models.
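The grouping and pricing fallback above can be sketched as follows. This is a minimal sketch with simplified row shapes. TokenUsageRow, ModelPrice, and summarizeCosts are hypothetical names. Prices are assumed to be USD per million tokens, and only the prompt and output buckets are priced; cache buckets are left out.

```typescript
// Hypothetical row shapes; the real session_token_usage and agent_model
// tables may use different field names.
interface TokenUsageRow { provider: string; model: string; promptTokens: number; outputTokens: number }
interface ModelPrice { inputPerMTok: number; outputPerMTok: number } // USD per million tokens

interface CostGroup {
  provider: string;
  model: string;
  promptTokens: number;
  outputTokens: number;
  estimatedUsd: number | null; // null when neither pricing source knows the model
  pricingSource: "agent_model" | "local_catalog" | "unpriced";
}

function summarizeCosts(
  rows: TokenUsageRow[],
  graphPricing: Map<string, ModelPrice>, // stand-in for imported agent_model rows
  localCatalog: Map<string, ModelPrice>, // stand-in for the built-in fallback catalog
): CostGroup[] {
  // Group token totals by provider/model.
  const groups = new Map<string, CostGroup>();
  for (const row of rows) {
    const key = `${row.provider}/${row.model}`;
    const g: CostGroup = groups.get(key) ?? {
      provider: row.provider, model: row.model, promptTokens: 0, outputTokens: 0,
      estimatedUsd: null, pricingSource: "unpriced",
    };
    g.promptTokens += row.promptTokens;
    g.outputTokens += row.outputTokens;
    groups.set(key, g);
  }
  // Price each group: graph pricing first, local catalog as the fallback.
  for (const g of groups.values()) {
    const graphPrice = graphPricing.get(g.model);
    const price = graphPrice ?? localCatalog.get(g.model);
    if (!price) continue; // stays "unpriced"
    g.pricingSource = graphPrice ? "agent_model" : "local_catalog";
    g.estimatedUsd = (g.promptTokens * price.inputPerMTok + g.outputTokens * price.outputPerMTok) / 1_000_000;
  }
  return [...groups.values()];
}
```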

Supported commands:

axctl costs summary [--since=N] [--source=codex|claude|pi|opencode|cursor]
axctl costs for --session <session-id>
axctl costs for --query <turn-text> [--since=N] [--project=<path>] [--here] [--limit=N]
axctl costs for --terms <term-a,term-b> [--since=N] [--project=<path>] [--here] [--limit=N]
axctl costs for --commit <sha>
axctl costs for --branch <branch> [--limit=N]
axctl pricing [--query <model-or-provider>] [--limit=N]

Selectors map onto graph evidence:

--query and --terms accept scope filters:

The output includes session count, total estimated tokens, prompt/output/cache token buckets, model breakdown, pricing source, and the matching session ids. Use --json on costs for and pricing when another tool needs structured data.

Known limits:

Harness Doctor Tables And Ingestion Status

The Harness Doctor ingest slice currently persists these tables:

Current implementation status:

The harness ingest stage is idempotent. It:

  1. Upserts guidance_source rows keyed by path.
  2. Upserts guidance_revision rows keyed by source path plus content hash.
  3. Upserts declared and observed stack records.
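Keying steps 1 and 2 this way is what makes reruns safe. An unchanged file maps to the same keys, so re-ingesting adds no rows. The sketch below is an in-memory illustration of steps 1 and 2 only. GuidanceStore is a hypothetical name, and SHA-256 is an assumed hash choice.

```typescript
import { createHash } from "node:crypto";

interface GuidanceRevision { sourcePath: string; contentHash: string; content: string }

// In-memory stand-ins for the guidance_source and guidance_revision tables.
class GuidanceStore {
  readonly sources = new Map<string, { path: string; lastSeen: number }>();
  readonly revisions = new Map<string, GuidanceRevision>();

  ingest(path: string, content: string, now: number): void {
    // Step 1: guidance_source keyed by path, so a rerun overwrites instead of duplicating.
    this.sources.set(path, { path, lastSeen: now });
    // Step 2: guidance_revision keyed by path plus content hash, so only changed
    // content produces a new revision row.
    const contentHash = createHash("sha256").update(content).digest("hex");
    this.revisions.set(`${path}#${contentHash}`, { sourcePath: path, contentHash, content });
  }
}
```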

Use axctl project harness --json as the canonical report surface and axctl insights schema to verify durable table population after ingest.

Command Outcome And User Language Tables

The command outcome slice adds:

command_outcome is keyed from the original tool_call record and classifies commands into success, expected_feedback, search_miss, guardrail, environment_blocker, workflow_error, product_bug_signal, or unknown. This keeps useful TDD/lint/typecheck feedback distinct from real workflow friction.
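The outcome categories can be written as a TypeScript union. The classifier below is a hypothetical heuristic, not the real rule set. It only shows how a failing test, lint, or typecheck run can land in expected_feedback instead of counting as friction. The ToolCallResult shape and every regex are assumptions, and most categories are never produced by this sketch.

```typescript
type CommandOutcome =
  | "success" | "expected_feedback" | "search_miss" | "guardrail"
  | "environment_blocker" | "workflow_error" | "product_bug_signal" | "unknown";

// Hypothetical input; the real classifier reads fields from the tool_call record.
interface ToolCallResult { command: string; exitCode: number | null; output: string }

function classifyCommand(call: ToolCallResult): CommandOutcome {
  if (call.exitCode === 0) return "success";
  if (call.exitCode === null) return "unknown";
  const cmd = call.command.trim();
  // Failing tests, lint, or typecheck are useful feedback, not friction.
  if (/^(npm|pnpm|bun|yarn)\s+(run\s+)?(test|lint|typecheck)\b|^(tsc|eslint|vitest|jest)\b/.test(cmd)) {
    return "expected_feedback";
  }
  // grep-style tools exit 1 when nothing matched.
  if (/^(rg|grep)\b/.test(cmd) && call.exitCode === 1) return "search_miss";
  // Missing binaries or permissions point at the environment, not the work.
  if (/command not found|ENOENT|permission denied/i.test(call.output)) return "environment_blocker";
  return "unknown";
}
```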

user_message_ngram is derived from turn.role = "user" excerpts and stores bi-gram/tri-gram frequency plus correction, failed-tool, edit, and verification proximity counters. It is an intentionally small first pass for mining repeated preferences, corrections, and language that should become taste or harness learning candidates.
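The frequency part of user_message_ngram can be illustrated with a plain word n-gram counter. This sketch covers bi-gram and tri-gram frequencies only. It does not include the correction, failed-tool, edit, or verification proximity counters. countNgrams and its naive lowercase tokenizer are assumptions.

```typescript
// Count word bi-grams and tri-grams across user-turn excerpts.
function countNgrams(excerpts: string[], sizes: number[] = [2, 3]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const text of excerpts) {
    // Naive tokenizer: lowercase words made of letters, digits, and apostrophes.
    const tokens = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
    for (const n of sizes) {
      for (let i = 0; i + n <= tokens.length; i++) {
        const gram = tokens.slice(i, i + n).join(" ");
        counts.set(gram, (counts.get(gram) ?? 0) + 1);
      }
    }
  }
  return counts;
}
```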

Session Token And Workflow Health Tables

The session health slice adds:

workflow_epoch currently splits history into a gsd epoch and a superpowers epoch at the first observed superpowers:* skill invocation. This is a heuristic, but it creates a stable comparison anchor for dogfooding workflow migration questions.
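The split heuristic reduces to finding the earliest superpowers:* invocation and comparing session start times against it. This is a sketch under assumed shapes. SkillInvocation, findEpochSplit, and epochFor are hypothetical names.

```typescript
interface SkillInvocation { sessionId: string; skill: string; at: Date }
type Epoch = "gsd" | "superpowers";

// The split point is the earliest superpowers:* invocation, or null if none was seen.
function findEpochSplit(invocations: SkillInvocation[]): Date | null {
  let first: Date | null = null;
  for (const inv of invocations) {
    if (inv.skill.startsWith("superpowers:") && (first === null || inv.at.getTime() < first.getTime())) {
      first = inv.at;
    }
  }
  return first;
}

// Sessions that start at or after the split belong to the superpowers epoch.
function epochFor(sessionStart: Date, split: Date | null): Epoch {
  return split !== null && sessionStart.getTime() >= split.getTime() ? "superpowers" : "gsd";
}
```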

session_token_usage labels token/model quality in its JSON labels field. token_source_quality is explicit for provider counters such as Codex token_count, Pi usage fields, or Claude usage metadata; estimate for transcript-byte estimates; and unavailable when neither counters nor text bytes are present. model_source_quality distinguishes provider model names from missing model metadata. Cost reads also surface unpriced_model_reason when pricing is not computed for a row.
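The token_source_quality precedence can be sketched as a small decision function. TokenEvidence and its fields are hypothetical. They stand for "a provider counter was present" and "a transcript byte count was available".

```typescript
type TokenSourceQuality = "explicit" | "estimate" | "unavailable";

// Hypothetical per-session evidence gathered during ingest.
interface TokenEvidence { providerTokenCount?: number; transcriptBytes?: number }

function tokenSourceQuality(e: TokenEvidence): TokenSourceQuality {
  // Provider counters (e.g. Codex token_count) win over any estimate.
  if (e.providerTokenCount !== undefined) return "explicit";
  // Otherwise fall back to a transcript-byte estimate when there is text.
  if (e.transcriptBytes !== undefined && e.transcriptBytes > 0) return "estimate";
  return "unavailable";
}
```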

session_health records turns, tool calls, tool errors, correction-like user messages, interruption/status/redirect-like user messages, subagent dispatches, plan snapshots, estimated tokens, cache ratios, and a coarse context-pressure bucket. These rows power token-impact, cache-health, workflow-impact, and codex-health.

Closure Quality And Skill Candidate Tables

The closure-quality slice adds:

commit_classification classifies commit messages as feature, fix, refactor, test, docs, chore, or unknown.
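One plausible way to classify commit messages is to read Conventional Commits-style prefixes such as feat(scope): or fix:. That approach is an assumption. The real classifier may use other cues. A Map is used so that keys like "constructor" do not hit Object.prototype.

```typescript
type CommitClass = "feature" | "fix" | "refactor" | "test" | "docs" | "chore" | "unknown";

// Assumed prefix table in Conventional Commits style.
const PREFIXES = new Map<string, CommitClass>([
  ["feat", "feature"], ["fix", "fix"], ["refactor", "refactor"],
  ["test", "test"], ["docs", "docs"], ["chore", "chore"],
]);

function classifyCommit(message: string): CommitClass {
  // Matches "type:", "type(scope):", and "type!:" at the start of the message.
  const m = /^(\w+)(\([^)]*\))?!?:/.exec(message.trim());
  if (!m) return "unknown";
  return PREFIXES.get(m[1].toLowerCase()) ?? "unknown";
}
```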

later_fixed_by links a feature commit to a later fix commit when they share a repository, land within the time window, and touch one or more of the same files. This is a deliberately conservative first pass: it treats same-file post-feature fixes as evidence that closure quality could improve.
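The three linking conditions (same repository, fix inside the time window, at least one shared file) are shown in the pairwise sketch below. The window length is not specified here, so windowDays is a hypothetical parameter. ClassifiedCommit is an assumed shape.

```typescript
interface ClassifiedCommit {
  sha: string;
  repository: string;
  kind: "feature" | "fix" | "other";
  at: Date;
  files: Set<string>;
}

function laterFixedBy(commits: ClassifiedCommit[], windowDays: number): Array<[string, string]> {
  const windowMs = windowDays * 24 * 60 * 60 * 1000;
  const links: Array<[string, string]> = [];
  for (const feature of commits) {
    if (feature.kind !== "feature") continue;
    for (const fix of commits) {
      if (fix.kind !== "fix" || fix.repository !== feature.repository) continue;
      // The fix must land after the feature and within the window.
      const delta = fix.at.getTime() - feature.at.getTime();
      if (delta <= 0 || delta > windowMs) continue;
      // Conservative: require at least one file touched by both commits.
      if ([...fix.files].some((f) => feature.files.has(f))) links.push([feature.sha, fix.sha]);
    }
  }
  return links;
}
```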

skill_candidate turns repeated fix-chain patterns and risky session health signals into candidate skills or guardrails, such as ingest idempotency checks, schema-change smoke tests, live query dogfooding, or session closure quality gates.

Onboarding

axctl onboarding --json checks whether global Claude, Codex, and shared agent guidance directories are git-tracked. This gives future guidance and skill experiments commit evidence before ax starts recommending harness changes.

SurrealKit workflow takeaway: local development can keep importing the schema directly for now. Tests should prefer isolated databases or namespaces so query/integration runs do not mutate the user's main ax/main graph. A future schema sync and rollout workflow can be added once the evidence graph stabilizes.

Implementation-pattern reference: docs/effect-reference-t3code.md captures Effect practices from the local .references/t3code clone that are worth adapting as the prototype grows, especially typed config, process services, schema decoders, and layer-based tests.

Prototype Verification Notes

The prototype writes the new evidence graph beside the legacy taste graph. Existing taste/search commands continue to read legacy edges while the new insight commands read through src/queries/insights.ts.

Verification commands run:

2026-05-11 Dogfood Notes

Full backlog dogfood ran:

Observed outputs:

Dogfood fixes made during the backlog run:

Live dogfood counts after the smoke:

Schema coverage after the smoke should be read from axctl insights schema. The schema view counts current active and staged graph tables and omits tables removed by schema migrations.

Legacy self-improve importer behavior:

Install onboarding dogfood:

wterm terminal dogfood:

Harness Doctor schema additions are populated by default ingest. If they are empty, run axctl ingest --since=1 and inspect the harness/doctor ingest stage.

Dashboard generated at:

file:///Users/necmttn/.local/share/ax/dashboard.html

Experiment Loop CLI (axctl improve)

axctl improve is the read-write surface on top of the experiment-loop tables (proposal, skill_proposal, experiment, checkpoint). The loop: retro → proposal → experiment → verdict (see axctl retro for the front end). Subcommands:

axctl improve recommend

Rank open proposals by confidence × recency × frequency and print them as paste-ready blocks, each wrapped in <!--ax:id--> provenance markers so the agent file edit is traceable back to the proposal.
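The confidence × recency × frequency ranking and the marker wrapping can be sketched as below. Only the product form and the marker syntax come from this document. The 14-day half-life recency decay and the use of raw occurrence counts as frequency are assumptions. OpenProposal, rankProposals, and renderBlock are hypothetical names.

```typescript
interface OpenProposal {
  dedupeSig: string;
  confidence: number; // 0..1
  lastSeen: Date;
  occurrences: number;
  block: string; // paste-ready content
}

// Assumed recency factor: exponential decay with a 14-day half-life.
function recency(lastSeen: Date, now: Date, halfLifeDays = 14): number {
  const ageDays = (now.getTime() - lastSeen.getTime()) / 86_400_000;
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

function rankProposals(ps: OpenProposal[], now: Date): OpenProposal[] {
  const score = (p: OpenProposal) => p.confidence * recency(p.lastSeen, now) * p.occurrences;
  return [...ps].sort((a, b) => score(b) - score(a));
}

// Wrap the block in provenance markers keyed by the 12-char dedupe_sig prefix.
function renderBlock(p: OpenProposal): string {
  const id = p.dedupeSig.slice(0, 12);
  return `<!--ax:${id}-->\n${p.block}\n<!--/ax:${id}-->`;
}
```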

Flags:

axctl improve accept <id>

Default mode emits .ax/tasks/<id>.md, a structured brief your primary agent (Claude Code, Codex, etc.) consumes to edit the target file with the marker still in place. The brief tracks task_emitted status on the experiment row.

Flags:

<id> accepts either the dedupe sig (12-char prefix from recommend) or the full proposal:<key> record id.

axctl improve lint

Scan grounded agent files for <!--ax:id--> markers and reconcile against the DB:

Flags:

The linter matches markers against proposal.dedupe_sig exactly and pushes the stale-task date filter into SurrealQL, so it stays fast as the proposal table grows.

axctl improve show <id>

Full evidence trail for one proposal: source retro(s), baseline cluster, skill payload (trigger pattern, proposed behavior, expected impact), the linked experiment row, scaffold path, checkpoint snapshots, locked verdict.

axctl improve list

Browse the proposal queue.

Flags:

axctl improve verdict <id>

Inspect or lock the +30-session verdict.

Flags:

axctl improve reject <id>

Mark a proposal rejected. Future re-derives of the same trigger are deduped against rejected proposals, so the same pattern is not re-proposed after every retro.

Flags:

axctl improve checkpoint

Compute checkpoint snapshots at +3/+10/+30 sessions for active experiments (session-count windows, not calendar days - see issue #83). Cron-runnable; the weekly self-improve cron calls this. Legacy day-based rows (t+7/t+30/t+90) from before #83 stay in the DB as historical data and are not re-derived.
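Because the windows are session counts rather than calendar days, deciding which snapshots are due is a comparison against the +3/+10/+30 thresholds. In this sketch, dueCheckpoints is a hypothetical helper. How "sessions since the experiment started" is counted (for example, which sessions qualify) is an assumption left to the caller.

```typescript
const CHECKPOINTS = [3, 10, 30] as const;

// Return the session-count checkpoints reached but not yet recorded.
function dueCheckpoints(sessionsSinceStart: number, recorded: ReadonlySet<number>): number[] {
  return CHECKPOINTS.filter((n) => sessionsSinceStart >= n && !recorded.has(n));
}
```

Because already-recorded checkpoints are skipped, running this from cron repeatedly does not recompute existing snapshots.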

axctl improve reset --yes

Wipe all experiment-loop state (proposals, experiments, checkpoints, skill proposals). For test fixtures and local-only debugging. Requires --yes.

Provenance markers

Every accepted proposal's edit is wrapped:

<!--ax:a1b2c3d4e5f6-->
... agent-file content ...
<!--/ax:a1b2c3d4e5f6-->

The id is the proposal dedupe_sig prefix. axctl improve lint reconciles both directions: orphan markers (DB has no proposal) and orphan proposals (task_emitted but the brief was never consumed). Nested same-id close tags are balanced; markers across multiple files for the same proposal are allowed.
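The two-direction reconciliation can be sketched as follows. First, collect marker ids from a file, counting opens and closes per id so nested same-id pairs balance. Then compare against the DB in both directions. scanMarkers and reconcile are hypothetical names, and the id character class is an assumption.

```typescript
// Collect ax provenance ids and report ids whose open/close tags do not balance.
function scanMarkers(text: string): { ids: Set<string>; unbalanced: string[] } {
  const re = /<!--(\/?)ax:([A-Za-z0-9]+)-->/g;
  const depth = new Map<string, number>();
  const ids = new Set<string>();
  for (const m of text.matchAll(re)) {
    const slash = m[1];
    const id = m[2];
    ids.add(id);
    depth.set(id, (depth.get(id) ?? 0) + (slash ? -1 : 1));
  }
  const unbalanced = [...depth].filter(([, d]) => d !== 0).map(([id]) => id);
  return { ids, unbalanced };
}

function reconcile(fileIds: Set<string>, knownIds: Set<string>, taskEmittedIds: Set<string>) {
  return {
    // Markers in the file with no matching proposal in the DB.
    orphanMarkers: [...fileIds].filter((id) => !knownIds.has(id)),
    // Proposals marked task_emitted whose marker never appeared.
    orphanProposals: [...taskEmittedIds].filter((id) => !fileIds.has(id)),
  };
}
```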

.ax/tasks/<id>.md task briefs

When axctl improve accept <id> runs without --auto-scaffold/--with-agent, it writes .ax/tasks/<id>.md with:

  1. Target file path (e.g. ~/.claude/CLAUDE.md or a skill SKILL.md path).
  2. The exact paste-ready block (markers + content).
  3. A "Lint after applying:" footer pointing at axctl improve lint.

The brief is plain markdown. Hand it to any agent; the agent's diff is what lands in your config. lint reconciles the brief's existence against the marker actually showing up in the target file.

Session Sharing CLI (axctl share)

Share a Session

axctl share <session-id> exports a sanitized session artifact, creates a secret GitHub Gist containing ax-session.json, and prints an https://ax.necmttn.com/s/<owner>/<gist-id> renderer URL, which opens the Studio-backed session inspector.

Use --dry-run to inspect the artifact before publishing:

axctl share <session-id> --dry-run > session-share.json

Secret Gists are unlisted links, not private storage. Do not share sessions that contain secrets or proprietary data without reviewing the dry-run artifact first.

Flags:

Retro CLI (axctl retro)

The retro surface tracks one structured reflection per session (tried, worked, failed, next). A session has been retro'd iff the graph has a reviewed edge from it to a retro row. See ADR-0010 for the design rationale.

axctl retro emit

Write a retro for one session and create the reviewed edge.

Two paths:

Flags:

axctl retro pending

List sessions in the window that have no reviewed edge. Drives the /retro skill's Step 0 "drain the backlog" flow.

Two-pass query: ended sessions (ended_at != NONE) come first; idle sessions (no ended_at AND started_at older than --idle-min) come second. Subagent sessions (source = 'claude-subagent') are excluded by default - their retros belong to the parent session's review.
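The two-pass ordering can be sketched in memory like this. SessionRow and pendingRetros are hypothetical names. The idleMin unit (minutes) and the includeSubagents switch are assumptions, since the flag that re-includes subagent sessions is not named here.

```typescript
interface SessionRow {
  id: string;
  source: string;
  startedAt: Date;
  endedAt: Date | null;
  reviewed: boolean; // true when a reviewed edge points at a retro row
}

function pendingRetros(rows: SessionRow[], now: Date, idleMin: number, includeSubagents = false): SessionRow[] {
  const candidates = rows.filter(
    (s) => !s.reviewed && (includeSubagents || s.source !== "claude-subagent"),
  );
  // Pass 1: sessions that have ended.
  const ended = candidates.filter((s) => s.endedAt !== null);
  // Pass 2: sessions with no end marker that started before the idle cutoff.
  const cutoff = now.getTime() - idleMin * 60_000;
  const idle = candidates.filter((s) => s.endedAt === null && s.startedAt.getTime() < cutoff);
  return [...ended, ...idle];
}
```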

Flags:

axctl retro brief

Write a .ax/tasks/retro/<session-key>.md task brief for one session. The brief is what the retro-reviewer subagent consumes. Frontmatter includes the transcript pointer, model used, turn count, pending reason, and a suggested_model heuristic (haiku for ≤5 turns, opus for ≥40 turns, sonnet otherwise).
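The suggested_model thresholds translate directly into a small function. The function name is illustrative.

```typescript
type SuggestedModel = "haiku" | "sonnet" | "opus";

// haiku for short sessions, opus for long ones, sonnet in between.
function suggestedModel(turns: number): SuggestedModel {
  if (turns <= 5) return "haiku";
  if (turns >= 40) return "opus";
  return "sonnet";
}
```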

Flags:

axctl retro list

Browse recent retros (reverse-chronological).

Flags:

axctl retro reflect

Walk clustered retro-derived proposals interactively (accept / reject / skip each pattern). Used by the /retro skill's triage step; see that skill for the full workflow.

axctl retro meta

Emit a read-only investigation snapshot (JSON) for an external AI agent to drive a deep retro-of-retros. Used by /retro-meta.

axctl retro plan

Register an externally drafted plan as a proposal (plus an experiment, unless --leave-open is passed). Called by an external agent after the user agrees in a /retro-meta session.

.ax/tasks/retro/<session-key>.md briefs

A retro brief is a markdown file with YAML frontmatter (session_id, session_key, transcript, model_used, turns, pending_reason, suggested_model, status: pending) and a body describing what the reviewer should produce. The retro-reviewer subagent reads it, calls ax retro emit --source=manual, optionally calls ax improve recommend for repeated patterns, and updates the brief's frontmatter to status: completed. The reviewed edge created by ax retro emit removes the session from the next ax retro pending result.

These briefs live next to the older .ax/tasks/<id>.md improve briefs but in their own subdir to keep listings clean.
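The status: pending to status: completed step only touches the frontmatter block. The sketch below changes status inside the leading --- fences and leaves the body untouched. setBriefStatus is a hypothetical helper, and it assumes LF line endings and flat key: value frontmatter.

```typescript
// Replace (or append) `status:` inside the leading YAML frontmatter only.
function setBriefStatus(brief: string, status: "pending" | "completed"): string {
  const m = /^---\n([\s\S]*?)\n---\n/.exec(brief);
  if (!m) throw new Error("brief has no YAML frontmatter");
  const hasStatus = /^status:.*$/m.test(m[1]);
  const fm = hasStatus
    ? m[1].replace(/^status:.*$/m, `status: ${status}`)
    : `${m[1]}\nstatus: ${status}`;
  return `---\n${fm}\n---\n` + brief.slice(m[0].length);
}
```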

Workflow extraction queries

These commands were shipped in the feat/workflow-extraction-port-2026-05-29 branch. They cover scoped ingest, session navigation, cross-session recall, skill classification, role tagging, and role-aware skill views.

ax ingest here [--since=Nd] [--stages=...]

Scope a full ingest run to the git repository at $PWD. The claude stage is restricted to the matching ~/.claude/projects/<slug>/ transcript directory; git history is restricted to this repo path. Codex, Pi, OpenCode, and Cursor stages are skipped by default because they have no per-repo cwd filter yet.

Flags:

axctl ingest here --since=3
axctl ingest here --stages=claude,git,signals

Errors with a clear message when $PWD is not inside a git repository.
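Scoping to "the git repository at $PWD" starts with finding the repository root. This sketch walks up parent directories looking for .git. That entry can be a directory or, in worktrees, a file; existsSync accepts both. findGitRoot is a hypothetical name, and the real command may instead shell out to git rev-parse --show-toplevel.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Walk up from startDir until a directory containing .git is found.
function findGitRoot(startDir: string): string {
  let dir = path.resolve(startDir);
  for (;;) {
    if (fs.existsSync(path.join(dir, ".git"))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) {
      // Reached the filesystem root without finding a repository.
      throw new Error(`not inside a git repository: ${startDir}`);
    }
    dir = parent;
  }
}
```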

ax sessions here [--days=N] [--json]

List sessions whose repository matches the git repo at $PWD, reverse chronological, within the last N days.

Flags:

axctl sessions here
axctl sessions here --days=7 --json

ax sessions around <date> [--days=N --project=PATH] [--json]

List sessions that started within ±N days of <date>. Accepts YYYY-MM-DD or full ISO 8601.

Flags:

axctl sessions around 2026-05-23
axctl sessions around 2026-05-23 --days=7 --project=/Users/me/Projects/acme
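The ±N day window can be computed as follows. This sketch treats a bare YYYY-MM-DD as UTC midnight, which is an assumption; the real command may use local time. aroundWindow and startedWithin are hypothetical names.

```typescript
// Accept YYYY-MM-DD (UTC midnight here) or a full ISO 8601 timestamp.
function aroundWindow(date: string, days: number): { from: Date; to: Date } {
  const center = new Date(/^\d{4}-\d{2}-\d{2}$/.test(date) ? `${date}T00:00:00Z` : date);
  if (Number.isNaN(center.getTime())) throw new Error(`invalid date: ${date}`);
  const spanMs = days * 86_400_000;
  return { from: new Date(center.getTime() - spanMs), to: new Date(center.getTime() + spanMs) };
}

function startedWithin(startedAt: Date, w: { from: Date; to: Date }): boolean {
  const t = startedAt.getTime();
  return t >= w.from.getTime() && t <= w.to.getTime();
}
```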

ax sessions near <sha> [--json]

List sessions whose time range overlaps the commit window around <sha> in the git repository at $PWD. The window is derived from the commit's author date and the surrounding parent/child timestamps. Root commits fall back to ±3 days.

Flags:

axctl sessions near d923fcc
axctl sessions near HEAD --json
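The commit window and the overlap test can be sketched as below. Only the ±3 day fallback for root commits is documented. Spanning parent to child timestamps, and using +3 days when there is no child commit, are assumptions about how "surrounding parent/child timestamps" are combined. CommitTimes, commitWindow, and overlaps are hypothetical names.

```typescript
interface CommitTimes {
  authorDate: Date;
  parentDate: Date | null; // null for root commits
  childDate: Date | null; // null when no later commit exists (assumed case)
}

const DAY_MS = 86_400_000;

function commitWindow(c: CommitTimes): { from: Date; to: Date } {
  const a = c.authorDate.getTime();
  // Root commits: the documented ±3 day fallback.
  if (c.parentDate === null) return { from: new Date(a - 3 * DAY_MS), to: new Date(a + 3 * DAY_MS) };
  // Otherwise span parent to child; a missing child falls back to +3 days (assumption).
  return { from: c.parentDate, to: c.childDate ?? new Date(a + 3 * DAY_MS) };
}

// Two time ranges overlap when each starts no later than the other ends.
function overlaps(sessionStart: Date, sessionEnd: Date, w: { from: Date; to: Date }): boolean {
  return sessionStart.getTime() <= w.to.getTime() && sessionEnd.getTime() >= w.from.getTime();
}
```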

ax sessions show <id> [--expand=<uuid> | --all] [--by-role] [--json]

Display the invoked-skill and tool-call timeline for one session. Subagent sessions are collapsed to one-line summaries by default.

Flags:

<id> accepts a bare UUID, a claude-subagent-<id> string, or a full session:⟨...⟩ record id.

axctl sessions show a1b2c3d4-e5f6-...
axctl sessions show a1b2c3d4 --expand=f9e8d7c6 --by-role
axctl sessions show a1b2c3d4 --all --json
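Normalizing the three accepted <id> forms can be sketched as a small parser. parseSessionId and the SessionIdForm union are hypothetical. Accepting hex prefixes mirrors the short ids in the examples, and the exact matching rules are an assumption.

```typescript
type SessionIdForm =
  | { kind: "record"; raw: string }
  | { kind: "subagent"; agentId: string }
  | { kind: "uuid"; value: string };

function parseSessionId(input: string): SessionIdForm {
  const id = input.trim();
  // Full record id such as session:⟨...⟩ is passed through unchanged.
  if (id.startsWith("session:")) return { kind: "record", raw: id };
  if (id.startsWith("claude-subagent-")) {
    return { kind: "subagent", agentId: id.slice("claude-subagent-".length) };
  }
  // A bare UUID, or a hex prefix of one.
  if (/^[0-9a-f-]+$/i.test(id)) return { kind: "uuid", value: id.toLowerCase() };
  throw new Error(`unrecognized session id: ${input}`);
}
```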

ax recall <q> [--sources=turn,commit,skill] [--scope=here|all] [--project=? --skill=? --since=ISO] [--json]

Full-text BM25 recall across sessions, commits, and skill invocations. Returns ranked hits with timestamps, project slugs, and excerpt snippets.

Flags:

axctl recall "auth middleware"
axctl recall "schema migration" --scope=here --sources=turn,commit
axctl recall "retry loop" --project=acme-app --since=2026-05-01 --json

Pass --project=? or --skill=? on a TTY to get a numbered interactive picker; these flags require a value when stdin is not a TTY.
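For intuition about how BM25 ranks hits, here is the standard Okapi BM25 formula in isolation, with the common defaults k1 = 1.2 and b = 0.75. This is an illustration only. The actual recall scoring may run inside the database's full-text index with a different tokenizer and different parameters. bm25Scores is a hypothetical name.

```typescript
// Okapi BM25 over naively tokenized documents.
function bm25Scores(docs: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const tokenize = (s: string) => s.toLowerCase().split(/\W+/).filter(Boolean);
  const docTokens = docs.map(tokenize);
  const N = docTokens.length;
  const avgdl = docTokens.reduce((sum, t) => sum + t.length, 0) / Math.max(N, 1);
  const terms = [...new Set(tokenize(query))];
  return docTokens.map((tokens) => {
    let score = 0;
    for (const term of terms) {
      const n = docTokens.filter((t) => t.includes(term)).length; // document frequency
      if (n === 0) continue;
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      const f = tokens.filter((t) => t === term).length; // term frequency in this document
      // Length normalization: longer-than-average documents are penalized.
      score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * tokens.length) / avgdl));
    }
    return score;
  });
}
```

Rarer terms get a higher idf, so a hit on "middleware" outweighs a hit on the more common "auth".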

ax skills classify [<skill>...] [--out-dir=<path> --dry-run --json]

Emit one classify brief per unclassified skill into .ax/tasks/classify-<slug>.md. In default mode (no names), targets all skills with ≥ 3 invocations that have no plays_role edge with source in ("frontmatter", "brief", "user"). In explicit mode (one or more names provided), targets exactly those skills with no invocation threshold and no unclassified guard.

Flags:

axctl skills classify
axctl skills classify retro simplify --dry-run
axctl skills classify --out-dir=.ax/tasks --json

Skips files that already exist (idempotent). The generated briefs are consumed by axctl skills lint once an agent fills in the primary_role frontmatter field.
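The two targeting modes can be expressed as a filter over per-skill stats. The SkillStats shape and classifyTargets are hypothetical. Edges whose source is "inferred" do not count as classified here, matching the source list ("frontmatter", "brief", "user").

```typescript
type RoleSource = "frontmatter" | "brief" | "user" | "inferred";

interface SkillStats {
  name: string;
  invocations: number;
  roleSources: RoleSource[]; // sources of existing plays_role edges
}

const CLASSIFIED_SOURCES = new Set<RoleSource>(["frontmatter", "brief", "user"]);

function classifyTargets(skills: SkillStats[], explicit: string[] = []): string[] {
  // Explicit mode: exactly the named skills, no threshold, no unclassified guard.
  if (explicit.length > 0) return [...explicit];
  // Default mode: at least 3 invocations and no plays_role edge from a classified source.
  return skills
    .filter((s) => s.invocations >= 3 && !s.roleSources.some((src) => CLASSIFIED_SOURCES.has(src)))
    .map((s) => s.name);
}
```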

ax skills tag <skill> <role> [--confidence=N --rationale="..." --remove]

Write (or remove) a plays_role edge with source="user" between a skill and a role. Idempotent: any prior user-source edge for the same pair is deleted before the new one is created. Run multiple times with different roles to attach multiple roles to the same skill.

Flags:

Role and skill names are validated at the boundary (alphanumeric, _ or -, optionally plugin-namespaced for skills; lowercase alphanumeric and _ or - for roles).

axctl skills tag retro reflection
axctl skills tag simplify cleanup --confidence=0.8 --rationale="consistent usage pattern"
axctl skills tag simplify cleanup --remove
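The boundary validation rules can be approximated with two regular expressions. The patterns are assumptions derived from the rules as stated: skills may carry one optional plugin: namespace, and roles are lowercase only. The real validators may differ in detail.

```typescript
// Assumed patterns for the documented naming rules.
const SKILL_NAME = /^(?:[A-Za-z0-9_-]+:)?[A-Za-z0-9_-]+$/; // optional "plugin:" namespace
const ROLE_NAME = /^[a-z0-9_-]+$/;

function validateTagArgs(skill: string, role: string): void {
  if (!SKILL_NAME.test(skill)) throw new Error(`invalid skill name: ${skill}`);
  if (!ROLE_NAME.test(role)) throw new Error(`invalid role name: ${role}`);
}
```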

ax skills lint [--task-dir=<path> --dry-run --json]

Scan .ax/tasks/classify-*.md for filled briefs (YAML frontmatter with a non-empty primary_role field), write plays_role edges with source="brief", and remove each brief file after a successful write. Pending briefs (no primary_role) are silently skipped. This is the counterpart to skills classify - classify emits the brief, lint consumes it.

Flags:

axctl skills lint
axctl skills lint --dry-run
axctl skills lint --task-dir=.ax/tasks --json

Sweeps all prior source="brief" edges for a skill before writing the current set, so role shrinkage is handled atomically.
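The lint flow has two steps: read primary_role from a brief's frontmatter, then replace every source="brief" edge for that skill in one step, so a shrunken role list drops stale edges. The sketch below shows both. readPrimaryRole, applyBriefRoles, and RoleEdge are hypothetical. The parser only handles flat key: value lines and is not a full YAML reader.

```typescript
// Minimal frontmatter reader: flat `key: value` lines only.
function readPrimaryRole(brief: string): string | null {
  const m = /^---\r?\n([\s\S]*?)\r?\n---/.exec(brief);
  if (!m) return null;
  for (const line of m[1].split(/\r?\n/)) {
    const kv = /^primary_role:\s*(.*)$/.exec(line);
    if (kv) {
      const value = kv[1].trim().replace(/^["']|["']$/g, "");
      return value.length > 0 ? value : null; // empty means the brief is still pending
    }
  }
  return null;
}

interface RoleEdge { skill: string; role: string; source: string }

// Drop all prior brief-sourced edges for the skill, then write the current set.
function applyBriefRoles(edges: RoleEdge[], skill: string, roles: string[]): RoleEdge[] {
  const kept = edges.filter((e) => !(e.skill === skill && e.source === "brief"));
  return [...kept, ...roles.map((role) => ({ skill, role, source: "brief" }))];
}
```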

ax skills weighted [--window=Nd --limit=N --doctor-threshold=N --json]

Rank skills by a composite weighted score over a rolling time window. The score blends invocations (positive), errors near invocation (negative), user corrections within 3 turns (negative), commits produced by sessions that invoked the skill (positive), and proposed-but-not-invoked counts (negative).

Flags:

axctl skills weighted
axctl skills weighted --window=7 --limit=10
axctl skills weighted --doctor-threshold=3 --json
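The composite score can be sketched as a linear blend. Only the signs come from the description above. The weight magnitudes are illustrative assumptions, and SkillWindowStats and weightedScore are hypothetical names.

```typescript
interface SkillWindowStats {
  invocations: number;
  errorsNearInvocation: number;
  correctionsWithin3Turns: number;
  commitsProduced: number;
  proposedNotInvoked: number;
}

// Hypothetical weights: signs follow the description, magnitudes are made up.
const WEIGHTS = { invocations: 1, errors: -2, corrections: -1.5, commits: 2, proposedNotInvoked: -0.5 };

function weightedScore(s: SkillWindowStats): number {
  return (
    WEIGHTS.invocations * s.invocations +
    WEIGHTS.errors * s.errorsNearInvocation +
    WEIGHTS.corrections * s.correctionsWithin3Turns +
    WEIGHTS.commits * s.commitsProduced +
    WEIGHTS.proposedNotInvoked * s.proposedNotInvoked
  );
}
```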

ax skills by-role <role> [--json --limit=N]

List all skills classified as <role>, ranked by invocation count.

Flags:

axctl skills by-role reflection
axctl skills by-role cleanup --limit=20 --json

ax skills roles <skill> [--json]

List all roles assigned to <skill> (from any source: frontmatter, brief, user, or inferred), with confidence scores and sources.

Flags:

Exits with a non-zero status and an error message when the skill name is not found in the DB.

axctl skills roles retro
axctl skills roles "superpowers:systematic-debugging" --json

ax roles [--json]

List every role in the DB with the count of skills assigned to it. Useful for exploring the taxonomy before tagging or classifying.

Flags:

axctl roles
axctl roles --json

Empty DB Benchmarks

Use scripts/bench-empty-db.sh for cold ingest timing without mutating ax/main:

scripts/bench-empty-db.sh --since=90

The script selects a unique AX_DB_DB=bench_<timestamp>, applies the schema, runs ingest, imports Claude insights, writes schema.json, checkouts.json, and git.json, and generates a static dashboard under ~/.local/share/ax/benchmarks/<db>/.

Repo initialization is not per-project. Ingest discovers repositories from existing transcript cwd values and optionally from ~/.local/share/ax/ax-repos.txt. The Git pass backfills session.repository and session.checkout; produced edges are then tied to the checkout plus commit timestamp, while touched edges connect commits to canonical repository-relative files.

The final ingest smoke also found and fixed a plan-item identity bug: plan item records now use plan+sequence identity, and the writer deletes legacy content-hashed item rows that conflict on the plan_item_plan_seq unique index before upserting the canonical row.
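The fix has three parts: keep plan+sequence as the identity, delete conflicting legacy rows, then upsert the canonical row. The sketch below models this in memory. The canonical id format and upsertPlanItem are hypothetical. In the real writer, the conflict check is the plan_item_plan_seq unique index.

```typescript
interface PlanItem { id: string; plan: string; seq: number; text: string }

// In-memory model of a plan_item table with a unique (plan, seq) constraint.
function upsertPlanItem(rows: PlanItem[], item: Omit<PlanItem, "id">): PlanItem[] {
  const id = `plan_item:${item.plan}_${item.seq}`; // hypothetical canonical id format
  // Delete legacy rows (e.g. content-hashed ids) that would collide on (plan, seq).
  const kept = rows.filter((r) => !(r.plan === item.plan && r.seq === item.seq && r.id !== id));
  // Upsert the canonical row.
  const canonical: PlanItem = { id, ...item };
  const existing = kept.findIndex((r) => r.id === id);
  if (existing >= 0) kept[existing] = canonical;
  else kept.push(canonical);
  return kept;
}
```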