How ax sees your work

ax watches your Claude Code and Codex sessions on disk, turns them into a typed local graph, and lets you query the result without sending anything anywhere. This page walks through that shape once, not as a table reference but as a tour of the data flow from raw transcript to queryable evidence.

The shape

Every session you run with Claude Code or Codex lands as a .jsonl file under ~/.claude/projects/ or ~/.codex/sessions/. Each line is a turn: a user message, an assistant response, a tool call and its result, a correction, a sub-agent dispatch. The ingest pipeline reads those files and materialises the transcript into a SurrealDB graph running locally on your machine.
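
The first step of ingest is reading those files line by line. A minimal sketch, assuming each non-empty line is one JSON object with a string type field; the RawLine shape and the readTranscript name are hypothetical, not the real ingest code. Malformed lines are counted instead of aborting the file, which is one reasonable choice for transcripts that may still be mid-write.

```typescript
// Hypothetical shape of one transcript line; real Claude Code and Codex
// lines carry more fields, and their names may differ.
interface RawLine {
  type: string;
  [key: string]: unknown;
}

interface ParsedTranscript {
  lines: RawLine[];
  malformed: number; // lines that were not JSON objects with a string `type`
}

// Split a .jsonl file's contents into parsed records, one per non-empty line.
function readTranscript(text: string): ParsedTranscript {
  const lines: RawLine[] = [];
  let malformed = 0;
  for (const raw of text.split("\n")) {
    const trimmed = raw.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    try {
      const value: unknown = JSON.parse(trimmed);
      if (
        typeof value === "object" &&
        value !== null &&
        typeof (value as { type?: unknown }).type === "string"
      ) {
        lines.push(value as RawLine);
      } else {
        malformed++;
      }
    } catch {
      // A partially written final line (session still running) lands here.
      malformed++;
    }
  }
  return { lines, malformed };
}
```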

The core nodes are few. A session is one run from start to finish — it knows which project, which model, roughly when it started, and whether a commit came out the other side. A turn belongs to a session and carries the role (user, assistant, tool_result), a classified intent (organic task, correction, preference, wrapper instruction), and a short text excerpt for full-text recall. A tool_call belongs to a turn and records exactly what tool fired, what it was handed, what came back, how long it took, and whether it errored. A skill is a standing instruction installed on your machine — a markdown file that shapes how the agent behaves.
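
The four core nodes can be pictured as TypeScript types. This is an illustrative sketch: the field names and the snake_case intent labels are assumptions chosen to mirror the prose, not the actual definitions in schema.surql, and makeExcerpt is a hypothetical helper showing one way to cut a bounded excerpt for full-text recall.

```typescript
type Role = "user" | "assistant" | "tool_result";
type Intent = "organic_task" | "correction" | "preference" | "wrapper_instruction";

// One run from start to finish.
interface Session {
  id: string;
  project: string;
  model: string;
  startedAt: Date;
  producedCommit: boolean;
}

// A turn belongs to exactly one session.
interface Turn {
  id: string;
  sessionId: string;
  role: Role;
  intent: Intent;
  excerpt: string; // short text kept for full-text recall
}

// A tool call belongs to exactly one turn.
interface ToolCall {
  id: string;
  turnId: string;
  tool: string;
  input: unknown;
  output: unknown;
  durationMs: number;
  errored: boolean;
}

// A standing instruction installed on disk as a markdown file.
interface Skill {
  name: string;
  description: string;
  path: string;
}

// Collapse whitespace and cut to at most `max` characters, marking truncation.
function makeExcerpt(text: string, max = 200): string {
  const flat = text.replace(/\s+/g, " ").trim();
  return flat.length <= max ? flat : flat.slice(0, max - 1) + "…";
}
```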

Connecting them are graph edges. The schema uses SurrealDB RELATE statements to express relationships that would need join-table boilerplate in a relational schema: a turn invoked a skill when the agent loaded it; a session produced a commit; a commit touched a file; a turn is corrected_by the next user turn when the user stepped in to redirect. These edges are the load-bearing primitive. A vector index of turn text would let you find similar moments, but it would not tell you that this correction came three turns after that tool error, or that this skill fires in sessions where the commit rate is low.
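
To make the "three turns after that tool error" example concrete, here is an in-memory sketch of the question the graph answers directly. It assumes each turn carries a per-session index and a flag saying whether any of its tool calls errored, and that corrected_by edges are available as pairs of turn ids. All of these names are hypothetical stand-ins for the real schema; in ax itself this would be a SurrealQL traversal rather than application code.

```typescript
interface TurnNode {
  id: string;
  sessionId: string;
  index: number; // position within the session
  toolErrored: boolean; // true if any tool_call in this turn errored
}

// corrected_by edge: `from` is the corrected turn, `to` is the user turn that redirected it.
interface CorrectedBy {
  from: string;
  to: string;
}

interface CorrectionHit {
  correction: string;
  toolErrorTurn: string;
  gap: number; // turns between the tool error and the correction
}

// For each correction, find the nearest earlier errored turn in the same session,
// keeping it only if it lies within `window` turns.
function correctionsAfterToolErrors(
  turns: TurnNode[],
  edges: CorrectedBy[],
  window: number,
): CorrectionHit[] {
  const byId = new Map(turns.map((t) => [t.id, t]));
  const results: CorrectionHit[] = [];
  for (const edge of edges) {
    const correction = byId.get(edge.to);
    if (!correction) continue;
    const candidates = turns
      .filter(
        (t) =>
          t.sessionId === correction.sessionId &&
          t.toolErrored &&
          t.index < correction.index &&
          correction.index - t.index <= window,
      )
      .sort((a, b) => b.index - a.index); // nearest earlier turn first
    const nearest = candidates[0];
    if (nearest !== undefined) {
      results.push({
        correction: correction.id,
        toolErrorTurn: nearest.id,
        gap: correction.index - nearest.index,
      });
    }
  }
  return results;
}
```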

Derived tables — session_health, proposal, retro, friction_event, command_outcome, and semantic_signal — sit on top of the core nodes and summarise at session or cross-session scope. They are how the graph accumulates opinion over time, not just fact.
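
A derived table is a pure function of the core nodes. As a sketch, assuming per-session turn and tool-call records with the hypothetical fields below, a session_health row could be summarised like this. Context pressure is left out because this page does not say how it is measured.

```typescript
interface HealthTurn {
  role: "user" | "assistant" | "tool_result";
  intent: string;
}

interface HealthToolCall {
  errored: boolean;
}

interface SessionHealth {
  turnCount: number;
  toolCallCount: number;
  toolErrorRate: number; // errored / total, 0 when there were no tool calls
  correctionCount: number;
}

// Summarise one session's turns and tool calls into a health row.
function summariseSession(turns: HealthTurn[], toolCalls: HealthToolCall[]): SessionHealth {
  const errored = toolCalls.filter((c) => c.errored).length;
  return {
    turnCount: turns.length,
    toolCallCount: toolCalls.length,
    toolErrorRate: toolCalls.length === 0 ? 0 : errored / toolCalls.length,
    // Only user turns classified as corrections count.
    correctionCount: turns.filter((t) => t.role === "user" && t.intent === "correction").length,
  };
}
```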

The stages

Each ingest stage has a reason for existing that lives next to its source file as a @rationale comment. The section below is generated by walking src/ingest/*.ts and pulling those rationale headers out.
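
The extraction step can be small. A minimal sketch, assuming the rationale lives in the file's first /** ... */ block and that its @rationale tag runs until the next line starting with @ or the end of the comment; the tag layout and the extractRationale name are assumptions, not the actual extractor.

```typescript
// Return the text of the @rationale tag in the file's first JSDoc-style block,
// or null when the file is not annotated.
function extractRationale(source: string): string | null {
  const block = source.match(/\/\*\*([\s\S]*?)\*\//);
  if (!block || block[1] === undefined) return null;
  // Strip the leading " * " gutter from each comment line.
  const body = block[1]
    .split("\n")
    .map((line) => line.replace(/^\s*\*\s?/, ""))
    .join("\n");
  // Capture from @rationale up to the next tag line or the end of the block.
  const tag = body.match(/@rationale\s+([\s\S]*?)(?=\n@|$)/);
  if (!tag || tag[1] === undefined) return null;
  const text = tag[1].replace(/\s+/g, " ").trim();
  return text === "" ? null : text;
}
```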

skills

Skills are the agent's standing instructions. Indexing them up-front means later stages can ask "which skills exist" without re-walking the filesystem on every query, and the dashboard can show a static catalogue without reading transcripts at all.

Inputs: ~/.claude/skills/, ~/.agents/skills/, plugin caches

Outputs: skill rows, plays_role edges

Source: apps/axctl/src/ingest/skills.ts
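
Because the catalogue only needs a name and a description per skill, indexing can read each file's front matter rather than its whole body. A sketch, assuming skill markdown files open with a --- delimited block of simple key: value lines holding name and description; parseSkillFrontMatter is a hypothetical, hand-rolled parser that skips indented lines and so does not handle nested YAML or multi-line values.

```typescript
interface SkillMeta {
  name: string;
  description: string;
}

// Parse `name` and `description` from a leading `---` front-matter block.
// Returns null when the block or either field is missing.
function parseSkillFrontMatter(markdown: string): SkillMeta | null {
  const lines = markdown.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return null;
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) return null;
  const fields = new Map<string, string>();
  for (const line of lines.slice(1, end)) {
    if (/^\s/.test(line)) continue; // nested or continuation lines are not supported
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip blank or malformed lines
    const key = line.slice(0, colon).trim();
    // Keep everything after the first colon and drop optional surrounding quotes.
    const value = line.slice(colon + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    fields.set(key, value);
  }
  const skillName = fields.get("name");
  const description = fields.get("description");
  if (!skillName || !description) return null;
  return { name: skillName, description };
}
```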

agent-def

Subagent definition files (~/.claude/agents/*.md plus per-repo .claude/agents/*.md) are configuration the agent declares, but the graph was previously blind to them (they were only scope-read, with no table of their own). Indexing them as a first-class reconciled entity, with the same lifecycle as skills, lets the dashboard list agents, their declared skills, and their model, and lets reconcile tombstone agents deleted from disk instead of leaving them as ghosts forever.

Inputs: ~/.claude/agents/*.md, <repo>/.claude/agents/*.md

Outputs: agent_def rows (soft-tombstoned on disappearance)

Source: apps/axctl/src/ingest/agent-def.ts
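
Reconciling against disk is what separates a tombstone from a ghost. A minimal sketch, assuming each agent_def row is keyed by its file path and carries a nullable tombstone timestamp; the AgentDefRow shape and reconcileAgentDefs name are hypothetical. Rows missing from disk are marked rather than deleted, the first tombstone time is kept on later runs, and a file that reappears is revived.

```typescript
interface AgentDefRow {
  path: string;
  tombstonedAt: string | null; // ISO timestamp, null while the file exists
}

// Given the rows already in the graph and the paths currently on disk,
// return the reconciled set of rows.
function reconcileAgentDefs(existing: AgentDefRow[], onDisk: string[], now: string): AgentDefRow[] {
  const present = new Set(onDisk);
  const known = new Set(existing.map((r) => r.path));
  const updated = existing.map((row): AgentDefRow => {
    if (present.has(row.path)) {
      return { ...row, tombstonedAt: null }; // present or reappeared: live
    }
    // Gone from disk: soft-tombstone, keeping the original timestamp on later runs.
    return { ...row, tombstonedAt: row.tombstonedAt ?? now };
  });
  const added = Array.from(present)
    .filter((p) => !known.has(p))
    .map((p): AgentDefRow => ({ path: p, tombstonedAt: null }));
  return [...updated, ...added];
}
```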

invoked-positions

Computes and writes turn_index, total_turns, and is_first onto every invoked edge that still carries NONE for any of those fields. turn_index is written at RELATE time for new ingests, but total_turns and is_first need the full turn count per session and the per-(session, skill) group ordering, information that is only stable after all transcripts are ingested. This stage therefore runs after the claude, codex, and subagents stages to fill in those values.

Inputs: invoked edges with NONE position fields

Outputs: invoked.turn_index, invoked.total_turns, invoked.is_first

Source: apps/axctl/src/ingest/backfill-invoked-positions.ts
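
The backfill is a group-and-rank over edges that already exist. A sketch, assuming each invoked edge exposes its session, skill, turn id, and nullable position fields (null standing in for NONE), and that per-session turn counts and turn positions have already been computed; the InvokedEdge shape and backfillInvokedPositions name are hypothetical, though the three output fields follow the stage description. It only fills fields that are still missing, so re-running it leaves completed edges unchanged.

```typescript
interface InvokedEdge {
  sessionId: string;
  skill: string;
  turnId: string;
  turnIndex: number | null; // null stands in for SurrealDB NONE
  totalTurns: number | null;
  isFirst: boolean | null;
}

function backfillInvokedPositions(
  edges: InvokedEdge[],
  turnPosition: Map<string, number>, // turn id -> index within its session
  turnsPerSession: Map<string, number>, // session id -> total turn count
): InvokedEdge[] {
  const groupKey = (e: InvokedEdge) => `${e.sessionId}\u0000${e.skill}`;
  // Resolve turn_index first so the per-group ordering below is complete.
  const resolved = edges.map((e) => ({
    ...e,
    turnIndex: e.turnIndex ?? turnPosition.get(e.turnId) ?? null,
  }));
  // Earliest turn_index per (session, skill) group.
  const firstIndex = new Map<string, number>();
  for (const e of resolved) {
    if (e.turnIndex === null) continue;
    const current = firstIndex.get(groupKey(e));
    if (current === undefined || e.turnIndex < current) firstIndex.set(groupKey(e), e.turnIndex);
  }
  return resolved.map((e) => ({
    ...e,
    totalTurns: e.totalTurns ?? turnsPerSession.get(e.sessionId) ?? null,
    isFirst: e.isFirst ?? (e.turnIndex === null ? null : e.turnIndex === firstIndex.get(groupKey(e))),
  }));
}
```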

Not every ingest stage carries a @rationale comment yet. As each remaining stage gets one, it will appear here automatically, with no further edits to this page.

The readers

Once the graph is built, a small set of typed queries drive everything the CLI and dashboard expose. They are not general-purpose queries; each one is shaped to a specific product question.

ax improve list reads proposal rows ordered by score, joining back through cites_evidence to the friction, command outcome, or session evidence that generated them. It answers: what should I try changing about how I work?
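
In relational terms this reader is a sort plus a fan-out over cites_evidence edges. A sketch with hypothetical shapes: proposals carry a numeric score, each cites_evidence edge links a proposal id to an evidence id, and evidence records are tagged with their kind. In ax this is a SurrealQL query; the in-memory version only shows the shape of the result.

```typescript
type EvidenceKind = "friction_event" | "command_outcome" | "session";

interface Proposal {
  id: string;
  title: string;
  score: number;
}

interface Evidence {
  id: string;
  kind: EvidenceKind;
  summary: string;
}

interface CitesEvidence {
  proposal: string;
  evidence: string;
}

// Proposals ordered by score, each with the evidence it cites.
function improveList(proposals: Proposal[], edges: CitesEvidence[], evidence: Evidence[]) {
  const evidenceById = new Map(evidence.map((e) => [e.id, e]));
  return [...proposals]
    .sort((a, b) => b.score - a.score) // highest score first
    .map((p) => ({
      ...p,
      evidence: edges
        .filter((edge) => edge.proposal === p.id)
        .map((edge) => evidenceById.get(edge.evidence))
        .filter((e): e is Evidence => e !== undefined), // drop dangling edges
    }));
}
```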

ax retro pending reads session rows that ended without a corresponding retro record. It answers: which recent sessions haven't been reflected on yet?
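
The pending list is an anti-join: ended sessions that no retro points at. A sketch, assuming sessions record a nullable end time as an ISO 8601 UTC string (so plain string comparison orders them correctly) and each retro references its session by id; both field names are hypothetical.

```typescript
interface SessionRow {
  id: string;
  endedAt: string | null; // ISO 8601 UTC timestamp, null while still running
}

interface RetroRow {
  sessionId: string;
}

// Ended sessions that no retro references, most recently ended first.
function pendingRetros(sessions: SessionRow[], retros: RetroRow[]): SessionRow[] {
  const reflected = new Set(retros.map((r) => r.sessionId));
  return sessions
    .filter((s) => s.endedAt !== null && !reflected.has(s.id))
    .sort((a, b) => {
      const x = a.endedAt ?? "";
      const y = b.endedAt ?? "";
      return x < y ? 1 : x > y ? -1 : 0; // descending by end time
    });
}
```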

axctl skills search <term> runs a BM25 full-text query across skill.name and skill.description using the skill_text analyzer, then ranks by combined term presence. It answers: do I have a skill for this?
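
For intuition about what a BM25 ranking rewards, here is a compact scorer over the concatenated name and description. It uses the textbook Okapi BM25 formula with the common defaults k1 = 1.2 and b = 0.75, and a naive lowercase ASCII word tokenizer standing in for the skill_text analyzer. It is not SurrealDB's implementation, and its scores will not match ax's exactly.

```typescript
interface SkillDoc {
  name: string;
  description: string;
}

// Naive tokenizer: lowercase, split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Okapi BM25 over name + description; returns skills with a positive score, best first.
function bm25Search(skills: SkillDoc[], query: string, k1 = 1.2, b = 0.75) {
  const docs = skills.map((skill) => ({ skill, tokens: tokenize(`${skill.name} ${skill.description}`) }));
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.tokens.length, 0) / Math.max(n, 1);
  const terms = Array.from(new Set(tokenize(query)));
  // Document frequency: how many skills mention each query term.
  const df = new Map(terms.map((t) => [t, docs.filter((d) => d.tokens.includes(t)).length]));
  return docs
    .map(({ skill, tokens }) => {
      let score = 0;
      for (const t of terms) {
        const f = tokens.filter((w) => w === t).length; // term frequency in this skill
        if (f === 0) continue;
        const dfT = df.get(t) ?? 0;
        const idf = Math.log((n - dfT + 0.5) / (dfT + 0.5) + 1);
        score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * tokens.length) / avgLen));
      }
      return { skill, score };
    })
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score);
}
```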

The dashboard insight views read session_health aggregates — turn counts, tool error rates, correction counts, context pressure — grouped by workflow_epoch so you can see how a week of work compares to the one before it.
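
The week-over-week comparison is a group-by over session_health rows. A sketch, assuming each row carries a workflow epoch label plus per-session counts under the hypothetical field names below. It reports a tool error rate weighted by tool-call volume rather than a mean of per-session rates, so one short session with a single failed call cannot dominate an epoch; that weighting is a choice made for this sketch, not necessarily what the dashboard does.

```typescript
interface HealthRow {
  workflowEpoch: string;
  turnCount: number;
  toolCalls: number;
  toolErrors: number;
  corrections: number;
}

interface EpochSummary {
  workflowEpoch: string;
  sessions: number;
  turns: number;
  corrections: number;
  toolErrorRate: number; // total errors / total tool calls in the epoch
}

function summariseByEpoch(rows: HealthRow[]): EpochSummary[] {
  const groups = new Map<string, { sessions: number; turns: number; corrections: number; calls: number; errors: number }>();
  for (const r of rows) {
    const g = groups.get(r.workflowEpoch) ?? { sessions: 0, turns: 0, corrections: 0, calls: 0, errors: 0 };
    g.sessions += 1;
    g.turns += r.turnCount;
    g.corrections += r.corrections;
    g.calls += r.toolCalls;
    g.errors += r.toolErrors;
    groups.set(r.workflowEpoch, g);
  }
  return Array.from(groups, ([workflowEpoch, g]) => ({
    workflowEpoch,
    sessions: g.sessions,
    turns: g.turns,
    corrections: g.corrections,
    toolErrorRate: g.calls === 0 ? 0 : g.errors / g.calls,
  })).sort((a, b) => a.workflowEpoch.localeCompare(b.workflowEpoch)); // ordered by epoch label
}
```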

Reader rationale annotation (the same @rationale pattern applied to query modules) is not in place yet. It will follow the ingest-stage approach: annotate the source, run the extractor, and the docs update themselves. That work is a Task 6 follow-up.

Why this shape

Local-first. Every piece of evidence ax collects is private by construction. The SurrealDB instance runs on 127.0.0.1:8521 on your machine. Nothing is transmitted to a remote server, no API key is required to read your own history, and the data stays where you can inspect or delete it. Local-first is not an imposed constraint; it is the shape that fits a tool whose subject matter is what you tried, what failed, and what you want to try next.

Graph, not vector index. The interesting questions about a session are relational: which tool calls preceded this correction; does this skill get invoked in sessions that produce commits or sessions that don't; how often does the agent ask for clarification before making an edit versus after. A vector index of turn text answers similarity queries well — "find turns that look like this one" — but loses the connective tissue between turns, sessions, tools, files, and commits. SurrealDB's graph layer means a RELATE edge between a session and a commit is a first-class queryable fact, not a join across two denormalised tables. The schema comment at the top of schema.surql puts it plainly: skill ← invoked ← turn → edited → file. That is the model.
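
The question "does this skill get invoked in sessions that produce commits" is a multi-hop walk: from a skill back through the turns that invoked it, to their sessions, and on to whether those sessions produced a commit. An in-memory sketch, assuming invoked edges are flattened to (turn, skill) pairs, each turn's session is known, and sessions with at least one produced commit are given as a set; all names are hypothetical. It returns, per skill, the share of distinct invoking sessions that produced a commit.

```typescript
interface InvokedPair {
  turnId: string;
  skill: string;
}

interface SkillCommitRate {
  sessions: number; // distinct sessions that invoked the skill
  withCommit: number; // of those, sessions that produced a commit
  rate: number;
}

function commitRateBySkill(
  invoked: InvokedPair[],
  sessionOfTurn: Map<string, string>, // turn id -> session id
  sessionsWithCommit: Set<string>, // sessions with a produced edge to a commit
): Map<string, SkillCommitRate> {
  // Distinct sessions per skill, so repeated invocations in one session count once.
  const sessionsBySkill = new Map<string, Set<string>>();
  for (const { turnId, skill } of invoked) {
    const session = sessionOfTurn.get(turnId);
    if (session === undefined) continue;
    const seen = sessionsBySkill.get(skill) ?? new Set<string>();
    seen.add(session);
    sessionsBySkill.set(skill, seen);
  }
  const result = new Map<string, SkillCommitRate>();
  sessionsBySkill.forEach((sessions, skill) => {
    let withCommit = 0;
    sessions.forEach((s) => {
      if (sessionsWithCommit.has(s)) withCommit++;
    });
    result.set(skill, { sessions: sessions.size, withCommit, rate: withCommit / sessions.size });
  });
  return result;
}
```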

SurrealQL. Running both document and graph queries against the same database without a second moving part means the product vocabulary — session, turn, skill, retro, proposal — can be expressed directly in schema and queries without an impedance mismatch layer. The schema stays close to how ax talks about itself; SurrealQL's RELATE and graph-traversal syntax (->, <-) let the queries read like the product description.


For leaf-level decisions — why this edge type, why this ingest stage cap, why this idempotency strategy — see the ADR index.