what ax actually does

Every scenario, one graph.

Concrete demos of what your local ax instance already exposes. Each one is something you can run today, on your own history.

Backtest a hook against history. Search every session you've ever had. See where your tokens go. Watch a verdict earn its place at +30 sessions. Route the intern work to cheaper models. Keep your plan budget in view. Take proposals mined from your own transcripts. Find out which sessions thrash. Publish your receipts and hand the graph to an agent.

before-you-ship · cases · sample output

Ask the graph what your hook would have caught.

You write a guardrail. You don't know if it'll catch real mistakes or just become noise. ax hooks cases scores the candidate against labeled cases from your own session history - true and false positives, a real precision number - so the decision to ship is evidence, not vibes.

~/.ax/hooks/main-branch-guard.ts · candidate

import { Effect } from "effect";
import { defineHook, Verdict, GitEnv } from "@ax/hooks-sdk";

export default defineHook({
  name: "main-branch-guard",
  events: ["PreToolUse"],
  matcher: { tools: ["Bash"] },
  run: (event) =>
    Effect.gen(function* () {
      // Only git writes are inspected; every other Bash command passes.
      const cmd = event.tool?.input.command ?? "";
      if (!/^git (push|commit)\b/.test(cmd)) return Verdict.allow;
      // Resolve the branch checked out in the session's working directory.
      const gitEnv = yield* GitEnv;
      const branch = yield* gitEnv.currentBranch(event.cwd);
      // Block writes that land directly on a protected branch.
      if (/^(main|master|production)$/.test(branch ?? ""))
        return Verdict.block("direct write to protected branch");
      return Verdict.allow;
    }),
});

ax · hooks cases · ~/Projects/ax

~/.claude $ ax hooks cases main-branch-guard --since=7
  ↳ replay window   2026-05-21 → 2026-05-28  (7d)
  ↳ sessions        14 claude_code, 3 codex  (17 total)
  ↳ tool_calls      1,247 bash invocations indexed
 
  replaying… ████████████████████ 1247/1247  4.2s
 
  ───────────────────────────────────────────────────────────
  verdict          SHIP · HIGH-CONFIDENCE
  ───────────────────────────────────────────────────────────
  fires             12 / 1,247 calls  (0.96%)
  ├─ true positives  11  would have blocked actual main-branch pushes
  └─ false positives  1  legitimate release → main · 2026-05-24
 
  precision         0.917   recall 0.917   F1 0.917
  prevented rollbacks  5     (traced via post-event reverts)
 
  by repo
    ~/Projects/api       8  ▮▮▮▮▮▮▮▮
    ~/Projects/web       3  ▮▮▮
    ~/Projects/infra     1  ▮ ← false positive lives here
 
  one to review:
    sess_8af3·turn-42  release/v2-cutover  → allow-list?
 
  install with: ax hooks install ~/.ax/hooks/main-branch-guard.ts --providers=claude,codex
~/.claude $
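The precision, recall, and F1 in the verdict block are standard classification arithmetic over labeled replay cases. A minimal sketch, assuming each replayed call reduces to two booleans (did the candidate fire, and does history say it should have); the `LabeledCase` shape and `scoreHook` name are hypothetical, not part of the ax hooks SDK:

```typescript
// Hypothetical shape: one replayed tool call with the hook's decision
// and the ground-truth label mined from session history.
interface LabeledCase {
  fired: boolean;       // the candidate hook would have blocked this call
  shouldBlock: boolean; // history says this call should have been blocked
}

interface Score {
  tp: number;
  fp: number;
  fn: number;
  precision: number;
  recall: number;
  f1: number;
}

function scoreHook(cases: LabeledCase[]): Score {
  let tp = 0, fp = 0, fn = 0;
  for (const c of cases) {
    if (c.fired && c.shouldBlock) tp++;
    else if (c.fired && !c.shouldBlock) fp++;
    else if (!c.fired && c.shouldBlock) fn++;
  }
  // Guard against division by zero when the hook never fires
  // or history contains no positives.
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { tp, fp, fn, precision, recall, f1 };
}
```

With the sample run's 11 true positives and 1 false positive, precision is 11/12 ≈ 0.917; the matching recall of 0.917 corresponds to 11 of 12 real positives caught.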

replay window · 17 sessions · 1,247 bash calls · 2026-05-21 → 2026-05-28 · 12 fires · 5 rollbacks prevented

day	sessions	bash calls
Thu 21	3	163
Fri 22	2	131
Sat 23	1	86
Sun 24	2	112
Mon 25	3	221
Tue 26	3	248
Wed 27	2	177
Thu 28 · today	1	109
total	17	1,247

legend: pass (normal traffic) · would have blocked (true positive) · traced to a later rollback · false positive (review)

search the graph

Find what you shipped last time you did this.

Every transcript ax has ever ingested is full-text searchable - Claude Code, Codex, every turn, every tool call, every reasoning trace. Ranked excerpts come back with the session, the file, the commit, and whether it stuck.

14,832 turns · 412 sessions · claude + codex

4 matches · 38 ms

01 · 0.94

Built the OAuth refresh token rotation. The middleware now checks expiry with <= not < after the bug we hit last quarter - tests cover the boundary tick and the clock-skew window.

claude code · session 5a8e9c · 2026-05-21 · 14:02 · ~/Projects/ax · src/auth/middleware.ts

→ shipped in 8b3d1f4 · adopted · t + 7d

02 · 0.81

PR #847 - OAuth refresh path. Tests cover both expiry edge cases; the middleware guards against double-refresh by holding a per-tenant lock for the duration of the rotation.

claude code · session 3c1d22 · 2026-04-14 · 09:48 · ~/Projects/ax · src/auth/refresh.ts

→ shipped in 2e0a5cc · adopted · t + 30d

03 · 0.72

Initial OAuth wiring. Note for future me: don't reuse the access token endpoint for refresh - separate route, separate rate limit, separate audit log.

codex · session 7f4b88 · 2026-03-02 · 22:11 · ~/Projects/ax · src/auth/routes.ts

→ shipped in 9d1e0a2 · adopted · locked · t + 90d

04 · 0.41

Spike on OAuth session-binding inside the middleware - rejected, returned to the PR #420 approach. Leaving the diff in scratch/ in case the threat model changes.

claude code · session 1a2b33 · 2025-12-08 · 16:30 · ~/Projects/ax · scratch/oauth-bind.ts

→ rolled back · rejected · t + 2d

ax · local taste & telemetry graph · prototype

see the bleed · token-impact

Where your agent context goes.

Every agent user is bleeding money on cache misses they can't see. ax insights token-impact --since=7d joins your local claude + codex transcripts, reconciles provider metadata against transcript bytes, and shows the spend, the hit rate, and the workflows burning the budget.

tokens · 7d

14.2M

▲ +20% vs 11.8M last week

claude 8.5M · codex 5.7M

spend · 7d

$42.18

claude $24.18 · codex $18.00

cache hit · 7d

67%

▼ -4pp WoW · target 80%

By workflow epoch & expensive sessions

join: session_token_usage ⋈ session_health

token share by workflow: gsd 42% · superpowers 31% · ad-hoc 27%

legend: cached · cache miss (paid)

Bar length = share of total tokens. Color split inside each bar = cached vs. paid for the same workload. ad-hoc uses roughly two-thirds of gsd's tokens (27% vs 42%) but burns more dollars - fewer rituals, lower cache hit.

session 9c2e44 · claude

2.40M tk

~/Projects/ax › src/ingest/transcripts.ts refactor

cache hit 41% · $7.81 · 14 turns

session 4f1ab0 · codex

1.85M tk

~/Projects/ax › insights CLI scaffold

cache hit 58% · $5.94 · 22 turns

session a07e91 · claude

1.31M tk

~/Projects/ax › schema v3 migration

cache hit 79% · $3.12 · 9 turns

session 2bf330 · codex

1.07M tk

~/Projects/ax › docs/landing rewrite

cache hit 36% · $3.78 · 31 turns

session 7d4c12 · claude

0.92M tk

~/Projects/api › live-traces vendor

cache hit 74% · $2.34 · 11 turns

3.2×

codex burns 3.2× the context of claude code for equivalent work - same workflow_epoch, same repo, same outcome. Most of that is restated history per turn.

workflow-impact says the gsd → superpowers migration is paying off · run ax insights workflow-impact for the cohort comparison

where the numbers come from

ax reads provider metadata - cache_creation_input_tokens, cache_read_input_tokens, input_tokens, output_tokens - and falls back to transcript-byte estimates when a turn predates cache reporting.
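Those four fields are enough to compute a cache hit rate. A minimal sketch, assuming hit rate means cache-read tokens as a share of all input-side tokens (ax's exact definition isn't given here) and omitting the transcript-byte fallback; `cacheHitRate` is a hypothetical name:

```typescript
// Usage fields as reported by the Anthropic Messages API. The cache
// fields are optional because older turns may predate cache reporting.
interface TurnUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Assumed definition: share of input-side tokens served from cache.
// input_tokens counts only uncached input, so the denominator adds
// cache writes and cache reads back in. Output tokens are excluded.
function cacheHitRate(turns: TurnUsage[]): number {
  let read = 0;
  let totalInput = 0;
  for (const t of turns) {
    const created = t.cache_creation_input_tokens ?? 0;
    const cached = t.cache_read_input_tokens ?? 0;
    read += cached;
    totalInput += t.input_tokens + created + cached;
  }
  return totalInput === 0 ? 0 : read / totalInput;
}
```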

runs on your machine

Local SurrealDB instance. Typed Effect pipeline. No outbound calls, no upload. Sibling diagnostics: cache-health · workflow-impact · skill-impact

the compounding part

Every change earns its place by session 30.

Accepting a proposal doesn't make it true. ax turns each acceptance into an experiment with three forward-looking checkpoints (t+3, t+10, and t+30 sessions) and watches the next runs to see if the change actually held. Days are the wrong unit when an agent ships eight sessions a day. The verdict at t+30 sessions is locked, and future proposals build on it.

Fig · S-04 · verdict timeline · post-feature-verify

accept · experiment opened

exp_id post-feature-verify · t0

marker added · src/cli/run.ts:42

watching marker · file · pattern · tests

pending

ax doesn't trust the moment you accept; it earns the verdict by watching what happens across the next 30 sessions. Marker still landed? File still healthy? Pattern not recurring? Tests still green? Each checkpoint joins evidence from the same graph that generated the proposal. Counting sessions rather than days means a weekend doesn't artificially delay a verdict and a productive afternoon doesn't artificially rush one. The verdict at +30 sessions is locked and feeds the next round. Verdicts live in the improve queue; ax improve verdict confirms or overrides one from the CLI.
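Because checkpoints count sessions rather than days, the schedule is a function of how many sessions have run since acceptance. A minimal sketch, assuming a checkpoint is due once that count reaches 3, 10, or 30 and that the verdict locks at 30; `checkpointStatus` and its return shape are hypothetical:

```typescript
// Checkpoints in sessions after acceptance (t0); the last one locks.
const CHECKPOINTS = [3, 10, 30] as const;
const LOCK_AT = 30;

interface CheckpointStatus {
  reached: number[];         // checkpoints already due
  next: number | null;       // next checkpoint, or null after the last
  sessionsUntilNext: number; // 0 once every checkpoint is reached
  locked: boolean;           // verdict frozen after the final checkpoint
}

// sessionsSinceAccept counts sessions after t0, however many days they span.
function checkpointStatus(sessionsSinceAccept: number): CheckpointStatus {
  const reached = CHECKPOINTS.filter((c) => sessionsSinceAccept >= c);
  const next = CHECKPOINTS.find((c) => sessionsSinceAccept < c) ?? null;
  return {
    reached,
    next,
    sessionsUntilNext: next === null ? 0 : next - sessionsSinceAccept,
    locked: sessionsSinceAccept >= LOCK_AT,
  };
}
```

At two sessions in, the sketch reports the +3 checkpoint with one session remaining, the same state the skill-ts-default experiment shows in the list below.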

recent experiments · 5 of 47

experiment	checkpoint	evidence	verdict
post-feature-verify	+30 sess	marker landed · 0 rollbacks · 1 dependent	adopted
main-branch-guardrail	+10 sess	marker landed · 2 of 4 callsites bypassed	partial
skill-ts-default	+3 sess	awaiting first signal · 1 session remaining	pending
ingest-regression	+30 sess	pattern not recurred over 30 sessions · tests green	adopted
cache-warm-on-start	+10 sess	added 800ms cold start · reverted at session 6	regressed

verdict states › adopted · regressed · partial · ignored · no_longer_needed

route the intern work · dispatches

Stop paying frontier rates for mechanical dispatches.

Every sub-task your agent spawns inherits your most expensive model unless something says otherwise. ax dispatches --candidates finds the dispatches that ran on fable or opus but matched a mechanical routing class - and reprices each one against the cheaper model, from the tokens it actually burned.

biggest single receipt

$35.18

one dispatch · $50.26 on inherit → sonnet

"Implement Task 3: session map strip"

redirectable · last 2d

$209.59

39 model-less dispatches on fable/opus

matched mechanical routing classes

where the nudge lands

Claude Code

route-dispatch hook · at dispatch time

advisory - suggests, can’t rewrite the dispatch

Top candidates, repriced

ax dispatches --candidates --days=14

ts	agent_type	description	suggest	child cost	est. savings
06-10 13:30	general-purpose	Implement Task 3: session map strip	claude-sonnet-4-6	$50.26	$35.18
06-11 07:41	general-purpose	Fix ingest run lifecycle	claude-sonnet-4-6	$30.98	$21.69
06-10 07:09	general-purpose	Add deep span instrumentation	claude-sonnet-4-6	$26.41	$18.49
06-10 15:32	general-purpose	Implement P2-T16 skills	claude-sonnet-4-6	$16.29	$11.40
06-11 07:42	general-purpose	Sweep stale 8520 port refs	claude-haiku-4-5	$8.56	$7.70
06-12 06:44	codebase-analyzer	Extract contracts for plan	claude-sonnet-4-6	$6.75	$4.73

top 6 of dozens of candidates in 14d · $99.19 est. savings on these rows alone · "inherit" means no model was specified, so the dispatch rode the expensive default

01 · find

ax dispatches --candidates

Inherited an expensive model + matched a mechanical class. Each row carries a suggested model and the dollars it would have saved.

02 · compile

ax routing compile

Writes the class table to ~/.ax/hooks/routing-table.json. The write is merge-preserving, so your own classes survive a regenerate.

03 · advise

route-dispatch hook

Suggests the cheaper model at dispatch time in Claude Code. It's advisory, so your agent decides: the next "Fix ingest run lifecycle" gets pointed at sonnet, not fable.

tune · ax routing tune mines the unmatched expensive dispatches into new classes using two-token prefix clustering, keeping clusters with ≥3 members. Mechanical classes auto-apply; judgment-flagged ones (review / design / plan / audit) only ship via an emitted brief and an agent backtest.
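Two-token prefix clustering groups dispatch descriptions by their first two words and keeps only groups big enough to generalize. A minimal sketch, assuming lowercase alphanumeric tokenization and the ≥3-member floor from the text; the tokenizer, the `Dispatch` shape, and `clusterByPrefix` are illustrative, not ax's implementation, and the judgment-class flagging step is left out:

```typescript
// Hypothetical shape of an unmatched, expensive dispatch.
interface Dispatch {
  id: string;
  description: string;
}

// Key = first two lowercase alphanumeric tokens, or null if there are fewer.
function prefixKey(description: string): string | null {
  const tokens = description
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
  return tokens.length >= 2 ? `${tokens[0]} ${tokens[1]}` : null;
}

// Group dispatches by prefix key and keep clusters with >= minMembers.
function clusterByPrefix(dispatches: Dispatch[], minMembers = 3): Map<string, Dispatch[]> {
  const groups = new Map<string, Dispatch[]>();
  for (const d of dispatches) {
    const key = prefixKey(d.description);
    if (key === null) continue;
    const bucket = groups.get(key) ?? [];
    bucket.push(d);
    groups.set(key, bucket);
  }
  const clusters = new Map<string, Dispatch[]>();
  groups.forEach((members, key) => {
    if (members.length >= minMembers) clusters.set(key, members);
  });
  return clusters;
}
```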

where the numbers come from

Every dispatch row joins the parent tool_call to the child session it spawned. Savings are repriced from the tokens the child actually burned - not a projection, a receipt.
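Repricing from burned tokens means charging the child session's actual token counts at the suggested model's rates instead of the rates of the model it ran on. A minimal sketch, assuming per-million-token rates for four usage categories; the `Rates` shape and `reprice` name are hypothetical, and any rates plugged in should come from current provider pricing:

```typescript
// Token counts for one child session, split by usage category.
interface TokenCounts {
  input: number;
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

// USD per million tokens for each category (hypothetical shape).
interface Rates {
  input: number;
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

function cost(tokens: TokenCounts, rates: Rates): number {
  return (
    (tokens.input * rates.input +
      tokens.output * rates.output +
      tokens.cacheWrite * rates.cacheWrite +
      tokens.cacheRead * rates.cacheRead) / 1e6
  );
}

// Same tokens, two price lists: the difference is the estimated savings.
function reprice(tokens: TokenCounts, actual: Rates, suggested: Rates) {
  const childCost = cost(tokens, actual);
  const repriced = cost(tokens, suggested);
  return { childCost, repriced, savings: childCost - repriced };
}
```

The sonnet rows in the candidate table all show savings of about 70% of child cost, which is what this formula yields when every suggested rate is 30% of the original.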

there's a whole page on this

The leak, the loop, and 30 days of verbatim receipts from one machine: ax · routing →

measure + tune, live

Your bill, broken out and tunable.

ax studio's /cost view renders the same numbers the CLI prints (the main-vs-subagent spend split, per-model cost, and the dispatch candidates worth routing down) live off your local graph. Routing is regex underneath, so the view ships an interactive tuner: edit a class pattern, watch which past dispatches it catches (and which it shouldn't), flag false positives into an exclude list, and save. The route-dispatch hook picks up the change live.
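Because routing is regex underneath, matching a dispatch against the class table is a short loop. A minimal sketch, assuming each class has one include pattern, a suggested model, and a list of exclude patterns that veto a match; the `RouteClass` shape and `routeFor` name are hypothetical and do not describe the actual schema of `routing-table.json`:

```typescript
// Hypothetical class-table entry; not the real routing-table.json schema.
interface RouteClass {
  name: string;
  pattern: string;   // include regex, matched case-insensitively
  model: string;     // suggested cheaper model
  exclude: string[]; // false-positive patterns that veto this class
}

// First class whose pattern matches and whose exclude list does not.
function routeFor(description: string, table: RouteClass[]): RouteClass | null {
  for (const cls of table) {
    if (!new RegExp(cls.pattern, "i").test(description)) continue;
    const vetoed = cls.exclude.some((p) => new RegExp(p, "i").test(description));
    if (!vetoed) return cls;
  }
  return null;
}
```

A tuner preview then amounts to running `routeFor` over past dispatch descriptions before and after a pattern edit and diffing which ones change class.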

Fig · ax studio · /cost · main-thread routability bars and the interactive routing tuner, with an editable regex pattern, suggested model, and exclude patterns over real dispatch history

know the envelope · quota

Your plan limits, live, everywhere you look.

Claude tells you about your usage limit when you hit it. ax quota reads the same usage endpoint the Claude app does - your 5-hour and 7-day rolling windows, live, with the OAuth token you already have. No new login, no DB, nothing leaves your machine but the one call Claude already makes.

5h · 64% · resets 04:29

7d · 63% · resets 04:59

7d sonnet · 5% · resets 04:59

extra · off · no overage billing past the windows

One cached read, three surfaces

~/.ax/quota-cache.json · 60s ttl

terminal · ax quota

~ $ ax quota

window       used  resets
5h            64%  04:29
7d            63%  04:59
7d sonnet      5%  04:59
extra         off

(fetched 0s ago, live)

claude code statusline · ax quota --statusline

~/Projects/ax · sonnet-4-6 · 5h 64% → 04:29 · 7d 63%

One plain line for the statusLine command. Poll every render - it's the cache answering, not the API.

macOS menubar · ax quota --swiftbar

◕ 64%  ⌥  ⚙  Fri 09:41

A SwiftBar/xbar plugin body - the burn rate lives next to the clock. Fetch failures degrade to the stale cache, never a crash in the menubar.

where the numbers come from

The same api.anthropic.com/api/oauth/usage endpoint the Claude app polls, read with your existing Claude Code OAuth token - macOS Keychain first, ~/.claude/.credentials.json fallback. ax never refreshes the token.

runs on your machine

No SurrealDB involved at all - this is the one ax command with zero graph. Responses cache at ~/.ax/quota-cache.json (60s TTL) so statusline and menubar can poll freely without hammering the endpoint.
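The caching described above (serve a fresh entry, refetch after 60s, fall back to the stale entry when the fetch fails) fits in one function. A minimal sketch, assuming an in-memory entry and an injected synchronous fetcher so the logic stays testable; the real command persists to `~/.ax/quota-cache.json` and calls the endpoint over the network, both abstracted away here, and `readQuota` is a hypothetical name:

```typescript
interface QuotaSnapshot {
  fiveHourPct: number;
  sevenDayPct: number;
}

interface CacheEntry {
  fetchedAtMs: number;
  snapshot: QuotaSnapshot;
}

interface QuotaRead {
  entry: CacheEntry | null; // what to serve (and persist)
  source: "cache" | "live" | "stale" | "none";
}

const TTL_MS = 60 * 1000;

function readQuota(
  cached: CacheEntry | null,
  nowMs: number,
  fetchLive: () => QuotaSnapshot, // stands in for the network call; may throw
): QuotaRead {
  // Fresh enough: answer from the cache without touching the endpoint.
  if (cached && nowMs - cached.fetchedAtMs < TTL_MS) {
    return { entry: cached, source: "cache" };
  }
  try {
    const snapshot = fetchLive();
    return { entry: { fetchedAtMs: nowMs, snapshot }, source: "live" };
  } catch {
    // Degrade to the stale entry instead of surfacing an error.
    return cached ? { entry: cached, source: "stale" } : { entry: null, source: "none" };
  }
}
```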

the graph talks back · improve from our own graph · 2026-06

Proposals mined from your own transcripts.

ax improve recommend mines improvement proposals from your transcript graph and scores each one, with an evidence trail and a backtested projected value. Accept one and it becomes a brief an agent acts on. Lint reconciles what actually got applied. Verdicts confirm it or retire it.

17.49 · hook · hook__17b5aaf6aade53e5 · high · 39/wk · origin: system

Route mechanical subagent dispatches to cheaper models

evidence · 39 model-less dispatches on fable/opus matched mechanical routing classes in the last 2d; est $209.59 redirectable. Top classes: well-specified-impl ($95.27), bug-fix ($44.59), spec-review ($32.57).

apply · axctl improve accept hook__17b5aaf6aade53e5

16.03 · skill · Post-feature verification checklist · high · 26/wk · Feature closure needs stronger same-file follow-up verification.

11.93 · skill · Graph query dogfood checklist · high · 8/wk · Query builders can pass string tests while returning slow or low-signal output.

8.90 · skill · SurrealDB schema change guardrail · high · 3/wk · Schema changes need a tighter migration/apply/query verification loop.

Accept is not the end - it's the experiment

recommend → accept → apply → lint → verdict

recommend · scored, with evidence
accept · .ax/tasks/<id>.md brief
agent applies · like any task file
lint · reconciles guidance
verdict · confirms or retires

Agents write back too - ax improve propose / ax improve analyze let a session file its own proposal mid-run; origin badges keep agent-derived and system-derived suggestions distinguishable.

#1↑

The top proposal above is the dispatch routing showcased earlier on this page. The graph mined "route mechanical dispatches to cheaper models" out of its own transcripts - $209.59 redirectable in two days - before it existed as a feature. We accepted the brief, and it shipped as dispatch routing.

the loop eating its own output · run ax improve recommend for yours

where the numbers come from

Scores blend frequency, severity, and the impact engine's backtested projected value - what the proposal would have saved or caught over your actual recent history, not a hypothetical.

runs on your machine

Mined from the local graph, applied to your own agent files. Nothing auto-edits: accept emits a brief, an agent does the work, ax improve lint checks it landed. The whole deck - proposals, impact, and past bets measured at +3/+10/+30 sessions - lives in the studio improve dashboard: ax serve.

who's thrashing · churn

Landed, edited, repaired - by source.

Lines of code is a vanity metric until you split it. ax sessions churn --here classifies 30 days of writes into landed vs edit vs repair LOC per provider, counts failed checks, and groups the failures into episodes - so "which sessions thrash" has a number.

Composition of added LOC · 30d

~/Projects/ax · claude / claude-subagent / codex

bars: codex · claude-subagent · claude

legend: landed (survived as written) · edit (reworked later) · repair (fixing a failed check)

The repair sliver is the point - a tiny repair share means checks catch problems before they ship. The edit band is where the real rework hides: claude-subagent reworks a third of everything it writes.

source	sess	fails	episodes	pass	landed	edits	repair

codex	57	467	23	22	+330,730/-150,779	+286/-142	+58/-25

claude-subagent	71	95	29	13	+23,979/-4,157	+13,508/-2,199	+981/-274

claude1417173+117,641/-32,784+36,455/-9,172+3,867/-1,550

ax sessions churn --here · 30d window · LOC shown as +added/-removed

What an episode is

failure opens · same-family pass closes · 30 min expiry

✗ check fails → episode opens
✗ ✗ same-family failures → join the open episode
✓ same-family pass → episode closes
⏱ 30 min silence → episode expires
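The episode rules amount to a small state machine keyed by check family. A minimal sketch, assuming events arrive sorted by time and the 30-minute silence is measured from the episode's last event; `CheckEvent`, `Episode`, and `groupEpisodes` are hypothetical names:

```typescript
// Hypothetical event shape: one check run classified pass/fail by family.
interface CheckEvent {
  family: string; // e.g. "tests", "typecheck", "lint", "build"
  passed: boolean;
  atMs: number;   // timestamp; events must be sorted ascending
}

interface Episode {
  family: string;
  fails: number;
  startMs: number;
  lastMs: number;
  closedBy: "pass" | "expiry" | "open";
}

const EXPIRY_MS = 30 * 60 * 1000;

function groupEpisodes(events: CheckEvent[]): Episode[] {
  const done: Episode[] = [];
  const open = new Map<string, Episode>();
  for (const ev of events) {
    let ep = open.get(ev.family);
    // More than 30 min of silence since the episode's last event: it expires.
    if (ep && ev.atMs - ep.lastMs > EXPIRY_MS) {
      ep.closedBy = "expiry";
      done.push(ep);
      open.delete(ev.family);
      ep = undefined;
    }
    if (!ev.passed) {
      if (ep) {
        // A same-family failure joins the open episode.
        ep.fails++;
        ep.lastMs = ev.atMs;
      } else {
        // A failure with no open episode opens one.
        open.set(ev.family, {
          family: ev.family, fails: 1, startMs: ev.atMs, lastMs: ev.atMs, closedBy: "open",
        });
      }
    } else if (ep) {
      // A same-family pass closes the open episode.
      ep.closedBy = "pass";
      ep.lastMs = ev.atMs;
      done.push(ep);
      open.delete(ev.family);
    }
  }
  // Simplification: expiry is only detected when a later same-family event
  // arrives, so episodes still open when the input ends stay "open".
  open.forEach((ep) => done.push(ep));
  return done;
}
```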

467✗

codex failed 467 checks in 30 days - 8.2 per session - and still landed 330k LOC with under 0.1% repair share. The failures cluster into just 23 episodes: it thrashes in short windows against the test suite, then lands clean.

claude-subagent is the opposite shape · 1.3 fails/session, 35% edit share

where the numbers come from

Every tool_call that runs a check (tests, typecheck, lint, build) is classified pass/fail by family. LOC written after a failure, touching the same files, counts as repair; later rework of landed lines counts as edit.
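The landed/edit/repair split can be sketched as a per-write classifier plus a tally. A simplified sketch, assuming failures are tracked per file (not per line) and that each write already knows whether it rewrites previously landed lines; the `Write` shape and the function names are hypothetical:

```typescript
type Kind = "landed" | "edit" | "repair";

// Hypothetical write record derived from a tool_call.
interface Write {
  file: string;
  added: number;              // lines added by this write
  touchesPriorLines: boolean; // rewrites lines an earlier write landed
}

interface Totals { landed: number; edit: number; repair: number; }

// failingFiles: files covered by a check that is currently failing.
function classify(w: Write, failingFiles: Set<string>): Kind {
  if (failingFiles.has(w.file)) return "repair";
  if (w.touchesPriorLines) return "edit";
  return "landed";
}

function tally(writes: Write[], failingFiles: Set<string>): Totals {
  const t: Totals = { landed: 0, edit: 0, repair: 0 };
  for (const w of writes) t[classify(w, failingFiles)] += w.added;
  return t;
}

// Share of added LOC that was rework.
function editShare(t: Totals): number {
  const total = t.landed + t.edit + t.repair;
  return total === 0 ? 0 : t.edit / total;
}
```

Feeding in claude-subagent's added LOC from the table above (23,979 landed, 13,508 edit, 981 repair) gives an edit share of about 0.351, the 35% quoted for it.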

runs on your machine

Same local graph as everything else - scope with --here, a specific --project, or one --source. 30d window by default, --since=N to change it.

receipts, public · profiles

Publish what you actually ran.

ax profile publish turns your local graph into a public gist - counts, dates, trends, the skills and hooks you really lean on. No transcripts, no code, no paths. The nightly compile ranks everyone who opted in.

leaderboard · /leaders

#	user	tokens
1	@you	1.8B
2	@abuilder	1.2B
3	@cferreira	940M

Boards rebuild from registered gists. A skill trends once 2+ builders use it - however each installed it (a loose skill dir and a plugin count as the same skill). See the live boards →
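The trending rule reduces to counting distinct builders per normalized skill. A minimal sketch, assuming a plugin-installed skill appears as `plugin:skill` and a loose skill directory as plain `skill`, so stripping the prefix makes both count as one; ax's real identity rule may differ, and `trendingSkills` is a hypothetical name:

```typescript
// Hypothetical record: one skill listed in one published profile.
interface PublishedSkill {
  user: string;
  name: string; // e.g. "superpowers:tdd" (plugin) or "tdd" (loose dir)
}

// Assumed normalization: drop any "plugin:" prefix and lowercase.
function skillId(name: string): string {
  const i = name.lastIndexOf(":");
  return (i >= 0 ? name.slice(i + 1) : name).toLowerCase();
}

// A skill trends once at least minBuilders distinct users run it.
function trendingSkills(entries: PublishedSkill[], minBuilders = 2): string[] {
  const users = new Map<string, Set<string>>();
  for (const e of entries) {
    const id = skillId(e.name);
    const set = users.get(id) ?? new Set<string>();
    set.add(e.user);
    users.set(id, set);
  }
  const out: string[] = [];
  users.forEach((set, id) => {
    if (set.size >= minBuilders) out.push(id);
  });
  return out.sort();
}
```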

~/.ax · ax-profile.json

{
  "v": 1,
  "github": "you",
  "window_days": 30,
  "stats": {
    "sessions": 412,
    "streak_days": 9,
    "tokens": { "total": 1.8e9 },
    "cost_usd": 605
  },
  "rig": {
    "skills": [
      { "name": "superpowers:tdd", "runs": 88 }
    ],
    "hooks": ["enforce-worktree"],
    "routing_table": true
  }
}

Aggregates only - the exact JSON is shown to you for consent before the first publish. Your profile page renders it live.

Hand the graph to an agent

ax mcp · stdio · 17 read-only tools

model context protocol · ax mcp

ax mcp runs a stdio MCP server exposing ax's read-only queries as 17 tools, so an agent can interrogate your graph in-context - recall a past session, pull weighted skills, read a proposal - mid-task. Mutating ops are deliberately not exposed.

recall
sessions_around
session_show
skills_weighted
skills_by_role
skills_roles
roles
improve_recommend
improve_show
improve_list

consent first, always

The first publish shows you the exact JSON and asks. State lives in ~/.ax/profile-publish.json; ax profile unpublish deletes the gist and resets it. Nothing leaves your machine until you say yes.

runs on your machine

The MCP server has no native deps and never mutates - it's the same query layer the CLI uses, handed to whatever agent you point at it. The graph stays local; only the answers cross the wire.