ax · agent experience · live

what ax actually does

Four scenarios, one graph.

Concrete demos of what your local ax instance already exposes - backtest a hook against history, search every session you've ever had, see where your tokens go, watch a verdict earn its place at +30 sessions. Each one is something you can run today.

before-you-ship · backtest

Ask the graph what your hook would have caught.

You write a guardrail. You don't know if it'll catch real mistakes or just become noise. ax replays your last week of actual sessions against the hook before you install it, so the decision to ship is evidence, not vibes.

~/.claude/hooks/pre-tool-main-branch-guard.ts · candidate
// fires on Bash tool_call before execution
export const hook = {
  on: "pre_tool",
  tool: "Bash",
  match(call) {
    const cmd = call.input.command ?? "";
    if (!/^\s*git\s+(push|commit)\b/.test(cmd)) return false;
    const branch = call.context.git?.branch ?? "";
    return /^(main|master|production)$/.test(branch);
  },
  action: "block",
  reason: "direct write to protected branch",
};
pre-tool · Bash · 12 lines · gut-check before backtest
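For a gut-check before running the real backtest, the hook's predicate can be replayed over a handful of recorded calls by hand. This is a minimal sketch, not ax's replay engine: the `RecordedCall` shape, the `replay` helper, and the sample history are all hypothetical, and only the two regular expressions come from the candidate hook.

```typescript
// Hypothetical shape of a recorded Bash call; ax's real schema is not shown here.
interface RecordedCall {
  input: { command?: string };
  context: { git?: { branch?: string } };
}

// Same predicate as the candidate hook, extracted so it can be exercised directly.
function matchesProtectedWrite(call: RecordedCall): boolean {
  const cmd = call.input.command ?? "";
  if (!/^\s*git\s+(push|commit)\b/.test(cmd)) return false;
  const branch = call.context.git?.branch ?? "";
  return /^(main|master|production)$/.test(branch);
}

// Replay: run the predicate over every recorded call and keep the ones that would fire.
function replay(calls: RecordedCall[]): RecordedCall[] {
  return calls.filter(matchesProtectedWrite);
}

// Invented sample history, for illustration only.
const sampleHistory: RecordedCall[] = [
  { input: { command: "git push origin main" }, context: { git: { branch: "main" } } },
  { input: { command: "git commit -m 'wip'" }, context: { git: { branch: "feature/x" } } },
  { input: { command: "ls -la" }, context: { git: { branch: "main" } } },
  { input: { command: "  git commit --amend" }, context: { git: { branch: "production" } } },
  { input: {}, context: {} },
];

const fires = replay(sampleHistory);
console.log(`${fires.length} / ${sampleHistory.length} calls would have been blocked`);
```

Two properties of the predicate are worth knowing before you read a backtest report. It keys on the current branch, not the push target, so `git push origin main` run from a feature branch does not fire. And the `^` anchor means a command that only reaches `git` after a prefix such as `cd repo &&` does not fire either.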
ax · hooks backtest · ~/Projects/ax
~/.claude $ ax hooks backtest pre-tool-main-branch-guard --since=7d
  ↳ replay window   2026-05-21 → 2026-05-28  (7d)
  ↳ sessions        14 claude_code, 3 codex  (17 total)
  ↳ tool_calls      1,247 bash invocations indexed
 
  replaying… ████████████████████ 1247/1247  4.2s
 
  ───────────────────────────────────────────────────────────
  verdict          SHIP · HIGH-CONFIDENCE
  ───────────────────────────────────────────────────────────
  fires             12 / 1,247 calls  (0.96%)
  ├─ true positives  11  would have blocked actual main-branch pushes
  └─ false positives  1  legitimate hotfix → production · 2026-05-24
 
  precision         0.917   recall 0.917   F1 0.917
  prevented rollbacks  5     (traced via post-event reverts)
 
  by repo
    ~/Projects/ax         8  ▮▮▮▮▮▮▮▮
    ~/Projects/quera      3  ▮▮▮
    ~/Projects/dotfiles   1   ← false positive lives here
 
  one to review:
    sess_8af3·turn-42  hotfix/prod-token-leak  → allow-list?
 
  install with: ax hooks install pre-tool-main-branch-guard --allow=hotfix/*
~/.claude $ 
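The precision, recall, and F1 figures in the report follow from the usual confusion-matrix formulas. The sketch below reproduces them from the counts shown: 11 true positives and 1 false positive are in the report, while the single false negative is an assumption inferred from the reported recall of 0.917. The `Confusion` type and `score` function are illustrative names, not part of ax.

```typescript
interface Confusion {
  truePositives: number;
  falsePositives: number;
  falseNegatives: number;
}

function score(c: Confusion): { precision: number; recall: number; f1: number } {
  // precision: of everything the hook blocked, how much deserved blocking
  const precision = c.truePositives / (c.truePositives + c.falsePositives);
  // recall: of everything that deserved blocking, how much the hook caught
  const recall = c.truePositives / (c.truePositives + c.falseNegatives);
  // F1: harmonic mean of the two
  const f1 = (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// falseNegatives = 1 is assumed; the report does not state it directly.
const report = score({ truePositives: 11, falsePositives: 1, falseNegatives: 1 });
console.log(report.precision.toFixed(3), report.recall.toFixed(3), report.f1.toFixed(3));
// prints "0.917 0.917 0.917"
```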
replay window · 17 sessions · 1,247 bash calls · 2026-05-21 → 2026-05-28 · 12 fires · 5 rollbacks prevented
Thu 21 · 3 sessions · 163 calls
Fri 22 · 2 sessions · 131 calls
Sat 23 · 1 session · 86 calls
Sun 24 · 2 sessions · 112 calls
Mon 25 · 3 sessions · 221 calls
Tue 26 · 3 sessions · 248 calls
Wed 27 · 2 sessions · 177 calls
Thu 28 · 1 session · today · 109 calls
legend: pass (normal traffic) · would have blocked (true positive) · traced to a later rollback · false positive (review)

search the graph

Find what you shipped last time you did this.

Every transcript ax has ever ingested is full-text searchable - Claude Code, Codex, every turn, every tool call, every reasoning text. Ranked excerpts come back with the session, the file, the commit, and whether it stuck.

14,832 turns · 412 sessions · claude + codex
4 matches · 38 ms
01 · 0.94

Built the OAuth refresh token rotation. The middleware now checks expiry with <= not < after the bug we hit last quarter - tests cover the boundary tick and the clock-skew window.

claude code·session 5a8e9c·2026-05-21 · 14:02·~/Projects/ax·src/auth/middleware.ts
→ shipped in 8b3d1f4 · adopted · t + 7d
02 · 0.81

PR #847 - OAuth refresh path. Tests cover both expiry edge cases; the middleware guards against double-refresh by holding a per-tenant lock for the duration of the rotation.

claude code·session 3c1d22·2026-04-14 · 09:48·~/Projects/ax·src/auth/refresh.ts
→ shipped in 2e0a5cc · adopted · t + 30d
03 · 0.72

Initial OAuth wiring. Note for future me: don't reuse the access token endpoint for refresh - separate route, separate rate limit, separate audit log.

codex·session 7f4b88·2026-03-02 · 22:11·~/Projects/ax·src/auth/routes.ts
→ shipped in 9d1e0a2 · adopted · locked · t + 90d
04 · 0.41

Spike on OAuth session-binding inside the middleware - rejected, returned to the PR #420 approach. Leaving the diff in scratch/ in case the threat model changes.

claude code·session 1a2b33·2025-12-08 · 16:30·~/Projects/ax·scratch/oauth-bind.ts
→ rolled back · rejected · t + 2d
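Each result card carries enough structure to answer "what did I ship last time, and did it stick?" without reading the excerpts. A rough sketch of that shape, using the four cards above as data: the `SearchHit` type and the `whatStuck` helper are hypothetical names for illustration, not ax's actual result schema or API.

```typescript
type Verdict = "adopted" | "rejected";

// Hypothetical shape modelled on the fields visible on each result card.
interface SearchHit {
  rank: number;
  score: number;            // relevance score as displayed on the card
  agent: "claude code" | "codex";
  session: string;
  file: string;
  commit: string | null;    // null when the change was rolled back
  verdict: Verdict;
  checkpoint: string;       // e.g. "t + 7d"
}

// The four cards shown above, transcribed as data.
const hits: SearchHit[] = [
  { rank: 1, score: 0.94, agent: "claude code", session: "5a8e9c", file: "src/auth/middleware.ts", commit: "8b3d1f4", verdict: "adopted", checkpoint: "t + 7d" },
  { rank: 2, score: 0.81, agent: "claude code", session: "3c1d22", file: "src/auth/refresh.ts", commit: "2e0a5cc", verdict: "adopted", checkpoint: "t + 30d" },
  { rank: 3, score: 0.72, agent: "codex", session: "7f4b88", file: "src/auth/routes.ts", commit: "9d1e0a2", verdict: "adopted", checkpoint: "t + 90d" },
  { rank: 4, score: 0.41, agent: "claude code", session: "1a2b33", file: "scratch/oauth-bind.ts", commit: null, verdict: "rejected", checkpoint: "t + 2d" },
];

// Keep only changes that shipped and were adopted, best match first.
function whatStuck(results: SearchHit[]): SearchHit[] {
  return results
    .filter((h) => h.verdict === "adopted" && h.commit !== null)
    .sort((a, b) => b.score - a.score);
}

console.log(whatStuck(hits).map((h) => `${h.commit} ${h.file}`));
```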

ax · local taste & telemetry graph · prototype

see the bleed · token-impact

Where your agent context goes.

Every agent user is bleeding money on cache misses they can't see. ax insights token-impact --since=7d joins your local claude + codex transcripts, reconciles provider metadata against transcript bytes, and shows the spend, the hit rate, and the workflows burning the budget.

tokens · 7d
14.2M
▲ +20%  vs 11.8M last week
claude 8.5M · codex 5.7M
spend · 7d
$42.18
claude $24.18 · codex $18.00
cache hit · 7d
67%
▼ -4pp WoW · target 80%

By workflow epoch & expensive sessions

join: session_token_usage ⋈ session_health
gsd
42%
superpowers
31%
ad-hoc
27%
legend: cached · cache miss (paid)

Bar length = share of total tokens. Color split inside each bar = cached vs. paid for the same workload. ad-hoc uses about two-thirds the tokens of gsd (27% vs. 42%) but burns more dollars - fewer rituals, lower cache hit.
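How a smaller bar can cost more comes down to arithmetic: paid tokens are billed at full price, cached tokens at a fraction of it. The sketch below works one example. The token shares (42% and 27%) come from the chart; the cache hit rates and the 0.1 cached-price ratio are hypothetical values chosen for illustration, and `relativeCost` is not an ax function.

```typescript
// Relative cost of a workload: misses at full price, hits at a discounted ratio.
function relativeCost(
  tokenShare: number,       // share of total tokens, 0..1
  cacheHitRate: number,     // fraction of those tokens served from cache, 0..1
  cachedPriceRatio: number, // price of a cached token relative to a paid one
): number {
  const paid = tokenShare * (1 - cacheHitRate);
  const cached = tokenShare * cacheHitRate * cachedPriceRatio;
  return paid + cached;
}

const CACHED_PRICE_RATIO = 0.1; // assumption, not a quoted provider price

// Hit rates below are invented; only the shares are from the chart.
const gsd = relativeCost(0.42, 0.85, CACHED_PRICE_RATIO);
const adHoc = relativeCost(0.27, 0.4, CACHED_PRICE_RATIO);

console.log(gsd.toFixed(4), adHoc.toFixed(4));
console.log(adHoc > gsd); // true: fewer tokens, more spend
```

Under these assumed rates, ad-hoc costs roughly 1.75× what gsd does despite the smaller token share, which is the pattern the chart describes.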

session 9c2e44 · claude
2.40M tk
~/Projects/ax src/ingest/transcripts.ts refactor
cache hit 41% · $7.81 · 14 turns
session 4f1ab0 · codex
1.85M tk
~/Projects/ax insights CLI scaffold
cache hit 58% · $5.94 · 22 turns
session a07e91 · claude
1.31M tk
~/Projects/ax schema v3 migration
cache hit 79% · $3.12 · 9 turns
session 2bf330 · codex
1.07M tk
~/Projects/ax docs/landing rewrite
cache hit 36% · $3.78 · 31 turns
session 7d4c12 · claude
0.92M tk
~/Projects/quera live-traces vendor
cache hit 74% · $2.34 · 11 turns
3.2×
codex burns 3.2× the context of claude code for equivalent work - same workflow_epoch, same repo, same outcome. Most of that is restated history per turn.
workflow-impact says the gsd → superpowers migration is paying off · run ax insights workflow-impact for the cohort comparison
where the numbers come from
ax reads provider metadata - cache_creation_input_tokens, cache_read_input_tokens, input_tokens, output_tokens - and falls back to transcript-byte estimates when a turn predates cache reporting.
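A rough sketch of that reconciliation, under stated assumptions: the three input counters are treated as disjoint (as they are in Anthropic's usage object), `transcriptBytes` and the 4-bytes-per-token ratio are hypothetical stand-ins for the fallback estimate, and turns without metadata are excluded from the hit rate because they carry no cache signal. None of this is confirmed to be how ax computes its numbers.

```typescript
// Field names follow the provider metadata listed above. All are optional
// because older turns may predate cache reporting.
interface TurnUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
  transcriptBytes: number; // hypothetical: raw size of the turn in the transcript
}

const BYTES_PER_TOKEN = 4; // rough rule of thumb, an assumption rather than ax's constant

function inputTokens(turn: TurnUsage): { total: number; cached: number; estimated: boolean } {
  if (turn.input_tokens === undefined) {
    // No provider metadata: fall back to a byte-based estimate.
    return { total: Math.ceil(turn.transcriptBytes / BYTES_PER_TOKEN), cached: 0, estimated: true };
  }
  const cached = turn.cache_read_input_tokens ?? 0;
  const total = turn.input_tokens + (turn.cache_creation_input_tokens ?? 0) + cached;
  return { total, cached, estimated: false };
}

// Cache hit rate over turns that actually report cache metadata.
function cacheHitRate(turns: TurnUsage[]): number {
  let total = 0;
  let cached = 0;
  for (const turn of turns) {
    const usage = inputTokens(turn);
    if (usage.estimated) continue; // estimated turns say nothing about caching
    total += usage.total;
    cached += usage.cached;
  }
  return total === 0 ? 0 : cached / total;
}

// Invented sample turns, for illustration only.
const sampleTurns: TurnUsage[] = [
  { input_tokens: 200, cache_creation_input_tokens: 300, cache_read_input_tokens: 500, transcriptBytes: 4000 },
  { input_tokens: 100, cache_read_input_tokens: 900, transcriptBytes: 4000 },
  { transcriptBytes: 4000 }, // predates cache reporting
];

console.log(cacheHitRate(sampleTurns).toFixed(2)); // prints "0.70"
```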
runs on your machine
Local SurrealDB instance. Typed Effect pipeline. No outbound calls, no upload. Sibling diagnostics: cache-health · workflow-impact · skill-impact

the compounding part

Every change earns its place by session 30.

Accepting a proposal doesn't make it true. ax turns each acceptance into an experiment with three forward-looking checkpoints (t+3, t+10, and t+30 sessions) and watches the next runs to see if the change actually held. Days are the wrong unit when an agent ships eight sessions a day. The verdict at t+30 sessions is locked. Future proposals know.

Fig · S-04 · verdict timeline · post-feature-verify
accept · experiment opened
exp_id post-feature-verify · t0
marker added · src/cli/run.ts:42
watching marker · file · pattern · tests
pending

ax doesn't trust the moment you accept; it earns the verdict by watching what happens across the next 30 sessions. Marker still landed? File still healthy? Pattern not recurring? Tests still green? Each checkpoint joins evidence from the same graph that generated the proposal. Sessions, not days: a weekend doesn't artificially delay the verdict, and a productive afternoon doesn't artificially rush it. The verdict at +30 sessions is locked and feeds the next round.
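Counting in sessions makes the checkpoint schedule a pure function of how many sessions have run since acceptance. A minimal sketch of that bookkeeping: the +3 / +10 / +30 offsets come from the text above, while `ExperimentProgress` and `progress` are hypothetical names, and the evidence checks themselves (marker, file, pattern, tests) are left out.

```typescript
// t+3, t+10, t+30 sessions after acceptance.
const CHECKPOINTS = [3, 10, 30] as const;

interface ExperimentProgress {
  due: number[];             // checkpoints already reached
  next: number | null;       // next checkpoint, or null once t+30 has been reached
  sessionsRemaining: number; // sessions until `next` (0 once locked)
  locked: boolean;           // the verdict is final at t+30
}

function progress(sessionsSinceAccept: number): ExperimentProgress {
  const due = CHECKPOINTS.filter((c) => sessionsSinceAccept >= c);
  const next = CHECKPOINTS.find((c) => sessionsSinceAccept < c) ?? null;
  return {
    due: [...due],
    next,
    sessionsRemaining: next === null ? 0 : next - sessionsSinceAccept,
    locked: next === null,
  };
}

console.log(progress(2));  // waiting on t+3, 1 session remaining
console.log(progress(12)); // t+3 and t+10 reached, 18 sessions until t+30
console.log(progress(30)); // all checkpoints reached, verdict locked
```

Wall-clock time never enters the function, so a quiet weekend and a busy afternoon advance an experiment by exactly the number of sessions that ran.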

recent experiments · 5 of 47

  • post-feature-verify · +30 sess · marker landed · 0 rollbacks · 1 dependent · adopted
  • main-branch-guardrail · +10 sess · marker landed · 2 of 4 callsites bypassed · partial
  • skill-ts-default · +3 sess · awaiting first signal · 1 session remaining · pending
  • ingest-regression · +30 sess · pattern not recurred over 30 sessions · tests green · adopted
  • cache-warm-on-start · +10 sess · added 800ms cold start · reverted at session 6 · regressed
verdict states › adopted · regressed · partial · ignored · no_longer_needed