// fires on Bash tool_call before execution export const hook = { on: "pre_tool", tool: "Bash", match(call) { const cmd = call.input.command ?? ""; if (!/^\s*git\s+(push|commit)\b/.test(cmd)) return false; const branch = call.context.git?.branch ?? ""; return /^(main|master|production)$/.test(branch); }, action: "block", reason: "direct write to protected branch", };
what ax actually does
Four scenarios, one graph.
Concrete demos of what your local ax instance already exposes - backtest a hook against history, search every session you've ever had, see where your tokens go, watch a verdict earn its place at +30 sessions. Each one is something you can run today.
before-you-ship · backtest
Ask the graph what your hook would have caught.
You write a guardrail. You don't know if it'll catch real mistakes or just become noise. ax replays your last week of actual sessions against the hook before you install it, so the decision to ship is evidence, not vibes.
~/.claude $ ax hooks backtest pre-tool-main-branch-guard --since=7d ↳ replay window 2026-05-21 → 2026-05-28 (7d) ↳ sessions 14 claude_code, 3 codex (17 total) ↳ tool_calls 1,247 bash invocations indexed replaying… ████████████████████ 1247/1247 4.2s ─────────────────────────────────────────────────────────── verdict SHIP · HIGH-CONFIDENCE ─────────────────────────────────────────────────────────── fires 12 / 1,247 calls (0.96%) ├─ true positives 11 would have blocked actual main-branch pushes └─ false positives 1 legitimate hotfix → production · 2026-05-24 precision 0.917 recall 0.917 F1 0.917 prevented rollbacks 5 (traced via post-event reverts) by repo ~/Projects/ax 8 ▮▮▮▮▮▮▮▮ ~/Projects/quera 3 ▮▮▮ ~/Projects/dotfiles 1 ▮ ← false positive lives here one to review: sess_8af3·turn-42 hotfix/prod-token-leak → allow-list? install with: ax hooks install pre-tool-main-branch-guard --allow=hotfix/* ~/.claude $
search the graph
Find what you shipped last time you did this.
Every transcript ax has ever ingested is full-text searchable - Claude Code, Codex, every turn, every tool call, every reasoning text. Ranked excerpts come back with the session, the file, the commit, and whether it stuck.
Built the OAuth refresh token rotation. The middleware now checks expiry with <= not < after the bug we hit last quarter - tests cover the boundary tick and the clock-skew window.
PR #847 - OAuth refresh path. Tests cover both expiry edge cases; the middleware guards against double-refresh by holding a per-tenant lock for the duration of the rotation.
Initial OAuth wiring. Note for future me: don't reuse the access token endpoint for refresh - separate route, separate rate limit, separate audit log.
Spike on OAuth session-binding inside the middleware - rejected, returned to the PR #420 approach. Leaving the diff in scratch/ in case the threat model changes.
ax · local taste & telemetry graph · prototype
see the bleed · token-impact
Where your agent context goes.
Every agent user is bleeding money on cache misses they can't see. ax insights token-impact --since=7d joins your local claude + codex transcripts, reconciles provider metadata against transcript bytes, and shows the spend, the hit rate, and the workflows burning the budget.
By workflow epoch & expensive sessions
Bar length = share of total tokens. Color split inside each bar = cached vs. paid for the same workload. ad-hoc is half the tokens of gsd but burns more dollars - fewer rituals, lower cache hit.
ax insights workflow-impact for the cohort comparisoncache_creation_input_tokens, cache_read_input_tokens, input_tokens, output_tokens - and falls back to transcript-byte estimates when a turn predates cache reporting.the compounding part
Every change earns its place by session 30.
Accepting a proposal doesn't make it true. ax turns each acceptance into an experiment with three forward-looking checkpoints — t+3, t+10, t+30 sessions — and watches the next runs to see if the change actually held. Days are the wrong unit when an agent ships eight sessions a day. The verdict at t+30 sessions is locked. Future proposals know.
ax doesn't trust the moment you accept — it earns the verdict by watching what happens across the next 30 sessions. Marker still landed? File still healthy? Pattern not recurring? Tests still green? Each checkpoint joins evidence from the same graph that generated the proposal. Sessions, not days — a weekend doesn't artificially delay; a productive afternoon doesn't artificially rush. The verdict at +30 sessions is locked and feeds the next round.
recent experiments5 of 47
- post-feature-verify+30 sessmarker landed · 0 rollbacks · 1 dependentadopted
- main-branch-guardrail+10 sessmarker landed · 2 of 4 callsites bypassedpartial
- skill-ts-default+3 sessawaiting first signal · 1 session remainingpending
- ingest-regression+30 sesspattern not recurred over 30 sessions · tests greenadopted
- cache-warm-on-start+10 sessadded 800ms cold start · reverted at session 6regressed