Internal working engineering record - kept for reference, not a user-facing guide.

Use native hook observability as the substrate for autonomous agent retrospectives

axctl will ingest native Harness Hook Events and Hook Command invocations as stable Local Evidence before treating hook-driven improvements as product behavior. Hook command results are raw evidence, not final judgment: a blocking hook can be successful boundary feedback when the agent shows corrective follow-through.

Feedback Cases and Evaluation Rules will evaluate short-horizon behavior after a hook signal, using deterministic backtests by default and AI primarily for case authoring, explanation, ambiguous review, and refinement. Case types remain generic because hook intent is user- and team-specific.

Agent Retrospectives may run Autonomous Intervention Runs that create, test, enable, pause, or revise hooks, skills, Evaluation Rules, and other Guidance changes. Global hook settings should point to an ax-managed intervention runner such as axctl intervention run <id> rather than embedding generated shell directly. The runner owns timeouts, smoke tests, fail-open defaults, explicit fail-closed guardrails, disable switches, rollback metadata, and Recovery Paths.

When Guidance Sources are Git-tracked, autonomous runs should commit ax-managed changes separately from user work and record commits, before/after hashes, evidence, scope, backtest results, and rollback commands. Historical self-improve proposal sessions become Retrospective Candidates that can be backtested and promoted later, but native hook observability lands first because it provides the measurement substrate.