[Figure: meta-harness diagram. Stages, in order: episode traces (full execution logs, tool calls, outcomes); analysis agent (reads traces, spots failure patterns); harness patch (prompt edit, new tool, or retry logic change); eval & deploy (re-run tasks, confirm measurable improvement).]