s11: Error Recovery — Errors aren't the end, they're the start of a retry
s01 → ... → s09 → s10 → s11 → s12 → s13 → ... → s20
"Errors aren't the end, they're the start of a retry" — escalate tokens, compact context, switch models.
Harness layer: Resilience — classify and recover when the main loop hits errors.
The Problem
The Agent is running along and then errors out:
The Agent crashes. It doesn't retry, doesn't switch models, doesn't reduce context — it just crashes.
In production, API errors are the norm. The three most common failure modes: truncated output (the model runs out of tokens mid-sentence), context overflow (still too long even after compaction), and transient failures (429 rate limiting / 529 overload). An Agent that doesn't handle errors is like a car that stalls at the slightest touch.
Solution
)
The loop and prompt assembly from s10 are fully preserved. The only change: the LLM call is wrapped in try/except, with different recovery paths based on error type. After recovery, continue loops back to the top to call the LLM again.
The three most common recovery patterns (the teaching version only handles 429/529; real systems also cover connection errors, timeouts, cloud vendor credential caches, etc. CC actually has 13+ reason codes; see the Deep Dive for the rest):
How It Works
Path 1: Output Truncated
The model runs out of tokens mid-sentence — max_tokens is exhausted. The default 8000 tokens isn't enough for a complete response.
On the first occurrence, escalate max_tokens from 8K to 64K (8x the space) and retry the same request — the truncated output is NOT appended to messages, keeping the original request intact. If 64K is still not enough, save the truncated output and inject a continuation prompt telling the model to pick up where it left off, up to 3 times:
Escalation gets one chance; continuation gets up to 3. After that, exit — further continuations won't produce meaningful output.
Path 2: Context Overflow
The LLM says "your context is too long" (prompt_too_long). All four compaction layers from s08 have already run, and it's still over the limit.
Trigger reactive compact — more aggressive than auto compact. The teaching version keeps only the last 5 messages to simulate compaction; real CC generates a compact summary via LLM, then retries with the compacted message list. Retry after compacting. But if it's still over the limit after one compaction, the only option is to exit — compacting again won't make it any smaller:
Path 3: Transient Failures
Network blips, 429 rate limiting, 529 overload — these aren't bugs, they're normal in distributed systems.
Both 429 and 529 use exponential backoff + jitter: wait 0.5 seconds on the first attempt, 1 second on the second, 2 seconds on the third, up to 10 retries. Random jitter prevents concurrent requests from all retrying at the same instant. Three consecutive 529 overload errors → switch to the fallback model (if FALLBACK_MODEL_ID environment variable is configured):
Backoff formula: min(500 × 2^attempt, 32000) + random(0~25%). If the server returns a Retry-After header, that value takes priority.
Putting It All Together
The outer try/except catches API exceptions (prompt_too_long, etc.), with_retry handles transient errors (429/529), and stop_reason checks handle truncation. Three recovery mechanisms, each handling its own error type.
Changes from s10
Try It
Try these prompts:
- Ask the Agent to generate a very long piece of code, and observe whether it automatically continues after truncation (look for the
[max_tokens] escalatinglog) - Read many files consecutively to bloat the context, and observe reactive compact
- If you encounter 429/529, observe the exponential backoff log output
What's Next
The Agent can now automatically recover from errors. But the tasks it handles are still one-shot — you give it a task, it finishes, it's done.
What if the Agent could manage a task list — with dependencies, persisted to disk, resumable across sessions? A TODO list is not a task system.
s12 Task System → Tasks form a dependency graph with state and persistence. This is the foundation for multi-Agent collaboration.
Deep Dive into CC Source
The following is based on CC source code:
query.ts(1729 lines),services/api/withRetry.ts(822 lines),query/tokenBudget.ts(93 lines), andutils/tokenBudget.ts(73 lines).
1. A Dozen-Plus Reason/Transition Codes (Not Just 3)
The teaching version covers 3 of the most common recovery patterns. CC actually has a dozen-plus reason/transition codes, evaluated after every LLM call:
The teaching version only expands on the first 5 (most common); each of the rest has its own dedicated handling logic.
2. Precise Exponential Backoff Formula
CC's backoff delay (withRetry.ts:530-548):
If the server returns a Retry-After header, that value takes priority.
3. Original CONTINUATION Prompt
CC's continuation prompt (query.ts:1225-1227):
Token budget nudge prompt (tokenBudget.ts:72):
4. Streaming Error Handling
In CC's streaming path, recoverable errors (413, max_tokens, media errors) are withheld from display during streaming (query.ts:788-822) — SDK consumers don't see them, only the recovery logic does. After streaming ends, the system determines whether recovery is needed.
5. 529 → Fallback Model Switch
After 3 consecutive 529 overload errors (MAX_529_RETRIES = 3), CC automatically switches to the fallback model (e.g., Opus → Sonnet). On switch, all pending messages and tool results are cleared, and the user sees "Switched to {model} due to high demand".
6. Diminishing Returns Detection
Token budget "continuations" aren't unlimited. When there are 3 consecutive continuations with a token increment < 500, the system determines "continuing won't produce meaningful output" and stops continuation (tokenBudget.ts:60-62).
