How Agents Handle Unexpected Errors
What happens when an AI agent runs into a problem it didn't see coming — and how good ones recover.
Errors Are Inevitable
An AI agent is a helper that takes actions on its own — it opens pages, reads files, calls tools, and answers questions. But the real world is messy. Websites go down. Files are missing. Answers come back empty. A typo in a command can break a whole step. These surprises are called unexpected errors.
An unexpected error is anything the agent was not prepared for. It can be a tiny hiccup (the page loaded a second late) or a big wall (the API it needed is offline). Either way, the agent didn't plan for it.
A well-built agent doesn't just crash or give up. It notices the problem, thinks about what to do, and picks the best next move — like a person who runs into a closed door and quickly looks for another way in.
Errors Decide If a Tool Is Trustworthy
Agents that crash on the first problem feel like toys. Agents that recover feel like real tools you can rely on. The difference is almost always in how they handle errors.
Think about a customer service bot. A bad one says "Something went wrong, try again later" and gives up. A good one says "I couldn't reach the order system, but let me try a different way" — and gets you an answer. The first one wastes your time. The second one earns your trust.
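The "try a different way" behavior can be sketched in a few lines of Python. This is only an illustration: `lookup_order_primary`, `lookup_order_backup`, and `answer_with_fallback` are made-up names, and the primary lookup is simulated as being offline so the fallback path is visible.

```python
from typing import Callable


def lookup_order_primary(order_id: str) -> str:
    # Hypothetical primary order system, simulated here as unreachable.
    raise ConnectionError("order system unreachable")


def lookup_order_backup(order_id: str) -> str:
    # Hypothetical second source, for example a cached copy of recent orders.
    return f"Order {order_id}: shipped"


def answer_with_fallback(order_id: str, sources: list[Callable[[str], str]]) -> str:
    """Try each source in order and return the first answer that works."""
    for source in sources:
        try:
            return source(order_id)
        except ConnectionError as error:
            print(f"{source.__name__} failed: {error}. Trying another way.")
    # Every source failed: say so plainly instead of inventing an answer.
    return "I couldn't reach any order system. Please contact support."


reply = answer_with_fallback("A123", [lookup_order_primary, lookup_order_backup])
print(reply)
```

The bad bot stops at the first `ConnectionError`. The good one keeps a short list of alternatives and only apologizes once all of them have failed.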
For builders, error handling is also a safety issue. An agent that doesn't know how to fail safely can take the wrong action, repeat a mistake forever, or burn through your money retrying a broken task. Smart error handling is what keeps an agent from going off the rails.
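One way to stop an agent from retrying forever is to give it a budget. The sketch below is a minimal illustration, not a standard API: `RetryBudget`, `BudgetExceeded`, and `run_with_budget` are hypothetical names, and the costs are invented numbers in cents.

```python
class BudgetExceeded(Exception):
    """Raised when the agent has used up its allowed attempts or spend."""


class RetryBudget:
    def __init__(self, max_attempts: int, max_cost_cents: int):
        self.max_attempts = max_attempts
        self.max_cost_cents = max_cost_cents
        self.attempts = 0
        self.cost_cents = 0

    def charge(self, cost_cents: int) -> None:
        """Record one attempt, or refuse if it would pass either limit."""
        if self.attempts + 1 > self.max_attempts:
            raise BudgetExceeded("attempt limit reached")
        if self.cost_cents + cost_cents > self.max_cost_cents:
            raise BudgetExceeded("cost limit reached")
        self.attempts += 1
        self.cost_cents += cost_cents


def run_with_budget(task, budget: RetryBudget, cost_cents_per_try: int):
    """Keep retrying `task` until it succeeds or the budget runs out."""
    while True:
        budget.charge(cost_cents_per_try)  # raises BudgetExceeded at the limit
        try:
            return task()
        except RuntimeError as error:
            print(f"Attempt {budget.attempts} failed: {error}")


def always_broken():
    # Stand-in for a task that can never succeed.
    raise RuntimeError("tool is broken")


budget = RetryBudget(max_attempts=5, max_cost_cents=10)
try:
    run_with_budget(always_broken, budget, cost_cents_per_try=4)
except BudgetExceeded as stop:
    print(f"Stopped safely: {stop}")
```

Here the cost limit is hit before the attempt limit: two tries cost 8 cents, and a third would cost 12, so the agent stops instead of spending more on a task that is clearly broken.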
💡 Key Insight
Many "AI agent failures" you hear about are not really AI failures. They come from missing error handling. The model did its best; the system around it just had no plan for what to do when things went wrong.
The Error Recovery Loop
When an agent hits an unexpected error, a good one follows a simple recovery loop. It tries, notices what happened, decides what to do next, and remembers the result, usually in just a few seconds.
Each step matters:
- Try — take the action you planned.
- Notice — check the result. Did it work? Did it fail? What kind of failure?
- Decide — pick a recovery move. Common options: retry the same thing, change approach and try something else, ask the human for help, or stop and report.
- Remember — write down what happened so you don't make the same mistake twice.
Skipping the Remember step is the most common mistake. An agent that doesn't learn from errors will hit the same wall over and over.
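The four steps can be sketched as a small Python function. Treat this as one possible shape rather than a fixed recipe: `decide`, `run_step`, and the mapping from error types to recovery moves are assumptions made for the example.

```python
def decide(error: Exception, attempt: int, max_attempts: int) -> str:
    """Pick a recovery move based on the kind of failure."""
    if isinstance(error, TimeoutError) and attempt < max_attempts:
        return "retry"            # likely temporary: try the same thing again
    if isinstance(error, FileNotFoundError):
        return "change_approach"  # retrying won't make the file appear
    if isinstance(error, PermissionError):
        return "ask_human"        # someone with access has to step in
    return "stop_and_report"


def run_step(action, fallback, max_attempts=3):
    """Run one step of a task and return (result, memory)."""
    memory = []       # Remember: a log of what went wrong and what was chosen
    current = action
    for attempt in range(1, max_attempts + 1):
        try:
            return current(), memory                      # Try
        except Exception as error:                        # Notice (broad on purpose)
            move = decide(error, attempt, max_attempts)   # Decide
            memory.append((type(error).__name__, move))   # Remember
            if move == "change_approach":
                current = fallback
            elif move != "retry":
                break  # ask_human or stop_and_report: hand control back
    return None, memory


def read_missing_config():
    raise FileNotFoundError("settings.json")


def use_default_config():
    return "default config"


result, memory = run_step(read_missing_config, use_default_config)
print(result)
print(memory)
```

In this run the first action fails with `FileNotFoundError`, the agent switches to the fallback, and the memory list records both the error and the move it chose, so a later step can see what already went wrong.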
A Simple Retry Loop in Python
Here's a tiny example showing the recovery loop in code. The function tries to fetch a webpage. If the request fails or the server answers with anything other than status 200, it waits two seconds and tries again, up to three tries in total. If it still fails, it stops cleanly and raises one clear error that the caller can catch and report.
import time

import requests


def fetch_page(url, tries=3):
    """Fetch a page, retrying up to `tries` times before giving up."""
    for attempt in range(1, tries + 1):
        # Step 1: Try the action
        try:
            response = requests.get(url, timeout=5)
            # Step 2: Notice: did it work?
            if response.status_code == 200:
                return response.text
            print(f"Attempt {attempt}: got status {response.status_code}")
        except requests.RequestException as e:
            # Step 2: Notice: the request itself failed (timeout, no connection, ...)
            print(f"Attempt {attempt} failed: {e}")

        # Step 3: Decide: wait, then retry (unless this was the last attempt)
        if attempt < tries:
            print("Waiting 2 seconds before retry...")
            time.sleep(2)

    # Step 4: Give up gracefully and report
    raise RuntimeError(f"Could not fetch {url} after {tries} tries")
This is the heart of the recovery loop in just a few lines: try, notice the error, decide to wait and try again, and stop cleanly if it still doesn't work. Real agents do the same thing — just with more steps, more tools, and a way to remember the mistake for next time.
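A very small version of "remembering the mistake for next time" might look like the sketch below. `FailureMemory` and `fetch_with_memory` are hypothetical names, and `broken_fetch` is a stand-in for any fetch function that raises `RuntimeError` when it gives up, so the example runs without a network.

```python
import time


class FailureMemory:
    """Remembers which targets failed recently so the agent can skip them."""

    def __init__(self, cooldown_seconds: float = 60.0):
        self.cooldown_seconds = cooldown_seconds
        self._failed_at: dict[str, float] = {}

    def record_failure(self, target: str) -> None:
        self._failed_at[target] = time.monotonic()

    def recently_failed(self, target: str) -> bool:
        failed_at = self._failed_at.get(target)
        if failed_at is None:
            return False
        return time.monotonic() - failed_at < self.cooldown_seconds


def fetch_with_memory(url, fetch, memory):
    """Call `fetch(url)` unless this URL failed within the cooldown window."""
    if memory.recently_failed(url):
        print(f"Skipping {url}: it failed recently")
        return None
    try:
        return fetch(url)
    except RuntimeError as error:
        memory.record_failure(url)  # Remember for next time
        print(f"Remembering failure: {error}")
        return None


calls = []


def broken_fetch(url):
    # Stand-in for a real fetch function that has run out of retries.
    calls.append(url)
    raise RuntimeError(f"Could not fetch {url}")


memory = FailureMemory(cooldown_seconds=60)
fetch_with_memory("https://example.com/down", broken_fetch, memory)
fetch_with_memory("https://example.com/down", broken_fetch, memory)
print(f"Real fetch attempts: {len(calls)}")
```

The second call never reaches the broken page, because the agent already knows it failed a moment ago. The cooldown is a design choice: after it passes, the agent is allowed to try again in case the page has come back.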