
AutoGPT & BabyAGI

The two open-source projects that first showed the world what fully autonomous AI agents could do — for better and worse.


What Are AutoGPT and BabyAGI?

In early 2023, two open-source projects went viral and changed how people thought about AI. They showed that LLMs could do more than answer questions — they could pursue goals autonomously, breaking complex tasks into sub-tasks and executing them in loops without human input at every step.

AutoGPT, created by Toran Bruce Richards, gave an LLM the ability to plan, execute actions, review results, and loop. BabyAGI, created by Yohei Nakajima, showed an even simpler version — a minimal task-driven agent that creates, prioritizes, and executes tasks in a loop.

AutoGPT: limitations

  • Leaned on GPT-4, which made it expensive to run
  • Could get stuck repeating the same steps
  • Little built-in protection against runaway cost or step counts
  • Tool use led to unpredictable behavior

BabyAGI: strengths

  • Minimal, readable code (a single short Python script)
  • Clear task queue logic
  • Easy to customize and extend
  • Inspired many forks and spin-off projects

Why They Mattered

Before AutoGPT and BabyAGI, the public perception of AI was largely chatbots — question in, answer out. These projects shifted the narrative to autonomous agents — AI that takes actions, not just answers questions.

They also revealed real problems: without guardrails, autonomous agents can overspend, loop endlessly, or take unintended actions. Those flaws shaped later frameworks such as LangGraph and CrewAI, which put more emphasis on explicit control flow, step limits, and human checkpoints, the kinds of safety rails these early experiments lacked.

Key Insight

AutoGPT and BabyAGI were proof of concept, not production tools. They demonstrated the art of the possible and the importance of constraints. Every agent framework built since has learned from what they got right — and especially from what they got wrong.

AutoGPT's Core Loop

🎯 Goal: the user provides a goal.
📋 Plan: the LLM breaks it into tasks.
Execute: the agent runs those tasks via tools.
🔁 Loop: it reviews the results and repeats until the goal is complete.
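The four stages can be sketched as a small Python loop. This is a hedged illustration, not AutoGPT's code: `think` is a hypothetical stand-in for the LLM call that picks the next command, and the command format, the `"finish"` command, the `tools` dictionary, and `max_steps` are all assumptions made for the example. The step cap shows one simple guardrail against endless looping.

```python
from typing import Callable

# Hypothetical command format: (command_name, argument); "finish" ends the run
Command = tuple[str, str]


def run_agent(goal: str,
              think: Callable[[str, list[str]], Command],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 10) -> list[str]:
    """Goal -> plan -> execute -> review loop with a hard step cap (illustrative)."""
    history: list[str] = []
    for _ in range(max_steps):
        # Plan: ask the model (stubbed here) for the next command
        name, arg = think(goal, history)
        if name == "finish":
            history.append(f"finish: {arg}")
            return history
        # Execute: dispatch to a tool; an unknown tool name becomes an error message
        tool = tools.get(name)
        result = tool(arg) if tool else f"error: unknown tool {name!r}"
        # Review: record the outcome so the next planning step can see it
        history.append(f"{name}({arg}) -> {result}")
    history.append("stopped: step limit reached")
    return history


def scripted_think(goal: str, history: list[str]) -> Command:
    """Stand-in for an LLM: search once, then declare the goal complete."""
    if not history:
        return ("search", goal)
    return ("finish", "report drafted")


log = run_agent("renewable energy trends", scripted_think,
                {"search": lambda q: f"3 results for {q}"})

# A model that never says "finish" is cut off by max_steps instead of looping forever
stuck = run_agent("anything", lambda g, h: ("search", g),
                  {"search": lambda q: "same results"}, max_steps=3)
```

The second call shows why the cap matters: without `max_steps`, a model that never decides it is done would keep calling tools indefinitely.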

The Task Loop Pattern

Both AutoGPT and BabyAGI share the same underlying pattern: a continuous loop where the agent creates tasks, executes them, and refines the task list based on results. The difference is in complexity and scope.

01 📝 Task Creation

Given a goal, the agent uses an LLM to generate a list of sub-tasks needed to achieve it. "Build a website" becomes research, design, code, test, deploy.
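The parsing half of task creation can be sketched in Python. This assumes, hypothetically, that the LLM was prompted to reply with one task per line, numbered or bulleted; the reply is hard-coded here instead of coming from a real model call, and `parse_tasks` is an illustrative name.

```python
import re


def parse_tasks(llm_reply: str) -> list[str]:
    """Turn a numbered or bulleted reply into a list of task strings."""
    tasks = []
    for line in llm_reply.splitlines():
        # Strip leading markers such as "1.", "2)", "-" or "*"
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:  # skip blank lines and bare markers
            tasks.append(cleaned)
    return tasks


reply = """1. Research similar websites
2. Design the page layout
3) Write the HTML and CSS
- Test in two browsers
5. Deploy to a host"""

tasks = parse_tasks(reply)
```

Real replies are messier than this, which is one reason early agents sometimes produced malformed or duplicated tasks.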

02 Task Execution

Each task is executed — either by the LLM directly or by calling external tools (web search, file writing, code execution). Results are recorded.
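Execution usually means routing a task either to a tool or to the LLM itself. The sketch below uses a plain dictionary as a tool registry; the tool names, the `"tool: argument"` task syntax, and the `ask_llm` fallback are hypothetical stubs, not the real interface of AutoGPT or BabyAGI. Each result is recorded, as the step describes.

```python
from typing import Callable


def web_search(query: str) -> str:
    return f"(stub) top results for {query!r}"  # hypothetical tool, no network call


def write_file(spec: str) -> str:
    name, _, text = spec.partition("|")
    return f"(stub) wrote {len(text)} chars to {name}"  # no real file I/O


def ask_llm(prompt: str) -> str:
    return f"(stub) LLM answer to {prompt!r}"  # stands in for a model call


TOOLS: dict[str, Callable[[str], str]] = {"search": web_search, "write": write_file}


def execute_task(task: str, results: list[tuple[str, str]]) -> str:
    """Route 'tool: argument' tasks to a tool; send anything else to the LLM."""
    name, sep, arg = task.partition(":")
    tool = TOOLS.get(name.strip()) if sep else None
    output = tool(arg.strip()) if tool else ask_llm(task)
    results.append((task, output))  # record the result for later steps
    return output


log: list[tuple[str, str]] = []
execute_task("search: solar capacity growth", log)
execute_task("write: notes.txt|Solar is growing", log)
execute_task("Summarize the findings", log)
```

A task whose prefix does not match a registered tool falls through to the LLM, so a stray colon in ordinary text does not trigger a tool call.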

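03 🔄 Task Prioritization

Results feed back into the task list: completed tasks are removed, new sub-tasks discovered during execution are added, and the remaining tasks are reordered by how much they advance the objective. The loop continues until the goal is reached or the task list runs out.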
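The bookkeeping in this step can be sketched without an LLM. In BabyAGI the reordering itself is delegated to the model; here a hypothetical `keyword_score` function stands in for that judgment, so only the removal, deduplication, and reordering logic is meant literally. `update_task_list` is an illustrative name.

```python
from typing import Callable


def update_task_list(pending: list[str], completed: set[str],
                     new_tasks: list[str],
                     score: Callable[[str], float]) -> list[str]:
    """Drop finished tasks, merge in new ones without duplicates, sort by priority."""
    def norm(task: str) -> str:
        # Treat case and surrounding-whitespace variants as the same task
        return task.strip().lower()

    done = {norm(t) for t in completed}
    seen: set[str] = set()
    merged: list[str] = []
    for task in pending + new_tasks:
        key = norm(task)
        if key in done or key in seen:
            continue
        seen.add(key)
        merged.append(task.strip())
    # Highest score first; Python's sort is stable, so ties keep their order
    return sorted(merged, key=score, reverse=True)


def keyword_score(task: str) -> float:
    """Hypothetical priority: tasks mentioning "solar" outrank the rest."""
    return 1.0 if "solar" in task.lower() else 0.0


tasks = update_task_list(
    pending=["Collect wind data", "Summarize solar trends"],
    completed={"Collect wind data"},
    new_tasks=["Draft outline", "Compare solar prices", "summarize solar trends "],
    score=keyword_score,
)
```

Deduplication matters in practice: without it, a model that keeps proposing near-identical tasks can grow the queue faster than the agent drains it.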

babyagi_loop.py
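# BabyAGI's core loop, simplified. The agent's method names are
# illustrative, not BabyAGI's actual function names.
from collections import deque

def run_babyagi(objective, first_task, agent):
    task_list = deque([first_task])
    while task_list:
        task = task_list.popleft()
        result = agent.execute(objective, task)

        # Create new tasks from the result and add them to the queue
        new_tasks = agent.create_tasks(objective, result, task, list(task_list))
        task_list.extend(new_tasks)

        # Re-prioritize the remaining tasks against the objective
        task_list = deque(agent.prioritize(objective, list(task_list)))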

A Real Use Case

Imagine you want to research and write a report on renewable energy trends. With AutoGPT, you give it that goal and it loops through research, summarization, and writing, stopping only when it judges, by its own criteria, that the report is finished.

The Lesson Learned

Users reported large, unexpected API bills after giving AutoGPT vague goals and leaving it running unattended in continuous mode, which skipped its usual step-by-step confirmation. This pushed the industry toward human-in-the-loop designs, where agents ask for confirmation before expensive actions rather than running without oversight.
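A human-in-the-loop gate can be sketched as a wrapper that pauses before costly actions and enforces a hard spending cap. The per-action cost estimates, the `approve` callback, the `confirm_above` threshold, and the dollar budget are all hypothetical; a real system would need genuine cost accounting, and `scripted_human` replaces an actual console prompt so the example runs unattended.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class GuardedRunner:
    """Wraps agent actions with a spending cap and human approval (illustrative)."""
    budget: float                               # hypothetical total cap, in dollars
    approve: Callable[[str, float], bool]       # asks a human; True means go ahead
    confirm_above: float = 0.50                 # actions estimated above this need approval
    spent: float = 0.0
    log: list[str] = field(default_factory=list)

    def run(self, action: str, est_cost: float,
            do: Callable[[], str]) -> Optional[str]:
        # Hard stop: never exceed the budget, even with human approval
        if self.spent + est_cost > self.budget:
            self.log.append(f"blocked (budget): {action}")
            return None
        # Expensive actions pause for a human decision
        if est_cost > self.confirm_above and not self.approve(action, est_cost):
            self.log.append(f"declined by human: {action}")
            return None
        self.spent += est_cost
        self.log.append(f"ran: {action}")
        return do()


def scripted_human(action: str, cost: float) -> bool:
    """Stand-in for a console prompt: approves anything under $2."""
    return cost < 2.0


runner = GuardedRunner(budget=3.0, approve=scripted_human)
runner.run("search the web", 0.10, lambda: "results")
runner.run("long GPT-4 analysis", 2.50, lambda: "analysis")
runner.run("summarize findings", 1.20, lambda: "summary")
runner.run("rewrite full report", 1.90, lambda: "report")
```

The budget check runs before the approval prompt, so a human is never asked to approve an action the cap would block anyway.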

Knowledge Check

Test what you learned about AutoGPT and BabyAGI.

3 Questions

Question 01

What was the main innovation AutoGPT and BabyAGI demonstrated?

Question 02

What problem did AutoGPT expose that shaped later agent frameworks?

Question 03

What is the core pattern shared by both AutoGPT and BabyAGI?
