AI Development

Temperature, Tokens, and Context Windows

The three key levers that control how AI behaves — and how to use them without a computer science degree.


Three Knobs That Change Everything

Think of AI like a very fast reader who has to guess the next word in every sentence. Temperature, tokens, and context windows are the three settings that control how it does this.

These aren't scary technical terms — they're just ways to tell AI how creative, how focused, and how attentive you want it to be. Once you understand them, you can make AI do exactly what you need.

1
🎲

Temperature

Controls how random or predictable the AI's answers are. Think of it like a creativity dial.

2
📝

Tokens

The tiny word-pieces that AI reads and writes. One token is roughly three-quarters of a word in English.

3
🧠

Context Window

How much text the AI can "remember" at once — its working memory for the whole conversation.

These Settings Actually Matter

The same AI, with the same question, can give you completely different answers just by changing these three settings. Understanding them means you stop blaming AI for being "bad" and start knowing how to fix it.

Temperature decides whether you get a reliable answer or a wild creative one. Context windows decide if the AI can handle your long email or just the first paragraph. And tokens decide both what the AI sees and what shows up on your bill.

💡 Key Insight

Tokens are like pennies — each one is tiny, but they add up fast. A typical page of text is about 800 tokens. That 20-page document? That's 16,000 tokens. And every token costs money.
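
The penny analogy is easy to turn into arithmetic. Here's a quick sketch of how that 20-page document turns into a dollar figure; the price used is hypothetical, since real prices vary by model and provider, so check your provider's pricing page.

```python
# Rough cost estimate for sending a document to an AI model.
TOKENS_PER_PAGE = 800            # the rough per-page figure used above
PRICE_PER_MILLION_TOKENS = 2.50  # HYPOTHETICAL input price in USD

def document_cost(pages: int) -> float:
    """Estimate input cost: pages -> tokens -> dollars."""
    tokens = pages * TOKENS_PER_PAGE
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(document_cost(20))  # 20 pages = 16,000 tokens, about $0.04 here
```

Pennies indeed: a single document is cheap, but an app sending thousands of documents a day multiplies that number fast.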

Getting these right means you use less money, get better answers, and stop AI from forgetting what you said five minutes ago in a long conversation.

The Three Levers, Explained Simply

Here's how each one actually works, in plain language.

How Temperature Works
🤖
AI Guesses
AI sees "The sky is..." and thinks: blue, gray, pink...
🎲
Temp = 0
Always picks the top answer: "blue"
Temp = 1
Picks a wild one: "chartreuse" or "grandma's kitchen"

Low Temperature (0.0 – 0.3)

  • 📋 Picks the most likely next word every time
  • Great for facts, math, and code
  • 🔁 Same answer every time
  • 📊 Reliable and consistent

High Temperature (0.7 – 1.0)

  • 🎨 Picks from less likely words
  • 💡 Great for stories, brainstorming, poems
  • 🔀 Different answer each time
  • 🌈 More creative and surprising
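
Under the hood, temperature typically works by rescaling the model's next-word scores before it samples: each score is divided by the temperature, then turned into probabilities. Here's a minimal sketch of that idea using the "The sky is..." example; the candidate words and scores are invented for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them.
    (At temperature 0, real implementations skip sampling entirely and
    just pick the top score, since dividing by zero is undefined.)"""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for next-word candidates after "The sky is..."
words = ["blue", "gray", "pink", "chartreuse"]
logits = [4.0, 2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.2)  # near-greedy: "blue" dominates
hot = softmax_with_temperature(logits, 1.0)   # spread out: rare words get a real chance

print(dict(zip(words, (round(p, 3) for p in cold))))
print(dict(zip(words, (round(p, 3) for p in hot))))
```

At low temperature, almost all the probability piles onto "blue"; at high temperature, "chartreuse" stays unlikely but possible, which is exactly why creative runs surprise you.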

Tokens are broken-up pieces of text. AI doesn't read word-by-word — it reads in chunks called tokens. "Apple" might be one token. "Understanding" might be two. Most English words are 1–2 tokens. A paragraph is about 50–150 tokens. A page is about 800 tokens.
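
You don't need the exact tokenizer just to budget. A common rule of thumb is about four characters per token in English, which you can sketch in a couple of lines. (For exact counts, use your model's own tokenizer; the numbers below come from this toy heuristic, not a real one.)

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token in English."""
    return max(1, round(len(text) / 4))

page = "word " * 400          # a toy "page" of 400 very short words
print(estimate_tokens(page))  # ~500 tokens for this toy page
```

Real pages with normal-length words land higher, which is where the "about 800 tokens per page" figure comes from.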

Context Window is the total number of tokens the AI can see at once. It's like giving the AI a fixed-size sticky note to write everything it knows. When that sticky note fills up, the oldest writing gets erased. Different AI models have different sizes — some hold 4,000 tokens, some hold 200,000 tokens.

Temperature in Action

Here's how low vs. high temperature settings produce completely different kinds of results. This is a Python example using the OpenAI API.

temperature_demo.py
# Different temperature settings — different creativity levels
from openai import OpenAI

client = OpenAI()

# LOW TEMP — reliable, focused answers
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 15% of 80?"}],
    temperature=0   # Predictable, exact answer
)
# Output: "12" — the same answer nearly every time

# HIGH TEMP — creative, surprising answers
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a one-sentence story about a robot."}],
    temperature=0.9  # Creative, different every time
)
# Output: varies each time — "The robot planted flowers
# made of microchips, hoping they'd grow softer..."

# CONTEXT WINDOW check — rough token count
prompt = "This is a long document. " * 500
print(len(prompt) / 4)  # rough estimate: ~4 characters per token
# If this exceeds your model's context limit, the request
# fails — or a chat app quietly trims the oldest parts.

The context window matters because when you paste a long document, you're spending tokens — and when you run out of space, older parts of your conversation just disappear. That's why AI sometimes "forgets" things you said at the start of a long chat.
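
Chat apps usually handle a full window with a sliding window: when the conversation no longer fits, the oldest messages are dropped first. Here's a minimal sketch of that behavior; the token counts come from the rough four-characters-per-token rule, not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough: ~4 characters per token

def trim_to_window(messages, max_tokens):
    """Drop oldest messages until the conversation fits the budget."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the oldest message disappears first
    return kept

history = ["intro " * 50, "question " * 50, "follow-up " * 50]
print(len(trim_to_window(history, 200)))  # only the newest message survives
```

That `pop(0)` is the "forgetting" you experience: nothing is broken, the earliest messages simply no longer fit on the sticky note.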

Knowledge Check

Test what you learned with this quick quiz.

Quick Quiz

Question 1
You want AI to write a fun, creative poem. Which temperature setting would work best?
Question 2
What is a "token" in AI terms?
Question 3
You paste a 50-page document into AI and it seems to forget what you asked at the top. What probably happened?