AI Development

What Are Embeddings and How AI Understands Meaning

Q: What is an embedding?

A list of numbers that represents the meaning of a word or sentence

Q: Why do similar words end up with similar embeddings?

Because they show up in similar situations across billions of sentences

Q: Which of these is a common use for embeddings?

To find text with a similar meaning

A simple look at how AI turns words and ideas into numbers to find patterns in meaning.

01 — The Concept

Turning Words Into Numbers

Imagine if every word in the English language had its own address on a giant map of meaning. The word "cat" would sit close to "kitten." "Happy" would be near "joyful." "Banana" would sit next to "fruit." This is what an embedding does: it gives each word an address that an AI model can work with.

An embedding is a long list of numbers that captures the meaning of a word, sentence, or idea. AI models don't read words the way we do. They work with numbers. So we turn words into numbers in a smart way. Words that mean similar things end up with similar numbers. Words that mean very different things end up with numbers that are far apart.

Think of it like GPS coordinates. Every place on Earth has two numbers: a latitude and a longitude. Places that are close in real life have close numbers. Embeddings work the same way. But instead of two numbers and physical places, they use hundreds or even thousands of numbers and meanings.
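To make the map idea concrete, here is a minimal sketch that uses just two numbers per word, like GPS coordinates. The coordinates in `toy_map` are made up by hand for illustration and do not come from any real model.

```python
import math

# Hand-picked 2-number "addresses" (illustrative only, not real embeddings)
toy_map = {
    "cat": (1.0, 1.0),
    "kitten": (1.2, 0.9),
    "banana": (8.0, 7.5),
}

def distance(word_a, word_b):
    """Straight-line distance between two words on the toy map."""
    return math.dist(toy_map[word_a], toy_map[word_b])

# A smaller distance means the two words sit closer together on the map
print(f"cat -> kitten: {distance('cat', 'kitten'):.2f}")
print(f"cat -> banana: {distance('cat', 'banana'):.2f}")
```

On this toy map, "cat" lands much closer to "kitten" than to "banana." Real embeddings follow the same idea, only with far more numbers per word.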

02 — Why It Matters

Why AI Needs to Understand Meaning

Embeddings work behind the scenes in much of the AI you use today. When you ask ChatGPT a question, the model first turns your words into embeddings so it can work with what you mean. When Netflix suggests a movie, a recommendation system can use embeddings to find shows that feel like the ones you already watched. When Google shows search results, embeddings help it find pages that match the idea behind your words, not just the exact words you typed.

This matters because the same idea can be said in many different ways. "Happy," "joyful," "glad," and "cheerful" all mean close to the same thing. A simple keyword search would treat them as four unrelated words. Embeddings help AI see that they are similar.

Without embeddings, AI would be stuck matching exact words. With embeddings, AI gets meaning. This is what lets AI translate languages, recommend products, sort your email, and answer your questions.

💡 Key Insight

Embeddings let AI understand that "puppy" is closer to "dog" than to "skyscraper" — even though no one ever told the AI that. The AI learned this by reading billions of sentences and noticing which words show up in similar places.

03 — How It Works

How AI Learns to Turn Words Into Numbers

Embeddings are made by a kind of AI model, often a language model or a model trained just to produce embeddings. The model reads billions of sentences from books, websites, and articles. As it reads, it learns which words tend to show up in similar situations. Words that show up in the same kinds of sentences as "puppy," like "dog," "cute," and "bark," end up with similar numbers. Words that are used in very different situations end up with very different numbers.

The model is not given a list of which words are similar. It figures this out by noticing patterns on its own. After seeing enough text, it builds a giant map where each word has a spot. The spot is decided by the meaning the model learned. Here is the simple process:

How an Embedding Is Made

1. 📚 Read: AI reads billions of sentences
2. 🔍 Spot Patterns: Find words used in similar ways
3. 🧮 Score Words: Give each word a list of numbers
4. 🗺️ Map Meaning: Place similar words close together

↺ Refined over time
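The "Spot Patterns" step can be sketched with a tiny, made-up corpus. This is not how real embedding models are trained (they learn their numbers with neural networks over huge amounts of text). It is a simplified illustration that only counts which words share a sentence. The corpus, the `STOPWORDS` set, and the helper names are all assumptions for this example.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; a real model reads billions of sentences
corpus = [
    "the puppy chased the ball",
    "the dog chased the ball",
    "the puppy barked at the mailman",
    "the dog barked at the mailman",
    "the car needs new tires",
    "the car needs more fuel",
]

# Very common words that say little about meaning (hypothetical short list)
STOPWORDS = {"the", "at"}

def context_counts(sentences):
    """For each word, count the other words that share a sentence with it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in sentence.split() if w not in STOPWORDS]
        for word in words:
            for other in words:
                if other != word:
                    counts[word][other] += 1
    return counts

counts = context_counts(corpus)

def shared_contexts(word_a, word_b):
    """Context words that both word_a and word_b have been seen with."""
    return set(counts[word_a]) & set(counts[word_b])

print("puppy & dog:", sorted(shared_contexts("puppy", "dog")))
print("puppy & car:", sorted(shared_contexts("puppy", "car")))
```

In this corpus, "puppy" and "dog" never appear in the same sentence, yet they share every context word, while "puppy" and "car" share none. That overlap in surroundings is the kind of pattern a model picks up on when it places words on the map.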

Once the map is built, comparing two words is just math. The AI takes the two lists of numbers and checks how close they are. If the numbers are very close, the words mean similar things. If they are very far apart, the words mean different things.
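Here is a minimal sketch of that comparison using plain Python and no AI service. The three 4-number lists are invented for illustration (real embeddings are much longer), and the closeness measure used is cosine similarity, one common choice among several.

```python
import math

def cosine_similarity(a, b):
    """Return a score near 1.0 when two lists of numbers point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

# Made-up 4-number "embeddings" (illustrative values only)
puppy = [0.9, 0.8, 0.1, 0.0]
dog = [0.8, 0.9, 0.2, 0.1]
skyscraper = [0.0, 0.1, 0.9, 0.8]

print(f"puppy vs dog:        {cosine_similarity(puppy, dog):.2f}")
print(f"puppy vs skyscraper: {cosine_similarity(puppy, skyscraper):.2f}")
```

With these toy values, the "puppy" and "dog" lists score close to 1, while "puppy" and "skyscraper" score close to 0.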

04 — Practical Example

Making an Embedding With Python

Here is a real example. We will ask an AI model to turn a word into an embedding — a list of numbers that captures its meaning. We will use the OpenAI API, but other services work in a very similar way.

embed.py

# pip install openai numpy
from openai import OpenAI
import numpy as np

# The client reads your API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Get an embedding for a single word
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="puppy"
)

embedding = response.data[0].embedding

print("Word: puppy")
print(f"Embedding length: {len(embedding)} numbers")
print(f"First 5 numbers: {embedding[:5]}")

When we run this code, the API sends back a long list of numbers. For this model, the list is 1,536 numbers long. Here is what the output looks like (the five values shown are placeholders; the real numbers will differ):

output

Word: puppy
Embedding length: 1536 numbers
First 5 numbers: [0.0123, -0.0456, 0.0789, -0.0234, 0.0567]

Now let's compare two words and see how similar they are. We use a simple measure called cosine similarity, which checks whether two lists of numbers point in the same direction. Scores range from -1 to 1, and the closer the result is to 1.0, the more similar the words.

compare.py

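# Repeats the setup from embed.py so this file runs on its own
from openai import OpenAI
import numpy as np

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))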

def embed(word):
    r = client.embeddings.create(
        model="text-embedding-3-small",
        input=word
    )
    return r.data[0].embedding

puppy = embed("puppy")
dog   = embed("dog")
car   = embed("car")

print(f"puppy vs dog:  {cosine_similarity(puppy, dog):.3f}")  # expect the higher score
print(f"puppy vs car:  {cosine_similarity(puppy, car):.3f}")  # expect the lower score

The first score should be clearly higher than the second: "puppy" and "dog" are closely related, while "puppy" and "car" have little to do with each other. The exact values depend on the model, so compare scores with each other rather than reading one number on its own. This is the magic of embeddings: the numbers encode meaning, and you can compare that meaning with simple math.
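The same comparison is what powers "find text with a similar meaning." Below is a minimal sketch that ranks a few titles against a query. To keep it runnable without an API key, the 3-number vectors are invented stand-ins for real embeddings, and the titles and the `query_vector` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Invented stand-ins for real embeddings (illustrative values only)
documents = {
    "How to train a puppy": [0.9, 0.2, 0.1],
    "Best dog food brands": [0.5, 0.8, 0.1],
    "Changing a car tire": [0.1, 0.1, 0.9],
}

# Stands in for the embedding of a query such as "teaching a young dog"
query_vector = [0.9, 0.2, 0.0]

# Sort titles from most similar to least similar to the query
ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query_vector, documents[title]),
    reverse=True,
)

for title in ranked:
    score = cosine_similarity(query_vector, documents[title])
    print(f"{score:.2f}  {title}")
```

In a real application, each vector would come from an embedding model (for example, the `embed` function in compare.py), and the ranking step would stay the same.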

05 — Test Yourself

Knowledge Check

Test what you learned with this quick quiz.

Quick Quiz — 3 Questions

Question 1

What is an embedding?

Question 2

Why do similar words end up with similar embeddings?

Question 3

Which of these is a common use for embeddings?
