Rate Limiting and Throttling Explained
How websites control how many requests you can make — and why it keeps the whole internet running.
Too Many Requests, Not Enough Seats
Imagine a coffee shop with 10 chairs. If 50 people try to sit down at once, there's chaos. The shop needs a rule: one person per seat, and if the seats are full, new people wait. That's rate limiting — a set of rules that controls how many times something can happen in a given period of time.
When you use an app or a website, your browser sends requests to a server. Servers can only handle so many requests at once — just like those 10 chairs. Rate limiting is how servers make sure everyone gets a fair turn without crashing the whole system.
Throttling is a softer version of the same idea. Instead of cutting someone off completely when they hit their limit, throttling slows them down. Think of it like a speed governor on a car: you can go fast, but not dangerously fast.
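The "slow down instead of rejecting" idea can be sketched in a few lines of JavaScript. This is an illustrative helper (the names `createThrottle` and `waitTime` are made up for this example, not from any library): instead of saying no, it tells the caller how long to wait so calls are spaced at most one per interval.

```javascript
// A minimal throttle sketch: rather than rejecting calls over the limit,
// it spaces them out to at most one call every `intervalMs` milliseconds.
function createThrottle(intervalMs) {
  let nextAllowedAt = 0; // timestamp when the next call may proceed

  // Returns how many milliseconds the caller should wait before proceeding.
  return function waitTime(now = Date.now()) {
    const delay = Math.max(0, nextAllowedAt - now);
    nextAllowedAt = Math.max(now, nextAllowedAt) + intervalMs;
    return delay;
  };
}

const throttle = createThrottle(1000); // at most 1 call per second
console.log(throttle(0)); // 0    — first call goes through immediately
console.log(throttle(0)); // 1000 — second call must wait a full second
console.log(throttle(0)); // 2000 — third call queues behind the second
```

Notice that every call eventually gets through; it just gets pushed further into the future. That queuing behavior is the key difference from a hard rate limit, which would reject the extra calls outright.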
Keeping the Internet Fair and Online
Without rate limiting, one person with a fast script could crowd out everyone else. A single automated program can send thousands of requests per second — that's like one person trying to grab all 10 coffee shop seats at once.
Rate limiting protects three things: the service stays online so everyone can use it, legitimate users get fair access without slowdowns, and costs stay predictable because the server isn't overwhelmed.
💡 Key Insight
Rate limiting isn't about blocking users — it's about making sure the internet works for everyone. Even the biggest companies like Google and AWS use rate limits to keep their services stable during traffic spikes.
The Three Main Ways to Limit
Servers track how many requests come in using a few different methods. Here are the most common ones:
- Fixed Window — The server picks a time window (like one minute) and counts requests inside it. When the window resets, the count starts over. Simple to understand, but can have a "boundary burst" problem where requests spike at the exact moment a window resets.
- Token Bucket — Think of a bucket that holds tokens. Each request uses one token. Tokens refill at a steady rate (say, 10 per minute). You can burst up to the bucket's max, but then you wait for refills. This is the most common approach for API rate limiting.
- Leaky Bucket — Requests flow through like water through a leaky bucket — at a constant rate no matter how fast they arrive. Extra requests queue up or get dropped. Great for smoothing out traffic spikes.
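The fixed-window approach from the list above is the simplest to write down. Here's a small sketch (function names like `createFixedWindow` are illustrative, not from any library): each timestamp maps to a window number, the counter resets whenever a new window starts, and requests over the limit are rejected until the next window.

```javascript
// A fixed-window counter sketch: at most `limit` requests per `windowMs`.
function createFixedWindow(limit, windowMs) {
  let currentWindow = -1; // which window we're counting in
  let count = 0;          // requests seen in that window

  // Returns true if the request is allowed, false if rate limited.
  return function allow(now = Date.now()) {
    const window = Math.floor(now / windowMs); // window index for this moment
    if (window !== currentWindow) {
      currentWindow = window; // a new window has started...
      count = 0;              // ...so the count starts over
    }
    if (count >= limit) return false;
    count++;
    return true;
  };
}

const allow = createFixedWindow(3, 60000); // 3 requests per minute
console.log(allow(0), allow(1), allow(2)); // true true true
console.log(allow(3));                     // false — limit hit in this window
console.log(allow(60000));                 // true  — new window, count resets
```

The last two lines also show the "boundary burst" problem: 3 requests at the very end of one window plus 3 at the start of the next means 6 requests in a few milliseconds, even though the stated limit is 3 per minute. The token bucket avoids this by refilling gradually instead of all at once.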
When a user hits their limit, the server usually sends back a special HTTP status code: 429 Too Many Requests. This tells the calling app or browser: "Slow down, come back later."
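A well-behaved client reacts to a 429 rather than hammering the server again. Servers often include a Retry-After header saying how many seconds to wait; if it's absent, a common fallback is exponential backoff (wait 1s, then 2s, then 4s...). Here's a sketch of that decision — the helper name `retryDelayMs` is made up for this example:

```javascript
// Decide how long (in ms) to wait before retrying, given the response status,
// the Retry-After header value (a string or null), and the attempt number.
function retryDelayMs(status, retryAfterHeader, attempt) {
  if (status !== 429) return 0; // not rate limited, no need to wait

  const retryAfter = Number(retryAfterHeader);
  if (retryAfterHeader !== null && !Number.isNaN(retryAfter)) {
    return retryAfter * 1000; // honor the server's hint (seconds → ms)
  }

  // No hint from the server: exponential backoff, capped at 30 seconds.
  return Math.min(30000, 1000 * 2 ** attempt);
}

console.log(retryDelayMs(200, null, 0)); // 0     — request succeeded
console.log(retryDelayMs(429, "5", 0));  // 5000  — server said wait 5 seconds
console.log(retryDelayMs(429, null, 2)); // 4000  — fallback: 1s * 2^2
```

Honoring the server's hint matters: retrying immediately after a 429 just burns more of your quota and keeps you rate limited longer.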
A Simple Rate Limiter in Code
Here's a tiny JavaScript example that shows the basic idea of token bucket rate limiting. Every time you call `makeRequest()`, it checks whether a token is available.
```javascript
// Token bucket rate limiter
// Allows 5 requests, refills 1 token every second
const bucket = {
  tokens: 5,
  maxTokens: 5,
  refillRate: 1, // per second
  lastRefill: Date.now()
};

function refillBucket() {
  const now = Date.now();
  const secondsPassed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(
    bucket.maxTokens,
    bucket.tokens + (secondsPassed * bucket.refillRate)
  );
  bucket.lastRefill = now;
}

function makeRequest() {
  refillBucket();
  if (bucket.tokens < 1) {
    console.log("⛔ Rate limited! Wait a moment.");
    return false;
  }
  bucket.tokens--;
  console.log("✅ Request sent! Tokens left:", bucket.tokens);
  return true;
}

// Try making 7 requests
for (let i = 0; i < 7; i++) {
  makeRequest();
}
```
The output would look like this:
```
✅ Request sent! Tokens left: 4
✅ Request sent! Tokens left: 3
✅ Request sent! Tokens left: 2
✅ Request sent! Tokens left: 1
✅ Request sent! Tokens left: 0
⛔ Rate limited! Wait a moment.
⛔ Rate limited! Wait a moment.
```
Knowledge Check
Test what you learned with this quick quiz.