In episode #40 of the Learn Python Series, we learned async fundamentals: event loops, coroutines, and concurrent task execution. Now we apply those concepts to real-world HTTP requests and production async patterns.
The problem: you need to fetch data from 1000 URLs. Synchronous code would take forever. Async seems like the answer - start all 1000 requests concurrently! But that creates new problems.
Nota bene: This episode is about controlled concurrency and production-ready async patterns, not just making things concurrent.
Imagine starting 1000 HTTP requests simultaneously. What happens?
Your machine opens 1000 TCP connections at once. Your OS has connection limits - you might hit them. The remote server receives 1000 simultaneous requests from your IP - rate limiting kicks in, requests fail, you get banned.
Memory usage spikes - each connection consumes buffers. Network bandwidth saturates. The event loop struggles managing 1000 concurrent tasks.
Most critically: you're being a bad citizen. Hammering someone's API with 1000 simultaneous requests is abuse, even if unintentional.
The solution: controlled concurrency. Run many tasks concurrently, but limit HOW many at once.
A semaphore is a counter that limits concurrent access to a resource. Think of it like a parking lot with limited spaces.
The lot has 5 spaces. Car 1 enters (counter: 4 spaces left). Cars 2, 3, 4, 5 enter (counter: 0). Car 6 arrives - lot is full, must wait. Car 1 leaves (counter: 1). Now car 6 can enter.
In code: you create a semaphore with capacity 5. Tasks acquire the semaphore before proceeding. If 5 tasks already hold it, task 6 waits. When a task finishes, it releases the semaphore, allowing waiting tasks to proceed.
This limits concurrent operations to 5 at any moment, even if you have 1000 total tasks queued.
The mental model: semaphores are waiting rooms. Limited seats. When full, newcomers wait for someone to leave.
Here's a semaphore throttling 1000 tasks down to 10 concurrent:
import asyncio
import aiohttp

sem = asyncio.Semaphore(10)

async def fetch(session, url):
    async with sem:  # waits here if 10 tasks are already inside
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    urls = [f"https://api.example.com/item/{i}" for i in range(1000)]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        print(f"Fetched {len(results)} pages")

asyncio.run(main())
All 1000 tasks are created immediately — asyncio.gather schedules them all. But the semaphore ensures only 10 run their HTTP request at any given moment. The other 990 wait their turn at the async with sem line. Clean, simple, effective.
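To see the cap without touching the network, here is a minimal sketch that replaces the HTTP call with `asyncio.sleep` and records the highest number of tasks that were inside the semaphore at the same time. The names `run_demo`, `fake_fetch`, `active`, and `peak` are made up for this illustration and are not part of the example above.

```python
import asyncio

async def run_demo(total_tasks=50, limit=5):
    sem = asyncio.Semaphore(limit)
    active = 0  # tasks currently inside the semaphore
    peak = 0    # highest value 'active' ever reached

    async def fake_fetch(i):
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the HTTP round trip
            active -= 1
        return i

    results = await asyncio.gather(*(fake_fetch(i) for i in range(total_tasks)))
    return peak, results

if __name__ == "__main__":
    peak, results = asyncio.run(run_demo())
    print(f"peak concurrency: {peak}, completed: {len(results)}")
```

However many tasks you queue, `peak` never exceeds `limit`, which is the guarantee the semaphore gives you.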
Most APIs have rate limits: "100 requests per minute" or "10 requests per second". Exceed these and you get errors or bans.
Semaphores control instantaneous concurrency (how many right now). Rate limiting controls throughput over time (how many per period).
The simplest rate limiting: add delays. If the API allows 10 requests/second, wait 0.1 seconds between requests. But this wastes time - you're artificially slowing yourself.
Better: token bucket algorithm. You have a bucket holding tokens. Each request consumes a token. Tokens regenerate at a fixed rate (10/second). If the bucket is empty, wait for a token to regenerate.
This allows bursts - if you haven't made requests recently, the bucket is full, you can fire off 10 immediately. Then it drains and regenerates at the allowed rate.
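The token bucket described above could be sketched roughly as follows. This is one possible implementation, not a reference one: the class name `TokenBucket` and its `rate` and `capacity` parameters are assumptions made for this example, and the bucket is created inside the coroutine so its lock belongs to the running event loop.

```python
import asyncio
import time

class TokenBucket:
    """Hypothetical token bucket: 'rate' tokens per second, at most 'capacity' stored."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    def _refill(self):
        # Add the tokens regenerated since the last update, up to capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    async def acquire(self):
        # The lock makes waiting callers queue up instead of all waking together.
        async with self._lock:
            while True:
                self._refill()
                if self.tokens >= 1:
                    break
                # Sleep roughly long enough for the missing fraction to regenerate.
                await asyncio.sleep((1 - self.tokens) / self.rate)
            self.tokens -= 1

async def demo():
    bucket = TokenBucket(rate=100, capacity=5)
    start = time.monotonic()
    stamps = []
    for _ in range(10):
        await bucket.acquire()
        stamps.append(time.monotonic() - start)
    return stamps

if __name__ == "__main__":
    for i, t in enumerate(asyncio.run(demo())):
        print(f"request {i}: {t * 1000:.1f} ms")
```

In the demo, the first five acquisitions drain the full bucket almost instantly, and the remaining five are paced at the refill rate.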
A simple rate limiter using asyncio.sleep, which enforces fixed spacing between calls rather than a full token bucket:
import asyncio
import time

class RateLimiter:
    def __init__(self, calls_per_second):
        self.delay = 1.0 / calls_per_second
        self.last_call = 0.0
        # Without a lock, concurrent callers would all compute the same
        # wait_time, sleep together, and then fire at the same moment.
        self.lock = asyncio.Lock()

    async def wait(self):
        async with self.lock:
            now = time.monotonic()
            wait_time = self.delay - (now - self.last_call)
            if wait_time > 0:
                await asyncio.sleep(wait_time)
            self.last_call = time.monotonic()

limiter = RateLimiter(calls_per_second=10)

async def fetch_with_limit(session, url):
    await limiter.wait()
    async with session.get(url) as resp:
        return await resp.text()
Combine this with semaphores and you control both instantaneous concurrency AND throughput over time. For most use cases, that combination is sufficient.
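As a rough illustration of that combination, the sketch below puts a semaphore and a spacing-based limiter in front of a simulated request. Everything here is hypothetical: `SpacedLimiter`, `fetch_all`, and `fake_fetch` are names invented for the example, and `asyncio.sleep` stands in for `session.get(...)`. The limiter reserves each start slot before sleeping, so it needs no lock.

```python
import asyncio
import time

class SpacedLimiter:
    """Hands out start times spaced 1/calls_per_second apart."""

    def __init__(self, calls_per_second):
        self.interval = 1.0 / calls_per_second
        self.next_slot = time.monotonic()

    async def wait(self):
        now = time.monotonic()
        slot = max(now, self.next_slot)
        # Reserve the slot before sleeping; there is no await between the
        # read and the write, so concurrent callers cannot share a slot.
        self.next_slot = slot + self.interval
        await asyncio.sleep(slot - now)

async def fetch_all(items, concurrency=3, calls_per_second=50):
    sem = asyncio.Semaphore(concurrency)
    limiter = SpacedLimiter(calls_per_second)
    starts = []  # when each simulated request began

    async def fake_fetch(item):
        async with sem:           # cap requests in flight right now
            await limiter.wait()  # cap requests started per second
            starts.append(time.monotonic())
            await asyncio.sleep(0.005)  # stand-in for the HTTP round trip
            return item * 2

    results = await asyncio.gather(*(fake_fetch(i) for i in items))
    return results, starts

if __name__ == "__main__":
    results, starts = asyncio.run(fetch_all(range(6)))
    print(results, f"spread over {starts[-1] - starts[0]:.2f}s")
```

Whichever limit is tighter at a given moment wins: slow responses are throttled by the semaphore, fast responses by the limiter.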
Creating HTTP connections is expensive. TCP handshake, DNS lookup, TLS negotiation if HTTPS. For a single request, fine. For 1000 requests, wasteful.
Session reuse solves this. Create one session object, make all requests through it. The session maintains a connection pool - reuses open connections instead of creating new ones for each request.
In aiohttp:
async with aiohttp.ClientSession() as session:
    # All requests reuse connections from the session's pool
    for url in urls:
        async with session.get(url) as response:
            data = await response.text()
Creating a session per request defeats this - you lose connection reuse. One session for many requests is the pattern.
Real networks fail. Servers return errors. Connections time out. Production async code must handle these gracefully.
The patterns:
Timeouts: Every operation should have a timeout. Without it, a stuck request blocks resources indefinitely. Wrap operations with asyncio.wait_for(operation, timeout=10).
Retries with backoff: Transient errors (network blip, server overload) often succeed on retry. But don't retry immediately - you'll hammer the same struggling server. Wait, then retry. Wait longer, retry again. Exponential backoff: 1s, 2s, 4s, 8s...
Circuit breakers: If a service fails repeatedly, stop trying. Track failure rate - if it crosses a threshold, "open the circuit" (stop making requests). After a cooldown period, try again ("half-open"). If it succeeds, close the circuit (resume normal operation).
These patterns prevent cascading failures. One slow service doesn't bring down your entire application.
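The circuit breaker is the one pattern here that has no obvious one-liner, so here is a minimal sketch of the closed, open, and half-open states. It is a simplification under stated assumptions: it counts consecutive failures instead of tracking a failure rate, it lets any number of probes through while half-open, and `CircuitBreaker`, `CircuitOpenError`, and the injectable `clock` parameter are names invented for this example.

```python
import asyncio
import time

class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock     # injectable so the demo can fake time
        self.failures = 0      # consecutive failures seen so far
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half-open"
        return "open"

    async def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit open, skipping call")
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed half-open probe, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result

async def demo():
    now = [0.0]  # fake clock so the demo does not have to wait
    breaker = CircuitBreaker(failure_threshold=2, cooldown=10.0, clock=lambda: now[0])

    async def failing():
        raise ValueError("service down")

    async def healthy():
        return "ok"

    seen = []
    for _ in range(2):
        try:
            await breaker.call(failing)
        except ValueError:
            seen.append("fail")
    seen.append(breaker.state)  # threshold reached
    try:
        await breaker.call(healthy)
    except CircuitOpenError:
        seen.append("refused")  # the service is not contacted at all
    now[0] = 11.0               # cooldown has passed
    seen.append(breaker.state)
    seen.append(await breaker.call(healthy))
    seen.append(breaker.state)
    return seen

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In real code the function passed to `call` would be something like a retrying fetch coroutine, with one breaker per remote service.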
Let's implement timeouts and retries with exponential backoff:
import asyncio
import aiohttp
import random

async def fetch_with_retry(session, url, max_retries=3, timeout=10):
    for attempt in range(max_retries):
        retry_after = None
        try:
            async with asyncio.timeout(timeout):
                async with session.get(url) as resp:
                    if resp.status == 429:  # Too Many Requests
                        # Assumes Retry-After is a number of seconds, not an HTTP date
                        retry_after = int(resp.headers.get("Retry-After", 5))
                    else:
                        resp.raise_for_status()
                        return await resp.json()
        except asyncio.TimeoutError:
            pass  # fall through to retry logic below
        except aiohttp.ClientError:
            pass
        if attempt < max_retries - 1:
            # Sleep outside the timeout block so the wait itself cannot be
            # cancelled, and after the connection has been released.
            if retry_after is not None:
                delay = retry_after
            else:
                delay = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)
    raise Exception(f"Failed after {max_retries} attempts: {url}")
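The asyncio.timeout (Python 3.11+) context manager cancels the request if it takes too long, so nothing hangs forever. The retry loop uses exponential backoff: the wait is 2 ** attempt seconds plus up to one second of jitter. With the default max_retries=3 there are two waits, 1 to 2 seconds after the first failure and 2 to 3 seconds after the second; a fourth attempt would be preceded by a 4 to 5 second wait. The random.uniform adds jitter so multiple clients don't all retry at the exact same moment, which would just create another traffic spike.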
Note how we also handle HTTP 429 (Too Many Requests) separately — the server is telling us to slow down, so we respect its Retry-After header instead of guessing.
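To make the backoff arithmetic concrete, this small helper computes the wait before each retry using the same formula as the loop above. The function name `backoff_delays` and the `base`, `cap`, and `rng` parameters are assumptions added for illustration; in particular the upper cap is not part of the original code, but is a common safeguard once the exponent grows.

```python
import random

def backoff_delays(max_retries, base=1.0, cap=30.0, rng=random):
    """Return the wait before each retry: base * 2**attempt plus up to 1s of jitter, capped."""
    delays = []
    for attempt in range(max_retries - 1):  # no wait after the final attempt
        delay = base * (2 ** attempt) + rng.uniform(0, 1)
        delays.append(min(delay, cap))
    return delays

if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(5)):
        print(f"after failed attempt {attempt + 1}: wait {delay:.2f}s")
```

With `max_retries=5` the four waits fall in the ranges 1 to 2, 2 to 3, 4 to 5, and 8 to 9 seconds.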
Async adds complexity. When is it worth it?
Use async when:
Use sync (requests library) when:
Don't use async just because it's "modern". Use it when concurrency provides meaningful benefit.
Here's a production-grade fetcher combining the core patterns from this episode (semaphore, session reuse, retries, and timeouts):
import asyncio
import aiohttp

async def fetch_many(urls, concurrency=20, timeout=15, retries=3):
    sem = asyncio.Semaphore(concurrency)
    results = {}

    async def worker(session, url):
        async with sem:
            for attempt in range(retries):
                try:
                    async with asyncio.timeout(timeout):
                        async with session.get(url) as resp:
                            resp.raise_for_status()
                            results[url] = await resp.text()
                            return
                except (asyncio.TimeoutError, aiohttp.ClientError) as e:
                    if attempt == retries - 1:
                        # repr() keeps the exception type; str() of a timeout is empty
                        results[url] = f"FAILED: {e!r}"
                    else:
                        await asyncio.sleep(2 ** attempt)

    connector = aiohttp.TCPConnector(limit=concurrency)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [worker(session, url) for url in urls]
        await asyncio.gather(*tasks)
    return results
TCPConnector(limit=concurrency) caps the connection pool to match our semaphore — no point opening more TCP connections than we'll use simultaneously. The semaphore controls task-level concurrency, the connector controls socket-level concurrency. Both should agree.
Each failed request retries with backoff. Each request has a timeout. All requests share one session (connection pooling). And no more than 20 run concurrently. That's a responsible, production-ready fetcher in about 25 lines.
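For production async HTTP code: cap concurrency with a semaphore, respect the API's rate limits, reuse one session for all requests, put a timeout on every request, and retry transient failures with backoff.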
These aren't optional extras - they're requirements for reliable async systems.
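In this episode, we covered production async patterns: semaphores for controlled concurrency, rate limiting with delays and token buckets, session reuse and connection pooling, timeouts, retries with exponential backoff and jitter, and circuit breakers.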
Async isn't just about speed. It's about efficiently managing many concurrent I/O operations while remaining a responsible system citizen.