Learn Python Series (#41) - Asynchronous Python Part 2 | Ecency

Repository

https://github.com/realScipio/learn-python-series

What will I learn?

You will learn why semaphores matter and how they prevent resource exhaustion;
the mental model behind rate limiting and respectful API usage;
how async HTTP differs from synchronous requests and why that matters;
production patterns for retry logic, timeouts, and error handling;
when to use connection pooling and session reuse.

Requirements

A working modern computer running macOS, Windows or Ubuntu;
An installed Python 3(.11+) distribution;
The ambition to learn Python programming.

Difficulty

Intermediate, advanced

Curriculum (of the `Learn Python Series`):

In episode #40, we learned async fundamentals - event loops, coroutines, concurrent task execution. Now we apply those concepts to real-world HTTP requests and production async patterns.

The problem: you need to fetch data from 1000 URLs. Synchronous code would take forever. Async seems like the answer - start all 1000 requests concurrently! But that creates new problems.

Nota bene: This episode is about controlled concurrency and production-ready async patterns, not just making things concurrent.

The problem with unlimited concurrency

Imagine starting 1000 HTTP requests simultaneously. What happens?

Your machine opens 1000 TCP connections at once. Your OS has connection limits - you might hit them. The remote server receives 1000 simultaneous requests from your IP - rate limiting kicks in, requests fail, you get banned.

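Memory usage spikes, because each connection needs its own buffers. Network bandwidth saturates. The event loop can schedule 1000 tasks, but every one of them now competes for the same sockets and bandwidth, so individual requests get slower and timeouts become more likely.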

Most critically: you're being a bad citizen. Hammering someone's API with 1000 simultaneous requests is abuse, even if unintentional.

The solution: controlled concurrency. Run many tasks concurrently, but limit HOW many at once.

Semaphores: concurrency throttles

A semaphore is a counter that limits concurrent access to a resource. Think of it like a parking lot with limited spaces.

The lot has 5 spaces. Car 1 enters (counter: 4 spaces left). Cars 2, 3, 4, 5 enter (counter: 0). Car 6 arrives - lot is full, must wait. Car 1 leaves (counter: 1). Now car 6 can enter.

In code: you create a semaphore with capacity 5. Tasks acquire the semaphore before proceeding. If 5 tasks already hold it, task 6 waits. When a task finishes, it releases the semaphore, allowing waiting tasks to proceed.

This limits concurrent operations to 5 at any moment, even if you have 1000 total tasks queued.

The mental model: semaphores are waiting rooms. Limited seats. When full, newcomers wait for someone to leave.

Here's a semaphore throttling 1000 tasks down to 10 concurrent:

import asyncio
import aiohttp

sem = asyncio.Semaphore(10)

async def fetch(session, url):
    async with sem:  # waits here (without blocking the event loop) if 10 tasks are already inside
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    urls = [f"https://api.example.com/item/{i}" for i in range(1000)]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    print(f"Fetched {len(results)} pages")

asyncio.run(main())

All 1000 coroutines are created up front, and asyncio.gather wraps each one in a task and schedules it right away. But the semaphore ensures only 10 run their HTTP request at any given moment. The other 990 wait their turn at the async with sem line. Clean, simple, effective.
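To see the throttle at work without touching the network, here is a minimal sketch that replaces the HTTP request with asyncio.sleep and records how many workers are inside the semaphore at once. The names worker, run_demo and state are made up for this illustration.

```python
import asyncio

async def worker(sem, state):
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for the HTTP request
        state["active"] -= 1

async def run_demo(total=100, limit=10):
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    # All workers are scheduled at once; the semaphore does the throttling
    await asyncio.gather(*(worker(sem, state) for _ in range(total)))
    return state["peak"]

peak = asyncio.run(run_demo())
print(f"Peak concurrency: {peak}")
```

However many workers you schedule, the reported peak should never exceed the limit passed to the semaphore.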

Rate limiting: respecting API constraints

Most APIs have rate limits: "100 requests per minute" or "10 requests per second". Exceed these and you get errors or bans.

Semaphores control instantaneous concurrency (how many right now). Rate limiting controls throughput over time (how many per period).

The simplest rate limiting: add delays. If the API allows 10 requests/second, wait 0.1 seconds between requests. But a fixed delay wastes time: even after a quiet period every request still waits its turn, so the allowance you did not use is lost.

Better: token bucket algorithm. You have a bucket holding tokens. Each request consumes a token. Tokens regenerate at a fixed rate (10/second). If the bucket is empty, wait for a token to regenerate.

This allows bursts - if you haven't made requests recently, the bucket is full, you can fire off 10 immediately. Then it drains and regenerates at the allowed rate.
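The token bucket described above could look roughly like this. It is a sketch, not a library API: the TokenBucket class and its rate and capacity parameters are names invented for this example, and it assumes Python 3.10+ so the asyncio.Lock can be created outside a running event loop.

```python
import asyncio
import time

class TokenBucket:
    """Hypothetical bucket: refills `rate` tokens per second, stores up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.updated = time.monotonic()
        self.lock = asyncio.Lock()

    def _refill(self):
        now = time.monotonic()
        gained = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + gained)
        self.updated = now

    async def acquire(self):
        async with self.lock:  # one waiter at a time, so a token is never spent twice
            self._refill()
            while self.tokens < 1:
                # Sleep just long enough for the missing fraction of a token
                await asyncio.sleep((1 - self.tokens) / self.rate)
                self._refill()
            self.tokens -= 1

async def demo():
    bucket = TokenBucket(rate=50, capacity=5)
    start = time.monotonic()
    stamps = []
    for _ in range(10):
        await bucket.acquire()
        stamps.append(time.monotonic() - start)
    return stamps

stamps = asyncio.run(demo())
print([round(s, 3) for s in stamps])
```

In the demo, the first five acquisitions drain the full bucket almost immediately, and the remaining five are paced by the refill rate.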

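A simpler limiter spaces calls evenly with asyncio.sleep. Each caller reserves the next free time slot before it sleeps, so tasks that arrive together are still spread out instead of all waking at once:

import asyncio
import time

class RateLimiter:
    def __init__(self, calls_per_second):
        self.delay = 1.0 / calls_per_second
        self.next_slot = 0.0  # earliest monotonic time the next call may start

    async def wait(self):
        now = time.monotonic()
        # Claim a slot before awaiting: there is no await between reading
        # and updating next_slot, so concurrent callers get distinct slots.
        slot = max(now, self.next_slot)
        self.next_slot = slot + self.delay
        if slot > now:
            await asyncio.sleep(slot - now)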

limiter = RateLimiter(calls_per_second=10)

async def fetch_with_limit(session, url):
    await limiter.wait()
    async with session.get(url) as resp:
        return await resp.text()

Combine this with semaphores and you control both instantaneous concurrency AND throughput over time. For most use cases, that combination is sufficient.

Connection pooling and session reuse

Creating HTTP connections is expensive. Each new connection needs a DNS lookup, a TCP handshake, and, for HTTPS, a TLS negotiation. For a single request, fine. For 1000 requests, wasteful.

Session reuse solves this. Create one session object, make all requests through it. The session maintains a connection pool - reuses open connections instead of creating new ones for each request.

In aiohttp:

async with aiohttp.ClientSession() as session:
    # All requests reuse connections from the session's pool
    for url in urls:
        async with session.get(url) as response:
            data = await response.text()

Creating a session per request defeats this - you lose connection reuse. One session for many requests is the pattern.

Error handling patterns

Real networks fail. Servers return errors. Connections time out. Production async code must handle these gracefully.

The patterns:

Timeouts: Every operation should have a timeout. Without one, a stuck request holds its connection (and its semaphore slot) indefinitely. Wrap operations with asyncio.wait_for(operation, timeout=10), or on Python 3.11+ use the asyncio.timeout context manager.

Retries with backoff: Transient errors (network blip, server overload) often succeed on retry. But don't retry immediately - you'll hammer the same struggling server. Wait, then retry. Wait longer, retry again. Exponential backoff: 1s, 2s, 4s, 8s...

Circuit breakers: If a service fails repeatedly, stop trying. Track failure rate - if it crosses a threshold, "open the circuit" (stop making requests). After a cooldown period, try again ("half-open"). If it succeeds, close the circuit (resume normal operation).

These patterns prevent cascading failures. One slow service doesn't bring down your entire application.
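The circuit breaker is the one pattern from this list that the rest of the episode does not implement, so here is a minimal sketch. It simplifies the description above in two ways: it counts consecutive failures instead of tracking a failure rate, and in the half-open state it does not stop several probes from running at once. CircuitBreaker, CircuitOpenError and the parameter names are invented for this example.

```python
import asyncio
import time

class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""

class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive failures."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    async def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open, request skipped")
            # Cooldown elapsed: half-open, let this call through as a probe
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result

async def flaky():
    raise ConnectionError("service down")

async def demo():
    breaker = CircuitBreaker(max_failures=2, cooldown=60.0)
    outcomes = []
    for _ in range(4):
        try:
            await breaker.call(flaky)
        except CircuitOpenError:
            outcomes.append("skipped")
        except ConnectionError:
            outcomes.append("failed")
    return outcomes

outcomes = asyncio.run(demo())
print(outcomes)
```

After two real failures the breaker stops calling the failing service, and the remaining calls are refused locally until the cooldown has passed.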

Let's implement timeouts and retries with exponential backoff:

import asyncio
import aiohttp
import random

async def fetch_with_retry(session, url, max_retries=3, timeout=10):
    for attempt in range(max_retries):
        retry_after = None  # set when the server answers 429
        try:
            async with asyncio.timeout(timeout):
                async with session.get(url) as resp:
                    if resp.status == 429:  # Too Many Requests
                        # Retry-After can also be an HTTP date; only the
                        # seconds form is parsed here, otherwise wait 5s.
                        header = resp.headers.get("Retry-After", "")
                        retry_after = int(header) if header.isdigit() else 5
                    else:
                        resp.raise_for_status()
                        return await resp.json()
        except (asyncio.TimeoutError, aiohttp.ClientError):
            pass  # fall through to the retry logic below

        if attempt < max_retries - 1:
            if retry_after is not None:
                # Sleep outside the timeout block so a long Retry-After
                # is not cut short by the request timeout.
                delay = retry_after
            else:
                delay = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)

    raise RuntimeError(f"Failed after {max_retries} attempts: {url}")

The asyncio.timeout context manager (Python 3.11+) cancels the request if it takes too long, so nothing hangs forever. The retry loop uses exponential backoff with jitter: the delay is 2 ** attempt seconds plus a random 0 to 1 second, so with the default max_retries=3 the first retry waits 1 to 2 seconds and the second 2 to 3 seconds. After the third failed attempt the function raises instead of waiting again. The jitter keeps multiple clients from retrying at the exact same moment, which would just create another traffic spike.

Note how HTTP 429 (Too Many Requests) is handled separately: the server is telling us to slow down, so we respect its Retry-After header instead of guessing.
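The backoff arithmetic is easier to check in isolation. The helper below is a sketch: backoff_delay is a hypothetical name, and the cap parameter is an addition not used in the function above, included because an uncapped 2 ** attempt grows very quickly once the retry count goes up.

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    exponential = min(cap, base * 2 ** attempt)  # 1, 2, 4, 8, ... up to cap
    jitter = random.uniform(0, 1)                # spreads clients apart
    return exponential + jitter

for attempt in range(6):
    print(f"after attempt {attempt}: wait {backoff_delay(attempt):.2f}s")
```

With the defaults, the delay after attempt n falls between min(30, 2 ** n) and one second more than that.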

When to use async HTTP vs sync

Async adds complexity. When is it worth it?

Use async when:

You're making many concurrent requests (dozens to thousands)
Requests to different services can overlap (microservices, aggregation)
You're building a server handling many simultaneous clients

Use sync (requests library) when:

You're making a few sequential requests
Code simplicity matters more than speed
You're working in a larger sync codebase

Don't use async just because it's "modern". Use it when concurrency provides meaningful benefit.
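A quick way to judge whether concurrency provides meaningful benefit is to time the same simulated workload both ways. This sketch uses asyncio.sleep as a stand-in for network latency; fake_request, sequential and concurrent are names made up for the comparison, and real requests will behave less uniformly.

```python
import asyncio
import time

async def fake_request(delay=0.05):
    await asyncio.sleep(delay)  # stand-in for waiting on a server

async def sequential(n):
    start = time.monotonic()
    for _ in range(n):
        await fake_request()  # each request waits for the previous one
    return time.monotonic() - start

async def concurrent(n):
    start = time.monotonic()
    await asyncio.gather(*(fake_request() for _ in range(n)))  # waits overlap
    return time.monotonic() - start

seq_time = asyncio.run(sequential(10))
con_time = asyncio.run(concurrent(10))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

For a handful of requests the gap is small in absolute terms, which is the case for staying with plain synchronous code. It grows with the number of requests that can overlap.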

Putting it all together

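Here's a compact fetcher combining most of this episode: semaphore, session reuse, retries with backoff, and timeouts. Rate limiting is left out to keep it short; a limiter like the one shown earlier could be awaited just before session.get: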

import asyncio
import aiohttp

async def fetch_many(urls, concurrency=20, timeout=15, retries=3):
    sem = asyncio.Semaphore(concurrency)
    results = {}  # url -> response body, or a "FAILED: ..." marker

    async def worker(session, url):
        # The slot is held across retries and backoff sleeps, so one
        # struggling URL never occupies more than one slot.
        async with sem:
            for attempt in range(retries):
                try:
                    async with asyncio.timeout(timeout):
                        async with session.get(url) as resp:
                            resp.raise_for_status()
                            results[url] = await resp.text()
                            return
                except (asyncio.TimeoutError, aiohttp.ClientError) as e:
                    if attempt == retries - 1:
                        # repr() because a TimeoutError has an empty message
                        results[url] = f"FAILED: {e!r}"
                    else:
                        await asyncio.sleep(2 ** attempt)

    # Cap the connection pool at the same number as the semaphore
    connector = aiohttp.TCPConnector(limit=concurrency)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [worker(session, url) for url in urls]
        await asyncio.gather(*tasks)

    return results

TCPConnector(limit=concurrency) caps the connection pool to match our semaphore, since there is no point opening more TCP connections than we'll use simultaneously (aiohttp's default limit is 100). The semaphore controls task-level concurrency, the connector controls socket-level concurrency. Both should agree.

Each failed request retries with backoff. Each request has a timeout. All requests share one session (connection pooling). And no more than 20 run concurrently. That's a responsible, production-ready fetcher in about 25 lines.

Production checklist

For production async HTTP code:

Limit concurrency with semaphores (protect your resources and theirs)
Respect rate limits with delays or token buckets (stay within API constraints)
Reuse sessions for connection pooling (performance)
Set timeouts on all operations (prevent hangs)
Implement retries with exponential backoff (handle transient failures)
Log errors with context (debugging distributed systems is hard)
Monitor metrics (request rate, error rate, latency percentiles)
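The last two checklist items can be sketched with the standard library alone. This is only an illustration: timed_call and fake_request are hypothetical names, the "fetcher" logger name is arbitrary, and a real system would normally ship these numbers to a metrics backend instead of keeping them in a list.

```python
import asyncio
import logging
import statistics
import time

logger = logging.getLogger("fetcher")

async def timed_call(label, func, latencies):
    """Await func(), record its latency, and log failures with context."""
    start = time.monotonic()
    try:
        return await func()
    except Exception:
        logger.exception("request failed: %s", label)  # includes the traceback
        raise
    finally:
        latencies.append(time.monotonic() - start)

async def fake_request():
    await asyncio.sleep(0.01)  # stand-in for an HTTP call
    return "ok"

async def demo():
    latencies = []
    await asyncio.gather(
        *(timed_call(f"item-{i}", fake_request, latencies) for i in range(20))
    )
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return latencies, cuts[49], cuts[94]           # 50th and 95th percentile

latencies, p50, p95 = asyncio.run(demo())
print(f"p50={p50 * 1000:.1f}ms p95={p95 * 1000:.1f}ms")
```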