Learn Zig Series (#96) - Mini Project: Chat Server - Server Core

What will I learn?

How to turn last episode's paper protocol into a running server that many people can connect to at once;
Why the heart of a chat server is not the socket but a shared registry of connected clients, and how to protect it when several threads touch it at the same time;
How to write an accept loop that spawns one handler thread per connection, and why "thread per client" is the right first model even though it isn't the last word;
How a per-client handler reads framed messages (episode 95's readFrame), decodes them, and asks the hub to broadcast the result to everyone else;
Where the sneaky concurrency bugs hide -- a client removed while a broadcast is mid-flight -- and how a single mutex plus careful lock discipline keeps the whole thing honest;
How to test the broadcast-and-registry logic with zero sockets, the same discipline we've kept since episode 12;
How this hand-built server compares to the goroutines-and-channels approach in Go, the async runtimes in Rust, and raw pthread plumbing in C.

Requirements

A working modern computer running macOS, Windows or Ubuntu;
An installed Zig 0.14+ distribution (download from ziglang.org);
The ambition to learn Zig programming.

Difficulty

Intermediate

Here we go ;-) Last episode we did the unglamorous, load-bearing work: we designed the protocol for our chat system without writing a single line of socket code. We enumerated every message the two sides can exchange, modelled them as tagged unions so the compiler polices each case, chose a length-prefixed binary framing that survives the realities of TCP, and built a codec we tested -- happy path and hostile path both -- entirely in memory. That was the contract. Today we build the machine that speaks it.

This is the piece most people picture when they hear "chat server": a process that sits on a port, accepts a crowd of connections, and relays what each person says to everybody else. It sounds simple, and the happy-path version genuinely is short. The interesting part -- the part that separates a toy from something you'd trust with more than two friends -- is what happens when many connections live at once and they all touch the same shared state. That is a concurrency problem, and concurrency is where good intentions go to die. So we'll build the server in the order that keeps us honest: the shared registry first, then the threads that hammer on it, and a fistful of tests that prove the dangerous part works before it ever meets a real socket.

From contract to machinery: what the server core must do

Strip the chat server down to its job and it's shorter than you'd think. It has to:

accept new TCP connections on a well-known port (the accept loop from episode 51);
give each connected client a small identity -- an id, a chosen nickname, and its socket;
keep a registry of everyone currently connected, so it can reach them all;
for every incoming say from one client, encode a chat message (episode 95's codec) and broadcast it to every other client;
announce arrivals and departures, and clean up the moment a socket drops.

Notice that only two of those bullets are about the network. The rest are about managing a shared collection under concurrent access. That's the tell: the difficulty of a chat server does not live in the sockets, it lives in the registry that several threads read and write at the same time. Get the registry right and the sockets are the boring accept loop we've already written twice in this series (episodes 42 and 51). Get it wrong and you get the classic multiplayer bug -- a crash that only happens when two people do something at the same instant, which is to say the bug you can never reproduce on purpose.

The shape of a connected client

Let's start with the smallest honest datatype: what does the server need to remember about one connected person? An id to address them by, the nickname they chose at join time, and the stream to write bytes into. That's it.

const std = @import("std");
const net = std.net;

// One connected participant. The hub owns this; handler threads borrow a pointer.
const Client = struct {
    id: u32,
    nick: []u8, // heap-owned copy -- the join frame it came from is long gone
    stream: net.Stream,

    fn deinit(self: *Client, alloc: std.mem.Allocator) void {
        alloc.free(self.nick);
    }
};

The single design decision worth pausing on is nick: []u8 -- an owned copy, not a []const u8 slice pointing into the frame it arrived in. Episode 95's decoder deliberately returned slices into the received buffer (zero copies, fast, safe as long as you use them before the buffer dies). But a client's nickname has to outlive the join frame by minutes or hours, so here is exactly the moment episode 95 warned about: the server "dupes the strings deliberately". We alloc.dupe the nick once at join time and free it once at disconnect. Episode 8's lifetime thinking, made concrete: the codec never hides an allocation, so the server decides, in the open, precisely when a copy is worth its keep.
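To see the borrowed view and the owned copy side by side, here is a minimal sketch. The 16-byte buffer and the test are my own illustration, not code from the server; the only thing taken from the text above is the alloc.dupe call made at join time.

```zig
const std = @import("std");

test "an owned nickname survives the receive buffer being reused" {
    const alloc = std.testing.allocator;

    // Stand-in for the handler's receive buffer holding a decoded join frame.
    var buf: [16]u8 = undefined;
    @memcpy(buf[0..6], "scipio");
    const borrowed: []const u8 = buf[0..6]; // a view into buf, costs nothing

    // What the server does at join time: one explicit copy with its own lifetime.
    const owned = try alloc.dupe(u8, borrowed);
    defer alloc.free(owned);

    // The next frame arrives and overwrites the same bytes.
    @memcpy(buf[0..6], "XXXXXX");

    try std.testing.expectEqualStrings("XXXXXX", borrowed); // the view changed underneath us
    try std.testing.expectEqualStrings("scipio", owned); // the copy did not
}
```

The borrowed slice is still perfectly valid memory here, it just no longer says what we thought it said, which is the quieter and nastier version of a lifetime bug.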

The hub: one shared registry, many threads

Now the centre of gravity. The hub is the shared registry -- a map from client id to client, plus the one lock that guards it. Every handler thread will reach into this structure, so every field that can be touched concurrently sits behind the mutex (episode 30). Nothing clever, and clever is precisely what you do not want here.

const Hub = struct {
    alloc: std.mem.Allocator,
    mutex: std.Thread.Mutex = .{},
    clients: std.AutoHashMapUnmanaged(u32, *Client) = .{},
    next_id: u32 = 1,

    fn init(alloc: std.mem.Allocator) Hub {
        return .{ .alloc = alloc };
    }

    fn deinit(self: *Hub) void {
        var it = self.clients.valueIterator();
        while (it.next()) |client_ptr| {
            const c = client_ptr.*;
            c.stream.close();
            c.deinit(self.alloc);
            self.alloc.destroy(c);
        }
        self.clients.deinit(self.alloc);
    }
};

The clients map keys on a u32 id rather than on the nickname or the socket handle, and that's deliberate, quite apart from convenience. An id is cheap to compare, never collides, and -- crucially -- lets a client rename later (a future feature) without rehashing the whole registry. The next_id counter only ever increments under the lock, so two clients connecting in the same microsecond can never be handed the same id. We saw this exact hazard in episode 30: a read-modify-write on shared state (id = next_id; next_id += 1) is not atomic unless you make it so. Here the mutex makes it so.
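The id guarantee can be checked in isolation. What follows is a sketch under my own assumptions: IdSource is a hypothetical stand-in that keeps only the lock and the counter from the Hub, and the thread and iteration counts are arbitrary.

```zig
const std = @import("std");

// Hypothetical stand-in for the Hub's id counter: just the lock and the number.
const IdSource = struct {
    mutex: std.Thread.Mutex = .{},
    next_id: u32 = 1,

    // The read and the increment happen under one lock acquisition,
    // so no two callers can observe the same value.
    fn take(self: *IdSource) u32 {
        self.mutex.lock();
        defer self.mutex.unlock();
        const id = self.next_id;
        self.next_id += 1;
        return id;
    }
};

// Each worker grabs out.len ids and records them in its own slice.
fn worker(src: *IdSource, out: []u32) void {
    for (out) |*slot| slot.* = src.take();
}

test "concurrent takers never receive the same id" {
    const per_thread = 1000;
    const thread_count = 4;
    var src = IdSource{};
    var ids: [thread_count * per_thread]u32 = undefined;

    var threads: [thread_count]std.Thread = undefined;
    for (&threads, 0..) |*t, i| {
        const start = i * per_thread;
        t.* = try std.Thread.spawn(.{}, worker, .{ &src, ids[start .. start + per_thread] });
    }
    for (threads) |t| t.join();

    // Every id handed out must be distinct.
    var seen = [_]bool{false} ** (thread_count * per_thread + 1);
    for (ids) |id| {
        try std.testing.expect(!seen[id]);
        seen[id] = true;
    }
    try std.testing.expectEqual(@as(u32, thread_count * per_thread + 1), src.next_id);
}
```

Delete the two mutex lines in take and this test may still pass on a quiet machine, which is exactly why the race is so hard to catch by accident.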

Let me add the three operations the handlers actually call -- add, remove, and broadcast -- each of which takes the lock, does the smallest possible amount of work, and releases it.

// These three are methods: they live inside the Hub struct body.
// Register a freshly-accepted connection. Returns the assigned id.
fn add(self: *Hub, nick: []const u8, stream: net.Stream) !u32 {
    const c = try self.alloc.create(Client);
    errdefer self.alloc.destroy(c);
    c.* = .{
        .id = 0,
        .nick = try self.alloc.dupe(u8, nick),
        .stream = stream,
    };
    errdefer c.deinit(self.alloc);

    self.mutex.lock();
    defer self.mutex.unlock();

    c.id = self.next_id;
    self.next_id += 1;
    try self.clients.put(self.alloc, c.id, c);
    return c.id;
}

// Drop a client (they left or their socket broke). Idempotent on a missing id.
fn remove(self: *Hub, id: u32) void {
    self.mutex.lock();
    const maybe = self.clients.fetchRemove(id);
    self.mutex.unlock();

    if (maybe) |kv| {
        kv.value.stream.close();
        kv.value.deinit(self.alloc);
        self.alloc.destroy(kv.value);
    }
}

Look at what remove does outside the lock. It pulls the client out of the map while holding the mutex (a quick fetchRemove, episode 22), then closes the socket and frees the memory after unlocking. Why the split? Because stream.close() is a syscall -- it can block, however briefly -- and you never want to hold a shared lock across a syscall if you can avoid it. Every other thread that wants the registry would be stuck waiting on a close() that has nothing to do with them. The rule of thumb, learned the hard way in every threaded codebase I've touched: hold the lock for pointer shuffling, never for I/O. We'll bend that rule slightly in a moment for broadcast, and I'll be honest about the cost when we do.

The accept loop

With the hub in hand, the front door is the same accept loop we wrote for the key-value store (episode 42) and the HTTP server (episode 51). Bind a listener, and for every connection that lands, spawn a handler thread and detach it so it cleans itself up.

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    var hub = Hub.init(alloc);
    defer hub.deinit();

    const addr = try net.Address.parseIp("0.0.0.0", 9000);
    var listener = try addr.listen(.{ .reuse_address = true });
    defer listener.deinit();
    std.log.info("chat server up on {}", .{addr});

    while (true) {
        const conn = listener.accept() catch |err| {
            std.log.warn("accept failed: {s}", .{@errorName(err)});
            continue;
        };
        const t = std.Thread.spawn(.{}, handleClient, .{ &hub, conn.stream }) catch |err| {
            std.log.warn("spawn failed: {s}", .{@errorName(err)});
            conn.stream.close();
            continue;
        };
        t.detach();
    }
}

Two small robustness details that a "hello world" server skips but a real one must not. First, a failed accept is logged and shrugged off, not fatal -- one bad connection attempt (a port scanner, a client that hung up during the handshake) should never take the whole server down. Second, if spawn itself fails (we've hit the thread limit, episode 71's resource limits made real), we close the orphaned socket instead of leaking it. reuse_address is there so that restarting the server doesn't make you wait out the kernel's TIME_WAIT on the port -- a small quality-of-life fix you'll want every single time you iterate on a network daemon.

The thread-per-client model gets a bad rap for not scaling to a hundred thousand connections, and that's true -- at that scale you want an event loop (epoll/kqueue, the machinery behind episode 88's WebSocket server). But for a chat server with a human at each end, a thread per connection is the right first design: it's trivially readable, each client's logic is a plain straight-line function, and a few thousand mostly-idle threads cost almost nothing on a modern kernel. Premature epoll here would be episode 34's warning in the flesh -- optimising a bottleneck you don't have.

The per-client handler

Here's where episode 95's protocol finally earns its keep. Each handler thread runs the same short story: read the first frame and expect a join, register with the hub, then loop reading frames and turning each say into a broadcast, until the socket closes or the client sends leave.

fn handleClient(hub: *Hub, stream: net.Stream) void {
    // First frame MUST be a join. Anything else and we hang up.
    var buf: [4096]u8 = undefined;
    const first = readFrame(stream, &buf) catch return closeQuietly(stream);
    const hello = decodeClient(first) catch return closeQuietly(stream);

    const nick = switch (hello) {
        .join => |m| m.nick,
        else => return closeQuietly(stream), // protocol says join-first; enforce it
    };

    const id = hub.add(nick, stream) catch return closeQuietly(stream);
    defer hub.remove(id); // whatever happens below, we deregister on the way out

    announce(hub, id, .{ .joined = .{ .nick = nick } });

    while (true) {
        const frame = readFrame(stream, &buf) catch break; // EOF or bad frame -> leave
        const msg = decodeClient(frame) catch break;
        switch (msg) {
            .say => |m| relaySay(hub, id, m.text),
            .leave => break,
            .join => {}, // a second join is nonsense; ignore rather than crash
        }
    }
}

fn closeQuietly(stream: net.Stream) void {
    stream.close();
}

The defer hub.remove(id) is the quiet hero of this function. No matter how the handler exits -- a clean leave, a truncated frame, a peer that yanked the cable, an unknown kind byte -- the client is deregistered and its socket closed exactly once, on the way out. This is episode 4's error-handling philosophy paying a dividend it promised chapters ago: because every failure path is an error that breaks the loop, and defer runs on every path, there is no way to leave a ghost client rotting in the registry. That single-line guarantee is worth more than any amount of manual cleanup code sprinkled through the error branches, which is exactly where leaks breed.
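The claim that defer covers every exit can be shown without a hub at all. This is a sketch, not the handler itself: the Exit enum, the session function, and the live counter standing in for the registry size are all hypothetical names of mine.

```zig
const std = @import("std");

// Hypothetical ways a handler loop might end; the names are illustrative only.
const Exit = enum { clean_leave, bad_frame, socket_error };

// Mimics handleClient's shape: register, defer the deregistration, then leave
// by whichever route `how` selects. `live` stands in for the registry size.
fn session(live: *u32, how: Exit) void {
    live.* += 1; // plays the role of hub.add
    defer live.* -= 1; // plays the role of hub.remove; written once, runs on every exit

    var frames_left: u32 = 3;
    while (true) {
        if (frames_left == 0) {
            switch (how) {
                .clean_leave => break, // falls out of the loop
                .bad_frame, .socket_error => return, // leaves the function directly
            }
        }
        frames_left -= 1;
    }
}

test "no exit path leaves a ghost entry behind" {
    var live: u32 = 0;
    for ([_]Exit{ .clean_leave, .bad_frame, .socket_error }) |how| {
        session(&live, how);
        try std.testing.expectEqual(@as(u32, 0), live);
    }
}
```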

And notice the switch on msg is still exhaustive, still the compiler's job. A second join mid-session is meaningless, so we ignore it deliberately -- but the point is the compiler made us decide. In a text protocol that stray message would be an unhandled edge case waiting for a bad day.
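Here is that compiler-enforced decision in miniature. The union below is a hypothetical mirror of the client message type: the variant names come from the handler above, but the payload fields and the Action enum are assumptions for the sake of the example.

```zig
const std = @import("std");

// Hypothetical mirror of the client message union; payload fields are assumed.
const ClientMsg = union(enum) {
    join: struct { nick: []const u8 },
    say: struct { text: []const u8 },
    leave,
};

const Action = enum { relay, hang_up, ignore };

// No `else` prong: adding a variant to ClientMsg turns this switch into a
// compile error until the new case is handled somewhere in here.
fn decide(msg: ClientMsg) Action {
    return switch (msg) {
        .say => .relay,
        .leave => .hang_up,
        .join => .ignore, // a second join mid-session: a deliberate no-op
    };
}

test "every variant has an explicit outcome" {
    try std.testing.expectEqual(Action.relay, decide(.{ .say = .{ .text = "hi" } }));
    try std.testing.expectEqual(Action.hang_up, decide(.leave));
    try std.testing.expectEqual(Action.ignore, decide(.{ .join = .{ .nick = "scipio" } }));
}
```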

Broadcasting without tearing frames

Now the operation the whole server exists to perform: taking one client's words and putting them in front of everyone else. This is the method where I bend my own "never hold the lock across I/O" rule, so let me show it and then defend it.

// Send one pre-encoded frame to every client except `skip_id`.
// Holds the lock across the writes -- see the discussion below for why, and the cost.
fn broadcast(self: *Hub, frame: []const u8, skip_id: u32) void {
    self.mutex.lock();
    defer self.mutex.unlock();

    var it = self.clients.iterator();
    while (it.next()) |entry| {
        const c = entry.value_ptr.*;
        if (c.id == skip_id) continue;
        // A write can fail (client vanished). Don't kill the broadcast --
        // the doomed client will be reaped by its own handler's defer.
        c.stream.writeAll(frame) catch {};
    }
}

Two things are load-bearing here. First, writeAll (not write): a single write on a socket can send fewer bytes than you handed it, and a half-written frame is a desync bomb on the receiving end -- the exact stream-versus-message lesson from episodes 21 and 95, now on the sending side. writeAll loops until the whole frame is gone or the socket dies. Second, a failed write is swallowed. If a client disappeared between the registry lookup and the write, that's not the broadcaster's problem to solve -- that client's own handler thread will hit EOF on its next readFrame and run its defer hub.remove. Trying to remove it here, mid-iteration, while holding the lock we're iterating under, is how you deadlock on a non-reentrant mutex (hub.remove takes the same lock) or pull an entry out from under the iterator you're standing on. Let each thread clean up after itself.
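The write-versus-writeAll difference is easy to reproduce without a socket. In this sketch ChunkySink is a hypothetical sink of mine that, like a busy socket, accepts only a few bytes per call; its writeAll is a hand-written loop showing the idea, not the standard library's implementation.

```zig
const std = @import("std");

// Hypothetical sink that accepts at most `max_per_call` bytes per write
// and reports how many it actually took, the way a socket write does.
const ChunkySink = struct {
    out: [64]u8 = undefined,
    len: usize = 0,
    max_per_call: usize,

    fn write(self: *ChunkySink, bytes: []const u8) error{NoSpace}!usize {
        const n = @min(bytes.len, self.max_per_call);
        if (self.len + n > self.out.len) return error.NoSpace;
        @memcpy(self.out[self.len..][0..n], bytes[0..n]);
        self.len += n;
        return n;
    }

    // Keep calling write until nothing is left or an error surfaces.
    fn writeAll(self: *ChunkySink, bytes: []const u8) error{NoSpace}!void {
        var sent: usize = 0;
        while (sent < bytes.len) {
            sent += try self.write(bytes[sent..]);
        }
    }
};

test "one write can be partial, writeAll is not" {
    const frame = "\x00\x00\x00\x05hello"; // 9 bytes; the header layout here is only an example

    var partial = ChunkySink{ .max_per_call = 3 };
    const n = try partial.write(frame);
    try std.testing.expectEqual(@as(usize, 3), n); // the other 6 bytes never left

    var whole = ChunkySink{ .max_per_call = 3 };
    try whole.writeAll(frame);
    try std.testing.expectEqualSlices(u8, frame, whole.out[0..whole.len]);
}
```

A receiver that got only those first three bytes would be waiting on a length header that never completes, and everything after it would be read at the wrong offset.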

Now the honest part. I hold the mutex across every socket write in the loop, which plainly violates my own rule about not doing I/O under a lock. Why accept that here? Because the alternative -- collecting a snapshot of client pointers under the lock, releasing it, then writing -- reintroduces the use-after-free we just banished: a client could be removed and freed by its handler between the snapshot and the write, and now I'm writing to a dangling *Client. Holding the lock keeps every client pointer alive for the duration of the broadcast, which is correct and simple. The cost is real: one slow client (a laptop that closed its lid) stalls the broadcast for everyone once its kernel send buffer fills, and because this code sets no send timeout, writeAll stays blocked until the peer drains the buffer or the connection finally errors out. For a chat server among friends, that trade -- simplicity and safety now, at the price of head-of-line blocking under a pathological client -- is the right one. The production answer is to give each client its own outbound queue and a dedicated writer thread, so the broadcaster only ever does a non-blocking enqueue under the lock. That's a genuine upgrade, and it's exactly the kind of refinement the polish phase of a mini-project exists for. Naming the tradeoff out loud beats pretending the simple version is free.
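For the curious, one possible shape for that per-client queue is sketched below. Everything in it is an assumption of mine rather than the design this series will settle on: the Outbox name, the eight slots, the 256-byte frame cap, and the choice to copy each frame into a slot so the queue never holds a pointer into memory the broadcaster is about to free.

```zig
const std = @import("std");

// Hypothetical per-client outbox: a small ring of fixed-size slots.
const Outbox = struct {
    const slots = 8;
    const max_frame = 256;

    mutex: std.Thread.Mutex = .{},
    ready: std.Thread.Condition = .{},
    frames: [slots][max_frame]u8 = undefined,
    lens: [slots]usize = undefined,
    head: usize = 0, // next slot to pop
    count: usize = 0,
    closed: bool = false,

    // Broadcaster side. Copies the frame and returns at once; false means the
    // queue is full, closed, or the frame is too big, so this client is too slow.
    fn tryPush(self: *Outbox, frame: []const u8) bool {
        self.mutex.lock();
        defer self.mutex.unlock();
        if (self.closed or self.count == slots or frame.len > max_frame) return false;
        const tail = (self.head + self.count) % slots;
        @memcpy(self.frames[tail][0..frame.len], frame);
        self.lens[tail] = frame.len;
        self.count += 1;
        self.ready.signal();
        return true;
    }

    // Writer-thread side. Blocks until a frame is queued, copies it into `dest`,
    // and returns the filled part; null once the outbox is closed and drained.
    fn pop(self: *Outbox, dest: *[max_frame]u8) ?[]const u8 {
        self.mutex.lock();
        defer self.mutex.unlock();
        while (self.count == 0) {
            if (self.closed) return null;
            self.ready.wait(&self.mutex);
        }
        const n = self.lens[self.head];
        @memcpy(dest[0..n], self.frames[self.head][0..n]);
        self.head = (self.head + 1) % slots;
        self.count -= 1;
        return dest[0..n];
    }

    fn close(self: *Outbox) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        self.closed = true;
        self.ready.broadcast();
    }
};

// Stand-in for the writer thread; a real one would hand each frame to the socket.
fn drain(box: *Outbox, total: *usize) void {
    var scratch: [Outbox.max_frame]u8 = undefined;
    while (box.pop(&scratch)) |frame| total.* += frame.len;
}

test "frames pushed by the broadcaster reach the writer thread" {
    var box = Outbox{};
    var total: usize = 0;
    const writer = try std.Thread.spawn(.{}, drain, .{ &box, &total });

    try std.testing.expect(box.tryPush("hello"));
    try std.testing.expect(box.tryPush("world!"));
    box.close();
    writer.join();

    try std.testing.expectEqual(@as(usize, 11), total);
    try std.testing.expect(!box.tryPush("late")); // closed outboxes refuse new frames
}
```

The design question this leaves open is what to do when tryPush returns false: drop the message, or treat a full outbox as grounds for disconnecting the client. Either way the slow socket now only ever blocks its own writer thread.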

The two helpers the handler leaned on just wrap broadcast with the codec:

// Encode a server message once, then push it to everyone (optionally skipping the origin).
fn announce(hub: *Hub, skip_id: u32, msg: ServerMsg) void {
    const frame = encodeServer(hub.alloc, msg) catch return;
    defer hub.alloc.free(frame);
    hub.broadcast(frame, skip_id);
}

// Turn "client id said text" into a chat frame carrying the sender's nickname.
fn relaySay(hub: *Hub, sender_id: u32, text: []const u8) void {
    hub.mutex.lock();
    const nick = if (hub.clients.get(sender_id)) |c| hub.alloc.dupe(u8, c.nick) catch null else null;
    hub.mutex.unlock();

    const owned = nick orelse return;
    defer hub.alloc.free(owned);
    announce(hub, sender_id, .{ .chat = .{ .nick = owned, .text = text } });
}

relaySay copies the sender's nickname under the lock and works with the copy afterwards, precisely so it isn't holding a pointer into a Client that another thread might free. Encode once, send many -- the frame is built a single time and the same bytes go to every recipient, which is both faster and the only way to guarantee everyone sees byte-identical output.

Testing the hub without a single socket

Here's the discipline that has run through this entire series since episode 12, and it's the reason the concurrency above doesn't scare me: the dangerous logic is separable from the sockets, so we test it with none. The registry -- add, assign an id, remove, don't hand out duplicate ids -- is pure state management. We can drive it with a dummy stream and assert the invariants directly.

test "hub assigns unique incrementing ids and tracks membership" {
    const alloc = std.testing.allocator;
    var hub = Hub.init(alloc);
    defer hub.deinit();

    // A real socket that is never bound or connected. An invalid handle such
    // as -1 would not do: Stream.close() on it trips `unreachable` inside
    // std.posix.close. We only exercise the registry bookkeeping here.
    const a = try hub.add("scipio", try idleStream());
    const b = try hub.add("femdev", try idleStream());
    try std.testing.expect(a != b); // no two clients share an id
    try std.testing.expectEqual(@as(usize, 2), hub.clients.count());

    hub.remove(a);
    try std.testing.expectEqual(@as(usize, 1), hub.clients.count());
    try std.testing.expect(hub.clients.get(a) == null); // gone for good
    try std.testing.expect(hub.clients.get(b) != null); // untouched
}

test "removing an unknown id is a harmless no-op" {
    const alloc = std.testing.allocator;
    var hub = Hub.init(alloc);
    defer hub.deinit();
    hub.remove(9999); // must not crash, must not corrupt the map
    try std.testing.expectEqual(@as(usize, 0), hub.clients.count());
}

// Test helper: a valid handle that never carries a byte.
fn idleStream() !net.Stream {
    const fd = try std.posix.socket(std.posix.AF.INET, std.posix.SOCK.STREAM, 0);
    return .{ .handle = fd };
}

Because the tests run under std.testing.allocator, they also prove there are no leaks: every dupe'd nickname and every create'd Client must be freed by remove/deinit, or the test fails on a detected leak. That's episode 26's allocator-as-a-testing-tool trick doing double duty -- correctness and memory hygiene from the same handful of assertions. The socket path (accept, read, write) we verify in the next stretch of the project with a real loopback connection; the logic that would actually bite us is nailed down right here, deterministically, in microseconds.

One caveat about the stream handles. It is tempting to pass net.Stream{ .handle = -1 } and call it a day, but remove and deinit both call close(), and Zig's std.posix.close treats EBADF as unreachable: in a safe build, closing an invalid descriptor panics instead of failing quietly. So the idleStream helper opens a real socket that is never bound, connected, read, or written; it exists only so close() has something legal to close. Strictly speaking the tests do ask the kernel for sockets, but no connection is ever made and no byte ever moves. If that ever felt fragile, the cleaner move is to make Client hold a small stream interface (episode 13's type erasure) so tests inject a no-op writer. For a mini-project the idle socket is pragmatic and clear; I mention the interface route because it's the seam you'd cut along the day you want to test the write path without a kernel in the loop.
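The interface route could look something like the sketch below. It is hypothetical throughout: Sink, RecordingSink, and sendToAll are names I made up for the illustration, and the real Client and Hub would need reshaping to hold a Sink instead of a net.Stream.

```zig
const std = @import("std");

// Hypothetical type-erased sink: a context pointer plus one function pointer.
const Sink = struct {
    ctx: *anyopaque,
    writeAllFn: *const fn (ctx: *anyopaque, bytes: []const u8) anyerror!void,

    fn writeAll(self: Sink, bytes: []const u8) anyerror!void {
        return self.writeAllFn(self.ctx, bytes);
    }
};

// Test double: remembers the last frame it was given and counts the calls.
const RecordingSink = struct {
    last: [64]u8 = undefined,
    last_len: usize = 0,
    calls: usize = 0,

    fn writeAllImpl(ctx: *anyopaque, bytes: []const u8) anyerror!void {
        const self: *RecordingSink = @ptrCast(@alignCast(ctx));
        if (bytes.len > self.last.len) return error.FrameTooLarge;
        @memcpy(self.last[0..bytes.len], bytes);
        self.last_len = bytes.len;
        self.calls += 1;
    }

    fn sink(self: *RecordingSink) Sink {
        return .{ .ctx = self, .writeAllFn = writeAllImpl };
    }
};

// The broadcast loop reduced to its essence: same bytes to every sink but one.
fn sendToAll(sinks: []const Sink, skip: usize, frame: []const u8) void {
    for (sinks, 0..) |s, i| {
        if (i == skip) continue;
        s.writeAll(frame) catch {}; // a failed recipient must not stop the rest
    }
}

test "broadcast reaches everyone but the sender" {
    var a = RecordingSink{};
    var b = RecordingSink{};
    var c = RecordingSink{};
    const sinks = [_]Sink{ a.sink(), b.sink(), c.sink() };

    sendToAll(&sinks, 1, "hi");

    try std.testing.expectEqual(@as(usize, 1), a.calls);
    try std.testing.expectEqual(@as(usize, 0), b.calls);
    try std.testing.expectEqual(@as(usize, 1), c.calls);
    try std.testing.expectEqualStrings("hi", a.last[0..a.last_len]);
}
```

In production the same Sink would wrap a net.Stream and forward to its writeAll, so the broadcast code would not know or care which one it is talking to.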

Performance and design considerations

A chat server's load profile is gentle -- humans type slowly and a room of a thousand is a trickle by network standards -- so the numbers that matter aren't throughput, they're the shapes of the costs. Locking granularity: one mutex over the whole registry is fine to thousands of clients because add, remove, and the nickname lookup in relaySay each hold it only for a hash operation or a pointer swap, typically well under a microsecond. Broadcast is the exception: it holds the same lock for N socket writes, which is the head-of-line cost discussed earlier. The moment profiling (episode 34) showed contention you'd shard the registry or move to a read-write lock, not before. Allocation: we copy a nickname once per client and encode each broadcast frame once per message -- no per-recipient allocation, so a say to a busy room is one small encodeServer and N socket writes, nothing more. The head-of-line cost we already dissected: acceptable for friends, upgradable to per-client queues for strangers.
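If profiling ever did point at the registry lock, the read-write variant might look roughly like this. It is a sketch under assumptions: ReadMostly is a hypothetical name, the value type is reduced to a name slice, and it targets the std.Thread.RwLock and AutoHashMapUnmanaged of Zig 0.14.

```zig
const std = @import("std");

// Hypothetical read-mostly registry; the value type is simplified to a name.
const ReadMostly = struct {
    lock: std.Thread.RwLock = .{},
    names: std.AutoHashMapUnmanaged(u32, []const u8) = .empty,

    // Writers take the exclusive side, exactly as they did with the single mutex.
    fn put(self: *ReadMostly, alloc: std.mem.Allocator, id: u32, name: []const u8) !void {
        self.lock.lock();
        defer self.lock.unlock();
        try self.names.put(alloc, id, name);
    }

    // Readers share the lock, so concurrent lookups no longer queue behind each other.
    fn contains(self: *ReadMostly, id: u32) bool {
        self.lock.lockShared();
        defer self.lock.unlockShared();
        return self.names.contains(id);
    }

    fn deinit(self: *ReadMostly, alloc: std.mem.Allocator) void {
        self.names.deinit(alloc);
    }
};

test "lookups go through the shared side of the lock" {
    const alloc = std.testing.allocator;
    var reg = ReadMostly{};
    defer reg.deinit(alloc);

    try reg.put(alloc, 1, "scipio");
    try std.testing.expect(reg.contains(1));
    try std.testing.expect(!reg.contains(2));
}
```

One caution before reaching for it: a shared lock would let two broadcasts write to the same socket at once, and interleaved writeAll calls can tear frames, so broadcast would still need exclusive access per socket.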

The design choice I'd defend hardest is keeping the socket layer and the registry layer separable. Every genuinely tricky invariant -- unique ids, no ghost clients, no double-free -- lives in code you can test without a network. That is not an accident of style; it's the whole reason a concurrent program this small can be reasoned about at all. The threads are simple because the shared state is small and every path through it is either "hold the lock and shuffle pointers" or "do I/O with no lock held" (with broadcast the one deliberate, documented exception). Complexity you can name and bound is complexity you can live with.

How this compares to C, Rust, and Go

In C, this is pthread_create per connection, a pthread_mutex_t around a hand-rolled hash table or linked list, and send() in a loop. It works and it's fast, and it's also where the CVE history of network daemons largely comes from: forget one pthread_mutex_unlock on an error path and you deadlock; forget to remove a client on disconnect and you leak a socket; free a client that another thread still holds and you get the use-after-free that only crashes in production. Zig doesn't magically prevent those, but defer and errdefer make the correct cleanup the easy cleanup, which is most of the battle.

In Go, you'd write a goroutine per connection and -- idiomatically -- avoid the shared mutex entirely by funnelling everything through a single chan: clients register, deregister, and broadcast by sending messages to one central goroutine that owns the registry outright. "Share memory by communicating." It's a genuinely lovely model and less error-prone than locks, at the cost of a scheduler and garbage collector you don't see or control. Our mutex version is doing the same job with the machinery visible and the allocations explicit -- the recurring theme of this whole arc.

In Rust, the shared registry would be an Arc<Mutex<HashMap<...>>>, and the borrow checker would refuse to compile the use-after-free I had to talk myself out of by holding the lock across broadcast -- it would force the lifetime relationship into the types. That's real safety we're holding by hand and by convention here. The async flavour (tokio) buys massive connection counts but drags in a runtime and colours every function async; for a from-scratch teaching server, threads keep the control flow linear and legible, which is the point.

Zig lands, as it keeps landing in this networking arc, in the sweet spot for understanding: a couple hundred readable lines, one mutex you can see and reason about, defer guaranteeing cleanup on every exit, the compiler forcing every message variant to be handled, and the genuinely dangerous logic tested without a socket in sight. You can hold the entire server in your head -- and that, for a concurrent program, is the rarest and most valuable property of all.

Where this is heading

Step back and look at what we built on top of last episode's contract. We gave each connection a small owned identity, put every connected client in a single mutex-guarded registry, wrote an accept loop that spawns a linear handler per client, and turned each incoming say into a broadcast that reaches everyone else -- encoding once, sending many, cleaning up automatically on every failure path. Then we proved the parts that could actually hurt us (unique ids, no ghosts, no leaks) with deterministic tests and not one real connection. The server core is done: start it, point episode 95's frames at port 9000, and messages already flow between connected clients.

What's missing is everything a human needs to enjoy it. Right now the "client" is a test harness firing byte frames; nobody wants to chat by hand-encoding tagged unions. So the next stretch builds the piece a person actually sits in front of -- reading your keystrokes and painting other people's words on your screen without the two fighting over the terminal -- and after that, the memory that lets a room remember what was said before you walked in. Both lean on tools we already own: the framed codec from episode 95, the threads and atomics from episode 30, the terminal-rendering instincts from the ECS engine (episode 58), and the hash maps from episode 22. Each is easy precisely because the server core underneath it is small, tested, and honest about its one deliberate tradeoff.

The thread through this whole networking arc hasn't moved since episode 21: bound every length, name your endianness, hold your locks for pointer-shuffling and never for I/O, and let defer guarantee that whatever goes wrong, you leave the shared state clean. We've now got a server that a room of people can actually talk through. Next time, we give them a face to talk with.

Thanks for reading, and I'll see you in the next one!

@scipio