Learn Zig Series (#62) - File Systems: Reading Directories and Metadata

What will I learn

How to open and iterate directories with std.fs.Dir and the iterator API;
How to read file metadata: size, timestamps, permissions via stat;
How to build a recursive directory walker from scratch;
How to detect, read, and follow symbolic links;
How Unix file permissions and ownership work at the syscall level;
How to create, rename, and delete files and directories safely;
How to use temporary files and atomic writes for crash safety;
How to build a practical tree command that displays directory structure.

Requirements

A working modern computer running macOS, Windows or Ubuntu;
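An installed Zig 0.14 distribution (download from ziglang.org), since the examples use 0.14-era standard library APIs such as std.io.getStdOut that later releases changed;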
The ambition to learn Zig programming.

Difficulty

Intermediate

Curriculum (of the `Learn Zig Series`):

Learn Zig Series (#62) - File Systems: Reading Directories and Metadata

We just wrapped up Project G -- the assembler, two-pass assembly, and disassembler across episodes 59, 60, and 61. We went from bit encoding to human-readable assembly and back. That was deep in the weeds of binary formats and instruction sets. Today we're switching gears entirely. We're going to work with the file system -- reading directories, inspecting file metadata, handling symlinks, and building a complete tree command by the end.

File system operations are one of those things that sound boring until you actually need them, and then suddenly everything is edge cases. Paths with spaces. Symlink loops. Permission denied on a directory you own. Files that disappear between listing and opening. Zig's standard library gives us std.fs which wraps the POSIX and Windows syscalls with proper error handling -- no silent failures, no null pointer surprises. We've used std.fs briefly in episode 10 for basic file I/O and in the shell project (episodes 47-50) for process spawning. Now we're going to really explore what it can do.

Here we go!

Opening and iterating directories

The fundamental operation: open a directory and list what's inside. In Zig, std.fs.cwd().openDir() gives you a Dir handle, and .iterate() gives you an iterator that yields entries one at a time. This is the lazy evaluation pattern we covered in episode 23 -- entries are fetched from the kernel in buffered batches as you ask for them, not all at once:

const std = @import("std");

pub fn listDirectory(dir_path: []const u8) !void {
    var dir = try std.fs.cwd().openDir(dir_path, .{ .iterate = true });
    defer dir.close();

    const stdout = std.io.getStdOut().writer();

    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        const kind_str = switch (entry.kind) {
            .file => "FILE",
            .directory => "DIR ",
            .sym_link => "LINK",
            .named_pipe => "PIPE",
            .unix_domain_socket => "SOCK",
            .block_device => "BDEV",
            .character_device => "CDEV",
            .whiteout => "WHTE",
            .door => "DOOR",
            .event_port => "EVPT",
            .unknown => "????",
        };
        try stdout.print("  {s}  {s}\n", .{ kind_str, entry.name });
    }
}

pub fn main() !void {
    const args = try std.process.argsAlloc(std.heap.page_allocator);
    defer std.process.argsFree(std.heap.page_allocator, args);

    const path = if (args.len > 1) args[1] else ".";
    try listDirectory(path);
}

A few things to notice here. The .{ .iterate = true } flag is mandatory for iteration, but it is NOT checked at compile time: iterating a Dir that was opened without it fails at runtime (on Linux the iterator hits an unreachable, which panics in safe builds). This is Zig being explicit about intentions: opening a directory for iteration can require different kernel flags than opening it only for path resolution, and Zig wants you to say what you actually need.

The entry.kind field tells you what type of file system object this is. Most of the time you'll see .file and .directory, but on Unix systems you can also find symbolic links, named pipes (FIFOs), Unix domain sockets, and device files. The .whiteout type comes from union-mount filesystems (mainly on the BSDs), where it marks an entry that was deleted in an upper layer; .door and .event_port are Solaris/illumos object types. The .unknown type means the filesystem didn't report the entry type -- some older filesystems don't include type information in directory entries, and you'd need to stat the file to find out.
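For the .unknown case, a fallback could look like the sketch below. resolveKind is a hypothetical helper of our own, not a std function, and it assumes the Zig 0.14 std.fs API used throughout this episode. Keep in mind that statFile follows symlinks, so the fallback reports the kind of a link's target.

```zig
const std = @import("std");

/// Hypothetical helper: return the entry's kind, asking the filesystem via
/// stat only when the directory entry itself didn't say (.unknown).
/// statFile follows symlinks, so a link shows up as its target's kind here.
fn resolveKind(dir: std.fs.Dir, entry: std.fs.Dir.Entry) !std.fs.File.Kind {
    if (entry.kind != .unknown) return entry.kind;
    const st = try dir.statFile(entry.name);
    return st.kind;
}
```

Inside the listing loop you would call resolveKind(dir, entry) instead of reading entry.kind directly; the extra syscall is only paid for entries the filesystem left untyped.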

One important detail: the iterator does NOT guarantee any particular ordering. You might get entries alphabetically, or by inode number, or in whatever order the filesystem stored them. If you need sorted output (like the tree command we're building later), you'll have to collect entries into a list and sort them yourself.
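A minimal sketch of the collect-then-sort approach, assuming a Dir opened with .iterate = true; sortedNames and lessThanStr are illustrative names of our own. Each name is duplicated because entry.name points into the iterator's buffer, and the caller owns (and must free) both the strings and the list.

```zig
const std = @import("std");

/// Hypothetical helper: collect every name in `dir` (opened with
/// .iterate = true) into an owned list sorted by byte order.
/// The caller frees each name and then the list.
fn sortedNames(allocator: std.mem.Allocator, dir: std.fs.Dir) !std.ArrayList([]u8) {
    var names = std.ArrayList([]u8).init(allocator);
    errdefer {
        for (names.items) |n| allocator.free(n);
        names.deinit();
    }

    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        // entry.name points into the iterator's buffer, so copy it
        const copy = try allocator.dupe(u8, entry.name);
        errdefer allocator.free(copy);
        try names.append(copy);
    }

    std.mem.sort([]u8, names.items, {}, lessThanStr);
    return names;
}

fn lessThanStr(_: void, a: []u8, b: []u8) bool {
    return std.mem.lessThan(u8, a, b);
}
```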

Reading file metadata with stat

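Knowing a file's name and type is only the start. The stat call gives you most of the rest: size, timestamps, the inode number, and (on Unix) the permission mode. Owner and hard-link count are not part of Zig's portable File.Stat; for those you have to drop down to the raw POSIX stat. Here's what the portable version gives us: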

const Stat = std.fs.File.Stat;

pub fn printFileInfo(dir: std.fs.Dir, name: []const u8, writer: anytype) !void {
    const stat = dir.statFile(name) catch |err| {
        try writer.print("  {s}: cannot stat: {}\n", .{ name, err });
        return;
    };

    const size = stat.size;
    const atime = stat.atime;
    const mtime = stat.mtime;

    // convert nanosecond timestamps to seconds
    const atime_secs = @divFloor(atime, std.time.ns_per_s);
    const mtime_secs = @divFloor(mtime, std.time.ns_per_s);

    try writer.print("  {s}:\n", .{name});
    try writer.print("    size:     {d} bytes\n", .{size});
    try writer.print("    atime:    {d} (epoch secs)\n", .{atime_secs});
    try writer.print("    mtime:    {d} (epoch secs)\n", .{mtime_secs});

    // permissions (Unix only); mask off the file-type bits above the low 12
    const mode = stat.mode;
    try writer.print("    mode:     0o{o}\n", .{mode & 0o7777});

    // decode permission bits
    const owner_r: u8 = if (mode & 0o400 != 0) 'r' else '-';
    const owner_w: u8 = if (mode & 0o200 != 0) 'w' else '-';
    const owner_x: u8 = if (mode & 0o100 != 0) 'x' else '-';
    const group_r: u8 = if (mode & 0o040 != 0) 'r' else '-';
    const group_w: u8 = if (mode & 0o020 != 0) 'w' else '-';
    const group_x: u8 = if (mode & 0o010 != 0) 'x' else '-';
    const other_r: u8 = if (mode & 0o004 != 0) 'r' else '-';
    const other_w: u8 = if (mode & 0o002 != 0) 'w' else '-';
    const other_x: u8 = if (mode & 0o001 != 0) 'x' else '-';

    try writer.print("    perms:    {c}{c}{c}{c}{c}{c}{c}{c}{c}\n", .{
        owner_r, owner_w, owner_x,
        group_r, group_w, group_x,
        other_r, other_w, other_x,
    });
}

pub fn main() !void {
    var dir = try std.fs.cwd().openDir(".", .{ .iterate = true });
    defer dir.close();

    const stdout = std.io.getStdOut().writer();

    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (entry.kind == .file) {
            try printFileInfo(dir, entry.name, stdout);
        }
    }
}

The stat call returns a Stat struct with fields for size (in bytes), access time (atime), modification time (mtime), and on Unix systems the mode field which encodes permissions. The timestamps are in nanoseconds since the Unix epoch (January 1, 1970) -- Zig uses nanosecond precision because modern filesystems support it (ext4, APFS, NTFS all track sub-second timestamps).
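Raw nanosecond counts aren't very readable. One way to turn a timestamp into a UTC calendar date is std.time.epoch, sketched below; formatUtc is a hypothetical helper that rejects pre-1970 timestamps for simplicity and ignores time zones entirely.

```zig
const std = @import("std");

/// Hypothetical helper: format a nanosecond timestamp (as found in
/// File.Stat.atime/mtime/ctime) as "YYYY-MM-DD HH:MM:SS" in UTC.
fn formatUtc(ns: i128, buf: []u8) ![]u8 {
    const secs = @divFloor(ns, std.time.ns_per_s);
    if (secs < 0) return error.BeforeEpoch; // keep the sketch simple

    const es = std.time.epoch.EpochSeconds{ .secs = @intCast(secs) };
    const year_day = es.getEpochDay().calculateYearDay(); // year + day of year
    const month_day = year_day.calculateMonthDay(); // month + 0-based day
    const day_secs = es.getDaySeconds(); // seconds since midnight

    return std.fmt.bufPrint(buf, "{d:0>4}-{d:0>2}-{d:0>2} {d:0>2}:{d:0>2}:{d:0>2}", .{
        year_day.year,
        month_day.month.numeric(),
        @as(u8, month_day.day_index) + 1,
        day_secs.getHoursIntoDay(),
        day_secs.getMinutesIntoHour(),
        day_secs.getSecondsIntoMinute(),
    });
}
```

Calling formatUtc(stat.mtime, &buf) with a small stack buffer would print a file's modification time in that YYYY-MM-DD HH:MM:SS form.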

The permission bit decoding is classic Unix. The low 12 bits of the mode hold the permissions: the bottom 9 encode read/write/execute for owner, group, and others, and the 3 above them are the special setuid/setgid/sticky bits. The bits higher still encode the file type, which is why the raw mode of a regular file prints as 0o100644 unless you mask it with 0o7777 first. 0o644 means owner can read+write, group and others can only read -- the default for most files. 0o755 adds execute permission for everyone -- typical for directories and executable binaries. We use octal formatting ({o}) because that's how permissions are traditionally displayed on Unix. The 0o prefix in Zig's number literals is the octal equivalent of 0x for hex.

Having said that, this code is Unix-specific. On Windows, stat.mode does not carry Unix rwx bits at all -- Windows uses ACLs (Access Control Lists) instead of the Unix rwx model. Cross-platform file permission handling is one of those areas where you basically need an if (builtin.os.tag == .windows) branch. Zig's standard library abstracts most file operations, but permissions are inherently OS-specific.
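Here is a sketch of such a branch, folded into a loop-based version of the rwx decoding. permString is a hypothetical helper; it takes the mode as a u32 and, on Windows, returns a placeholder instead of pretending rwx bits exist. Bits above the low 9 (special bits and file type) are simply never looked at.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: render the low 9 permission bits of `mode` as an
/// "rwxr-xr-x" style string in `buf`. Windows has no rwx bits (it uses ACLs),
/// so there we return a placeholder instead.
fn permString(mode: u32, buf: *[9]u8) []const u8 {
    if (builtin.os.tag == .windows) {
        return "(ACL)";
    } else {
        const letters = "rwxrwxrwx";
        for (buf, 0..) |*c, i| {
            // position 0 tests 0o400, position 8 tests 0o001
            const bit = @as(u32, 1) << @intCast(8 - i);
            c.* = if (mode & bit != 0) letters[i] else '-';
        }
        return buf;
    }
}
```

The comptime-known condition means the non-Windows branch is never compiled for Windows targets, so the bit twiddling can't cause trouble there.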

Building a recursive directory walker

Listing a single directory is easy. Walking an entire tree recursively is where things get interesting. We need to handle nested directories, track our depth (to avoid infinite recursion from symlink loops), and accumulate results:

const WalkEntry = struct {
    path: []const u8,
    name: []const u8,
    kind: std.fs.Dir.Entry.Kind,
    depth: usize,
};

const DirectoryWalker = struct {
    allocator: std.mem.Allocator,
    entries: std.ArrayList(WalkEntry),
    max_depth: usize,

    fn init(allocator: std.mem.Allocator, max_depth: usize) DirectoryWalker {
        return .{
            .allocator = allocator,
            .entries = std.ArrayList(WalkEntry).init(allocator),
            .max_depth = max_depth,
        };
    }

    fn deinit(self: *DirectoryWalker) void {
        // both strings in every entry were allocated by the walker
        for (self.entries.items) |entry| {
            self.allocator.free(entry.path);
            self.allocator.free(entry.name);
        }
        self.entries.deinit();
    }

    fn walk(self: *DirectoryWalker, base_path: []const u8) !void {
        try self.walkRecursive(base_path, 0);
    }

    fn walkRecursive(self: *DirectoryWalker, dir_path: []const u8, depth: usize) !void {
        if (depth > self.max_depth) return;

        // permission denied, vanished directory, etc -- skip silently
        var dir = std.fs.cwd().openDir(dir_path, .{ .iterate = true }) catch return;
        defer dir.close();

        var iter = dir.iterate();
        while (try iter.next()) |entry| {
            // reserve the list slot first, so the append below cannot fail
            // after the strings are allocated (which would leak them)
            try self.entries.ensureUnusedCapacity(1);

            // build full path (a fresh allocation we own)
            const full_path = try std.fs.path.join(self.allocator, &.{ dir_path, entry.name });

            // copy name (entry.name is only valid until the next iter.next())
            const name_copy = self.allocator.dupe(u8, entry.name) catch |err| {
                self.allocator.free(full_path);
                return err;
            };

            self.entries.appendAssumeCapacity(.{
                .path = full_path,
                .name = name_copy,
                .kind = entry.kind,
                .depth = depth,
            });

            // recurse into subdirectories
            if (entry.kind == .directory) {
                try self.walkRecursive(full_path, depth + 1);
            }
        }
    }
};

test "walker finds files in nested directories" {
    const allocator = std.testing.allocator;

    // create a temp directory structure for testing
    var tmp_dir = std.testing.tmpDir(.{});
    defer tmp_dir.cleanup();

    // create some files and subdirs
    try tmp_dir.dir.writeFile(.{ .sub_path = "file1.txt", .data = "hello" });
    try tmp_dir.dir.makeDir("subdir");
    var sub = try tmp_dir.dir.openDir("subdir", .{});
    defer sub.close();
    try sub.writeFile(.{ .sub_path = "file2.txt", .data = "world" });

    // get the temp dir path
    const tmp_path = try tmp_dir.dir.realpathAlloc(allocator, ".");
    defer allocator.free(tmp_path);

    var walker = DirectoryWalker.init(allocator, 10);
    defer walker.deinit();

    try walker.walk(tmp_path);

    // should find at least: file1.txt, subdir, subdir/file2.txt
    try std.testing.expect(walker.entries.items.len >= 3);

    // verify we found file1.txt
    var found_file1 = false;
    for (walker.entries.items) |entry| {
        if (std.mem.eql(u8, entry.name, "file1.txt")) {
            found_file1 = true;
            try std.testing.expectEqual(std.fs.Dir.Entry.Kind.file, entry.kind);
            try std.testing.expectEqual(@as(usize, 0), entry.depth);
        }
    }
    try std.testing.expect(found_file1);
}

The key decision here is storing owned copies of the strings. During iteration, entry.name is a slice into the iterator's internal buffer -- it's only valid until the next iter.next() call. If you store it and use it later, you'll get garbage or a crash. We must dupe (duplicate) the name into our own memory. The full path needs no extra copy: std.fs.path.join already allocates a new string that we own.

The max_depth parameter bounds the recursion. Note that this particular walker never follows symbolic links: the iterator reports a link to a directory as .sym_link, and we only recurse on .directory. The depth limit becomes essential once you do follow links, because if directory A contains a symlink to directory B, and B contains a symlink back to A, a naive link-following walker would recurse forever. The depth limit is simple brute-force protection. A more sophisticated approach would track visited inodes (device number + inode number pairs uniquely identify filesystem objects), but the depth limit works for most practical cases.
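A minimal sketch of the visited-set idea, with one loud simplification: Zig's portable File.Stat exposes the inode but not the device number, so this version keys on the inode alone and could confuse two directories on different filesystems. VisitedSet is a hypothetical type; a walker would call firstVisit on each directory handle right after opening it and skip the directory when it returns false.

```zig
const std = @import("std");

/// Hypothetical helper: remembers which directories (by inode) a walk has
/// already entered. Simplification: inode only, no device number.
const VisitedSet = struct {
    seen: std.AutoHashMap(std.fs.File.INode, void),

    fn init(allocator: std.mem.Allocator) VisitedSet {
        return .{ .seen = std.AutoHashMap(std.fs.File.INode, void).init(allocator) };
    }

    fn deinit(self: *VisitedSet) void {
        self.seen.deinit();
    }

    /// Returns true the first time a directory is seen, false on repeats.
    fn firstVisit(self: *VisitedSet, dir: std.fs.Dir) !bool {
        const st = try dir.stat();
        const gop = try self.seen.getOrPut(st.inode);
        return !gop.found_existing;
    }
};
```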

We also silently skip directories we can't open. On a real system you'll hit "Permission denied" on /root, /proc/1, and other protected directories. Crashing the entire walk because of one inaccessible directory would be useless. The catch converts the error into a no-op -- the entry still appears in our results (we added it before trying to recurse), but its children don't.

Symlinks: detecting, reading, and following

Symbolic links are filesystem entries that point to another path. They're a bit like Windows shortcut files, except that the kernel resolves them rather than the file manager. The tricky part is that most filesystem operations follow symlinks transparently -- stat on a symlink gives you info about the target, not the link itself. To inspect the link itself you need lstat, and to read where it points you need readlink:

pub fn inspectSymlinks(dir_path: []const u8) !void {
    var dir = try std.fs.cwd().openDir(dir_path, .{ .iterate = true });
    defer dir.close();

    const stdout = std.io.getStdOut().writer();

    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (entry.kind == .sym_link) {
            // read the symlink target
            var buf: [std.fs.max_path_bytes]u8 = undefined;
            const target = dir.readLink(entry.name, &buf) catch |err| {
                try stdout.print("  {s} -> (unreadable: {})\n", .{ entry.name, err });
                continue;
            };

            // check if the target actually exists
            const target_exists = blk: {
                dir.access(entry.name, .{}) catch {
                    break :blk false;
                };
                break :blk true;
            };

            const status = if (target_exists) "OK" else "BROKEN";
            try stdout.print("  {s} -> {s} [{s}]\n", .{ entry.name, target, status });
        }
    }
}

pub fn createAndInspectSymlink(allocator: std.mem.Allocator) !void {
    const stdout = std.io.getStdOut().writer();

    // create a temp file to link to
    const tmp = try std.fs.cwd().createFile("_test_target.txt", .{});
    tmp.close();
    defer std.fs.cwd().deleteFile("_test_target.txt") catch {};

    // create a symlink
    try std.fs.cwd().symLink("_test_target.txt", "_test_link.txt", .{});
    defer std.fs.cwd().deleteFile("_test_link.txt") catch {};

    // read the link target
    var buf: [std.fs.max_path_bytes]u8 = undefined;
    const target = try std.fs.cwd().readLink("_test_link.txt", &buf);
    try stdout.print("symlink: _test_link.txt -> {s}\n", .{target});

    // stat follows the link (gives info about target)
    const stat = try std.fs.cwd().statFile("_test_link.txt");
    try stdout.print("stat size (target): {d}\n", .{stat.size});

    _ = allocator;
}

The readLink function takes a buffer and returns a slice into it -- the symlink target path. Note that symlink targets can be relative or absolute. A relative symlink like ../lib/libfoo.so is resolved relative to the directory containing the link, not the current working directory. This distinction catches people all the time.
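Because relative targets are interpreted from the link's own directory, turning one into a usable path means joining it onto the link's dirname, not onto the cwd. A sketch, with resolveLinkTarget as a hypothetical helper: it relies on std.fs.path.resolve, which works purely on the strings, so it can disagree with the kernel when an intermediate directory is itself a symlink.

```zig
const std = @import("std");

/// Hypothetical helper: interpret a symlink's stored `target` the way the
/// kernel starts to, i.e. relative to the directory containing `link_path`.
/// Absolute targets pass through unchanged. Purely lexical: nothing on disk
/// is consulted. The caller frees the returned string.
fn resolveLinkTarget(allocator: std.mem.Allocator, link_path: []const u8, target: []const u8) ![]u8 {
    const link_dir = std.fs.path.dirname(link_path) orelse ".";
    return std.fs.path.resolve(allocator, &.{ link_dir, target });
}
```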

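The "broken symlink" check uses access, which follows the link and checks whether the final target exists. If the target was deleted or moved, the symlink is "dangling" -- it still exists as a directory entry but points nowhere. With colored output enabled, ls -la typically highlights these (red in the common default color scheme). Our code labels them BROKEN in the output; note that it also says BROKEN when access fails for another reason, such as a permission error along the target's path.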
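The same check can be packaged as a small predicate. isDangling is a hypothetical helper; it treats only error.FileNotFound from access as dangling and passes every other error back to the caller instead of guessing.

```zig
const std = @import("std");

/// Hypothetical helper: a link is dangling when the link itself can be read
/// but access, which follows it, can't find the target.
fn isDangling(dir: std.fs.Dir, name: []const u8) !bool {
    var buf: [std.fs.max_path_bytes]u8 = undefined;
    // fails (e.g. error.NotLink) if `name` isn't a symlink at all
    _ = try dir.readLink(name, &buf);
    dir.access(name, .{}) catch |err| switch (err) {
        error.FileNotFound => return true,
        else => return err,
    };
    return false;
}
```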

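One thing std.fs.Dir does NOT provide is an lstat-style method -- a stat call that returns info about the symlink itself rather than its target (statFile always follows the link). For the common question "is this entry a link?", the directory iterator already has the answer, since entry.kind reports .sym_link without following anything. For the full stat of the link itself on POSIX systems, you can drop down to std.posix.fstatat with the AT.SYMLINK_NOFOLLOW flag, the *at equivalent of lstat. For most applications you don't need it, but if you're writing a backup tool that needs to preserve symlinks as-is (rather than following them), it matters.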
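To see the two views of one symlink side by side, this sketch compares the kind the iterator reports for an entry with the kind statFile reports after following it. entryKinds and Kinds are hypothetical names, and scanning a whole directory to find one entry is wasteful; it's only there to make the contrast visible.

```zig
const std = @import("std");

const Kinds = struct {
    own: std.fs.File.Kind, // what the directory entry itself is (not followed)
    resolved: std.fs.File.Kind, // what statFile reports (link followed)
};

/// Hypothetical helper: `dir` must have been opened with .iterate = true.
fn entryKinds(dir: std.fs.Dir, name: []const u8) !Kinds {
    var own: std.fs.File.Kind = .unknown;
    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (std.mem.eql(u8, entry.name, name)) {
            own = entry.kind;
            break;
        }
    }
    const st = try dir.statFile(name);
    return .{ .own = own, .resolved = st.kind };
}
```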

File system permissions and ownership

On Unix systems, every file has an owner (user ID), a group (group ID), and a 12-bit permission mode. The bottom 9 bits are the familiar rwxrwxrwx. The top 3 bits are the setuid, setgid, and sticky bits -- special permissions that affect execution behavior:

const PermissionInfo = struct {
    mode: u32,
    readable: bool,
    writable: bool,
    executable: bool,

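    // each flag is true if ANY of owner/group/other has that bit set; it
    // summarizes the mode, it does not check what the current user may do
    fn fromMode(mode: u32) PermissionInfo {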
        return .{
            .mode = mode,
            .readable = (mode & 0o444) != 0,
            .writable = (mode & 0o222) != 0,
            .executable = (mode & 0o111) != 0,
        };
    }

    fn formatMode(mode: u32, buf: *[10]u8) []const u8 {
        buf[0] = if (mode & 0o400 != 0) 'r' else '-';
        buf[1] = if (mode & 0o200 != 0) 'w' else '-';
        buf[2] = if (mode & 0o100 != 0) 'x' else '-';
        buf[3] = if (mode & 0o040 != 0) 'r' else '-';
        buf[4] = if (mode & 0o020 != 0) 'w' else '-';
        buf[5] = if (mode & 0o010 != 0) 'x' else '-';
        buf[6] = if (mode & 0o004 != 0) 'r' else '-';
        buf[7] = if (mode & 0o002 != 0) 'w' else '-';
        buf[8] = if (mode & 0o001 != 0) 'x' else '-';
        buf[9] = 0;
        return buf[0..9];
    }
};

pub fn checkPermissions(path: []const u8) !void {
    const stdout = std.io.getStdOut().writer();
    const stat = try std.fs.cwd().statFile(path);
    const info = PermissionInfo.fromMode(stat.mode);

    var buf: [10]u8 = undefined;
    const perms_str = PermissionInfo.formatMode(stat.mode, &buf);

    try stdout.print("{s}: {s} (0o{o})\n", .{ path, perms_str, stat.mode & 0o7777 });
    try stdout.print("  readable:   {}\n", .{info.readable});
    try stdout.print("  writable:   {}\n", .{info.writable});
    try stdout.print("  executable: {}\n", .{info.executable});

    // check for special bits
    if (stat.mode & 0o4000 != 0) try stdout.print("  SETUID bit set\n", .{});
    if (stat.mode & 0o2000 != 0) try stdout.print("  SETGID bit set\n", .{});
    if (stat.mode & 0o1000 != 0) try stdout.print("  STICKY bit set\n", .{});
}

The setuid bit (0o4000) is one of those Unix mechanisms that's both incredibly useful and terrifyingly dangerous. When set on an executable, it runs as the file's owner rather than the user who launched it. That's how passwd works -- it needs to write to /etc/shadow which is owned by root, so the passwd binary has setuid root. But a setuid program with a buffer overflow is an instant privilege escalation vulnerability. If you've been following the ethical hacking series, you know this is one of the first things pentesters check on a Linux system.

The sticky bit (0o1000) on a directory means a file inside it can only be deleted or renamed by the file's owner, the directory's owner, or root. That's why /tmp has permissions drwxrwxrwt -- the t at the end is the sticky bit. Without it, anyone with write access to /tmp could delete anyone else's temporary files.
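Zig's portable File.Stat has no owner fields, so reading the user and group IDs means going one level down. A POSIX-only sketch, assuming std.posix.fstat on an open file as in Zig 0.14; ownership and Ownership are hypothetical names, and the @intCast is there because the integer type of nlink varies between platforms.

```zig
const std = @import("std");

/// Hypothetical result type for the sketch below.
const Ownership = struct {
    uid: u32,
    gid: u32,
    nlink: u64,
};

/// POSIX-only sketch: open `name` and read owner, group, and hard-link
/// count from the raw fstat result, which File.Stat does not expose.
fn ownership(dir: std.fs.Dir, name: []const u8) !Ownership {
    const file = try dir.openFile(name, .{});
    defer file.close();
    const st = try std.posix.fstat(file.handle);
    return .{
        .uid = st.uid,
        .gid = st.gid,
        .nlink = @intCast(st.nlink), // nlink's width differs between platforms
    };
}
```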

For our purposes, reading permissions is more useful than setting them. Zig provides File.chmod for changing the mode of an open file, but you generally want to create files with the right permissions from the start (the flags passed to createFile have a .mode field for this) rather than creating them with default permissions and then fixing them afterward.
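A short sketch of creating a file with restrictive permissions up front, using the .mode field of File.CreateFlags (POSIX; the process umask can still clear bits but never adds any). createPrivateFile is a hypothetical helper; .exclusive = true makes it fail with error.PathAlreadyExists rather than silently reuse an existing file, whose mode it would not change.

```zig
const std = @import("std");

/// Hypothetical helper: create `name` readable and writable by the owner
/// only (0o600) and write `data` into it. Fails if the file already exists.
fn createPrivateFile(dir: std.fs.Dir, name: []const u8, data: []const u8) !void {
    const file = try dir.createFile(name, .{ .mode = 0o600, .exclusive = true });
    defer file.close();
    try file.writeAll(data);
}
```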

Creating, renaming, and deleting files and directories

The CRUD operations of the file system. These are straightforward individually but have interesting failure modes when combined:

pub fn fileSystemOperations(allocator: std.mem.Allocator) !void {
    const stdout = std.io.getStdOut().writer();
    _ = allocator;

    // create a directory
    std.fs.cwd().makeDir("test_project") catch |err| switch (err) {
        error.PathAlreadyExists => {
            try stdout.print("Directory already exists, continuing...\n", .{});
        },
        else => return err,
    };
    defer std.fs.cwd().deleteDir("test_project") catch {};

    // create a nested directory structure (mkdir -p equivalent)
    try std.fs.cwd().makePath("test_project/src/utils");
    defer {
        // cleanup in reverse order
        std.fs.cwd().deleteDir("test_project/src/utils") catch {};
        std.fs.cwd().deleteDir("test_project/src") catch {};
    }

    // create a file inside
    {
        const file = try std.fs.cwd().createFile("test_project/src/main.zig", .{});
        defer file.close();
        try file.writeAll("const std = @import(\"std\");\n");
    }
    defer std.fs.cwd().deleteFile("test_project/src/main.zig") catch {};

    // verify it exists
    const stat = try std.fs.cwd().statFile("test_project/src/main.zig");
    try stdout.print("Created file: {d} bytes\n", .{stat.size});

    // rename (move) the file
    try std.fs.cwd().rename("test_project/src/main.zig", "test_project/src/app.zig");

    // verify rename worked
    std.fs.cwd().access("test_project/src/main.zig", .{}) catch {
        try stdout.print("main.zig no longer exists (good)\n", .{});
    };
    try std.fs.cwd().access("test_project/src/app.zig", .{});
    try stdout.print("app.zig exists (rename worked)\n", .{});

    // cleanup the renamed file
    try std.fs.cwd().deleteFile("test_project/src/app.zig");
}

test "create and delete directory tree" {
    var dir = std.testing.tmpDir(.{}); // var: cleanup() takes a *TmpDir
    defer dir.cleanup();

    // create nested structure
    try dir.dir.makePath("a/b/c");
    try dir.dir.writeFile(.{ .sub_path = "a/b/c/file.txt", .data = "test" });

    // verify it exists
    const stat = try dir.dir.statFile("a/b/c/file.txt");
    try std.testing.expectEqual(@as(u64, 4), stat.size);

    // delete leaf file
    try dir.dir.deleteFile("a/b/c/file.txt");

    // now we can delete the empty directories
    try dir.dir.deleteDir("a/b/c");
    try dir.dir.deleteDir("a/b");
    try dir.dir.deleteDir("a");
}

Several things to note. First, makeDir fails if the directory already exists -- that's by design, because "create this directory" and "ensure this directory exists" are different operations. The makePath function is the mkdir -p equivalent: it creates all intermediate directories and doesn't fail if any already exist.

Second, deleteDir only works on EMPTY directories. If there are files or subdirectories inside, it returns error.DirNotEmpty. To remove a whole tree you either walk it bottom-up yourself (files first, then empty directories in reverse depth order, as the test above does) or call Dir.deleteTree, the rm -rf equivalent. Keeping the recursive delete behind a separate, explicitly named function is intentional -- accidentally deleting a non-empty directory tree is the kind of mistake that ends careers.
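A test-style sketch of the two behaviors side by side, using the testing temp directory so nothing outside it is touched:

```zig
const std = @import("std");

test "deleteDir refuses non-empty trees, deleteTree removes them" {
    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    try tmp.dir.makePath("proj/src/utils");
    try tmp.dir.writeFile(.{ .sub_path = "proj/src/main.zig", .data = "// hi\n" });

    // deleteDir only removes empty directories
    try std.testing.expectError(error.DirNotEmpty, tmp.dir.deleteDir("proj"));

    // deleteTree is the recursive, explicitly named alternative
    try tmp.dir.deleteTree("proj");
    try std.testing.expectError(error.FileNotFound, tmp.dir.access("proj", .{}));
}
```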

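Third, rename is atomic on POSIX filesystems, as long as the old and new paths are on the same filesystem (across mount points it fails with error.RenameAcrossMountPoints instead of copying). If your program crashes between calling rename and the function returning, the file will be at either the old path or the new path, never in a half-moved state, and an existing file at the new path is replaced in that same atomic step. This atomicity is what makes rename useful for safe file updates (which we'll use in the next section).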

Temporary files and atomic writes

When writing important data, you don't want a crash to leave you with a half-written file. The solution: write to a temporary file first, then atomically rename it over the target. This way the target file is either the old version or the new version, never a corrupted mix:

const AtomicWriter = struct {
    allocator: std.mem.Allocator,
    target_path: []const u8,
    tmp_path: []const u8,
    file: std.fs.File,

    fn init(allocator: std.mem.Allocator, target_path: []const u8) !AtomicWriter {
        // create temp file path by appending .tmp suffix
        const tmp_path = try std.fmt.allocPrint(allocator, "{s}.tmp", .{target_path});
        errdefer allocator.free(tmp_path);

        const file = try std.fs.cwd().createFile(tmp_path, .{});

        return .{
            .allocator = allocator,
            .target_path = target_path,
            .tmp_path = tmp_path,
            .file = file,
        };
    }

    fn write(self: *AtomicWriter, data: []const u8) !void {
        try self.file.writeAll(data);
    }

    fn commit(self: *AtomicWriter) !void {
        // force the data to disk first; close() alone does not do this
        self.file.sync() catch |err| {
            self.abort();
            return err;
        };
        self.file.close();
        defer self.allocator.free(self.tmp_path);

        // atomically replace target with temp file
        std.fs.cwd().rename(self.tmp_path, self.target_path) catch |err| {
            // if rename fails, try to clean up temp file
            std.fs.cwd().deleteFile(self.tmp_path) catch {};
            return err;
        };
    }

    fn abort(self: *AtomicWriter) void {
        self.file.close();
        std.fs.cwd().deleteFile(self.tmp_path) catch {};
        self.allocator.free(self.tmp_path);
    }
};

test "atomic write produces complete file or nothing" {
    const allocator = std.testing.allocator;

    // write atomically
    var writer = try AtomicWriter.init(allocator, "/tmp/test_atomic.txt");

    try writer.write("line 1\n");
    try writer.write("line 2\n");
    try writer.write("line 3\n");
    try writer.commit();

    // verify the file exists and has the right content
    const content = try std.fs.cwd().readFileAlloc(allocator, "/tmp/test_atomic.txt", 1024);
    defer allocator.free(content);

    try std.testing.expectEqualStrings("line 1\nline 2\nline 3\n", content);

    // cleanup
    try std.fs.cwd().deleteFile("/tmp/test_atomic.txt");
}

test "atomic write abort leaves no file" {
    const allocator = std.testing.allocator;

    // start writing but abort
    var writer = try AtomicWriter.init(allocator, "/tmp/test_abort.txt");
    try writer.write("partial data");
    writer.abort();

    // verify target file does NOT exist
    std.fs.cwd().access("/tmp/test_abort.txt", .{}) catch |err| {
        try std.testing.expectEqual(error.FileNotFound, err);
        return;
    };
    // if we get here, the file exists (bad)
    try std.fs.cwd().deleteFile("/tmp/test_abort.txt");
    return error.TestFailed;
}

The commit function is the critical piece. It first calls sync, which asks the OS to push the temporary file's data from its cache onto the disk (closing a file does not do that), then closes the file and renames it over the target. The rename is atomic on POSIX filesystems -- either it happens entirely or not at all. If the program crashes during writeAll, the temp file has partial data but the target file is untouched. If the program crashes during rename, the kernel still guarantees that the target name refers to either the old file or the new one. To make the rename itself survive a power loss, strict implementations also fsync the directory that contains the file.

The abort function is for when something goes wrong during the write. It closes and deletes the temp file, leaving the filesystem exactly as it was before init was called. This is the error recovery path.

Real applications use this pattern everywhere: text editors, configuration managers -- anything where losing data is unacceptable. Databases attack the same problem with journaling; SQLite, for example, offers a write-ahead log mode (we built a write-ahead log in Zig back in episode 41) for exactly this kind of crash safety. The atomic rename approach is simpler and works well for files that are written in their entirety (configs, serialized state, generated output).
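Strictly durable replacement needs three steps: fsync the temporary file, rename it over the target, then fsync the containing directory so the rename itself reaches the disk. A POSIX-only sketch follows; writeFileDurably is a hypothetical helper, and the caller picks tmp_name, which must live in the same directory so the rename never crosses a mount point. It opens a fresh handle with .iterate = true for the directory fsync, on the assumption that a non-iterable handle may not be a full descriptor (on Linux, Zig can open those with O_PATH, which fsync rejects).

```zig
const std = @import("std");

/// Hypothetical helper: replace `dest` in `dir` with `data` so that readers
/// see either the old or the new content, and the new content survives
/// power loss once this returns.
fn writeFileDurably(dir: std.fs.Dir, dest: []const u8, tmp_name: []const u8, data: []const u8) !void {
    {
        const file = try dir.createFile(tmp_name, .{});
        defer file.close();
        errdefer dir.deleteFile(tmp_name) catch {};
        try file.writeAll(data);
        try file.sync(); // 1. flush the file's bytes to disk
    }

    // 2. atomically swap the new file into place
    dir.rename(tmp_name, dest) catch |err| {
        dir.deleteFile(tmp_name) catch {};
        return err;
    };

    // 3. fsync the directory so the rename itself is durable
    var d = try dir.openDir(".", .{ .iterate = true });
    defer d.close();
    try std.posix.fsync(d.fd);
}
```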

Practical example: building a tree command

Time to put it all together. The tree command is a classic Unix utility that displays directory structure as an indented tree. Our version will show file types, sizes, and handle the box-drawing characters that make tree output look nice:

const TreeConfig = struct {
    show_hidden: bool = false,
    show_size: bool = false,
    max_depth: usize = 10,
    dirs_only: bool = false,
};

const TreeStats = struct {
    dirs: usize = 0,
    files: usize = 0,
    symlinks: usize = 0,
    total_size: u64 = 0,
};

fn printTree(
    dir: std.fs.Dir,
    prefix: []const u8,
    config: TreeConfig,
    stats: *TreeStats,
    depth: usize,
    allocator: std.mem.Allocator,
    writer: anytype,
) !void {
    if (depth >= config.max_depth) return;

    // collect and sort entries
    var entries = std.ArrayList(std.fs.Dir.Entry).init(allocator);
    defer entries.deinit();

    var names = std.ArrayList([]const u8).init(allocator);
    defer {
        for (names.items) |n| allocator.free(n);
        names.deinit();
    }

    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (!config.show_hidden and entry.name[0] == '.') continue;
        if (config.dirs_only and entry.kind != .directory) continue;

        // copy name since iterator reuses buffer
        const name_copy = try allocator.dupe(u8, entry.name);
        try names.append(name_copy);
        try entries.append(.{
            .name = names.items[names.items.len - 1],
            .kind = entry.kind,
        });
    }

    // sort entries alphabetically (directories first, then files)
    const SortCtx = struct {
        fn lessThan(_: void, a: std.fs.Dir.Entry, b: std.fs.Dir.Entry) bool {
            // directories before files
            if (a.kind == .directory and b.kind != .directory) return true;
            if (a.kind != .directory and b.kind == .directory) return false;
            // alphabetical within same type
            return std.mem.lessThan(u8, a.name, b.name);
        }
    };
    std.mem.sort(std.fs.Dir.Entry, entries.items, {}, SortCtx.lessThan);

    for (entries.items, 0..) |entry, i| {
        const is_last = (i == entries.items.len - 1);
        const connector = if (is_last) "\xe2\x94\x94\xe2\x94\x80\xe2\x94\x80 " else "\xe2\x94\x9c\xe2\x94\x80\xe2\x94\x80 ";
        const extension = if (is_last) "    " else "\xe2\x94\x82   ";

        // print the entry
        try writer.print("{s}{s}", .{ prefix, connector });

        switch (entry.kind) {
            .directory => {
                try writer.print("{s}/\n", .{entry.name});
                stats.dirs += 1;

                // recurse
                const new_prefix = try std.fmt.allocPrint(allocator, "{s}{s}", .{ prefix, extension });
                defer allocator.free(new_prefix);

                var sub = dir.openDir(entry.name, .{ .iterate = true }) catch {
                    continue; // skip inaccessible directories
                };
                defer sub.close();

                try printTree(sub, new_prefix, config, stats, depth + 1, allocator, writer);
            },
            .sym_link => {
                var buf: [std.fs.max_path_bytes]u8 = undefined;
                const target = dir.readLink(entry.name, &buf) catch "???";
                try writer.print("{s} -> {s}\n", .{ entry.name, target });
                stats.symlinks += 1;
            },
            else => {
                if (config.show_size) {
                    const stat = dir.statFile(entry.name) catch null;
                    if (stat) |s| {
                        stats.total_size += s.size;
                        try writer.print("{s} [{d} bytes]\n", .{ entry.name, s.size });
                    } else {
                        try writer.print("{s}\n", .{entry.name});
                    }
                } else {
                    try writer.print("{s}\n", .{entry.name});
                }
                stats.files += 1;
            },
        }
    }
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer {
        const check = gpa.deinit();
        if (check == .leak) @panic("memory leak detected");
    }
    const allocator = gpa.allocator();

    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    // parse flags; any non-flag argument is the root directory (last one wins)
    var root_path: []const u8 = ".";
    var config = TreeConfig{};
    for (args[1..]) |arg| {
        if (std.mem.eql(u8, arg, "-a")) {
            config.show_hidden = true;
        } else if (std.mem.eql(u8, arg, "-s")) {
            config.show_size = true;
        } else if (std.mem.eql(u8, arg, "-d")) {
            config.dirs_only = true;
        } else {
            root_path = arg;
        }
    }

    const stdout = std.io.getStdOut().writer();
    try stdout.print("{s}\n", .{root_path});

    var dir = try std.fs.cwd().openDir(root_path, .{ .iterate = true });
    defer dir.close();

    var stats = TreeStats{};
    try printTree(dir, "", config, &stats, 0, allocator, stdout);

    try stdout.print("\n{d} directories, {d} files", .{ stats.dirs, stats.files });
    if (stats.symlinks > 0) {
        try stdout.print(", {d} symlinks", .{stats.symlinks});
    }
    if (config.show_size) {
        try stdout.print(", {d} bytes total", .{stats.total_size});
    }
    try stdout.print("\n", .{});
}

The tree-drawing characters (those Unicode box-drawing symbols) are the visual magic. The connector is either a T-junction (├──) for middle entries or an L-junction (└──) for the last entry. The extension passed down to an entry's children is either a vertical bar (│) when the entry still has siblings below it, or plain spaces when it is the last one. Stacking these prefixes level by level is what creates the tree structure in the output. The characters are written as hex byte escapes because Zig string literals are just arrays of UTF-8 bytes: \xe2\x94\x82 is the three-byte UTF-8 encoding of U+2502, the box-drawing vertical line. A \u{2502} escape in the literal would produce exactly the same bytes, so which form you use is a matter of taste.
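To make the connector/extension logic concrete, here is a minimal standalone sketch. The names tee, elbow, pipe, blank, connector, extension and childPrefix are hypothetical and are not necessarily what printTree uses. It assumes Zig 0.14's std.mem.concat and std.io.getStdOut, and it prints a hardcoded two-level tree instead of reading the file system.

```zig
const std = @import("std");

// Box-drawing pieces as raw UTF-8 byte escapes (hypothetical names).
const tee = "\xe2\x94\x9c\xe2\x94\x80\xe2\x94\x80 "; // "├── " for middle entries
const elbow = "\xe2\x94\x94\xe2\x94\x80\xe2\x94\x80 "; // "└── " for the last entry
const pipe = "\xe2\x94\x82   "; // "│   " when siblings follow below
const blank = "    "; // plain indentation under the last entry

/// Connector drawn directly before an entry's name.
fn connector(is_last: bool) []const u8 {
    return if (is_last) elbow else tee;
}

/// Extra indentation that this entry's children inherit.
fn extension(is_last: bool) []const u8 {
    return if (is_last) blank else pipe;
}

/// Builds the prefix handed down to a subdirectory's children.
/// The caller owns (and must free) the returned slice.
fn childPrefix(allocator: std.mem.Allocator, prefix: []const u8, is_last: bool) ![]u8 {
    return std.mem.concat(allocator, u8, &[_][]const u8{ prefix, extension(is_last) });
}

pub fn main() !void {
    const stdout = std.io.getStdOut().writer();
    // "src" has a sibling after it, so its child is indented with the pipe.
    try stdout.print("{s}src\n", .{connector(false)});
    try stdout.print("{s}{s}main.zig\n", .{ extension(false), connector(true) });
    try stdout.print("{s}build.zig\n", .{connector(true)});
}
```

Running it prints a small three-line tree: "src" with "main.zig" nested under it, followed by "build.zig" as the last entry. The prefix for each level is just the parent's prefix plus one four-column extension. That is why a single string threaded through the recursion is enough.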

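The sorting puts directories before files, then sorts alphabetically within each group. That's what tree --dirsfirst and most file managers do: directories at the top, files at the bottom. (Plain tree without that flag mixes them alphabetically.) The std.mem.sort function takes the element type, the slice, a context value, and a lessThan function. It is the same idea as the comparator you pass to C's qsort, except that it's type-checked and returns a bool instead of a negative/zero/positive int.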
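Here is a minimal sketch of a directories-first comparator for std.mem.sort. The Item struct is a hypothetical stand-in for whatever printTree collects per directory. The sketch assumes std.fs.File.Kind for the entry type and std.mem.lessThan for byte-wise name comparison, so uppercase names sort before lowercase ones.

```zig
const std = @import("std");

// Hypothetical stand-in for one collected directory entry.
const Item = struct {
    name: []const u8,
    kind: std.fs.File.Kind,
};

/// Directories sort before everything else; ties are broken by name.
/// The context is `void` because the comparison needs no extra state.
fn lessThan(_: void, a: Item, b: Item) bool {
    const a_dir = a.kind == .directory;
    const b_dir = b.kind == .directory;
    // Different groups: `a` comes first exactly when it is the directory.
    if (a_dir != b_dir) return a_dir;
    // Same group: byte-wise comparison, so "README" < "zeta.txt".
    return std.mem.lessThan(u8, a.name, b.name);
}

pub fn main() !void {
    var items = [_]Item{
        .{ .name = "zeta.txt", .kind = .file },
        .{ .name = "src", .kind = .directory },
        .{ .name = "README", .kind = .file },
        .{ .name = "docs", .kind = .directory },
    };
    std.mem.sort(Item, &items, {}, lessThan);

    const stdout = std.io.getStdOut().writer();
    for (items) |item| try stdout.print("{s}\n", .{item.name});
}
```

This prints docs, src, README, zeta.txt, one per line. Note that a symlink pointing at a directory has kind .sym_link, so this comparator groups it with the files.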

The printTree function is recursive but bounded by config.max_depth. Each recursive call opens a subdirectory handle, iterates its contents, and closes it when done. We're NOT accumulating the whole tree in memory like the DirectoryWalker earlier. Instead we print as we go: each level only holds the sorted entries of the directory it is currently printing. Memory use therefore grows with the directories along the current path, not with the total size of the tree, which is why this works fine on enormous trees. The tradeoff is that you can't sort the entire tree globally, only within each directory level. But that's how the real tree command works too.
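The open-iterate-close pattern with a depth bound can be isolated in a smaller function that only counts entries. The sketch below is hypothetical: countTree and Counts are not part of the tree program, and the depth semantics are assumed (max_depth is the number of levels visited, so 1 means "list the root only"). It uses an explicit anyerror!void return type for simplicity and relies on Zig 0.14's Dir.iterate, Dir.openDir and std.testing.tmpDir.

```zig
const std = @import("std");

// Hypothetical running totals.
const Counts = struct { dirs: usize = 0, files: usize = 0 };

/// Depth-first walk that stores nothing: each subdirectory handle is open
/// only while that subtree is being visited, then closed by `defer`.
fn countTree(dir: std.fs.Dir, depth: usize, max_depth: usize, counts: *Counts) anyerror!void {
    var it = dir.iterate();
    while (try it.next()) |entry| {
        switch (entry.kind) {
            .directory => {
                counts.dirs += 1;
                // Count the directory, but don't descend past the limit.
                if (depth + 1 >= max_depth) continue;
                // Unreadable subdirectory (e.g. AccessDenied): skip it.
                var sub = dir.openDir(entry.name, .{ .iterate = true }) catch continue;
                defer sub.close();
                try countTree(sub, depth + 1, max_depth, counts);
            },
            else => counts.files += 1,
        }
    }
}

pub fn main() !void {
    var root = try std.fs.cwd().openDir(".", .{ .iterate = true });
    defer root.close();

    var counts = Counts{};
    try countTree(root, 0, 3, &counts);
    try std.io.getStdOut().writer().print("{d} dirs, {d} files\n", .{ counts.dirs, counts.files });
}
```

The recursion depth, not the number of entries, determines how many handles are open at once. That is the property printTree relies on.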

One performance note: opening and closing a directory handle for every subdirectory costs syscalls (plus a stat per file when -s is on), and on deep trees with thousands of directories that adds up. Traversal-heavy tools such as GNU find and du use the fts family of functions, which are designed for efficient recursive traversal; nftw is the POSIX alternative. Our version is simpler but plenty fast for normal use.

Exercises

Extend the DirectoryWalker to skip directories that contain a .ignore file. When the walker encounters a directory, it should check for the presence of a .ignore file inside it and, if found, skip that entire subtree. Write a test that creates a directory tree with .ignore in one branch and verifies that branch is excluded.
Write a findFiles function that takes a directory path and a glob pattern (like "*.zig" or "test_*") and returns all matching files recursively. Implement basic glob matching yourself (just * wildcards at the start and end, no regex). Test it by creating temporary files with various names and verifying the correct ones are returned.
Build a du (disk usage) command that walks a directory tree and prints the total size of each subdirectory, sorted from largest to smallest. Format sizes in human-readable units (bytes, KB, MB). Test it with a known directory structure where you've written files of specific sizes.

What we learned

std.fs.cwd().openDir() with .iterate = true opens a directory you can call .iterate() on; the iterator lazily yields entries (names and types) without loading everything into memory
statFile returns a Stat with size, mode (permissions), and timestamps; the timestamps are i128 nanosecond counts since the Unix epoch
Permission bits use the classic Unix octal model: 9 bits for rwxrwxrwx, plus 3 special bits (setuid, setgid, sticky)
Building a recursive walker requires duplicating entry names (the iterator reuses its buffer), tracking depth to prevent symlink loops, and gracefully handling permission errors
Symlinks need readLink to find their target -- most operations follow symlinks transparently, so detecting and inspecting them requires explicit code
makeDir fails with error.PathAlreadyExists if the directory is already there, while makePath creates every missing parent and silently succeeds if the path exists: different operations for different needs
Atomic writes use a temp file + rename pattern to guarantee crash safety: the target is either the old version or the new version, never half-written
The tree command combines directory iteration, sorting, recursive descent, and UTF-8 box-drawing characters into a practical tool that demonstrates real-world filesystem programming

We started a new arc today: low-level OS interaction from Zig. File systems are just the beginning. The concepts here (handles, iterators, error handling on every operation, graceful degradation) apply to all OS-level programming. Working directly with the file system also forces you to think about things higher-level languages hide from you: when memory for names gets freed, what happens if a file disappears between checking and opening, and how permissions interact with process identity.

Thanks for reading!

Hive account@scipio
