What you'll learn: std.fs.Dir and the iterator API; stat; a tree command that displays directory structure.
Learn Zig Series
We just wrapped up Project G: the assembler, two-pass assembly, and disassembler across episodes 59, 60, and 61. We went from bit encoding to human-readable assembly and back. That was deep in the weeds of binary formats and instruction sets. Today we're switching gears entirely. We're going to work with the file system: reading directories, inspecting file metadata, handling symlinks, and building a complete tree command by the end.
File system operations are one of those things that sound boring until you actually need them, and then suddenly everything is edge cases. Paths with spaces. Symlink loops. Permission denied on a directory you own. Files that disappear between listing and opening. Zig's standard library gives us std.fs which wraps the POSIX and Windows syscalls with proper error handling -- no silent failures, no null pointer surprises. We've used std.fs briefly in episode 10 for basic file I/O and in the shell project (episodes 47-50) for process spawning. Now we're going to really explore what it can do.
Here we go!
The fundamental operation: open a directory and list what's inside. In Zig, std.fs.cwd().openDir() gives you a Dir handle, and .iterate() gives you an iterator that yields entries one at a time. This is the lazy evaluation pattern we covered in episode 23: the iterator fetches entries from the kernel in batches into a small internal buffer as you call next(), rather than reading the whole directory up front:
```zig
const std = @import("std");

pub fn listDirectory(dir_path: []const u8) !void {
    var dir = try std.fs.cwd().openDir(dir_path, .{ .iterate = true });
    defer dir.close();

    const stdout = std.io.getStdOut().writer();
    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        const kind_str = switch (entry.kind) {
            .file => "FILE",
            .directory => "DIR ",
            .sym_link => "LINK",
            .named_pipe => "PIPE",
            .unix_domain_socket => "SOCK",
            .block_device => "BDEV",
            .character_device => "CDEV",
            .whiteout => "WHTE",
            .door => "DOOR",
            .event_port => "EVPT",
            .unknown => "????",
        };
        try stdout.print(" {s} {s}\n", .{ kind_str, entry.name });
    }
}

pub fn main() !void {
    const args = try std.process.argsAlloc(std.heap.page_allocator);
    defer std.process.argsFree(std.heap.page_allocator, args);
    const path = if (args.len > 1) args[1] else ".";
    try listDirectory(path);
}
```
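A few things to notice here. The .{ .iterate = true } flag is mandatory if you plan to iterate, but forgetting it is NOT a compile error. It shows up at runtime, when the iterator tries to read entries from a handle that wasn't opened for reading (in safe builds this typically surfaces as a panic). This is Zig being explicit about intentions: opening a directory for iteration can require different kernel flags than opening it only for path resolution (on Linux, a path-only handle can use O_PATH), and Zig wants you to say what you actually need.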
The entry.kind field tells you what type of file system object this is. Most of the time you'll see .file and .directory, but on Unix systems you can also find symbolic links, named pipes (FIFOs), Unix domain sockets, and device files. The .whiteout type comes from union mounts on the BSDs and macOS (the DT_WHT directory-entry type); Linux overlayfs, the storage driver Docker commonly uses, represents whiteouts as character devices instead. The .door and .event_port types are Solaris/illumos objects. The .unknown type means the filesystem didn't report the entry type: some older filesystems don't include type information in directory entries, and you'd need to stat the file to find out.
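When the iterator does report .unknown, a stat call can fill in the gap. Here's a minimal sketch, assuming the Zig 0.13/0.14-era standard library, where File.Stat carries a kind field; resolveKind is a hypothetical helper name. Keep in mind that statFile follows symlinks, so an .unknown entry that is really a link comes back as the kind of its target.

```zig
const std = @import("std");

/// Hypothetical helper: returns entry.kind as-is, or asks stat() when the
/// filesystem reported .unknown. statFile follows symlinks, so a link
/// resolved this way reports the kind of its target.
fn resolveKind(dir: std.fs.Dir, entry: std.fs.Dir.Entry) std.fs.File.Kind {
    if (entry.kind != .unknown) return entry.kind;
    const st = dir.statFile(entry.name) catch return .unknown;
    return st.kind;
}
```

The extra syscall only happens for .unknown entries, so on filesystems that do report types this costs nothing.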
One important detail: the iterator does NOT guarantee any particular ordering. You might get entries alphabetically, or by inode number, or in whatever order the filesystem stored them. If you need sorted output (like the tree command we're building later), you'll have to collect entries into a list and sort them yourself.
Knowing a file's name and type is only the start. The stat call gives you everything else -- size, timestamps, permissions, owner, number of hard links:
```zig
const std = @import("std");

pub fn printFileInfo(dir: std.fs.Dir, name: []const u8, writer: anytype) !void {
    const stat = dir.statFile(name) catch |err| {
        try writer.print(" {s}: cannot stat: {}\n", .{ name, err });
        return;
    };
    const size = stat.size;
    const atime = stat.atime;
    const mtime = stat.mtime;
    // timestamps are i128 nanoseconds since the epoch; convert to seconds
    const atime_secs = @divFloor(atime, std.time.ns_per_s);
    const mtime_secs = @divFloor(mtime, std.time.ns_per_s);
    try writer.print(" {s}:\n", .{name});
    try writer.print(" size: {d} bytes\n", .{size});
    try writer.print(" atime: {d} (epoch secs)\n", .{atime_secs});
    try writer.print(" mtime: {d} (epoch secs)\n", .{mtime_secs});

    // permissions (Unix only): the raw mode also carries file-type bits,
    // so keep only the low 12 permission bits before printing
    const mode = stat.mode & 0o7777;
    try writer.print(" mode: 0o{o}\n", .{mode});

    // decode permission bits
    const owner_r: u8 = if (mode & 0o400 != 0) 'r' else '-';
    const owner_w: u8 = if (mode & 0o200 != 0) 'w' else '-';
    const owner_x: u8 = if (mode & 0o100 != 0) 'x' else '-';
    const group_r: u8 = if (mode & 0o040 != 0) 'r' else '-';
    const group_w: u8 = if (mode & 0o020 != 0) 'w' else '-';
    const group_x: u8 = if (mode & 0o010 != 0) 'x' else '-';
    const other_r: u8 = if (mode & 0o004 != 0) 'r' else '-';
    const other_w: u8 = if (mode & 0o002 != 0) 'w' else '-';
    const other_x: u8 = if (mode & 0o001 != 0) 'x' else '-';
    try writer.print(" perms: {c}{c}{c}{c}{c}{c}{c}{c}{c}\n", .{
        owner_r, owner_w, owner_x,
        group_r, group_w, group_x,
        other_r, other_w, other_x,
    });
}

pub fn main() !void {
    var dir = try std.fs.cwd().openDir(".", .{ .iterate = true });
    defer dir.close();
    const stdout = std.io.getStdOut().writer();
    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (entry.kind == .file) {
            try printFileInfo(dir, entry.name, stdout);
        }
    }
}
```
The stat call returns a Stat struct with fields for size (in bytes), access time (atime), modification time (mtime), and on Unix systems the mode field which encodes permissions. The timestamps are in nanoseconds since the Unix epoch (January 1, 1970) -- Zig uses nanosecond precision because modern filesystems support it (ext4, APFS, NTFS all track sub-second timestamps).
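If you want a calendar date instead of raw epoch seconds, std.time.epoch can do the arithmetic. A minimal sketch, assuming the std.time.epoch helpers (EpochSeconds, calculateYearDay, calculateMonthDay, getDaySeconds) as they exist in Zig 0.13/0.14; formatUtcTimestamp is a hypothetical name, and it only handles timestamps at or after the epoch.

```zig
const std = @import("std");

/// Hypothetical helper: renders a stat timestamp (i128 nanoseconds since
/// the Unix epoch) as "YYYY-MM-DD hh:mm:ss UTC" into `buf`.
/// Assumes ns >= 0; @intCast panics on negative values in safe builds.
fn formatUtcTimestamp(buf: []u8, ns: i128) ![]u8 {
    const secs: u64 = @intCast(@divFloor(ns, std.time.ns_per_s));
    const es = std.time.epoch.EpochSeconds{ .secs = secs };
    const year_day = es.getEpochDay().calculateYearDay();
    const month_day = year_day.calculateMonthDay();
    const day_secs = es.getDaySeconds();
    return std.fmt.bufPrint(buf, "{d:0>4}-{d:0>2}-{d:0>2} {d:0>2}:{d:0>2}:{d:0>2} UTC", .{
        year_day.year,
        month_day.month.numeric(),
        month_day.day_index + 1, // day_index is 0-based
        day_secs.getHoursIntoDay(),
        day_secs.getMinutesIntoHour(),
        day_secs.getSecondsIntoMinute(),
    });
}
```

Usage: pass stat.mtime straight in, e.g. formatUtcTimestamp(&buf, stat.mtime) with a 32-byte buffer. The output is UTC; time zones are out of scope here.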
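The permission bit decoding is classic Unix. The raw mode returned by stat also encodes the file type (regular file, directory, symlink, and so on) in the bits above the permissions, so you should mask with 0o7777 before printing the permission part; unmasked, a regular file prints as 0o100644 rather than 0o644. Within those low 12 bits, the bottom 9 encode read/write/execute for owner, group, and others, and the 3 above them are the setuid, setgid, and sticky bits. 0o644 means owner can read+write, group and others can only read: the default for most files. 0o755 adds execute permission for everyone, typical for directories and executable binaries. We use octal formatting ({o}) because that's how permissions are traditionally displayed on Unix. The 0o prefix in Zig's number literals is the octal equivalent of 0x for hex.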
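To see what that mask actually separates, here's the raw mode split into its two halves. A small sketch; the constants mirror the standard POSIX S_IFMT, S_IFREG and S_IFDIR values, and splitMode is a hypothetical helper.

```zig
const std = @import("std");

// POSIX st_mode layout (octal): file type above the 12 permission bits.
const type_mask: u32 = 0o170000; // S_IFMT
const perm_mask: u32 = 0o7777; // setuid/setgid/sticky + rwxrwxrwx
const type_regular: u32 = 0o100000; // S_IFREG
const type_directory: u32 = 0o040000; // S_IFDIR

/// Hypothetical helper: separates the file-type bits from the permission bits.
fn splitMode(raw: u32) struct { file_type: u32, perms: u32 } {
    return .{ .file_type = raw & type_mask, .perms = raw & perm_mask };
}
```

So splitMode(0o100644) yields a file_type equal to type_regular and perms equal to 0o644, which is exactly the masking the printing code needs.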
Having said that, this code is Unix-specific. Windows uses ACLs (Access Control Lists) instead of the Unix rwx model, and Zig's File.Mode on Windows is a zero-bit integer, so stat.mode carries no permission information there at all. Cross-platform file permission handling is one of those areas where you basically need an if (builtin.os.tag == .windows) branch. Zig's standard library abstracts most of the file system, but permissions are inherently OS-specific.
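A sketch of what such a branch can look like. It relies on builtin.os.tag being comptime-known, so only the taken branch is compiled for a given target; ownerCanWrite is a hypothetical name, and returning null is just one way to say "this platform has no rwx bits to report".

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: null means "no Unix permission bits on this OS".
fn ownerCanWrite(stat: std.fs.File.Stat) ?bool {
    return if (builtin.os.tag == .windows)
        null // Windows uses ACLs; stat.mode has no rwx bits there
    else
        (stat.mode & 0o200) != 0; // owner write bit
}
```

Returning an optional forces callers to decide what "unknown" means for them instead of silently treating every Windows file as read-only.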
Listing a single directory is easy. Walking an entire tree recursively is where things get interesting. We need to handle nested directories, track our depth (as a guard against runaway recursion), and accumulate results:
```zig
const std = @import("std");

const WalkEntry = struct {
    path: []const u8, // owned: allocated by std.fs.path.join
    name: []const u8, // owned: copy of entry.name
    kind: std.fs.Dir.Entry.Kind,
    depth: usize,
};

const DirectoryWalker = struct {
    allocator: std.mem.Allocator,
    entries: std.ArrayList(WalkEntry),
    max_depth: usize,

    fn init(allocator: std.mem.Allocator, max_depth: usize) DirectoryWalker {
        return .{
            .allocator = allocator,
            .entries = std.ArrayList(WalkEntry).init(allocator),
            .max_depth = max_depth,
        };
    }

    fn deinit(self: *DirectoryWalker) void {
        // free BOTH per-entry allocations made in walkRecursive
        for (self.entries.items) |entry| {
            self.allocator.free(entry.path);
            self.allocator.free(entry.name);
        }
        self.entries.deinit();
    }

    fn walk(self: *DirectoryWalker, base_path: []const u8) !void {
        try self.walkRecursive(base_path, 0);
    }

    fn walkRecursive(self: *DirectoryWalker, dir_path: []const u8, depth: usize) !void {
        if (depth > self.max_depth) return;
        // permission denied, vanished directory, etc: skip silently
        var dir = std.fs.cwd().openDir(dir_path, .{ .iterate = true }) catch return;
        defer dir.close();

        var iter = dir.iterate();
        while (try iter.next()) |entry| {
            // build full path (newly allocated, we own it)
            const full_path = try std.fs.path.join(self.allocator, &.{ dir_path, entry.name });
            // copy name (entry.name is only valid until the next iter.next())
            const name_copy = try self.allocator.dupe(u8, entry.name);
            try self.entries.append(.{
                .path = full_path,
                .name = name_copy,
                .kind = entry.kind,
                .depth = depth,
            });
            // recurse into subdirectories; symlinks report .sym_link and are not followed
            if (entry.kind == .directory) {
                try self.walkRecursive(full_path, depth + 1);
            }
        }
    }
};

test "walker finds files in nested directories" {
    const allocator = std.testing.allocator;
    // create a temp directory structure for testing
    var tmp_dir = std.testing.tmpDir(.{});
    defer tmp_dir.cleanup();
    // create some files and subdirs
    try tmp_dir.dir.writeFile(.{ .sub_path = "file1.txt", .data = "hello" });
    try tmp_dir.dir.makeDir("subdir");
    var sub = try tmp_dir.dir.openDir("subdir", .{});
    defer sub.close();
    try sub.writeFile(.{ .sub_path = "file2.txt", .data = "world" });
    // get the temp dir path
    const tmp_path = try tmp_dir.dir.realpathAlloc(allocator, ".");
    defer allocator.free(tmp_path);

    var walker = DirectoryWalker.init(allocator, 10);
    defer walker.deinit();
    try walker.walk(tmp_path);

    // should find at least: file1.txt, subdir, subdir/file2.txt
    try std.testing.expect(walker.entries.items.len >= 3);
    // verify we found file1.txt at the top level
    var found_file1 = false;
    for (walker.entries.items) |entry| {
        if (std.mem.eql(u8, entry.name, "file1.txt")) {
            found_file1 = true;
            try std.testing.expectEqual(std.fs.Dir.Entry.Kind.file, entry.kind);
            try std.testing.expectEqual(@as(usize, 0), entry.depth);
        }
    }
    try std.testing.expect(found_file1);
}
```
The key decision here is storing owned copies. During iteration, entry.name points into the iterator's internal buffer and is only valid until the next iter.next() call. If you store it and use it later, you'll get garbage or a crash, so we must dupe (duplicate) the name into our own memory. The full path needs no extra copy, because std.fs.path.join already allocates a new string that we own. Both allocations belong to the walker, so deinit has to free the path AND the name of every entry; miss one and std.testing.allocator fails the test with a leak report.
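The max_depth parameter is a safety net against runaway recursion. Note that this particular walker never follows symbolic links in the first place: the iterator reports a symlink as .sym_link even when it points at a directory, and we only recurse on .directory. Cycles become a real risk as soon as you decide to follow links: if directory A contains a symlink to directory B, and B contains a symlink back to A, a naive link-following walker would recurse forever. The depth limit is a simple brute-force protection. A more sophisticated approach would track visited inodes (device number + inode number pairs uniquely identify filesystem objects), but the depth limit works for most practical cases.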
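Tracking visited directories by their (device, inode) pair might look like the sketch below. It rests on a few assumptions: a POSIX target, std.posix.fstat returning a stat struct with dev and ino fields (as on Linux in Zig 0.13/0.14), and a hypothetical function countDirsFollowingLinks that deliberately DOES follow symlinks, so that there is a cycle to catch.

```zig
const std = @import("std");

/// Identity of an open directory. Assumes std.posix.fstat exposes
/// `dev` and `ino` (true on Linux in Zig 0.13/0.14).
const DirKey = struct { dev: u64, ino: u64 };

fn dirKey(dir: std.fs.Dir) !DirKey {
    const st = try std.posix.fstat(dir.fd);
    return .{ .dev = @intCast(st.dev), .ino = @intCast(st.ino) };
}

/// Hypothetical: counts directories reachable from `dir`, following
/// symlinks to directories, but never entering the same (dev, ino) twice.
fn countDirsFollowingLinks(
    dir: std.fs.Dir,
    visited: *std.AutoHashMap(DirKey, void),
) !usize {
    const gop = try visited.getOrPut(try dirKey(dir));
    if (gop.found_existing) return 0; // seen before: a cycle or a second path
    var count: usize = 1;
    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (entry.kind != .directory and entry.kind != .sym_link) continue;
        // openDir follows symlinks; links to non-directories fail and are skipped
        var sub = dir.openDir(entry.name, .{ .iterate = true }) catch continue;
        defer sub.close();
        count += try countDirsFollowingLinks(sub, visited);
    }
    return count;
}
```

Unlike max_depth, the visited set stops a cycle the first time it closes, and it also avoids counting a directory twice when two links point at it.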
We also silently skip directories we can't open. On a real system you'll hit "Permission denied" on /root, /proc/1, and other protected directories. Crashing the entire walk because of one inaccessible directory would be useless. The catch converts the error into a no-op -- the entry still appears in our results (we added it before trying to recurse), but its children don't.
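It's also worth knowing that the standard library ships its own recursive walker, std.fs.Dir.walk, which keeps a stack of open directories and hands back entries with paths relative to the starting directory. A minimal sketch, assuming the Zig 0.13/0.14 Walker API (walk, next, deinit); countFiles is a hypothetical helper. Like our walker, it descends only into entries reported as .directory, so symlinked directories are not followed.

```zig
const std = @import("std");

/// Hypothetical helper: counts regular files below `root` using the
/// standard library walker. `root` must be opened with .iterate = true.
fn countFiles(allocator: std.mem.Allocator, root: std.fs.Dir) !usize {
    var walker = try root.walk(allocator);
    defer walker.deinit();
    var count: usize = 0;
    while (try walker.next()) |entry| {
        // entry.path is relative to root and only valid until the next call
        if (entry.kind == .file) count += 1;
    }
    return count;
}
```

Reach for the hand-written version when you need per-directory decisions (skipping, depth limits, error policy); Dir.walk is the shorter choice when a flat stream of entries is enough.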
Symbolic links are filesystem entries that point to another path. They're like shortcuts on Windows but more deeply integrated into the OS. The tricky part is that most filesystem operations follow symlinks transparently -- stat on a symlink gives you info about the target, not the link itself. To inspect the link itself you need lstat, and to read where it points you need readlink:
```zig
const std = @import("std");

pub fn inspectSymlinks(dir_path: []const u8) !void {
    var dir = try std.fs.cwd().openDir(dir_path, .{ .iterate = true });
    defer dir.close();
    const stdout = std.io.getStdOut().writer();
    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (entry.kind == .sym_link) {
            // read the symlink target; `target` is a slice into `buf`
            var buf: [std.fs.max_path_bytes]u8 = undefined;
            const target = dir.readLink(entry.name, &buf) catch |err| {
                try stdout.print(" {s} -> (unreadable: {})\n", .{ entry.name, err });
                continue;
            };
            // check if the target actually exists (access follows the link)
            const target_exists = blk: {
                dir.access(entry.name, .{}) catch {
                    break :blk false;
                };
                break :blk true;
            };
            const status = if (target_exists) "OK" else "BROKEN";
            try stdout.print(" {s} -> {s} [{s}]\n", .{ entry.name, target, status });
        }
    }
}

pub fn createAndInspectSymlink() !void {
    const stdout = std.io.getStdOut().writer();
    // create a temp file to link to
    const tmp = try std.fs.cwd().createFile("_test_target.txt", .{});
    tmp.close();
    defer std.fs.cwd().deleteFile("_test_target.txt") catch {};
    // create a symlink
    try std.fs.cwd().symLink("_test_target.txt", "_test_link.txt", .{});
    defer std.fs.cwd().deleteFile("_test_link.txt") catch {};
    // read the link target
    var buf: [std.fs.max_path_bytes]u8 = undefined;
    const target = try std.fs.cwd().readLink("_test_link.txt", &buf);
    try stdout.print("symlink: _test_link.txt -> {s}\n", .{target});
    // stat follows the link (gives info about target)
    const stat = try std.fs.cwd().statFile("_test_link.txt");
    try stdout.print("stat size (target): {d}\n", .{stat.size});
}
```
The readLink function takes a buffer and returns a slice into it -- the symlink target path. Note that symlink targets can be relative or absolute. A relative symlink like ../lib/libfoo.so is resolved relative to the directory containing the link, not the current working directory. This distinction catches people all the time.
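One way to turn a relative target into a path you can actually open is to join it onto the link's own directory. A sketch with a hypothetical helper, POSIX-style paths assumed. It deliberately skips lexical normalization: if the link's parent is itself a symlink, collapsing bin/.. on paper can land somewhere other than where the kernel would go, so we leave the .. for the kernel to resolve.

```zig
const std = @import("std");

/// Hypothetical helper: given the path of a symlink (relative to some base
/// directory) and its raw target, returns a path to the target that is
/// valid relative to that same base. Caller frees the result.
fn targetFromLinkDir(allocator: std.mem.Allocator, link_path: []const u8, target: []const u8) ![]u8 {
    if (std.fs.path.isAbsolute(target)) return allocator.dupe(u8, target);
    const parent = std.fs.path.dirname(link_path) orelse ".";
    return std.fs.path.join(allocator, &.{ parent, target });
}
```

For a link at bin/tool pointing to ../lib/libfoo.so this produces bin/../lib/libfoo.so, which resolves correctly from the same base directory the link path was given in.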
The "broken symlink" check uses access, which follows the link and checks whether the final target exists. If the target was deleted or moved, the symlink is "dangling": it still exists as a directory entry but points nowhere. With color output enabled, ls typically shows these in red. Our code labels them BROKEN in the output.
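One thing std.fs does NOT give you (in the Zig versions this series targets) is an lstat equivalent: Dir.statFile follows symlinks, so it describes the target rather than the link itself. On POSIX systems you can drop down to std.posix.fstatat with the AT.SYMLINK_NOFOLLOW flag, which has lstat semantics. (The kind reported by the directory iterator doesn't follow links either, so for a plain "is this a symlink?" check the iterator is often enough.) For most applications you don't need it, but if you're writing a backup tool that needs to preserve symlinks as-is (rather than following them), it matters.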
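A minimal sketch of that lstat-style check, assuming a POSIX target and the Zig 0.13/0.14 std.posix API (fstatat, AT.SYMLINK_NOFOLLOW, and the S.IFMT/S.IFLNK constants); isSymlink is a hypothetical helper.

```zig
const std = @import("std");

/// Hypothetical helper: true if `name` inside `dir` is itself a symlink.
/// fstatat with AT.SYMLINK_NOFOLLOW does not follow the final link,
/// which is what lstat does. POSIX only.
fn isSymlink(dir: std.fs.Dir, name: []const u8) !bool {
    const st = try std.posix.fstatat(dir.fd, name, std.posix.AT.SYMLINK_NOFOLLOW);
    return (st.mode & std.posix.S.IFMT) == std.posix.S.IFLNK;
}
```

Because it never follows the link, this also works on dangling symlinks, where statFile would fail with FileNotFound.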
On Unix systems, every file has an owner (user ID), a group (group ID), and a 12-bit permission mode. The bottom 9 bits are the familiar rwxrwxrwx. The top 3 bits are the setuid, setgid, and sticky bits -- special permissions that affect execution behavior:
```zig
const std = @import("std");

const PermissionInfo = struct {
    mode: u32,
    // true if ANY of owner/group/other has the bit; this says nothing
    // about what the current process is allowed to do
    readable: bool,
    writable: bool,
    executable: bool,

    fn fromMode(mode: u32) PermissionInfo {
        return .{
            .mode = mode,
            .readable = (mode & 0o444) != 0,
            .writable = (mode & 0o222) != 0,
            .executable = (mode & 0o111) != 0,
        };
    }

    // renders the low 9 bits as e.g. "rw-r--r--"; buf[9] gets a NUL terminator
    fn formatMode(mode: u32, buf: *[10]u8) []const u8 {
        buf[0] = if (mode & 0o400 != 0) 'r' else '-';
        buf[1] = if (mode & 0o200 != 0) 'w' else '-';
        buf[2] = if (mode & 0o100 != 0) 'x' else '-';
        buf[3] = if (mode & 0o040 != 0) 'r' else '-';
        buf[4] = if (mode & 0o020 != 0) 'w' else '-';
        buf[5] = if (mode & 0o010 != 0) 'x' else '-';
        buf[6] = if (mode & 0o004 != 0) 'r' else '-';
        buf[7] = if (mode & 0o002 != 0) 'w' else '-';
        buf[8] = if (mode & 0o001 != 0) 'x' else '-';
        buf[9] = 0;
        return buf[0..9];
    }
};

pub fn checkPermissions(path: []const u8) !void {
    const stdout = std.io.getStdOut().writer();
    const stat = try std.fs.cwd().statFile(path);
    const info = PermissionInfo.fromMode(stat.mode);
    var buf: [10]u8 = undefined;
    const perms_str = PermissionInfo.formatMode(stat.mode, &buf);
    // mask with 0o7777 to drop the file-type bits from the raw mode
    try stdout.print("{s}: {s} (0o{o})\n", .{ path, perms_str, stat.mode & 0o7777 });
    try stdout.print(" readable: {}\n", .{info.readable});
    try stdout.print(" writable: {}\n", .{info.writable});
    try stdout.print(" executable: {}\n", .{info.executable});
    // check for special bits
    if (stat.mode & 0o4000 != 0) try stdout.print(" SETUID bit set\n", .{});
    if (stat.mode & 0o2000 != 0) try stdout.print(" SETGID bit set\n", .{});
    if (stat.mode & 0o1000 != 0) try stdout.print(" STICKY bit set\n", .{});
}
```
The setuid bit (0o4000) is one of those Unix mechanisms that's both incredibly useful and terrifyingly dangerous. When set on an executable, it runs as the file's owner rather than the user who launched it. That's how passwd works -- it needs to write to /etc/shadow which is owned by root, so the passwd binary has setuid root. But a setuid program with a buffer overflow is an instant privilege escalation vulnerability. If you've been following the ethical hacking series, you know this is one of the first things pentesters check on a Linux system.
The sticky bit (0o1000) on a directory means a file inside it can only be deleted or renamed by the file's owner, the directory's owner, or root. That's why /tmp has permissions drwxrwxrwt (the t at the end is the sticky bit). Without it, anyone could delete anyone else's temporary files.
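For our purposes, reading permissions is more useful than setting them. To change them, Zig offers File.chmod() on an open file (Dir.chmod() changes the mode of the open directory handle itself, not of a path inside it). Generally, though, you want to create files with the right permissions from the start, through the .mode field of createFile's flags, rather than creating them with default permissions and fixing them afterward, which leaves a window where the file is more open than intended.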
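Here's a sketch of both ideas, assuming the Zig 0.13/0.14 API where createFile's flags accept a .mode and File has chmod; writePrivateFile is a hypothetical helper. The requested mode is still filtered by the process umask, and it only applies when the file is newly created, not when an existing file is truncated.

```zig
const std = @import("std");

/// Hypothetical helper: writes `data` to a file that is owner read/write
/// only from the moment it exists (subject to the umask).
fn writePrivateFile(dir: std.fs.Dir, name: []const u8, data: []const u8) !void {
    const file = try dir.createFile(name, .{ .mode = 0o600 });
    defer file.close();
    try file.writeAll(data);
}
```

If you later need to tighten things further, open the file and call chmod on the handle, e.g. file.chmod(0o400) for owner read-only.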
The CRUD operations of the file system. These are straightforward individually but have interesting failure modes when combined:
```zig
const std = @import("std");

pub fn fileSystemOperations(allocator: std.mem.Allocator) !void {
    const stdout = std.io.getStdOut().writer();
    _ = allocator;

    // create a directory
    std.fs.cwd().makeDir("test_project") catch |err| switch (err) {
        error.PathAlreadyExists => {
            try stdout.print("Directory already exists, continuing...\n", .{});
        },
        else => return err,
    };
    defer std.fs.cwd().deleteDir("test_project") catch {};

    // create a nested directory structure (mkdir -p equivalent)
    try std.fs.cwd().makePath("test_project/src/utils");
    defer {
        // cleanup in reverse order (defers run last-in, first-out)
        std.fs.cwd().deleteDir("test_project/src/utils") catch {};
        std.fs.cwd().deleteDir("test_project/src") catch {};
    }

    // create a file inside
    {
        const file = try std.fs.cwd().createFile("test_project/src/main.zig", .{});
        defer file.close();
        try file.writeAll("const std = @import(\"std\");\n");
    }
    defer std.fs.cwd().deleteFile("test_project/src/main.zig") catch {};

    // verify it exists
    const stat = try std.fs.cwd().statFile("test_project/src/main.zig");
    try stdout.print("Created file: {d} bytes\n", .{stat.size});

    // rename (move) the file
    try std.fs.cwd().rename("test_project/src/main.zig", "test_project/src/app.zig");

    // verify rename worked: old name gone, new name present
    std.fs.cwd().access("test_project/src/main.zig", .{}) catch {
        try stdout.print("main.zig no longer exists (good)\n", .{});
    };
    try std.fs.cwd().access("test_project/src/app.zig", .{});
    try stdout.print("app.zig exists (rename worked)\n", .{});

    // cleanup the renamed file
    try std.fs.cwd().deleteFile("test_project/src/app.zig");
}

test "create and delete directory tree" {
    // must be var: cleanup() takes a mutable pointer
    var dir = std.testing.tmpDir(.{});
    defer dir.cleanup();
    // create nested structure
    try dir.dir.makePath("a/b/c");
    try dir.dir.writeFile(.{ .sub_path = "a/b/c/file.txt", .data = "test" });
    // verify it exists
    const stat = try dir.dir.statFile("a/b/c/file.txt");
    try std.testing.expectEqual(@as(u64, 4), stat.size);
    // delete leaf file
    try dir.dir.deleteFile("a/b/c/file.txt");
    // now we can delete the empty directories, deepest first
    try dir.dir.deleteDir("a/b/c");
    try dir.dir.deleteDir("a/b");
    try dir.dir.deleteDir("a");
}
```
Several things to note. First, makeDir fails if the directory already exists -- that's by design, because "create this directory" and "ensure this directory exists" are different operations. The makePath function is the mkdir -p equivalent: it creates all intermediate directories and doesn't fail if any already exist.
Second, deleteDir only works on EMPTY directories. If there are files or subdirectories inside, it returns error.DirNotEmpty. Removing a whole tree takes a separate, explicitly named call: Dir.deleteTree, the rm -rf equivalent, which removes the contents bottom-up for you (the test above does the same thing by hand: leaf file first, then each empty directory from the deepest up). This is intentional: accidentally deleting a non-empty directory tree is the kind of mistake that ends careers, so the standard library makes you ask for it by name.
Third, rename is atomic on POSIX filesystems, as long as source and destination are on the same filesystem (across mount points it simply fails, which Zig reports as error.RenameAcrossMountPoints). If your program crashes between calling rename and the function returning, the file will be at either the old path or the new path, never in a half-moved state. This atomicity is what makes rename useful for safe file updates, which the next section builds on.
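For the rm -rf case, the standard library's explicit opt-in is Dir.deleteTree. A small sketch of a common pattern, making a build directory empty no matter what's in it, assuming the Zig 0.13/0.14 error names DirNotEmpty and FileNotFound for deleteDir; recreateEmptyDir is a hypothetical helper.

```zig
const std = @import("std");

/// Hypothetical helper: leaves `path` as an empty directory, whatever was
/// there before. deleteDir covers the empty case; only a non-empty tree
/// escalates to the explicit deleteTree.
fn recreateEmptyDir(dir: std.fs.Dir, path: []const u8) !void {
    dir.deleteDir(path) catch |err| switch (err) {
        error.DirNotEmpty => try dir.deleteTree(path),
        error.FileNotFound => {}, // nothing to delete yet
        else => return err,
    };
    try dir.makePath(path);
}
```

Escalating only on DirNotEmpty keeps the dangerous call visible in the code instead of making every delete recursive by default.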
When writing important data, you don't want a crash to leave you with a half-written file. The solution: write to a temporary file first, then atomically rename it over the target. This way the target file is either the old version or the new version, never a corrupted mix:
```zig
const std = @import("std");

const AtomicWriter = struct {
    allocator: std.mem.Allocator,
    target_path: []const u8,
    tmp_path: []const u8,
    file: std.fs.File,

    fn init(allocator: std.mem.Allocator, target_path: []const u8) !AtomicWriter {
        // temp file sits next to the target (".tmp" suffix), so the final
        // rename stays on one filesystem
        const tmp_path = try std.fmt.allocPrint(allocator, "{s}.tmp", .{target_path});
        errdefer allocator.free(tmp_path);
        const file = try std.fs.cwd().createFile(tmp_path, .{});
        return .{
            .allocator = allocator,
            .target_path = target_path,
            .tmp_path = tmp_path,
            .file = file,
        };
    }

    fn write(self: *AtomicWriter, data: []const u8) !void {
        try self.file.writeAll(data);
    }

    fn commit(self: *AtomicWriter) !void {
        defer self.allocator.free(self.tmp_path);
        // on any failure below, remove the temp file (runs before the free)
        errdefer std.fs.cwd().deleteFile(self.tmp_path) catch {};
        {
            defer self.file.close();
            // fsync: make the bytes durable before the rename publishes them
            try self.file.sync();
        }
        // atomically replace target with temp file
        try std.fs.cwd().rename(self.tmp_path, self.target_path);
    }

    fn abort(self: *AtomicWriter) void {
        self.file.close();
        std.fs.cwd().deleteFile(self.tmp_path) catch {};
        self.allocator.free(self.tmp_path);
    }
};

// these tests use POSIX paths under /tmp
test "atomic write produces complete file or nothing" {
    const allocator = std.testing.allocator;
    // write atomically
    var writer = try AtomicWriter.init(allocator, "/tmp/test_atomic.txt");
    try writer.write("line 1\n");
    try writer.write("line 2\n");
    try writer.write("line 3\n");
    try writer.commit();
    // verify the file exists and has the right content
    const content = try std.fs.cwd().readFileAlloc(allocator, "/tmp/test_atomic.txt", 1024);
    defer allocator.free(content);
    try std.testing.expectEqualStrings("line 1\nline 2\nline 3\n", content);
    // cleanup
    try std.fs.cwd().deleteFile("/tmp/test_atomic.txt");
}

test "atomic write abort leaves no file" {
    const allocator = std.testing.allocator;
    // start writing but abort
    var writer = try AtomicWriter.init(allocator, "/tmp/test_abort.txt");
    try writer.write("partial data");
    writer.abort();
    // verify target file does NOT exist
    std.fs.cwd().access("/tmp/test_abort.txt", .{}) catch |err| {
        try std.testing.expectEqual(error.FileNotFound, err);
        return;
    };
    // if we get here, the file exists (bad)
    try std.fs.cwd().deleteFile("/tmp/test_abort.txt");
    return error.TestFailed;
}
```
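The commit function is the critical piece. It closes the temporary file, then renames it over the target. Note that closing a file does not force its contents onto the physical disk: writeAll is unbuffered, so the bytes have reached the kernel, but they may still sit in the page cache. If the new contents must survive a power loss (not just a crashed process), it's worth calling self.file.sync() (fsync) before the close. The rename is atomic on POSIX filesystems: either it happens entirely or not at all. If the program crashes during writeAll, the temp file has partial data but the target file is untouched. If the process dies during rename, that's fine too: rename is a single syscall, and POSIX specifies that the target path always names either the old file or the new one.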
The abort function is for when something goes wrong during the write. It closes and deletes the temp file, leaving the filesystem exactly as it was before init was called. This is the error recovery path.
Real applications use this pattern everywhere. Text editors, databases, configuration managers: anything where losing data is unacceptable. SQLite gets its crash safety from journaling, including an optional write-ahead log mode (the technique we built in Zig back in episode 41). The atomic rename approach is simpler and works well for files that are written in their entirety (configs, serialized state, generated output).
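If you need the update to survive power loss and not just a crashed process, the usual recipe adds two fsyncs: one on the temp file before the rename, and one on the containing directory afterwards so that the rename itself reaches the disk. A sketch under stated assumptions: POSIX semantics, Zig 0.13/0.14 (File.sync, std.posix.fsync), and a hypothetical standalone helper rather than a change to AtomicWriter. The directory is reopened with .iterate = true so the fsync goes to an ordinary directory descriptor rather than a path-only one.

```zig
const std = @import("std");

/// Hypothetical helper: replaces `name` in `dir` with `data` via `tmp_name`.
/// Both names must live in `dir`, so the rename stays on one filesystem.
fn writeFileDurable(dir: std.fs.Dir, name: []const u8, tmp_name: []const u8, data: []const u8) !void {
    const file = try dir.createFile(tmp_name, .{});
    // on any later failure, best-effort removal of the temp file
    errdefer dir.deleteFile(tmp_name) catch {};
    {
        defer file.close();
        try file.writeAll(data);
        try file.sync(); // contents on disk before they become visible
    }
    try dir.rename(tmp_name, name); // atomic replace
    // fsync the directory so the new directory entry is durable too
    var d = try dir.openDir(".", .{ .iterate = true });
    defer d.close();
    try std.posix.fsync(d.fd);
}
```

The two fsyncs are the expensive part; for frequently rewritten small files they dominate the cost, which is one reason databases batch writes through a log instead.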
Time to put it all together. The tree command is a classic Unix utility that displays directory structure as an indented tree. Our version will show file types, sizes, and handle the box-drawing characters that make tree output look nice:
```zig
const std = @import("std");

const TreeConfig = struct {
    show_hidden: bool = false,
    show_size: bool = false,
    max_depth: usize = 10,
    dirs_only: bool = false,
};

const TreeStats = struct {
    dirs: usize = 0,
    files: usize = 0,
    symlinks: usize = 0,
    total_size: u64 = 0,
};

fn printTree(
    dir: std.fs.Dir,
    prefix: []const u8,
    config: TreeConfig,
    stats: *TreeStats,
    depth: usize,
    allocator: std.mem.Allocator,
    writer: anytype,
) !void {
    if (depth >= config.max_depth) return;

    // collect and sort entries
    var entries = std.ArrayList(std.fs.Dir.Entry).init(allocator);
    defer entries.deinit();
    var names = std.ArrayList([]const u8).init(allocator);
    defer {
        for (names.items) |n| allocator.free(n);
        names.deinit();
    }

    var iter = dir.iterate();
    while (try iter.next()) |entry| {
        if (!config.show_hidden and entry.name[0] == '.') continue;
        if (config.dirs_only and entry.kind != .directory) continue;
        // copy name since iterator reuses buffer; `names` owns the copy
        try names.ensureUnusedCapacity(1);
        const name_copy = try allocator.dupe(u8, entry.name);
        names.appendAssumeCapacity(name_copy);
        try entries.append(.{ .name = name_copy, .kind = entry.kind });
    }

    // sort entries: directories first, then alphabetical (byte order)
    const SortCtx = struct {
        fn lessThan(_: void, a: std.fs.Dir.Entry, b: std.fs.Dir.Entry) bool {
            if (a.kind == .directory and b.kind != .directory) return true;
            if (a.kind != .directory and b.kind == .directory) return false;
            return std.mem.lessThan(u8, a.name, b.name);
        }
    };
    std.mem.sort(std.fs.Dir.Entry, entries.items, {}, SortCtx.lessThan);

    for (entries.items, 0..) |entry, i| {
        const is_last = (i == entries.items.len - 1);
        // "└── " for the last entry, "├── " otherwise (UTF-8 bytes)
        const connector = if (is_last) "\xe2\x94\x94\xe2\x94\x80\xe2\x94\x80 " else "\xe2\x94\x9c\xe2\x94\x80\xe2\x94\x80 ";
        // prefix for children: "    " or "│   ", 4 columns like the connector
        const extension = if (is_last) "    " else "\xe2\x94\x82   ";

        try writer.print("{s}{s}", .{ prefix, connector });
        switch (entry.kind) {
            .directory => {
                try writer.print("{s}/\n", .{entry.name});
                stats.dirs += 1;
                // recurse with the extended prefix
                const new_prefix = try std.fmt.allocPrint(allocator, "{s}{s}", .{ prefix, extension });
                defer allocator.free(new_prefix);
                var sub = dir.openDir(entry.name, .{ .iterate = true }) catch {
                    continue; // skip inaccessible directories
                };
                defer sub.close();
                try printTree(sub, new_prefix, config, stats, depth + 1, allocator, writer);
            },
            .sym_link => {
                // shown with its target, never followed
                var buf: [std.fs.max_path_bytes]u8 = undefined;
                const target = dir.readLink(entry.name, &buf) catch "???";
                try writer.print("{s} -> {s}\n", .{ entry.name, target });
                stats.symlinks += 1;
            },
            else => {
                if (config.show_size) {
                    const stat = dir.statFile(entry.name) catch null;
                    if (stat) |s| {
                        stats.total_size += s.size;
                        try writer.print("{s} [{d} bytes]\n", .{ entry.name, s.size });
                    } else {
                        try writer.print("{s}\n", .{entry.name});
                    }
                } else {
                    try writer.print("{s}\n", .{entry.name});
                }
                stats.files += 1;
            },
        }
    }
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer {
        const check = gpa.deinit();
        if (check == .leak) @panic("memory leak detected");
    }
    const allocator = gpa.allocator();

    const args = try std.process.argsAlloc(allocator);
    defer std.process.argsFree(allocator, args);

    // parse flags; any other argument is the root path (the last one wins)
    var config = TreeConfig{};
    var root_path: []const u8 = ".";
    for (args[1..]) |arg| {
        if (std.mem.eql(u8, arg, "-a")) {
            config.show_hidden = true;
        } else if (std.mem.eql(u8, arg, "-s")) {
            config.show_size = true;
        } else if (std.mem.eql(u8, arg, "-d")) {
            config.dirs_only = true;
        } else {
            root_path = arg;
        }
    }

    const stdout = std.io.getStdOut().writer();
    try stdout.print("{s}\n", .{root_path});
    var dir = try std.fs.cwd().openDir(root_path, .{ .iterate = true });
    defer dir.close();

    var stats = TreeStats{};
    try printTree(dir, "", config, &stats, 0, allocator, stdout);

    try stdout.print("\n{d} directories, {d} files", .{ stats.dirs, stats.files });
    if (stats.symlinks > 0) {
        try stdout.print(", {d} symlinks", .{stats.symlinks});
    }
    if (config.show_size) {
        try stdout.print(", {d} bytes total", .{stats.total_size});
    }
    try stdout.print("\n", .{});
}
```
The tree-drawing characters (those Unicode box-drawing symbols) are the visual magic. The connector is either a T-junction (├── ) for middle entries or an L-junction (└── ) for the last entry. The extension is either a vertical bar (for entries that have siblings below) or blank space (for the last entry), and it has to be exactly as wide as the connector, four columns, or the children drift out of alignment. This recursive indentation pattern is what creates the tree structure in the output. The code spells out the UTF-8 bytes in hex: \xe2\x94\x82 is the UTF-8 encoding of the box-drawing vertical line, U+2502. That's a style choice rather than a necessity. Zig string literals are byte arrays, and a \u{2502} escape (or simply typing the character into the UTF-8 source file) produces exactly the same bytes.
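For comparison, here are the same connectors written with \u{...} escapes, padded so every piece is four columns wide. A small sketch; the constant names are hypothetical.

```zig
const std = @import("std");

// Tree connectors as Unicode escapes; Zig encodes \u{...} as UTF-8 bytes.
// Each piece is 4 columns wide so child prefixes line up.
const tee = "\u{251C}\u{2500}\u{2500} "; // "├── "
const elbow = "\u{2514}\u{2500}\u{2500} "; // "└── "
const pipe = "\u{2502}   "; // "│   "
const blank = "    ";
```

Which form you pick is purely about readability; the compiled strings are byte-for-byte identical.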
The sorting puts directories before files, then orders each group alphabetically (by byte value). Most graphical file managers group directories first like this; the real tree command sorts everything alphabetically by default and only groups directories first with its --dirsfirst option. The std.mem.sort function takes a context value and a comparison function, which is the same idea as C's qsort but type-safe.
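Byte order puts every uppercase name before every lowercase one ("Zed.txt" sorts before "apple.txt"). If you'd rather have case-insensitive ordering, the comparator is easy to swap. A hypothetical variant, assuming std.ascii.lessThanIgnoreCase and std.ascii.eqlIgnoreCase from the Zig 0.13/0.14 standard library:

```zig
const std = @import("std");

/// Hypothetical comparator: directories first, then ASCII case-insensitive
/// order; names equal ignoring case fall back to byte order so the
/// ordering stays strict.
fn dirsFirstIgnoreCase(_: void, a: std.fs.Dir.Entry, b: std.fs.Dir.Entry) bool {
    const a_dir = a.kind == .directory;
    const b_dir = b.kind == .directory;
    if (a_dir != b_dir) return a_dir;
    if (std.ascii.eqlIgnoreCase(a.name, b.name)) return std.mem.lessThan(u8, a.name, b.name);
    return std.ascii.lessThanIgnoreCase(a.name, b.name);
}
```

It drops into printTree unchanged: std.mem.sort(std.fs.Dir.Entry, entries.items, {}, dirsFirstIgnoreCase). Note it only folds ASCII letters; locale-aware collation is a much bigger topic.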
The printTree function is recursive but bounded by config.max_depth. Each recursive call opens a subdirectory handle, iterates its contents, and closes it when done. We're NOT accumulating all entries in memory like the DirectoryWalker earlier -- instead we print as we go, which means this works fine on enormous directory trees that wouldn't fit in memory. The tradeoff is that you can't sort the entire tree globally, only within each directory level. But that's how the real tree command works too.
One performance note: opening and closing a directory handle for every subdirectory involves syscalls, and on deep trees with thousands of directories that adds up. So does writing to unbuffered stdout: every print call in printTree becomes at least one write syscall. C programs often reach for the fts or nftw traversal APIs for this kind of job. Our version is simpler but plenty fast for normal use.
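A cheap fix for the stdout side is to wrap the writer in a buffer. A sketch assuming Zig 0.13/0.14's std.io.bufferedWriter; writeLinesBuffered is a hypothetical helper that shows the pattern on any writer.

```zig
const std = @import("std");

/// Hypothetical helper: emits `lines` through a BufferedWriter, so many small
/// print calls become a few large writes on `underlying`.
fn writeLinesBuffered(underlying: anytype, lines: []const []const u8) !void {
    var bw = std.io.bufferedWriter(underlying);
    const w = bw.writer();
    for (lines) |line| try w.print("{s}\n", .{line});
    // nothing is guaranteed to reach `underlying` until the buffer fills or we flush
    try bw.flush();
}
```

In the tree program you'd create the buffered writer once in main around std.io.getStdOut().writer(), pass bw.writer() to printTree, and call bw.flush() before returning; forgetting the flush truncates the output.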
Extend the DirectoryWalker to skip directories that contain a .ignore file. When the walker encounters a directory, it should check for the presence of a .ignore file inside it, and if found, skip that entire subtree. Write a test that creates a directory tree with .ignore in one branch and verifies that branch is excluded.
Write a findFiles function that takes a directory path and a glob pattern (like "*.zig" or "test_*") and returns all matching files recursively. Implement basic glob matching yourself (just * wildcards at the start and end, no regex). Test it by creating temporary files with various names and verifying the correct ones are returned.
Build a du (disk usage) command that walks a directory tree and prints the total size of each subdirectory, sorted from largest to smallest. Format sizes in human-readable units (bytes, KB, MB). Test it with a known directory structure where you've written files of specific sizes.
- std.fs.cwd().openDir() with .iterate = true gives you a directory iterator that lazily yields entries: names and types without loading everything into memory
- statFile returns metadata (size, timestamps, permissions), and timestamps use nanosecond precision since the Unix epoch
- Symlinks need readLink to find their target; most operations follow symlinks transparently, so detecting and inspecting them requires explicit code
- makeDir fails on existing directories (be explicit) while makePath creates the full chain silently: different operations for different needs
- The tree command combines directory iteration, sorting, recursive descent, and UTF-8 box-drawing characters into a practical tool that demonstrates real-world filesystem programming

We started a new arc today: low-level OS interaction from Zig. File systems are just the beginning. The concepts here (handles, iterators, error handling on every operation, graceful degradation) apply to all OS-level programming. Working directly with the file system also forces you to think about things higher-level languages hide from you: when memory for names gets freed, what happens if a file disappears between checking and opening, how permissions interact with process identity.
Thanks for reading!