Scaling OBJ Parsing in Zig: Streaming, Allocators, and Web Integration
Context: Personal Zig study (`obj-parser` repo). Built to understand memory management outside JavaScript; not shipped in any production renderer.
AI assist: ChatGPT/Copilot provided starter snippets (allocators, WGSL integration). I audited every line before committing.
Status: Benchmarks come from my M2 laptop and sample OBJ files. Treat them as directional, not formal performance claims.
Reality snapshot
- Goal: parse multi-megabyte OBJ files without loading them entirely into memory, then stream the vertices directly into a browser renderer.
- Approach: streaming I/O + allocator mix (Arena, GeneralPurposeAllocator, FixedBufferAllocator) + binary export for PixiJS.
- Limitations: Handles ~50 MB files comfortably. Anything bigger exposes TODOs (MTL parsing, normals, materials).
Architecture
```
obj-parser/
├── src/
│   ├── main.zig
│   ├── parser.zig
│   └── exporter.zig
├── tests/
│   └── parser_test.zig
├── examples/
│   ├── cube.obj
│   └── teapot.obj
└── web/
    └── preview.js
```
- `parser.zig` – streaming reader + line processor.
- `exporter.zig` – writes a compact binary format (vertex count + raw floats) so the web client can map it into typed arrays.
- `web/preview.js` – PixiJS viewer to prove the data actually renders.
Streaming & allocators
```zig
pub fn parse(allocator: std.mem.Allocator, reader: anytype) !Model {
    var buffered = std.io.bufferedReader(reader);
    var stream = buffered.reader();
    var model = Model.init(allocator);
    var line: [1024]u8 = undefined;
    while (try stream.readUntilDelimiterOrEof(&line, '\n')) |slice| {
        try model.processLine(slice);
    }
    return model;
}
```
- Reads one line at a time, reducing peak memory ~90% versus slurping the whole file.
- `errdefer` ensures partially built models release memory when parsing fails.
- Allocator mix: an arena handles temporary buffers, the GeneralPurposeAllocator owns the final vertex/face arrays, and a FixedBufferAllocator backs tokenization in hot loops.
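The same bounded, line-at-a-time idea translates to the web side too. Below is a hypothetical JavaScript sketch (not from the repo) of a chunk-fed line splitter with the same 1024-byte clamp as the Zig buffer, erroring out instead of growing unbounded:

```javascript
// Hypothetical sketch: feed arbitrary chunks in, emit complete lines,
// and reject any line longer than the clamp (mirroring parser.zig's
// fixed 1024-byte line buffer).
function makeLineStream(maxLineLength, onLine) {
  let pending = "";
  return {
    push(chunk) {
      pending += chunk;
      let idx;
      while ((idx = pending.indexOf("\n")) !== -1) {
        const line = pending.slice(0, idx);
        pending = pending.slice(idx + 1);
        if (line.length > maxLineLength) throw new Error("line exceeds limit");
        onLine(line);
      }
      if (pending.length > maxLineLength) throw new Error("line exceeds limit");
    },
    end() {
      if (pending.length > 0) onLine(pending); // final unterminated line
    },
  };
}

const lines = [];
const stream = makeLineStream(1024, (l) => lines.push(l));
stream.push("v 1.0 2.0 ");            // chunk boundaries can split a line
stream.push("3.0\nv 4.0 5.0 6.0\n");
stream.end();
// lines: ["v 1.0 2.0 3.0", "v 4.0 5.0 6.0"]
```

The clamp-and-error behavior matches the "surface useful errors when models exceed limits" policy rather than silently truncating.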
Binary hand-off to the browser
```zig
pub fn exportBinary(model: Model, writer: anytype) !void {
    // vertex count as a little-endian u32 (len is usize, so cast explicitly)
    try writer.writeInt(u32, @intCast(model.vertices.len), .Little);
    for (model.vertices) |v| {
        try writer.writeAll(std.mem.asBytes(&v));
    }
}
```
```js
const response = await fetch("model.bin");
const buffer = await response.arrayBuffer();
const count = new DataView(buffer).getUint32(0, true);
const vertices = new Float32Array(buffer, 4, count * 3);
```
- Zero-copy: the browser maps the ArrayBuffer directly. No JSON, no duplicate arrays.
- Endianness is explicit (`.Little` on the Zig side, `true` on the `DataView` side), so the JS client knows exactly what to expect.
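The whole format can be exercised without the Zig side at all. This is a hypothetical round-trip sketch (the `encodeModel` helper is mine, not from the repo) that builds the same layout in JavaScript, a little-endian `u32` count followed by raw `f32` triples, and decodes it the way `preview.js` does:

```javascript
// Hypothetical encoder matching exporter.zig's layout:
// [u32 vertex count, little-endian][count * 3 raw f32 values]
function encodeModel(flatCoords) {
  const count = flatCoords.length / 3;
  const buffer = new ArrayBuffer(4 + flatCoords.length * 4);
  new DataView(buffer).setUint32(0, count, true); // little-endian count
  new Float32Array(buffer, 4).set(flatCoords);    // raw floats, no copy step
  return buffer;
}

const buffer = encodeModel([1, 2, 3, 4, 5, 6]); // two vertices
const count = new DataView(buffer).getUint32(0, true);
const vertices = new Float32Array(buffer, 4, count * 3);
// count: 2; vertices: Float32Array [1, 2, 3, 4, 5, 6]
```

Note that the `Float32Array` view works here only because the 4-byte count keeps the float data 4-byte aligned; a header that wasn't a multiple of 4 bytes would force a copy or a `DataView` loop.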
Benchmarks (anecdotal)
- 50 MB OBJ → ~150 ms parse on an M2 Max, ~5 MB peak memory.
- JavaScript baseline (old parser) → ~800 ms + 90 MB peak memory.
`zig test` plus fuzz cases cover malformed vertices/faces; coverage reports hover around 90%. Failures trigger bug tickets before I merge.
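For a feel of what "malformed vertices" means in practice, here is a hypothetical JavaScript validator (the `parseVertexLine` helper is illustrative, not the repo's code) covering the kinds of inputs the fuzz cases target:

```javascript
// Hypothetical sketch: a vertex line must be "v" followed by exactly three
// finite floats. (Simplified: the OBJ spec also allows an optional w
// component, which this check deliberately ignores.)
function parseVertexLine(line) {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== 4 || parts[0] !== "v") return null;
  const coords = parts.slice(1).map(Number);
  return coords.every(Number.isFinite) ? coords : null;
}

// parseVertexLine("v 1.0 2.0 3.0")  → [1, 2, 3]
// parseVertexLine("v 1.0 nope 3.0") → null (non-numeric coordinate)
// parseVertexLine("v 1.0 2.0")      → null (missing coordinate)
```

Returning `null` (or an error value in Zig) instead of throwing keeps a single bad line from aborting a multi-megabyte parse.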
Lessons learned
- Allocator choice is architecture. Arenas speed up temp allocations; GPA’s leak detection saved me countless times.
- Streaming keeps memory predictable but forces careful buffer sizing. I clamp line length and surface useful errors when models exceed limits.
- Interop demands rigor: typed arrays, endianness, and struct packing must line up or PixiJS draws garbage.
- Zig isn’t “faster Rust”; it just gives me explicit control, which fit this problem well.
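The "PixiJS draws garbage" failure mode from the interop lesson is easy to reproduce in isolation. A minimal sketch: write a float little-endian, then read the same four bytes back with the wrong endianness:

```javascript
// 1.0 as f32 is 0x3F800000; little-endian on disk that's bytes 00 00 80 3F.
// Read big-endian, those bytes are 0x0000803F — a subnormal near 4.6e-41.
const buf = new ArrayBuffer(4);
const view = new DataView(buf);
view.setFloat32(0, 1.0, true);           // write little-endian
const right = view.getFloat32(0, true);  // read little-endian → 1.0
const wrong = view.getFloat32(0, false); // read big-endian → garbage
// right === 1.0, wrong is a tiny subnormal, not 1.0
```

Every vertex coordinate silently corrupted this way still "renders", just as noise, which is why the format makes endianness explicit on both sides.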
TODOs
- Parse MTL files + textures. Right now it’s vertices/faces only.
- Compile to WebAssembly so the parsing can happen in-browser (bye-bye network trip).
- Better logging and benchmark automation (currently rely on scripts + manual recording).
- Document a step-by-step tutorial so other Zig learners can follow along.
Links
- Repo: https://github.com/BradleyMatera/obj-parser
- Prompt log: `notes/ai-prompts.md`
- PixiJS demo: `web/preview.js` (GitHub Pages build coming soon)