DevTools Radio

A tail-call interpreter in (nightly) Rust

April 6, 2026 · 3:20 · Episode 0

Host A: Welcome to DevTools Radio, I'm your host and today we're diving into something that genuinely surprised me when I came across it — a tail-call interpreter written in nightly Rust that actually outperforms hand-written assembly.

Host B: Okay, that's a bold claim right out of the gate. Hand-written assembly is usually the ceiling, not something you beat with higher-level code. What's the context here?

Host A: So the author has been on this long journey emulating a tiny virtual CPU called Uxn — it's a simple stack machine with 256 instructions, used in the Hundred Rabbits creative ecosystem. And they've been chasing performance across multiple iterations.

Host B: Right, and I think the key word there is "simple." A 256-instruction stack machine with 64K of memory — this isn't emulating a modern x86 chip. So what was the original approach?

Host A: The naive approach is just a big match statement in a loop — read an opcode from RAM, dispatch to the right handler, repeat. Clean, readable, but the branch predictor hates it: every opcode funnels through the same indirect jump, so its target is basically unpredictable.
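Show notes: a minimal, runnable sketch of that match-in-a-loop shape. This is illustrative rather than the author's actual code; the opcode numbers loosely follow Uxn's table (0x00 BRK, 0x18 ADD, 0x80 LIT) but the handlers are simplified.

```rust
// Illustrative sketch of the naive interpreter: one match in a loop.
// Every instruction is dispatched through the same indirect branch,
// which is what the branch predictor struggles with.
struct Vm {
    pc: usize,             // program counter into 64 KiB of RAM
    sp: usize,             // data-stack pointer
    stack: [u8; 256],      // circular data stack
    ram: Box<[u8; 65536]>, // main memory
}

impl Vm {
    fn push(&mut self, v: u8) {
        self.stack[self.sp & 0xff] = v;
        self.sp = self.sp.wrapping_add(1);
    }
    fn pop(&mut self) -> u8 {
        self.sp = self.sp.wrapping_sub(1);
        self.stack[self.sp & 0xff]
    }

    fn run(&mut self) {
        loop {
            let op = self.ram[self.pc & 0xffff];
            self.pc = self.pc.wrapping_add(1);
            match op {
                0x00 => return, // BRK: halt
                0x18 => {       // ADD: pop two bytes, push their sum
                    let b = self.pop();
                    let a = self.pop();
                    self.push(a.wrapping_add(b));
                }
                0x80 => {       // LIT: push the next byte from RAM
                    let v = self.ram[self.pc & 0xffff];
                    self.pc = self.pc.wrapping_add(1);
                    self.push(v);
                }
                _ => { /* ...the other 253 opcodes... */ }
            }
        }
    }
}

fn main() {
    let mut vm = Vm {
        pc: 0,
        sp: 0,
        stack: [0; 256],
        ram: Box::new([0; 65536]),
    };
    // LIT 2, LIT 3, ADD, then BRK (the rest of RAM is zeroed)
    vm.ram[..5].copy_from_slice(&[0x80, 2, 0x80, 3, 0x18]);
    vm.run();
    println!("top of stack: {}", vm.stack[vm.sp.wrapping_sub(1) & 0xff]); // 5
}
```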

Host B: So you've got this hot dispatch loop that the CPU can't get ahead of, and that's your bottleneck. How did they solve it in assembly?

Host A: With a technique called threaded code — you keep the VM's entire state in CPU registers, and at the end of each instruction's handler, you jump directly to the next instruction's implementation. Every handler gets its own copy of the dispatch branch, so the branch predictor can actually learn per-instruction patterns.

Host B: That's clever, but I'm guessing "2000 lines of hand-written assembly" comes with some serious maintenance headaches?

Host A: Massive ones. The author actually introduced an out-of-bounds write in the x86 port that only showed up as a segfault during fuzzing under very specific conditions. So the question became — can we get the same performance without the assembly nightmare?

Host B: And this is where the `become` keyword in nightly Rust comes in, right? Tell me how that works.

Host A: The idea is elegant. You represent VM state as function arguments — which get mapped to registers by the calling convention — and then each instruction ends by calling the next instruction instead of returning. That's a tail call. Without guaranteed tail-call elimination, though, each of those calls can push a new stack frame, and after enough instructions you blow the stack.
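Show notes: the same machine restructured so state travels in arguments and every handler ends in a call, mirroring the threaded-code shape from the assembly version. A minimal sketch with illustrative names, not the author's code; on stable Rust, nothing guarantees these trailing calls actually become jumps.

```rust
// Sketch of the tail-call structure (illustrative, not the author's code).
// VM state lives in function arguments, which the calling convention
// maps to registers; each handler ends by calling the dispatcher again.
// Without guaranteed elimination, these calls can pile up stack frames.
type Stack = [u8; 256];
type Ram = [u8; 65536];

fn dispatch(pc: usize, sp: usize, stack: &mut Stack, ram: &mut Ram) {
    match ram[pc & 0xffff] {
        0x00 => (), // BRK: stop and let the whole chain unwind
        0x18 => op_add(pc.wrapping_add(1), sp, stack, ram),
        _ => dispatch(pc.wrapping_add(1), sp, stack, ram), // others elided
    }
}

fn op_add(pc: usize, sp: usize, stack: &mut Stack, ram: &mut Ram) {
    let b = stack[sp.wrapping_sub(1) & 0xff];
    let a = stack[sp.wrapping_sub(2) & 0xff];
    stack[sp.wrapping_sub(2) & 0xff] = a.wrapping_add(b);
    // Tail position: the hope is that the optimizer turns this into a
    // jump, but stable Rust makes no such promise.
    dispatch(pc, sp.wrapping_sub(1), stack, ram)
}
```

A full interpreter would dispatch through a 256-entry table of handlers that all share this one signature, rather than a match, but the tail-call structure is the same.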

Host B: So the `become` keyword is basically the compiler pinky-promising that this will be a branch, not a branch-and-link — no new stack frame, just replace the current one.

Host A: Exactly. One word change, and suddenly you have guaranteed tail-call optimization. The compiler can't just decide to skip it — it's a hard contract. And the result benchmarks faster than both the previous Rust implementation and the hand-coded ARM64 assembly.
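Show notes: the same sketch with `become`, assuming nightly Rust with the `explicit_tail_calls` feature gate (names still illustrative, not the author's code). One detail worth knowing: `become` requires the callee's signature to match the caller's exactly, which is easy here because every handler already shares one signature.

```rust
// Same sketch with guaranteed tail calls; nightly Rust only.
#![feature(explicit_tail_calls)]

type Stack = [u8; 256];
type Ram = [u8; 65536];

fn dispatch(pc: usize, sp: usize, stack: &mut Stack, ram: &mut Ram) {
    let op = ram[pc & 0xffff];
    if op == 0x00 {
        return; // BRK: halt
    }
    if op == 0x18 {
        // `become` instead of a plain call: the compiler must replace
        // the current frame (a branch, not a branch-and-link) or
        // reject the program at compile time.
        become op_add(pc.wrapping_add(1), sp, stack, ram);
    }
    become dispatch(pc.wrapping_add(1), sp, stack, ram); // others elided
}

fn op_add(pc: usize, sp: usize, stack: &mut Stack, ram: &mut Ram) {
    let b = stack[sp.wrapping_sub(1) & 0xff];
    let a = stack[sp.wrapping_sub(2) & 0xff];
    stack[sp.wrapping_sub(2) & 0xff] = a.wrapping_add(b);
    become dispatch(pc, sp.wrapping_sub(1), stack, ram);
}
```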

Host B: That is genuinely wild. And what I love about this story is it's also a reminder that these nightly features in Rust — things that seem like compiler nerd trivia — can have very real, measurable impact on production-level performance work.

Host A: Totally. The author also notes all the code is human-written, which feels like a gentle callback to some controversy around their previous post where they used an LLM to port assembly. Either way, the result is a high-performance VM that's actually maintainable.

Host B: Which, honestly, is the dream — code that future-you doesn't want to throw into the sun.

Host A: That's a wrap for today's episode of DevTools Radio. If you're working on interpreters, emulators, or just want to geek out on low-level Rust, this writeup is absolutely worth your time — we'll link it in the show notes.

Host B: And if nightly Rust ever stabilizes `become`, remember you heard about it here first. Thanks for listening, everyone — catch you next time.
