There Is No Greater Joy In Life Than Writing Distributed Systems
Hello friends.
It's happened again.
I've once more rebuilt all the infrastructure on this blog.
You may remember me saying this a little while ago in a previous post.
> Next I want to figure out what I'm doing with deployment infrastructure. The obvious answer is Kubernetes and Pulumi, but I think it might not be too hard to edit b8s into something that works for Numby. And if that's the case, I may as well properly productionise it and make it more general purpose for other- and oh my Bidoof I'm just thinking about turning it into Kubernetes. Let's not do that.
And well, the thing is…
I have NOT done that.
Makiatto is a CDN purely for serving static files. It is NOT Kubernetes, I will NEVER try to make Kubernetes. This blog does NOT need to run on ANYTHING like Kubernetes. (1) And neither does Numby, because Numby is dead (I'm sorry you had to find out this way).

(1) Please forget that time when it was running on Kubernetes years ago.
So what does Makiatto do that b8s didn't?
Well for starters, it isn't hardcoded to deploy my blog and is configurable with all kinds of domains and content you can serve from it.
Yes! That means YOU can use it!!!
But should YOU use it???
I dunno, if you want I guess.
I've also removed the Docker requirement, and now we just upload the files directly to the server instead of using Docker Hub as a useless middleman.
The trade-off to this is that there is no way to get "server-side" dynamic content, as I've had to drop nbymx. But don't worry, I have a plan to add it back which I will explain later.
Another change is removing the central control plane that was the dom daemon. If you remember, in b8s this was the daemon that ran on a server separate from all the nodes and was responsible for their orchestration, updating certificates, health checking (if I ever added it), and whatever else you'd expect.
Now the network is completely decentralised, mostly thanks to Corrosion, a CRDT-based distributed SQLite database which allows us to easily share eventually consistent state between nodes.
It works very well, but truth be told, I'm not actually super happy with how it's being used right now. Makiatto actually embeds Corrosion, although I needed to fork it so it could run on the latest Rust version and update a few dependencies (to avoid conflicts).
The issue is that we're still "communicating" with Corrosion like it's an external process - calling its HTTP API to update and query data. This seems suboptimal, considering we should have access to its internals.
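To make that concrete, here's roughly what that round-trip looks like - a minimal sketch only, since the `/v1/transactions` endpoint and payload shape are my reading of Corrosion's HTTP API, and the `nodes` table is made up:

```rust
use serde_json::json;

// A rough sketch of treating Corrosion as an external process:
// POSTing SQL to its HTTP API instead of calling into the embedded
// instance directly. (Endpoint path and payload shape are assumptions
// from Corrosion's docs; the table is hypothetical.)
async fn register_node(client: &reqwest::Client, name: &str, addr: &str) -> anyhow::Result<()> {
    let body = json!([[
        "INSERT INTO nodes (name, addr) VALUES (?, ?) \
         ON CONFLICT (name) DO UPDATE SET addr = excluded.addr",
        [name, addr]
    ]]);
    client
        .post("http://127.0.0.1:8080/v1/transactions")
        .json(&body)
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```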
However, I think I tried doing that initially and I couldn't find a way to get it to work, which I guess is because Corrosion isn't really designed to be embedded. So perhaps my fork will need more work - and that would give me an excuse to trim some fat off it, and update the rest of the dependencies (because holy shit are these versions old).
And that's basically all the differences (in terms of features) from b8s. I know it doesn't seem like much, but b8s was really small. That thing was 2 thousand lines of code written in a little over a weekend; Makiatto took a month and is currently at 12 thousand lines (with a clear path to gaining a lot more).
As it turns out, decentralised distributed systems are actually kinda hard to write. Who would have guessed?
They're also really fucking hard to test.
I've mostly focused on integration and e2e tests that revolve around Docker containers managed by the testcontainers library. Each test spins up isolated environments with their own networking, file systems, and database instances. This allows tests to run in parallel without interfering with each other and gives us a lot of control over the context and environment that each container runs in, which lets us closely simulate how Makiatto would behave in production across multiple servers.
But this isn't enough, not even close.
Distributed systems are inherently non-deterministic. Unlike a single-process application where function calls happen in a predictable sequence, distributed systems involve multiple independent processes communicating over unreliable networks. Messages can arrive out of order, be delayed, or get lost entirely. A test that passes on your local machine might fail in CI simply because network latency was slightly different, making it extremely difficult to write reliable, reproducible tests.
As a related note, a very strange thing that has been happening to me while writing Makiatto is that tests are extremely flaky on my machine, with a few tests having to do several retries to pass, but not flaky at all in CI, where a test will rarely need to retry more than once. I don't know what to think of this, it's pretty much unheard of to me for tests to be flaky locally but not in CI. And I have no fucking idea why it's happening.
But back on topic - partial failures are the norm in distributed systems, not the exception. What happens when two of three nodes can communicate, but the third is network-partitioned? How does the system behave when a node crashes mid-replication? These failure modes are difficult to test because they require precise control over network conditions and process lifecycles. Makiatto's container-based approach helps by allowing tests to kill containers or manipulate network rules, but even then, covering all possible failure combinations is practically impossible.
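For example, partitioning a node can be as blunt as detaching its container from the shared test network - a pair of hypothetical helpers shelling out to the docker CLI, not Makiatto's actual test code:

```rust
use std::process::{Command, ExitStatus};

// Hypothetical failure-injection helpers: cut a container off from
// the shared test network, then reattach it later and watch the
// system heal. (Shelling out to the docker CLI for illustration.)
fn partition(network: &str, container: &str) -> std::io::Result<ExitStatus> {
    Command::new("docker")
        .args(["network", "disconnect", network, container])
        .status()
}

fn heal(network: &str, container: &str) -> std::io::Result<ExitStatus> {
    Command::new("docker")
        .args(["network", "connect", network, container])
        .status()
}
```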
Debugging failures is also pretty difficult. When a test fails, the relevant information is scattered across multiple containers' logs, database states, and network traces. A simple assertion failure might require correlating timestamps across different nodes to understand the sequence of events that led to the failure.
You have no idea how many times I've had to add a sleep(1000000000) to a test so I could go and inspect the Docker containers to find out what state they're in and why. And then after I got the information I have to SIGTERM the test (unless I want to wait 31.709792 years for it to stop), which means the Rust destructor (2) for the context won't run, and I have to manually clean up everything, and ARRRGGHH SO ANNOYING. I HATE TESTS WHY CAN'T THINGS JUST WORK.

(2) When the context goes out of scope, Rust's `Drop` trait ensures all Docker containers are stopped and removed. This eliminates manual cleanup code and prevents resource leaks from orphaned containers. This is super useful for these tests. Another big win for the Rust crabs.
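In sketch form, that cleanup looks something like this (`TestContext` and its fields are hypothetical, not Makiatto's actual types):

```rust
// A sketch of that Drop cleanup: when the test context goes out of
// scope - even via a panic - every container it owns gets removed.
// (TestContext is hypothetical, not Makiatto's actual type.)
struct TestContext {
    container_ids: Vec<String>,
}

impl Drop for TestContext {
    fn drop(&mut self) {
        for id in &self.container_ids {
            // Force-remove; ignore failures since we're mid-teardown
            // and there's nothing useful to do about them.
            let _ = std::process::Command::new("docker")
                .args(["rm", "--force", id])
                .status();
        }
    }
}
```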
But I guess this is what Antithesis is for. If only I were a popular cloud computing company like Fly.io, who can afford this service for their distributed database, and not just a mere sole developer who can only afford to rent 5 servers to put their blog on ;(
Though now that I think about it, maybe I don't want all these problems solved for me. Yes, all these failure modes are the most frustrating shit I've ever banged my head against, but you know, Bidoof gives His most difficult challenges to His strongest warriors.
I spent an entire day reading and thinking about consensus algorithms instead of actually doing my job, only to go home and write something that was completely overengineered and did not work at all in the slightest. That was the most fun I've had since I got Covid and wrote the DNS server that led me down this path two years ago. (3)

(3) Oh yeah, that's a little lore for you that I didn't mention in that initial post. I got Covid in the middle of my holiday.
There's just something deeply compelling in getting these machines to work as a cohesive whole. When you see nodes find each other almost by themselves, start communicating, and replicating data across continents - I still can't really believe it works. Even though I wrote the fucking thing, it kinda feels like magic.
The CAP theorem tells us that distributed systems can only guarantee two of three properties: Consistency (all nodes see the same data), Availability (the system stays operational), and Partition tolerance (surviving network splits). But in practice, it's even messier because performance is also part of the equation. Every architectural decision forces you to balance these competing concerns.
Even simple decisions like whether to verify content hashes on every file request force you to choose between correctness (check every time), performance (trust the cache), and complexity (periodic background verification). These aren't bugs to fix but fundamental trade-offs where every choice sacrifices something - strong consistency means waiting for all nodes, high availability means accepting temporary inconsistencies, and maximum performance often means relaxing guarantees about data freshness.
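Spelled out as code, that choice looks something like this (the names are hypothetical, not Makiatto's actual code):

```rust
use std::time::{Duration, Instant};

// The same trade-off as an explicit strategy enum; names are
// hypothetical, not Makiatto's actual code.
enum HashVerification {
    /// Correctness: re-hash the file on every request.
    EveryRequest,
    /// Performance: trust the cached hash unconditionally.
    Never,
    /// Complexity: re-verify in the background every so often.
    Periodic { interval: Duration },
}

fn should_verify(strategy: &HashVerification, last_checked: Instant) -> bool {
    match strategy {
        HashVerification::EveryRequest => true,
        HashVerification::Never => false,
        HashVerification::Periodic { interval } => last_checked.elapsed() >= *interval,
    }
}
```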
This complexity means there's always something new to explore and optimise. There's no "correct" answer - only choices that align better or worse with your specific requirements.
This is everything I want from programming.
So what's next for Makiatto?
Well I really need to write docs for it, it needs a few more CLI commands to help with managing nodes, and I certainly need to improve its resilience in node failure scenarios. But in terms of big features, I really want to have a way to get my dynamic content back.
And I already have a plan to achieve this by adding WebAssembly extensions. So it'll be like nginx JavaScript modules but with WebAssembly instead of JavaScript.
In fact, I want to go even further and let you upload and run WebAssembly through Makiatto, like Cloudflare Workers. I think that'd be sick, it's like a self-hosted cgi-bin serverless solution.
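The core of it could be as small as this - a sketch using wasmtime, where the `handle` export is an assumption of mine rather than a design Makiatto has committed to:

```rust
use wasmtime::{Engine, Instance, Module, Store};

// A minimal sketch of hosting a Wasm extension with wasmtime.
// (My illustration of the idea, not Makiatto's planned
// implementation; the "handle" export name is made up.)
fn run_extension(wasm_path: &str) -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::from_file(&engine, wasm_path)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;

    // Look up an exported function the extension is assumed to provide.
    let handle = instance.get_typed_func::<(), i32>(&mut store, "handle")?;
    let status = handle.call(&mut store, ())?;
    println!("extension returned status {status}");
    Ok(())
}
```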
Epilogue
I want to let you in on a secret not many people know about me.
I used to like Go.
In fact, I'd say I've liked all the languages in my "main" four (4) - being Python, JavaScript, Go, and Rust. I mean, there's a reason I've learnt them to the degree that I have.

(4) Main being defined by whether I feel comfortable writing software for other people in said language while being paid my standard rate for it.
But pretty much since the start of my professional career, my approval for them has been declining - admittedly some faster than others (looking at you JavaScript).
The one exception is Rust.
And I've been thinking about that. I think it makes sense for there to not be one "best" language, and that you should use the best tool for the job. But I can't reconcile that with the fact that in almost every situation I can think of, I would always choose Rust.
In the last 3 years, there has only been one time I chose a different language for a job that didn't have the usual external pressures that would make me not choose Rust.
It was an 8pm task (with an 11pm deadline because my friends wanted to play Overwatch) to quickly create a prototype involving a lot of concurrent streaming. Go was the obvious choice. Then the following day I rewrote said prototype into TypeScript because the same external pressures that would make me not choose Rust also apply to Go.
But the point still stands. It really struck me - that was the first time in many years when I. Did. Not. Choose. Rust.
Why?
If Rust really is the best language that can and should be used for basically everything, why isn't everyone else using it? Are they stupid?
Now believe it or not, I don't really want to think everyone is stupid, (5) so maybe there's some other reason.

(5) I know, that probably is hard to believe, but just trust me.
Since Rust is the newest language to me out of the "main" four, maybe it's just recency bias, and I'm still in the honeymoon phase. But I'm not convinced on that one, I started learning Rust in 2017 - that's plenty of time to fall out of love with it and is certainly a lot longer than what I've had with the other languages.
In fact my Rust usage has only been going up. I only like it more the more I learn about it. Even as recently as the b8s post I said this in the epilogue:
> But a website is like the one place I don't think Rust is that good for, I've tried it plenty of times and have never been particularly impressed.
I'll have you know that Numby actually changed that position. I would absolutely use Rust for a website now. I mean, I still probably won't rewrite this blog in Rust - I have much more interesting problems to solve (see what this post is about), but any future websites (that don't have previously stated external pressures) I have to make will most likely be in Rust.
Although, I don't really enjoy making websites, Rust or not, so hopefully that situation never comes.
But yeah, I just don't get it. This obsession with Rust feels irrational to me, and I am somewhat concerned I can't rationalise my way out of it. Like don't get me wrong, there are things I've come to dislike about Rust. But I'm also for some reason a lot more willing to forgive Rust for its problems than I am with other languages.
Why?
Like am I in a cult? That seems to be how other people see the Rust community. But I don't even really interact with the Rust community; I would not consider myself part of it any more than I am a member of the Go or Python community.
I don't fucking get it dude.
I dunno what to do… ;(