No One:
Me: I Replaced Nginx With Rust
My fellow Rustaceans (and non-believers),
You may already be aware that in the past month I've been on a warpath, aggressively optimising this blog. You can look at the past posts in this "Meta" series if this is news to you or you need a refresher.
But it's not just optimisations I've been trying to do. It's also general code improvements: getting rid of things that make me a little "uncomfortable."
And you know what makes me really uncomfortable?
JavaScript.
This Svelte version of the blog was made before Svelte supported TypeScript, so it goes that this blog was written almost entirely in plain, un-typed JavaScript.
That is a big no-no for me these days, to say the least. It was a no-no back then too, actually, but I couldn't do anything about it at the time.
Unsurprisingly, a complete conversion to TypeScript took no longer than a day, which leaves me wondering why I didn't do it earlier, but it is what it is. There isn't much else to say about this, so let's move on.
So then, what's next in the "makes me uncomfortable" list?
TypeScript.
More specifically, the remainder of the code written "almost entirely in plain, un-typed JavaScript." More specifically, the njs script for nginx that I wrote in TypeScript.
I've covered what I use njs for several times before on this blog but to put it simply, it's mostly been a way to handle dynamic elements of the blog without using client-side JavaScript by injecting HTML into the page as it's served to you from nginx.
I don't like how it's written in JavaScript. I know they use a special subset of JavaScript to keep performance up, but even with the added static types in TypeScript, it's still JavaScript. The idea that my web server is running JavaScript is just… gross.
It makes me uncomfortable.
There are Rust bindings for nginx, which is great, but another reason njs makes me uncomfortable is that I don't really understand nginx. A while ago I found out my njs script was very slightly broken in weird ways that stem from me not understanding how nginx requests and buffers work.
You could consider this a skill issue. And that maybe I should just learn how to write nginx extensions properly then. To that I say, "no thanks I can't really be bothered, could you please leave my apartment now?"
Also I didn't write nginx, so do I really know if it sucks or not? I know if I wrote my own web server it wouldn't suck because I don't write sucky code. But I can't give that guarantee to nginx as I haven't done any code reviews for them. Hmm really makes you think, huh?
No but seriously, I would like to have more control over the web server, and this is as good an excuse as any to drop nginx in favour of our crab god, doing my part to further the inevitable carcinisation of the universe and reality itself as we too rewrite ourselves to become blazingly fast and memory safe with fearless concurrency.
First let's get a few benchmarks going, because if it turns out I write sucky code that is like two times less performant than nginx, (1) I'm going to quit my job, give all the equity I own to the Red Panda Network, and sacrifice myself to the ocean where I will become food for Them, the next most useful thing a failed Rust main could be.

(1) Let's be real, I'm under no illusions that I can write a server faster than nginx. I'll settle for "not that much slower."
This benchmark is for my nginx config as it is now with the njs extension doing its thing.
$ rewrk -h http://127.0.0.1/post -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post for 5 second(s)
Latencies:
Avg Stdev Min Max
10.54ms 8.58ms 0.04ms 144.78ms
Requests:
Total: 16272 Req/Sec: 3253.89
Transfer:
Total: 1.51 GB Transfer Rate: 309.51 MB/Sec
16316 Errors: received unexpected message from connection
6 Errors: channel closed
I'm using the benchmarking tool rewrk against the /post route (I wanted a bigger page than /) with -t 12 threads and -c 60 connections for -d 5 seconds. It's a very small benchmark, but good enough for our purposes. (2)

(2) Yes, I'm aware of the 16322 errors reported. I'm ignoring them.
For another data point let's look at Static Web Server which is another web server written in Rust. Originally, I wanted to move to something like this instead, but I couldn't find anything that allowed me to extend it like nginx, so this bench is without any of the dynamic processing that normally happens with njs.
$ rewrk -h http://127.0.0.1/post -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post for 5 second(s)
Latencies:
Avg Stdev Min Max
11.02ms 7.53ms 0.34ms 119.56ms
Requests:
Total: 27170 Req/Sec: 5433.98
Transfer:
Total: 4.40 MB Transfer Rate: 902.13 KB/Sec
So latencies are a little worse in general, but requests are way up and yet total transfer is way down. This is a little strange and I'm guessing has something to do with all the errors we were getting with the nginx test.
Interesting.
I'm also curious about Caddy, which is the latest hotness in the realm of web servers. It doesn't have njs-like extensions either, but it's always good to have another data point.
$ rewrk -h http://127.0.0.1/post -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post for 5 second(s)
Latencies:
Avg Stdev Min Max
12.71ms 8.96ms 1.00ms 83.71ms
Requests:
Total: 23556 Req/Sec: 4711.02
Transfer:
Total: 2.30 GB Transfer Rate: 471.36 MB/Sec
Honestly, that's better than what I was expecting from something written in Go. I know many people call Caddy "fast" but that's usually from [SLUR REDACTED] gophers who don't know how to write performant code or understand why a language that ignores decades of progress in programming language research isn't actually that fast, so I always assumed they meant "faster than NodeJS" which is not a very high bar to clear.
So the fact that Caddy is this close to the others is commendable. They have done a good job here.
Now let's look at nginx without the njs extension to better compare with these two.
$ rewrk -h http://127.0.0.1/post -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post for 5 second(s)
Latencies:
Avg Stdev Min Max
3.24ms 0.93ms 0.98ms 17.19ms
Requests:
Total: 92561 Req/Sec: 18513.12
Transfer:
Total: 8.56 GB Transfer Rate: 1.71 GB/Sec
60 Errors: connection closed
Wait a minute, what the fuck is this? Are you seeing this chat? Holy shit.
Why is this so much faster than the others? I thought njs wasn't supposed to have that big of an impact on performance. What the heck?? There must be yet more I'm doing wrong with njs. But again, I can't really be bothered to figure it out, so I'm going to move on from this.
Right, now that we have our benchmarks let's start work on our new web server (I mean it's entirely mine, but I want you to feel included). Here's the absolute bare minimum we need to get the blog running on axum, a popular web framework for Rust.
use axum::Router;
use std::net::SocketAddr;
use tower_http::services::ServeDir;

#[tokio::main]
async fn main() {
    // Bind to port 80 and serve the static site build directory at the root.
    axum::serve(
        tokio::net::TcpListener::bind(SocketAddr::from(([127, 0, 0, 1], 80)))
            .await
            .unwrap(),
        Router::new().nest_service("/", ServeDir::new("../build")),
    )
    .await
    .unwrap();
}
And the benchmark.
$ rewrk -h http://127.0.0.1/post.html -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post.html for 5 second(s)
Latencies:
Avg Stdev Min Max
2.76ms 1.26ms 0.19ms 17.59ms
Requests:
Total: 108650 Req/Sec: 21729.41
Transfer:
Total: 10.69 GB Transfer Rate: 2.14 GB/Sec
I had to specify the route as /post.html instead of /post because the server doesn't do any URL cleaning yet. Now, is this extra work that axum didn't have to do giving it a small unfair advantage? Probably. But it also probably doesn't matter that much.
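To make "URL cleaning" concrete, here's a hedged sketch of the kind of axum middleware that could do it (the function is mine and illustrative, not nbymx's actual code; query strings are ignored for brevity):

use axum::{extract::Request, middleware::Next, response::Response};

// Rewrite an extensionless path like /post to /post.html before the static
// file service looks it up on disk.
async fn clean_urls(mut req: Request, next: Next) -> Response {
    let path = req.uri().path();
    if path != "/" && !path.contains('.') {
        let rewritten = format!("{path}.html").parse().expect("valid URI");
        *req.uri_mut() = rewritten;
    }
    next.run(req).await
}

Something like this would be wired in with axum::middleware::from_fn(clean_urls).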
Anyway, this is pretty good, but I'm half disappointed when you consider that adding to it is only going to make it worse. It kinda makes me want to shop around for a faster framework like ntex, but remember, we're not here to beat nginx, we're here to beat (or at least not be too far from) nginx with njs, and that's looking very achievable.
So here's a non-exhaustive list of things we need to add to be feature equivalent to what I configure nginx for:
- HTTPS/2
- Cache headers
- Gzip/Brotli compression
- Rate limiting
- URL cleaning
- Keepalive timeout
- Dynamic content (njs replacement)
Thankfully this isn't actually very much work because most of these are available as Tower middleware I can plug into axum. I'm not going to do a deep dive into the code because it's not that interesting. You can go see the entire 500 lines of nbymx (that's its name) for yourself here.
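To give a flavour of the middleware approach, here's a rough sketch of stacking tower-http layers onto the router (the layer types are real tower-http ones, but the cache value is a stand-in, not nbymx's actual config):

use axum::{
    http::{header, HeaderValue},
    Router,
};
use tower_http::{
    compression::CompressionLayer,
    services::ServeDir,
    set_header::SetResponseHeaderLayer,
};

fn app() -> Router {
    Router::new()
        .nest_service("/", ServeDir::new("../build"))
        // Gzip/Brotli compression, negotiated via the Accept-Encoding header
        // (needs tower-http's compression-gzip/compression-br features).
        .layer(CompressionLayer::new())
        // A blanket Cache-Control default; HTML gets special-cased below.
        .layer(SetResponseHeaderLayer::if_not_present(
            header::CACHE_CONTROL,
            HeaderValue::from_static("public, max-age=31536000"),
        ))
}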
Now what's the benchmark looking like?
$ rewrk -h http://127.0.0.1/post -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post for 5 second(s)
Latencies:
Avg Stdev Min Max
4.52ms 2.24ms 0.40ms 50.89ms
Requests:
Total: 66282 Req/Sec: 13255.76
Transfer:
Total: 6.55 GB Transfer Rate: 1.31 GB/Sec
Huh, that's a lot better than I thought it would be.
I assumed I would get similar results to Static Web Server since they're both built on hyper, (3) which is why, in retrospect, I was not aiming very high by just wanting to be near njs in performance. But to be over twice as performant (in this very small and specific benchmark)? I guess this is what I meant by saying I can't trust other people to write code that isn't sucky.

(3) axum is a light wrapper around hyper.
Ok that's enough, I need to come clean to you about something. I lied about not having anything to say about the code in nbymx. I just wanted you to look at it first, that way you'll be more used to seeing Rust code and thus more willing to accept Them into your heart.
I put more thought into the cache control headers. If HTML files are cached like they were before with nginx, you won't be getting new assets even if they're cache busted, because the links to all the other assets wouldn't have changed in the cached HTML. Now I set HTML to no-cache, which makes the browser validate it with nbymx before using the cached copy.
The server validates the browser cache with the ETag header, (4) which is sent in the response for files. This is normally a hash of the file, but it can really be anything. I tried using the etag crate to generate ETags as hashes of the files, but that added around a millisecond of latency, which was unacceptable. The ETag is just the content length now. I know nginx uses the content length and last-modified time, but I don't think there will be many situations where I change a page and it stays the same size, so it's fine.

(4) Real quaso.engineering fans will remember when I didn't know what the fuck an "ETag" was.
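Here's a hedged sketch of that scheme as axum middleware (the function and structure are mine, not nbymx's actual code):

use axum::{
    extract::Request,
    http::{header, StatusCode},
    middleware::Next,
    response::{IntoResponse, Response},
};

// The ETag is literally just the Content-Length, and a matching If-None-Match
// from the browser gets a bodyless 304 back.
async fn etag_from_length(req: Request, next: Next) -> Response {
    let if_none_match = req
        .headers()
        .get(header::IF_NONE_MATCH)
        .and_then(|v| v.to_str().ok())
        .map(str::to_owned);

    let mut res = next.run(req).await;

    // ETags are quoted strings, hence the escaped quotes.
    let etag = res
        .headers()
        .get(header::CONTENT_LENGTH)
        .and_then(|len| len.to_str().ok())
        .map(|len| format!("\"{len}\""));

    if let Some(etag) = etag {
        if if_none_match.as_deref() == Some(etag.as_str()) {
            // The browser's cached copy is still "valid"; skip the body.
            return StatusCode::NOT_MODIFIED.into_response();
        }
        res.headers_mut()
            .insert(header::ETAG, etag.parse().expect("valid header value"));
    }
    res
}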
It was also interesting to get TLS implemented. I know TLS is supposed to be a protocol that just runs on top of any transport (in a different layer of the OSI model), but it never properly clicked that implementing it really is just putting it on top of the transport.
I had to go to a lower-level API in hyper for HTTPS connections, but the code here is basically just "do the tls handshake" then "do the http request."
// Try the TLS handshake on the raw TCP connection first...
let Ok(stream) = tls_acceptor.accept(cnx).await else {
    error!("error during tls handshake connection from {}", addr);
    return;
};

// ...then serve HTTP over the now-encrypted stream.
let ret = hyper_util::server::conn::auto::Builder::new(TokioExecutor::new())
    .serve_connection_with_upgrades(TokioIo::new(stream), hyper_service)
    .await;
I dunno, for some reason I always imagined it being a bit more involved than this, even with tls_acceptor doing all the heavy lifting by actually encrypting it.
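For context, here's a hedged sketch of the accept loop around that snippet, assuming tokio-rustls (the names and structure are mine, not nbymx's actual code):

use std::sync::Arc;

use tokio::net::TcpListener;
use tokio_rustls::{rustls::ServerConfig, TlsAcceptor};

// Accept plain TCP first, then run the TLS handshake on top of that same
// stream: TLS really is just a layer over the transport.
async fn accept_loop(listener: TcpListener, tls_config: Arc<ServerConfig>) {
    let tls_acceptor = TlsAcceptor::from(tls_config);
    loop {
        let Ok((cnx, addr)) = listener.accept().await else {
            continue;
        };
        let tls_acceptor = tls_acceptor.clone();
        tokio::spawn(async move {
            // The handshake wraps the TcpStream into an encrypted TlsStream...
            let Ok(stream) = tls_acceptor.accept(cnx).await else {
                eprintln!("error during tls handshake connection from {addr}");
                return;
            };
            // ...which then gets handed to hyper, as in the snippet above.
            let _ = stream;
        });
    }
}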
Next, I thought it would be cool to add Prometheus metrics to the server. They're publicly accessible, so you can see the metrics (for your b8s node) at /metrics. I plan to collect and display these in some way on the status page.
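Here's a sketch of one way to expose that endpoint, assuming the metrics-exporter-prometheus crate (not necessarily what nbymx actually uses):

use axum::{routing::get, Router};
use metrics_exporter_prometheus::PrometheusBuilder;

// The recorder collects metrics in-process; /metrics renders them on demand
// in the Prometheus text exposition format.
fn with_metrics(router: Router) -> Router {
    let handle = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install Prometheus recorder");

    router.route("/metrics", get(move || async move { handle.render() }))
}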
I guess this technically counts as analytics, so it seems like I've finally added them. I mean, it's no big deal. I didn't actually set out to have a "no analytics" rule and wouldn't have had a problem with just parsing nginx logs or something, (5) I just wasn't bothered to do it before.

(5) I am not adding any unnecessary client-side JavaScript to this site, so analytics would always need to come from the server. Also, when doing it this way, there's nothing you can do to stop it :) You can also see all the 404s that the robots keep trying to access.
I realise that's becoming a common theme on this blog now.
Epilogue
I've done a bit more research on njs to find out why my module degraded performance by so much. And by research I mean I just watched the talk "NGINX JavaScript in Your Web Server Configuration", which goes through everything.
The part I was most interested in was this slide called "Why is njs fast?"
- Register based VM.
- Small memory footprint.
- UTF8 strings, byte string optimizations.
  - ECMAScript spec requires UTF-16.
- Disabled garbage collection.
  - Instead cloned VM is destroyed at once.
So it looks like njs is interpreted in a similar way to Python, where the script is compiled to bytecode that runs on a VM. I think the general design makes a lot of sense: keep a strict subset to reduce overhead and tailor it to the nginx runtime.
Then the problem was certainly with me and not njs, right? I mean, it seems like njs would start falling apart when you start trying to do more complicated things (which is why there are plans to introduce an alternative JS engine), but I was not doing anything complicated; it was literally just a find and replace.
For reference, removing the same thing in nbymx does not provide a noticeable improvement. It is not a computationally expensive thing to do.
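(To be concrete, the kind of operation I mean is nothing more than this, sketched illustratively in Rust with a made-up marker:)

// Inject a rendered fragment into the page by replacing a placeholder.
fn inject_dynamic(html: &str, fragment: &str) -> String {
    html.replace("<!-- dynamic -->", fragment)
}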
Fine, let's do a little bit more debugging. How about we benchmark one of the njs examples nginx provides instead of what I was doing? How about this function to convert the body to lowercase?
function to_lower_case(r, data, flags) {
    r.sendBuffer(data.toLowerCase(), flags);
}

export default { to_lower_case };
That's the entire script. Literally 4 lines. This has to run well, surely nginx themselves wouldn't provide a bad example.
$ rewrk -h http://127.0.0.1/post -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post for 5 second(s)
Latencies:
Avg Stdev Min Max
13.76ms 6.49ms 2.42ms 61.57ms
Requests:
Total: 21750 Req/Sec: 4349.83
Transfer:
Total: 2.13 GB Transfer Rate: 435.91 MB/Sec
WHAT!? WHY IS IT STILL SHIT AND HOW ARE THE LATENCIES WORSE????
That must be a mistake, let's do it again.
$ rewrk -h http://127.0.0.1/post -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post for 5 second(s)
Latencies:
Avg Stdev Min Max
13.94ms 7.05ms 3.03ms 69.99ms
Requests:
Total: 21482 Req/Sec: 4296.23
Transfer:
Total: 2.10 GB Transfer Rate: 430.57 MB/Sec
I give up. There aren't any errors for me to look at and say I did something wrong. I guess it just sucks.
So then it's confirmed for real; everyone writes shit code except for me.
Update 2024-08-18: A reader thought I should have benchmarked Go's standard library HTTP server as well. I think that's a great idea, punching down is always fun to do.
$ rewrk -h http://127.0.0.1/post.html -t 12 -c 60 -d 5s
Beginning round 1...
Benchmarking 60 connections @ http://127.0.0.1/post.html for 5 second(s)
Latencies:
Avg Stdev Min Max
1.77ms 2.40ms 0.06ms 45.05ms
Requests:
Total: 169711 Req/Sec: 33940.34
Transfer:
Total: 16.70 GB Transfer Rate: 3.34 GB/Sec
Uhhhhhhhhh……

You know these benchmarks are genuinely pretty small and not at all indicative of real load for high traffic. Let's really fucking stress test it and bump it up to 200 connections on 30 threads for 30 seconds.
Keep in mind that I've had to move the rewrk command to run on my Linux server because it now goes over the open file limit on macOS. The servers are still running on the same M2 MacBook, but the requests are going over a LAN instead of being local, which would have a big effect on the results. (6)

(6) The same test of nbymx with 12 threads and 60 connections for 5 seconds from the Linux server gives around 250ms of average latency, which is roughly a 5000% increase.
$ rewrk -h http://10.0.0.4/post.html -t 30 -c 200 -d 30s
Beginning round 1...
Benchmarking 200 connections @ http://10.0.0.4/post.html for 30 second(s)
Latencies:
Avg Stdev Min Max
1542.36ms 1274.95ms 100.07ms 10176.23ms
Requests:
Total: 3661 Req/Sec: 122.03
Transfer:
Total: 377.36 MB Transfer Rate: 12.58 MB/Sec
Okay, now we just need to bench nbymx again for a proper comparison. Fingers crosse- I mean, this is a completely unbiased benchmark, I am taking the neutral position and not rooting for any server.
$ rewrk -h http://10.0.0.4/post -t 30 -c 200 -d 30s
Beginning round 1...
Benchmarking 200 connections @ http://10.0.0.4/post for 30 second(s)
Latencies:
Avg Stdev Min Max
1095.26ms 996.47ms 64.41ms 9934.41ms
Requests:
Total: 5238 Req/Sec: 174.59
Transfer:
Total: 538.67 MB Transfer Rate: 17.95 MB/Sec
Oh thank fuck. I love benchmarking.
For funsies, let's look at Static Web Server and Caddy again (I am not bothered to set up nginx once more).
This is Static Web Server.
$ rewrk -h http://10.0.0.4/post -t 30 -c 200 -d 30s
Beginning round 1...
Benchmarking 200 connections @ http://10.0.0.4/post for 30 second(s)
Latencies:
Avg Stdev Min Max
1454.31ms 1442.76ms 68.78ms 12660.38ms
Requests:
Total: 3867 Req/Sec: 128.89
Transfer:
Total: 398.24 MB Transfer Rate: 13.27 MB/Sec
And this is Caddy.
$ rewrk -h http://10.0.0.4/post -t 30 -c 200 -d 30s
Beginning round 1...
Benchmarking 200 connections @ http://10.0.0.4/post for 30 second(s)
Latencies:
Avg Stdev Min Max
1579.78ms 1506.73ms 81.18ms 11136.93ms
Requests:
Total: 3567 Req/Sec: 118.89
Transfer:
Total: 368.27 MB Transfer Rate: 12.28 MB/Sec
Phew… we had a little scare there, but everything is in its right place once again and the Go programs are exactly where they should be.
At the bottom of the benchmarks.