Actix Web: Optimization Amongst Optimizations (brandur.org)
120 points by brandur on Jan 5, 2020 | hide | past | favorite | 14 comments


The reason the Rust framework is winning on Fortunes is the ecosystem more so than the framework itself.

For instance, any database benchmark is going to favor frameworks that have a library implementing pipelining and asynchronicity. The fact that you're not allowed to modify the library unless you want your framework to be labeled a "raw" benchmark increases the benchmark's worth, but doesn't come without caveats. The implementer of h2o, the C server framework/library just below actix in that chart, has mentioned something related to this:

https://github.com/TechEmpower/FrameworkBenchmarks/tree/mast...

So Rust is "production grade" because its official library handles it for the framework separately. I LOVE these benchmarks because they point out the differences in each ecosystem when you're trying to find the fastest thing. I wouldn't have found a bunch of interesting projects without them.

If you really want to know about raw speed, just look at the plaintext bench. The top frameworks are all being limited by the 40(!) Gbps connection these guys are running. After picking any one of them, the code you write would be the limiting factor.


Great write-up, brandur!

Actix has definitely deployed a broad suite of optimizations. When we posted Round 17 of our Framework Benchmarks, we wrote [1] about the stratified Fortunes results and attributed it in large part to our agreement to support the pipelining feature of PostgreSQL's protocol.

We had a conversation with the community about pipelined Postgres [2] when it was initially brought to our attention. Initially, we were opposed because it had a smell of cheating. However, after the dialog on the mailing list and several internal conversations, we remembered that one of the reasons we created the project was to encourage friendly competition, yield higher-performance platforms and frameworks, and ultimately benefit the application developers who use those frameworks. In that light, this form of optimization is exactly what we wanted to see (whether or not its genesis actually traces to this project). We want to see such creative ways to reduce the overhead of frameworks, leaving more CPU capacity to the discretion of the application developer.

As long as they continue to live up to the spirit of our tests, of course. We believed this did live up to the spirit of our tests because these optimizations did not require the application developer to do anything to reap the benefit--they just run queries as usual.
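The round-trip savings can be sketched with a toy model. Nothing below is the real Postgres wire protocol or any driver's API; it only illustrates the round-trip arithmetic that makes pipelining pay off:

```rust
// Toy model: a "connection" that counts network round trips.
struct Connection {
    round_trips: u32,
}

impl Connection {
    // Sequential: write one query, wait for its response, repeat.
    // N queries cost N round trips.
    fn query_sequential(&mut self, queries: &[&str]) -> Vec<String> {
        queries
            .iter()
            .map(|q| {
                self.round_trips += 1; // one round trip per query
                format!("result of {}", q)
            })
            .collect()
    }

    // Pipelined: write all queries back-to-back, then read all the
    // responses. The whole batch costs a single round trip.
    fn query_pipelined(&mut self, queries: &[&str]) -> Vec<String> {
        self.round_trips += 1;
        queries.iter().map(|q| format!("result of {}", q)).collect()
    }
}

fn main() {
    let queries = ["SELECT 1", "SELECT 2", "SELECT 3", "SELECT 4"];

    let mut seq = Connection { round_trips: 0 };
    seq.query_sequential(&queries);
    assert_eq!(seq.round_trips, 4);

    let mut pipe = Connection { round_trips: 0 };
    pipe.query_pipelined(&queries);
    assert_eq!(pipe.round_trips, 1);
}
```

The application code is the same either way, which is why this passes the "developer does nothing special" test.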

And then there's that SIMD HTML escaping, which I admit kinda blew my mind.
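For reference, here's the scalar version of the escaping being accelerated. Crates like `v_htmlescape` do the same character classification with SSE instructions, examining 16 bytes per comparison instead of one; this sketch shows only the logic, not the SIMD:

```rust
// Scalar HTML escaping: replace the five characters that are unsafe
// to emit verbatim inside HTML with their entity forms.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    assert_eq!(
        escape_html("<b>\"x\" & y</b>"),
        "&lt;b&gt;&quot;x&quot; &amp; y&lt;/b&gt;"
    );
}
```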

[1] https://www.techempower.com/blog/2018/10/30/framework-benchm...

[2] https://groups.google.com/d/topic/framework-benchmarks/Kbd2N...


I see it also as a success story for Cargo. Actix can be optimized from top to bottom, without being a monolithic framework. It doesn't have to reinvent and optimize every component itself.

The SSE-accelerated escaping, postgres driver, and state-of-the-art hash tables are all packages that can be used without Actix. And Actix can take advantage of them without undue runtime overhead (thanks to generics+inlining) and without burdening users with build/install complexity (it depends on over 140 packages, and Cargo seamlessly takes care of that).


> On startup, 128 request and response objects are pre-allocated in per-thread pools. As a request comes in, one of them is checked out, populated, and handed off to user code. When that code finishes with it and the object is going out of scope, Drop kicks in and checks it back into the pool for reuse. New objects are allocated only when pools are exhausted, thereby saving many needless memory allocations and deallocations as a web server handles normal traffic load.

As someone not very familiar with systems programming, I would have expected this is an optimization that would already happen automatically within the system memory allocator. Is that not the case?


You still need to initialize inner structures and buffers. A cache lets you avoid that.


Thanks for the response.

So it sounds like it is not so much the allocation/de-allocation savings that matters, but rather that you get to skip initializing the memory because it is already known to be a valid instance of a given struct type?


Yes, that.

See also SLAB allocation, used in many OS kernels: https://en.wikipedia.org/wiki/Slab_allocation
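The check-out/check-in-on-Drop pattern described above can be sketched like this. Actix's real pools are more elaborate; this is a minimal single-threaded illustration using buffers as the pooled objects:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A pool of reusable byte buffers, pre-allocated up front.
struct Pool {
    items: RefCell<Vec<Vec<u8>>>,
}

// A checked-out buffer; returns itself to the pool when dropped.
struct Pooled {
    pool: Rc<Pool>,
    buf: Option<Vec<u8>>,
}

impl Pool {
    fn new(capacity: usize) -> Rc<Self> {
        let items = (0..capacity).map(|_| Vec::with_capacity(4096)).collect();
        Rc::new(Pool { items: RefCell::new(items) })
    }

    fn check_out(self: &Rc<Self>) -> Pooled {
        // Reuse a pooled buffer if one is available; allocate fresh
        // only when the pool is exhausted.
        let buf = self.items.borrow_mut().pop().unwrap_or_default();
        Pooled { pool: Rc::clone(self), buf: Some(buf) }
    }
}

impl Drop for Pooled {
    fn drop(&mut self) {
        if let Some(mut buf) = self.buf.take() {
            buf.clear(); // discard contents, keep the allocation
            self.pool.items.borrow_mut().push(buf);
        }
    }
}

fn main() {
    let pool = Pool::new(2);
    {
        let mut a = pool.check_out();
        a.buf.as_mut().unwrap().extend_from_slice(b"hello");
        assert_eq!(pool.items.borrow().len(), 1); // one buffer still pooled
    } // `a` dropped: its buffer is cleared and checked back in
    assert_eq!(pool.items.borrow().len(), 2);
}
```

The allocation survives across requests; only the contents are reset, which is the initialization saving discussed above.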


I enjoyed this quite a bit. I had read Nikolai's comment back when it was linked in a few discussions and felt familiar with the techniques, but it's really nice to have a detailed catalog with explanations and links :D.

nit re:

    Although designed to help speed up the parsing of 
    XML documents, it turns out they’re also useful for 
    optimizing web-related features.
HTML is syntactically very close to XML in a bunch of ways (cf XHTML's attempt to unify them IIRC?), so it seems almost exactly like the intended use of the instructions based on this description.


Nice write-up. Glad to see 2.0 performing just as well.


Is anyone else just completely flabbergasted at the performance levels we're seeing at the top end of these benchmarks? 880k requests per second, with each request handling a database query? 7 million requests per second without database access? All on a single 16 core server? I don't even know what to think anymore.


Beware, actix web has been caught cheating at this benchmark.

Cf: https://64.github.io/actix/


Can you point in more detail where was the cheating?

What you link to talks about the great `unsafe` drama, but that's not cheating. `unsafe` is in Rust for a good reason. Every C and C++ codebase is effectively one big `unsafe` block from the perspective of a Rust developer, and we don't say C/C++ are cheating. Rust devs frown upon overusing `unsafe` for a good reason: it leads to obscure bugs and security issues. But if you use it correctly, in a place that is critical for your performance - go for it.
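A typical "correct" use of `unsafe` looks like this: skipping a bounds check that the surrounding code has already made redundant. This is purely illustrative, not taken from actix's source:

```rust
// Sum the first two elements, eliding the per-index bounds checks.
fn sum_first_two(xs: &[u32]) -> Option<u32> {
    if xs.len() >= 2 {
        // SAFETY: the length check above guarantees indices 0 and 1
        // are in bounds.
        Some(unsafe { xs.get_unchecked(0) + xs.get_unchecked(1) })
    } else {
        None
    }
}

fn main() {
    assert_eq!(sum_first_two(&[3, 4, 5]), Some(7));
    assert_eq!(sum_first_two(&[3]), None);
}
```

The safety argument lives in a comment right next to the `unsafe` block, which is the convention the community expects.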

Having said that - a lot of frameworks in that benchmark are over-optimizing in ways that no real practical app would, and arguably are cheating. If there's anything in particular about actix here, please let me know - I'm honestly interested in it.


The post linked in the GP discusses cheating in this section: https://64.github.io/actix/#blazingly-fast-or-not

Note that this is for the "raw" version of the plain-text and JSON benchmarks, not the fortune benchmark covered in the OP. But I wonder why that raw version is even there. It certainly isn't "realistic", as it claims to be in benchmark_config.json


Also from that section

> In fairness, it seems other benchmarks are doing these things

Personally, I think it's pretty reasonable to sink to the same level as the competition.

