Comparing TCP and QUIC

drewg123 · on Nov 3, 2022

What nobody talks about is the lack of server-side offloads for QUIC. Things like TSO, LRO, and even hardware offloaded kTLS. Without those offloads, I estimate I'd be lucky to get 200Gb/s out of the same Netflix CDN server hardware that can serve TLS-encrypted TCP at over 700Gb/s.

Do the benefits of QUIC really justify the economic and environmental impacts of that kind of loss of inefficiency on the server side?

And yes, I know that some of these offloads are being worked on, but they are not here today.

miohtama · on Nov 3, 2022

A good question. Here is 2021 study:

“Takeaway: We observe that QUIC provides significant improvements over TLS/TCP in low-bandwidth and high-RTT regions for video downloads. QUIC handshakes towards YouTube media servers offer an improvement of 534 ms (IN) and 406 ms (DE) when compared with TLS/TCP. We also observe that the overall download rate for TLS/TCP is higher than QUIC partly due to kernel optimizations such as LRO available for the TCP stack. QUIC provides a better video streaming experience with a lesser number and duration of stall events compared to TLS/TCP. We observe that TLS/TCP exhibits up to 50% longer stall durations compared to QUIC at 50th percentile for high loss networks.“

https://vaibhavbajpai.com/documents/papers/proceedings/quic-...

Also the study discussed about the server-side CPU usage. If you are worried about server-side offloading, you can still stick to TCP/IP until the hardware acceleration and other offloading catches up.

drewg123 · on Nov 3, 2022

I skimmed the paper, but I could not determine what TCP congestion control method they were testing against. Given the stall time, I expect that it was not BBR (which is the congestion control used by QUIC). If you improve TCP's congestion control and can also control the client TCP stack, most of BBR's advantages over TCP disappear.

matsur · on Nov 4, 2022

QUIC doesn't imply BBR as congestion control. eg. Our (Cloudflare's) QUIC implementation supports multiple algorithms. https://blog.cloudflare.com/cubic-and-hystart-support-in-qui...

miohtama · on Nov 4, 2022

Very interesting post! Do you have any sources what is the true packet loss % on different real world Internet connections?

matsur · on Nov 5, 2022

We have some data :) I’ll see if we can get a blog together.

iforgotpassword · on Nov 3, 2022

I'd say for most use cases, TCP will be fine for a long time. QUIC has two things going for it: The Google (web) use-case where you load a bazillion resources for your website in parallel, and want to employ some manually determined prioritization of what should load first, ie. the parts that make the site look complete before some script doing random things in the footer. And ads.

The other advantage is that thanks to being based on UDP, you get a STUNable version of TCP, which is really great for p2p apps like Syncthing.

For making sense in the data center, or for delivering bulk data, QUIC just doesn't have a killer feature really, compared to the ages old battle-tested TCP.

sbierwagen · on Nov 4, 2022

Some of Google's first ASICs were put into their in-house routers, more than a decade ago now. One wonders if they don't already have NICs with QUIC offload.

quietbritishjim · on Nov 3, 2022

A great writeup, but just to take issue (or at least discuss) this one point:

> There may be some performance penalty of shifting the transport code from the kernel to user space

This makes it sound like a kernel has potential for slightly more optimised implementation. But I think it's more than that - the transport code can be completely offloaded from the CPU to the network card/processor. That can only happen if it's abstracted behind syscalls, not written in user space.

jacob019 · on Nov 3, 2022

Here's a relevant discussion from 2021: https://lore.kernel.org/netdev/CAK-6q+id+CJgoSHaMGMs=d1Lr81b...

If QUIC sees wide adoption then it's just a matter of time before the protocol starts moving up stack.

hinkley · on Nov 3, 2022

When I read about eBPF there always seems to be an implication that we could move this logic to the NIC.

I don’t know enough about it to know if that’s aspirational or an active project. But the idea of being able to push code to the “right” side of bottlenecks is a running theme, including in questions of user space versus kernel space. Chatty communication kills throughout and/or latency.

mdaverde · on Nov 3, 2022

The NICs I could find that enable this behavior belong to Netronome:

https://www.netronome.com/products/agilio-cx/

I've been meaning to get my hand on one of these to play around with eBPF offload.

There is also XDP (a subsystem of eBPF) which lets you take action before the packets run up the entire network stack.

hinkley · on Nov 4, 2022

I thought we had a couple options now? Well that sucks then.

Netronome doesn’t print their prices. That’s not a great sign. If you have to ask, you can’t afford them.

iamcalledrob · on Nov 3, 2022

I often see QUIC described as "faster than TCP", but in my experience this has only been the case when it comes to handshake latency.

Throughput-wise, I've found in real-world testing that QUIC is often slower than TCP: (1) QUIC uses more CPU, due to the processing in user-space. 1Gbps required 3 CPU cores. On my 1x-CPU VPS, QUIC maxed out at 400Mbps due to the CPU. With TCP+TLS, I could comfortably achieve 5Gbps. (2) QUIC was less resilient to packet loss (surprisingly). This was particularly noticeable on mobile devices.

If your use-case is to move bytes between powerful servers over a reliable, wired connection, QUIC may beat TCP in most ways that matter. But for use in real-world mobile apps, TCP may still offer better throughput.

Caveat: This is all data from using the quic-go package. The C libraries may well be more efficient :)

skissane · on Nov 3, 2022

> QUIC uses more CPU, due to the processing in user-space.

That’s not inherent to the QUIC protocol itself though, that’s an implementation decision. There is no fundamental reason why QUIC couldn’t be implemented in kernel-space, such an implementation would still be conforming.

iamcalledrob · on Nov 3, 2022

Absolutely, and it would be great if there was OS support here.

TCP processing can be also offloaded to network cards today, so the same might need to happen for QUIC to be similarly efficient.

Animats · on Nov 3, 2022

No performance numbers, though. I was hoping for third party benchmarks.

QUIC is optimized for Google's use case - Google client talking to Google servers, with many Google streams combined into one big pipe. This is not the normal non-Google case. Early performance numbers from Google only indicated a relatively small gain (15%?) even for that case.

quietbritishjim · on Nov 3, 2022

As I understood it: QUIC, or more accurately HTTP/3 over QUIC, is optimised for consumer web browser, mainly mobile devices over lossy network. That is not (only) Google's use case at all.

If you're loading several images over the same HTTP2/TCP connection and you miss a packet of one then you have to get later packets resent for all of them. Even if the delayed packet does eventually arrive without needing to be resent, it still has the effect of holding up all the other streams. With HTTP3/QUIC, a missed or delayed packet does still affect the stream it's from within that connection (still a pity) but at least the other streams carry on unaffected.

This problem in TCP is called head-of-line blocking:

https://en.wikipedia.org/wiki/Head-of-line_blocking

Animats · on Nov 3, 2022

Even if the delayed packet does eventually arrive without needing to be resent, it still has the effect of holding up all the other streams.

Only if you're multiplexing multiple streams inside one TCP connection. Multiple TCP connections will not affect each other.

quietbritishjim · on Nov 4, 2022

Yes, that is exactly what HTTP allows: multiple data streams over a single TCP connection. In general, that's a good thing: if you have a page with dozens of small files (icon images, CSS, JS fragments, etc.) then you don't want to instantiate a new TCP connection for every one.

Browsers typically have multiple HTTP connections open as a compromise so that you don't need a new TCP connection for every file but don't have the head of line problem so severely. But it's not as good as fixing the underlying problem.

moderation · on Nov 4, 2022

The notion that QUIC + HTTP/3 only benefits Google is often repeated here. Do we think that Google brainwashed or tricked Cloudflare, Fastly, Akamai, F5, Facebook, the browser makers (Apple, Mozilla / Firefox), the proxy makers (Envoy proxy and all of the companies building on them), Caddy, Traefix, NGINX (sort of), Apache etc.) and others who are supporting HTTP/3 and have written at length about its benefits (through a multi-year IETF process that they participated in)?

dinosaurdynasty · on Nov 3, 2022

15% is massive in absolute terms for a company like Google.

stevewatson301 · on Nov 3, 2022

One of the best writeups about QUIC, and my new go to reference for anyone who asks about the topic (since the QUIC RFCs are obtuse, and I say that as someone who's read many RFCs to troubleshoot networking issues).

zackmorris · on Nov 3, 2022

This is great! I've tried writing reliable stream implementations over UDP for games but always got lost in the weeds.

The main use case I need is to have reliable streams and unreliable datagrams that continue working even if either peer's IP address changes. Something like that would allow cell phones to form scalable p2p mesh networks, for example.

It needs to be encrypted and punch through NAT in a fully automated way, falling back to a (secure/anonymous) matching server if both peers are behind NAT. I don't know about that last part, but it appears that WebTransport over HTTP/3 over QUIC over UDP might be able to do most of that and be a potential replacement for WebRTC data channels:

https://web.dev/webtransport/

https://www.w3.org/TR/webtransport/

Sean-Der · on Nov 3, 2022

What caused you to get lost in the weeds before, would love to help! Was it a software or technical resources problem. I work on https://github.com/pion/webrtc and https://webrtcforthecurious.com and love solving both problem :)

WebRTC has a really cool feature called `ICE Restart` that solves the 'walk out the door' problem.

zackmorris · on Nov 4, 2022

Oh nice, I didn't know about ICE Restart before. Is there a way to make it automatic or set the timeout? I just skimmed the docs and it looks like it takes 30 seconds to realize that the peers aren't talking anymore. For a game, I might make that a few seconds to stay below the roughly 7 second attention span of most people.

But ya, I got windowing working, and came up with a scheme to store the app's identifier (4 digit creator code on macOS) in a checksum by subtracting the actual checksum to be left with a remainder like 'app1'. So the game would transmit on checksum app1 instead of port 12345 for example. But I realized the overhead of computing the checksum might make it vulnerable to denial of service attacks, so maybe a separate field might have been better for faster filtering. I halfway implemented some slow start and Nagle algorithm stuff too. But NAT negotiation with STUN/TURN/UPnP/NAT-PMP or similar killed the project. That's so hard to solve for humans that I wouldn't even try now, I would probably enter all of the states in a giant truth table or graph and write a stateless solution with the time element removed instead.

This was over 15 years ago though so the C++ code isn't really any good anymore. Also because of buffer overrun exploits in C, I think it's better to use a proven (formally tested?) framework like WebRTC instead, even for low-level stuff like games. We have GHz processors now that can handle any level of complexity around ~1500 byte packets and still saturate pipes, so trying to go bare-metal there strikes me as premature optimization that isn't usually worth the risk it exposes.

The next thing that got me was, I didn't know about coroutines. So I had incredibly elaborate state machines for the game logic that quickly became unmaintainable even for the simple puzzle games I was prototyping. Today I would use a state manager like Redux or a software-transactional memory (STM) or a conflict-free replicated data type (CRDT), maybe distributed with Raft/Paxos somehow so that all peers can see the same state. An event-driven interface like the one used by Firebase/RethinkDB/Supabase, maybe CouchDB or similar, would be good too. I got stuck on solving consensus, like when 2 people hit each other at the same time, but maybe that's been solved.

And then there's the matchmaking/lounge server. I cobbled something together using an IRC-style syntax, but was never happy with it. Maybe something like Matrix would work there.

Believe it or not, I haven't actually used WebRTC yet, but want to. I also don't want a competing standard in WebTransport. I think what I was always looking for was a connectionless reliable stream, which was never a thing on the web, which set innovation back perhaps 25 years because only a handful of apps like Skype managed to pull it off. It appears that QUIC comes closest. And if WebRTC is the only one that's solved NAT and the connectionless part with ICE Restart and peer identifiers, then I'll consider that the standard.

<rant>

Sorry this got a little long, I'm just passionate about the injustice of stuff like NAT. I feel that the lack of solutions around this stuff was deliberate by the telecom industry. It all but ruined my game development career by making it difficult to compete with large AAA studios after the mid-2000s (until the arrival of Unreal/Unity perhaps). This is 1 of 100 rabbit holes that I've been down only to fail. So I appreciate that people like you are shedding light on this stuff so that maybe the next generation will be able to work feely without artificial barriers holding them back.

superkuh · on Nov 3, 2022

Games that use QUIC (no connections without CA TLS) will have a higher chance of eventually stopping working because they'll have to maintain the infrastructure to update their leased TLS cert from the CA every $x days. This is less important now than in the past because most games have moved to a microtransactions model that is (mostly, see exceptions like Team Fortress 2) incompatible with hosting your own server. So usually the company kills the game or goes under itself first.

But for those games that do use QUIC and do have private servers, hacks will be needed to keep the games going long term (20+ years). Retro-gaming is going to be much harder in the future.

superkuh · on Nov 3, 2022

Under "QUIC Issues" it mentions "Private QUIC" and offhandedly remarks that there is no way to use QUIC without CA based TLS which is built in. This means that QUIC cannot be used without the continuing approval of a third party incorporated entity. This is a very serious problem when QUIC takes over in user software (made by megacorps) and eventually drops HTTP/1.1 support. It will be the end of the open web and the beginning of a corporate controlled one.

Ignoring this extremely dangerous outcome of a QUIC only world, this write-up is excellent and really clears things up.

jmillikin · on Nov 3, 2022

Operating a private CA is trivial, there are packages to automate it at every level from simple manual scripting to fully-automated ACME servers such as Boulder[0].

For proprietary software it's idiomatic to use a private CA for securing connections to a central server, to avoid the risk of DNS hijacking or a malicious third-party CA. For open-source software it will, definitionally, allow substituting different server key validation options for users who prefer to run their own servers.

The only real problem QUIC's mandatory TLS causes is that localhost development becomes marginally more difficult, but even as someone who loves running my in-development stuff on localhost it's clear that the security of the internet as a whole is of far greater value.

[0] https://github.com/letsencrypt/boulder

FuriouslyAdrift · on Nov 3, 2022

Bulder is not suited for production use as per it's own docs... "It is not suitable for use as a production environment."

superkuh · on Nov 3, 2022

Operating a private CA + QUIC does not allow me to host my own website that is visitable by a random person on the internet. HTTP/1.1 does.

Private CAs solve other problems for internal uses but not this big important one for the public web.

kevstev · on Nov 3, 2022

I am continually shocked at how few people actually care about this. Maybe I am just showing my age, because I remember in college, and even when I got my first cable modem, I could easily just spin up apache or IIS and have a website hosted from my house, with no additional money or hoops required- start up a webserver, pop in the IP into my browser, or my dynamic dns, and there we were on the web! This lead to all kinds of experimentation and such, even if I did get "hacked by Chinese!" when I had IIS running around 2001ish.

Google has been doing a great job of killing the "personal internet" by making it undiscoverable for several years, but now that we require a third party to somehow be involved, and likely a fee of some sort to be paid- I don't trust that the likes of Lets Encrypt and its duplicators to be available forever, the original "anyone can publish" dream of the web is dying in front of us. This: https://justinjackson.ca/words.html does not require SSL in any form, in fact most websites don't need it at all- and the obsession with even blogs wanting you to make accounts and to have to login, its all just heading in a very wrong direction.

Its just deplorable to me that I now need a third party to essentially tell me that I can be on the Internet. Its a huge step backwards in so many ways. And no one seems to care!

dinosaurdynasty · on Nov 3, 2022

Caddy is just as easier if not easier than that.

> Its just deplorable to me that I now need a third party

You've always needed one or more third parties (ISPs, backbone, ICANN, etc), this is no different.

jgwil2 · on Nov 4, 2022

It's another barrier that's been added, perhaps for good reason, but it is a tradeoff worthy of debate.

jmillikin · on Nov 3, 2022

HTTP over plaintext TCP is obsolete in a similar way to Telnet (vs SSH) or FTP (vs SFTP). It took something like 20 years for support for FTP to be dropped from major user agents[0].

By the time the major browsers drop support for plaintext HTTP, that protocol will be as quaint and archaic as Gopher is today.

(and we'll all be better off for it -- good riddance to anything ASCII-based or unauthenticated)

[0] https://blog.mozilla.org/security/2021/07/20/stopping-ftp-su...

skissane · on Nov 3, 2022

> HTTP over plaintext TCP is obsolete in a similar way to Telnet (vs SSH) or FTP (vs SFTP).

That’s not comparing apples and oranges-HTTPS was originally just (and often still is just) plaintext HTTP/1.x wrapped in TLS. Whereas, SFTP and SSH are not FTP/TELNET over TLS, they are completely separate protocols. FTPS and TELNETS are the original protocols over TLS, but have seen only limited adoption (mostly in mainframe environments, in which SSH and SFTP perform poorly.)

Even HTTP 2 and 3 are just a binary encoding of the HTTP/1.x protocol over an encrypted transport. By contrast, SSH and SFTP are essentially unrelated protocols to TELNET and FTP, as opposed to just binary encodings of them. (The TELNET protocol is already non-text-based anyway-the data may be mostly text but the control messages have never been.)

superkuh · on Nov 3, 2022

I think even you will be surprised how fast Google drops HTTP/1.1 support in Chrome. Humanity won't be better off for it but commerce certainly will. And I guess that's all that matters.

ahefner · on Nov 4, 2022

This take is really not crediting the different threat models motivating SSH versus HTTPS. In both cases the network is considered untrusted, but beyond this the analogy largely fails - almost every application of telnet risks leaking authentication credentials to an eavesdropper (in addition to allowing them to subsequently surveil your session), whereas the classical HTTP use case is to retrieve information anonymously. I cheered the widespread move toward HTTPS as a response to outrageous behavior by certain ISPs and their modifying/injecting content into sessions, which arguably should be criminalized, yet if the forces of abusive service providers and unaccountable state security agencies is becoming so omnipresent that we can only retreat beyond a veil of encryption, we are living in very dark times.

I'd be more optimistic if not for the (cynical?) fear that those most in need of encryption will be the first to be denied it by the force of the state, a day only accelerated by its increasing ubiquity. Your government can force you to install a state-sanctioned CA cert, almost entirely undermining TLS from that perspective, which Google can/will do nothing to resist, and their parallel desire to deprecate plaintext HTTP is entirely orthogonal to this.

0x000xca0xfe · on Nov 3, 2022

HTTP/1.1 is not obsolete -- HTTP/2 and HTTP/3 preserve basically all semantics. HTTP/1 is better than 2 and 3 in many cases, e.g. when you cannot afford the complexity like in tiny embedded devices, or when you just need to bootstrap a WebSocket, or you don't need TLS.

HTTP/1.1 will be obsolete when intranets and tiny devices are obsolete. Well, it looks like that could become true...

jayd16 · on Nov 3, 2022

I guess you could still use a self signed cert if your users accept unsigned anyway.

tenebrisalietum · on Nov 3, 2022

> Operating a private CA ... does not allow me to host my own website that is visitable by a random person on the internet.

It does if they install the CA cert.

UI is not pretty, but not insurmountable on standard browsers, OSs, or phones, and there is no interference unless you are working on a domain-joined Windows PC or your phone is under some sort of MDM.

These UI issues with this process could be resolved by some browser UI work by parties that aren't interested in perpetuating the current status quo and can get past handholding boomers apparently willing to enter bank credentials into anything that moves.

Or just use Gemini.

_abox · on Nov 3, 2022

> UI is not pretty, but not insurmountable on standard browsers, OSs, or phones, and there is no interference unless you are working on a domain-joined Windows PC or your phone is under some sort of MDM.

The problem is that while you can still add root CAs on modern Android, apps have to specifically opt-in to user-added root CAs. They can ignore them as they wish and I believe they have to even opt in to third-party CAs. Needless to say most apps don't bother. This is a big problem with self-signed certs on Android. I believe this was changed in version 7 or so.

Not sure how this works on iOS but on Android it's really a problem.

tenebrisalietum · on Nov 3, 2022

I learned something here. Been a long time since I've had an Android phone (4.0.3 was the version of the last one I used IIRC).

jakeogh · on Nov 3, 2022

Neat. Project Gemini: https://gemini.circumlunar.space/docs/specification.html

  4.2 Server certificate validation
  Clients can validate TLS connections however they like (including not at all) but the strongly RECOMMENDED approach is to implement a lightweight "TOFU" certificate-pinning system which treats self-signed certificates as first- class citizens. This greatly reduces TLS overhead on the network (only one cert needs to be sent, not a whole chain) and lowers the barrier to entry for setting up a Gemini site (no need to pay a CA or setup a Let's Encrypt cron job, just make a cert and go).

I'm surprised to not see an optional response size field in the response header. How does the client calculate progress%? Reading....

superkuh · on Nov 3, 2022

I'm wondering how you think a random users on the internet is going to have already installed my self-signed CA root cert before going to my website?

Gemini is fine. I love trust on first use for TLS. But there's no reason to drop HTTP/1.1.

jacob019 · on Nov 3, 2022

I think it will be a very long time before HTTP/1.1 is dropped entirely. It will get more warnings, and eventually get locked behind feature flags, but it's still quite useful on the LAN and will be required by legacy business software for decades. I do think it's appropriate to deprecate it's use on the public Internet. Let's Encrypt is operated by ISRG, a 501(c)(3) nonprofit. TLS certificate signing is an imperfect solution, but it is distributed enough that I wouldn't call it corporate controlled.

jakeogh · on Nov 3, 2022

Exactly. It's 3rd party permission to speak, among other deliberate problems: https://news.ycombinator.com/item?id=20992508

jacob019 · on Nov 3, 2022

Does QUIC not work with self-signed certificates?

dinosaurdynasty · on Nov 3, 2022

Browsers probably won't support it.

I know quinn (a rust QUIC library) definitely does, though you have to enable custom verification as a compile option (which is reasonable).

zekica · on Nov 3, 2022

Browsers support QUIC with self signed certificates. It doesn't work with HSTS but that is by-design.