An awful lot of home broadband connections suffer from bufferbloat and even for the ones that don't, a single host can easily hog all the bandwidth.
If you're used to getting lag in your VoIP or gaming when a housemate starts a stream/download/torrent, this can be fixed :)
The cake traffic shaper in OpenWRT is amazing for fighting bufferbloat in your home network, and it can also do almost perfect fairness in dividing the available bandwidth per LAN host with very little configuration. Just get it as part of the SQM tools in OpenWRT and enable it. For per-host fairness, take a look at the "Make cake sing and dance" section at this link: https://openwrt.org/docs/guide-user/network/traffic-shaping/...
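For reference, the per-host fairness that wiki page describes comes down to a couple of cake keywords. A minimal sketch, assuming an eth0 WAN interface and an 18 Mbit uplink (both placeholders):

```shell
# cake on the WAN (egress) side with per-host fairness.
# Use your own WAN interface and ~90% of your measured upload rate.
tc qdisc replace dev eth0 root cake bandwidth 18mbit nat dual-srchost
# "nat" makes cake look up the real LAN addresses behind NAT;
# "dual-srchost" gives each LAN host a fair share of upload while
# keeping per-flow fairness within each host. On the ingress/ifb
# side you would use "dual-dsthost" instead.
tc -s qdisc show dev eth0   # verify the qdisc is installed
```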
Parents live in a smaller town with two awful ISP selections. They had a bunch of WiFi devices on an ISP router and the connection quality and latency was just terrible when more than one device was in use and any bandwidth intensive services were being used. (Low quality Netflix is intensive on small-town monopoly internet.)
I purchased them a Netgear R7800 and installed hnyman's LEDE build [1] to enable SQM. Night and day difference in latency response. No more staring at a white screen for 3 seconds per URL click.
The build has been stable for several months. I wouldn't recommend this for non-technical users or anyone not willing to spend time troubleshooting, but it has been a great improvement. I couldn't find any other device capable of doing this without running x86 hardware or something else silly.
A few other people mention it, but yes, this is only going to work on slower connections on current SOHO hardware. I think the R7800 can do software SQM at up to 150Mbps or so. Plus, if you have a gigabit symmetric connection, hopefully you aren't having bufferbloat issues.
Just wish a popular manufacturer would release an easy-to-use router with SQM so I could install it for non-technical users and forget. Ubiquiti is somewhat close to that, but I believe their prosumer hardware (USG) is running a slow processor at the moment and doesn't even support SQM without installing custom kernels.
>I couldn't find any other device capable of doing this without running x86 hardware or something else silly.
Was the internet speed so high that you couldn't use a normal supported router like a TP-Link Archer C7 at half the cost? I need to do more testing but it seems my C7 can handle my 100/100Mbps fiber connection doing SQM without too much issue.
I haven't seen a router under $400 that can do fair queueing in hardware faster than ~200 megabits. If you have fiber it's cheaper to set up a beefy x86 box and run pfSense on it. Hardware offload is usually disabled when you turn on QoS, so doing so will often slow down gigabit LAN links as well.
This seems like a potential sweet spot for the Espressobin [1] with pfSense, but the pfSense folks have not released their ARM version, only demonstrated it [2]; perhaps they're just too dang busy, or perhaps it would cut into the margins of their x86 solutions. Regardless, it would make a nice appliance if they ever do release a pfSense ARM image.
PFSense is great if you're okay with pulling out a monitor and keyboard every time there is a config issue or interface change. Do not bring in any interfaces over USB if you like to preserve your sanity and want to use PFSense.
These days I just run OpenWRT on x86, no more will my router sit in a broken state that I can't fix by logging in over the LAN or WAN (via OpenVPN ofc). Wish PFSense would get sane defaults in this regard!
I'm at 100/100 though, so wouldn't something simpler be enough? The wired side of the router is gigabit but that's just an integrated gigabit switch, it doesn't even touch any CPU. I'll be doing more testing to make sure but my ISP doesn't seem to have too bad a buffer bloat anyway.
My understanding of these routers is that the gigabit switch is independent from the router. They're physically on the same board but the router is just another machine on the switch. If the switch table says portA->portB it doesn't matter what the router on portC has decided to offload or not.
Edit: Maybe you mean Wifi to wired may have a disabled offload? That path does go through the router and not directly through the switch. For bigger installations I end up having one of these with wifi disabled as the router (firewall, dhcp, etc) and individual ones connected through ethernet as dumb access points (same SSID on all and straight bridge from Wifi to Ethernet). That should also avoid any issues and is a good setup to get more wifi coverage with a simple config.
Good to know, would explain why there's a phantom eth port on some of these routers, must be used to connect between router and switch chips. Sounds like you're right about wifi->wired transit though, if this is the case.
The typical architecture for routers these days is that the main SoC has two ethernet interfaces, each of which is connected to a 7+ port managed switch. One of the host CPU's interfaces is on the WAN VLAN, and the other is on the LAN VLAN. Some older routers used to have just one ethernet link between the switch and the CPU, with the CPU's other interface exposed directly as the WAN port. That made it easier to avoid bloat or bugs in the ethernet switch itself, but was fundamentally incompatible with the NAT offload those switches provide, so that configuration is now almost impossible to find.
This also makes these little routers extremely powerful. Since those switches have VLANs as well you can create very interesting topologies that would require much more expensive managed switches to achieve. I run an extra VLAN from my router, to one of my APs, to a dedicated wifi SSID to another wireless router to ethernet to a TV box so I can have the TV signal in a place I can't run ethernet to. Doing it through the normal Wifi would be a bad idea because the provider uses multicast IPTV and if you put that on your wifi every connected devices receives it. And this is all done with 3 50€ routers running LEDE, each with different VLANs and wifi SSIDs configured. They make for a really flexible setup.
Just read that the R7800 had the best range for an all-in-one unit. Not sure if it's true, but it has been an amazing router. I picked one up for myself -- they are $130 refurbished on Amazon every now and then.
To answer your question: I have no idea. Would be neat if a much cheaper model had the horsepower though.
Sounds like a good recommendation. The C7 has been my go-to for cheap, good wifi, and solid LEDE support. But I haven't stress tested it to check how it will take a very congested network. My uses have had fairly light users.
The Archer C7 seems to do about 400Mbps with no configuration/optimization when running OpenWRT, plus with them being available for $20 to $30 on Craigslist and its knockoffs (OfferUp & Letgo), it's easy to nab one for cheap.
I hear hardware offload is possible, but I have yet to try a build that has the patches for it.
With Gigabit fiber internet I can see needing something more. I find 100Mb internet to be enough for my needs and the Wifi performance to be adequate to the NAS on the LAN. So I even prefer that there's no offload to hardware and that it's the well tested Linux kernel code doing the heavy lifting.
Ran the tests and apparently the C7 is perfectly capable of doing 100Mb/s with cake. But as it turns out I don't really need it. I already get an A for bufferbloat with my provider without it and turning it on doesn't get me to A+.
I've found that I only need to shape upload and I get almost all the bufferbloat benefits, while reducing CPU requirements because download is not shaped. Thus, a $15 router with a slow CPU can be fine for fixing bufferbloat.
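Egress-only shaping really is a one-liner. A sketch, assuming an eth0 WAN port and a 5 Mbit uplink (both placeholders):

```shell
# Shape only the upload direction; slow CPUs can usually manage
# this even when they can't shape a much faster download.
# Set the rate to ~90% of your measured upload speed.
tc qdisc replace dev eth0 root cake bandwidth 4500kbit
# Or, on builds without cake, classic HTB + fq_codel:
# tc qdisc add dev eth0 root handle 1: htb default 10
# tc class add dev eth0 parent 1: classid 1:10 htb rate 4500kbit
# tc qdisc add dev eth0 parent 1:10 fq_codel
```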
Thanks for the Edgerouter link, saved me doing some Googling. Off to go and apply this now and see what difference this makes to the bufferbloat tests.
As Arie mentions, this is a little more involved on the EdgeOS stuff, but doesn't look too complex for those that are used to a CLI or two.
You get integration with the Unifi UI and a much easier configuration experience, you lose only because what you've got is (probably) overkill for residential use. If you've got more than one AP then you start to win again because you can power all your APs from the switch.
If you were happy with your setup before hearing about the EdgeRouter, I hope you'll still be happy with it now :).
It looks like I'll be fine with the US-8-60W and the USG until I get an Internet (downstream) speed of > 100 megabits, which, in Australia, is not likely to happen for a long time.
I'd hope I'd have upgraded my LAN to at least 2.5 gigabit by then, anyway.
There's one downside - if you've got a fast connection, anything less than a top-of-the-range router with dual or quad core highly clocked CPUs is going to struggle to shape that much traffic.
Great links. I've been using cake for a while but without the advanced options.
One random question: Does it ever make sense to add SQM to a tap_soft interface? I have two locations and both have SQM set up to minimize bufferbloat, but when I VPN from one to the other there is some bufferbloat on the VPN connection.
If the VPN terminates on the routers... fq_codel now (I cannot remember the kernel version, sorry) can preserve the inner hash of the VPN traffic and manage the flows before they hit the tunnel. This is mostly an IPsec, not OpenVPN, sort of thing.
Recently I ranted at the ISPs and at the network neutrality people. fq_codel (RFC8290) is now nearly ubiquitous as the default queuing mechanism in most Linux distributions, and it is long past time more ISPs supplied it in the gear they give customers.
The problem is that ISP-provided kit is built to a price, so it will tend to have a more anemic processor. Shaping high bandwidth needs CPU cycles, or better yet, hardware offloading.
Funnily enough, I've just got a firmware applied to my cable connection with Virgin Media in the UK that mitigates the Puma 6 issue on their provided DOCSIS 3 modem - it's eliminated buffer bloat as well, which is a nice side effect.
People are listening, Docsis 3.1 includes active queue management for instance, which goes a long way to preventing the issue.
Note that the firmware update doesn't fix the issue (which is Intel failed to put much cache on their chips to save $$), but merely helps to lessen the impact of Intel's shoddy hardware engineering. I would still replace your modem with a non-Intel modem ASAP if you want good performance.
I enabled sch_cake on my router (an R7800 with Openwrt 18.06 on it) recently. Worked like a charm. Would happily recommend.
I imagine the main thing that's holding back ISPs from using it on their consumer routers is the (minimal) configuration the user has to do to set to upload bandwidth limit. If ISP routers were also using decent queuing algorithms and not buffering like crazy then that wouldn't be necessary of course, but it doesn't seem like that's going to change any time soon.
Well, many isp-born devices are provided with that info by the ISP at connect time, so the hope has always been that during configuration they'd just pass (for example) "bandwidth 10mbit docsis" to the network setup routine. DSL modems also typically get this info at startup.
Neither HTB nor SFQ includes a CoDel-like AQM component. HTB+SFQ will get you fair sharing of bandwidth shaped at the rate of your choosing, but does nothing to control queue lengths.
For directly managing bufferbloat, I never understood this to be a requirement. Either technique is meant to create back-pressure on the TCP stack so it can more effectively manage its window, and both do that perfectly well.
My understanding is that you would want AQM in order to keep your bandwidth utilization closer to the actual wire speed than a simple priority queue would.
So, we've had the tools to deal with bufferbloat for a long time, it's just that this mode now just provides slightly better peak performance.
> For directly managing bufferbloat, I never understood this to be a requirement. Either technique is meant to create back-pressure on the TCP stack so it can more effectively manage its window, and both do that perfectly well.
If you have a traffic shaper being fed by a deep, dumb queue, you aren't giving useful back-pressure to TCP until it's too late. You need an AQM that gives either ECN marks or packet drops soon enough that you don't build up or sustain deep queues of packets. That's the core of what bufferbloat is. The queue(s) in front of HTB in your router might not be as stupidly oversized as the queue in your cable or DSL modem, but they're still susceptible to the same problems when there's no AQM component. Some TCPs do a decent job of backing off when latency climbs, before the buffers actually fill and start causing packet drops. But AQM in the router can more directly observe and act on congestion, and works with older TCPs and non-TCP traffic.
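Concretely, that early back-pressure is what fq_codel's knobs control. A sketch with the defaults spelled out (eth0 is a placeholder):

```shell
# fq_codel starts ECN-marking or dropping once the standing queue
# delay stays above "target" for longer than "interval" -- the
# early signal that keeps TCP from building a deep queue.
tc qdisc replace dev eth0 root fq_codel limit 1024 target 5ms interval 100ms ecn
tc -s qdisc show dev eth0   # drop/mark counters show the AQM working
```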
I am immensely cheered up by the progress reports contributed by so many on this thread. But, can I ask that if you grok it, go fix it for two friends? Go fix it for a local small business, a coffee shop, or a hotel. And ask your ISP to fix it in their default gear?
There's probably over 2b routers without bufferbloat fixes installed, and only if we work together to get them deployed will the internet as a whole get better, more capable of handling web, games, videoconferencing, and other applications that demand consistent low latency.
Thanks for everything that you do, Dave. Your bufferbloat and WiFi work is excellent, and I'm excited for the airtime fairness work with Toke to make it into more 802.11 drivers than ath9k. Cheers.
I was essentially in an overwork-induced coma for the last 18 months. The work was carried forward by (many, many) others.
I was happy to wake up a few weeks back, and find RFC8290 published, and sch_cake being readied for mainline, and multiple commercial products finally shipping what we'd worked on all these years.
Notice the improvement in bufferbloat score from D to A+.
Subjectively, I have noticed the connection seems more responsive and there no longer seem to be latency spikes when utilising all the upload bandwidth.
It’s certainly worth considering bufferbloat, if you suffer from latency spikes when using all your upload bandwidth (I used to suffer from this a lot more, when I had an ADSL connection, with only 1 megabit upload).
I love the ER-X because it's cheap, generally available and a fine router with SQM. Just keep in mind that with smart queue enabled it will top out at about 150-170Mbit.
If you're on a faster connection the Edgerouter ER4 with its faster quad core CPU should be able to handle up to about 300-350Mbit.
When you need to shape even more bandwidth, routers based on the Marvell Armada XP chipset do very well. I flashed a Linksys WRT1900ACS with OpenWRT and was able to shape about 600-750Mbps before it ran out of horsepower.
I love my ER-X, but once I got my gigabit connection it can't quite keep up compared to plugging directly into the modem. It's close enough that I'm not looking to replace it, but the next time I need a router I may go the NUC build your own route.
How does one go about building a router from a NUC? Don't you need two NICs for a router? (One for the modem, one to connect a switch for the LAN.) Or have enough people given up on wired networking that they just build routers where the entire LAN is on WiFi?
If you have a VLAN capable switch and appropriate network card with the right drivers, then you can have all the logical ports you could possibly want.
I have done this at my parents place with a cheap Intel Celeron based computer with a single network port, but 2 logical networks with vlanning. One is for their personal network and the other is for their guest suite they offer through Airbnb.
It’s only 20MB/s fibre, so not much CPU power needed. The no-name computer cost less than $250, came with a 60GB SSD, and I put pfSense on it.
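For anyone curious, the single-NIC, two-network trick on Linux looks roughly like this (interface names, VLAN IDs, and subnets are made up for illustration; the switch ports must be tagged to match):

```shell
# Router-on-a-stick: one physical port, two tagged VLAN subinterfaces.
ip link add link eth0 name eth0.10 type vlan id 10   # personal LAN
ip link add link eth0 name eth0.20 type vlan id 20   # guest suite
ip addr add 192.168.10.1/24 dev eth0.10
ip addr add 192.168.20.1/24 dev eth0.20
ip link set eth0.10 up
ip link set eth0.20 up
# Firewall rules then keep the guest VLAN isolated from the LAN.
```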
Note the comment "the actual rate limits will be set to 95% of the specified value". That explains why I saw a 5-10% dropoff in throughput when I enabled it with honest numbers.
Note that if you have gigabit or faster it's usually cheaper to build a router box with a really fast CPU. Most/all reasonably priced network appliances can't do FQ in hardware offload and their CPUs are pretty weak. I never managed to get over 200Mbps on any network box besides a super beefy PC.
I'm currently behind a Linux router with basically nothing but firewall rules configured; I still get an A on that test (meaningful?). I guess this sort of thing is just the default nowadays?
What router? what ISP? What link technology? What bandwidth? A pointer to your dslreports result? There are plenty of small ISPs that have adopted this stuff... and a few router makers.
I am always happy to hear of a bloat free connection.
You're one of the lucky ones with an ISP with properly configured buffers.
A+ is the goal on that test though, which basically means no extra latency under load.
In some cases, your connection speed may be fast enough that your router's CPU can't keep pace when doing traffic shaping. But if your router is powerful enough, then properly configured SQM will not result in any meaningful reduction in throughput (and can lead to better real-world throughput, by allowing congestion control to work properly). The more you know about the properties of your WAN connection, the more accurately you can configure SQM to account for the true limits of your connection. If you have an ADSL connection, then it helps to tell SQM to take into account ATM framing overhead, for example.
If your network is all Linux hosts, configure BBR TCP congestion control and you won't ever have to worry about bufferbloat again (unless you use a lot of UDP). There's a ton of research in this area, but to summarize, there are two main ways of controlling outgoing packet rate: measuring RTT (round-trip time) or measuring packet loss.
Unfortunately, all the early (and still common) TCP congestion control algorithms control their send rate by measuring packet drops: sending faster and faster until upstream routers somewhere start dropping traffic. This has the effect of completely filling the outbound buffers of the slowest link in the chain.
To combat this, most bufferbloat-fighting algorithms focus on dropping traffic before buffers fill, namely RED (random early detection) and CoDel.
Newer algorithms like TCP Vegas and BBR measure RTT and lower transmit rate when they detect buffers down the line filling, preventing bloat.
In most cases, you still need a router configured to prevent bufferbloat because even a single naughty protocol on your network can fill outgoing buffers.
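For the record, enabling BBR on a Linux host (kernel 4.9+) is just a couple of sysctls. A sketch:

```shell
# Load the BBR congestion control module (Linux 4.9 or newer).
modprobe tcp_bbr
# BBR is typically paired with the fq qdisc, which provides the
# packet pacing it relies on (built into BBR on newer kernels).
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl net.ipv4.tcp_congestion_control   # verify
```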
The most important thing to know about controlling bufferbloat with QoS is that you MUST have an accurate estimate of your max upload/download rate. This is because you can only control bloat if you are in control of the slowest link, where it will build first. All effective bufferbloat solutions rely on artificially making your router the slowest link, usually by limiting bandwidth through it to ~90% of actual.
Once you have control of the slowest link you can pick which packets get dropped as the buffers fill. And using something like fq_codel you can assign equal bandwidth to all IPs on the network. The nice thing about controlling bandwidth this way vs hard speed limits per user is that it allows users to use as much bandwidth as they want, until staying below line rate requires sharing.
Also notice I keep saying outbound. You actually have far less control of bufferbloat on the inbound end. The best you can hope for is that dropping inbound packets coming in over the configured rate will make the sending server back off as quickly, but this isn't always the case. Luckily, in my experience, the vast majority of "lag" is due to outbound buffers filling, not inbound. So setting up bufferbloat-fighting bandwidth sharing is usually extremely effective.
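For completeness, inbound "shaping" is done by redirecting ingress traffic through an IFB pseudo-device and shaping that. A sketch (interface and rate are placeholders):

```shell
# We can't queue packets that have already arrived, but dropping or
# ECN-marking above ~90% of line rate moves the bottleneck queue
# into the router, where the AQM can manage it.
ip link add ifb0 type ifb
ip link set ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol all matchall \
    action mirred egress redirect dev ifb0
tc qdisc add dev ifb0 root cake bandwidth 90mbit besteffort
```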
1) BBR is currently not something I'd recommend at home.
2) The hope has always been that the core two bufferbloat-fighting algorithms (BQL, and fq_codel) would end up in the cable, fiber or dsl modem hardware, so that no shaping would be required, as there would be sufficient backpressure from the link itself to regulate the link intelligently. The cpu costs on this are nearly 0! BQL is 6 lines of new code in the device driver. fq_codel has been shipping in linux for 6 years. It's just a matter of turning it on...
But: lacking that support from the ISP-supplied gear, we shape with htb + fq_codel, (as you say), to ~90% of the link rate... with another box - or even in the same box if the device driver can't be fixed and is overbuffered. We are painfully aware of how much cpu shaping costs but modern cpus usually have enough oomph to handle it.
btw: We've come up with a new deficit shaper (sch_cake) that lets us get to ~100% of the isp bandwidth (so long as you get the wire framing exactly right), while providing vastly better queue management in the sqm system.
4) fq_codel is fair to flows, not devices. This works well in the general case, but has edge cases where abusive apps that open a lot of flows gain priority. Adding per host fq (while retaining per flow fq), even through nat, was the number 1 request from the users for sch_cake, and one of the main reasons why cake exists.
You can configure fq_codel to match on different parts of the packet. The default matching "tuple" for flows includes source/dest port along with source/dest IP. If you set this to source IP only it should fairly distribute bandwidth based on LAN IP alone. If cake is a normal qdisc you might be able to enable flow-per-IP without adding any code. For an example with fq_codel see:
$TC filter add dev $IF_WAN parent 11: handle 11 protocol all flow hash keys nfct-src divisor 1024
“SQM” is shorthand for an integrated network system that performs better per-packet/per flow network scheduling, active queue length management (AQM), traffic shaping/rate limiting, and QoS (prioritization).
“Classic” QoS does prioritization only.
“Classic” AQM manages queue lengths only.
“Classic” packet scheduling does some form of fair queuing only.
“Classic” traffic shaping and policing sets hard limits on queue lengths and transfer rates.
“Classic” rate limiting sets hard limits on network speeds.
It has become apparent that in order to ensure a good internet experience all of these techniques need to be combined and used as an integrated whole, and also represented as such to end-users.
Isn't QoS a more general qualitative, perceived phenomenon? SQM sounds like it relates to something you'd do to improve QoS but then only at one link in the chain, at one layer.
I wish! QoS could have been a good term to keep using if existing deployments of it on the Internet hadn't hopelessly mapped it to mere packet prioritization (diffserv), which doesn't actually work on today's internet.
QoE is a better, less overloaded term for the "qualitative, perceived" sort of description.
There's been a plethora of other trade names for what we do with htb+fq_codel: streamboost, adaptive QoS, etc.
I like that eero, edgerouter, and openwrt and derivatives also call what we do sqm. It simplifies the discussion, and the core scripts for linux generically are available as the sqm-scripts on github.
> deployments of it on the Internet hadn't hopelessly mapped it to mere packet prioritization (diffserv)
Sorry yeah, I forgot about how the term has been mangled over the years. I was coming from a 3GPP perspective where the term is specifically defined as experience [0]
In particular:
* only the QoS perceived by the end-user matters;
* QoS definitions have to be future-proof;
* QoS has to be provided end-to-end;
* QoS attributes (or mappings of them) should not be restricted to one or a few external QoS control mechanisms.
Not the first time the ivory tower of Telecomms has been out of touch with the outside world ...
I cured bufferbloat with a cheap DD-WRT-capable router (NOT a Puma 6 chipset). I found it important to limit the WAN bandwidth on my router to about 15% below the max speed I’m paying for. What a difference!
Netgear R6300V2
Using fq_codel because the others use more CPU than I’m confident these devices can handle.
I find that most people suffer from bufferbloat unknowingly, having bought fancy routers 5-10 years ago.
I have a home internet connection that is a resold AT&T U-Verse connection. I doubt any of these fixes are available to me -- the extremely user-hostile equipment provides essentially no user-configurable options, and I have been told it has no switched mode, not even a secret one. So there's no way I can introduce my own router hardware unless I want to be double-NATted.
(Also, it has a broken caching DNS server, and forces that broken server into its DHCP responses; the server it returns is not user-configurable.)
>Twiddling with QoS might help, but a faster internet connection probably won’t help at all.
The key issue here is contention, i.e. a busy egress interface; that's when buffering occurs. I do not understand why adding capacity, and thereby reducing the probability of a busy egress interface, "won't help at all".
totally untrue. fq_codel manages normal torrents just fine in the presence of gamer-style traffic. We tested against torrents in fixing bufferbloat a lot! Pure aqm systems like pie (in docsis 3.1) do pretty well also. cake (per host fq) can make even insane amounts of torrenting (or slashdotting!) bearable.
I think you misunderstood the above poster who was talking about the case without SQM (or I'm misunderstanding things because I've never heard any of these terms before now).
Anyone trying to work out if MikroTik has some equivalent of this SQM: looks like no. Well, not explicitly fq_codel or any of its extensions. Though using the SFQ queue type with traffic shaping seems to give a similar improvement to bufferbloat.
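A sketch of what that looks like in RouterOS (syntax from memory and may vary by version; the target subnet and the 90M figure, roughly 90% of line rate, are placeholders):

```shell
# RouterOS: SFQ queue types attached to a simple queue that shapes
# slightly below line rate, so the queue (and SFQ's fairness) lives
# in the router rather than in the modem's dumb buffer.
/queue type add name=sfq-up kind=sfq
/queue type add name=sfq-down kind=sfq
/queue simple add name=wan-shaper target=192.168.88.0/24 \
    max-limit=90M/90M queue=sfq-up/sfq-down
```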
It really is amazing the difference that modern AQMs like FQ-CoDel make. This is a bit of self promotion but we leverage FQ-CoDel in our product (https://www.preseem.com) to help ISPs provide a better experience to their subscribers. Our customers regularly pass along anecdotes from subscribers who are very happy that they can now do big downloads or heavy streaming without breaking interactive applications like gaming.
News to me. dslreports reported a C for bufferbloat. My router (eero) has a "labs" section that had SQM disabled. After enabling, dslreports is reporting an A.
DOCSIS 3.1 devices mandate pie, which helps a lot, but it's not as good as fq_codel, nor do they do shaping from the isp, which is kind of needed for all the cable links in the USA I've tried. Get one if you can, though, they are better across the board in many other ways.
pfsense has fq_codel.
Anything (1000s of routers) from lede/openwrt has the most advanced bufferbloat-fighting stuff in it, followed by dd-wrt, tomato, etc. If you need high bandwidths the multi-core arms are the best. fq_codel is pre-configured on all links (ethernet/usb/fiber/wifi/whatever) but if you need shaping to the ISP provided rate you need to configure it. All the research that went into fixing bufferbloat queuing problems everywhere landed in openwrt and lede first. Most of the research that improved tcp everywhere came out of google.
Most gaming routers now sold commercially have some variant of fq_codel in them in their trade name ISP "qos" system.
Also fq_codel derived anti-bufferbloat work has landed in many commercial wifi routers on the wifi side (eero, google wifi, some ubnt products, meraki, many others). The paper behind all that was: https://arxiv.org/pdf/1703.00064.pdf - happily that work was "good enough" to enable by default, and boy, does it make a difference if wifi is your bottleneck.
The current premier dsl router with cake is evenroute. I think there are several new models from several manufacturers that are going to get it right, soon.
Not tracking FIOS (gpon fiber) closely at the moment. Yes, fiber networks have bufferbloat, but it's harder to hit, and generally smaller than on dsl and cable technologies. I configured cake on sonic fiber recently and got 60ms back. (going from 60ms latency under load to 3ms )
Regrettably shaper setup is finicky and requires a few minutes of testing with a site like dslreports or a tool like flent.org to get right. If more ISPs published their shapers' bitrate and burst rate settings, life would be easier here... but the hope has always been they'd just ship a router with this stuff on and remotely configured to be "right".
In some ways, the question is: what are you doing about bufferbloat?
Let's see. The turris omnia is a very good router (but only available in europe). For oomph (gbit shaping) people often leverage lede on a pcengines apu2 or run a full distro of pfsense or linux on it.
In my experience, changing the default DNS servers, enabling sch_cake, and minimizing shared spectrum interference are the most significant improvements for a home WiFi connection. Can anyone think of an additional dimension for improvement, besides upgrading the link itself?
> Can anyone think of an additional dimension for improvement, besides upgrading the link itself?
Depending on the hardware, you can upgrade the link itself with newer, smarter WiFi drivers. After pretty much solving the bufferbloat problem for wired connections, many of the same developers moved on to fixing WiFi, and some fruits of that effort are currently available in OpenWRT and LEDE.
Anyone have any suggestions for pfSense? I played around with the traffic shaper, setting the scheduler type to CODELQ and limiting bandwidth to 95%, but it doesn't seem to do much from what I can tell while testing with the speedtest on dslreports.
Wow the manual test is very convincing: running fast.com's speedtest while pinging google makes the time increase from <30ms to over 1000ms! It's taking 1.5 seconds to ping google? I had no idea this could be happening.
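Reproducing that manual test from any machine is easy. A sketch (the download URL is a placeholder; any large file will do):

```shell
# Terminal 1: saturate the link with a large download.
curl -o /dev/null https://example.com/large-file.bin
# Terminal 2: watch round-trip times as the buffers fill.
ping 8.8.8.8
# On a bloated link the ping times jump from tens of milliseconds
# to hundreds or thousands while the transfer runs, then recover.
```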
Bufferbloat happens on high speed links like those but amount of bloat you see is in the 30ms - 60ms range (vs seconds(!!) on home links). Bloat happens mostly (aside from microbursts) on overloaded links - and high speed backbones are typically overprovisioned so the problem only shows up when there's an outage or fiber cut. Example of what happens today on a fiber cut: http://blog.cerowrt.org/post/bufferbloat_on_the_backbone/
IF things like fq_codel were deployed on those we'd not see latencies climb that much at all, we'd see bandwidths decline to the actual capacity available - and only the biggest flows would be hit to do so.
fq_codel is lightweight enough to fit directly in high-speed hardware, and it does indeed run on 40GigE-plus devices in software on Linux, DPDK, and BSD. But it takes a long time for new chipsets to adopt new algorithms even if they incorporate support for deeply desirable features like ECN.
That said... 10Gbit to the home.... ooohhhhh. it's really hard to bloat that!
Priority is the wrong way to think about it. Given all the sources of bursts on the internet today, fair queuing (or "flow queuing") has become the way to turn flows back into packets.
There's an awful lot of literature on FQ; what we do with fq_codel is not only interleave packets better but also apply congestion control signals at the right time, so competing TCP flows don't overwhelm the link (with under 10ms of buffering, vs. the seconds common on FIFO ISP links).
Of course, being perfectly fair to flows is sometimes undesirable, but making something strictly higher priority[1] is fraught with peril as you end up with a classification nightmare.
Having fq gives you the best shot at smaller flows completing sooner, and of big flows sharing better with each other.
Having vastly reduced buffering improves the responsiveness of competing TCP flows a lot, grabbing bandwidth whenever available, faster.
My take on folks who want "prioritization" is to ask them to try some variant of SQM with just FQ and CoDel and get back to us. Being fair with well-managed buffers works really well.[2]
[1] making something lower priority than best effort is actually a good idea.
[2] but if you really want some flows or devices prioritized, see the sch_cake work mentioned on this thread. I still tend to think per host FQ is what many want rather than attempting to raise the priority of certain flows from certain services.
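For reference, per-host FQ in sch_cake is a single keyword. A sketch with example numbers; eth0 and 50mbit stand in for your WAN interface and roughly 90% of your uplink rate:

```shell
# Shape egress slightly below the link rate so the queue lives here,
# then split bandwidth fairly per source host, and per flow within a host.
tc qdisc replace dev eth0 root cake bandwidth 50mbit dual-srchost
# For ingress (download) fairness the same qdisc goes on an IFB device
# with dual-dsthost instead; the OpenWRT SQM scripts set this up for you.
```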
I have been away from this field for some time. Cake seems to solve the fairness problem between machines, but not the problem of the same machine having streams with different latency/bandwidth requirements. The priority way of solving this is round-robin on packet priority plus a priority limit per machine/user (a separate queue per discretized priority level, meaning no bufferbloat for high-priority traffic). The primary issue with this solution is that it requires the packets to be labeled.
yep. solved - for 6 years in the sqm-scripts and now in cake.
(not solved, in docsis-pie)
We use diffserv for this, for apps willing to use it.
Example: ssh sets the imm diffserv marking for interactive use. cake respects that (I've cited the relevant paper elsewhere; another place is https://www.bufferbloat.net/projects/codel/wiki/CakeTechnica... but after extensive testing we settled on three, rather than four, tiers of priority)
Stuff derived from the sqm-scripts uses the same method (HTB + fq_codel), but the problem has always been that diffserv is not respected end to end. However, within your own network, you can make your intentions known and have them honored, provided you control the bottleneck.
Also, we have always made the latency/bandwidth tradeoff explicit: if you want less latency, you must accept less bandwidth. It's the only safe answer to apps gaming the diffserv markings.
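To illustrate both halves of this, here is a sketch of cake's three-tier diffserv mode plus an app-side marking; eth0 and the bandwidth figure are example values, and IPQoS is OpenSSH's knob for the DSCP it applies:

```shell
# Honour diffserv markings with cake's three priority tins (diffserv3).
tc qdisc replace dev eth0 root cake bandwidth 50mbit diffserv3
# OpenSSH marks its interactive traffic itself; the codepoint is tunable:
ssh -o IPQoS=lowdelay user@host
```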
Ya, in gaming we usually have two kinds of packets: synchronization that must occur, and visual-fidelity sync. The first is small-bandwidth and latency-sensitive. The others can even be dropped, at the cost of some minor visual desyncing.
In reality I don't see bufferbloat on the internet adding jitter of more than a few milliseconds. I do see loss, though, of 20, 50, even 100ms at a time. I'd rather have 50ms of jitter than 50ms of loss, but that's just my application.
fq_codel and cake use a tiny bit of packet loss to get a sending host to back off, for example to keep a large download flow within the limits of your home link. Other flows aren't affected.
Bufferbloat regularly adds hundreds of ms on home internet connections; you can get an indication of your own bufferbloat at http://www.dslreports.com/speedtest
Not on my connection. I would rather my UDP packet arrive 30ms late than not arrive - especially on high latency links where I want to process the packet before a 300ms round trip nack/retransmit has a chance to work.
I don't see any bufferbloat or excessive jitter on my home internet (at least on a wired connection) on BT FTTH.
Anecdotally, I've seen larger amounts of packet loss and jitter when TCP accelerates faster into a loss event. The small amount of preventative loss reduces both of these values.
But I too am allergic to loss. fq_codel supports ECN (explicit congestion notification), which is now enabled universally by iOS. As near as I can tell, the 6% ECN usage (https://www.ietf.org/proceedings/98/slides/slides-98-maprg-t... ) in France is almost entirely from free.fr's deployment of fq_codel, which they enabled by default in 2012 (!!!!!!). I had expected all the ISPs to have leapt on this by now....
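Checking and enabling ECN on a Linux endpoint is a sketch away (these are standard kernel sysctls; fq_codel also marks rather than drops for ECN-capable flows):

```shell
# tcp_ecn: 0 = off, 1 = request and accept ECN, 2 = accept only if the
# peer requests it (the usual default).
sysctl net.ipv4.tcp_ecn
sysctl -w net.ipv4.tcp_ecn=1
# fq_codel's ecn flag is on by default; shown here for explicitness.
tc qdisc replace dev eth0 root fq_codel ecn
```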
Yes there is. I have multiple UDP/RTP streams at different bitrates, from the same IP to the same IP.
I then get a loss on a 30mbit stream (3000 packets a second) of 150 packets.
At the exact same time I get a loss on a 20mbit stream of 100 packets, and a 10mbit stream of 50 packets.
This is an outage for 150ms, probably because of a reroute in an MPLS network somewhere.
My packets have already been emitted by the time any round-trip resend would have come back.
TCP needs packets to be dropped quickly to get the feedback that your link can't handle the speed. Without that, the speed of the network appears to fluctuate: as the buffer fills, your network looks faster than it is; when the buffer is full, it starts dropping packets, which makes TCP back off and the buffer drain; if TCP then starts sending again before the buffer is empty, it sees yet another apparent speed. It is that fluctuation which causes the issues.
If you use an Edgerouter, you can get the cake traffic shaper but you'll have to do without the easy web interface OpenWRT has: https://community.ubnt.com/t5/EdgeRouter/Cake-and-FQ-PIE-com...
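For comparison, the OpenWRT SQM setup can also be driven from the shell via uci instead of the luci-app-sqm GUI. A sketch; the interface name and rates are examples (rates are in kbit/s, typically set to ~85-95% of your measured speeds):

```shell
# Configure and start SQM with cake; adjust to your WAN interface/speeds.
uci set sqm.@queue[0].interface='eth1'
uci set sqm.@queue[0].download='85000'   # ingress shaping, kbit/s
uci set sqm.@queue[0].upload='9000'      # egress shaping, kbit/s
uci set sqm.@queue[0].qdisc='cake'
uci set sqm.@queue[0].script='piece_of_cake.qos'
uci set sqm.@queue[0].enabled='1'
uci commit sqm
/etc/init.d/sqm restart
```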