From 2acfd5c462f0efc6ef7570390cee57ffa982eb30 Mon Sep 17 00:00:00 2001
From: Daniel Micay
Date: Sun, 27 Dec 2020 03:31:22 -0500
Subject: [PATCH] add server traffic shaping article

---
 static/articles/server-traffic-shaping.html | 210 ++++++++++++++++++++
 static/sitemap.xml                          |   4 +
 2 files changed, 214 insertions(+)
 create mode 100644 static/articles/server-traffic-shaping.html

diff --git a/static/articles/server-traffic-shaping.html b/static/articles/server-traffic-shaping.html
new file mode 100644
index 00000000..611c18ea
--- /dev/null
+++ b/static/articles/server-traffic-shaping.html
@@ -0,0 +1,210 @@
Server traffic shaping | GrapheneOS
+ +
+
+

Server traffic shaping

+ +

This article covers implementing server traffic shaping on Linux with CAKE. The aim is to provide fair usage of bandwidth between clients and consistently low latency for dedicated and virtual servers provided by companies like OVH.

+ +

Traffic shaping is generally discussed in the context of a router shaping traffic + for a local network with assorted clients connected. It also has a lot to offer on a + server where you don't control the network. If you control your own infrastructure + from the server to the ISP, you probably want to do this on the routers instead.

+ +

This article was motivated by the serious lack of up-to-date information on this topic elsewhere. It's very easy to implement on modern Linux kernels, and the results are impressive from extremely simple test cases to heavily loaded servers.

+ +

We'll be improving this article with more details and results. It's documentation we intend to maintain and expand, not a fire-and-forget blog post.

+ +
+

Problem

+ +

A server will generally be provisioned with a specific amount of bandwidth + enforced by a router in close proximity. This router acts as the bottleneck and + ends up being in charge of most of the queuing and congestion decisions. Unless + that's under your control, the best you can hope for is that the router is + configured to use fq_codel as the queuing discipline (qdisc) to + provide fair queuing between streams and low latency by preventing a substantial + backlog of data.

+ +

Unfortunately, the Linux kernel still defaults to pfifo_fast instead of the much saner fq_codel algorithm. This is changed by a configuration file shipped with systemd, so most distributions using systemd as init end up with a sane default. Debian removes that configuration, doesn't set a sane default itself, and is widely used. Many server providers like OVH do not appear to consistently use modern queue disciplines like fq_codel within their networks, particularly at the artificial bottlenecks implementing rate limiting based on product tiers.

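You can check which default your kernel is using and persist a saner one via the net.core.default_qdisc sysctl. A minimal sketch for a distribution missing the systemd default, assuming a sysctl.d-based setup (the file name here is only an example):

# check the current default queuing discipline
sysctl net.core.default_qdisc

# persist fq_codel as the default for newly created qdiscs
echo "net.core.default_qdisc = fq_codel" > /etc/sysctl.d/99-fq_codel.conf
sysctl --system

The setting only applies to qdiscs created afterwards, so existing interfaces need their root qdisc replaced (or a reboot) to pick it up.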
+ +

If the bottleneck doesn't use fair queuing, division of bandwidth across streams is largely arbitrary and latency suffers under congestion. These issues are often referred to as bufferbloat, and fq_codel is quite good at resolving them.

+ +

The fq_codel algorithm is far from perfect. It has issues with hash collisions and, more importantly, only does fair queuing between streams. Bufferbloat also isn't the only relevant issue. Clients with multiple connections receive more bandwidth, and a client can open a large number of connections to maximize their bandwidth usage at the expense of others. Fair queuing is important beyond being a solution to bufferbloat, and there's more to fair queuing than doing it only based on streams.

+ +

Traditionally, web browsers open a bunch of HTTP/1.1 connections to each server, which ends up giving them an unfair amount of bandwidth. HTTP/2 is much friendlier since it uses a single connection to each server for the entire browser. Download managers take this to the extreme and intentionally use many connections to bypass server limits and game the division of resources between clients.

+
+ +
+

Solution

+ +

Linux 4.19 and later kernels make it easy to solve all of these problems. The CAKE queuing discipline provides sophisticated fair queuing based on destination and source addresses along with finer-grained fairness for individual streams.

+ +

Unfortunately, simply enabling it as your queuing discipline isn't enough + since it's highly unlikely that your server is the network bottleneck. You need to + configure it with a bandwidth limit based on the provisioned bandwidth to move the + bottleneck under your control where you can control how traffic is queued.

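CAKE has been available upstream since Linux 4.19 as the sch_cake scheduler. A quick check that your kernel ships it as a module (an assumption; some kernels build it in instead, in which case modprobe isn't applicable):

# load the CAKE scheduler; this fails if the kernel lacks the module
modprobe sch_cake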
+
+ +
+

Results

+ +

We've used a 100mbit OVH server as a test platform for a case where clients can easily max out the server bandwidth on their own. As a very simple example, consider 2 clients with more than 100mbit of bandwidth each downloading a large file. These are (rounded) real world results with CAKE:

+ +
    +
  • client A with 1 connection gets 50mbit
  • +
  • client B with 10 connections gets 5mbit each adding up to 50mbit
  • +
+ +

CAKE with flows instead of the default triple-isolate to mimic fq_codel at a bottleneck (see the command sketch after this list):

+ +
    +
  • client A with 1 connection gets 9mbit
  • +
  • client B with 10 connections gets 9mbit each adding up to 90mbit
  • +
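For reference, a sketch of how this variant is configured, with eth0 and the full 100mbit rate as assumptions (flows is a standard CAKE flow isolation keyword from tc-cake(8)):

tc qdisc replace dev eth0 root cake bandwidth 100mbit besteffort flows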
+ +

The situation without traffic shaping is a mess. Latency takes a serious hit + that's very noticeable via SSH. Bandwidth is consistently allocated very unevenly + and ends up fluctuating substantially between test runs. The connections tend to + settle near a rate, often significantly lower or higher than the fair 9mbit + amount. It's generally something like this, but the range varies a lot:

+ +
    +
  • client A with 1 connection gets ~6mbit to ~14mbit
  • +
  • client B with 10 connections gets ~6mbit to ~14mbit each adding up to ~86mbit + to ~94mbit
  • +
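A hypothetical way to reproduce this kind of two-client measurement is iperf3 (not necessarily the tool used for these results; server.example.com is a placeholder):

# on the server being shaped
iperf3 -s

# client A: a single download stream (-R makes the server send)
iperf3 -c server.example.com -R -t 30

# client B: 10 parallel download streams
iperf3 -c server.example.com -R -t 30 -P 10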
+ +

CAKE continues working as expected with a far higher number of connections. It technically has a higher CPU cost than fq_codel, but that's much more of a concern for low-end router hardware. It hardly matters on a server, even one that's under heavy CPU load. The improvement in user experience is substantial and it's very noticeable in web page load speeds when a server is under load.

+
+ +
+

Implementation

+ +

For a server with 2000mbit of bandwidth provisioned, you could start by trying + it with 99.75% of the provisioned bandwidth:

+ +
tc qdisc replace dev eth0 root cake bandwidth 1995mbit besteffort
+ +

This also sets besteffort for the common case where the server doesn't have Quality of Service markings via Diffserv. If you actually have traffic marked as bulk, video or voice to differentiate it, consider changing that. On a server, setting it to use 100% of the provisioned bandwidth may work fine in practice. Unlike a local network connected to a consumer ISP, you shouldn't need to sacrifice anywhere close to the typically recommended 5-10% of your bandwidth for traffic shaping.

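If your traffic does carry meaningful DSCP markings, a variant using one of CAKE's Diffserv modes instead of besteffort (diffserv4 is a standard tc-cake keyword; diffserv3 and diffserv8 also exist):

tc qdisc replace dev eth0 root cake bandwidth 1995mbit diffserv4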
+ +

You can use the tc -s qdisc command to monitor CAKE:

+ +
tc -s qdisc show dev eth0
+ +

If you want to keep an eye on how it changes over time:

+ +
watch -n 1 tc -s qdisc show dev eth0
+ +

This is very helpful for figuring out if you've successfully moved the bottleneck to the server. If the bandwidth is being fully used, it should consistently have a backlog of data where it's applying the queuing discipline. The backlog shouldn't be draining to near zero under full bandwidth usage, as that indicates the bottleneck is elsewhere: either the server application itself or another point in the network.

+
+ +
+

Future

+ +

Ideally, data centers would deploy CAKE within their networks. This may mean they need to use more powerful hardware for routing. If the natural bottlenecks used CAKE, setting up traffic shaping on the server wouldn't be necessary. This doesn't seem likely any time soon. Deploying fq_codel is much more realistic and tackles bufferbloat, but not fairness between hosts rather than only between streams.

+
+
diff --git a/static/sitemap.xml b/static/sitemap.xml
index dcdc9971..1aaf062d 100644
--- a/static/sitemap.xml
+++ b/static/sitemap.xml
@@ -58,4 +58,8 @@
         <loc>https://grapheneos.org/legal/Micay_%20Copperhead_%20Statement%20of%20Defendant%20and%20Counterclaim.pdf</loc>
         <priority>0.5</priority>
     </url>
+    <url>
+        <loc>https://grapheneos.org/articles/server-traffic-shaping</loc>
+        <priority>0.5</priority>
+    </url>
 </urlset>