
Chromium already supports dynamic edge-to-edge viewports. This change opts in by default, making the gesture navigation bar (chin) invisible without needing scroll interaction. No other changes were necessary, as no content relied on specific viewport insets. Command used:

```
sed -i 's/<meta name="viewport" content="width=device-width, initial-scale=1"\/>/<meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"\/>/g' **/*.html
```
<!DOCTYPE html>
<html lang="en" prefix="og: https://ogp.me/ns#">
<head>
<meta charset="utf-8"/>
<title>Server traffic shaping | Articles | GrapheneOS</title>
<meta name="description" content="Implementing server traffic shaping on Linux with CAKE."/>
<meta name="theme-color" content="#212121"/>
<meta name="color-scheme" content="dark light"/>
<meta name="msapplication-TileColor" content="#ffffff"/>
<meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/>
<meta name="twitter:site" content="@GrapheneOS"/>
<meta name="twitter:creator" content="@GrapheneOS"/>
<meta property="og:title" content="Server traffic shaping"/>
<meta property="og:description" content="Implementing server traffic shaping on Linux with CAKE."/>
<meta property="og:type" content="website"/>
<meta property="og:image" content="https://grapheneos.org/opengraph.png"/>
<meta property="og:image:width" content="512"/>
<meta property="og:image:height" content="512"/>
<meta property="og:image:alt" content="GrapheneOS logo"/>
<meta property="og:site_name" content="GrapheneOS"/>
<meta property="og:url" content="https://grapheneos.org/articles/server-traffic-shaping"/>
<link rel="canonical" href="https://grapheneos.org/articles/server-traffic-shaping"/>
<link rel="icon" href="/favicon.ico"/>
<link rel="icon" sizes="any" type="image/svg+xml" href="/favicon.svg"/>
<link rel="mask-icon" href="[[path|/mask-icon.svg]]" color="#1a1a1a"/>
<link rel="apple-touch-icon" href="/apple-touch-icon.png"/>
[[css|/main.css]]
<link rel="manifest" href="/manifest.webmanifest"/>
<link rel="license" href="/LICENSE.txt"/>
<link rel="me" href="https://grapheneos.social/@GrapheneOS"/>
</head>
<body>
{% include "header.html" %}
<main id="server-traffic-shaping">
<h1><a href="#server-traffic-shaping">Server traffic shaping</a></h1>

<p>This article covers implementing server traffic shaping on Linux with CAKE. The aim
is to provide fair usage of bandwidth between clients and consistently low latency
for dedicated and virtual servers provided by companies like OVH and others.</p>

<p>Traffic shaping is generally discussed in the context of a router shaping traffic
for a local network with assorted clients connected. It also has a lot to offer on a
server where you don't control the network. If you control your own infrastructure
from the server to the ISP, you probably want to do this on the routers instead.</p>

<p>This article was motivated by the serious lack of up-to-date information on this
topic elsewhere. It's very easy to implement on modern Linux kernels and the results
are impressive from extremely simple test cases to heavily loaded servers.</p>

<section id="problem">
<h2><a href="#problem">Problem</a></h2>

<p>A server will generally be provisioned with a specific amount of bandwidth
enforced by a router in close proximity. This router acts as the bottleneck and
ends up being in charge of most of the queuing and congestion decisions. Unless
that's under your control, the best you can hope for is that the router is
configured to use <code>fq_codel</code> as the queuing discipline (qdisc) to
provide fair queuing between streams and low latency by preventing a substantial
backlog of data.</p>

<p>Unfortunately, the Linux kernel still defaults to <code>pfifo_fast</code>
instead of the much saner <code>fq_codel</code> algorithm. This is changed by a
configuration file shipped with systemd, so <em>most</em> distributions using
systemd as init end up with a sane default. However, Debian, which is widely used,
removes that configuration and doesn't set a sane default itself. Many server
providers like OVH do not appear to consistently use modern queuing disciplines
like <code>fq_codel</code> within their networks, particularly at the artificial
bottlenecks implementing rate limiting based on product tiers.</p>
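
<p>If you're unsure what a given server defaults to, you can check and override
the default qdisc via sysctl. This is a minimal sketch assuming a sysctl.d
drop-in (the file name here is just an example); the setting only applies to
qdiscs created after it takes effect, such as when an interface comes up:</p>

<pre># check the current default queuing discipline
sysctl net.core.default_qdisc

# example drop-in: /etc/sysctl.d/50-default-qdisc.conf
net.core.default_qdisc = fq_codel</pre>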

<p>If the bottleneck doesn't use fair queuing, division of bandwidth across
streams is very arbitrary and latency suffers under congestion. These issues are
often referred to as bufferbloat, and <code>fq_codel</code> is quite good at
resolving them.</p>

<p>The <code>fq_codel</code> algorithm is far from perfect. It has issues with
hash collisions and, more importantly, only does fair queuing between streams.
Bufferbloat also isn't the only relevant issue. Clients with multiple connections
receive more bandwidth, and a client can open a large number of connections to
maximize their bandwidth usage at the expense of others. Fair queuing matters
beyond being a solution to bufferbloat, and there's more to fair queuing than
doing it only based on streams.</p>

<p>Traditionally, web browsers open a bunch of HTTP/1.1 connections to each server,
which ends up giving them an unfair amount of bandwidth. HTTP/2 is much friendlier
since it uses a single connection to each server for the entire browser. Download
managers take this to the extreme and intentionally use many connections to bypass
server limits and game the division of resources between clients.</p>
</section>

<section id="solution">
<h2><a href="#solution">Solution</a></h2>

<p>Linux 4.19 and later makes it easy to solve all of these problems. The CAKE
queuing discipline provides sophisticated fair queuing based on destination and
source addresses with finer-grained fairness for individual streams.</p>

<p>Unfortunately, simply enabling it as your queuing discipline isn't enough
since it's highly unlikely that your server is the network bottleneck. You need to
configure it with a bandwidth limit based on the provisioned bandwidth to move the
bottleneck under your control where you can control how traffic is queued.</p>
</section>

<section id="results">
<h2><a href="#results">Results</a></h2>

<p>We've used a 100mbit OVH server as a test platform for a case where
clients can easily max out the server bandwidth on their own. As a very simple
example, consider 2 clients with more than 100mbit of bandwidth each downloading a
large file. These are (rounded) real world results with CAKE:</p>

<ul>
<li>client A with 1 connection gets 50mbit</li>
<li>client B with 10 connections gets 5mbit each adding up to 50mbit</li>
</ul>

<p>CAKE with <code>flows</code> instead of the default <code>triple-isolate</code> to
mimic <code>fq_codel</code> at a bottleneck:</p>

<ul>
<li>client A with 1 connection gets 9mbit</li>
<li>client B with 10 connections gets 9mbit each adding up to 90mbit</li>
</ul>
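
<p>For reference, that per-stream fairness mode is selected with the
<code>flows</code> keyword. A sketch assuming the same 100mbit interface as
above; this isn't the recommended configuration, just the one used for the
comparison:</p>

<pre>tc qdisc replace dev eth0 root cake bandwidth 100mbit besteffort flows</pre>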

<p>The situation without traffic shaping is a mess. Latency takes a serious hit
that's very noticeable via SSH. Bandwidth is consistently allocated very unevenly
and ends up fluctuating substantially between test runs. The connections tend to
settle near a rate, often significantly lower or higher than the fair 9mbit
amount. It's generally something like this, but the range varies a lot:</p>

<ul>
<li>client A with 1 connection gets ~6mbit to ~14mbit</li>
<li>client B with 10 connections gets ~6mbit to ~14mbit each adding up to ~86mbit
to ~94mbit</li>
</ul>

<p>CAKE continues working as expected with a far higher number of connections. It
technically has a higher CPU cost than <code>fq_codel</code>, but that's much more
of a concern for low-end router hardware. It hardly matters on a server, even one
that's under heavy CPU load. The improvement in user experience is substantial and
it's very noticeable in web page load speeds when a server is under load.</p>
</section>

<section id="implementation">
<h2><a href="#implementation">Implementation</a></h2>

<p>For a server with 2000mbit of bandwidth provisioned, you could start by trying
it with 99.75% of the provisioned bandwidth:</p>

<pre>tc qdisc replace dev eth0 root cake bandwidth 1995mbit besteffort</pre>

<p>On a server, setting it to use 100% of the provisioned bandwidth may work fine
in practice. Unlike a local network connected to a consumer ISP, you shouldn't
need to sacrifice anywhere close to the typically recommended 5-10% of your
bandwidth for traffic shaping.</p>
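
<p>In other words, for the same example server it may be worth testing with the
full provisioned rate and only backing off if latency suffers under load. A
sketch, assuming the same <code>eth0</code> interface as above:</p>

<pre>tc qdisc replace dev eth0 root cake bandwidth 2000mbit besteffort</pre>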

<p>This also sets <code>besteffort</code> for the common case where the server
doesn't have appropriate Quality of Service markings set up via Diffserv. Fair
scheduling is already great at providing low latency by cycling through the hosts
and streams without needing this kind of configuration. The defaults for Diffserv
traffic classes like real-time video are set up to yield substantial bandwidth in
exchange for lower latency. It's easy to set this up wrong and it usually won't
make much sense on a server. You might want to set up marking of low-priority
traffic like system updates, but it will already get a tiny share of the overall
traffic on a loaded server due to fair scheduling between hosts and streams.</p>
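
<p>If you did have trustworthy Diffserv markings and wanted CAKE to act on them,
you would drop <code>besteffort</code> in favour of one of CAKE's Diffserv
presets. This is only a sketch of what that would look like, not a recommendation
for servers:</p>

<pre>tc qdisc replace dev eth0 root cake bandwidth 1995mbit diffserv4</pre>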

<p>You can use the <code>tc -s qdisc</code> command to monitor CAKE:</p>

<pre>tc -s qdisc show dev eth0</pre>

<p>If you want to keep an eye on how it changes over time:</p>

<pre>watch -n 1 tc -s qdisc show dev eth0</pre>

<p>This is very helpful for figuring out if you've successfully moved the
bottleneck to the server. If the bandwidth is being fully used, it should
consistently have a backlog of data where it's applying the queuing discipline.
The backlog shouldn't be draining to near zero under full bandwidth usage as that
indicates the bottleneck is the server application itself or a different network
bottleneck.</p>
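
<p>If you only want to watch the backlog counters rather than the full
statistics, filtering the output works well enough. A sketch, again assuming
<code>eth0</code>:</p>

<pre>watch -n 1 'tc -s qdisc show dev eth0 | grep backlog'</pre>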

<p>If you use systemd-networkd, you can add a CAKE configuration section to the
network configuration file instead of manually running the <code>tc</code> command
with a <code>Type=oneshot</code> service on boot:</p>

<pre>[CAKE]
Bandwidth=1995M
PriorityQueueingPreset=besteffort</pre>
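
<p>In context, that section goes in the <code>.network</code> file for the
interface. A minimal sketch, assuming the interface is matched by the name
<code>eth0</code> and the usual [Network]/address settings are configured
elsewhere in the same file:</p>

<pre># example file: /etc/systemd/network/10-wan.network
[Match]
Name=eth0

[CAKE]
Bandwidth=1995M
PriorityQueueingPreset=besteffort</pre>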
</section>

<section id="quicker-backpressure-propagation">
<h2><a href="#quicker-backpressure-propagation">Quicker backpressure propagation</a></h2>

<p>The Linux kernel can be tuned to more quickly propagate TCP backpressure up to
applications while still maximizing bandwidth usage. This is incredibly useful for
interactive applications aiming to send the freshest possible copy of data and for
protocols like HTTP/2 multiplexing streams/messages with different priorities over
the same TCP connection. This can also substantially reduce memory usage for TCP
by reducing buffer sizes closer to the optimal amount for maximizing bandwidth
use without wasting memory. The downside to quicker backpressure propagation is
increased CPU usage from additional system calls and context switches.</p>

<p>The Linux kernel automatically adjusts the size of the write queue to maximize
bandwidth usage. The write queue is divided into unacknowledged bytes (TCP window
size) and unsent bytes. As acknowledgements of transmitted data are received, it
frees up space for the application to queue more data. The queue of unsent bytes
provides the leeway needed to wake the application and obtain more data. It can be
capped with the <code>net.ipv4.tcp_notsent_lowat</code> sysctl to change the
default and with the <code>TCP_NOTSENT_LOWAT</code> socket option to override it
per-socket.</p>

<p>A reasonable choice for internet-based workloads concerned about latency and
particularly prioritization within TCP connections but unwilling to sacrifice
throughput is 128kiB. To configure this, set the following in
<code>/etc/sysctl.d/local.conf</code> or another sysctl configuration file and
load it with <code>sysctl --system</code>:</p>

<pre>net.ipv4.tcp_notsent_lowat = 131072</pre>

<p>Using values as low as 16384 can make sense to further improve latency and
prioritization. However, it's more likely to negatively impact throughput and will
further increase CPU usage. Stick with at least 128k, or the default of not
limiting the automatic unsent buffer size, unless you're going to do substantial
testing to make sure there's no negative impact for the workload.</p>

<p>If you decide to use <code>tcp_notsent_lowat</code>, be aware that newer Linux
kernels (Linux 5.0+, with a further improvement in Linux 5.10+) are recommended
since they substantially reduce system calls and context switches by not waking
the application to provide more data until over half of the unsent byte buffer is
empty.</p>
</section>

<section id="high-link-speed">
<h2><a href="#high-link-speed">High link speed</a></h2>

<p>By default, CAKE splits Generic Segmentation Offload (GSO) super-packets to
reduce latency at the expense of CPU efficiency and throughput. This can create a
bottleneck at high link speeds. We've had to disable this on the 2Gbit GrapheneOS
update servers:</p>

<pre>[CAKE]
Bandwidth=1995M
PriorityQueueingPreset=besteffort
SplitGSO=false</pre>
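
<p>If you configure CAKE directly with <code>tc</code> rather than through
systemd-networkd, the equivalent is the <code>no-split-gso</code> keyword. A
sketch, assuming the same interface and rate as the earlier examples:</p>

<pre>tc qdisc replace dev eth0 root cake bandwidth 1995mbit besteffort no-split-gso</pre>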
</section>

<section id="future">
<h2><a href="#future">Future</a></h2>

<p>Ideally, data centers would deploy CAKE throughout their networks with the
default <code>triple-isolate</code> flow isolation. This may mean they need to use
more powerful hardware for routing. If the natural bottlenecks used CAKE, setting
up traffic shaping on the server wouldn't be necessary. This doesn't seem likely
any time soon. Deploying <code>fq_codel</code> is much more realistic and tackles
bufferbloat, but it only provides fairness between streams rather than between
hosts.</p>
</section>

</main>
{% include "footer.html" %}
</body>
</html>