A caching service is a set of reverse proxies that keep a smaller cache of content in memory to speed up access to resources on a slower backend web server.
WARNING: This service was retired in early 2022 and this documentation is now outdated. It is kept for historical purposes.
- Tutorial
- How-to
- Reference
- Discussion
Tutorial
To inspect the current cache hit ratio, head over to the cache health dashboard in howto/grafana. It should be at least 75% and generally over or close to 90%.
How-to
Traffic inspection
A quick way to see how much traffic is flowing through the cache is to fire up slurm on the public interface of one of the caching servers (currently cache01 and cache-02):
slurm -i eth0
This will display a realtime graph of the traffic going in and out of the server. It should be below 1Gbit/s (or around 120MB/s).
Another way to see throughput is to use iftop, in a similar way:
iftop -i eth0 -n
This will show per-host traffic statistics, which might allow pinpointing possible abusers. Hit the L key to turn on the logarithmic scale, without which the display quickly becomes unreadable.
Log files are in /var/log/nginx (although those might eventually go away, see ticket #32461). The lnav program can be used to show those log files in a pretty way and do extensive queries on them. Hit the i key to flip to the "histogram" view and z multiple times to zoom all the way into a per-second hit rate view. Hit q to go back to the normal view, which is useful to inspect individual hits and diagnose why they fail to be cached, for example.
An immediate hit ratio can be extracted from lnav thanks to our custom log parser shipped through Puppet. Load the log file in lnav:
lnav /var/log/nginx/ssl.blog.torproject.org.access.log
then hit ; to enter the SQL query mode and issue this query:
SELECT count(*), upstream_cache_status FROM logline WHERE status_code < 300 GROUP BY upstream_cache_status;
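When lnav isn't available, a rough tally of cache statuses can be computed straight from the log file with awk. This is a sketch: it assumes the "cacheprivacy" log format described in the Design section below, where upstream_cache_status is the second-to-last field.

```shell
# Rough hit-ratio tally without lnav. Assumes the "cacheprivacy"
# format, where upstream_cache_status is the second-to-last field.
LOG=${LOG:-/var/log/nginx/ssl.blog.torproject.org.access.log}
awk '{ n[$(NF-1)]++; total++ }
     END { for (s in n) printf "%-10s %6d (%.1f%%)\n", s, n[s], 100 * n[s] / total }' "$LOG"
```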
See also howto/logging for more information about lnav.
Pager playbook
The only monitoring for this service is to ensure the proper number of nginx processes are running. If this gets triggered, the fix might be to just restart nginx:
service nginx restart
... although it might be a sign of a deeper issue requiring further traffic inspection.
Disaster recovery
In case of fire, head to the torproject.org zone in the dns/domains repository and flip the DNS record of the affected service back to the backend. See ticket #32239 for details on that.
TODO: disaster recovery could be improved. How to deal with DDOS? Memory, disk exhaustion? Performance issues?
Reference
Installation
Include roles::cache in Puppet.
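Concretely, assigning the role to a new machine looks something like the following (a sketch; the node name here is hypothetical):

```puppet
# site.pp sketch: assign the cache role to a (hypothetical) new node
node 'cache-03.torproject.org' {
  include roles::cache
}
```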
TODO: document how to add new sites in the cache. See ticket #32462 for that project.
SLA
The service should generally stay online as much as possible, because it fronts critical web sites for the Tor Project, but otherwise it doesn't especially differ from our other SLAs.
Hit ratio should be high enough to reduce costs significantly on the backend.
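To put a number on that: backend traffic scales with the miss rate, so small hit-ratio changes matter. A back-of-envelope sketch, reusing the ~12TB/month estimate from the Cost section below (the hit ratios are just sample values):

```shell
# Backend traffic left over at various hit ratios, assuming the
# ~12TB/month total from the Cost section (sample ratios only).
awk 'BEGIN {
    total = 12                                   # TB/month hitting the cache
    split("0.75 0.90 0.95", h, " ")
    for (i = 1; i <= 3; i++)
        printf "hit ratio %.0f%% -> %.1f TB/month to the backend\n",
               h[i] * 100, total * (1 - h[i])
}'
```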
Design
The cache service generally consists of two or more servers in geographically distinct areas that run a webserver acting as a reverse proxy. In our case, we run the Nginx webserver with the proxy module for the https://blog.torproject.org/ website (and eventually others, see ticket #32462). One server is in the howto/ganeti cluster, and another is a VM in the Hetzner Cloud (2.50EUR/mth).
DNS for the site points to cache.torproject.org, an alias for the caching servers, which are currently two: cache01.torproject.org [sic] and cache-02. An HTTPS certificate for the site was issued through howto/letsencrypt. Like the Nginx configuration, the certificate is deployed by Puppet in the roles::cache class.
When a user hits the cache server, content is served from the cache stored in /var/cache/nginx, with a filename derived from the proxy_cache_key and proxy_cache_path settings. Those files should end up being cached by the kernel in virtual memory, which should make those accesses fast. If the cached entry is present and valid, it is returned directly to the user. If it is missing or invalid, it is fetched from the backend immediately. The backend is configured in Puppet as well.
Requests to the cache are logged to disk in /var/log/nginx/ssl.$hostname.access.log, with the IP address and user agent removed. mtail then parses those log files, increments various counters, and exposes those as metrics that are scraped by howto/prometheus. We use howto/grafana to display that hit ratio which, at the time of writing, is about 88% for the blog.
Puppet architecture
Because the Puppet code isn't public yet (ticket #29387), here's a quick overview of how we set things up for others to follow.
The entry point in Puppet is the roles::cache class, which configures an "Nginx server" (like an Apache vhost) to do the caching of the backend. It also includes our common Nginx configuration in profile::nginx, which in turn delegates most of the configuration to the Voxpupuli Nginx module.
The role essentially consists of:
include profile::nginx
nginx::resource::server { 'blog.torproject.org':
ssl_cert => '/etc/ssl/torproject/certs/blog.torproject.org.crt-chained',
ssl_key => '/etc/ssl/private/blog.torproject.org.key',
proxy => 'https://live-tor-blog-8.pantheonsite.io',
# no serviceable parts below
ipv6_enable => true,
ipv6_listen_options => '',
ssl => true,
# part of HSTS configuration, the other bit is in add_header below
ssl_redirect => true,
# proxy configuration
#
# pass the Host header to the backend (otherwise the proxy URL above is used)
proxy_set_header => ['Host $host'],
# should map to a cache zone defined in the nginx profile
proxy_cache => 'default',
# start caching redirects and 404s. this code is taken from the
# upstream documentation in
# https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_valid
proxy_cache_valid => [
'200 302 10m',
'301 1h',
'any 1m',
],
# allow serving stale content on error, timeout, or refresh
proxy_cache_use_stale => 'error timeout updating',
# allow only first request through backend
proxy_cache_lock => 'on',
# purge headers from backend we will override. X-Served-By and Via
# are merged into the Via header, as per rfc7230 section 5.7.1
proxy_hide_header => ['Strict-Transport-Security', 'Via', 'X-Served-By'],
add_header => {
# this is a rough equivalent to Varnish's Age header: it records
# when the page was cached, instead of its age
'X-Cache-Date' => '$upstream_http_date',
# if this was served from cache
'X-Cache-Status' => '$upstream_cache_status',
# replace the Via header with ours
'Via' => '$server_protocol $server_name',
# cargo-culted from Apache's configuration
'Strict-Transport-Security' => 'max-age=15768000; preload',
},
# cache 304 not modified entries
raw_append => "proxy_cache_revalidate on;\n",
# caches shouldn't log, because it is too slow
#access_log => 'off',
format_log => 'cacheprivacy',
}
There are also firewall (to open the monitoring, HTTP and HTTPS ports) and mtail (to read the log files for hit ratios) configurations, but those are not essential to get Nginx itself working.
The profile::nginx class is our common Nginx configuration that also covers non-caching setups:
# common nginx configuration
#
# @param client_max_body_size max upload size on this server. upstream
# default is 1m, see:
# https://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size
class profile::nginx(
Optional[String] $client_max_body_size = '1m',
) {
include webserver
class { 'nginx':
confd_purge => true,
server_purge => true,
manage_repo => false,
http2 => 'on',
server_tokens => 'off',
package_flavor => 'light',
log_format => {
# built-in, according to: http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
# 'combined' => '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
# "privacy" censors the client IP address from logs, taken from
# the Apache config, minus the "day" granularity because of
# limitations in nginx. we remove the IP address and user agent
# but keep the original request time, in other words.
'privacy' => '0.0.0.0 - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "-"',
# the "cache" format adds information about the backend, namely:
# upstream_addr - address and port of upstream server (string)
# upstream_response_time - total time spent talking to the backend server, in seconds (float)
# upstream_cache_status - state of the cache (MISS, HIT, UPDATING, etc)
# request_time - total time spent answering this query, in seconds (float)
'cache' => '$server_name:$server_port $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $upstream_addr $upstream_response_time $upstream_cache_status $request_time', #lint:ignore:140chars
'cacheprivacy' => '$server_name:$server_port 0.0.0.0 - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "-" $upstream_addr $upstream_response_time $upstream_cache_status $request_time', #lint:ignore:140chars
},
# XXX: doesn't work because a default is specified in the
# class. doesn't matter much because the puppet module reuses
# upstream default.
worker_rlimit_nofile => undef,
accept_mutex => 'off',
# XXX: doesn't work because a default is specified in the
# class. but that doesn't matter because accept_mutex is off so
# this has no effect
accept_mutex_delay => undef,
http_tcp_nopush => 'on',
gzip => 'on',
client_max_body_size => $client_max_body_size,
run_dir => '/run/nginx',
client_body_temp_path => '/run/nginx/client_body_temp',
proxy_temp_path => '/run/nginx/proxy_temp',
proxy_connect_timeout => '60s',
proxy_read_timeout => '60s',
proxy_send_timeout => '60s',
proxy_cache_path => '/var/cache/nginx/',
proxy_cache_levels => '1:2',
proxy_cache_keys_zone => 'default:10m',
# XXX: hardcoded, should just let nginx figure it out
proxy_cache_max_size => '15g',
proxy_cache_inactive => '24h',
ssl_protocols => 'TLSv1 TLSv1.1 TLSv1.2 TLSv1.3',
# XXX: from the apache module see also https://bugs.torproject.org/32351
ssl_ciphers => 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS', # lint:ignore:140chars
}
# recreate the default vhost
nginx::resource::server { 'default':
server_name => ['_'],
www_root => "/srv/www/${webserver::defaultpage::defaultdomain}/htdocs/",
listen_options => 'default_server',
ipv6_enable => true,
ipv6_listen_options => 'default_server',
# XXX: until we have an anonymous log format
access_log => 'off',
ssl => true,
ssl_redirect => true,
ssl_cert => '/etc/ssl/torproject-auto/servercerts/thishost.crt',
ssl_key => '/etc/ssl/torproject-auto/serverkeys/thishost.key';
}
}
There are lots of config settings there, but they are provided to reduce the diff between the upstream Debian package and the Nginx module from the Puppet Forge. This was filed upstream as a bug.
Issues
Only serious issues, or issues that are not in the cache component but still relevant to the service, are listed here:
- the cipher suite is an old hardcoded copy derived from Apache, see ticket #32351
- the Nginx puppet module diverges needlessly from upstream and Debian package configuration see puppet-nginx-1359
The service was launched as part of improvements to the blog infrastructure, in ticket #32090. The launch checklist and progress were tracked in ticket #32239.
File or search for issues in the services - cache component.
Monitoring and testing
The caching servers are monitored like other servers by the Nagios server. The Nginx cache manager and the blog endpoint are also monitored for availability.
Logs and metrics
Nginx logs are currently kept in a way that violates typical policy (tpo/tpa/team#32461). They do not contain IP addresses, but do contain accurate time records (granularity to the second) which might be exploited for correlation attacks.
Nginx logs are fed into mtail to extract hit rate information, which is exported to Prometheus and, in turn, used to create a Grafana dashboard showing request and hit rates on the caching servers.
Other documentation
- NGINX Alphabetical index of variables
- NGINX Module ngx_http_proxy_module
- NGINX Content Caching
- NGINX Reverse Proxy
- perusio@github.com: Nginx configuration for running Drupal - interesting snippet for cookies handling, not required
- NGINX: Maximizing Drupal 8 Performance with NGINX, Part 2: Caching and Load Balancing
Discussion
This section regroups notes that were gathered during the research, configuration, and deployment of the service. That includes goals, cost, benchmarks and configuration samples.
Launch was done in the first week of November 2019 as part of ticket #32239, to front the https://blog.torproject.org/ site.
Overview
The original goal of this project is to create a pair of caching servers in front of the blog to reduce the bandwidth costs we're being charged there.
Goals
Must have
- reduce the traffic on the blog, hosted at a costly provider (#32090 (closed))
- HTTPS support in the frontend and backend
- deployment through Puppet
- anonymized logs
- hit rate stats
Nice to have
- provide a frontend for our existing mirror infrastructure, a home-made CDN for TBB and other releases
- no on-disk logs
- cute dashboard or grafana integration
- well-maintained upstream Puppet module
Approvals required
- approved and requested by vegas
Non-Goals
- global CDN for users outside of TPO
- geoDNS
Cost
Somewhere between 11EUR and 100EUR/mth for bandwidth and hardware.
We apparently get around 2.2M "page views" per month at Pantheon. That is about 1 hit per second and 12 terabytes per month, or 36Mbit/s on average:
$ qalc
> 2 200 000 ∕ (30d) to hertz
2200000 / (30 * day) = approx. 0.84876543 Hz
> 2 200 000 * 5Mibyte
2200000 * (5 * mebibyte) = 11.534336 terabytes
> 2 200 000 * 5Mibyte/(30d) to megabit / s
(2200000 * (5 * mebibyte)) / (30 * day) = approx. 35.599802 megabits / s
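The same arithmetic, redone in plain awk for those without qalc (the ~5MiB-per-page-view figure is the same assumption used in the qalc session above):

```shell
# Back-of-envelope: 2.2M page views/month at ~5MiB per view.
awk 'BEGIN {
    hits  = 2200000
    page  = 5 * 1048576            # ~5MiB per page view, in bytes
    month = 30 * 24 * 3600         # seconds in 30 days
    printf "rate:      %.2f hits/s\n",   hits / month
    printf "volume:    %.1f TB/month\n", hits * page / 1e12
    printf "bandwidth: %.1f Mbit/s\n",   hits * page * 8 / month / 1e6
}'
```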
Hetzner charges 1EUR/TB/month over our 1TB quota, so bandwidth would cost 11EUR/month on average. If costs become prohibitive, we could switch to a Hetzner VM which includes 20TB of traffic per month at costs ranging from 3EUR/mth to 30EUR/mth depending on the VPS size (between 1 vCPU, 2GB ram, 20GB SSD and 8vCPU, 32GB ram and 240GB SSD).
Dedicated servers start at 34EUR/mth (EX42, 64GB RAM, 2x4TB HDD) for unlimited gigabit.
We first went with a virtual machine in the howto/ganeti cluster and also a VM in the Hetzner Cloud (2.50EUR/mth).
Proposed Solution
Nginx will be deployed on two servers. ATS was found to be somewhat difficult to configure and debug, while Nginx has a more "regular" configuration file format. Furthermore, performance was equivalent or better in Nginx.
Finally, there is the possibility of converging all HTTP services towards Nginx if desired, which would reduce the number of moving parts in the infrastructure.
Benchmark results overview
Hits per second:
Server | AB | Siege | Bombardier | B. HTTP/1 |
---|---|---|---|---|
Upstream | n/a | n/a | 2800 | n/a |
ATS, local | 800 | 569 | n/a | n/a |
ATS, remote | 249 | 241 | 2050 | 1322 |
Nginx | 324 | 269 | 2117 | n/a |
Throughput (megabyte/s):
Server | AB | Siege | Bombardier | B. HTTP/1 |
---|---|---|---|---|
Upstream | n/a | n/a | 145 | n/a |
ATS, local | 42 | 5 | n/a | n/a |
ATS, remote | 13 | 2 | 105 | 14 |
Nginx | 17 | 14 | 107 | n/a |
Launch checklist
See #32239 for a followup on the launch procedure.
Benchmarking procedures
See the benchmark procedures.
Baseline benchmark
Baseline benchmark of the actual blog site, from cache-02:
anarcat@cache-02:~$ ./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/ -c 100
Bombarding https://blog.torproject.org:443/ for 2m0s using 100 connection(s)
[================================================================================================================================================================] 2m0s
Done!
Statistics Avg Stdev Max
Reqs/sec 2796.01 716.69 6891.48
Latency 35.96ms 22.59ms 1.02s
Latency Distribution
50% 33.07ms
75% 40.06ms
90% 47.91ms
95% 54.66ms
99% 75.69ms
HTTP codes:
1xx - 0, 2xx - 333646, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 144.79MB/s
This is strangely much higher, in terms of throughput, and faster, in terms of latency, than testing against our own servers. Different avenues were explored to explain that disparity with our servers:
- jumbo frames? nope, both connections see packets larger than 1500 bytes
- protocol differences? nope, both go over IPv6 and (probably) HTTP/2 (at least not over UDP)
- different link speeds
The last theory is currently the only one standing. Indeed, 144.79MB/s should not be possible on regular gigabit ethernet (GigE), as it is actually more than 1000Mbit/s (1158.32Mbit/s). Sometimes the above benchmark even gives 152MB/s (1222Mbit/s), way beyond what a regular GigE link should be able to provide.
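The conversion behind that claim, for reference (using 1MB = 10^6 bytes, as bombardier reports):

```shell
# 144.79 MB/s in Mbit/s; anything over 1000 exceeds plain GigE.
awk 'BEGIN { printf "%.2f Mbit/s\n", 144.79 * 8 }'
```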
Alternatives considered
Four alternatives were seriously considered:
- Apache Traffic Server
- Nginx proxying + caching
- Varnish + stunnel
- Fastly
Other alternatives were not:
- Apache HTTPD caching - performance expected to be sub-par
- Envoy - not designed for caching, external cache support planned in 2019
- HAproxy - not designed to cache large objects
- H2O - HTTP/[123], written from scratch for HTTP/2+, presumably faster than Nginx, didn't find out about it until after the project launched
- Ledge - caching extension to Nginx with ESI, Redis, and cache purge support, not packaged in Debian
- Nuster - new project, not packaged in Debian (based on HAproxy), performance comparable with nginx and varnish according to upstream, although impressive improvements
- Polipo - not designed for production use
- Squid - not designed as a reverse proxy
- Traefik - not designed for caching
Apache Traffic Server
Summary of online reviews
Pros:
- HTTPS
- HTTP/2
- industry leader (behind cloudflare)
- out of the box clustering support
Cons:
- load balancing is an experimental plugin (at least in 2016)
- no static file serving? or slower?
- no commercial support
Used by Yahoo, Apple and Comcast.
First impressions
Pros:
- Puppet module available
- no query logging by default (good?)
- good documentation, but a bit lacking in tutorials
- nice little dashboard shipped by default (traffic_top), although it could be more useful (doesn't seem to show hit ratio clearly)
Cons:
- configuration spread out over many different configuration files
- complex and arcane configuration language (e.g. try to guess what this actually does:
  CONFIG proxy.config.http.server_ports STRING 8080:ipv6:tr-full 443:ssl ip-in=192.168.17.1:80:ip-out=[fc01:10:10:1::1]:ip-out=10.10.10.1
  )
- configuration syntax varies across config files and plugins
- couldn't decouple the backend hostname from the passed Host header (following a bad random tutorial found on the internet)
- couldn't figure out how to make HTTP/2 work
- no Prometheus exporters
Configuration
apt install trafficserver
The default Debian config seems sane when compared to the Cicimov tutorial. One thing we will need to change is the default listening port, which is by default:
CONFIG proxy.config.http.server_ports STRING 8080 8080:ipv6
We want something more like this:
CONFIG proxy.config.http.server_ports STRING 80 80:ipv6 443:ssl 443:ssl:ipv6
We also need to tell ATS to keep the original Host header:
CONFIG proxy.config.url_remap.pristine_host_hdr INT 1
It's clearly stated in the upstream tutorial, but Cicimov's gets it wrong.
Then we also need to configure the path to the SSL certs, we use the self-signed certs for benchmarking:
CONFIG proxy.config.ssl.server.cert.path STRING /etc/ssl/torproject-auto/servercerts/
CONFIG proxy.config.ssl.server.private_key.path STRING /etc/ssl/torproject-auto/serverkeys/
Once we have a real certificate issued through Let's Encrypt, we can use:
CONFIG proxy.config.ssl.server.cert.path STRING /etc/ssl/torproject/certs/
CONFIG proxy.config.ssl.server.private_key.path STRING /etc/ssl/private/
Either way, we need to tell ATS about those certs:
#dest_ip=* ssl_cert_name=thishost.crt ssl_key_name=thishost.key
ssl_cert_name=blog.torproject.org.crt ssl_key_name=blog.torproject.org.key
We need to add the trafficserver user to the ssl-cert group so it can read those:
adduser trafficserver ssl-cert
Then we set up this remapping rule:
map https://blog.torproject.org/ https://backend.example.com/
(backend.example.com is the prod alias of our backend.)
And finally curl is able to talk to the proxy:
curl --proxy-cacert /etc/ssl/torproject-auto/servercerts/ca.crt --proxy https://cache01.torproject.org/ https://blog.torproject.org
Troubleshooting
Proxy fails to hit backend
curl: (56) Received HTTP code 404 from proxy after CONNECT
Same with a plain GET:
# curl -s -k -I --resolve *:443:127.0.0.1 https://blog.torproject.org | head -1
HTTP/1.1 404 Not Found on Accelerator
It seems that the backend named on the right side of the remap rule needs to respond correctly, as ATS doesn't reuse the Host header correctly, which is kind of a problem because the backend wants to redirect everything to the canonical hostname for SEO purposes. We could tweak that and make backend.example.com the canonical host, but then disaster recovery would be much harder, and some links could end up pointing there instead of the real canonical host.
I tried the mysterious regex_remap plugin:
map http://cache01.torproject.org/ http://localhost:8000/ @plugin=regex_remap.so @pparam=maps.reg @pparam=host
with this in maps.reg:
.* $s://$f/$P/
... which basically means "redirect everything to the original scheme, host and path", but that (obviously, maybe) fails with:
# curl -I -s http://cache01.torproject.org/ | head -1
HTTP/1.1 400 Multi-Hop Cycle Detected
It feels it really doesn't want to act as a transparent proxy...
I also tried a header rewrite:
map http://cache01.torproject.org/ http://localhost:8000/ @plugin=header_rewrite.so @pparam=rules1.conf
with rules1.conf like:
set-header host cache01.torproject.org
set-header foo bar
... and the Host header is untouched. The rule works, though, because the Foo header appears in the request.
The solution to this is the proxy.config.url_remap.pristine_host_hdr setting documented above.
HTTP/2 support missing
Next hurdle: no HTTP/2 support, even when using proto=http2;http (falls back on HTTP/1.1) or proto=http2 only (fails with WARNING: Unregistered protocol type 0).
Benchmarks
Same host tests
With blog.tpo in /etc/hosts (because proxy-host doesn't work), and running on the same host as the proxy (!), cold cache:
root@cache01:~# siege https://blog.torproject.org/
** SIEGE 4.0.4
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions: 68068 hits
Availability: 100.00 %
Elapsed time: 119.53 secs
Data transferred: 654.47 MB
Response time: 0.18 secs
Transaction rate: 569.46 trans/sec
Throughput: 5.48 MB/sec
Concurrency: 99.67
Successful transactions: 68068
Failed transactions: 0
Longest transaction: 0.56
Shortest transaction: 0.00
Warm cache:
root@cache01:~# siege https://blog.torproject.org/
** SIEGE 4.0.4
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions: 65953 hits
Availability: 100.00 %
Elapsed time: 119.71 secs
Data transferred: 634.13 MB
Response time: 0.18 secs
Transaction rate: 550.94 trans/sec
Throughput: 5.30 MB/sec
Concurrency: 99.72
Successful transactions: 65953
Failed transactions: 0
Longest transaction: 0.62
Shortest transaction: 0.00
And traffic_top looks like this after the second run:
CACHE INFORMATION CLIENT REQUEST & RESPONSE
Disk Used 77.8K Ram Hit 99.9% GET 98.7% 200 98.3%
Disk Total 268.1M Fresh 98.2% HEAD 0.0% 206 0.0%
Ram Used 16.5K Revalidate 0.0% POST 0.0% 301 0.0%
Ram Total 352.3K Cold 0.0% 2xx 98.3% 302 0.0%
Lookups 134.2K Changed 0.1% 3xx 0.0% 304 0.0%
Writes 13.0 Not Cache 0.0% 4xx 2.0% 404 0.4%
Updates 1.0 No Cache 0.0% 5xx 0.0% 502 0.0%
Deletes 0.0 Fresh (ms) 8.6M Conn Fail 0.0 100 B 0.1%
Read Activ 0.0 Reval (ms) 0.0 Other Err 2.8K 1 KB 2.0%
Writes Act 0.0 Cold (ms) 26.2G Abort 111.0 3 KB 0.0%
Update Act 0.0 Chang (ms) 11.0G 5 KB 0.0%
Entries 2.0 Not (ms) 0.0 10 KB 98.2%
Avg Size 38.9K No (ms) 0.0 1 MB 0.0%
DNS Lookup 156.0 DNS Hit 89.7% > 1 MB 0.0%
DNS Hits 140.0 DNS Entry 2.0
CLIENT ORIGIN SERVER
Requests 136.5K Head Bytes 151.6M Requests 152.0 Head Bytes 156.5K
Req/Conn 1.0 Body Bytes 1.4G Req/Conn 1.1 Body Bytes 1.1M
New Conn 137.0K Avg Size 11.0K New Conn 144.0 Avg Size 8.0K
Curr Conn 0.0 Net (bits) 12.0G Curr Conn 0.0 Net (bits) 9.8M
Active Con 0.0 Resp (ms) 1.2
Dynamic KA 0.0
cache01 (r)esponse (q)uit (h)elp (A)bsolute
ab:
# ab -c 100 -n 1000 https://blog.torproject.org/
[...]
Server Software: ATS/8.0.2
Server Hostname: blog.torproject.org
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key: X25519 253 bits
TLS Server Name: blog.torproject.org
Document Path: /
Document Length: 52873 bytes
Concurrency Level: 100
Time taken for tests: 1.248 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 53974000 bytes
HTML transferred: 52873000 bytes
Requests per second: 801.43 [#/sec] (mean)
Time per request: 124.776 [ms] (mean)
Time per request: 1.248 [ms] (mean, across all concurrent requests)
Transfer rate: 42242.72 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 8 47 20.5 46 121
Processing: 6 75 16.2 76 116
Waiting: 1 13 6.8 12 49
Total: 37 122 21.6 122 196
Percentage of the requests served within a certain time (ms)
50% 122
66% 128
75% 133
80% 137
90% 151
95% 160
98% 169
99% 172
100% 196 (longest request)
Separate host
Those tests were performed from one cache server to the other, to avoid the benchmarking tool fighting for resources with the server.
In .siege/siege.conf:
verbose = false
fullurl = true
concurrent = 100
time = 2M
url = https://blog.torproject.org/
delay = 1
internet = false
benchmark = true
Siege:
root@cache-02:~# siege
** SIEGE 4.0.4
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions: 28895 hits
Availability: 100.00 %
Elapsed time: 119.73 secs
Data transferred: 285.18 MB
Response time: 0.40 secs
Transaction rate: 241.33 trans/sec
Throughput: 2.38 MB/sec
Concurrency: 96.77
Successful transactions: 28895
Failed transactions: 0
Longest transaction: 1.26
Shortest transaction: 0.05
Load went to about 2 (Load average: 1.65 0.80 0.36 after the test), with one CPU constantly busy and the other at about 50%; memory usage was low (~800M).
ab:
# ab -c 100 -n 1000 https://blog.torproject.org/
[...]
Server Software: ATS/8.0.2
Server Hostname: blog.torproject.org
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,4096,256
Server Temp Key: X25519 253 bits
TLS Server Name: blog.torproject.org
Document Path: /
Document Length: 53320 bytes
Concurrency Level: 100
Time taken for tests: 4.010 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 54421000 bytes
HTML transferred: 53320000 bytes
Requests per second: 249.37 [#/sec] (mean)
Time per request: 401.013 [ms] (mean)
Time per request: 4.010 [ms] (mean, across all concurrent requests)
Transfer rate: 13252.82 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 23 254 150.0 303 549
Processing: 14 119 89.3 122 361
Waiting: 5 105 89.7 105 356
Total: 37 373 214.9 464 738
Percentage of the requests served within a certain time (ms)
50% 464
66% 515
75% 549
80% 566
90% 600
95% 633
98% 659
99% 675
100% 738 (longest request)
Bombardier results are much better and almost max out the gigabit connection:
anarcat@cache-02:~$ ./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/ -c 100
Bombarding https://blog.torproject.org:443/ for 2m0s using 100 connection(s)
[=========================================================================] 2m0s
Done!
Statistics Avg Stdev Max
Reqs/sec 2049.82 533.46 7083.03
Latency 49.75ms 20.82ms 837.07ms
Latency Distribution
50% 48.53ms
75% 57.98ms
90% 69.05ms
95% 78.44ms
99% 128.34ms
HTTP codes:
1xx - 0, 2xx - 241187, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 104.67MB/s
It might be because it supports doing HTTP/2 requests: indeed, the throughput drops to 14MB/s when we use the --http1 flag, along with rates closer to ab's:
anarcat@cache-02:~$ ./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/ --http1 -c 100
Bombarding https://blog.torproject.org:443/ for 2m0s using 100 connection(s)
[=========================================================================] 2m0s
Done!
Statistics Avg Stdev Max
Reqs/sec 1322.21 253.18 1911.21
Latency 78.40ms 18.65ms 688.60ms
Latency Distribution
50% 75.53ms
75% 88.52ms
90% 101.30ms
95% 110.68ms
99% 132.89ms
HTTP codes:
1xx - 0, 2xx - 153114, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 14.22MB/s
Inter-server communication is good, according to iperf3:
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 1.00 GBytes 859 Mbits/sec receiver
So we see the roundtrip does add significant overhead to ab and siege. It's possible this is due to the nature of the virtual server, which is much less powerful than the server. This seems to be confirmed by bombardier's success, since it's possibly better designed than the other two to maximize resources on the client side.
Nginx
Summary of online reviews
Pros:
- provides a full webserver stack, which means much more flexibility and the possibility of converging on a single solution across the infrastructure
- very popular
- load balancing (but no active check in free version)
- can serve static content
- HTTP/2
- HTTPS
Cons:
- provides a full webserver stack (!), which means a larger attack surface
- no ESI or ICP?
- does not cache out of the box, requires config which might imply lesser performance
- opencore model with paid features, especially "active health checks", "Cache Purging API" (although there are hackish ways to clear the cache and a module), and "session persistence based on cookies"
- most plugins are statically compiled in different "flavors", although it's possible to have dynamic modules
Used by Cloudflare, Dropbox, MaxCDN and Netflix.
First impressions
Pros:
- "approved" Puppet module
- single file configuration
- config easy to understand and fairly straightforward
- just frigging works
- easy to serve static content in case of problems
- can be leveraged for other applications
- performance comparable or better than ATS
Cons:
- default caching module uses MD5 as a hashing algorithm
- configuration refers to magic variables that are documented all over the place (e.g. what is $proxy_host vs $host?)
- documentation mixes content from the commercial version, which makes it difficult to tell what is actually possible
- reload may crash the server (instead of not reloading) on config errors
- no shiny dashboard like ATS
- manual cache sizing?
- detailed cache stats are only in the "plus" version
Configuration
We pick the "light" Debian package. The modules that would be interesting in the other flavors are "cache purge" (from extras) and "geoip" (from full):
apt install nginx-light
Then drop this config file in /etc/nginx/sites-available and symlink it into sites-enabled:
server_names_hash_bucket_size 64;
proxy_cache_path /var/cache/nginx/ levels=1:2 keys_zone=blog:10m;
server {
    listen 80;
    listen [::]:80;
    listen 443 ssl;
    listen [::]:443 ssl;
    ssl_certificate /etc/ssl/torproject/certs/blog.torproject.org.crt-chained;
    ssl_certificate_key /etc/ssl/private/blog.torproject.org.key;
    server_name blog.torproject.org;
    proxy_cache blog;
    location / {
        proxy_pass https://live-tor-blog-8.pantheonsite.io;
        proxy_set_header Host $host;
        # cache 304 responses
        proxy_cache_revalidate on;
        # add cookie to cache key
        #proxy_cache_key "$host$request_uri$cookie_user";
        # not sure what the cookie name is
        proxy_cache_key $scheme$proxy_host$request_uri;
        # allow serving stale content on error, timeout, or refresh
        proxy_cache_use_stale error timeout updating;
        # allow only the first request through to the backend
        proxy_cache_lock on;
        # expose the cache status as a response header
        add_header X-Cache-Status $upstream_cache_status;
    }
}
... and reload nginx.
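Since a broken config can take nginx down on reload (see the cons above), it is safer to validate the configuration first (standard Debian tooling):

```shell
# Test the configuration, and only reload if it parses cleanly.
nginx -t && systemctl reload nginx
```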
I tested that logged in users bypass the cache and things generally work well.
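The `X-Cache-Status` header added in the config above makes such spot checks easy from the outside. A sketch, assuming the blog hostname from the config; the values are nginx's standard `$upstream_cache_status` values:

```shell
# First request should show MISS (or EXPIRED), the second HIT.
curl -sI https://blog.torproject.org/ | grep -i '^x-cache-status'
curl -sI https://blog.torproject.org/ | grep -i '^x-cache-status'
```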
A key problem with Nginx is getting decent statistics out. The upstream nginx exporter supports only (basically) hits per second through the stub status module, a very limited module shipped with core Nginx. The commercial version, Nginx Plus, supports a more extensive API which includes the hit rate, but that's not an option for us.
There are a few ways to work around this problem:
- create our own metrics using the Nginx Lua Prometheus module: this can have performance impacts and involves a custom configuration
- write and parse log files, which is the way the munin plugin works - this could possibly be fed directly into mtail to avoid storing logs on disk but still get the data (include `$upstream_cache_status` in the logs)
- use a third-party module like vts or sts and the exporter to expose those metrics - the vts module doesn't seem to be very well maintained (no release since 2018) and it's unclear if it will work for our use case
Here's an example of how to do the mtail hack. First tell nginx to write to syslog, to act as a buffer, so that parsing doesn't slow processing, excerpt from the nginx.conf snippet:
# Log response times so that we can compute latency histograms
# (using mtail). Works around the lack of Prometheus
# instrumentation in NGINX.
log_format extended '$server_name:$server_port '
'$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'$upstream_addr $upstream_response_time $request_time';
access_log syslog:server=unix:/dev/log,facility=local3,tag=nginx_access extended;
(We would also need to add `$upstream_cache_status` to that format.)
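The amended format might look like this (a sketch; appending the field at the end is an arbitrary choice):

```
log_format extended '$server_name:$server_port '
                    '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    '$upstream_addr $upstream_response_time $request_time '
                    '$upstream_cache_status';
```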
Then count the different stats using mtail, excerpt from the mtail config snippet:
# Define the exported metrics.
counter nginx_http_request_total
counter nginx_http_requests by host, vhost, method, code, backend
counter nginx_http_bytes by host, vhost, method, code, backend
counter nginx_http_requests_ms by le, host, vhost, method, code, backend
/(?P<hostname>[-0-9A-Za-z._:]+) nginx_access: (?P<vhost>[-0-9A-Za-z._:]+) (?P<remote_addr>[0-9a-f\.:]+) - - \[[^\]]+\] "(?P<request_method>[A-Z]+) (?P<request_uri>\S+) (?P<http_version>HTTP\/[0-9\.]+)" (?P<status>\d{3}) ((?P<response_size>\d+)|-) "[^"]*" "[^"]*" (?P<upstream_addr>[-0-9A-Za-z._:]+) ((?P<ups_resp_seconds>\d+\.\d+)|-) (?P<request_seconds>\d+)\.(?P<request_milliseconds>\d+)/ {
nginx_http_request_total++
# [...]
}
We'd also need to check the cache status in that parser.
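Counting by cache status could look something like this (a sketch, assuming `$upstream_cache_status` is appended to the end of the log format; the counter name and pattern are ours, not from any deployed config):

```
counter nginx_http_cache_status by cache_status

# Matches a trailing HIT/MISS/EXPIRED/BYPASS/... token at the end of the line.
/ (?P<cache_status>[A-Z_]+)$/ {
  nginx_http_cache_status[$cache_status]++
}
```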
A variation of the mtail hack was adopted in our design.
Benchmarks
ab:
root@cache-02:~# ab -c 100 -n 1000 https://blog.torproject.org/
[...]
Server Software: nginx/1.14.2
Server Hostname: blog.torproject.org
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,4096,256
Server Temp Key: X25519 253 bits
TLS Server Name: blog.torproject.org
Document Path: /
Document Length: 53313 bytes
Concurrency Level: 100
Time taken for tests: 3.083 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 54458000 bytes
HTML transferred: 53313000 bytes
Requests per second: 324.31 [#/sec] (mean)
Time per request: 308.349 [ms] (mean)
Time per request: 3.083 [ms] (mean, across all concurrent requests)
Transfer rate: 17247.25 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 30 255 78.0 262 458
Processing: 18 35 19.2 28 119
Waiting: 7 19 7.4 18 58
Total: 81 290 88.3 291 569
Percentage of the requests served within a certain time (ms)
50% 291
66% 298
75% 303
80% 306
90% 321
95% 533
98% 561
99% 562
100% 569 (longest request)
About 50% faster than ATS.
Siege:
Transactions: 32246 hits
Availability: 100.00 %
Elapsed time: 119.57 secs
Data transferred: 1639.49 MB
Response time: 0.37 secs
Transaction rate: 269.68 trans/sec
Throughput: 13.71 MB/sec
Concurrency: 99.60
Successful transactions: 32246
Failed transactions: 0
Longest transaction: 1.65
Shortest transaction: 0.23
Almost an order of magnitude faster than ATS. Update: that's for the throughput. The transaction rate is actually similar, which implies the page size might have changed between benchmarks.
Bombardier:
anarcat@cache-02:~$ ./go/bin/bombardier --duration=2m --latencies https://blog.torproject.org/ -c 100
Bombarding https://blog.torproject.org:443/ for 2m0s using 100 connection(s)
[=========================================================================] 2m0s
Done!
Statistics Avg Stdev Max
Reqs/sec 2116.74 506.01 5495.77
Latency 48.42ms 34.25ms 2.15s
Latency Distribution
50% 37.19ms
75% 50.44ms
90% 89.58ms
95% 109.59ms
99% 169.69ms
HTTP codes:
1xx - 0, 2xx - 247827, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 107.43MB/s
Almost maxes out the gigabit connection as well, but is only marginally faster (~3%?) than ATS.
It does not reach the theoretical gigabit maximum, which is apparently around 118MB/s without jumbo frames (and 123MB/s with).
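Those figures check out from first principles: with 1500-byte frames, each frame carries 1460 bytes of TCP payload but occupies 1538 bytes on the wire once the Ethernet header, FCS, preamble and inter-frame gap are counted. A quick back-of-the-envelope check:

```shell
# Goodput of gigabit Ethernet over TCP/IPv4, in MB/s.
# payload = MTU - 20 (IPv4 header) - 20 (TCP header)
# on-wire = MTU + 38 (Ethernet header, FCS, preamble, inter-frame gap)
awk 'BEGIN {
    for (mtu = 1500; mtu <= 9000; mtu += 7500) {
        payload = mtu - 40; wire = mtu + 38
        printf "MTU %d: %d MB/s\n", mtu, int(1e9/8 * payload/wire / 1e6)
    }
}'
```

This prints `MTU 1500: 118 MB/s` and `MTU 9000: 123 MB/s`, matching the figures above.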
Varnish
Pros:
- specifically built for caching
- very flexible
- grace mode can keep objects even after TTL expired (when backends go down)
- third most popular, after Cloudflare and ATS
Cons:
- no HTTPS support on frontend or backend in the free version, would require stunnel hacks
- configuration is compiled and a bit weird
- static content needs to be generated in the config file, or served by a sidecar
- no HTTP/2 support
Used by Fastly.
Fastly itself
We could just put Fastly in front of all this and shove the costs on there.
Pros:
- easy
- possibly free
Cons:
- might go over our quotas during large campaigns
- sending more of our visitors to Fastly, non-anonymously
Sources
Benchmarks:
- Bizety: Nginx vs Varnish vs Apache Traffic Server - High Level Comparison - "Each proxy server has strengths and weakness"
- ScaleScale: Nginx vs Varnish: which one is better? - nginx + tmpfs good alternative to varnish
- garron.me: Nginx + Varnish compared to Nginx - equivalent
- Uptime Made Easy: Nginx or Varnish Which is Faster? - equivalent
- kpayne.me: Apache Traffic Server as a Reverse Proxy - "According to blitz.io, Varnish and Traffic Server benchmark results are close. According to ab, Traffic Server is twice as fast as Varnish"
- University of Oslo: Performance Evaluation of the Apache Traffic Server and Varnish Reverse Proxies - "Varnish seems the more promising reverse proxy server"
- Loggly: Benchmarking 5 Popular Load Balancers: Nginx, HAProxy, Envoy, Traefik, and ALB
- SpinupWP: Page Caching: Varnish Vs Nginx FastCGI Cache 2018 Update - "Nginx FastCGI Cache is the clear winner when it comes to outright performance. It’s not only able to handle more requests per second, but also serve each request 55ms quicker on average."
Tutorials and documentation:
- Apache.org: Why Apache Traffic Server - upstream docs
- czerasz.com: Nginx Caching Tutorial - You Can Run Faster - tutorial
- Igor Cicimov: Apache Traffic Server as Caching Reverse Proxy - tutorial, "Apache TS presents a stable, fast and scalable caching proxy platform"
- Datanyze.com: Web Accelerators Market Share Report