## Non-Goals

* global CDN for users outside of TPO
* geoDNS

### Benchmarks

Will require a test VM (or two?) to hit the caches.

* [wrk](https://github.com/wg/wrk/) - multithreaded, epoll, Lua scriptable, no HTTPS, only in
Debian unstable

### Common procedure

1. punch a hole in the firewall to allow cache2 to access cache1

## Cost

Somewhere between 11EUR and 100EUR/mth for bandwidth and hardware.

We're apparently getting around 2.2M "page views" per month at
Pantheon. That is about 1 hit per second and 12 terabytes per month,
36Mbit/s on average:

    $ qalc
    > 2 200 000 ∕ (30d) to hertz
    2200000 / (30 * day) = approx. 0.84876543 Hz
    > 2 200 000 * 5Mibyte
    2200000 * (5 * mebibyte) = 11.534336 terabytes
Hetzner charges 1EUR/TB/month over our 1TB quota, so bandwidth would
cost 11EUR/month on average. If costs become prohibitive, we could
switch to a Hetzner VM which includes 20TB of traffic per month at costs
ranging from 3EUR/mth to 30EUR/mth depending on the VPS size (between
1 vCPU, 2GB ram, 20GB SSD and 8vCPU, 32GB ram and 240GB SSD).
Dedicated servers start at 34EUR/mth (`EX42`, 64GB ram, 2x4TB HDD) for
unlimited gigabit.
## Alternatives considered
Four alternatives were seriously considered:
* Apache Traffic Server
* Nginx proxying + caching
* Varnish + stunnel
* Fastly
Other alternatives were not:
* [Apache HTTPD caching](https://httpd.apache.org/docs/2.4/caching.html) - performance expected to be sub-par
* [Envoy][] - [not designed for caching](https://github.com/envoyproxy/envoy/issues/868), [external cache support
planned in 2019](https://blog.getambassador.io/envoy-proxy-in-2019-security-caching-wasm-http-3-and-more-e5ba82da0197?gi=82c1a78157b8)
* [HAproxy](https://www.haproxy.com/) - [not designed to cache large objects](https://www.haproxy.com/documentation/aloha/9-5/traffic-management/lb-layer7/caching-small-objects/)
* [Ledge](https://github.com/ledgetech/ledge) - caching extension to Nginx with ESI, Redis, and cache
purge support, not packaged in Debian
* [Nuster](https://github.com/jiangwenyuan/nuster) - new project, not packaged in Debian (based on
HAproxy), performance [comparable with nginx and varnish](https://github.com/jiangwenyuan/nuster/wiki/Web-cache-server-performance-benchmark:-nuster-vs-nginx-vs-varnish-vs-squid#results)
according to upstream, albeit with impressive improvements
* [Polipo](https://en.wikipedia.org/wiki/Polipo) - not designed for production use
* [Squid](http://www.squid-cache.org/) - not designed as a reverse proxy
* [Traefik](https://traefik.io/) - [not designed for caching](https://github.com/containous/traefik/issues/878)
[Envoy]: https://www.envoyproxy.io/
### Apache Traffic Server
#### Summary of online reviews
Pros:
* HTTPS
* HTTP/2
* industry leader (behind Cloudflare)
* out of the box clustering support
Cons:
* load balancing is an experimental plugin (at least in 2016)
* no static file serving? or slower?
* no commercial support
Used by Yahoo, Apple and Comcast.
#### First impressions
It might be because it supports doing HTTP/2 requests and, indeed, the
`Throughput` drops down to `14MB/s` when we use the `--http1` flag,
            # allow serving stale content on error, timeout, or refresh
            proxy_cache_use_stale error timeout updating;
            # allow only first request through backend
            proxy_cache_lock on;
            # add header
            add_header X-Cache-Status $upstream_cache_status;
        }
    }

... and reload nginx.
I tested that logged-in users bypass the cache and things generally
work well.
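The bypass rule itself isn't shown in the excerpt above; a sketch of the usual approach (names and placement are assumptions: the `map` goes at the `http` level, the `proxy_*` directives next to the cache settings) is to skip the cache whenever a session cookie is present:

```nginx
# Hypothetical sketch: Drupal names its session cookies SESS<hash>, so
# any cookie matching SESS marks the request as uncacheable.
map $http_cookie $skip_cache {
    default 0;
    ~SESS   1;
}

# serve logged-in users straight from the backend ...
proxy_cache_bypass $skip_cache;
# ... and never store their responses in the cache
proxy_no_cache     $skip_cache;
```

With both directives set, a logged-in user neither reads from nor writes to the cache, so private pages can't leak to anonymous visitors.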
A key problem with Nginx is getting decent statistics out. The
[upstream nginx exporter](https://github.com/nginxinc/nginx-prometheus-exporter) supports only (basically) hits per second,
through the [stub status module](http://nginx.org/en/docs/http/ngx_http_stub_status_module.html), a very limited module shipped with
core Nginx. The commercial version, Nginx Plus, supports a [more
extensive API](https://nginx.org/en/docs/http/ngx_http_api_module.html#api) which includes the hit rate, but that's not an
option for us.
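For context, enabling the stub status module is a tiny location block (the path is arbitrary):

```nginx
# Minimal stub_status endpoint for the Prometheus exporter to scrape;
# it only exposes connection and request counters, no cache hit rate.
location = /stub_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
}
```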
There are three solutions to work around this problem:
* create our own metrics using the [Nginx Lua Prometheus module](https://github.com/knyar/nginx-lua-prometheus):
this can have performance impacts and involves a custom
configuration
* write and parse log files, that's the way the [munin plugin](https://github.com/munin-monitoring/contrib/blob/master/plugins/nginx/nginx-cache-hit-rate)
works - this could possibly be fed *directly* into [mtail](https://github.com/google/mtail) to
avoid storing logs on disk but still get the data (include
[`$upstream_cache_status`](http://nginx.org/en/docs/http/ngx_http_upstream_module.html#var_upstream_cache_status) in the logs)
* use a third-party module like [vts](https://github.com/vozlt/nginx-module-vts) or [sts](https://github.com/vozlt/nginx-module-sts) and the
[exporter](https://github.com/hnlq715/nginx-vts-exporter) to expose those metrics - the vts module doesn't seem
to be very well maintained (no release since 2018) and it's unclear
if this will work for our use case
Here's an example of how to do the mtail hack. First tell nginx to
write to syslog, which acts as a buffer so that log parsing doesn't
slow down request processing. Excerpt from the [nginx.conf snippet](https://git.autistici.org/ai3/float/blob/master/roles/nginx/templates/config/nginx.conf#L34):

    # Log response times so that we can compute latency histograms
    # (using mtail). Works around the lack of Prometheus
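The linked snippet is not reproduced in full here; the relevant directives would look roughly like this (the format fields are illustrative, only `$request_time` and `$upstream_cache_status` matter for the metrics):

```nginx
# Ship access logs to the local syslog socket instead of a file, so
# mtail can consume them without log files accumulating on disk.
log_format metrics '$remote_addr [$time_local] "$request" $status '
                   '$body_bytes_sent $request_time $upstream_cache_status';
access_log syslog:server=unix:/dev/log,tag=nginx,severity=info metrics;
```

mtail then tails the syslog stream and exports counters keyed on the cache status field, which gives a hit rate without Nginx Plus.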
* [perusio@github.com: Nginx configuration for running Drupal](https://github.com/perusio/drupal-with-nginx) -
interesting [snippet](https://github.com/perusio/drupal-with-nginx/blob/D7/apps/drupal/map_cache.conf) for cookies handling, not required
* [NGINX: Maximizing Drupal 8 Performance with NGINX, Part 2: Caching and Load Balancing](https://www.nginx.com/blog/maximizing-drupal-8-performance-nginx-part-ii-caching-load-balancing/)
#### Benchmarks

ab:

    root@cache-02:~# ab -c 100 -n 1000 https://blog.torproject.org/