Make nf_conntrack changes persistent
While investigating the loss of users from Iran and the increase in users from Russia in the second half of February 2023, which affected only snowflake-02, I discovered that the increased nf_conntrack table size from #40239 (closed) has not been in effect since the bridges were rebooted on 2023-02-16 for #40253 (closed). Both bridges are currently running with the default limits:
root@snowflake-01:~# date -u --iso=sec
2023-03-04T07:13:32+00:00
root@snowflake-01:~# cat /proc/sys/net/netfilter/nf_conntrack_{count,max,buckets}
186671
262144
65536
root@snowflake-02:~# date -u --iso=sec
2023-03-04T07:36:45+00:00
root@snowflake-02:~# cat /proc/sys/net/netfilter/nf_conntrack_{count,max,buckets}
21155
262144
65536
The default settings were supposed to be overridden by /etc/sysctl.d/nf_conntrack.conf, which is loaded during boot by sysctl:
root@snowflake-01:~# cat /etc/sysctl.d/nf_conntrack.conf
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_buckets = 524288
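The comparison between the configured and the live values can also be scripted. The following is only a sketch: the function name `check_sysctl_conf` and the `SYSCTL_CONF` and `PROC_SYS` variables are made up for illustration, and the parsing assumes simple `key = value` lines like the ones in the file above.

```shell
#!/bin/sh
# Compare each "key = value" line of a sysctl.d file with the live value.
# SYSCTL_CONF and PROC_SYS are hypothetical knobs so that the check can
# also be pointed at sample files instead of the real system.
SYSCTL_CONF="${SYSCTL_CONF:-/etc/sysctl.d/nf_conntrack.conf}"
PROC_SYS="${PROC_SYS:-/proc/sys}"

check_sysctl_conf() {
    # $1 = sysctl.d file, $2 = root of the sysctl tree (normally /proc/sys)
    # Drop comments and blank lines, trim whitespace, normalize " = " to "=".
    sed -e '/^[[:space:]]*[#;]/d' -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e 's/[[:space:]]*=[[:space:]]*/=/' "$1" |
    while IFS='=' read -r key want; do
        # net.netfilter.nf_conntrack_max -> net/netfilter/nf_conntrack_max
        path="$2/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            echo "MISSING $key (module not loaded?)"
        else
            have=$(cat "$path")
            if [ "$have" = "$want" ]; then
                echo "OK $key = $have"
            else
                echo "MISMATCH $key: configured $want, live $have"
            fi
        fi
    done
}

# Only run against the real file if it exists.
if [ -r "$SYSCTL_CONF" ]; then
    check_sysctl_conf "$SYSCTL_CONF" "$PROC_SYS"
fi
```

With the values shown above, both keys would be reported as a mismatch, since the live values are 262144 and 65536 while the file asks for 524288.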
I think the problem is that the sysctl settings are applied before the nf_conntrack module is first loaded, so they have no effect, as the sysctl.d(5) man page cautions:
Many sysctl parameters only become available when certain kernel modules are loaded. Modules are usually loaded on demand, e.g. when certain hardware is plugged in or network brought up. This means that systemd-sysctl.service(8) which runs during early boot will not configure such parameters if they become available after it has run. To set such parameters, it is recommended to add an udev(7) rule to set those parameters when they become available. Alternatively, a slightly simpler and less efficient option is to add the module to modules-load.d(5), causing it to be loaded statically before sysctl settings are applied (see example below).
The workaround, as the man page suggests, is to add nf_conntrack to /etc/modules so that it is loaded earlier in the boot sequence, or to add a udev rule that reloads the net.netfilter sysctl settings when nf_conntrack is loaded.
- https://github.com/systemd/systemd/issues/1113#issuecomment-138051408
- https://serverfault.com/a/676721
- https://github.com/coreos/bugs/issues/785
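For reference, here is a rough sketch of what the two workarounds could look like as files. Nothing here has been deployed on the bridges. The file names `nf_conntrack.conf` and `50-nf_conntrack.rules` are arbitrary choices, the `ROOT` variable and the `write_workarounds` function are made up so that the files land in a scratch directory rather than the live `/etc`, and the `systemd-sysctl` path is an assumption that differs between distributions. The udev rule is adapted from the br_netfilter example in the sysctl.d(5) man page.

```shell
#!/bin/sh
# Sketch: write both workaround files under a root prefix.
# ROOT defaults to a fresh scratch directory so the live system is untouched;
# review the generated files before copying anything to the real /etc.
ROOT="${ROOT:-$(mktemp -d)}"

write_workarounds() {
    root=$1
    mkdir -p "$root/etc/modules-load.d" "$root/etc/udev/rules.d"

    # Option 1: load nf_conntrack statically at boot, before sysctl
    # settings are applied (the modules-load.d(5) route).
    echo nf_conntrack > "$root/etc/modules-load.d/nf_conntrack.conf"

    # Option 2: re-apply the net.netfilter.* settings whenever the module
    # is loaded. Assumption: systemd-sysctl lives in /lib/systemd/.
    cat > "$root/etc/udev/rules.d/50-nf_conntrack.rules" <<'EOF'
ACTION=="add", SUBSYSTEM=="module", KERNEL=="nf_conntrack", RUN+="/lib/systemd/systemd-sysctl --prefix=/net/netfilter"
EOF
}

write_workarounds "$ROOT"
echo "wrote workaround files under $ROOT"
```

Only one of the two options should be needed. The first is simpler; the second also covers the case where the module is unloaded and loaded again after boot.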
I've restarted the conntrack.sh script to track nf_conntrack_count over time. I will let it run for a couple of days to get a baseline, then I will increase the nf_conntrack limits again to see if it has an effect on the number of users. After that, I will try one of the above workarounds to make the change actually persistent this time.
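For anyone who wants to collect the same kind of baseline elsewhere, a logger along these lines would probably be enough. This is not the actual conntrack.sh, just a hypothetical sketch: the `sample` function, the `CT_DIR` and `INTERVAL` variables, the 60-second default, and the output format are all assumptions.

```shell
#!/bin/sh
# Hypothetical logger, not the actual conntrack.sh: each sample is one line
# with a UTC timestamp followed by the count, max, and buckets values.
CT_DIR="${CT_DIR:-/proc/sys/net/netfilter}"
INTERVAL="${INTERVAL:-60}"   # seconds between samples (assumed value)

sample() {
    # $1 = directory containing the nf_conntrack_* files
    printf '%s %s %s %s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(cat "$1/nf_conntrack_count")" \
        "$(cat "$1/nf_conntrack_max")" \
        "$(cat "$1/nf_conntrack_buckets")"
}

# Loop only when invoked with the argument "run", so that the function
# can also be sourced and tried on its own.
if [ "${1:-}" = "run" ]; then
    while :; do
        sample "$CT_DIR"
        sleep "$INTERVAL"
    done
fi
```

Logging max and buckets alongside count makes it visible in the log itself when the limits silently revert, for example after a reboot.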
/cc @linus