Tor fails badly when accept(2) returns EMFILE or ENFILE
If accept(2) in connection_handle_listener_read returns EMFILE or ENFILE, Tor logs a failure and returns to the event loop. The listening socket remains ready for reading, however, so Tor immediately tries to accept a connection again and fails again. This leads to tens of thousands of logged failures per second. Here is an excerpt from my syslog:
Feb 11 05:57:36 Tor[20415]: accept failed: Too many open files. Dropping incoming connection.
Feb 11 05:57:54 last message repeated 301536 times
Feb 11 05:57:54 Tor[20415]: Failing because we have 1765 connections already. Please raise your ulimit -n.
Feb 11 05:57:54 Tor[20415]: accept failed: Too many open files. Dropping incoming connection.
Feb 11 05:58:05 last message repeated 184158 times
Feb 11 05:58:05 Tor[20415]: Failing because we have 1765 connections already. Please raise your ulimit -n.
Feb 11 05:58:05 Tor[20415]: accept failed: Too many open files. Dropping incoming connection.
Feb 11 05:58:13 last message repeated 127556 times
Feb 11 05:58:13 Tor[20415]: Failing because we have 1765 connections already. Please raise your ulimit -n.
Feb 11 05:58:13 Tor[20415]: accept failed: Too many open files. Dropping incoming connection.
Feb 11 05:58:26 last message repeated 223556 times
Feb 11 05:58:26 Tor[20415]: Failing because we have 1765 connections already. Please raise your ulimit -n.
I don't know what the right thing to do here is, but spiking the CPU and spraying log messages is not a very graceful mode of failure. One way to mitigate the damage might be to close the listening socket, which I believe won't be reopened until a minute later. This is no worse for the Tor network than just wedging, and perhaps better, since prospective connectors would be refused rather than silently forgotten in a flurry of furious logging.
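To make the mitigation concrete, here is a rough sketch of what I mean, not a patch against Tor. The names accept_action, classify_accept_errno, and accept_or_pause are made up for illustration and do not exist in Tor. The idea is only to treat descriptor exhaustion differently from transient errors, and to close the listener so the socket stops signalling readiness.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical action codes for this sketch; these are not Tor's. */
enum accept_action {
  ACCEPT_RETRY_LATER,    /* transient error: return to the event loop */
  ACCEPT_PAUSE_LISTENER, /* out of descriptors: stop listening for a while */
  ACCEPT_FATAL           /* anything else: the listener itself is broken */
};

/* Decide what to do about a failed accept(2), given its errno. */
enum accept_action classify_accept_errno(int err)
{
  if (err == EMFILE || err == ENFILE)
    return ACCEPT_PAUSE_LISTENER;
  /* EAGAIN and EWOULDBLOCK may be the same value, so compare with ==
   * rather than using a switch with duplicate case labels. */
  if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR ||
      err == ECONNABORTED)
    return ACCEPT_RETRY_LATER;
  return ACCEPT_FATAL;
}

/* Try to accept one connection on *listener_fd.
 * Returns the new descriptor, or -1 on failure. On EMFILE or ENFILE the
 * listener is closed and *listener_fd is set to -1, so that it no longer
 * wakes the event loop; the caller is expected to reopen it later. */
int accept_or_pause(int *listener_fd)
{
  int fd = accept(*listener_fd, NULL, NULL);
  if (fd >= 0)
    return fd;
  if (classify_accept_errno(errno) == ACCEPT_PAUSE_LISTENER) {
    close(*listener_fd);
    *listener_fd = -1;
  }
  return -1;
}
```

With something like this, the failure would be logged once per pause rather than once per wakeup. Whatever already reopens closed listeners (which I believe happens about a minute later) would bring the port back.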
Also, it would be nice to document the number of file descriptors generally required by a Tor relay, or a formula for computing it. For example, is it proportional to the bandwidth and to the number of relays in the Tor network? Or to the bandwidth and to the number of users in the Tor network? This way, prospective operators of Tor relays would not need to repeatedly restart their relays as they test incremental bumps in the file descriptor ulimits, unless there is some way to bump them without restarting the relay (but I doubt whether there is).
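On bumping the limit without a restart: a process can raise its own soft RLIMIT_NOFILE up to the hard limit at run time with getrlimit(2) and setrlimit(2). Only raising the hard limit needs privileges. I have not checked what Tor does at startup, so the following is only a sketch of the mechanism, and raise_fd_soft_limit is a name I made up.

```c
#include <sys/resource.h>

/* Raise the soft limit on open files to the current hard limit.
 * On success returns 0 and stores the resulting soft limit in *out;
 * returns -1 if getrlimit(2) or setrlimit(2) fails.
 * Note: on some systems the hard limit may be RLIM_INFINITY and the
 * kernel may refuse it; this sketch does not handle that case. */
int raise_fd_soft_limit(rlim_t *out)
{
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
    return -1;
  if (rl.rlim_cur != rl.rlim_max) {
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
      return -1;
  }
  *out = rl.rlim_cur;
  return 0;
}
```

This only helps when the hard limit is already higher than the soft one. It does not answer the question of how many descriptors a relay actually needs.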
(Apologies if this is duplicated: I hit !^A while editing this, in order to move to the beginning of the line, but the obnoxious !@#!^%&%!^& web form [and my obnoxiously colluding web browser] interpreted it to mean something else for which I quickly hit the stop button. I don't know what hitting !^A actually did.)
[Automatically added by flyspray2trac: Operating System: All]