We're raising RLIMIT_NOFILE too much in set_max_file_descriptors()
To quote restrict.c:
We compute this by finding the largest number that we can use.
This is wrong: why would we ever need more than, say, 2^16 FDs? On a modern Linux box, $ ulimit -Hn can report something on the order of 10^9. So we bump rlim_cur to 10^9, and then the loop in process_unix_exec():
for (fd = STDERR_FILENO + 1; fd < max_fd; fd++) close(fd);
runs for 10^9 iterations. I hit this on my machine: the tor process got stuck in this CPU-heavy loop for several minutes (at first glance it looks like an infinite loop).
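For reference, here is a minimal standalone program (not part of tor, just an illustration) that prints the RLIMIT_NOFILE soft and hard limits. On the affected setups the hard limit it reports is on the order of 10^9, and that is exactly the value rlim_cur gets raised to:

#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
  struct rlimit rlim;
  /* Query the current soft (rlim_cur) and hard (rlim_max) FD limits. */
  if (getrlimit(RLIMIT_NOFILE, &rlim) != 0) {
    perror("getrlimit");
    return 1;
  }
  /* set_max_file_descriptors() raises rlim_cur all the way to rlim_max,
   * so a huge hard limit translates directly into a huge close() loop. */
  printf("soft: %llu  hard: %llu\n",
         (unsigned long long)rlim.rlim_cur,
         (unsigned long long)rlim.rlim_max);
  return 0;
}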
A comment in process_unix.c suggests that we probably don't need this loop anymore:
XXX: We should now be doing enough FD_CLOEXEC setting to make this needless.

But I think we should also avoid bumping RLIMIT_NOFILE above some constant. Here is a toy patch that caps RLIMIT_NOFILE at 65535:
diff --git a/src/lib/process/restrict.c b/src/lib/process/restrict.c
index 61ea664bc0..65e7b07640 100644
--- a/src/lib/process/restrict.c
+++ b/src/lib/process/restrict.c
@@ -161,6 +161,7 @@ tor_mlockall(void)
/** Number of extra file descriptors to keep in reserve beyond those that we
* tell Tor it's allowed to use. */
#define ULIMIT_BUFFER 32 /* keep 32 extra fd's beyond ConnLimit_ */
+#define UPPER_NOFILE 65535 /* Absolute maximum number of FDs we could ever use */
/** Learn the maximum allowed number of file descriptors, and tell the
* system we want to use up to that number. (Some systems have a low soft
@@ -237,7 +238,7 @@ set_max_file_descriptors(rlim_t limit, int *max_out)
* max fails at least we'll have a valid value of maximum sockets. */
*max_out = (int)rlim.rlim_cur - ULIMIT_BUFFER;
set_max_sockets(*max_out);
- rlim.rlim_cur = rlim.rlim_max;
+ rlim.rlim_cur = MAX(rlim.rlim_cur, MIN(rlim.rlim_max, (rlim_t) UPPER_NOFILE));
if (setrlimit(RLIMIT_NOFILE, &rlim) != 0) {
int couldnt_set = 1;
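With this change, a box where the soft limit is 1024 and the hard limit is 2^30 ends up with rlim_cur = 65535 instead of 2^30, so the close() loop in process_unix_exec() stays bounded by roughly 2^16 iterations.

If the loop really is needless because of FD_CLOEXEC, another option (independent of this patch, just a sketch of the idea; the function name below is made up, and it assumes Linux >= 5.9 with glibc >= 2.34 for close_range(2)) would be to stop iterating up to max_fd at all and fall back to the loop only on older kernels:

#define _GNU_SOURCE
#include <unistd.h>

static void
close_fds_above_stderr(int max_fd)
{
#ifdef __linux__
  /* One syscall closes every FD above stderr; the cost does not depend on
   * how high RLIMIT_NOFILE was raised. Fails (e.g. ENOSYS) on kernels < 5.9. */
  if (close_range(STDERR_FILENO + 1, ~0U, 0) == 0)
    return;
#endif
  /* Fallback: the existing O(max_fd) loop. */
  for (int fd = STDERR_FILENO + 1; fd < max_fd; fd++)
    close(fd);
}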