We're raising RLIMIT_NOFILE too much in set_max_file_descriptors()
To quote restrict.c:
We compute this by finding the largest number that we can use.
This is wrong: why would we ever need more than, say, 2^16 FDs? On a modern Linux box, $ ulimit -Hn can report something on the order of 10^9. So we bump rlim_cur to 10^9, and then the loop in process_unix_exec():
for (fd = STDERR_FILENO + 1; fd < max_fd; fd++) close(fd);
runs for 10^9 iterations. I hit this on my machine: the tor process got stuck in this CPU-heavy loop for several minutes (at first glance it looks like an infinite loop).
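For reference, here is a minimal standalone program (not part of tor, just an illustration) that prints the RLIMIT_NOFILE soft and hard limits. On the affected setups the hard limit it reports is on the order of 10^9, and that is exactly the value rlim_cur gets raised to:

#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
  struct rlimit rlim;
  /* Query the current soft (rlim_cur) and hard (rlim_max) FD limits. */
  if (getrlimit(RLIMIT_NOFILE, &rlim) != 0) {
    perror("getrlimit");
    return 1;
  }
  /* set_max_file_descriptors() raises rlim_cur all the way to rlim_max,
   * so a huge hard limit translates directly into a huge close() loop. */
  printf("soft: %llu  hard: %llu\n",
         (unsigned long long)rlim.rlim_cur,
         (unsigned long long)rlim.rlim_max);
  return 0;
}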
A comment in process_unix.c suggests that we probably don't need this loop anymore:
XXX: We should now be doing enough FD_CLOEXEC setting to make this needless.

But I think we should also avoid bumping RLIMIT_NOFILE above some constant. Here is a toy patch that caps RLIMIT_NOFILE at 65535:
diff --git a/src/lib/process/restrict.c b/src/lib/process/restrict.c
index 61ea664bc0..65e7b07640 100644
--- a/src/lib/process/restrict.c
+++ b/src/lib/process/restrict.c
@@ -161,6 +161,7 @@ tor_mlockall(void)
/** Number of extra file descriptors to keep in reserve beyond those that we
* tell Tor it's allowed to use. */
#define ULIMIT_BUFFER 32 /* keep 32 extra fd's beyond ConnLimit_ */
+#define UPPER_NOFILE 65535 /* Absolute maximum number of FDs we could ever use */
/** Learn the maximum allowed number of file descriptors, and tell the
* system we want to use up to that number. (Some systems have a low soft
@@ -237,7 +238,7 @@ set_max_file_descriptors(rlim_t limit, int *max_out)
* max fails at least we'll have a valid value of maximum sockets. */
*max_out = (int)rlim.rlim_cur - ULIMIT_BUFFER;
set_max_sockets(*max_out);
- rlim.rlim_cur = rlim.rlim_max;
+ rlim.rlim_cur = MAX(rlim.rlim_cur, MIN(rlim.rlim_max, (rlim_t) UPPER_NOFILE));
if (setrlimit(RLIMIT_NOFILE, &rlim) != 0) {
int couldnt_set = 1;
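With this change, a box where the soft limit is 1024 and the hard limit is 2^30 ends up with rlim_cur = 65535 instead of 2^30, so the close() loop in process_unix_exec() stays bounded by roughly 2^16 iterations.

If the loop really is needless because of FD_CLOEXEC, another option (independent of this patch, just a sketch of the idea; the function name below is made up, and it assumes Linux >= 5.9 with glibc >= 2.34 for close_range(2)) would be to stop iterating up to max_fd at all and fall back to the loop only on older kernels:

#define _GNU_SOURCE
#include <unistd.h>

static void
close_fds_above_stderr(int max_fd)
{
#ifdef __linux__
  /* One syscall closes every FD above stderr; the cost does not depend on
   * how high RLIMIT_NOFILE was raised. Fails (e.g. ENOSYS) on kernels < 5.9. */
  if (close_range(STDERR_FILENO + 1, ~0U, 0) == 0)
    return;
#endif
  /* Fallback: the existing O(max_fd) loop. */
  for (int fd = STDERR_FILENO + 1; fd < max_fd; fd++)
    close(fd);
}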