Problem overview
Feb 06 02:47:39.469 [err] do_main_loop(): select failed: No buffer space available [WSAENOBUFS ] [10055]
If your Tor server logs "[WSAENOBUFS] [10055]" error messages like the one above, you are experiencing Bug #98. This is a well-known, and apparently common, bug when running Tor servers on non-server versions of Microsoft Windows: 98, ME, 2000, and XP.
The official Microsoft description for WSAENOBUFS is:
WSAENOBUFS
10055
No buffer space available.
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
WSAENOBUFS relates to a buffer that holds data before and after it traverses the TCP/IP stack. As far as we can tell, those who experience this problem share no common hardware or software platform.
Running a Tor server on a vanilla XP install does not (easily) trigger the problem. But it can be consistently reproduced if you also run TCP/IP-intensive applications such as P2P clients (BitTorrent, eDonkey, eMule, etc.).
The result is that the activity overloads the TCP/IP stack. Since network drivers share the same buffers, often the whole network on the computer ceases to work, and it requires a reboot to fix.
Things that are not the problem
This error is entirely unrelated to the WSAENOCONN error WinXP Home and Pro users commonly experience. The error messages are different: WSAENOCONN causes Event Log entries such as "EventID 4226: TCP/IP has reached the security limit imposed on the number of concurrent TCP connect attempts". TCPIP.SYS in XP is hardcoded to a limit of 10 half-open connections per second. A sufficiently high-bandwidth Tor exit server WILL experience this error, but it does not cause Tor to crash (though it does cause some outbound connections to fail, and eventually we should build some workarounds for this). SpeedGuide.net provides a [detailed explanation].
So what IS the problem?
We're not totally sure. But we have a theory.
First, some background. One of the ways Windows does networking with lots of connections at once is with an approach called "overlapped IO". Basically you hand it a socket, a length, and a buffer, and tell it to either read or write, and Windows will take it from there and let you know when it's done.
http://www.codeproject.com/internet/IOCP_Server_client.asp?msg=1187159
With every overlapped send or receive operation, it is possible that the data buffer submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that can be locked. When this limit is reached, the overlapped operations will fail with the WSAENOBUFS error.
But Tor doesn't use overlapped IO: it uses the select() system call to learn when sockets are available for reading or writing, and then uses non-blocking writes and reads to send and receive data.
So our theory is that when we send() and recv(), Windows copies the contents of the buffer into a kernel buffer. If we send or recv too much at once, Windows runs out of kernel buffer space.
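To make the select()-based model concrete, here is a minimal sketch of one pass of such a loop. It is written in Python for brevity and is not Tor's actual code (Tor is written in C); the helper name `pump_once` and the 4096-byte read size are assumptions made for illustration. The error check shows where a "No buffer space available" failure would surface in this style of loop.

```python
import errno
import select
import socket

WSAENOBUFS = 10055  # Winsock code quoted in the log line above


def pump_once(sock, outbuf, timeout=1.0):
    """Run one select() pass over a single non-blocking socket.

    Hypothetical helper for illustration. Returns a tuple of
    (bytes received in this pass, bytes still waiting to be sent).
    """
    # Only ask about writability when there is something to send.
    want_write = [sock] if outbuf else []
    try:
        readable, writable, _ = select.select([sock], want_write, [], timeout)
    except OSError as exc:
        # Assumption: buffer exhaustion is reported through errno/winerror.
        codes = {errno.ENOBUFS, WSAENOBUFS}
        if exc.errno in codes or getattr(exc, "winerror", None) == WSAENOBUFS:
            raise RuntimeError("select failed: no buffer space available") from exc
        raise

    received = b""
    if sock in readable:
        try:
            received = sock.recv(4096)  # non-blocking read
        except BlockingIOError:
            pass  # nothing to read after all; retry on the next pass
    if sock in writable:
        try:
            sent = sock.send(outbuf)  # non-blocking write, may be partial
            outbuf = outbuf[sent:]
        except BlockingIOError:
            pass  # kernel send buffer is full; retry on the next pass
    return received, outbuf
```

Each call to send() or recv() here hands data to, or takes data from, a kernel-side buffer, which is the copying step the theory above points at.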
Our current plan is to abandon select() on Windows in favor of overlapped IO. This involves three steps. Step one is to add overlapped IO support to libevent. (Libevent already has a notion of a buffer API, so we could extend that.) Step two is to change the way Tor calls OpenSSL, so it operates on local buffers rather than interacting with the network itself (presumably using recv and send). Step three is to change Tor's networking loop to use libevent's buffer API rather than the socket API. If you'd like to help with any of these steps, let us know!
http://cvs.sourceforge.net/viewcvs.py/levent/libevent/WIN32-Code/win32.c?view=auto
How to make it break less quickly
You can try increasing the priority of Tor, Privoxy, and Vidalia in Task Manager: press CTRL-ALT-DEL, go to the Processes tab, right-click each process, and change its priority to "Above normal". You can use Prio to make this automatic every time you start Tor.
You can also screw with the registry:
The following registry entries have been shown to mitigate the buffer issues with varying degrees of success. As always, if you do not understand the Windows Registry and RegEdit, do not attempt these modifications. Your mileage may vary.
http://web.ircsystems.net/codemastr/bufspace.html
To do this, go to Start, Run and type regedit. In the left pane, navigate to
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. Once
there, create the entry TcpNumConnections: right-click in the right pane,
choose New from the menu, and select DWORD Value. Give it the name
TcpNumConnections. Then right-click it, select Modify, and enter a value
of 800. Then restart your computer.
There are a few TCP-related registry entries that potentially manipulate
the internal buffer size available for data to be passed through the
TCP stack. Setting GlobalMaxTcpWindowSize and TcpWindowSize under
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
to 0xfaf00 (1027840) seemed to increase the time to failure when running
Tor and BitTorrent.
Configuring HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts="3"
also seemed to help the exit server last longer.
Setting it to "1" (window scaling only) is another option, since it avoids adding 12 bytes of timestamp data to every header.
However, Tor seems to have lots of odd packet problems on an exit server (as shown by Ethereal: lots of retransmits,
lost ACKs, etc.), and the "3" setting seemed to quiet these things down. (Only packet headers were captured during the tests, not actual data.)
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\SackOpts="1" is another helpful setting.
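If you would rather review all of these changes in one place than click through RegEdit, the values mentioned above can be collected into a .reg file. The following is a minimal Python sketch that only renders the file text; it does not touch the registry. The helper name `render_reg_file` is hypothetical, all five entries are assumed to be DWORD values, and 800 is assumed to be a decimal number. Inspect the output, and back up your registry, before importing anything.

```python
# Key and values are the ones listed in the text above.
TCPIP_PARAMS = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

TWEAKS = {
    "TcpNumConnections": 800,           # assumed decimal
    "GlobalMaxTcpWindowSize": 0xFAF00,  # 1027840
    "TcpWindowSize": 0xFAF00,           # 1027840
    "Tcp1323Opts": 3,
    "SackOpts": 1,
}


def render_reg_file(key, values):
    """Return .reg file text that sets each name under `key` to a DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        # .reg files write DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_reg_file(TCPIP_PARAMS, TWEAKS))
```

As with the manual steps, a restart is needed before the new values take effect.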
An experimental feature recently added to Tor that constrains the send and receive socket buffer sizes may also reduce or alleviate this problem. If your Tor version supports it, try the following option in your configuration:
ConstrainedSockets 1
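For readers wondering what constraining socket buffers means in practice, one common technique is to ask the kernel for smaller per-socket send and receive buffers with setsockopt. The Python sketch below illustrates that general technique only; whether Tor does exactly this internally is an assumption, and the helper name and the 8192-byte size are chosen purely for illustration.

```python
import socket


def constrain_socket_buffers(sock, size=8192):
    """Request smaller kernel send/receive buffers for one socket.

    Hypothetical helper. Returns the (send, receive) sizes the kernel
    reports afterwards, which may differ from the requested size because
    operating systems are free to round or adjust the value.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )
```

Smaller buffers per socket mean less kernel memory is tied up when many connections are open at once, at the cost of lower throughput per connection.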
Some more data points
It appears that a system with 384MB of RAM or more, a fresh install of Win XP Home, fully patched via Windows Update, and running nothing but a Tor exit server does not experience these problems. This is true for both the 0.1.0.16-stable and 0.1.1.12-alpha versions of Tor. The Tor configuration is a simple exit server with no bandwidth limits, burst restrictions, or hibernation.
We continue to debug this issue. Recent tests show that total available RAM at boot time correlates with the creation of the [WSAENOBUFS] error. The amount of memory available to the system was configured via the C:\boot.ini option of /MAXMEM=###. The results are as follows:
* At /MAXMEM=128, simply starting up the Tor server was enough to create a [WSAENOBUFS] error.
* At /MAXMEM=256, the Tor server did create a [WSAENOBUFS] error. Time varied from 2-5 hours.
* At /MAXMEM=384, the Tor server did not create a [WSAENOBUFS] error after 6 hours.
* At /MAXMEM=512, the Tor server did not create a [WSAENOBUFS] error after 6 hours. Further investigation is needed at this memory level.
* At /MAXMEM=1024, the Tor server did not create a [WSAENOBUFS] error after 48 hours.
We've learned that Windows does allocate large chunks of memory per socket on connect. See this graphic of [Pool Behavior] (http://msdn.microsoft.com).
Alternative solutions
Virtualization doesn't help solve the underlying problem, but perhaps helps build the installed base. For lateral thinkers, VMware Player (available at no cost) can be used by Windows users to run Tor on Linux. In particular, the Browser Appliance [here] might be a good starting point for a web client. There are many other [Appliances] that may also be easily modified to use Tor.