Can't connect to literal IPv6 address containing double colon
When an application wants to use Tor's SOCKS port to connect to a known IPv6 address, it has a couple of options:
It can specify a 16-byte binary address using address type 4.
It can specify the address as an ASCII string using address type 3.
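To make the two encodings concrete, here is a minimal sketch of how a client might build the SOCKS5 CONNECT request in each case. The function name socks5_connect_request and its binary flag are made up for illustration; the byte layout follows RFC 1928, and none of this is taken from Tor or from any particular application.

```python
import ipaddress
import struct


def socks5_connect_request(host: str, port: int, binary: bool) -> bytes:
    """Build a SOCKS5 CONNECT request for an IPv6 destination (illustrative only)."""
    header = b"\x05\x01\x00"  # VER=5, CMD=CONNECT, RSV=0
    if binary:
        # Address type 4: the 16-byte IPv6 address in network byte order.
        addr = b"\x04" + ipaddress.IPv6Address(host).packed
    else:
        # Address type 3: one length byte, then the address as an ASCII string.
        name = host.encode("ascii")
        addr = b"\x03" + bytes([len(name)]) + name
    # The port always follows as a 2-byte big-endian integer.
    return header + addr + struct.pack("!H", port)
```

With address type 3 the proxy only ever sees a string, so whether brackets are present is entirely up to the application.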
If the address is specified as a string, Tor accepts IPv6 addresses either with or without brackets. For example, Tor will accept either "2a01:4f8:fff0:4f:266:37ff:fe2c:5d19" or "[2a01:4f8:fff0:4f:266:37ff:fe2c:5d19]".
However, if the address is abbreviated using double-colon notation, it only works if enclosed in brackets: "[2a00:1450:4001:800::200e]" works, but "2a00:1450:4001:800::200e" does not. On the other hand, the unabbreviated form "2a00:1450:4001:800:0:0:0:200e" does work.
The problem appears to be:
The destination is transmitted to the exit relay as a string of the form "<address>:<port>".
The exit relay tries to parse this string by calling the function tor_addr_port_split.
The string "2a00:1450:4001:800::200e:80" is a valid IPv6 literal, so tor_addr_port_split interprets it as an address with no port number.
The relay refuses to connect to an address with no port number.
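The ambiguity in the steps above can be reproduced outside Tor. The following is a rough sketch using Python's ipaddress module; naive_split is a hypothetical stand-in that mimics the behaviour described here, not a port of tor_addr_port_split.

```python
import ipaddress


def naive_split(s: str):
    """Split "address:port" the way described above (hypothetical helper)."""
    if s.startswith("["):
        # Bracketed form: everything up to "]" is the address.
        host, _, rest = s[1:].partition("]")
        port = int(rest[1:]) if rest.startswith(":") else None
        return host, port
    try:
        # If the whole string is a valid IPv6 literal, assume there is no port.
        ipaddress.IPv6Address(s)
        return s, None
    except ValueError:
        # Otherwise treat whatever follows the last colon as the port.
        host, sep, port = s.rpartition(":")
        return (host, int(port)) if sep else (s, None)


for target in (
    "2a00:1450:4001:800::200e:80",
    "2a00:1450:4001:800:0:0:0:200e:80",
    "[2a00:1450:4001:800::200e]:80",
):
    print(target, "->", naive_split(target))
```

The first string has only six groups plus "::", so it parses as a complete address and the port comes back as None. The second has nine groups, which is not a valid address, so the trailing "80" is recognised as the port. The bracketed form is unambiguous.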
Note that if the application uses the binary form (address type 4), this is internally converted into a string enclosed in brackets. However, it seems to be more common for applications to use the ASCII form, without brackets. For example, if you try to visit http://[2a00:1450:4001:800::200e]/ in Tor Browser, it will fail, whereas http://[2a01:4f8:fff0:4f:266:37ff:fe2c:5d19]/ succeeds.
So there are a few ways this could be fixed:
(a) applications could be changed to use either the binary form or wrap the address in brackets;
(b) the Tor proxy could automatically add brackets around IPv6 addresses;
(c) the exit relay could be smarter about parsing IPv6 addresses.
It seems to me that (b) would be the most sensible option, but it might be reasonable to do (c) as well.
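As a rough illustration of option (b), the proxy could normalise the host before building the string it sends to the exit. This is only a sketch of the idea in Python; bracket_ipv6 and begin_target are hypothetical names, and the real change would have to be made in Tor's C code.

```python
import ipaddress


def bracket_ipv6(host: str) -> str:
    """Wrap a bare IPv6 literal in brackets; leave everything else alone."""
    if host.startswith("[") and host.endswith("]"):
        return host  # already bracketed
    try:
        ipaddress.IPv6Address(host)
    except ValueError:
        return host  # hostname or IPv4 address: nothing to do
    return f"[{host}]"


def begin_target(host: str, port: int) -> str:
    """Build the "address:port" string with an unambiguous address part."""
    return f"{bracket_ipv6(host)}:{port}"
```

For example, begin_target("2a00:1450:4001:800::200e", 80) gives "[2a00:1450:4001:800::200e]:80", which the exit can split without guessing.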
In the long term, I think it'd be wise to deprecate the use of IPv6 addresses without brackets in RELAY_BEGIN, as well as any other places where tor_addr_port_split is used, because without brackets the port cannot be reliably told apart from the last group of the address.