If you connect to a hidden service that's listening on virtual port 5222, and send it a begin cell for port 80, it will send you back an end cell but leave the circuit up.
I actually thought the design was more defensive: that if you ever asked for a virtual port that wasn't assigned, then it would close the circuit on you, to prevent scanning to find out what ports are open.
But it turns out I never built it that way. We should fix it.
With thanks to Ivan Pustogarov for noticing.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Started poking around a bit with this one. Here is what I think is possible.
Send back "END_STREAM_REASON_CONNECTREFUSED" which does not fully tell if the port is open or not. It could be that the service might exists but misconfigured or not started.
Drop the circuit with no reason which seems like a firewall with a DROP policy. However, I think it's pretty easy to link that to a "close port" for HS.
Send back a mysterious reason with "END_STREAM_REASON_MISC" but I think it is also too easy to associate that reason with a closed port for HS.
Usually firewall have two policies either DROP or REJECT (that sends a TCP reset) so at best I think we can make port scanning harder but not completely stop it. I would advocate for 1) here since it's actually a true and valid reason if the port is not supported by the HS and the other side (client) can't really confirm if it's a problem of the remote service or HS port.
'1' doesn't make me very satisfied. It means that if there is a port that's open, you can keep asking and you'll find it. That sounds like the same situation as now.
'2' indeed doesn't hide whether the port worked, but it sure slows down scanning. Can we argue that it slows down scanning enough to make it basically useless on a large scale? (A downside is that if somebody does decide to scan anyway, they'll sure be putting a lot of pain on the network.)
Does '4', for a low number, basically approximate one of the earlier options? E.g. we'd have to also include configured but actually down services, or you could just ask for the same one k times in a row and if it hangs up then you know it was the 'defense'.
Are there arguments against '2' other than 'it's not a complete solution'?
Send back END_STREAM_REASON_DONE as if the application closed the connection, optionally after either a certain amount of time has passed or a certain amount of data has been received from the peer. Once this happens once, tag the circuit as "kind of sketch", and apply the same policy for all new streams regardless of if the destination port is actually valid or not.
It makes debugging harder, but slows down scanning as much as 2 (though a intelligent scanner would try common ports first), and it gives the bad guys more work to do post processing the results.
Not sure about the implications for "close the stream after a while" in this context, parts of it scare me but it may be something that's less scary if the parameters are set correctly.
It seems clear here that we can't stop a network scan to finally succeed whatever behavior so our best course of actions is to slow down and made it as hard as we can for the attacker to scan.
Option '5' scares me a bit in terms of added overhead. Seems like adding delay to the bad circuit opens the door to some DoS for which an attacker could just open 65534 circuits to the HS with the wrong port and those circuits would stay open for an unknown amount of time?... If that would be acceptable for some reasons, adding a random delay before sending back the reason will also slow down quite a bit the scanner :).
I would go for sending back END_STREAM_REASON_DONE (application closed the connection) and DROP the circuit after.
It seems clear here that we can't stop a network scan to finally succeed whatever behavior so our best course of actions is to slow down and made it as hard as we can for the attacker to scan.
Indeed.
Option '5' scares me a bit in terms of added overhead. Seems like adding delay to the bad circuit opens the door to some DoS for which an attacker could just open 65534 circuits to the HS with the wrong port and those circuits would stay open for an unknown amount of time?... If that would be acceptable for some reasons, adding a random delay before sending back the reason will also slow down quite a bit the scanner :).
Yeah, I suggested doing that on the assumption (possibly wrong) that once established, keeping circuits around has minimal overhead. Establishing them is not cheap (because crypto), but keeping them around doesn't involve work right?
Right the idea is to try to get them to waste as much time as possible by adding randomized delay before the reason (and allowing the attacker to send a few cells of payload, once few is reached will also trigger sending REASON_DONE).
The 2nd part of 5 involves not dropping the circuit. Since we know they're up to no good, and have a way to waste their time and feed back garbage data into the scanner, we let the attacker "open" as many streams as they want using the known "eeevill" circuit, only to delay then close the stream down. (So say, someone starts scanning a HS that only is running httpd on a standard port, when their scanner gets to 80, it gets the "fake" discard+close behavior.).
We could add a maximum lifetime to evil portscanning circuits (based on number of stream open attempts and time), but all of this may be more effort than it's worth.
I would go for sending back END_STREAM_REASON_DONE (application closed the connection) and DROP the circuit after.
If doing the full blown "pretend we are there even accepting payload and injecting delay" is too expensive, this seems like the best lightweight alternative.
After a discussion on IRC at the little-t tor meeting, here is the consensus. In a nutshell, reason done and kill the circuit.
< nickm> Ultimately, there is no solution to #13667. If a client can try to connect to a port, and if that client can differentiate success from failure, and the scanner knows everything that the client does, then ultimately the scanner can scan ports.< dgoulet> yes exactly so our best course of action is to make it harder as we can I guess< nickm> so, if we do END_REASON_DONE and drop, they have to build more circuits and do more introduction handshakes.< dgoulet> "2)" has the possible drawback of the HS having a lot of opened circ.< nickm> If we do "insert random delays and finally drop at some point", they have to open just as many circuits, maybe, and their programming job gets a little harder, but they can do multiple queries in parallel, so ultimately we're not slowing them down much< dgoulet> nickm: yeah the parallel scanning makes that solution a bit useless< nickm> I think that "drop and kill the circuit" is probably a reasonable thing to do, in terms of trade-off between how much it helps and how hard it is.< Yawning> ;_;< Yawning> yeah< dgoulet> yeah< Yawning> if people are more paranoid, they could use authenticated HSes or something< nickm> Yeah. For a real answer, I'd think that better access control is the answer.
And since I'm helping you get up to speed as a developer, some minor points (which maybe you and nickm/andrea should talk through and agree about how to handle):
+ Circuit is origin here thus+ * will be set automatically to REASON_NONE anyway. (ref: #13667) */+ circuit_mark_for_close(circ, END_CIRC_REASON_NONE);
But in circuit_mark_for_close_ I see
if (reason == END_CIRC_AT_ORIGIN) { if (!CIRCUIT_IS_ORIGIN(circ)) { log_warn(LD_BUG, "Specified 'at-origin' non-reason for ending circuit, " "but circuit was not at origin. (called %s:%d, purpose=%d)", file, line, circ->purpose); } reason = END_CIRC_REASON_NONE; }
Does that mean saying END_CIRC_AT_ORIGIN is the safer plan here? How come we don't use ND_CIRC_AT_ORIGIN more often? If we opt not to use it here, should we just get rid of it?
Is "(ref: legacy/trac#13667 (moved))" the way we want to point people at the bugtracker? I know in the past nickm has wanted to make sure that the code stands independently of the bugtracker, but I think it does in this case.
"Note that this does not mitigate port scanning" -- I would say it actually does mitigate it. It just doesn't completely solve it.
If you happen to reformat the comments and they ended up a few columns shorter, I wouldn't object. :)
The first two rows of your changes file want to right-shift two spaces, and s/immetiately/immediately/
Yes I did. Chutney network, confirmed with torsocks and logs. I should have mention it, good point.
And since I'm helping you get up to speed as a developer, some minor points (which maybe you and nickm/andrea should talk through and agree about how to handle):
{{{
Circuit is origin here thus
* will be set automatically to REASON_NONE anyway. (ref: #13667) */
But in circuit_mark_for_close_ I see
{{{
if (reason == END_CIRC_AT_ORIGIN) {
if (!CIRCUIT_IS_ORIGIN(circ)) {
log_warn(LD_BUG, "Specified 'at-origin' non-reason for ending circuit, "
"but circuit was not at origin. (called %s:%d, purpose=%d)",
file, line, circ->purpose);
}
reason = END_CIRC_REASON_NONE;
}
}}}
Does that mean saying END_CIRC_AT_ORIGIN is the safer plan here? How come we don't use ND_CIRC_AT_ORIGIN more often? If we opt not to use it here, should we just get rid of it?
Not sure it's actually a safer plan. I don't know why we don't use it more often but providing the real reason on why we are closing the circuit seems a better option instead of having it changed for us. IMO, this is the caller responsability to provide the right reason, not the callee that set it to something else for some conditions.
Is "(ref: legacy/trac#13667 (moved))" the way we want to point people at the bugtracker? I know in the past nickm has wanted to make sure that the code stands independently of the bugtracker, but I think it does in this case.
Clarified by nickm on IRC, I removed it :).
"Note that this does not mitigate port scanning" -- I would say it actually does mitigate it. It just doesn't completely solve it.
Fixed.
If you happen to reformat the comments and they ended up a few columns shorter, I wouldn't object. :)
Just for you, I setup my vim for the entire tor code base to use 76 char :D.
The first two rows of your changes file want to right-shift two spaces, and s/immetiately/immediately/
Hrm on second thought, connection_exit_begin_conn() specifies that it should return a negative reason if we want the caller to close the circuit. However, fun fact, END_CIRC_REASON_NONE == 0 and closing the circuit right in that function is actually against the "rule" ... Thus returning "END_CIRC_AT_ORIGIN" could be more appropriate since it's value is -1.
IMO, this function should close the circuit itself on error or, for some reason, if we don't want that, set explicitely an err. reason in a variable and return -1 if we want the circuit to be closed. The NONE reason is 0 and also a valid reason to actually close a circuit. I don't see any palces where we send back a NONE and want to close the circuit but it's error prone in the long run...
We could add a higher-value return code that gets sent as NONE? Like, if we return -260 (value chosen at random), that could mean "close and send END_CIRC_REASON_NONE." I agree this probably needs api rethinking...
That means you have two values in the ABI for the same return code but behaves differently which will probably end up confusing some people when adding code or reviewing it I think. And it gets crazier when at some point in time you end up to -260 for a legitimate reasons (or other code). I feel that this could simply be an other ticket for a discussion on that and at some point a fix.
However, the fix for this current issue for now, I feel should be closing the circuit right away with NONE (current patch in bug13667_025_v2) or returning END_CIRC_AT_ORIGIN for a negative value and let the caller close the circuit.
Ok for this to move forward, I went with the "return negative circuit reason" so it can be close by the caller which is what is documented for connection_exit_begin_conn()