Skip to content
Snippets Groups Projects

Revisions to rpc-meta-draft

Merged Nick Mathewson requested to merge nickm/arti:rpc-draft-revision into main

This MR revises rpc-meta-draft for accuracy and up-to-dateness. It doesn't yet cover RPC-SOCKS integration; that's getting discussed in an arti comment till it matures.

Merge request reports

Pipeline #203250 passed with warnings

Pipeline passed with warnings for 7410d5cd on nickm:rpc-draft-revision

Approved by

Merged by Nick MathewsonNick Mathewson 6 months ago (Sep 11, 2024 5:29pm UTC)

Merge details

  • Changes merged into main with 464942e9.
  • Deleted the source branch.
  • Auto-merge enabled

Pipeline #203282 failed

Pipeline failed for 464942e9 on main

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
  • Ian Jackson
    Ian Jackson @Diziet started a thread on an outdated change in commit c41cf906
  • 673 675 > we always want a correct authentication handshake to complete
    674 676 > before we allow any requests to be handled,
    675 677 > even if the stream itself is such
    676 > that no authentication should be requires.
    678 > that no authentication should be required.
    677 679 > This helps prevent cross-protocol attacks in cases
    678 680 > where things are misconfigured.
    679 681
    682 > TODO-RPC: Make sure "close on any error" is implemented,
    683 > and test it.
    684 > It is essential for security.
    • Comment on lines +682 to +684

      Why is it essential for security? Why is this rule there at all? My initial feeling is that this rule is simply wrong, and there is no reason to close the connection when authentication fails. But I'm open to an explanation.

    • So, have a look at #1591 (closed).

      This is a defense-in-depth mechanism against framing attacks where an adversary can get some other protocol (say, HTTP from a browser) to open a connection to a TCP port where the RPC server is listening. Without this defense, Arti would gives a syntax error for each HTTP header, and then process the payload in the actual HTTP body.

      Early versions of the Tor control port could be configured to be vulnerable to this attack.

      Now, this attack is not sufficient on its own to bypass our security, if we have done our job: We should never allow a TCP connection to authenticate without proving knowledge of some secret or possession of some capability. Additionally, our mostly-unpredictable Object IDs make this attack harder to mount. But still, rejecting the connection early is IMO useful defense-in-depth against the possibility that we make a design error or enable a configuration error in the future.

      (One example of a possible configuration error: Suppose that we enabled HTTPS with client certificates and no further authentication needed. If a misguided user accidentally gave installed their client cert in their browser, they'd make themselves vulnerable to this attack maybe.)

      That said, I think it would be adequate to change the role so that it said that we close the connection on any pre-authentication syntax error. WDYT?

    • Oh. Thanks for the explanation.

      We definitely need to do something about this.

      That said, I think it would be adequate to change the role so that it said that we close the connection on any pre-authentication syntax error. WDYT?

      I think that is a better approach. There could be a good reason for sending commands that (sometimes) cause authentication errors, but I see no possible good reason for sending JSON syntax errors. (Yes, even considering the existence of possible JSON parsing disagreements.)

      I would be very happy if we said that any JSON parsing error would immediately terminate the connection. That would also defend against certain framing attacks, if an incautious client assembles a request by string concatenation of partially-trusted content.

    • Nick Mathewson changed this line in version 4 of the diff

      changed this line in version 4 of the diff

    • Changed the rule in 3c1747af. The new rule is "the server must close the connection whenever it gets a json syntax error," which is I think what you suggested. Also commented on #1591 (closed).

    • Yes. The new text looks good to me.

    • Please register or sign in to reply
  • Ian Jackson
    Ian Jackson @Diziet started a thread on commit 744ffdec
  • 731 731
    732 732 ### Cancellation
    733 733
    734 > TODO: take a request ID (as usual),
    735 > and the ID of the request-to-cancel as a parameter.
    736 >
    737 > (Using the 'id' as the subject of the request is too cute IMO,
    738 > even if we change the request's meaning to
    739 > "cancel every request with the same id as this request".)
    734 > At present (Sep 2024)
    735 > a cancellation mechanism is not implemented.
    740 736
    741 737 To try to cancel a request,
    742 there is a "cancel" method, taking arguments of the form:
    738 the RPC session object supports
    739 an `rpc:cancel` method, taking parameters of the form:
    • Comment on lines +738 to +739

      This should surely be a method on the connection, not on the session?

      You might authenticate more than once and get more than one session.

      (Or to put it another way: the session object is a capability for doing actual Stuff. The connection object is there to be the root object, which is always available, and necessarily a singleton per transport connection; it's that singleton that cancellation must be on, since commands don't necessarily know what their session is.)

    • This should surely be a method on the connection, not on the session?

      Indeed... otherwise there is no way to cancel requests that are not associated with a session (like auth:authenticate).

    • Hmm.

      (As an aside, I haven't implemented "authenticate more than once and get more than one session" — now that you mention it, I agree that it might be desirable to provide someday, although I hope we can agree that "authenticate only once" is okay behavior for now.)

      While I do agree that putting methods like "cancel" on the connection might make sense semantically, I don't know if I agree 100% with doing so. With my security hat on, I feel like it's a better idea to just put everything onto the session. Every time we put a method on the connection, we expose a piece of possible attack surface to an unauthenticated user, and we need to audit it harder for pre-authentication attacks.

      (This isn't just about cancel: ObjectId-manipulation methods likely also belong wherever we put "cancel", so there will be more than one piece of code to audit)

      As a middle ground (possibly satisfying nobody) we could say that the "cancel" method (and possibly others) are not available until authentication has been called? But that's still a bunch of attack surface to audit, and it's not exactly good capability-based design.

    • I do see your concern about pre-authentication attack surface, and that would make sense for most commands. But the cancel command is a piece of protocol framing, not a real method with actual functionality.

      I agree with @amjoseph's point that we might want to cancel an authentication request.

      haven't implemented "authenticate more than once and get more than one session"

      I failed to spot that, sorry. I think this is a bug.

    • But the cancel command is a piece of protocol framing, not a real method with actual functionality.

      Would you say the same thing about object-ID-related commands like "upgrade reference" and "downgrade handle" and "drop handle"? After all, object IDs are connection-local, not session-local.

      If you would not say the same thing about Object IDs, then I'm okay with this change.

      If you would say the same thing about object IDs, then I think I want to stick with my argument about attack surface.

      I failed to spot that, sorry. I think this is a bug.

      I'm okay opening a ticket for it, but I don't consider it must-implement. (I don't feel strongly whether we call it a "known deficiency" or a "unimplemented feature".)

    • Would you say the same thing about object-ID-related commands like "upgrade reference" and "downgrade handle" and "drop handle"? After all, object IDs are connection-local, not session-local.

      If you would not say the same thing about Object IDs, then I'm okay with this change.

      I think these are all methods on the object ID. Is there some reason why they couldn't or shouldn't be?

      In principle cancellation could be a method on the object that the original method was issued on but that seems to be inviting many canned worms to visit.

    • I think these are all methods on the object ID. Is there some reason why they couldn't or shouldn't be?

      I guess they could be, but this does raise a two implementation problems and one security qualm:

      1. Our dispatch system currently only believes that there are methods defined on objects; it doesn't have a notion of catch-all methods. That would be easy enough to fix, however.

      2. Our dispatch system currently only passes an Arc<dyn Object> to the object being invoked, and not the ObjectId by which that object was found. This would also be easy to change.

      3. If these were a catch-all methods that always existed on any object, they would be additional parts of the pre-authentication surface of the RPC system, which does kinda make me worried. (I can imagine a future where we "innocently" add a method to the Connection that creates a very safe non-session object, and "innocently" add a method that applies to all objects, then an attacker combines the two to make a pre-auth DoS.)

      That said, if we can come up with some defense-in-depth against 3 I'd feel better here.

    • I think these special methdods that do object namespace management can be implemented with ad-hoc code in tor-rpcbase, and do not need to go through the dispatch system.

    • I think these special methdods that do object namespace management can be implemented with ad-hoc code in tor-rpcbase, and do not need to go through the dispatch system.

      I agree with this, and I think it would solve my 1 and 2.

      With my 3 I think it would suffice to say that "we need to think about pre-authentication surface" as we build these methods, and remember this as we design and implement them.

      I think this solves the question here, and I'm happy moving "cancel" to the connection.

    • Updated cancel in d263cb25

    • With my 3 I think it would suffice to say that "we need to think about pre-authentication surface" as we build these methods, and remember this as we design and implement them.

      Right, I definitely agree. (If you find a good place to put it, you could write an imprecation in a comment somewhere in tor-rpcbase maybe.)

      I think this solves the question here, and I'm happy moving "cancel" to the connection.

      OK. great, thanks.

    • Please register or sign in to reply
  • Ian Jackson
  • Ian Jackson
  • I like most of this.

    Where I have qualms I've tried to explain why, but if more explanation is needed please say. As ever, I'm open to being convinced to change my mind.

  • amjoseph
  • 640 749 A successful response is the empty JSON object.
    641 750 If a successful response is sent,
    642 751 then the request was canceled,
    643 752 and an error was sent for the canceled request.
    • Maybe it would be easiest to have a fourth response type "cancelled"?

      If not, I think there ought to be a code for "this request was cancelled".

    • Cancellation is an error; allocating a code for it seems like a good idea. (I think we may have one, but I'll check and document it if we do.)

    • Yes, I think calling it an error is the right answer.

      If we have situations where the work has been partially done and we want to report that and the update mechanism isn't suitable, we can put additional stuff into the error payload.

    • Added a new code with cf0c68af

      We say that we "do not anticipate regularly extending" the list of codes, but this seems like an important enough thing not to be "regular".

    • Yes, absolutely. Thanks.

    • I don't think I can resolve threads, but as the person who started this one, I consider it resolved.

    • Please register or sign in to reply
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading