First part is to review and accept the specification for a new command in the control-spec.
See branch ticket14847_01 in https://git.torproject.org/user/dgoulet/torspec.git for the patch adding the HSFETCH command.
Trac:
Status: new to needs_review
Replying to nickm:
Suggestions:
- the server should be specified with digest only; nicknames are deprecated.
- I'd recommend server=HEXDIGEST rather than just HEXDIGEST; that way, other extensions are easier in the future.
I actually used "Fingerprint" which is already defined in the spec for the "server=" value.
Fixed both in branch ticket14847_02.
Open question: should this command also trigger an HSDESC event? I think not, because this command should be "self contained".
Replying to nickm:
I think maybe it should; and probably it should be nonblocking. (Current designs in control world are that commands which might not finish immediately, and which cause Tor to hit the network, need to not block for the thing to finish.)
This is not intended to block, since the HSDir fetch is asynchronous anyway. Does that mean the "250-" replies should probably be replaced by "650-" ones?
Replying to arma:
I haven't looked at the branch yet, but I think a controller command that initiates the fetch, where you listen for HSDESC events if you want to see the answer, is a totally reasonable design.
That means no descriptor content dump, but maybe that's fine considering that you can ask for the fetch results to be cached and then use #14845 to get the content. Sounds much simpler!
Hi David, nice spec! I'm looking forward to finally having a reason to add HS Descriptor parsing support to Stem.
"HSFETCH" SP HSAddress *(SP "server=" Server) (SP "cache=" Cache) CRLF
Please use square brackets. That's what we usually do for optional keyword arguments...
Also, let's use all caps for 'SERVER=' and 'CACHE='.
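Folding these suggestions together (uppercase keyword, optional repeated server argument, server given by digest as nickm asked), a controller would format the command roughly as below. This is a minimal Python sketch under those assumptions; the regular expressions and the helper name are hypothetical, and the authoritative grammar is whatever lands in control-spec.

```python
import re
from typing import Sequence

# Hypothetical validation patterns: a v2 onion address without ".onion",
# and a relay fingerprint as 40 hex digits with an optional leading "$".
ONION_ADDRESS = re.compile(r"[a-z2-7]{16}")
FINGERPRINT = re.compile(r"\$?[0-9A-Fa-f]{40}")


def build_hsfetch(address: str, servers: Sequence[str] = ()) -> str:
    """Return an HSFETCH command line, terminated by CRLF."""
    if not ONION_ADDRESS.fullmatch(address):
        raise ValueError(f"not a v2 onion address: {address!r}")
    parts = ["HSFETCH", address]
    for server in servers:
        # Servers are identified by digest only; nicknames are rejected.
        if not FINGERPRINT.fullmatch(server):
            raise ValueError(f"not a relay fingerprint: {server!r}")
        parts.append(f"SERVER={server}")  # uppercase keyword, per atagar
    return " ".join(parts) + "\r\n"
```

With no SERVER= arguments the line is just "HSFETCH <address>", leaving tor to pick the HSDirs itself.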
If one or more Server are given, they are used instead.
What happens when you specify multiple? Does it pick among them randomly?
If Cache is specified, the value "yes" means that the result will be cached on the client.
Cached for how long? Permanently? Or do HS descriptors have a valid-until date? Can the cache be cleared?
The HS_DESC event should be used to get the results of the fetches.
How long does it take to retrieve a hidden service descriptor in practice? This is a lot clunkier for controllers. How about a 'BLOCKING=n' call for "I'm willing to wait up to n ms to get this descriptor"?
Replying to dgoulet:
Ok, with all the comments above, here is a much simpler version.
See branch ticket14847_03.
What's the reasoning for the cache=yes or cache=no part? That is, why not just let rend_cache_store_v2_desc_as_client() look at the hsdesc you get back and decide whether to keep it based on whatever rules it uses now ("I don't have a newer one", etc.)?
I think to do the async thing here we'll want to extend the HSDESC event to just tell you the descriptor right then. Otherwise there's a three step process ("initiate the launch", "notice the event", "getinfo the response"), and if you initiate a bunch of launches, and then get a bunch of events, you'll only be able to getinfo one of them, and you might not even know which one it was, etc.
(Whether you extend the HSDESC event always, or add a separate HSDESC_AND_DUMP event, or what, is a matter of taste that I will leave to you and Nick if you like this approach.)
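The point here is correlation: if each event carries both the address and the descriptor, a controller with several fetches in flight can match every result to its request without a follow-up GETINFO. A minimal Python sketch of that bookkeeping; the class and method names are hypothetical and not part of tor or Stem.

```python
from typing import Dict, Optional


class FetchTracker:
    """Hypothetical controller-side bookkeeping for outstanding HSFETCH calls."""

    def __init__(self) -> None:
        # address -> descriptor text, or None while the fetch is in flight
        self.pending: Dict[str, Optional[str]] = {}

    def launched(self, address: str) -> None:
        self.pending.setdefault(address, None)

    def on_descriptor_event(self, address: str, descriptor: str) -> bool:
        """Handle an event that carries the descriptor body itself."""
        if address not in self.pending:
            return False  # a fetch this controller did not launch
        self.pending[address] = descriptor
        return True

    def result(self, address: str) -> Optional[str]:
        return self.pending.get(address)
```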
Now that I think about it, there may also be some adventure here with all of the implicit "oh a failure just happened, that means I should launch this other action" logic in hsdesc fetches. Maybe this is a good time to clean up some of that logic, or maybe it will turn out to be easier than we think to work with it. Or I guess option three is that this will just be no fun. :)
Replying to atagar:
{{{ If one or more Server are given, they are used instead. }}}
What happens when you specify multiple? Does it pick among them randomly?
I kind of thought that it would cause Tor to initiate multiple fetches, one from each.
But, good question. I wonder if that feature is valuable enough for the complexity, compared to just making the controller send you one HSFETCH per fetch you want it to launch.
Or maybe David did indeed mean to choose just one.
{{{ The HS_DESC event should be used to get the results of the fetches. }}}
How long does it take to retrieve a hidden service descriptor in practice? This is a lot clunkier for controllers. How about a 'BLOCKING=n' call for "I'm willing to wait up to n ms to get this descriptor"?
That's exactly what we've been heading away from with the async approach. That said, David, it would indeed be nice to give the controller writers some guess about how long they might need to wait until they see their HSDESC received or failed. I think the answer is "it's like fetching a thing over the Tor network -- typically pretty fast, but sometimes 5 to even 60 seconds."
That's exactly what we've been heading away from with the async approach.
Present world for descriptor fetching is...
- Controllers can make a simple, synchronous request to read cached descriptors.
- Scripts can contact a dirauth's DirPort to actively fetch the fresh thing.
This is nice. It means scripts can piggyback on a cache or download what they need, and in either case it's simple and synchronous.
If we go with an asynchronous approach the first method people will want is a simple blocking 'I want a hidden service descriptor, give it to me' method. If it doesn't live in tor it'll be in stem and that's fine. We already do something similar with creating circuits - tor doesn't provide a blocking method so stem adds a listener and waits for the event indicating that it's done...
https://gitweb.torproject.org/stem.git/tree/stem/control.py#n2705
Just makes for a more interesting dance on my end.
Please be very, very careful though that a HSDESC is always emitted, 1:1, with a call of this method. If there's any use case where the controller doesn't get either a success or failure message it'll be left hanging indefinitely.
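A minimal sketch of the listener-and-wait pattern described above, in plain Python rather than Stem's actual API. Assumptions (all hypothetical): `send` writes a raw command to the control port, parsed HS_DESC events arrive on a queue as `(action, address)` pairs, and RECEIVED and FAILED are the terminal actions. The 60-second default only mirrors arma's upper estimate.

```python
import queue
import time
from typing import Callable, Tuple

# Hypothetical event shape: (action, address), e.g. ("RECEIVED", "facebookcorewwwi").
Event = Tuple[str, str]


def fetch_blocking(send: Callable[[str], None], events: "queue.Queue[Event]",
                   address: str, timeout: float = 60.0) -> str:
    """Send HSFETCH, then block until a terminal HS_DESC event for address."""
    send(f"HSFETCH {address}\r\n")
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no HS_DESC outcome for {address} in {timeout}s")
        try:
            action, event_address = events.get(timeout=remaining)
        except queue.Empty:
            continue  # the deadline check above raises TimeoutError
        # Ignore other addresses and non-terminal actions such as REQUESTED.
        if event_address == address and action in ("RECEIVED", "FAILED"):
            return action
```

The deadline is why the 1:1 guarantee matters: if tor never emits a terminal event, this loop can only end by raising TimeoutError.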
Replying to arma:
Replying to dgoulet:
Ok, with all the comments above, here is a much simpler version.
See branch ticket14847_03.
What's the reasoning for the cache=yes or cache=no part? That is, why not just let rend_cache_store_v2_desc_as_client() look at the hsdesc you get back and decide whether to keep it based on whatever rules it uses now ("I don't have a newer one", etc.)?
The original idea was to give a choice to the user to keep the fetched descriptor or not. However, if we go with a new HSDESC_* event to dump the content when it arrives, the "cache=" part could be removed and by default keeps the latest.
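Dropping "cache=" leaves the keep-or-discard decision to the client's own rules. A rough Python sketch of a "keep the latest" policy, assuming each descriptor exposes a publication time; this is only an illustration, and the real checks in rend_cache_store_v2_desc_as_client() may differ.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple


class LatestDescriptorCache:
    """Hypothetical client cache keeping only the newest descriptor per address."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[datetime, str]] = {}

    def store(self, address: str, published: datetime, descriptor: str) -> bool:
        """Store descriptor unless an equally new or newer one is cached."""
        current = self._entries.get(address)
        if current is not None and current[0] >= published:
            return False
        self._entries[address] = (published, descriptor)
        return True

    def lookup(self, address: str) -> Optional[str]:
        entry = self._entries.get(address)
        return entry[1] if entry else None
```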
I think to do the async thing here we'll want to extend the HSDESC event to just tell you the descriptor right then. Otherwise there's a three step process ("initiate the launch", "notice the event", "getinfo the response"), and if you initiate a bunch of launches, and then get a bunch of events, you'll only be able to getinfo one of them, and you might not even know which one it was, etc.
(Whether you extend the HSDESC event always, or add a separate HSDESC_AND_DUMP event, or what, is a matter of taste that I will leave to you and Nick if you like this approach.)
I'm not too familiar with the best practices, but could we do something like this with the EXTENDED events feature?
C: SETEVENTS HS_DESC DUMP=yes [...]
S: 650 HS_DESC RECEIVED ...
S: 650 HS_DESC EXTENDED DUMP xyz.onion <dump here>
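Whichever shape the dump takes, a controller has to split these event lines. A minimal Python parser, assuming (per my reading of control-spec, worth re-checking) that an HS_DESC line carries Action, HSAddress, AuthType and HsDir positionally, followed by optional bare tokens and KEY=VALUE pairs. It only handles single-line events; a descriptor dump would need multi-line handling on top of this.

```python
from typing import Dict, NamedTuple, Tuple


class HsDescEvent(NamedTuple):
    action: str                 # e.g. REQUESTED, RECEIVED, FAILED
    address: str
    auth_type: str
    hs_dir: str                 # may carry a "~nickname" or "=nickname" suffix
    rest: Tuple[str, ...]       # bare optional tokens, e.g. a descriptor ID
    keywords: Dict[str, str]    # KEY=VALUE tokens, e.g. REASON=


def parse_hs_desc(line: str) -> HsDescEvent:
    """Split one '650 HS_DESC ...' line; the field order is an assumption."""
    tokens = line.rstrip("\r\n").split(" ")
    if len(tokens) < 6 or tokens[:2] != ["650", "HS_DESC"]:
        raise ValueError(f"not an HS_DESC event: {line!r}")
    action, address, auth_type, hs_dir = tokens[2:6]
    rest, keywords = [], {}
    for token in tokens[6:]:
        key, sep, value = token.partition("=")
        if sep:
            keywords[key] = value
        else:
            rest.append(token)
    return HsDescEvent(action, address, auth_type, hs_dir, tuple(rest), keywords)
```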
Now that I think about it, there may also be some adventure here with all of the implicit "oh a failure just happened, that means I should launch this other action" logic in hsdesc fetches. Maybe this is a good time to clean up some of that logic, or maybe it will turn out to be easier than we think to work with it. Or I guess option three is that this will just be no fun. :)
I know... this is why I want this command accepted ASAP so I can start working on it. The HS fetch code is very "monolithic", in that it's one big block that does a lot of different things. It would need to be much more modular so that we can cherry-pick the actions this command needs, rather than going through the normal descriptor-fetching process as it stands right now.
Replying to arma:
Replying to atagar:
{{{ If one or more Server are given, they are used instead. }}}
What happens when you specify multiple? Does it pick among them randomly?
I kind of thought that it would cause Tor to initiate multiple fetches, one from each.
But, good question. I wonder if that feature is valuable enough for the complexity, compared to just making the controller send you one HSFETCH per fetch you want it to launch.
Or maybe David did indeed mean to choose just one.
By "they are used instead", I meant that if more than one Server is specified, they are all used and a fetch is triggered on each of them.
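If every Server is fetched from, one HSFETCH with N SERVER= arguments should yield N outcomes, which is exactly what the 1:1 concern above has to cover. A minimal Python sketch of a controller tracking that, assuming the HsDir field of each terminal HS_DESC event identifies the same relay as the SERVER= value (possibly with a nickname suffix); the helper names are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

TERMINAL_ACTIONS = ("RECEIVED", "FAILED")


def _fingerprint(long_name: str) -> str:
    # Drop an optional "~nickname"/"=nickname" suffix and normalise case.
    return re.split(r"[~=]", long_name, maxsplit=1)[0].upper()


def outstanding_fetches(address: str, servers: Iterable[str]) -> Set[Tuple[str, str]]:
    """One pending fetch per SERVER= argument."""
    return {(address, _fingerprint(server)) for server in servers}


def settle(outstanding: Set[Tuple[str, str]], action: str,
           address: str, hs_dir: str) -> bool:
    """Record a terminal event; return True once every fetch has an outcome."""
    if action in TERMINAL_ACTIONS:
        outstanding.discard((address, _fingerprint(hs_dir)))
    return not outstanding
```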
{{{ The HS_DESC event should be used to get the results of the fetches. }}}
How long does it take to retrieve a hidden service descriptor in practice? This is a lot clunkier for controllers. How about a 'BLOCKING=n' call for "I'm willing to wait up to n ms to get this descriptor"?
That's exactly what we've been heading away from with the async approach. That said, David, it would indeed be nice to give the controller writers some guess about how long they might need to wait until they see their HSDESC received or failed. I think the answer is "it's like fetching a thing over the Tor network -- typically pretty fast, but sometimes 5 to even 60 seconds."
Yup, "few seconds" up to a dir request timeout of ? (I don't know the value here). Should be added to the spec!
Ok I took a stab at it so we can go forward. Pretty sure this is not the "silver bullet" we are looking for but I think it's a good start considering the previous discussion.
I've basically added a new event called HS_DESC_CONTENT and removed the cache= part of the HSFETCH command. Also, I fixed the issues raised by atagar in comment:10. See branch ticket14847_04 in https://git.torproject.org/user/dgoulet/torspec.git
Good discussion with arma and weasel on IRC; here is the new branch fixing what has been discussed.
See branch ticket14847_05 in https://git.torproject.org/user/dgoulet/torspec.git
This new version adds the DescID, Replica and TimePeriod options to the HSFETCH command. After a discussion on IRC with arma, it turns out it would be very useful to have the ability to control these.
See branch ticket14847_06 in https://git.torproject.org/user/dgoulet/torspec.git
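DescID, Replica and TimePeriod are tied together for v2 descriptors. A Python sketch of how a controller could derive a descriptor ID for a given replica and time, following my reading of rend-spec v2: SHA-1 throughout, time-period = (now + first-permanent-id-byte * 86400 / 256) / 86400 packed as 4 bytes big-endian, and descriptor-id = H(permanent-id | H(time-period | descriptor-cookie | replica)). Check it against rend-spec before relying on it; the function names are illustrative, and the actual HSFETCH option syntax is whatever the branch above defines.

```python
import base64
import hashlib
import struct

PERIOD = 24 * 60 * 60  # v2 descriptor time period, in seconds


def rend_time_period(permanent_id: bytes, now: int) -> int:
    # Each service's period boundary is offset by its first permanent-id byte.
    return (now + permanent_id[0] * PERIOD // 256) // PERIOD


def rend_descriptor_id(onion: str, replica: int, now: int, cookie: bytes = b"") -> str:
    """Return the base32 v2 descriptor ID for onion at time now (sketch)."""
    permanent_id = base64.b32decode(onion, casefold=True)  # 10 bytes for v2
    period = rend_time_period(permanent_id, now)
    secret_id_part = hashlib.sha1(
        struct.pack(">I", period) + cookie + bytes([replica])
    ).digest()
    descriptor_id = hashlib.sha1(permanent_id + secret_id_part).digest()
    return base64.b32encode(descriptor_id).decode("ascii").lower()
```

Different replicas and different time periods each yield a distinct descriptor ID, which is why being able to pin them down in HSFETCH is useful.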