From #21551 (moved): "Should we let users browser CollecTor files on the Tor Metrics website rather than in Apache-generated directory listings on the CollecTor page? My suggestion: I'd say yes, but that can be done in a subsequent patch. Also a fun project." -- "Yes, new ticket!"
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related.
Learn more.
Once this is merged and deployed we can remove all web elements from CollecTor that have the sole purpose of making its directory listings prettier.
This gain does not justify implementing the JSON-index-handling a third time, in addition with slight differences. Even now as we're working toward using an internal API of metrics-lib and metrics-lib already supplies convenient methods for retrieving index.json*.
(metrics-lib has a way better coverage in addition.)
The change to the FileNode class only seems to concern the comparison. This leads to the question of either using this type of comparison also in metrics-lib (small new ticket) or simply providing a Comparator for the sorting needed here.
Heh, the good old index-handling classes discussion. We really disagree how to move forward there, don't we? :)
So, I went back and forth about this, too.
On the one hand the corresponding classes in metrics-lib are not part of the API yet, and they're not ready for that. (For example, I'd like to avoid that applications instantiate their own *Node classes. My preferred model would be that we provide some kind of IndexGenerator that takes one or more directories from the file system and generates a *Node structure and index.json* files from that. And an IndexParser would take an index.json file as input (or download it from a server) and provide a *Node structure as output. And all *Node classes would be interfaces, not classes, and they'd be located in packages that are part of the API.) But I can't really work on finalizing these classes enough to make them part of the API at this point. I'm trying to clean up a few things before the next Tor meeting, and making Tor Metrics the only user-facing website is one of them.
On the other hand I agree that duplicating or triplicating code is bad. To be fair, we're talking about 20 lines of code here, so the risk for bugs is relatively low (though not zero).
I can see two ways forward: a) We accept these 20 lines of duplicate code for now and plan to replace those classes with ones from metrics-lib once they're part of the API. b) We use types from metrics-lib despite them not being part of the API yet and put up a big warning sign that things might break if metrics-lib changes.
What's your preference?
Did you find anything else, or did you not look further after finding these internal *Node classes? :)
Hmmmmmm. While reviewing #23286 (moved) I thought about an alternative c): We move the org.torproject.descriptor.index package from metrics-lib to metrics-base where we call it org.torproject.metrics.index and re-use those classes in all code bases relying on metrics-base.
We might add even more commonly used classes to metrics-base as long as they are used by more than one code base. The difference to metrics-lib is that metrics-base is not an API to be used by products outside of the metrics space. If somebody uses that code, we change something, and their code breaks, it's not our fault. In contrast to that, keeping index things in metrics-lib and exposing them in the API, and maybe even adding more parts in the future, would lead to making it harder to maintain that interface in the future.
On the positive side, I'd be a lot less picky about the model for the index package if it's only contained in metrics-base and not in metrics-lib.
Hmmmmmm. While reviewing #23286 (moved) I thought about an alternative c): We move the org.torproject.descriptor.index package from metrics-lib to metrics-base where we call it org.torproject.metrics.index and re-use those classes in all code bases relying on metrics-base.
We might add even more commonly used classes to metrics-base as long as they are used by more than one code base. The difference to metrics-lib is that metrics-base is not an API to be used by products outside of the metrics space. If somebody uses that code, we change something, and their code breaks, it's not our fault. In contrast to that, keeping index things in metrics-lib and exposing them in the API, and maybe even adding more parts in the future, would lead to making it harder to maintain that interface in the future.
On the positive side, I'd be a lot less picky about the model for the index package if it's only contained in metrics-base and not in metrics-lib.
What do you think?
Most of the time trac tickets are useful for discussion. This time I feel we have many tickets with the same discussion, dropping different parts of the various threads. Let's keep this discussion for the next team meeting.
I'll add a summary and more thoughts to the pad :-)
I'm wondering about the wait-construct when no index is available:
The thread could be interrupted for any reason a short time (e.g. 5ms) after calling wait and the index preparation might not be done. And, more likely, it is not interrupted and then there is a wait of 10 seconds, but I'd guesstimate index creation takes way less than one second. Thus, it might be useful to 1) embed the wait in a loop checking if the waiting time elapsed or if there is an index available.
2) Secondly, the actual call to wait should use only 200 ms (or some other amount way smaller than 10000) as value.
There are three format* methods together with extractDirectoryListings, which do not depend on the actual object and should be made static. As these format* plus extractDirectoryListings methods prepare the output, I really would like to see tests here. This is not an empty exercise, but facilitates easier maintenance in future and enables review in first place.
I'm wondering about the wait-construct when no index is available:
The thread could be interrupted for any reason a short time (e.g. 5ms) after calling wait and the index preparation might not be done. And, more likely, it is not interrupted and then there is a wait of 10 seconds, but I'd guesstimate index creation takes way less than one second. Thus, it might be useful to 1) embed the wait in a loop checking if the waiting time elapsed or if there is an index available.
2) Secondly, the actual call to wait should use only 200 ms (or some other amount way smaller than 10000) as value.
Agreed about the loop.
But keep in mind that the timeout is for the time between sending a request for an index.json file until receiving and processing the response. If this were a local operation I'd say 200 ms are enough, but with the possible network delay I'd say we need to give it at least a few seconds. If 10000 ms are too much, how about 2000 ms?
There are three format* methods together with extractDirectoryListings, which do not depend on the actual object and should be made static. As these format* plus extractDirectoryListings methods prepare the output, I really would like to see tests here. This is not an empty exercise, but facilitates easier maintenance in future and enables review in first place.
Agreed.
This is on my list now. Will post a revised branch once I have one. (Thanks!)
I'm wondering about the wait-construct when no index is available:
The thread could be interrupted for any reason a short time (e.g. 5ms) after calling wait and the index preparation might not be done. And, more likely, it is not interrupted and then there is a wait of 10 seconds, but I'd guesstimate index creation takes way less than one second. Thus, it might be useful to 1) embed the wait in a loop checking if the waiting time elapsed or if there is an index available.
2) Secondly, the actual call to wait should use only 200 ms (or some other amount way smaller than 10000) as value.
Agreed about the loop.
But keep in mind that the timeout is for the time between sending a request for an index.json file until receiving and processing the response. If this were a local operation I'd say 200 ms are enough, but with the possible network delay I'd say we need to give it at least a few seconds. If 10000 ms are too much, how about 2000 ms?
I was referring to the wait-statement not the entire waiting time:
do wait(200)until(10 sec are over or an index is available)
The checks in the 'until' part are cheap and will improve response times. I assume we'll hardly ever reach the ten second limit.
In addition, there could be some logging (on debug) in order to adjust the values later.
There are three format* methods together with extractDirectoryListings, which do not depend on the actual object and should be made static. As these format* plus extractDirectoryListings methods prepare the output, I really would like to see tests here. This is not an empty exercise, but facilitates easier maintenance in future and enables review in first place.
Agreed.
This is on my list now. Will post a revised branch once I have one. (Thanks!)
I haven't looked at the code yet, but I think we don't have to reduce the wait time inside the loop to 200 ms, because we're being notified as soon as the index.json is successfully parsed.
Adding some logging there seems reasonable. In particular the total wait time when downloading and processing index.json for the first time after startup is something we should learn by this.
Will keep your idea of encapsulating things in mind when looking at this code.
I cannot find a test for static List<String[]> formatTableEntries, did you forget a commit?
The waiting time shouldn't be extended unnecessarily:
- } while (System.currentTimeMillis() < waitingSinceMillis + 10000L);+ } while (null == this.index+ && System.currentTimeMillis() < waitingSinceMillis + 10000L);
In addition, it might be safer and shorter to simply use AtomicReference instead of the synchronized methods.
Actually, I was hinting for a separate DirectoryListing class (in comment:14), but separating the retrieval of the index is important, too.
The separation you provided made it easy to introduce a CollectorDirectoryProvider, which fetches and provides the index incl. the suggested use of AtomicReference and the second while-condition. DirectoryListing is a Map<String, List<String[]>> and encapsulates the conversion from Index* to lists of strings. CollectorDirectoryProvider only manages the scheduler and handing out the index.
Another test testListing needs close inspection; it uses a fantasy index.json and checks the resulting map.
(I didn't change the byte-calculations, b/c they are only 'decoration' and the compiler will make sure that a constant is created for the Math.log(1024).)
I cannot find a test for static List<String[]> formatTableEntries, did you forget a commit?
I did not include one, because I didn't find an easy way to test this method. But I like your new test method and think it already covers this method quite well. Do you think we need to test more?
The waiting time shouldn't be extended unnecessarily:
{{{
} while (System.currentTimeMillis() < waitingSinceMillis + 10000L);
In addition, it might be safer and shorter to simply use AtomicReference instead of the synchronized methods.
Works for me.
Actually, I was hinting for a separate DirectoryListing class (in comment:14), but separating the retrieval of the index is important, too.
The separation you provided made it easy to introduce a CollectorDirectoryProvider, which fetches and provides the index incl. the suggested use of AtomicReference and the second while-condition. DirectoryListing is a Map<String, List<String[]>> and encapsulates the conversion from Index* to lists of strings. CollectorDirectoryProvider only manages the scheduler and handing out the index.
Looks good to me!
Another test testListing needs close inspection; it uses a fantasy index.json and checks the resulting map.
(I didn't change the byte-calculations, b/c they are only 'decoration' and the compiler will make sure that a constant is created for the Math.log(1024).)
I only made a few trivial tweaks in my updated task-22836 branch that is based on yours. If you like them, do you mind me squashing all commits into one before rebasing and pushing to master?
I cannot find a test for static List<String[]> formatTableEntries, did you forget a commit?
I did not include one, because I didn't find an easy way to test this method. But I like your new test method and think it already covers this method quite well. Do you think we need to test more?
Yes, only a different structure made ease of testing possible. If there had been another test, I would have suggested merging the test data.
In general, there are never enough tests ;-) All fine, until the next bug is discovered.
...
I only made a few trivial tweaks in my updated task-22836 branch that is based on yours. If you like them, do you mind me squashing all commits into one before rebasing and pushing to master?
FWIW, I just kicked out all of CollecTor's webapp stuff, which was one of the goals of this ticket. CollecTor's directory listings are still available, but they're generated by Apache without any additional styling provided by us. This seems fine to me, given that we don't expect the average user to browse that website anyway.
FWIW, I just kicked out all of CollecTor's webapp stuff, which was one of the goals of this ticket. CollecTor's directory listings are still available, but they're generated by Apache without any additional styling provided by us. This seems fine to me, given that we don't expect the average user to browse that website anyway.
That's true, but I would mention the removal of all styling resources (fonts, css, etc.) rather than calling it 'un-prettify' (which is a matter of taste) in the changelog.