Deprecate `DescriptorReader.setExcludeFiles()` and add two separate methods for loading and saving a history file
The history file implementation in DescriptorReader
has a design bug for much too long that I'd finally want to fix.
The issue is that DescriptorReader
writes the history file passed in setExcludeFiles()
immediately after reading and parsing the last descriptor and putting it into the queue, regardless of whether the application has finished processing those descriptors.
If the application fails after the history file is written, it may not be able to process descriptors in the next execution that have still been in the queue at the time of failing.
One way to reduce that gap would be to have BlockingIteratorImpl
tell DescriptorReader
when the application has first learned that hasNext()
returned false
or that next()
threw a NoSuchElementException
. The benefit of that solution would be that no interface change would be required. The downside would be that the application might not have fully cleaned up at the time of learning that there are no further descriptors available. In one example, the application would close a database import file, perform the database import, and only write the history file after learning that the database import has succeeded.
A better solution would be to decouple saving the history file to disk from completing the descriptor read operation. We would replace the setExcludeFiles()
method by a setHistoryFile()
and a saveHistoryFile()
method. Applications would use setHistoryFile()
before starting to read descriptors, process all descriptors, perform any cleaning up, and then call saveHistoryFile()
.
Here's an example of the code that this would save in applications that currently work around this known design bug by loading and saving history files themselves:
diff --git a/modules/hidserv/src/org/torproject/metrics/hidserv/Parser.java b/modules/hidserv/src/org/torproject/metrics/hidserv/Parser.java
index 2ef404e..215b0c9 100644
--- a/modules/hidserv/src/org/torproject/metrics/hidserv/Parser.java
+++ b/modules/hidserv/src/org/torproject/metrics/hidserv/Parser.java
@@ -96,33 +96,7 @@ public class Parser {
public void readParseHistory() {
if (this.parseHistoryFile.exists()
&& this.parseHistoryFile.isFile()) {
- SortedMap<String, Long> excludedFiles =
- new TreeMap<String, Long>();
- try {
- BufferedReader br = new BufferedReader(new FileReader(
- this.parseHistoryFile));
- String line;
- while ((line = br.readLine()) != null) {
- try {
- /* Each line is supposed to contain the last-modified time and
- * absolute path of a descriptor file. */
- String[] parts = line.split(" ", 2);
- excludedFiles.put(parts[1], Long.parseLong(parts[0]));
- } catch (NumberFormatException e) {
- System.err.printf("Illegal line '%s' in parse history. "
- + "Skipping line.%n", line);
- }
- }
- br.close();
- } catch (IOException e) {
- System.err.printf("Could not read history file '%s'. Not "
- + "excluding descriptors in this execution.",
- this.parseHistoryFile.getAbsolutePath());
- }
-
- /* Tell the descriptor reader to exclude the files contained in the
- * parse history file. */
- this.descriptorReader.setExcludedFiles(excludedFiles);
+ this.descriptorReader.setHistoryFile(this.parseHistoryFile);
}
}
@@ -130,33 +104,7 @@ public class Parser {
* and absolute paths to the parse history file to avoid parsing these
* files again, unless they change until the next execution. */
public void writeParseHistory() {
-
- /* Obtain the list of descriptor files that were either parsed now or
- * that were skipped in this execution from the descriptor reader. */
- SortedMap<String, Long> excludedAndParsedFiles =
- new TreeMap<String, Long>();
- excludedAndParsedFiles.putAll(
- this.descriptorReader.getExcludedFiles());
- excludedAndParsedFiles.putAll(this.descriptorReader.getParsedFiles());
- try {
- this.parseHistoryFile.getParentFile().mkdirs();
- BufferedWriter bw = new BufferedWriter(new FileWriter(
- this.parseHistoryFile));
- for (Map.Entry<String, Long> e
- : excludedAndParsedFiles.entrySet()) {
- /* Each line starts with the last-modified time of the descriptor
- * file, followed by its absolute path. */
- String absolutePath = e.getKey();
- long lastModifiedMillis = e.getValue();
- bw.write(String.valueOf(lastModifiedMillis) + " " + absolutePath
- + "\n");
- }
- bw.close();
- } catch (IOException e) {
- System.err.printf("Could not write history file '%s'. Not "
- + "excluding descriptors in next execution.",
- this.parseHistoryFile.getAbsolutePath());
- }
+ this.descriptorReader.saveHistoryFile(this.parseHistoryFile);
}
/** Set of all reported hidden-service statistics.
I'll push a branch to my repository in a moment.