Reduce memory overhead for storing uptimes
When I re-imported 2 years of bridge network statuses to fix legacy/trac#20994 (moved), I noticed that this is only possible when importing descriptors in batches of a few months. This is pretty bad, because it also means that we wouldn't be able to start a new Onionoo instance from scratch. And a new operator might not even figure out what's wrong and assume that Onionoo contains a serious bug. (Maybe not a totally unreasonable assumption.)
Today I found the reason for this problem: UptimeStatusUpdater
stores sets of Long
values of bridge network status publication times, and that simply does not scale for more than a few months of statuses. Unfortunately, we need to store all these publication times in memory, because opening and updating status files for each fingerprint contained in a status also does not scale, for a different reason.
But we can do better here. We can apply the same trick that we're using for storing relay flag strings, that is, resolve strings to indexes using a single, static map, and store indexes in a bitset rather than strings in a set. Here's a possible patch:
diff --git a/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java b/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
index 2b5f5fc..6280902 100644
--- a/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
+++ b/src/main/java/org/torproject/onionoo/updater/UptimeStatusUpdater.java
@@ -86,14 +86,18 @@ public class UptimeStatusUpdater implements DescriptorListener,
}
}
+ private static Map<Long, Integer> dateHourMillisToIndex = new HashMap<>();
+
+ private static Map<Integer, Long> dateHourIndexToMillis = new HashMap<>();
+
private SortedMap<Long, Flags> newRelayStatuses = new TreeMap<>();
private SortedMap<String, SortedMap<Long, Flags>>
newRunningRelays = new TreeMap<>();
- private SortedSet<Long> newBridgeStatuses = new TreeSet<>();
+ private BitSet newBridgeStatuses = new BitSet();
- private SortedMap<String, SortedSet<Long>>
+ private SortedMap<String, BitSet>
newRunningBridges = new TreeMap<>();
private void processRelayNetworkStatusConsensus(
@@ -123,13 +127,18 @@ public class UptimeStatusUpdater implements DescriptorListener,
if (!fingerprints.isEmpty()) {
long dateHourMillis = (status.getPublishedMillis()
/ DateTimeHelper.ONE_HOUR) * DateTimeHelper.ONE_HOUR;
+ if (!dateHourMillisToIndex.containsKey(dateHourMillis)) {
+ dateHourIndexToMillis.put(dateHourIndexToMillis.size(), dateHourMillis);
+ dateHourMillisToIndex.put(dateHourMillis, dateHourMillisToIndex.size());
+ }
+ int dateHourIndex = dateHourMillisToIndex.get(dateHourMillis);
for (String fingerprint : fingerprints) {
if (!this.newRunningBridges.containsKey(fingerprint)) {
- this.newRunningBridges.put(fingerprint, new TreeSet<Long>());
+ this.newRunningBridges.put(fingerprint, new BitSet());
}
- this.newRunningBridges.get(fingerprint).add(dateHourMillis);
+ this.newRunningBridges.get(fingerprint).set(dateHourIndex);
}
- this.newBridgeStatuses.add(dateHourMillis);
+ this.newBridgeStatuses.set(dateHourIndex);
}
}
@@ -140,17 +149,19 @@ public class UptimeStatusUpdater implements DescriptorListener,
this.updateStatus(true, e.getKey(), e.getValue());
}
this.updateStatus(true, null, this.newRelayStatuses);
- for (Map.Entry<String, SortedSet<Long>> e :
- this.newRunningBridges.entrySet()) {
+ for (Map.Entry<String, BitSet> e : this.newRunningBridges.entrySet()) {
+ BitSet dateHourIndexes = e.getValue();
SortedMap<Long, Flags> dateHourMillisNoFlags = new TreeMap<>();
- for (long dateHourMillis : e.getValue()) {
- dateHourMillisNoFlags.put(dateHourMillis, null);
+ for (int i = dateHourIndexes.nextSetBit(0); i >= 0;
+ i = dateHourIndexes.nextSetBit(i + 1)) {
+ dateHourMillisNoFlags.put(dateHourIndexToMillis.get(i), null);
}
this.updateStatus(false, e.getKey(), dateHourMillisNoFlags);
}
SortedMap<Long, Flags> dateHourMillisNoFlags = new TreeMap<>();
- for (long dateHourMillis : this.newBridgeStatuses) {
- dateHourMillisNoFlags.put(dateHourMillis, null);
+ for (int i = this.newBridgeStatuses.nextSetBit(0); i >= 0;
+ i = this.newBridgeStatuses.nextSetBit(i + 1)) {
+ dateHourMillisNoFlags.put(dateHourIndexToMillis.get(i), null);
}
this.updateStatus(false, null, dateHourMillisNoFlags);
}
But let's not merge this yet, as it only solves this particular problem of importing lots of bridge network statuses, and maybe there's an ever better way to fix this.
Let's rather look at this (possible) improvement together with earlier improvements towards memory efficiency, and let's think about what remains to be done.
For example, should we avoid certain data structures (like TreeSet/TreeMap
) and replace them with more memory-efficient data structures?
Are there other tricks we should use more consistently?