mmap geoip databases, restrict countries to 2-alpha + ??

Alex Xu requested to merge Hello71/tor:geoip-mmap into main

Using mmap saves about 10 MB of anonymous memory. Using a binary format also cuts the uncompressed disk space of the GeoIP databases in half (~5 MB); compressed size shrinks by 5-15% (since GeoIP databases are highly compressible, there is more benefit at lower compression levels). A B-tree would be more efficient, but it is too complicated to implement in C and isn't justified given how infrequent geoip lookups are.
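For illustration, here is a minimal sketch of the general technique: map a file of sorted fixed-size records read-only and binary-search it in place. The record layout (`struct geoip_rec`) and the names `geoip_map_open`, `geoip_map_close`, and `geoip_find` are hypothetical and are not the format or API used by this merge request; the sketch also assumes IPv4 only, host byte order, and non-overlapping ranges.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical on-disk record, NOT the format from this merge request:
 * an inclusive IPv4 range (host byte order) and a two-letter country code. */
struct geoip_rec {
  uint32_t lo;
  uint32_t hi;
  char cc[2];
  char pad[2];
};
/* The file is read as an array of records, so the size must be fixed. */
static_assert(sizeof(struct geoip_rec) == 12, "unexpected record padding");

struct geoip_map {
  const struct geoip_rec *recs; /* start of the mapping */
  size_t n;                     /* number of records */
  size_t len;                   /* mapping length in bytes */
};

/* Map the file read-only. Returns 0 on success, -1 on failure. */
static int geoip_map_open(struct geoip_map *m, const char *path)
{
  struct stat st;
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
      (size_t)st.st_size % sizeof(struct geoip_rec) != 0) {
    close(fd);
    return -1;
  }
  void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (p == MAP_FAILED)
    return -1;
  m->recs = p;
  m->n = (size_t)st.st_size / sizeof(struct geoip_rec);
  m->len = (size_t)st.st_size;
  return 0;
}

static void geoip_map_close(struct geoip_map *m)
{
  munmap((void *)m->recs, m->len);
  m->recs = NULL;
  m->n = m->len = 0;
}

/* Binary search over sorted, non-overlapping ranges.
 * Returns NULL when no range contains ip. */
static const struct geoip_rec *geoip_find(const struct geoip_map *m,
                                          uint32_t ip)
{
  size_t lo = 0, hi = m->n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    const struct geoip_rec *r = &m->recs[mid];
    if (ip < r->lo)
      hi = mid;
    else if (ip > r->hi)
      lo = mid + 1;
    else
      return r;
  }
  return NULL;
}
```

Because the pages are file-backed and mapped read-only, the kernel can drop and re-read them on demand, so they do not count as anonymous memory; each lookup touches only O(log n) records.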

Officially restricting countries to 2-alpha codes allows arrays to be allocated statically instead of dynamically, saving more RAM and reducing code complexity, in particular by removing routerset_refresh_countries and many smartlists. Additionally, IPFire has removed the A1 country code, so we no longer need to store or handle it. Also, stop storing the ?? country in GeoIP files, since Tor lumps "no data for this IP" and "no country for this IP" together anyway.
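To show why a fixed 2-alpha code space permits static allocation, here is a minimal sketch. The names `CC_UNKNOWN`, `CC_COUNT`, `cc_to_index`, `index_to_cc`, and `cc_hits` are hypothetical and are not taken from this merge request; the sketch assumes ASCII and reserves one slot for "??" and any other non-2-alpha input.

```c
#include <stddef.h>
#include <stdint.h>

#define CC_UNKNOWN 0            /* slot for "??" and anything invalid */
#define CC_COUNT (1 + 26 * 26)  /* unknown + every two-letter pair */

/* Per-country counters with a size known at compile time:
 * no allocation, no resizing when a new country appears. */
static uint32_t cc_hits[CC_COUNT];

/* Map a two-letter code to a dense index, case-insensitively.
 * Any other input (including "??") maps to CC_UNKNOWN. */
static int cc_to_index(const char *cc)
{
  if (cc == NULL || cc[0] == '\0' || cc[1] == '\0' || cc[2] != '\0')
    return CC_UNKNOWN;
  int a = cc[0] | 0x20; /* fold ASCII upper case to lower case */
  int b = cc[1] | 0x20;
  if (a < 'a' || a > 'z' || b < 'a' || b > 'z')
    return CC_UNKNOWN;
  return 1 + (a - 'a') * 26 + (b - 'a');
}

/* Inverse mapping; writes a NUL-terminated lower-case code into out. */
static void index_to_cc(int idx, char out[3])
{
  if (idx <= CC_UNKNOWN || idx >= CC_COUNT) {
    out[0] = out[1] = '?';
  } else {
    idx -= 1;
    out[0] = (char)('a' + idx / 26);
    out[1] = (char)('a' + idx % 26);
  }
  out[2] = '\0';
}
```

With this scheme a counter update is just `cc_hits[cc_to_index("de")]++`, and the whole table costs 677 fixed slots regardless of which countries appear in the database.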

A previous version of this change emitted binary GeoIP files directly from geoip-db-tool. While simpler, that approach does not work well with git: the default diff is useless, making it hard to see what is changing in the GeoIP database, and git can potentially compress text deltas more efficiently than binary deltas. Awk was selected for convert_geoip because it is already a requirement for configure and, unlike C, it does not interfere with cross-compilation.