Design file format and Python/Java library for multiple GeoIP or AS databases
Whenever we look up relay IP addresses over a period of more than a few months, we can't just use a single GeoIP or AS database. These databases change over time for a reason: an IP address that resolves to a certain country or AS today may have belonged to a different country or AS a year ago. What we should really do is keep multiple databases and look up each IP address in the database that was the most recent one at the given time.
How about we design a file format for multiple GeoIP or AS databases and write a Python library for using it? The library should allow us to:
- Convert a given GeoIP or AS database, from MaxMind or in a similar format, to our format.
- Merge a new single GeoIP or AS database into our format.
- Look up the country code/name or AS number/name of an IP address on a given date.
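To make the merge and lookup operations above more concrete, here is a minimal in-memory sketch in Python. It is only an illustration under some assumptions, not a proposed design: the class name MultiGeoIP, the method names merge and lookup, and the (first_ip, last_ip, value) input tuples are all hypothetical. It also assumes IPv4 only and non-overlapping ranges within a single database, and it does not define an on-disk file format. It picks the newest database published on or before the requested date, then binary-searches the address ranges of that database.

```python
import bisect
import datetime
import ipaddress


class MultiGeoIP:
    """Hypothetical in-memory store of several dated GeoIP or AS databases."""

    def __init__(self):
        self._dates = []      # publication dates, kept sorted
        self._databases = []  # parallel list of (range_starts, rows) per date

    def merge(self, published, ranges):
        """Add one database published on the given date.

        ranges is an iterable of (first_ip, last_ip, value) tuples with
        IPv4 addresses as strings; ranges are assumed not to overlap.
        """
        rows = sorted(
            (int(ipaddress.IPv4Address(first)),
             int(ipaddress.IPv4Address(last)),
             value)
            for first, last, value in ranges
        )
        starts = [row[0] for row in rows]
        index = bisect.bisect_left(self._dates, published)
        self._dates.insert(index, published)
        self._databases.insert(index, (starts, rows))

    def lookup(self, address, date):
        """Return the value for address on date, or None if unknown."""
        # Newest database published on or before the requested date.
        db_index = bisect.bisect_right(self._dates, date) - 1
        if db_index < 0:
            return None
        starts, rows = self._databases[db_index]
        ip = int(ipaddress.IPv4Address(address))
        # Last range whose first address is <= ip.
        row_index = bisect.bisect_right(starts, ip) - 1
        if row_index < 0:
            return None
        first, last, value = rows[row_index]
        return value if ip <= last else None


# Usage with made-up data ("AA" and "BB" are placeholder country codes).
db = MultiGeoIP()
db.merge(datetime.date(2011, 1, 1), [("1.0.0.0", "1.0.0.255", "AA")])
db.merge(datetime.date(2012, 1, 1), [("1.0.0.0", "1.0.0.255", "BB")])
print(db.lookup("1.0.0.5", datetime.date(2011, 6, 1)))
print(db.lookup("1.0.0.5", datetime.date(2012, 6, 1)))
```

Both steps of a lookup are binary searches, so lookup time grows logarithmically with the number of databases and with the number of ranges per database. Keeping every database in full is wasteful in memory and file size, though; a real format would probably want to share ranges that stay unchanged between consecutive databases.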
I'd say that non-functional requirements are as follows, from most important to least important: lookup speed, lookup speed, lookup speed, memory consumption, file size, library code complexity.
I found these archives of GeoIP and AS databases:
http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/