Set up new web server logging and log analysis infrastructure
Moving phobos' comment from #1641 here, because it's really a new ticket:
A few random thoughts. In discussions with the EFF, they recommended the plan at https://www.eff.org/wp/osp from a legal perspective. It suggests we could log in the default Common Log Format (CLF) and purge the logs within 48 hours. That would save space and still let us see summarized results.
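To make the "log in CLF, keep only summaries, purge within 48 hours" idea concrete, here is a minimal Python sketch, not a proposed implementation. It assumes plain CLF lines (host, ident, authuser, timestamp, request, status, bytes). The names `parse_clf_line` and `summarize` are made up for illustration. The summary keeps only per-day and per-status counts and stores no client addresses.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of fields for one CLF line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Example timestamp: 10/Oct/2000:13:55:36 -0700 (English month names assumed)
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else ""
    fields["status"] = int(fields["status"])
    return fields


def summarize(lines):
    """Aggregate hits per day and per status code; client addresses are dropped."""
    hits_per_day = Counter()
    hits_per_status = Counter()
    for line in lines:
        entry = parse_clf_line(line)
        if entry is None:
            continue  # skip lines that are not valid CLF
        hits_per_day[entry["time"].date().isoformat()] += 1
        hits_per_status[entry["status"]] += 1
    return {
        "hits_per_day": dict(hits_per_day),
        "hits_per_status": dict(hits_per_status),
    }
```

The idea would be to run something like `summarize` over each raw log before it is purged and keep only the returned counts.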
I'd like to be able to see:
- visits and unique visitor counts by day of the week and day of the month
- top 25 pages by page view
- country of origin
- most viewed, entry, and exit pages
- downloads by package
- http errors
- referrers (sanitized if it includes PII)
- search engines, keyphrases and keywords
- overall traffic by webserver, if possible (is our load balancing working evenly?)
- top 10 paths through the site
- number of pages viewed per visit
- percentage and count of requests through Tor exits
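A few of the items above are simple aggregations once the log fields are extracted. The Python sketch below is only an illustration under stated assumptions. The function names are hypothetical. The set of Tor exit addresses is assumed to be supplied by the caller, however it is obtained. "Sanitized" is interpreted here as dropping the query string and fragment from a referrer, which is one possible reading and would also discard search keyphrases carried in the query.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def top_pages(paths, n=25):
    """Return the n most requested paths as (path, count) pairs."""
    return Counter(paths).most_common(n)


def tor_exit_share(client_hosts, exit_addresses):
    """Return (count, percentage) of requests whose client is a known Tor exit.

    exit_addresses is an assumption: any iterable of exit IP strings.
    """
    hosts = list(client_hosts)
    exits = set(exit_addresses)
    count = sum(1 for host in hosts if host in exits)
    percentage = 100.0 * count / len(hosts) if hosts else 0.0
    return count, percentage


def sanitize_referrer(referrer):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(referrer)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

For example, `top_pages(paths, 25)` would cover the "top 25 pages by page view" item, and `tor_exit_share` returns both the count and the percentage asked for in the last item.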
Another idea is to see what we can provide from the AWStats feature list (http://awstats.sourceforge.net/) without disclosing PII.
And we should remember that this is more than just the logs for www.tpo; we also have the check, svn, gitweb, metrics, bridges, and trac websites to analyze.