From 43867e29df75ab6fc046aff8c0167352394a813d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Tue, 23 Jun 2020 14:56:58 -0400
Subject: [PATCH] update trac archive procedure to document how it was done precisely

---
 tsa/howto/gitlab.md | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/tsa/howto/gitlab.md b/tsa/howto/gitlab.md
index 169928ca..f9a41bac 100644
--- a/tsa/howto/gitlab.md
+++ b/tsa/howto/gitlab.md
@@ -604,12 +604,29 @@ archive the list itself as well.
 Simultaneously, a full crawl of the entire site (and first level
 outgoing links) was started, with:
 
-    !a --explain "Trac migrated to GitLab, readonly" https://trac.torproject.org/
-
-A list of excludes was added to ignore traps and infinite loops. The
-crawl was slowed down with a 500-1000ms delay to avoid hammering the server.
-
-(TODO: add the actual exclude lists and commands.)
+    !a https://trac.torproject.org --explain "migrated to gitlab, readonly" --delay 500
+
+A list of excludes was added to ignore traps and infinite loops:
+
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query.*[?&]order=(?!priority)
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query.*[&?]desc=1
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://gitweb\.torproject\.org/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/timeline\?
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?status=!closed&keywords=
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?status=!closed&(version|reporter|owner|cc)=
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?(.*&)?(reporter|priority|component|severity|cc|owner|version)=
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://cdn\.media\.ccc\.de/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://www\.redditstatic\.com/desktop2x/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/report/\d+.*[?&]sort=
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://support\.stripe\.com/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://cdn\.cms-twdigitalassets\.com/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://cypherpunks\:writecode@trac\.torproject\.org/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://login\.blockchain\.com/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://dnsprivacy\.org/
+
+The crawl was slowed down with a 500-1000ms delay to avoid hammering the server:
+
+    !d bpu6j3ucrv87g4aix1zdrhb6k 500 1000
 
 The results will be accessible in the wayback machine a few days
 after the crawl. Another crawl was performed back in 2019, so the known full
-- 
GitLab
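The `!ig` exclude patterns in this patch can be sanity-checked offline before being fed to a crawl. The following is a minimal Python sketch, assuming each `!ig` argument is evaluated as a Python regular expression searched against every discovered URL (our understanding of how ArchiveBot applies ignores, not something this patch states). The `is_ignored` helper and the sample URLs are hypothetical; only five of the fifteen patterns are copied, verbatim.

```python
import re

# Five of the !ig patterns from the patch, copied verbatim.
# Assumption: ArchiveBot applies them as Python regexes searched against each URL.
IGNORE_PATTERNS = [
    r"^https?://trac\.torproject\.org/projects/tor/query.*[?&]order=(?!priority)",
    r"^https?://trac\.torproject\.org/projects/tor/query.*[&?]desc=1",
    r"^https?://gitweb\.torproject\.org/",
    r"^https?://trac\.torproject\.org/projects/tor/timeline\?",
    r"^https?://trac\.torproject\.org/projects/tor/report/\d+.*[?&]sort=",
]

COMPILED = [re.compile(p) for p in IGNORE_PATTERNS]


def is_ignored(url: str) -> bool:
    """Hypothetical helper: True if any ignore pattern matches the URL."""
    return any(p.search(url) for p in COMPILED)


if __name__ == "__main__":
    # Hypothetical sample URLs exercising the traps the excludes target.
    samples = [
        "https://trac.torproject.org/projects/tor/ticket/1234",
        "https://trac.torproject.org/projects/tor/query?status=closed&order=priority",
        "https://trac.torproject.org/projects/tor/query?status=closed&order=id",
        "https://trac.torproject.org/projects/tor/timeline?from=2020-06-01",
        "https://gitweb.torproject.org/tor.git/",
    ]
    for url in samples:
        print("skip" if is_ignored(url) else "keep", url)
```

Note the negative lookahead `(?!priority)` in the first pattern: query pages sorted by priority are still crawled, while every other sort order (an effectively unbounded set of permutations) is skipped.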