From 43867e29df75ab6fc046aff8c0167352394a813d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Tue, 23 Jun 2020 14:56:58 -0400
Subject: [PATCH] update trac archive procedure to document how it was done
 precisely

---
 tsa/howto/gitlab.md | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/tsa/howto/gitlab.md b/tsa/howto/gitlab.md
index 169928ca..f9a41bac 100644
--- a/tsa/howto/gitlab.md
+++ b/tsa/howto/gitlab.md
@@ -604,12 +604,29 @@ archive the list itself as well.
 Simultaneously, a full crawl of the entire site (and first level
 outgoing links) was started, with:
 
-    !a --explain "Trac migrated to GitLab, readonly" https://trac.torproject.org/
-
-A list of excludes was added to ignore traps and infinite loops. The
-crawl was slowed down with a 500-1000ms delay to avoid hammering the server.
-
-(TODO: add the actual exclude lists and commands.)
+    !a https://trac.torproject.org --explain "migrated to gitlab, readonly" --delay 500
+
+A list of excludes was added to ignore traps and infinite loops:
+
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query.*[?&]order=(?!priority)
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query.*[&?]desc=1
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://gitweb\.torproject\.org/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/timeline\?
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?status=!closed&keywords=
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?status=!closed&(version|reporter|owner|cc)=
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?(.*&)?(reporter|priority|component|severity|cc|owner|version)=
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://cdn\.media\.ccc\.de/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://www\.redditstatic\.com/desktop2x/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/report/\d+.*[?&]sort=
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://support\.stripe\.com/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://cdn\.cms-twdigitalassets\.com/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://cypherpunks\:writecode@trac\.torproject\.org/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://login\.blockchain\.com/
+    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://dnsprivacy\.org/
+
+The crawl was slowed down with a 500-1000ms delay to avoid hammering the server:
+
+    !d bpu6j3ucrv87g4aix1zdrhb6k 500 1000
 
 The results will be accessible in the wayback machine a few days after
 the crawl. Another crawl was performed back in 2019, so the known full
-- 
GitLab