Verified commit 43867e29, authored by anarcat

update trac archive procedure to document how it was done precisely

parent 9c81ab64
+23 −6
@@ -604,12 +604,29 @@ archive the list itself as well.
Simultaneously, a full crawl of the entire site (and first level
outgoing links) was started, with:

    !a --explain "Trac migrated to GitLab, readonly" https://trac.torproject.org/

A list of excludes was added to ignore traps and infinite loops. The
crawl was slowed down with a 500-1000ms delay to avoid hammering the server.

(TODO: add the actual exclude lists and commands.)
    !a https://trac.torproject.org --explain "migrated to gitlab, readonly" --delay 500

A list of excludes was added to ignore traps and infinite loops:

    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query.*[?&]order=(?!priority)
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query.*[&?]desc=1
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://gitweb\.torproject\.org/
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/timeline\?
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?status=!closed&keywords=
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?status=!closed&(version|reporter|owner|cc)=
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/query\?(.*&)?(reporter|priority|component|severity|cc|owner|version)=
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://cdn\.media\.ccc\.de/
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://www\.redditstatic\.com/desktop2x/
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://trac\.torproject\.org/projects/tor/report/\d+.*[?&]sort=
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://support\.stripe\.com/
    !ig bpu6j3ucrv87g4aix1zdrhb6k  ^https?://cdn\.cms-twdigitalassets\.com/
    !ig bpu6j3ucrv87g4aix1zdrhb6k  ^https?://cypherpunks\:writecode@trac\.torproject\.org/
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://login\.blockchain\.com/
    !ig bpu6j3ucrv87g4aix1zdrhb6k ^https?://dnsprivacy\.org/
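
As a rough illustration of what these ignore patterns do, the following Python sketch checks a few of them against sample URLs. The sample URLs are made up, and the sketch assumes Python's `re` module treats these particular patterns the same way the crawler's regex engine does; the actual filtering is done by the archiving bot, not by this code.

```python
import re

# Three of the ignore patterns from the list above, copied verbatim.
IGNORE_PATTERNS = [
    r"^https?://trac\.torproject\.org/projects/tor/query.*[?&]order=(?!priority)",
    r"^https?://trac\.torproject\.org/projects/tor/timeline\?",
    r"^https?://gitweb\.torproject\.org/",
]


def is_ignored(url):
    """Return True if any of the ignore patterns matches the URL."""
    return any(re.search(pattern, url) for pattern in IGNORE_PATTERNS)


# Hypothetical URLs, for illustration only.
SAMPLES = [
    "https://trac.torproject.org/projects/tor/query?status=new&order=priority",
    "https://trac.torproject.org/projects/tor/query?status=new&order=id",
    "https://trac.torproject.org/projects/tor/timeline?from=2020-01-01",
    "https://trac.torproject.org/projects/tor/ticket/1234",
    "https://gitweb.torproject.org/tor.git",
]

if __name__ == "__main__":
    for url in SAMPLES:
        verdict = "ignored" if is_ignored(url) else "crawled"
        print(f"{verdict}: {url}")
```

The negative lookahead `(?!priority)` in the first pattern keeps a single sort order of each query crawlable (the one sorted by priority) while skipping every other ordering of the same results, which is one way to avoid the combinatorial explosion of query URLs.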

The crawl was slowed down with a 500-1000ms delay to avoid hammering the server:

    !d bpu6j3ucrv87g4aix1zdrhb6k 500 1000

The results will be accessible in the wayback machine a few days after
the crawl. Another crawl was performed back in 2019, so the known full