static-update-component is too slow
it takes an awfully long time to deploy review apps, and it's at least partly due to the static mirror system.
here's an example run:
$ ssh -o UserKnownHostsFile=".ssh/known_hosts" -i ".ssh/private_key" static-gitlab-shim@static-gitlab-shim.torproject.org static-update-component
/usr/local/bin/static-master-update-component: Acquiring lock on /srv/static.torproject.org/master/review.torproject.net.lock...
/usr/local/bin/static-master-update-component: Got them.
/usr/local/bin/static-master-update-component: Updating master copy of review.torproject.net...
/usr/local/bin/static-master-update-component: Done. Committing.
/usr/local/bin/static-master-update-component: Triggering mirror runs...
[2022-02-04 14:31:00] Acquiring lock for /srv/static.torproject.org/master/review.torproject.net.lock(3).
[2022-02-04 14:31:00] All locks acquired.
[2022-02-04 14:31:00] Serial is 1643985060.
[2022-02-04 14:31:00] Populating /srv/static.torproject.org/master/review.torproject.net-live.new-zrVHWg.
[2022-02-04 14:31:25] Removing existing /srv/static.torproject.org/master/review.torproject.net-current-push.
[2022-02-04 14:32:06] Renaming /srv/static.torproject.org/master/review.torproject.net-live.new-zrVHWg to /srv/static.torproject.org/master/review.torproject.net-current-push.
[2022-02-04 14:32:06] Calling clients...
[2022-02-04 14:32:06] Stage 1...
[2022-02-04 14:32:10] web-fsn-02.torproject.org >> [MSM] STAGE1-START (2022-02-04 14:32:10+00:00 on web-fsn-02.torproject.org)
[2022-02-04 14:32:53] web-fsn-02.torproject.org >> [MSM] STAGE1-DONE (2022-02-04 14:32:53+00:00 on web-fsn-02.torproject.org)
[2022-02-04 14:32:53] web-fsn-02.torproject.org: waiting
[2022-02-04 14:32:53] hetzner-hel1-03.torproject.org >> [MSM] STAGE1-START (2022-02-04 14:32:06+00:00 on hetzner-hel1-03.torproject.org)
[2022-02-04 14:32:53] hetzner-hel1-03.torproject.org >> [MSM] STAGE1-DONE (2022-02-04 14:32:18+00:00 on hetzner-hel1-03.torproject.org)
[2022-02-04 14:32:53] hetzner-hel1-03.torproject.org: waiting
[2022-02-04 14:32:53] web-chi-03.torproject.org >> [MSM] STAGE1-START (2022-02-04 14:32:08+00:00 on web-chi-03.torproject.org)
[2022-02-04 14:32:53] web-chi-03.torproject.org >> [MSM] STAGE1-DONE (2022-02-04 14:32:47+00:00 on web-chi-03.torproject.org)
[2022-02-04 14:32:53] web-chi-03.torproject.org: waiting
[2022-02-04 14:33:09] web-fsn-01.torproject.org >> [MSM] STAGE1-START (2022-02-04 14:33:09+00:00 on web-fsn-01.torproject.org)
[2022-02-04 14:36:43] web-fsn-01.torproject.org >> [MSM] STAGE1-DONE (2022-02-04 14:36:43+00:00 on web-fsn-01.torproject.org)
[2022-02-04 14:36:43] web-fsn-01.torproject.org: waiting
[2022-02-04 14:36:43] Stage 1 done.
[2022-02-04 14:36:43] Committing...
[2022-02-04 14:36:43] web-fsn-02.torproject.org << go
[2022-02-04 14:36:43] hetzner-hel1-03.torproject.org << go
[2022-02-04 14:36:43] web-chi-03.torproject.org << go
[2022-02-04 14:36:43] web-fsn-01.torproject.org << go
[2022-02-04 14:36:46] web-fsn-02.torproject.org >> [MSM] STAGE2-DONE
[2022-02-04 14:36:46] web-fsn-02.torproject.org >>
[2022-02-04 14:36:46] web-fsn-02.torproject.org: returned 0
[2022-02-04 14:36:46] hetzner-hel1-03.torproject.org >> [MSM] STAGE2-DONE
[2022-02-04 14:36:46] hetzner-hel1-03.torproject.org >>
[2022-02-04 14:36:46] hetzner-hel1-03.torproject.org: returned 0
full log:
https://gitlab.torproject.org/tpo/tpa/status-site/-/jobs/91607
you can see it takes over 5 minutes to sync the site over. that's stupidly slow! normally, we should only have to copy changed files, but it seems it's copying over everything every time. even worse, in this case we're removing files, because this is a stop-review job: we should just be deleting a bunch of files in the status site and that's it, we shouldn't be touching other sites. this should be really fast, as the status site is only a few kilobytes.
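to see where those minutes go, here's a rough sketch that computes how long each mirror spent in stage 1, using the mirror-side timestamps in parentheses on the STAGE1-START and STAGE1-DONE lines of the run above. the function name stage1_durations and the regular expression are made up for this illustration and only assume the log format shown in that run.

```python
import re
from datetime import datetime

# Stage 1 lines copied from the run above (mirror-side timestamps in parentheses).
LOG = """\
[2022-02-04 14:32:10] web-fsn-02.torproject.org >> [MSM] STAGE1-START (2022-02-04 14:32:10+00:00 on web-fsn-02.torproject.org)
[2022-02-04 14:32:53] web-fsn-02.torproject.org >> [MSM] STAGE1-DONE (2022-02-04 14:32:53+00:00 on web-fsn-02.torproject.org)
[2022-02-04 14:32:53] hetzner-hel1-03.torproject.org >> [MSM] STAGE1-START (2022-02-04 14:32:06+00:00 on hetzner-hel1-03.torproject.org)
[2022-02-04 14:32:53] hetzner-hel1-03.torproject.org >> [MSM] STAGE1-DONE (2022-02-04 14:32:18+00:00 on hetzner-hel1-03.torproject.org)
[2022-02-04 14:32:53] web-chi-03.torproject.org >> [MSM] STAGE1-START (2022-02-04 14:32:08+00:00 on web-chi-03.torproject.org)
[2022-02-04 14:32:53] web-chi-03.torproject.org >> [MSM] STAGE1-DONE (2022-02-04 14:32:47+00:00 on web-chi-03.torproject.org)
[2022-02-04 14:33:09] web-fsn-01.torproject.org >> [MSM] STAGE1-START (2022-02-04 14:33:09+00:00 on web-fsn-01.torproject.org)
[2022-02-04 14:36:43] web-fsn-01.torproject.org >> [MSM] STAGE1-DONE (2022-02-04 14:36:43+00:00 on web-fsn-01.torproject.org)
"""

# Hypothetical pattern: matches only the [MSM] STAGE1 lines in the format above.
PATTERN = re.compile(
    r"\[MSM\] STAGE1-(START|DONE) "
    r"\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\+00:00 on ([\w.-]+)\)"
)


def stage1_durations(log):
    """Return {mirror: seconds between its STAGE1-START and STAGE1-DONE}."""
    starts, durations = {}, {}
    for event, stamp, host in PATTERN.findall(log):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if event == "START":
            starts[host] = when
        elif host in starts:
            durations[host] = int((when - starts[host]).total_seconds())
    return durations


if __name__ == "__main__":
    for host, seconds in stage1_durations(LOG).items():
        print(f"{host}: {seconds}s")
```

on this run that gives 43s for web-fsn-02, 12s for hetzner-hel1-03, 39s for web-chi-03 and 214s for web-fsn-01, so most of the wall-clock time in stage 1 is spent waiting on web-fsn-01.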
an strace on the mirror side of things (which pulls files from the master with rsync) confirms that we're actually copying all the files over every time:
root@web-fsn-01:~# strace -e file -p 1737
strace: Process 1737 attached
lstat("tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-squint.svg.vykHS5", {st_mode=S_IFREG|0600, st_size=692, ...}) = 0
utimensat(AT_FDCWD, "tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-squint.svg.vykHS5", [UTIME_NOW, {tv_sec=1643982881, tv_nsec=0} /* 2022-02-04T13:54:41+0000 */], AT_SYMLINK_NOFOLLOW) = 0
chmod("tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-squint.svg.vykHS5", 0644) = 0
rename("tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-squint.svg.vykHS5", "tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/laugh-squint.svg") = 0
openat(AT_FDCWD, "/srv/static.torproject.org/mirrors/review.torproject.net/tree-a/tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/laugh-wink.svg", O_RDONLY) = 3
openat(AT_FDCWD, "tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-wink.svg.4T4w9Z", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
lstat("tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-wink.svg.4T4w9Z", {st_mode=S_IFREG|0600, st_size=715, ...}) = 0
utimensat(AT_FDCWD, "tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-wink.svg.4T4w9Z", [UTIME_NOW, {tv_sec=1643982881, tv_nsec=0} /* 2022-02-04T13:54:41+0000 */], AT_SYMLINK_NOFOLLOW) = 0
chmod("tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-wink.svg.4T4w9Z", 0644) = 0
rename("tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh-wink.svg.4T4w9Z", "tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/laugh-wink.svg") = 0
openat(AT_FDCWD, "/srv/static.torproject.org/mirrors/review.torproject.net/tree-a/tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/laugh.svg", O_RDONLY) = 3
openat(AT_FDCWD, "tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh.svg.Vl7z6V", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
lstat("tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh.svg.Vl7z6V", {st_mode=S_IFREG|0600, st_size=595, ...}) = 0
utimensat(AT_FDCWD, "tpo/web/support/l10n/static/fonts/fontawesome/svgs/regular/.laugh.svg.Vl7z6V", [UTIME_NOW, {tv_sec=1643982881, tv_nsec=0} /* 2022-02-04T13:54:41+0000 */], AT_SYMLINK_NOFOLLOW) = 0
so it seems we're not managing the timestamps correctly somehow. are we correctly passing --checksum from GitLab CI? needs more testing and investigation, but this should be lightning fast.
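for context on why timestamps matter here: by default rsync skips a file when its size and modification time match on both sides (the "quick check"), and with --checksum it compares content checksums instead of modification times. the sketch below is a simplified model of that documented decision, not rsync's actual code. the helper names needs_transfer and file_digest are hypothetical, and SHA-256 only stands in for whatever checksum rsync really uses.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Content hash of a file (stand-in for rsync's own checksum)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def needs_transfer(src, dst, checksum=False):
    """Simplified model of rsync's skip decision for one file."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True  # different size: always transferred
    if checksum:
        # --checksum: same size, so compare content instead of mtime
        return file_digest(src) != file_digest(dst)
    # default quick check: same size, so compare modification times
    return int(s.st_mtime) != int(d.st_mtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.svg")
        dst = os.path.join(tmp, "dst.svg")
        for path in (src, dst):
            with open(path, "w") as f:
                f.write("<svg/>")
        # identical content, but the two copies carry different mtimes
        os.utime(src, (1643982881, 1643982881))
        os.utime(dst, (1643985060, 1643985060))
        print(needs_transfer(src, dst))                 # True
        print(needs_transfer(src, dst, checksum=True))  # False
```

if the master copy gets fresh modification times on every deploy, the quick check would see every file as changed even when the content is identical, which would match what the strace shows. that's an assumption to verify, not a confirmed diagnosis.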