Adapt static-shim script for review apps

added Next label

assigned to @anarcat

Oh and I forgot to mention that we'd have a third static mirror component just for review apps, leaving prod and staging well enough alone.

The accompanying GitLab CI snippet would look like this:

deploy_review:
  stage: deploy
  script:
    - echo "static-shim-deploy magic goes here"
  environment:
    name: review/$CI_COMMIT_REF_NAME
    url: https://$SITE_URL/$CI_ENVIRONMENT_SLUG
  only:
    - merge_requests

Reference: https://docs.gitlab.com/ee/ci/environments/#create-a-dynamic-environment

couldn't we just make the static-shim agnostic to environments and delegate this all to gitlab?

in other words, instead of:

  environment:
    name: review/$CI_COMMIT_REF_NAME
    url: https://$SITE_URL/$CI_ENVIRONMENT_SLUG

i'd say:

  environment:
    name: review/$CI_COMMIT_REF_NAME
    url: https://$SITE_SLUG-$CI_ENVIRONMENT_SLUG.torproject.org

or, in other words, we deploy to https://blog-staging.torproject.org instead of blog.torproject.org?

can we have environment-specific project variables?

We can have environment-specific variables, yes. I currently use that in the blog project because I need two sets of SITE_URL and STATIC_GITLAB_SHIM_SSH_PRIVATE_KEY variables to deploy to two different static mirror components, like so: https://gitlab.torproject.org/tpo/web/blog/-/settings/ci_cd

But that's basically what environments are: just scopes. AIUI the url: key is only used for UI display purposes; it can't tell GitLab CI how to deploy to that URL. That's entirely the purview of CI jobs.

So yes, static-shim is already partly environment agnostic. What we need is the ability to rsync only part of a given static component's files, because currently, like GitLab Pages, it's all-or-nothing.

I should add that not only can CI/CD variables be scoped per environment, they can be scoped dynamically, using a wildcard. Since, as I wrote above, we can name our review environments review/$CI_COMMIT_REF_NAME, we can define an SSH key CI/CD variable scoped to review/*, and it will be available to any environment whose name matches that pattern.
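To make the wildcard scoping concrete, here is a minimal sketch of the matching rule as described above. It only illustrates the semantics (a `*` in the scope matches any run of characters); it is not GitLab's implementation, and the function name `scope_matches` is made up for this example.

```python
import re


def scope_matches(scope: str, environment: str) -> bool:
    """Return True if a variable scoped to `scope` would apply to `environment`.

    Assumption: `*` is the only wildcard and matches any sequence of
    characters; everything else is compared literally.
    """
    # Escape the literal parts, then join them with ".*" where the "*" was.
    pattern = ".*".join(re.escape(part) for part in scope.split("*"))
    return re.fullmatch(pattern, environment) is not None


if __name__ == "__main__":
    for env in ("review/new-blog-post", "staging", "production"):
        print(env, scope_matches("review/*", env))
```

With this rule, a key scoped to review/* is visible to review/new-blog-post but not to staging or production, which is what keeps the review deploy key away from the other two mirrors.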

So yes, static-shim is already partly environment agnostic. What we need is the ability to rsync only part of a given static component's files, because currently, like GitLab Pages, it's all-or-nothing.

so i started writing a long reply explaining why that's not possible, linking to the design docs of both the staticsync and static shim... and now i'm not sure.

i think you may be onto something, and this might be worth a try. i would worry about:

- breaking the general case: what happens when there's no magic argument?
- chroot escapes: right now we do not accept any user-provided argument in the shim wrapper, if we just pass along the argument, an attacker could escape their sandbox and start writing outside of our predefined root. we could sanitize the string, but that's risky business, especially in bash
- stray environments: what happens if somehow we forget to disable an environment? won't this leave stray files around?
- subdir breakage: one of the problems we had with multiple environments in gitlab pages is that we had to hack at lektor to make it happy to run in a subdir, was that fixed at all?
- naming: what would that URL actually be, say, for the blog? we already have blog.tpo, blog-staging.tpo... blog-review.tpo/$REF?

but yes, that could just work! :) maybe i could figure out something tomorrow.

breaking the general case: what happens when there's no magic argument?

The idea is without the magic argument, it would function as it does right now, which is to rsync the DocumentRoot.
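As a rough sketch of that fallback, the target directory could be computed as below. The base path /srv/static-gitlab-shim appears elsewhere in this thread; the function name `sync_target` and its parameters are hypothetical, and the subdirectory is assumed to have been validated before it gets here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Base directory as mentioned in this thread; treat it as an assumption.
BASE = PurePosixPath("/srv/static-gitlab-shim")


def sync_target(site_url: str, remote_subdir: Optional[str] = None) -> PurePosixPath:
    """Return the directory a deploy should be synced into.

    Without `remote_subdir` this is the site's DocumentRoot, i.e. today's
    behaviour. With it, the target is a subdirectory below that root.
    `remote_subdir` must already be validated by the caller.
    """
    root = BASE / site_url
    if not remote_subdir:
        return root
    return root / remote_subdir


if __name__ == "__main__":
    print(sync_target("blog.torproject.org"))
    print(sync_target("blog.torproject.org", "new-blog-post"))
```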

chroot escapes: right now we do not accept any user-provided argument in the shim wrapper, if we just pass along the argument, an attacker could escape their sandbox and start writing outside of our predefined root. we could sanitize the string, but that's risky business, especially in bash

That's a serious concern for sure. I'm not certain of the best way we could address it besides accepting only a narrow set of characters in that argument (e.g. [a-z0-9\-_]).
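A minimal sketch of such an allowlist check, assuming the narrow character set suggested above. The function name `validate_subdir` is hypothetical, and the rule that the first character must be a letter or digit is an extra restriction added here so the value can never look like a command-line option.

```python
import re

# Lowercase letters, digits, "-" and "_" only. No "/" or ".", so neither
# "../" nor absolute paths can get through.
_SAFE_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def validate_subdir(name: str) -> str:
    """Return `name` unchanged if it is safe to use as a subdirectory name.

    Raises ValueError otherwise. Rejecting bad input outright is simpler to
    reason about than trying to "clean up" a hostile string.
    """
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe subdirectory name: {name!r}")
    return name


if __name__ == "__main__":
    print(validate_subdir("new-blog-post"))
    try:
        validate_subdir("../../etc")
    except ValueError as err:
        print(err)
```

Using fullmatch rather than match or search matters here: the whole string has to consist of allowed characters, so a trailing newline or an embedded slash is rejected as well.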

stray environments: what happens if somehow we forget to disable an environment? won't this leave stray files around?

We can set environments to auto-stop after a certain time period. Incidentally, we can also auto-delete environments when the associated branch is deleted.

subdir breakage: one of the problems we had with multiple environments in gitlab pages is that we had to hack at lektor to make it happy to run in a subdir, was that fixed at all?

For Review Apps I think we can live with some level of breakage. For example, I remember there were issues with URLs in the generated RSS/Atom feeds, but I don't think it's a deal-breaker if those bugs pop up in Review Apps. Actually, I was even thinking we might disable some non-essential blog features in Review Apps to speed up builds, like disabling feeds.

naming: what would that URL actually be, say, for the blog? we already have blog.tpo, blog-staging.tpo... blog-review.tpo/$REF?

I think we should start like this, yes. But it would be overkill to have one static mirror component per project so at some point review.tpo/$SITE_URL/$REF would probably be more convenient.

If we plan to go there eventually, we should start there because otherwise we'll have naming clashes.

But it seems like @emmapeel figured out a better way to handle this, with gitlab pages having support for multiple environments:

https://gitlab.torproject.org/tpo/translation/-/environments

https://gitlab.torproject.org/tpo/community/l10n/-/blob/main/ci-templates/lektor-with-more-langs.yml#L51

There's only one problem: index.html doesn't seem to work:

gitlab#114 (closed)

a.

On 2021-11-16 01:35:43, Jérôme Charaoui (@lavamind) wrote:

I think we should start like this, yes. But it would be overkill to have one static mirror component per project so at some point review.tpo/$PROJECT/$REF would probably be more convenient.

-- Antoine Beaupré torproject.org system administration

I'm really not convinced the artifacts preview approach is the right one for reviews. In this scenario, a new environment, with a different name and URL, is created for every set of CI build artifacts, and therefore for every commit pushed to a branch.

What we want to do rather, and this is the concept behind Review Apps, AIUI, is tie a single review environment to a specific merge request which may see several builds/CI runs in its lifetime as the MR is developed and fine-tuned.

okay, i'll give it a shot then.

progress update: i rewrote the shim in python and added unit tests in preparation for this change, but i'm not sure how to actually do it. the arguments to the script are hardcoded in the authorized_keys file through puppet right now, i am not sure how to pass parameters through without starting to parse the rsync commandline...
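For illustration only, this is roughly what "parsing the rsync commandline" would involve. When a forced command is set in authorized_keys, sshd exposes what the client asked for in the SSH_ORIGINAL_COMMAND environment variable. The sketch below takes that string as a parameter; the function name is hypothetical, and the exact rsync option string in the example is illustrative, since it varies between rsync versions.

```python
import shlex


def requested_destination(original_command: str) -> str:
    """Extract the destination path from an rsync server invocation.

    Assumption: the client command looks like
    "rsync --server <options> . <destination>", which is what an rsync
    client typically sends when pushing files over ssh.
    """
    argv = shlex.split(original_command)
    if len(argv) < 4 or argv[0] != "rsync" or "--server" not in argv:
        raise ValueError("not an rsync server invocation")
    if "--sender" in argv:
        # --sender means the client wants to read files, not write them.
        raise ValueError("read access is not allowed")
    return argv[-1]


if __name__ == "__main__":
    cmd = "rsync --server -logDtpre.iLsfxC --delete . new-blog-post/"
    print(requested_destination(cmd))
```

The returned path is still untrusted input and would need the same strict validation as any other user-provided argument, which is a good part of why hand-rolling this is unattractive.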

In terms of simply having a staging environment, that's already in place, see:

https://gitlab.torproject.org/tpo/web/blog/-/blob/main/.gitlab-ci.yml

https://gitlab.torproject.org/tpo/web/blog/-/environments

What this issue is about is specifically Review Apps, where we would set it up so that we'd have CI deploy builds made from Merge Requests, using dynamic environments.

impressive, that's kind of neat.

mentioned in issue tpo/web/blog#40004 (closed)

added Doing label and removed Next label

another progress update: duh. rrsync already accepts subdirectories, so we just need to change the deploy template to make a different variable for the SUBDIR and REMOTE_SUBDIR.

i still keep the python rewrite because we need an easy way to flush out old environments, and i actually implemented that (the delete-environment command), so that's something we'll have to use too.

@lavamind has started working on a review.tpo site where review apps will get deployed. the next step here is to deploy the python rewrite to replace the .sh script and test the above theory about REMOTE_SUBDIR just working already, guh.

The first would be to allow a CI job to deploy to a subdirectory of the static mirror component DocumentRoot. For example, if the CI job passes a second argument of syncreview-new-blog-post (new-blog-post being the name of a MR) then the wrapper, via exec rrsync -wo "/srv/static-gitlab-shim/$SITE_URL/$REMOTE_SUBDIR" would rsync the artifacts to a /syncreview-new-blog-post subdirectory on the static mirror, below the DocumentRoot.

this is done, in the sense that we can already provide subdirectories to rsync. in fact, it was already possible to do so. we just hadn't realized it. don't forget to use --mkpath.
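A sketch of how a deploy job might assemble that rsync call. The function name, the `remote` value and the choice of -a and --delete are assumptions for illustration; --mkpath is the option mentioned above, and it needs a reasonably recent rsync (it was added in 3.2.3).

```python
from typing import List


def deploy_command(local_dir: str, remote: str, remote_subdir: str) -> List[str]:
    """Build the rsync argv for deploying `local_dir` into `remote_subdir`.

    `remote` is "user@host". Because the server side runs rrsync, the
    destination is relative to the directory rrsync is restricted to.
    """
    source = local_dir.rstrip("/") + "/"  # trailing slash: copy the contents
    destination = f"{remote}:{remote_subdir}/"
    return [
        "rsync",
        "-a",        # archive mode: recurse and preserve attributes
        "--delete",  # remove remote files that are gone locally
        "--mkpath",  # create missing destination path components
        source,
        destination,
    ]


if __name__ == "__main__":
    # "user@example.org" is a placeholder, not the real deploy target.
    print(" ".join(deploy_command("public", "user@example.org", "new-blog-post")))
```

Returning a list instead of a shell string means the command can be passed to subprocess.run without going through a shell, so the subdirectory name is never subject to shell interpretation.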

Secondly, we'd need to be able to call that same wrapper to delete such subdirectories when the associated Review Apps are no longer needed. For example, if we re-use the second-argument idea above, then something like deletereview-new-blog-post would trigger a rm -rf on that new-blog-post subdirectory.

that is new, and has been implemented as well.
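The actual delete-environment code is not shown in this thread. As a hedged sketch of what such a command has to guard against, assuming one directory per review environment directly below the site root (the function name and signature are hypothetical):

```python
import re
import shutil
from pathlib import Path


def delete_environment(site_root: Path, name: str) -> bool:
    """Remove the review environment `name` below `site_root`.

    Returns True if a directory was removed, False if there was nothing to
    remove. Raises ValueError if `name` is unsafe or resolves outside the root.
    """
    if not re.fullmatch(r"[a-z0-9][a-z0-9_-]*", name):
        raise ValueError(f"unsafe environment name: {name!r}")
    root = site_root.resolve()
    target = (root / name).resolve()
    # Defence in depth: after resolving symlinks, the target must still be
    # a direct child of the root.
    if target.parent != root:
        raise ValueError(f"{name!r} resolves outside of {root}")
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

The name check and the resolved-path check overlap on purpose: the first rejects hostile strings early, the second catches a symlink planted inside the root by an earlier deploy.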

there's sample code in status-site!17 (merged) to do all of this, which requires the simplify branch in ci-templates (ci-templates!25 (merged)).

it should probably be merged with the template at some point, but probably not before it's tested against another static site.

closed

mentioned in merge request ci-templates!25 (merged)
