Skip to content
Snippets Groups Projects
Verified Commit 5452f437 authored by anarcat's avatar anarcat
Browse files

document major issues with the webhook implementation

parent 799ac90a
No related branches found
No related tags found
No related merge requests found
......@@ -18,14 +18,26 @@ TODO: review ticket for possible howtos
## Deploy artifacts manually
If there are concerns that the webhook is not behaving as it should,
it's possible to call the "puller" directly by hand. For example,
given the [Pipeline 13285](https://gitlab.torproject.org/tpo/tpa/status-site/-/pipelines/13285) has [job 38077](https://gitlab.torproject.org/tpo/tpa/status-site/-/jobs/38077), we can tell the
puller to deploy in debugging mode with this command:
If a site is not deploying normally, it's possible to deploy a site by
hand by downloading and extracting the artifacts using the
`static-gitlab-shim-pull` script.
For example, given the [Pipeline 13285](https://gitlab.torproject.org/tpo/tpa/status-site/-/pipelines/13285) has [job 38077](https://gitlab.torproject.org/tpo/tpa/status-site/-/jobs/38077), we can
tell the puller to deploy in debugging mode with this command:
sudo -u static-gitlab-shim /usr/local/bin/static-gitlab-shim-pull --artifacts-url https://gitlab.torproject.org/tpo/tpa/status-site/-/jobs/38077/artifacts/download --site-url status.torproject.org --debug
The `--artifacts-url` is the `Download` link in the job page.
The `--artifacts-url` is the `Download` link in the job page. This
will:
1. download the artifacts (which is a ZIP file)
2. extract them in a temporary directory
3. `rsync --checksum` them to the actual source directory (to avoid
spurious timestamp changes)
4. call `static-update-component` to deploy the site
Note that this script is part of the webhook implementation and might
eventually be retired if that implementation is completely removed.
## Pager playbook
......@@ -84,70 +96,24 @@ during downtimes, updates to websites are not possible.
how is this thing built, basically? -->
TODO: expand and review design.
NOTE: this is worded as if this was already implement, but this
implementation might be incomplete or even inexistent. See [ticket
40364](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40364) for more information on progress.
GitLab can publish pages in the static component mirror network
through the use of a "static shim" deployed on a static source. The
workflow goes a little like this:
1. user pushes a change to GitLab, which ...
2. triggers a CI pipeline
3. CI runner picks up the jobs and builds the website, pushes the
artifacts back to GitLab
4. GitLab fires a [webhook](https://gitlab.torproject.org/help/user/project/integrations/webhooks#pipeline-events), typically on [pipeline events](https://docs.gitlab.com/ee/user/project/integrations/webhooks.html#pipeline-events)
5. webhook receives the ping and authenticates against a
configuration, mapping to a given `static-component` (TODO: allow
list for gitlab?)
6. after authentication, the webhook fires a script
(`static-gitlab-shim-pull`)
7. `static-gitlab-shim-pull` parses the payload from the webhook and
finds the URL for the artifacts
8. it extracts the artifacts in a temporary directory
9. it runs `rsync -c` into the local static source, to avoid
resetting timestamps
10. it fires the static-update-component command to propagate changes
to the rest of the static-component system
A subset of those steps can be seen in the following design:
![Design of the static-shim](static-shim/architecture-static-shim.png)
The shim components run on a separate static-source, called
`static-gitlab-shim-source`. This is done to avoid adding complexity
to the already complicated, general purpose static source
(`staticiforme`). This has the added benefit that the source can be
hardened in the sense that access is restricted to TPA (which is not
the case of `staticiforme`).
The mapping between webhooks and static components is established in
Puppet, which generates the secrets and writes it to the webhook
configuration, along with the `site_url` which corresponds to the site
URL in the `static-components.yaml` file. This is done to ensure that
a given GitLab project only has access to a single site and cannot
overwrite other sites.
This involves that each site configured in this way must have a
secret token (in Trocla) and configuration (in Hiera) created by TPA
in Puppet. The secret token must also be configured in the GitLab
project. This could be automated by the judicious use of the GitLab
API using admin credentials, but considering that new sites are not
created very frequently, it could also be done by hand.
TODO: design still in flux, see "alternatives considered" below.
## Issues
<!-- such projects are never over. add a pointer to well-known issues -->
<!-- and show how to report problems. usually a link to the bugtracker -->
There is no issue tracker specifically for this project, [File][] or
[search][] for issues in the [team issue tracker][search].
[File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
[search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues
This service was designed in [ticket 40364](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40364).
* the webhook implementation fails if sites take more than 10 seconds
to deploy.
* the webhook implementation doesn't provide much visibility on
failures or progress, to see the list of recent webhook calls, head
to Settings -> Webhooks -> Edit -> Recent deliveries
## Maintainer, users, and upstream
The shim was written by anarcat and is maintained by TPA. It is used
......@@ -239,8 +205,14 @@ and could be reconstructed from pipelines in case of a total disaster.
## Alternatives considered
We briefly considered using GitLab's [CI deployment mechanism](https://about.gitlab.com/blog/2021/02/05/ci-deployment-and-environments/)
instead of webhooks, but decided against it for the following reasons:
This shim was designed to replace Jenkins with GitLab CI. As suche,
other options were considered, see also the [Jenkins documentation](service/jenkins#gitlab-ci)
and [ticket 40364](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40364) on those.
### CI deployment
We considered using GitLab's [CI deployment mechanism](https://about.gitlab.com/blog/2021/02/05/ci-deployment-and-environments/) instead of
webhooks, but originally decided against it for the following reasons:
* the complexity is similar: both need a shared token between GitLab
and the static source
......@@ -263,6 +235,70 @@ instead of webhooks, but decided against it for the following reasons:
exception and is more error prone (e.g. if we somehow forget the
`command=` override, we open full shell access)
NOTE: See also the [Jenkins documentation](service/jenkins#gitlab-ci) and [ticket 40364](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40364)
for more information on the discussion on the different options that
were considered in the Jenkins retirement.
### webhook deployment
A designed based on GitLab webhooks was established, with a workflow
that goes something like this:
1. user pushes a change to GitLab, which ...
2. triggers a CI pipeline
3. CI runner picks up the jobs and builds the website, pushes the
artifacts back to GitLab
4. GitLab fires a [webhook](https://gitlab.torproject.org/help/user/project/integrations/webhooks#pipeline-events), typically on [pipeline events](https://docs.gitlab.com/ee/user/project/integrations/webhooks.html#pipeline-events)
5. webhook receives the ping and authenticates against a
configuration, mapping to a given `static-component` (TODO: allow
list for gitlab?)
6. after authentication, the webhook fires a script
(`static-gitlab-shim-pull`)
7. `static-gitlab-shim-pull` parses the payload from the webhook and
finds the URL for the artifacts
8. it extracts the artifacts in a temporary directory
9. it runs `rsync -c` into the local static source, to avoid
resetting timestamps
10. it fires the static-update-component command to propagate changes
to the rest of the static-component system
A subset of those steps can be seen in the following design:
![Design of the static-shim](static-shim/architecture-static-shim.png)
The shim components runs on a separate static-source, called
`static-gitlab-shim-source`. This is done to avoid adding complexity
to the already complicated, general purpose static source
(`staticiforme`). This has the added benefit that the source can be
hardened in the sense that access is restricted to TPA (which is not
the case of `staticiforme`).
The mapping between webhooks and static components is established in
Puppet, which generates the secrets and writes it to the webhook
configuration, along with the `site_url` which corresponds to the site
URL in the `static-components.yaml` file. This is done to ensure that
a given GitLab project only has access to a single site and cannot
overwrite other sites.
This involves that each site configured in this way must have a
secret token (in Trocla) and configuration (in Hiera) created by TPA
in Puppet. The secret token must also be configured in the GitLab
project. This could be automated by the judicious use of the GitLab
API using admin credentials, but considering that new sites are not
created very frequently, it could also be done by hand.
Unfortunately this design has two major flaws:
1. webhooks are designed to be fast and short-lived: most site
deployments take longer than the preconfigured webhook timeout (10
seconds) and therefore cannot be deployed synchronously, which
implies that...
2. webhook cannot propagate deployment errors back to the user
meaningfully: even if they run synchronously, errors in webhooks
do not show up in the CI pipeline, assuming the webhook manages to
complete at all. if the webhook fails to complete in time, no
output is available to the user at all. running asynchronously is
even worse as deployment errors do not show up in GitLab at all
and would require special monitoring by TPA, instead of delegating
that management to users.
In the short term, the webhook system might be used asynchronously,
but in the long term, we're considering switching back to the
deployment system documented above.
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment