Verified Commit f0dfbd69 authored by anarcat's avatar anarcat

update crm docs to latest template, document block list

parent 654a57db
......@@ -166,6 +166,24 @@ stack.
The Puppet classes used on the two servers are
`roles::civicrm_int_2018` and `roles::civicrm_ext_2018`.
## Upgrades
As stated above, a new donation campaign involves changes to both the
static website (`donate.tpo`) and the CiviCRM server.
Changes to the CiviCRM server and donation middleware can be deployed
progressively through the test/staging/production sites, which all
have their own databases.
TODO: clarify the GiantRabbit deployment workflow. They seem to have
one branch per environment, but what does that include? Does it matter
for us?
There's a `drush` script that edits the dev/stage databases to
replace PII in general, and in particular change the email of everyone
to dummy aliases so that emails sent by accident wouldn't end up in
real people's mail boxes.
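
The email rewriting idea can be illustrated with a short Python sketch. This is not the actual `drush` script, whose contents are not shown here: the function name, the `example.invalid` placeholder domain and the hashing scheme are all assumptions made for illustration.

```python
import hashlib

# Hypothetical placeholder domain: the ".invalid" TLD is reserved and
# never resolves, so mail sent to it cannot reach a real mailbox.
DUMMY_DOMAIN = "example.invalid"


def dummy_alias(email: str) -> str:
    """Map a real address to a stable, non-deliverable dummy alias."""
    # Normalize so that "A@x.org" and "a@x.org " map to the same alias.
    normalized = email.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
    return f"user-{digest}@{DUMMY_DOMAIN}"
```

Deriving the alias from a hash keeps it stable across runs, which can help when comparing records between environments. A plain hash is not strong anonymization on its own, so a real script might prefer random or sequential aliases.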
## SLA
This service is critical, as it is used to host donations, and should
......@@ -173,9 +191,9 @@ be as highly available as possible. Unfortunately, its design has
multiple single points of failure, which, in practice, makes this
target difficult to fulfill at this point.
## Design
## TODO Design
### Services
## Services
The CRM service is built with two distinct servers:
......@@ -213,7 +231,17 @@ be setup both inside the static site and CiviCRM.
The monthly newsletter is configured on CiviCRM and also archived on
the <https://newsletter.torproject.org> static site.
### Authentication
## Queues
CiviCRM can hold a large queue of emails to send when a new
newsletter is generated. This, in turn, can result in large Postfix
email queues when CiviCRM releases those mails into the email system.
TODO: It's unclear what other queues might exist in the system, Redis?
## TODO: Interfaces
## Authentication
The `crm-int-01` server doesn't talk to the outside internet and can
be accessed only via HTTP authentication.
......@@ -249,32 +277,6 @@ server) accounts, e.g.
crm-ext-01$ sudo -u tordonate git -C /srv/donate.torproject.org/htdocs-stag/ status
### Queues
CiviCRM can hold a large queue of emails to send when a new
newsletter is generated. This, in turn, can result in large Postfix
email queues when CiviCRM releases those mails into the email system.
TODO: It's unclear what other queues might exist in the system, Redis?
### Deployment
As stated above, a new donation campaign involves changes to both the
static website (`donate.tpo`) and the CiviCRM server.
Changes to the CiviCRM server and donation middleware can be deployed
progressively through the test/staging/production sites, which all
have their own databases.
TODO: clarify the GiantRabbit deployment workflow. They seem to have
one branch per environment, but what does that include? Does it matter
for us?
There's a `drush` script that edits the dev/stage databases to
replace PII in general, and in particular change the email of everyone
to dummy aliases so that emails sent by accident wouldn't end up in
real people's mail boxes.
### Stripe card testing
A common problem for non-profits that accept donations via Stripe is "card testing". Card testing is the practice of making small transactions with stolen credit card information to check that the card information is correct and the card is still working. Card testing impacts organizations negatively in several ways: in addition to the bad publicity of taking money from the victims of credit card theft, Stripe will automatically block transactions they deem to be suspicious or fraudulent. Stripe's automated fraud-blocking costs only a very small amount of money per blocked transaction, but when tens of thousands of transactions start getting blocked, tens of thousands of dollars can suddenly disappear. It's important for the safety of credit card theft victims and for the safety of the organization to crush card testing as fast as possible.
......@@ -298,6 +300,19 @@ Blocking IP ranges is not a silver bullet. The standard is to block all non-resi
As mentioned above, metrics are the biggest tool in the fight against card testing. Before you can do anything or even realize that you're being card tested, you'll need metrics. Metrics will let you identify card testers, or even let you know it's time to turn off donations before you get hit with a $10,000 bill from Stripe. Even if your card testing opponents are smart, and use wildly varying IP ranges from different autonomous systems, metrics will show you that you're seeing an abnormally large (and expensive) number of blocked donations.
Sometimes, during attacks, log analysis is performed on the
`ratelimit.log` file (below) to ban certain botnets. The block list is
maintained in Puppet (`modules/profile/files/crm-blocklist.txt`) and
deployed to `/srv/donate.torproject.org/blocklist.txt`. That file is
hooked into the web server, which returns a 403 error when a matching
entry is present. A possible improvement would be to proactively add
IPs to the list once they cross a certain threshold, and to redirect
blocked users to a 403 page instead of returning a bare error code.
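
The threshold idea could be prototyped along these lines. This is a sketch, not the tooling in use: the log line format (client IP as the first whitespace-separated field), the function name and the default threshold are all assumptions.

```python
from collections import Counter
import ipaddress


def ips_over_threshold(log_lines, threshold=100):
    """Return the IPs seen more than `threshold` times, in sorted order.

    Assumes (hypothetically) that each log line starts with the client IP.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            # Validate and normalize the address; skip unparseable lines.
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue
        counts[str(ip)] += 1

    offenders = [ipaddress.ip_address(ip) for ip, n in counts.items() if n > threshold]
    # Sort IPv4 before IPv6, then numerically within each family.
    offenders.sort(key=lambda ip: (ip.version, int(ip)))
    return [str(ip) for ip in offenders]
```

The returned addresses would still need review before being added to the block list in Puppet, since a shared or NATed address could cross the threshold with legitimate traffic.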
## TODO Implementation
## TODO Related services
## Issues
Since there are many components, here's a table outlining the known
......@@ -325,7 +340,7 @@ in the [TPA team issue tracker][search].
[File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
[search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues
## Maintainer, users, and upstream
## Maintainer
CiviCRM, the PHP application and the Javascript component on
`donate-static` are all maintained by the external CiviCRM
......@@ -336,17 +351,27 @@ donations teams), except for the Javascript component. Any major
modification that also involves some Civi development is done by the
CiviCRM contractors.
## Monitoring and testing
## Users
## Upstream
## Monitoring and metrics
Like other TPA servers, the CRM servers are monitored by
[Nagios](howto/nagios). The Redis server (and the related IPsec tunnel) is
specifically monitored by Nagios, using a special `PING` check, to
make sure both ends can talk to each other.
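
For illustration, a `PING` check of this kind could look like the Python sketch below. It is not the check deployed in Nagios: the function name is hypothetical, and the sketch assumes Redis is reachable on its default port (6379) without authentication. It relies only on the Redis protocol behaviour that an inline `PING` command is answered with `+PONG`.

```python
import socket

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_redis_ping(host, port=6379, timeout=5.0):
    """Send an inline PING to Redis and return a (status, message) pair."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable hosts.
        return CRITICAL, f"CRITICAL: cannot reach {host}:{port}: {exc}"
    if reply.startswith(b"+PONG"):
        return OK, "OK: Redis answered PONG"
    return CRITICAL, f"CRITICAL: unexpected reply {reply!r}"
```

Run from one end of the tunnel against the other, a check like this exercises both the IPsec path and the Redis server in one go. A plugin wrapper would print the message and exit with the status code.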
There's also [Prometheus](howto/prometheus) monitoring with graphs rendered by
[Grafana](howto/grafana). This includes an elaborate [Postfix dashboard](https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail?orgId=1&from=now-24h&to=now&var-node=eugeni.torproject.org&var-node=crm-int-01.torproject.org)
watching the two mail servers.
Exceptions are logged and emailed by Slim and the donation processor
(which deal with the Redis backend). See the logging configuration
below.
## Tests
### Donation tests
The donation process can be tested without a real credit card. When the
......@@ -360,11 +385,7 @@ inserted into the staging CiviCRM instance.
[Stripe test credit card numbers]: https://stripe.com/docs/testing?testing-method=card-numbers#cards
## Logs and metrics
Like other TPA servers, the CRM servers are monitored by [Prometheus](howto/prometheus)
with graphs rendered by [Grafana](howto/grafana). This includes an elaborate
[Postfix dashboard](https://grafana.torproject.org/d/Ds5BxBYGk/postfix-mtail?orgId=1&from=now-24h&to=now&var-node=eugeni.torproject.org&var-node=crm-int-01.torproject.org) watching the two mail servers.
## Logs
The donate side (on `crm-ext-01.torproject.org`) uses the Monolog
framework for logging. Errors that take place on the production
......@@ -471,6 +492,34 @@ should be fairly up to date, in terms of security issues.
TODO: clarify which versions of CiviCRM, Drupal, Yarn, NVM, PHP,
Redis, and who knows what else are deployed, and whether it matters.
## Security and risk assessment
<!--
5. When was the last security review done on the project? What was
the outcome? Are there any security issues currently? Should it
have another security review?
6. When was the last risk assessment done? Something that would cover
risks from the data stored, the access required, etc.
-->
## Technical debt and next steps
<!--
7. Are there any in-progress projects? Technical debt cleanup?
Migrations? What state are they in? What's the urgency? What's the
next steps?
8. What urgent things need to be done on this project?
-->
## Proposed Solution
<!-- Link to RFC -->
## Goals
<!-- include bugs to be fixed -->
......@@ -489,6 +538,6 @@ Redis, and who knows what else are deployed, and whether it matters.
## Cost
## Alternatives considered
## Other alternatives
<!-- include benchmarks and procedure if relevant -->