CRM stands for "Customer Relationship Management" but we actually use it to manage contacts and donations. It is how we send our massive newsletter once in a while.
Tutorial
Basic access
The main website is at: https://crm.torproject.org
It is protected by basic authentication and by the site's own login, so you actually need two sets of passwords to get in.
To set up basic authentication for a new user, the following command must be executed on the CiviCRM server:
htdigest /etc/apache2/htdigest 'Tor CRM' <username>
Once basic authentication is in place, the Drupal/CiviCRM login page can be accessed at: https://crm.torproject.org/user/login
Howto
Monitoring mailings
The CiviCRM server can generate large mailings, on the order of hundreds of thousands of unique email addresses. Those can create significant load on the server if mishandled and, worse, trigger blocking at various providers if not correctly rate-limited.
For this, we have various knobs and tools:
- Grafana dashboard watching the two main mailservers
- Place to enable/disable mailing (grep for "Send sched...")
- Where the batches are defined
- The CiviMail interface should show the latest mailings (after clicking twice on "STARTED"); from there, click the Report button to see how many mails have been sent, bounced, etc.
The Grafana dashboard is based on metrics from Prometheus, which can be inspected live with the following command:
curl -s localhost:3903/metrics | grep -v -e ^go_ -e '^#' -e '^mtail' -e ^process -e _tls_; postfix-queues-sizes
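To pick out a single metric from that output, for example in an ad-hoc script, a small filter over the Prometheus text exposition format is enough. This is a minimal sketch: metric_value is a hypothetical helper, the metric name in the example is a placeholder (not necessarily one exported on port 3903), and it assumes label values contain no spaces.

```shell
# Hypothetical helper: print every sample of metric NAME (with its labels)
# found in Prometheus text exposition format read on stdin.
# Assumes label values contain no spaces, so the sample value is field 2.
metric_value() {
    name="$1"
    awk -v name="$name" '
        /^#/ { next }              # skip HELP and TYPE comment lines
        {
            series = $1            # metric name plus optional {labels}
            bare = series
            sub(/[{].*/, "", bare) # strip labels to get the bare name
            if (bare == name) print series, $2
        }'
}
# Example (placeholder metric name):
# curl -s localhost:3903/metrics | metric_value some_metric_total
```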
Using lnav can also be useful to monitor logs in real time, as it provides per-queue ID navigation, marks warnings (deferred messages) in yellow and errors (bounces) in red.
A few commands to inspect the email queue:
- list the queue, with more recent entries first (compact output keeps one recipient per line, so tac reverses entries rather than the lines inside each JSON object):
  postqueue -j | jq -c -C '.recipients[]' | tac
- find how many emails are in the queue, per domain:
  postqueue -j | jq -r '.recipients[].address' | sed 's/.*@//' | sort | uniq -c | sort -n
Note that the qshape deferred command gives a similar (and actually better) output.
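When checking whether a specific provider is throttling us, it can help to list only the domains with more than a given number of queued recipients. A minimal sketch, assuming addresses are fed one per line (for example from the jq per-domain command above); busy_domains and the threshold value are illustrative, not an existing tool.

```shell
# Hypothetical helper: read one email address per line on stdin and print
# "COUNT DOMAIN" for each domain with more than THRESHOLD recipients,
# busiest domain first. Domains are lowercased before counting.
busy_domains() {
    threshold="$1"
    sed 's/.*@//' |
        tr '[:upper:]' '[:lower:]' |
        sort | uniq -c |
        awk -v t="$threshold" '$1 > t { print $1, $2 }' |
        sort -rn
}
# Example:
# postqueue -j | jq -r '.recipients[].address' | busy_domains 1000
```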
In case of a major problem, you can stop the mailing in CiviCRM and put all emails on hold with:
postsuper -h ALL
Then the postfix-trickle script can be used to slowly release emails:
postfix-trickle 10 5
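The postfix-trickle script itself is not reproduced here, and this page does not document what its two arguments mean. As an illustration of the general idea only, the sketch below releases held messages in small batches with a pause between them. trickle_release is hypothetical; it reads queue IDs on stdin and feeds each batch, on standard input, to the command given as its remaining arguments. For real use that command could be postsuper -H -, which releases held messages whose queue IDs are read from standard input.

```shell
# Hypothetical sketch, NOT the actual postfix-trickle script.
# Usage: trickle_release BATCH DELAY COMMAND [ARGS...]
# Reads queue IDs (one per line) on stdin and runs COMMAND once per batch
# of BATCH IDs, passing the IDs on its stdin, sleeping DELAY seconds
# after each full batch.
trickle_release() {
    batch="$1"; delay="$2"; shift 2
    tmp=$(mktemp) || return 1
    count=0
    while IFS= read -r id; do
        [ -n "$id" ] || continue
        printf '%s\n' "$id" >> "$tmp"
        count=$((count + 1))
        if [ "$count" -ge "$batch" ]; then
            "$@" < "$tmp"      # release this batch
            : > "$tmp"         # empty the batch file
            count=0
            sleep "$delay"
        fi
    done
    # Release whatever is left in the last, partial batch.
    if [ "$count" -gt 0 ]; then
        "$@" < "$tmp"
    fi
    rm -f "$tmp"
}
# Example: release held messages 10 at a time, every 60 seconds.
# postqueue -j | jq -r 'select(.queue_name == "hold") | .queue_id' |
#     trickle_release 10 60 postsuper -H -
```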
When an email bounces, it should go to civicrm@crm.torproject.org, which is an IMAP mailbox periodically checked by CiviCRM. CiviCRM ingests bounces landing in that mailbox and disables the bouncing addresses for the next mailings. It's also how users can unsubscribe from those mailings, so it is critical that this service runs correctly.
A lot of those notes come from the issue where we enabled CiviCRM to receive its bounces.
Handling abuse complaints
Our postmaster alias can receive emails like this:
Subject: Abuse Message [AbuseID:809C16:27]: AbuseFBL: UOL Abuse Report
Those emails usually contain enough information to figure out which email address filed a complaint. The action to take is to remove that address from the mailing. Here's a sample email:
Received: by crm-int-01.torproject.org (Postfix, from userid 33)
id 579C510392E; Thu, 4 Feb 2021 17:30:12 +0000 (UTC)
[...]
Message-Id: <20210204173012.579C510392E@crm-int-01.torproject.org>
[...]
List-Unsubscribe: <mailto:civicrm+u.2936.7009506.26d7b951968ebe4b@crm.torproject.org>
job_id: 2936
Precedence: bulk
[...]
X-CiviMail-Bounce: civicrm+b.2936.7009506.26d7b951968ebe4b@crm.torproject.org
[...]
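The job number in the List-Unsubscribe and X-CiviMail-Bounce addresses matches the job_id header, so those addresses are handy to parse. A minimal sketch, assuming the local part always follows the civicrm+ACTION.JOB.QUEUE.HASH pattern seen above, where the action letter is u or b as the headers they appear in suggest; the names given to the third and fourth fields are our interpretation, and parse_civimail_address is hypothetical.

```shell
# Hypothetical helper: split a CiviMail VERP-style address such as
# civicrm+b.2936.7009506.26d7b951968ebe4b@crm.torproject.org into
# "ACTION JOB_ID QUEUE_ID HASH". The meaning of QUEUE_ID and HASH is an
# assumption. Prints nothing and returns non-zero on other input.
parse_civimail_address() {
    printf '%s\n' "$1" |
        sed -n 's/^[^+]*+\([a-z]\)\.\([0-9]*\)\.\([0-9]*\)\.\([0-9a-f]*\)@.*$/\1 \2 \3 \4/p' |
        grep .
}
# Example:
# parse_civimail_address 'civicrm+b.2936.7009506.26d7b951968ebe4b@crm.torproject.org'
```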
Your bounce might have only some of those headers. Possible courses of action to find the complainant's email address:
- Grep for the queue ID (579C510392E) in the mail logs
- Grep for the Message-Id (20210204173012.579C510392E@crm-int-01.torproject.org) in the mail logs (with postfix-trace)
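Message-Ids generated by Postfix, like the one above, embed the queue ID after the timestamp, so one can be derived from the other and then used to find the recipient in the logs. A minimal sketch: both helper names are hypothetical, the log parsing assumes the usual Postfix "QUEUEID: to=<address>" syslog lines, and the log file path in the example is an assumption.

```shell
# Hypothetical helper: extract the Postfix queue ID from a Message-Id
# generated by Postfix, of the form <TIMESTAMP.QUEUEID@HOSTNAME>
# (angle brackets optional).
queue_id_from_message_id() {
    printf '%s\n' "$1" |
        sed -n 's/^<\{0,1\}[0-9]*\.\([0-9A-Za-z]*\)@.*$/\1/p'
}

# Hypothetical helper: print the recipient addresses logged for queue ID
# $1, reading Postfix log lines on stdin.
recipients_for_queue_id() {
    grep -F " $1: to=<" | sed 's/.* to=<\([^>]*\)>.*/\1/' | sort -u
}
# Example (log path is an assumption):
# id=$(queue_id_from_message_id '<20210204173012.579C510392E@crm-int-01.torproject.org>')
# recipients_for_queue_id "$id" < /var/log/mail.log
```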
Once you have the email address:
- head to the CiviCRM search interface to find that user
- remove them from the "Tor News" group, in the Group tab
Another option is to go to Donor record > Edit communication preferences and check "Do not email".
Alternatively, you can just send an email to the List-Unsubscribe
address or click the "unsubscribe" links at the bottom of the email.
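To script the List-Unsubscribe route, the mailto address can be pulled out of a saved copy of the message (for an abuse report, the attached original). A minimal sketch, assuming the header fits on one line and uses the exact "List-Unsubscribe:" capitalization seen in the sample above; unsubscribe_address is hypothetical, and the sendmail call is left commented out on purpose.

```shell
# Hypothetical helper: read a saved email on stdin and print the first
# mailto: address found in a List-Unsubscribe header line. Any "?query"
# part of the mailto URI is dropped. Assumes a single-line header.
unsubscribe_address() {
    sed -n 's/^List-Unsubscribe:.*<mailto:\([^>?]*\)[?>].*/\1/p' | head -n 1
}
# Example:
# addr=$(unsubscribe_address < complaint.eml)
# printf 'Subject: unsubscribe\n\n' | /usr/sbin/sendmail "$addr"
```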
The handle-abuse.py script in fabric-tasks.git automatically handles the CiviCRM bounces that way. Support for other bounces should be added there when possible.
Special cases should be reported to the CiviCRM admin by forwarding the email to the Giving queue in RT.
Sometimes complaints come in about Mailman lists. Those are harder to handle because they do not have individual bounce addresses...
Granting access to the CiviCRM backend
The main CiviCRM site is protected by Apache-based authentication, which only TPA can manage. To add a user, run the following on the backend server (currently crm-int-01):
htdigest /etc/apache2/htdigest 'Tor CRM' $USERNAME
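Before adding a user, it can be useful to check whether they already have an entry, since htdigest files store one user:realm:hash line per account. A minimal sketch; htdigest_has_user is a hypothetical helper, not part of the Apache tools.

```shell
# Hypothetical helper: succeed if USER already has an entry for REALM in
# the htdigest FILE (lines of the form user:realm:hash), fail otherwise.
htdigest_has_user() {
    file="$1"; realm="$2"; user="$3"
    awk -F: -v r="$realm" -v u="$user" '
        $1 == u && $2 == r { found = 1 }
        END { exit !found }' "$file"
}
# Example:
# htdigest_has_user /etc/apache2/htdigest 'Tor CRM' "$USERNAME" && echo "already present"
```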
Rotating API tokens
See the donate site docs for this.
Pager playbook
Security breach
If there's a major security breach on the service, the first thing to do is probably to shut down the CiviCRM server completely. Halt the crm-int-01 and crm-ext-01 machines completely, and remove the attacker's access to the underlying storage.
Then API key secrets should probably be rotated; follow the Rotating API tokens procedure.
Disaster recovery
If Redis dies, we might lose in-process donations. But otherwise, it is disposable and data should be recreated as needed.
If the entire database gets destroyed, it needs to be restored from backups, by TPA.
Reference
Installation
Full documentation on the installation of this system is somewhat out of scope for TPA: sysadmins only installed the servers and set up basic services like a VPN (using IPsec) and an Apache, PHP, MySQL stack.
The Puppet class used on the CiviCRM server is roles::civicrm_int_2018. That naming convention reflects the fact that, before donate-neo, there used to be another role named roles::civicrm_ext_2018 for the frontend, retired in tpo/tpa/team#41511.
Upgrades
As stated above, a new donation campaign involves changes to both the donate-neo site (donate.tpo) and the CiviCRM server.
Changes to the CiviCRM server and donation middleware can be deployed progressively through the test/staging/production sites, which all have their own databases. See the donate-neo docs for deployments of the frontend.
TODO: clarify the deployment workflow. They seem to have one branch per environment, but what does that include? Does it matter for us?
There's a drush script that edits the dev/stage databases to replace PII in general, and in particular change the email of everyone to dummy aliases so that emails sent by accident wouldn't end up in real people's mailboxes.
Upgrades are typically handled by the CiviCRM consultant.
See also the CiviCRM upgrade guide.
SLA
This service is critical, as it is used to host donations, and should be as highly available as possible. Unfortunately, its design has multiple single points of failure, which, in practice, makes this target difficult to fulfill at this point.
Design and architecture
Services
The CiviCRM service runs on the crm-int-01 server:
- software:
  - CiviCRM on top of Drupal
  - Drupal has a tor_donation module which has the code to receive/process Redis messages and initiate the corresponding actions in CiviCRM
  - Apache with PHP FPM
  - MariaDB (MySQL) database (Drupal storage backend)
  - Redis cache (?)
  - Dovecot IMAP server (to handle bounces)
- sites:
  - crm.torproject.org: production CiviCRM site
  - staging.crm.torproject.org: staging site
  - test.crm.torproject.org: testing site
The monthly newsletter is configured on CiviCRM and also archived on the https://newsletter.torproject.org static site.
Storage
CiviCRM stores most of its data in a MySQL database. There are separate databases for the dev/staging/prod sites.
TODO: does CiviCRM also write to disk?
Queues
CiviCRM can hold a large queue of emails to send when a new newsletter is generated. This, in turn, can result in large Postfix email queues when CiviCRM releases those mails into the email system.
The donate-neo frontend uses Redis to queue up transactions for CiviCRM. See the queue documentation in donate-neo. Queued jobs are de-queued by CiviCRM's Resque Scheduled Job, and crons, logs, monitoring, etc, all use standard CiviCRM tooling.
If the Resque Processor Job gets stuck because it failed to process an item, it will stop processing completely (assuming it's a bug, or something is wrong). It raises a "kill" flag that can be reset by going to Administer > Tor CRM (settings).
Interfaces
Authentication
The crm-int-01 server doesn't talk to the outside internet and can be accessed only via HTTP authentication.
Users that need to access the CRM must be added to the Apache htdigest file on crm-int-01.tpo and have a CiviCRM account created for them.
To extract a list of CiviCRM accounts and their roles, the following drush command may be executed at the root of the Drupal installation:
drush uinf $(drush sqlq "SELECT GROUP_CONCAT(uid) FROM users")
The SSH server is firewalled (rules defined in Puppet, profile::civicrm). To get access to the port, ask TPA.
Implementation
CiviCRM is a PHP application licensed under the AGPLv3, supporting PHP 8.1 and later at the time of writing. We are currently running CiviCRM 5.73.4, released on May 30th 2024 (as of 2024-08-28); the current version can be found in /srv/crm.torproject.org/htdocs-prod/sites/all/modules/civicrm/release-notes.md on the production server (crm-int-01). See also the upstream release announcements, the GitHub tags page and the release management policy.
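To read the running version from that file without scrolling through it, a one-line filter is enough. A minimal sketch, assuming the newest release is listed first under a heading of the form "## CiviCRM X.Y.Z" (check the actual file layout before relying on it); civicrm_version is a hypothetical helper.

```shell
# Hypothetical helper: print the first version found in a
# "## CiviCRM X.Y.Z" heading of a release-notes.md file read on stdin.
# Assumes the newest release is listed first.
civicrm_version() {
    sed -n 's/^## CiviCRM \([0-9][0-9.]*\).*/\1/p' | head -n 1
}
# Example:
# civicrm_version < /srv/crm.torproject.org/htdocs-prod/sites/all/modules/civicrm/release-notes.md
```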
Upstream also has their own GitLab instance.
CiviCRM has a torcrm extension under sites/all/civicrm_extensions/torcrm which includes most of the CiviCRM customizations, including the Resque Processor job. It replaces the old tor_donation Drupal module, which is being phased out.
Related services
CiviCRM only holds donor information; actual transactions are processed by the donation site, donate-neo.
Issues
Since there are many components, here's a table outlining the known projects and issue trackers for the different sites.
Site | Project | Issues |
---|---|---|
https://crm.torproject.org | project | issues |
https://donate.torproject.org | project | issues |
https://newsletter.torproject.org | project | issues |
Server-level issues should be filed in the TPA team issue tracker.
Upstream CiviCRM has its own StackExchange site and uses GitLab issue queues.
Maintainer
CiviCRM, the PHP application and the JavaScript component on donate-static are all maintained by the external CiviCRM contractors.
Users
Direct users of this service are mostly the fundraising team.
Upstream
Upstream is a healthy community of free software developers producing regular releases. Our consultant is part of the core team.
Monitoring and metrics
Like other TPA servers, the CRM servers are monitored by Nagios. The Redis server (and the related IPsec tunnel) is specifically monitored by Nagios, using a special PING check, to make sure both ends can talk to each other.
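The actual check is defined in Puppet and is not reproduced here. As a sketch of what a PING-style check does under the Nagios plugin conventions (exit 0 for OK, 2 for CRITICAL, one line of status output), assuming a command that prints the server's reply, such as redis-cli -h HOST ping, which answers PONG when Redis is reachable; check_ping_reply is hypothetical and the host name in the example is a placeholder.

```shell
# Hypothetical Nagios-style check: run the given command and expect it to
# print PONG (as "redis-cli ping" does when Redis answers).
# Exit codes follow the Nagios plugin convention: 0 = OK, 2 = CRITICAL.
check_ping_reply() {
    reply=$("$@" 2>&1)
    if [ "$reply" = "PONG" ]; then
        echo "OK: got PONG"
        return 0
    fi
    echo "CRITICAL: unexpected reply: ${reply:-<none>}"
    return 2
}
# Example (placeholder host name):
# check_ping_reply redis-cli -h redis.example.invalid ping
```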
There's also Prometheus monitoring with graphs rendered by Grafana. This includes an elaborate Postfix dashboard watching the two mailservers.
We have a task open to monitor the CiviCRM health better.
Tests
TODO: what to test on major CiviCRM upgrades?
Logs
The CRM side (crm-int-01.torproject.org) has a similar configuration and sends production environment errors via email.
The logging configuration is in:
crm-int-01:/srv/crm.torproject.org/htdocs-prod/sites/all/modules/custom/tor_donation/src/Donation/ErrorHandler.php
Resque processor logs are in the CiviCRM Scheduled Jobs logs under Administer > System Settings > Scheduled Jobs, then find the "Torcrm Resque Processing" job, then view the logs. There may also be fatal errors logged in the general CiviCRM log, under Administer > Admin Console > View Log.
Middleware logs
The PHP middleware responsible for bridging the Redis queue with CiviCRM logs to syslog on crm-int-01. Those logs can be read using journalctl -t processor. They can be useful to determine the cause of donations being submitted but not showing up in CiviCRM.
Backups
Backups are done with the regular backup procedures, except for the MariaDB/MySQL database, which is backed up in /var/backups/local/mysql/. See also the MySQL section in the backup documentation.
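When restoring, the first question is usually which dump is the most recent. A minimal sketch, assuming dumps are regular files directly under that directory (the actual layout and file names may differ) and that names contain no newlines; latest_backup is a hypothetical helper.

```shell
# Hypothetical helper: print the most recently modified regular file in
# DIR, skipping subdirectories. Assumes file names contain no newlines.
latest_backup() {
    dir="$1"
    ls -t "$dir" | while IFS= read -r name; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            break
        fi
    done
}
# Example:
# latest_backup /var/backups/local/mysql
```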
Other documentation
Upstream has a documentation portal where our users will find:
Discussion
This section is reserved for future large changes proposed to this infrastructure. It can also be used to perform an audit on the current implementation.
Overview
The CiviCRM deployment is complex and feels a bit brittle. The separation between the CiviCRM backend and the middleware API evolved from an initial strict, two-server setup into the current three-part setup after the static site frontend was added around 2020. The original two-server separation was performed out of a concern for security: we were worried about exposing CiviCRM to the public, because we felt the attack surface of both Drupal and CiviCRM was too wide to be reasonably defended against a determined attacker.
The downside is, obviously, a lot of complexity, which also makes the service more fragile. The Redis monitoring, for example, was added after we discovered the IPsec tunnel would sometimes fail, which would completely break donations.
Obviously, if either the donation middleware or CiviCRM fails, donations go down as well, so we actually have two single points of failure in that design.
A security review should probably be performed to make sure React, Drupal, its modules, CiviCRM, and other dependencies are all up to date. Other components like Apache, Redis, or MariaDB are managed through Debian packages and supported by the Debian security team, so they should be fairly up to date in terms of security issues.
TODO: clarify which versions of CiviCRM, Drupal, Yarn, NVM, PHP,
Redis, and who knows what else are deployed, and whether it matters.