CRM stands for "Customer Relationship Management", but we actually use CiviCRM to manage contacts and donations. It is also how we send our massive newsletter once in a while.

Tutorial

Basic access

The main website is at:

https://crm.torproject.org/

It is protected both by HTTP authentication in Apache and by the site's own login, so you need two sets of credentials to get in.

To set up HTTP authentication for a new user, run the following command on the CiviCRM server:

```
htdigest /etc/apache2/htdigest 'Tor CRM' <username>
```

Once HTTP authentication is in place, the Drupal/CiviCRM login page can be accessed at: https://crm.torproject.org/user/login
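For reference, each line written by htdigest has the form user:realm:hash, where the hash is the MD5 digest of the string "user:realm:password". The sketch below recomputes such a line. One use is checking whether a password matches an existing entry. It is illustrative only, and the function name is hypothetical. Passing a real password as a command-line argument can leak it into shell history and process listings, so use the interactive htdigest prompt for actual changes.

```shell
# htdigest_line USER REALM PASSWORD: print the line htdigest would store,
# i.e. USER:REALM:MD5("USER:REALM:PASSWORD") as 32 hex digits.
htdigest_line() {
  hash=$(printf '%s:%s:%s' "$1" "$2" "$3" | md5sum | cut -d ' ' -f 1)
  printf '%s:%s:%s\n' "$1" "$2" "$hash"
}

# Example (hypothetical user and password): check an existing entry.
#   grep -qxF "$(htdigest_line alice 'Tor CRM' 'example-password')" /etc/apache2/htdigest && echo match
```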

Howto

Monitoring mailings

The CiviCRM server can generate large mailings, on the order of hundreds of thousands of unique email addresses. Those can create significant load on the server if mishandled, and worse, trigger blocking at various providers if not correctly rate-limited.

For this, we have various knobs and tools:

- Grafana dashboard watching the two main mailservers
- Place to enable/disable mailing (grep for Send sched...)
- Where the batches are defined
- The CiviMail interface should show the latest mailings (when clicking twice on "STARTED"); from there, click the Report button to see how many mails have been sent, bounced, etc.

The Grafana dashboard is based on metrics from Prometheus, which can be inspected live with the following command:

```
curl -s localhost:3903/metrics | grep -v -e ^go_ -e '^#' -e '^mtail' -e ^process -e _tls_; postfix-queues-sizes
```
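During a mailing, you may want a single number from that output instead of the full listing. A small awk filter can sum every sample of one metric across its label sets. This is a minimal sketch that assumes the standard Prometheus text format (name{labels} value [timestamp]). The function name is hypothetical. The metric name in the example is a placeholder, so pick a real one from the output of the command above.

```shell
# metric_sum NAME: read Prometheus text-format metrics on stdin and print the
# sum of every sample of metric NAME, across all of its label sets.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                 # skip HELP and TYPE comment lines
    {
      n = $0
      sub(/[{ ].*/, "", n)        # metric name: text before the first brace or space
      if (n != name) next
      rest = $0
      if (index(rest, "{") > 0)
        sub(/^.*[}]/, "", rest)   # drop the name and its label block
      else
        sub(/^[^ ]+/, "", rest)   # drop the bare metric name
      split(rest, f, " ")         # f[1] = value, f[2] = optional timestamp
      sum += f[1]
    }
    END { print sum + 0 }
  '
}

# Example (placeholder metric name):
#   curl -s localhost:3903/metrics | metric_sum SOME_METRIC_NAME
```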

lnav can also be useful to monitor logs in real time: it lets you navigate by queue ID, and it marks warnings (deferred messages) in yellow and errors (bounces) in red.

A few commands to inspect the email queue:

List the queue recipients, most recent messages first (postqueue itself does not list messages in arrival order):
```
postqueue -j | jq -s -c 'sort_by(.arrival_time) | reverse | .[].recipients[]'
```
Count queued recipients per domain:
```
postqueue -j | jq -r '.recipients[].address' | sed 's/.*@//' | sort | uniq -c | sort -n
```
Note that the qshape deferred command gives a similar (and actually better) output.
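If you only care about one queue (for example the hold queue after a postsuper -h ALL), a small variant of the per-domain count helps. It lowercases domains so that Example.org and example.org are counted together. This is a minimal sketch with a hypothetical function name. The jq filter in the example relies on the queue_name field of postqueue -j output.

```shell
# domain_counts: read one email address per line on stdin and print
# "COUNT DOMAIN" lines, busiest domain last. Domains are lowercased first
# because domain names are case-insensitive.
domain_counts() {
  awk -F@ 'NF > 1 { print tolower($NF) }' | sort | uniq -c | sort -n
}

# Example (needs jq): recipients per domain in the hold queue only.
#   postqueue -j | jq -r 'select(.queue_name == "hold") | .recipients[].address' | domain_counts
```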

In case of a major problem, you can stop the mailing in CiviCRM and put all emails on hold with:

```
postsuper -h ALL
```

Then the postfix-trickle script can be used to slowly release emails:

```
postfix-trickle 10 5
```
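This page does not document how postfix-trickle works or what its two arguments mean. The following is a hypothetical sketch of the same idea, not the script's source, and both function names are made up. It reads held queue IDs on standard input and releases them in batches, pausing between batches. It relies on postsuper -H, which moves messages from the hold queue to the deferred queue and accepts "-" to read queue IDs from standard input.

```shell
# release_ids: release the queue IDs given on stdin. postsuper -H moves them
# from the hold queue to the deferred queue (requires root). Redefine this
# function to do a dry run.
release_ids() {
  postsuper -H -
}

# trickle_release BATCH INTERVAL: read queue IDs one per line on stdin and
# release them BATCH at a time, sleeping INTERVAL seconds between batches.
trickle_release() {
  batch=$1 interval=$2
  chunk='' count=0
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    chunk="$chunk$id
"
    count=$((count + 1))
    if [ "$count" -ge "$batch" ]; then
      printf '%s' "$chunk" | release_ids
      chunk='' count=0
      sleep "$interval"
    fi
  done
  # Flush the last, partial batch.
  if [ -n "$chunk" ]; then
    printf '%s' "$chunk" | release_ids
  fi
}

# Example (needs jq): release held mail 100 messages every 60 seconds.
#   postqueue -j | jq -r 'select(.queue_name == "hold") | .queue_id' |
#     trickle_release 100 60
```

Released messages land in the deferred queue, so they go out on Postfix's next deferred-queue scan. The effective sending rate therefore also depends on queue_run_delay.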

When an email bounces, the bounce should go to civicrm@crm.torproject.org, an IMAP mailbox periodically checked by CiviCRM. CiviCRM ingests the bounces landing in that mailbox and disables the bouncing addresses for future mailings. This mailbox is also how users unsubscribe from those mailings, so it is critical that this service runs correctly.

A lot of those notes come from the issue where we enabled CiviCRM to receive its bounces.

Handling abuse complaints

Our postmaster alias can receive emails like this:

```
Subject: Abuse Message [AbuseID:809C16:27]: AbuseFBL: UOL Abuse Report
```

Those emails usually contain enough information to figure out which recipient filed the complaint. The action to take is to remove that address from the mailing. Here is a sample of the relevant headers from such an email:

```
Received: by crm-int-01.torproject.org (Postfix, from userid 33)
        id 579C510392E; Thu, 4 Feb 2021 17:30:12 +0000 (UTC)
[...]
Message-Id: <20210204173012.579C510392E@crm-int-01.torproject.org>
[...]
List-Unsubscribe: <mailto:civicrm+u.2936.7009506.26d7b951968ebe4b@crm.torproject.org>
job_id: 2936
Precedence: bulk
[...]
X-CiviMail-Bounce: civicrm+b.2936.7009506.26d7b951968ebe4b@crm.torproject.org
[...]
```
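The List-Unsubscribe and X-CiviMail-Bounce addresses in this sample appear to share one structure. After "civicrm+" come an action letter (u for unsubscribe, b for bounce, matching the header names), then the mailing job ID (2936, matching the job_id header), then a second number, then a hash. We assume the second number is CiviCRM's mailing event queue ID, which is unrelated to the Postfix queue ID 579C510392E. That layout is our reading of these headers, not an upstream specification, and the function name below is hypothetical. If the assumption holds, the ID should presumably match a row in CiviCRM's civicrm_mailing_event_queue table, which links the delivery to a contact.

```shell
# parse_civimail_verp ADDRESS: split a CiviMail VERP address into its fields.
# Assumed layout (our reading of the headers above, not an upstream spec):
#   LOCALPART+ACTION.JOB_ID.EVENT_QUEUE_ID.HASH@DOMAIN
parse_civimail_verp() {
  local_part=${1%%@*}             # civicrm+u.2936.7009506.26d7b951968ebe4b
  tag=${local_part#*+}            # u.2936.7009506.26d7b951968ebe4b
  action=${tag%%.*};    rest=${tag#*.}
  job_id=${rest%%.*};   rest=${rest#*.}
  queue_id=${rest%%.*}; hash=${rest#*.}
  printf 'action=%s job_id=%s queue_id=%s hash=%s\n' \
    "$action" "$job_id" "$queue_id" "$hash"
}
```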

A given complaint might include only some of those headers. To find the complaining recipient's address, you can:

- grep for the queue ID (579C510392E) in the mail logs
- grep for the Message-Id (20210204173012.579C510392E@crm-int-01.torproject.org) in the mail logs (with postfix-trace)
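The queue ID lookup can be narrowed down to the recipient address with a short filter. This is a minimal sketch with a hypothetical function name. It assumes syslog-style Postfix log lines, where delivery attempts look like "QUEUEID: to=<address>, relay=..., status=...". It also assumes the logs live in the Debian default /var/log/mail.log, so adjust that path to the actual log setup. Older logs get rotated and may need to be searched separately.

```shell
# recipients_for_queue_id QUEUE_ID: print the recipient address(es) Postfix
# logged for that queue ID. Expects syslog-style Postfix lines on stdin, e.g.
#   ... postfix/smtp[1234]: 579C510392E: to=<user@example.org>, relay=..., status=sent (...)
recipients_for_queue_id() {
  grep -F "$1: to=<" | sed 's/.*: to=<\([^>]*\)>.*/\1/' | sort -u
}

# Example, assuming the Debian default log location:
#   recipients_for_queue_id 579C510392E < /var/log/mail.log
```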

Once you have the email address:

- head to the CiviCRM search interface to find that user
- remove them from the "Tor News" group, in the Group tab

Another option is to go to Donor record > Edit communication preferences and check "Do not email".

Alternatively, you can just send an email to the List-Unsubscribe address or click the "unsubscribe" link at the bottom of the email. The handle-abuse.py script in fabric-tasks.git automatically handles CiviCRM complaints that way. Support for other kinds of complaints should be added there when possible.
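When doing this by hand, extracting the address from a saved complaint can be scripted. The following is a rough sketch of that manual step, not the logic of handle-abuse.py, and the function and file names are hypothetical. It assumes the original message's List-Unsubscribe header sits on a single line and that the embedded message is not base64-encoded. The sending step uses Postfix's sendmail compatibility interface, which reads the message from standard input.

```shell
# unsubscribe_address: print the mailto: address from the first
# List-Unsubscribe header line found on stdin. Limitations: the header must
# fit on one line (no folding) and the message must not be base64-encoded.
unsubscribe_address() {
  sed -n 's/^List-Unsubscribe:.*<mailto:\([^>?]*\)[^>]*>.*/\1/p' | head -n 1
}

# Example (hypothetical file name): send an empty message to that address.
#   addr=$(unsubscribe_address < complaint.eml)
#   printf 'Subject: unsubscribe\n\n' | /usr/sbin/sendmail "$addr"
```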

Special cases should be reported to the CiviCRM admin by forwarding the email to the Giving queue in RT.

Sometimes complaints come in about Mailman lists. Those are harder to handle because they do not have individual bounce addresses.

Granting access to the CiviCRM backend

The main CiviCRM site is protected by Apache-based authentication, which only TPA can manage. To add a user, run this on the backend server (currently crm-int-01):

```
htdigest /etc/apache2/htdigest 'Tor CRM' $USERNAME
```

Rotating API tokens

See the donate site docs for this.

Pager playbook

Security breach

If there's a major security breach on the service, the first thing to do is probably to shut down the CiviCRM server: halt the crm-int-01 and crm-ext-01 machines completely, and remove the attacker's access to the underlying storage.

Then API keys and secrets should probably be rotated: follow the Rotating API tokens procedure.

Disaster recovery

If Redis dies, we might lose in-flight donations. Otherwise, it is disposable and its data is recreated as needed.

If the entire database gets destroyed, it needs to be restored from backups, by TPA.

Reference

Installation

Full documentation on the installation of this system is somewhat out of scope for TPA: sysadmins only installed the servers and set up basic services like a VPN (using IPsec) and an Apache, PHP, MySQL stack.

The Puppet class used on the CiviCRM server is roles::civicrm_int_2018. That naming convention reflects the fact that, before donate-neo, there used to be another role named roles::civicrm_ext_2018 for the frontend, retired in tpo/tpa/team#41511.

Upgrades

As stated above, a new donation campaign involves changes to both the donate-neo site (donate.tpo) and the CiviCRM server.

Changes to the CiviCRM server and donation middleware can be deployed progressively through the test/staging/production sites, which all have their own databases. See the donate-neo docs for deployments of the frontend.

TODO: clarify the deployment workflow. They seem to have one branch per environment, but what does that include? Does it matter for us?

There's a drush script that edits the dev/stage databases to scrub PII. In particular, it replaces everyone's email address with dummy aliases so that emails sent by accident don't end up in real people's mailboxes.

Upgrades are typically handled by the CiviCRM consultant.

SLA

This service is critical, as it is used to host donations, and should be as highly available as possible. Unfortunately, its design has multiple single points of failure, which in practice makes this target difficult to meet at this point.

Design and architecture

Services

The CiviCRM service runs on the crm-int-01 server:

software:
- CiviCRM on top of Drupal
- Drupal has a tor_donation module which has the code to receive/process Redis messages and initiate the corresponding actions in CiviCRM
- Apache with PHP FPM
- MariaDB (MySQL) database (Drupal storage backend)
- Redis cache (?)
- Dovecot IMAP server (to handle bounces)
sites:
- crm.torproject.org: production CiviCRM site
- staging.crm.torproject.org: staging site
- test.crm.torproject.org: testing site

The monthly newsletter is configured on CiviCRM and also archived on the https://newsletter.torproject.org static site.

Storage

CiviCRM stores most of its data in a MySQL database. There are separate databases for the dev/staging/prod sites.

TODO: does CiviCRM also write to disk?

Queues

CiviCRM can hold a large queue of emails to send when a new newsletter is generated. This, in turn, can turn into large Postfix queues when CiviCRM releases those mails into the email system.

The donate-neo frontend uses Redis to queue up transactions for CiviCRM. See the queue documentation in donate-neo. Queued jobs are de-queued by CiviCRM's Resque Scheduled Job, and crons, logs, monitoring, etc, all use standard CiviCRM tooling.

If the Resque Processor Job fails to process an item, it stops processing completely, on the assumption that there is a bug or something else is wrong. It then raises a "kill" flag, which can be reset under Administer > Tor CRM (settings).

Interfaces

Authentication

The crm-int-01 server doesn't talk to the outside internet and can be accessed only via HTTP authentication.

Users who need to access the CRM must be added to the Apache htdigest file on crm-int-01.tpo and have a CiviCRM account created for them.

To extract a list of CiviCRM accounts and their roles, the following drush command may be executed at the root of the Drupal installation:

```
drush uinf $(drush sqlq "SELECT GROUP_CONCAT(uid) FROM users")
```

The SSH server is firewalled (rules defined in Puppet, profile::civicrm). To get access to the port, ask TPA.

Implementation

CiviCRM is a PHP application licensed under the AGPLv3, supporting PHP 8.1 and later at the time of writing. We are currently running CiviCRM 5.73.4, released on May 30th, 2024 (as of 2024-08-28). The current version can be found in /srv/crm.torproject.org/htdocs-prod/sites/all/modules/civicrm/release-notes.md on the production server (crm-int-01). See also the upstream release announcements, the GitHub tags page and the release management policy.

Upstream also has their own GitLab instance.

CiviCRM has a torcrm extension under sites/all/civicrm_extensions/torcrm which includes most of the CiviCRM customizations, including the Resque Processor job. It replaces the old tor_donate Drupal module, which is being phased out.

Related services

CiviCRM only holds donor information; actual transactions are processed by the donation site, donate-neo.

Issues

Since there are many components, here's a table outlining the known projects and issue trackers for the different sites.

| Site | Project | Issues |
| --- | --- | --- |
| https://crm.torproject.org | project | issues |
| https://donate.torproject.org | project | issues |
| https://newsletter.torproject.org | project | issues |

Server-level issues should be filed in the TPA team issue tracker.

Upstream CiviCRM has its own StackExchange site and uses GitLab issue queues.

Maintainer

CiviCRM, the PHP application and the JavaScript component on donate-static are all maintained by the external CiviCRM contractors.

Users

Direct users of this service are mostly the fundraising team.

Upstream

Upstream is a healthy community of free software developers producing regular releases. Our consultant is part of the core team.

Monitoring and metrics

Like other TPA servers, the CRM servers are monitored by Nagios. The Redis server (and the related IPsec tunnel) is specifically monitored with a special PING check, to make sure both ends can talk to each other.

There's also Prometheus monitoring with graphs rendered by Grafana. This includes an elaborate Postfix dashboard watching the two mailservers.

We have a task open to monitor the CiviCRM health better.

Tests

TODO: what to test on major CiviCRM upgrades?

Logs

The CRM side (crm-int-01.torproject.org) has a similar configuration and sends production environment errors via email.

The logging configuration is in: crm-int-01:/srv/crm.torproject.org/htdocs-prod/sites/all/modules/custom/tor_donation/src/Donation/ErrorHandler.php.

Resque processor logs are in the CiviCRM Scheduled Jobs logs under Administer > System Settings > Scheduled Jobs, then find the "Torcrm Resque Processing" job, then view the logs. There may also be fatal errors logged in the general CiviCRM log, under Administer > Admin Console > View Log.

Middleware logs

The PHP middleware responsible for bridging the Redis queue with CiviCRM logs to syslog on crm-int-01. Those logs can be read using journalctl -t processor. They can be useful to determine the cause of donations being submitted but not showing up in CiviCRM.

Backups

Backups are done with the regular backup procedures, except for the MariaDB/MySQL database, which is backed up in /var/backups/local/mysql/. See also the MySQL section in the backup documentation.

Discussion

This section is reserved for future large changes proposed to this infrastructure. It can also be used to perform an audit on the current implementation.

Overview

The CiviCRM deployment is complex and feels a bit brittle. The separation between the CiviCRM backend and the middleware API evolved from an initial strict two-server setup into the current three-part design after the static site frontend was added around 2020. The original two-server separation was done out of a concern for security: we were worried about exposing CiviCRM to the public, because we felt the attack surface of both Drupal and CiviCRM was too wide to be reasonably defended against a determined attacker.

The downside is, obviously, a lot of complexity, which also makes the service more fragile. The Redis monitoring, for example, was added after we discovered the IPsec tunnel would sometimes fail, which would completely break donations.

Obviously, if either the donation middleware or CiviCRM fails, donations go down as well, so we actually have two single points of failure in that design.

A security review should probably be performed to make sure React, Drupal, its modules, CiviCRM, and other dependencies are all up to date. Other components like Apache, Redis, or MariaDB are managed through Debian packages and supported by the Debian security team, so they should be fairly up to date in terms of security issues.

TODO: clarify which versions of ~~CiviCRM~~, Drupal, Yarn, NVM, PHP, Redis, and who knows what else are deployed, and whether it matters.
