service/lists: add pii redaction instructions authored by Jérôme Charaoui's avatar Jérôme Charaoui
see tpo/tpa/team#42002
......@@ -227,6 +227,194 @@ accessible for modification.
Note, in particular, how the [list creation procedure](#create-a-list) uses this to
modify the list settings on creation.
## Handling PII redaction requests
Below are instructions for handling a request for redaction of
personally-identifying information (PII) from the mail archive.
The first step is to ensure that the request is lawful and that the requestor
is the true "owner" of the PII involved in the request. For an email address,
this can include asking the requestor to send an email to prove that they
control the email address.
Secondly, the redaction request must be precise and not overly broad. For
example, redacting all instances of "Joe" from the mail archives would probably
not be acceptable.
Once all that is established, the actual redaction can proceed.
If the requests is limited to one or few messages, then the first compliance
option would be to simply delete the messages from the archives. This can be
done using an admin account directly from the web interface.
If the request involves many messages, then a "surgical" redaction is preferred
in order to reduce the collateral damage on the mail archive as a whole. We
must keep in mind that these archives are useful sources of information and
that widespread deletion of messages is susceptible to harm research and
support around the Tor Project.
Such "surgical" redaction is done using SQL statements against the `mailman3`
database directly, as mailman doesn't offer any similar compliance mechanism.
In this example, we'll pretend to handle a request to redact the name "Foo Bar"
and an associated email address, "foo@bar.com":
0. login to `lists-01`, run `sudo -u postgres psql` and `\c mailman3`
1. backup the affected database rows to temporary tables:
CREATE TEMP TABLE hyperkitty_attachment_redact AS
SELECT * FROM hyperkitty_attachment
WHERE
content_type = 'text/html'
and email_id IN
(SELECT id FROM hyperkitty_email
WHERE content LIKE '%Foo Bar%'
OR content LIKE '%foo@bar.com%');
CREATE TEMP TABLE hyperkitty_email_redact AS
SELECT * from hyperkitty_email
WHERE content LIKE '%Foo Bar%'
OR content LIKE '%foo@bar.com.com%';
CREATE TEMP TABLE hyperkitty_sender_redact AS
SELECT * from hyperkitty_sender
WHERE address = 'foo@bar.com';
CREATE TEMP TABLE address_redact AS
SELECT * FROM address
WHERE display_name = 'Foo Bar'
OR email = 'foo@bar.com';
CREATE TEMP TABLE user_redact AS
SELECT * from "user"
WHERE display_name = 'Foo Bar';
2. begin the transaction
BEGIN;
; --- hyperkitty_attachment ---
; redact the name and email in html attachments
; (only if found in plaintext email)
UPDATE hyperkitty_attachment
SET content = convert_to(
replace(
convert_from(content, 'UTF8'),
'Foo Bar',
'[REDACTED]'
),
'UTF8')
WHERE
content_type = 'text/html'
AND email_id IN
(SELECT id FROM hyperkitty_email
WHERE content LIKE '%Foo Bar%');
UPDATE hyperkitty_attachment
SET content = convert_to(
replace(
convert_from(content, 'UTF8'),
'foo@bar.com',
'[REDACTED]'
), 'UTF8')
WHERE
content_type = 'text/html'
AND email_id IN
(SELECT id FROM hyperkitty_email WHERE content LIKE '%foo@bar.com%');
; --- hyperkitty_email ---
; redact the name and email in plaintext emails
UPDATE hyperkitty_email
SET content = REPLACE(content,
'Foo Bar <foo@bar.com>',
'[REDACTED]')
WHERE content LIKE '%Foo Bar <foo@bar.com>%';
UPDATE hyperkitty_email
SET content = REPLACE(content,
'Foo Bar',
'[REDACTED]')
WHERE content LIKE '%Foo Bar%';
UPDATE hyperkitty_email
SET content = REPLACE(content,
'foo@bar.com',
'[REDACTED]')
WHERE content LIKE '%foo@bar.com%';
UPDATE hyperkitty_email -- done
SET sender_name = '[REDACTED]'
WHERE sender_name = 'Foo Bar';
-- obfuscate the sender_id, must be unique
-- combines the two updates to satisfy foreign key constraints:
WITH sender AS (
UPDATE hyperkitty_sender
SET address = encode(sha256(address::bytea), 'hex')
WHERE address = 'foo@bar.com'
RETURNING address
) UPDATE hyperkitty_email
SET sender_id = encode(sha256(sender_id::bytea), 'hex')
WHERE sender_id = 'foo@bar.com';
; --- address ---
; redact the name and email
; email must match the identifier used in hyperkitty_sender.address
UPDATE address -- done
SET display_name = '[REDACTED]'
WHERE display_name = 'Foo Bar';
UPDATE address -- done
SET email = encode(sha256(email::bytea), 'hex')
WHERE email = 'foo@bar.com';
; --- user ---
; redact the name
; use double quotes around the table name
; redact display_name in user table
UPDATE "user"
SET display_name = '[REDACTED]'
WHERE display_name = 'Foo Bar';
3. look around the modified tables, do `COMMIT;` if all good, otherwise `ROLLBACK;`
4. ending the `psql` session discards the temporary tables, so keep it open
5. look at the archives if everything is ok
6. end the `psql` session
To rollback changes after the transaction has been committed to the database,
using the temporary tables:
UPDATE hyperkitty_attachment hka
SET content = hkar.content
FROM hyperkitty_attachment_redact hkar WHERE hka.id = hkar.id;
UPDATE hyperkitty_email hke
SET content = hker.content,
sender_id = hker.sender_id,
sender_name = hker.sender_name
FROM hyperkitty_email_redact hker WHERE hke.id = hker.id;
UPDATE hyperkitty_sender hks
SET address = hksr.address
FROM hyperkitty_sender_redact hksr WHERE hks.mailman_id = hksr.mailman_id;
UPDATE address a
SET email = ar.email,
display_name = ar.display_name
FROM address_redact ar WHERE a.id = ar.id;
UPDATE "user" u
SET display_name = ur.display_name
FROM user_redact ur WHERE u.id = ur.id;
## TODO Pager playbook
<!-- information about common errors from the monitoring system and -->
......
......