lists: document search engine woes (#41957) authored by anarcat
@@ -413,6 +413,36 @@ A list of addresses is stored in `/var/spool/postfix/mailman3` for
Postfix to know about mailing lists. There's the trace of a SQLite
database there, but it is believed to be stale.
### Search engine
The search engine shipped with Mailman is built with
[Django-Haystack](https://django-haystack.readthedocs.io/), whose default backend is [Whoosh](https://whoosh.readthedocs.io/).
In February 2025, we experimented with switching to [Xapian](https://xapian.org/),
through the [Xapian Haystack plugin](https://github.com/notanumber/xapian-haystack/), because of severe
performance problems that were attributed to search
([tpo/tpa/team#41957](https://gitlab.torproject.org/tpo/tpa/team/-/issues/41957)). This involved changing the configuration
(see puppet-control@f9b0206ff) and rebuilding the index with the
[`update_index` command](https://django-haystack.readthedocs.io/en/master/management_commands.html#update-index):
    date; time sudo -u www-data nice ionice -c 3 /usr/share/mailman3-web/manage.py update_index ; date
Note how we wrap the call in time(1) (to track resource usage),
date(1) (to track run time), nice(1) and ionice(1) (to reduce server
load). `update_index` was sufficient here because the Xapian index was
empty: to rebuild an existing index from scratch, we'd need the
[`rebuild_index`](https://django-haystack.readthedocs.io/en/master/management_commands.html#rebuild-index) command instead.
This also involved *patching* the `python3-xapian-haystack` package,
as it would otherwise crash ([Hyperkitty issue 408](https://gitlab.com/mailman/hyperkitty/-/issues/408)). We used a variation of [upstream PR
181](https://github.com/notanumber/xapian-haystack/pull/181).
The index for a single mailing list can be rebuilt with:
    sudo -u www-data /usr/share/mailman3-web/manage.py update_index_one_list test@lists.torproject.org
For large lists, the command should be wrapped in nice(1) and
ionice(1), as with the full `update_index` run above.
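As a rough sketch of that wrapping, the helper below prepends the same `sudo`, nice(1) and ionice(1) prefix to any `manage.py` subcommand. The `manage.py` path and the prefix come from the commands above; the function name `low_priority_manage` and the `DRY_RUN` switch are hypothetical and only there for illustration. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: run a manage.py subcommand at low CPU and I/O priority.
# The manage.py path and the sudo/nice/ionice prefix are taken from the
# commands documented above; low_priority_manage and DRY_RUN are
# hypothetical names, not part of Mailman or Haystack.
MANAGE=/usr/share/mailman3-web/manage.py

low_priority_manage() {
    # Prepend the wrapper to the subcommand and its arguments.
    set -- sudo -u www-data nice ionice -c 3 "$MANAGE" "$@"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Default: only print the command that would be run.
        echo "$*"
    else
        # Print timestamps before and after, as in the full index run.
        date
        "$@"
        status=$?
        date
        return "$status"
    fi
}
```

With this loaded, `low_priority_manage update_index_one_list test@lists.torproject.org` prints the wrapped command, and setting `DRY_RUN=0` actually runs it. The same helper could wrap `rebuild_index`, for which Haystack documents a `--noinput` flag to skip the confirmation prompt.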
## Queues
Mailman seems to store Python objects of in-flight emails (like
...