From 61e50a6dccf3d2e32ebfa2ace3909122e1785249 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Wed, 20 May 2020 16:41:57 -0400
Subject: [PATCH] note issues with the script

---
 tsa/howto/rt.mdwn | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

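A rough sketch of recommendations 1 to 3 from the patch below. It is not the existing script, and it rests on assumptions the page does not confirm:

- the mail folders are Maildir directories (suggested by the `.spam.learned` naming);
- `lookup_queues` is a hypothetical callback that maps a batch of Message-Ids to RT queue names. In practice it would wrap one psql query per batch.

All function names are made up for illustration.

```python
import itertools
from email.parser import BytesHeaderParser
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical callback: takes a batch of Message-Ids, returns {message_id: queue_name}.
QueueLookup = Callable[[Sequence[str]], Dict[str, str]]


def message_ids(folder: Path) -> Iterator[Tuple[str, Path]]:
    """Yield (Message-Id, path) for each message in a Maildir folder (assumed layout)."""
    parser = BytesHeaderParser()
    for sub in ("new", "cur"):
        subdir = folder / sub
        if not subdir.is_dir():
            continue
        for path in sorted(subdir.iterdir()):
            with open(path, "rb") as fh:
                headers = parser.parse(fh)  # parses headers only, skips the body
            msgid = headers.get("Message-Id")
            if msgid:
                yield str(msgid).strip(), path


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Split an iterable into lists of at most `size` items."""
    it = iter(items)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


def is_ham_queue(name: str) -> bool:
    """Reversed check: any queue not named spam* or xham* counts as ham."""
    return not name.startswith(("spam", "xham"))


def plan_ham(pairs: Iterable[Tuple[str, Path]], lookup_queues: QueueLookup,
             batch_size: int = 500) -> List[Path]:
    """Return the paths of messages whose RT queue is a ham queue."""
    ham: List[Path] = []
    for chunk in batched(pairs, batch_size):
        # One lookup per batch instead of one SQL query per email.
        queues = lookup_queues([mid for mid, _ in chunk])
        for mid, path in chunk:
            queue = queues.get(mid)
            # Messages RT does not know about are left alone.
            if queue is not None and is_ham_queue(queue):
                ham.append(path)
    return ham


def move_all(paths: Iterable[Path], dest: Path) -> None:
    """Move messages into dest/cur, e.g. a learning folder, in the same run."""
    target = dest / "cur"
    target.mkdir(parents=True, exist_ok=True)
    for path in paths:
        path.rename(target / path.name)
```

A run would then be something like `move_all(plan_ham(message_ids(folder), lookup), learning_folder)`, followed by training on the learning folder. `plan_ham` returns a list, so the generator is fully consumed before any file moves.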
diff --git a/tsa/howto/rt.mdwn b/tsa/howto/rt.mdwn
index 129918d5..9bddc45a 100644
--- a/tsa/howto/rt.mdwn
+++ b/tsa/howto/rt.mdwn
@@ -255,3 +255,28 @@ folder, moving it to `.spam.learned` or `.xham.learned` when done.
 
 Then, interestingly, those emails are destroyed. It's unclear why that
 is not done in the `spam-learn` step directly.
+
+### Possible improvements
+
+The above design has a few problems:
+
+ 1. it assumes "ham" queues are named `help-*`, but the system has
+    other queues as well
+ 2. it might be slow: with many emails to process, it runs one SQL
+    query and one move per email instead of batching them
+ 3. it is split over multiple shell scripts that are not versioned
+
+I would recommend the following:
+
+ 1. reverse the logic of the queue checks: instead of checking for
+    folders and queues named `help-*`, check if the folders or queues
+    are *not* named `spam*` or `xham*`
+ 2. batch jobs: use a generator to yield Message-Ids, then group
+    them into fixed-size batches and run one psql query and one
+    rename step per batch
+ 3. do all operations in one pass: look up in psql, move the files
+    into the learning folder, and train, possibly in parallel, but at
+    least all in the same script
+ 4. `sa-learn` can now read from a folder, so the wrapper shell
+    script is not needed in any case
+ 5. commit the script to version control or, even better, to Puppet
-- 
GitLab