note issues with the script

61e50a6d · anarcat · 14c875b9 · 61e50a6d
Unverified Commit 61e50a6d authored 4 years ago by anarcat
--- a/tsa/howto/rt.mdwn
+++ b/tsa/howto/rt.mdwn
@@ -255,3 +255,28 @@ folder, moving it to `.spam.learned` or `.xham.learned` when done.

 Then, interestingly, those emails are destroyed. It's unclear why that
 is not done in the `spam-learn` step directly.
+
+### Possible improvements
+
+The above design has a few problems:
+
+ 1. it assumes "ham" queues are named `help-*`, but there are other
+    queues in the system
+ 2. it might be slow: if there are lots of emails to process, it runs
+    one SQL query and one move per email instead of batching them
+ 3. it is split over multiple shell scripts, none of them versioned
+
+I would recommend the following:
+
+ 1. reverse the logic of the queue checks: instead of checking for
+    folders and queues named `help-*`, check if the folders or queues
+    are *not* named `spam*` or `xham*`
+ 2. batch jobs: use a generator to yield Message-Ids, then take a
+    fixed number of emails at a time and send each batch to psql and
+    to the rename step
+ 3. do all operations at once: look in psql, move the files in the
+    learning folder, and train, possibly in parallel, but at least all
+    in the same script
+ 4. sa-learn can read from a folder now, so there's no need for that
+    wrapper shell script in any case
+ 5. commit the script to version control and, even better, puppet
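
Recommendations 1 and 2 above could be sketched roughly as follows. This is only an illustration, not the existing script: the function names (`is_ham_queue`, `message_ids`, `batched`) are hypothetical, and it assumes each email is a single file in a flat directory, which may not match the real folder layout. The psql query and the rename step are deliberately left out.

```python
import email
import fnmatch
import itertools
from pathlib import Path
from typing import Iterable, Iterator, List


def is_ham_queue(name: str) -> bool:
    """Reversed check: any queue NOT named spam* or xham* counts as ham."""
    return not (
        fnmatch.fnmatchcase(name, "spam*") or fnmatch.fnmatchcase(name, "xham*")
    )


def message_ids(folder: Path) -> Iterator[str]:
    """Yield the Message-Id of each email file found directly in `folder`.

    Assumption: one message per file, no subdirectories to descend into.
    """
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fp:
            msg = email.message_from_binary_file(fp)
        mid = msg.get("Message-Id")
        if mid:
            yield str(mid).strip()


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each list yielded by `batched(message_ids(folder), 100)` could then be checked against the database in a single query rather than one query per email; the batch size of 100 is an arbitrary placeholder.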