Commit 61e50a6d authored by anarcat

note issues with the script

parent 14c875b9
@@ -255,3 +255,28 @@ folder, moving it to `.spam.learned` or `.xham.learned` when done.
Then, interestingly, those emails are destroyed. It's unclear why that
is not done in the `spam-learn` step directly.
### Possible improvements
The above design has a few problems:
1. it assumes "ham" queues are named "help-*" - but there are other
queues in the system
2. it might be slow: if there are lots of emails to process, it will
run one SQL query and one move per email, instead of handling them
all at once
3. it is split over multiple shell scripts, not versioned
I would recommend the following:
1. reverse the logic of the queue checks: instead of checking for
folders and queues named `help-*`, check if the folders or queues
are *not* named `spam*` or `xham*`
2. batch jobs: use a generator to yield each Message-Id, then take a
fixed number of emails at a time and send each batch to psql and to
the rename step in one go
3. do all operations at once: look in psql, move the files in the
learning folder, and train, possibly in parallel, but at least all
in the same script
4. sa-learn can read from a folder now, so there's no need for that
wrapper shell script in any case
5. commit the script to version control and, even better, Puppet
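
Recommendations 1 and 2 could be sketched roughly as below. This is only a minimal illustration, not the existing scripts: the helper names `is_ham_source` and `batched`, the `LEARNING_PATTERNS` constant, and the idea of matching names with shell-style globs are all assumptions made for the example. The psql query, the file moves and the training step are deliberately left out.

```python
from fnmatch import fnmatchcase
from itertools import islice

# Assumption: the learning queues/folders are exactly those matching the
# `spam*` and `xham*` patterns named in the text above.
LEARNING_PATTERNS = ("spam*", "xham*")


def is_ham_source(name):
    """Return True for any queue or folder *not* named spam* or xham*.

    This reverses the current logic, which only accepts `help-*`.
    """
    return not any(fnmatchcase(name, pattern) for pattern in LEARNING_PATTERNS)


def batched(items, size):
    """Yield lists of at most `size` items from any iterable or generator.

    Each list could then be sent to psql in a single query, and the
    matching files renamed together, instead of one email at a time.
    """
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

With this, a generator yielding Message-Id values can be consumed as `for batch in batched(message_ids, 100): ...`, where `message_ids` and the batch size of 100 are placeholders to tune.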