Commit 61e50a6d, authored 4 years ago by anarcat
note issues with the script
Parent: 14c875b9
tsa/howto/rt.mdwn (25 additions, 0 deletions)
@@ -255,3 +255,28 @@ folder, moving it to `.spam.learned` or `.xham.learned` when done.
Then, interestingly, those emails are destroyed. It's unclear why that
is not done in the `spam-learn` step directly.
### Possible improvements

The above design has a few problems:

 1. it assumes "ham" queues are named `help-*`, but there are other
    queues in the system
 2. it might be slow: if there are lots of emails to process, it runs
    one SQL query and one file move per email instead of handling
    them all at once
 3. it is split over multiple shell scripts, none of which is under
    version control

I would recommend the following:

 1. reverse the logic of the queue checks: instead of checking for
    folders and queues named `help-*`, check if the folders or queues
    are *not* named `spam*` or `xham*`
 2. batch jobs: use a generator to yield each Message-Id, then take a
    fixed number of emails at a time and send them to psql and to the
    rename step as one batch
 3. do all operations at once: look in psql, move the files in the
    learning folder, and train, possibly in parallel, but at least all
    in the same script
 4. `sa-learn` can read from a folder now, so there's no need for that
    wrapper shell script in any case
 5. commit the script to version control and, even better, Puppet
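
Recommendations 1 and 2 could be sketched roughly as follows. This is only an illustration, not the existing script: the names `EXCLUDED_PATTERNS`, `is_ham_source` and `batched` are hypothetical, and the code assumes folder and queue names are compared without any leading Maildir dot.

```python
from fnmatch import fnmatchcase
from itertools import islice
from typing import Iterable, Iterator, List

# Assumption: these two patterns cover every spam-training folder or queue.
EXCLUDED_PATTERNS = ("spam*", "xham*")


def is_ham_source(name: str) -> bool:
    """Return True unless the name matches one of the excluded patterns.

    This is the reversed check: anything not named spam* or xham*
    counts as ham, instead of only names matching help-*.
    """
    return not any(fnmatchcase(name, pattern) for pattern in EXCLUDED_PATTERNS)


def batched(message_ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` Message-Ids from any iterable.

    The input can itself be a generator, so the full list of emails
    never has to be held in memory.
    """
    iterator = iter(message_ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch could then be handed to a single SQL query and a single pass of file moves, rather than one query and one move per email.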