Spam Stopping Solutions
M J Ray
AFFS Treasurer
GNUstep Webmaster
Turo Technology LLP
16th December 2003
Abstract
A round-up of some of the options for stopping spam. Please add to this list.
1 Classification
For each possibility, consider:
- the "level" that it works on (e.g. incoming SMTP)
- the probability and consequences of false positives
- the potential for "collateral damage"
- the "cost" it adds to emails
2 DCCProc
- As each email is delivered to your inbox, dccproc sends a UDP packet containing some "fuzzy hashes" to a DCC server, then adds a header to the email.
- False positives fairly unusual and thresholds can be set locally. Effect depends on your mail client filtering.
- Must remember to "whitelist" mailing lists, as it detects bulk email rather than spam.
- Stateless UDP connections are fairly cheap. If no answer is received quickly enough, it passes the email through.
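
The "pass it through if no answer arrives" behaviour can be sketched roughly as below. This is not the DCC protocol: the `checksum` function is a simple stand-in for DCC's fuzzy hashes, the `X-Bulk-Check` header name is hypothetical, and the server address is whatever you supply. It only illustrates a single UDP datagram with a short timeout that fails open.

```python
import hashlib
import socket


def checksum(body):
    """Stand-in for a fuzzy hash (an assumption, not DCC's algorithm):
    hash the body after normalising case and whitespace."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).digest()


def query_server(body, server, timeout=0.5):
    """Send one UDP datagram and wait briefly for a reply.

    Returns the raw reply bytes, or None if nothing came back in time,
    so the caller can let the email through unchanged.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(checksum(body), server)
            reply, _addr = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable servers
            return None
    return reply


def tag_message(message, reply):
    """Prepend a (hypothetical) header recording the outcome of the check."""
    status = "unknown" if reply is None else "checked"
    return "X-Bulk-Check: " + status + "\n" + message
```

The mail client's own filters would then act on the added header, which is why the effect depends on how those filters are set up.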
3 POPMail
- Checks through a POP3 mailbox and deletes according to rules, including header filters and sizes.
- A false positive causes the email to be deleted, but the risk can be limited by cautious filter setups.
- Deleted email details are recorded in a log file, so you have to monitor that.
- Quite expensive to scan and download headers from all emails in the mailbox, which will be downloaded again by the mail client.
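
A minimal sketch of this kind of mailbox cleaner, using Python's standard `poplib`, might look like the following. The rule list, size limit, and log file name are illustrative assumptions, not POPMail's actual configuration. Only the headers are fetched (POP3 `TOP` with zero body lines), and every deletion is logged.

```python
import poplib
import re
from email.parser import HeaderParser

# Hypothetical rules: (header name, pattern). Adjust cautiously.
RULES = [("Subject", re.compile(r"lottery|cheap pills", re.I))]
MAX_SIZE = 200_000  # bytes; an arbitrary example limit


def should_delete(header_text, size, rules=RULES, max_size=MAX_SIZE):
    """Return a reason string if the message should go, else None."""
    if size > max_size:
        return "size %d exceeds %d" % (size, max_size)
    headers = HeaderParser().parsestr(header_text)
    for name, pattern in rules:
        for value in headers.get_all(name, []):
            if pattern.search(value):
                return "%s matches %s" % (name, pattern.pattern)
    return None


def clean_mailbox(host, user, password, log_path="deleted.log"):
    """Scan every message's headers and delete the ones that match."""
    box = poplib.POP3_SSL(host)
    box.user(user)
    box.pass_(password)
    try:
        with open(log_path, "a") as log:
            _resp, listing, _octets = box.list()
            for entry in listing:
                number, size = entry.decode("ascii").split()
                # TOP n 0 returns the headers only, not the body.
                _resp, lines, _octets = box.top(int(number), 0)
                header_text = b"\n".join(lines).decode("latin-1")
                reason = should_delete(header_text, int(size))
                if reason:
                    log.write("deleted %s: %s\n" % (number, reason))
                    box.dele(int(number))
    finally:
        box.quit()  # deletions take effect when the session ends
```

The loop makes the cost visible: one `TOP` request per message on every run, before the mail client downloads the surviving messages again.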
4 Bayesian methods
- Does word probability analysis on the email, often inspired by Paul Graham's paper "A Plan for Spam".
- False positives rare, as the probability tables are built from email that you have received.
- Damage depends on the action taken.
- Often seems fairly expensive, keeping a large database of word probabilities and doing fairly intense computation on each email.
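
As a rough illustration of word probability analysis, here is a much-simplified sketch. It is not Graham's exact scheme (which, among other things, only considers the most "interesting" words). The tokeniser, the smoothing, and the class name `WordFilter` are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into a set of lower-case words."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class WordFilter:
    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_total = 0
        self.ham_total = 0

    def train(self, text, is_spam):
        """Record which words appeared in one spam or non-spam email."""
        if is_spam:
            self.spam_counts.update(tokens(text))
            self.spam_total += 1
        else:
            self.ham_counts.update(tokens(text))
            self.ham_total += 1

    def word_probability(self, word):
        """Estimate the chance that an email containing `word` is spam."""
        # Add-one smoothing keeps the estimate strictly between 0 and 1.
        spam_rate = (self.spam_counts[word] + 1) / (self.spam_total + 2)
        ham_rate = (self.ham_counts[word] + 1) / (self.ham_total + 2)
        return spam_rate / (spam_rate + ham_rate)

    def spam_probability(self, text):
        """Combine per-word estimates: prod(p) / (prod(p) + prod(1 - p))."""
        log_spam = log_ham = 0.0
        for word in tokens(text):
            p = self.word_probability(word)
            log_spam += math.log(p)
            log_ham += math.log(1 - p)
        # Work in log space and clamp to avoid overflow on long emails.
        diff = max(min(log_ham - log_spam, 700.0), -700.0)
        return 1 / (1 + math.exp(diff))
```

Usage is train-then-score: call `train` on email you have already sorted, then compare `spam_probability` for new email against a threshold of your choosing. The two `Counter` tables are the "large database" whose size and upkeep make up much of the cost.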
5 Mailfilter, Procmail et al
- Simple email filtering based on headers or body.
- False positives can be avoided with appropriate care on the filter setup.
- Damage depends on action taken on a failure.
- Fairly well-known and optimised, but only successful if all your spam is easily spotted and can be classified by rules.
6 SpamAssassin
- Runs a number of different tests, each with its own score, sums the scores of the tests that match, and compares the total against a threshold.
- False positives fairly rare, but care needed with configuration.
- Damage depends on the user-defined action taken.
- Cost depends on the tests performed. Possibly has "Delphi effect" potential.
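
The scoring idea can be sketched as follows: every test that matches contributes its score, and the total is compared with a threshold. The test names, scores, and message layout here are invented for illustration and are not SpamAssassin's real rules. The threshold of 5.0 mirrors SpamAssassin's default, but treat it as a tunable value.

```python
# Hypothetical whitelist and tests, for illustration only.
KNOWN_SENDERS = {"friend@example.org"}

TESTS = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda m: m.get("subject", "").isupper()),
    ("BODY_CLICK_HERE", 2.5, lambda m: "click here" in m.get("body", "").lower()),
    ("MISSING_DATE", 2.0, lambda m: "date" not in m),
    ("KNOWN_SENDER", -3.0, lambda m: m.get("from") in KNOWN_SENDERS),
]


def score_message(message, tests=TESTS, threshold=5.0):
    """Return (total score, is_spam flag, list of tests that matched)."""
    hits = [(name, score) for name, score, test in tests if test(message)]
    total = sum(score for _name, score in hits)
    return total, total >= threshold, hits


spam = {"from": "x@example.net", "subject": "FREE MONEY",
        "body": "Click here now"}
ham = {"from": "friend@example.org", "subject": "Lunch?",
       "body": "click here for the menu", "date": "16 Dec 2003"}

print(score_message(spam))  # total 6.0, flagged as spam
print(score_message(ham))   # total -0.5, not flagged
```

Negative scores let trusted signals offset suspicious ones, which is one reason a single badly weighted test can cause false positives.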
7 DNSBL
- Sending machine IPs are checked against lists of known spam senders, or known dialup machines.
- False positives fairly common in some lists, rare in others.
- Often implemented on the SMTP machine receiving the email, where failing a DNSBL check can leave a legitimate sender without a way to contact the machine's owner. Can be implemented on the mail client machine and used to filter into a spam trap, but that is less common.
- Fairly cheap, generally requiring one extra DNS lookup for each blacklist used.
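
The single extra DNS lookup works by reversing the octets of the sending IP address and looking the result up under the blacklist's zone. A minimal sketch for IPv4 follows. The zone `dnsbl.example.org` is a placeholder rather than a real list, and any lookup failure is treated as "not listed", which is an assumption you may not want in practice.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist zone has an A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:  # no such name, or the lookup itself failed
        return False
    return True


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```

Checking several blacklists means calling `is_listed` once per zone, so the cost grows linearly with the number of lists used.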