spam

M J Ray
AFFS Treasurer
GNUstep Webmaster
Turo Technology LLP

16th December 2003

Abstract

A round-up of some of the options. Please add to this list.

For each possibility, consider:

As the email is delivered to your inbox, the DCC client sends a UDP packet containing some "fuzzy hashes" to a DCC server, then adds a header to the email.
False positives are fairly unusual and thresholds can be set locally. The effect depends on your mail client's filtering.
Must remember to "whitelist" mailing lists, as it detects bulk email rather than spam.
Stateless UDP connections are fairly cheap. If no answer is received quickly enough, it passes the email through.
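
The decision described above can be sketched roughly as follows. This is only an illustration, not the DCC client's own logic: the function name, the `report_count` value (how many times the server says it has seen the message, or `None` if no answer arrived in time) and the whitelist format are all assumptions made for the example.

```python
def classify_bulk(report_count, threshold, sender, whitelist):
    """Return "bulk" or "pass" for one message (hypothetical helper).

    report_count: reports the checksum server holds for this message,
                  or None when no answer arrived quickly enough.
    threshold:    locally chosen count at which mail is treated as bulk.
    whitelist:    set of lower-case sender addresses (e.g. mailing lists)
                  whose bulk mail is wanted.
    """
    # Mailing lists are bulk by nature, so they are let through first.
    if sender.lower() in whitelist:
        return "pass"
    # No timely answer from the server: pass the email through.
    if report_count is None:
        return "pass"
    return "bulk" if report_count >= threshold else "pass"
```

A mail client filter could then act on the result, for example by filing "bulk" messages into a separate folder rather than deleting them.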

Checks through a POP3 mailbox and deletes according to rules, including header filters and sizes.
A false positive causes the email to be deleted, but these can be limited by cautious filter setups.
Deleted email details are recorded in a log file, so you have to monitor that.
Quite expensive to scan and download headers from all emails in the mailbox, which will be downloaded again by the mail client.
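
As a rough sketch of this approach, the following uses Python's standard `poplib` to fetch only the headers of each message, apply two example rules (a size limit and banned subject words) and log every deletion. The rule set, the function names and the logger name are assumptions for illustration; a real tool of this kind will have its own rule language.

```python
import logging
import poplib
from email import message_from_bytes

log = logging.getLogger("popsweep")  # hypothetical logger name


def should_delete(header_bytes, size, max_size, banned_subject_words):
    """Return a reason string if the message breaks a rule, else None."""
    if size > max_size:
        return "too large"
    msg = message_from_bytes(header_bytes)
    subject = str(msg.get("Subject", "")).lower()
    for word in banned_subject_words:
        if word in subject:
            return "subject contains %r" % word
    return None


def sweep(host, user, password, max_size, banned_subject_words):
    """Scan a POP3 mailbox and delete messages that break a rule."""
    box = poplib.POP3_SSL(host)
    try:
        box.user(user)
        box.pass_(password)
        # LIST gives one "number size" entry per message.
        for entry in box.list()[1]:
            number, size = (int(field) for field in entry.split())
            # TOP n 0 downloads the headers only, not the body.
            headers = b"\r\n".join(box.top(number, 0)[1])
            reason = should_delete(headers, size, max_size,
                                   banned_subject_words)
            if reason:
                # Record what was deleted so the log can be reviewed.
                log.info("deleting message %d: %s", number, reason)
                box.dele(number)
    finally:
        box.quit()  # deletions are committed when the session ends
```

Keeping the rules in a separate function makes it possible to try them against saved headers before letting them delete anything.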

Does word probability analysis on the email, often inspired by Paul Graham's paper "A Plan for Spam".
False positives are rare, as the probability tables are built from email that you have received.
Damage depends on the action taken.
Often seems fairly expensive, keeping a large database of word probabilities and doing fairly intense computation on each email.
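
To make the idea concrete, here is a much-simplified sketch of word probability analysis. It is loosely in the spirit of the approach described above, not a faithful copy of any particular filter: the tokeniser, the clamping to 0.01 and 0.99, the neutral 0.4 for unseen words, the 15-word cut-off and the class name are all choices made for this example.

```python
import re
from collections import Counter


def tokens(text):
    """Split text into a set of lower-case word tokens."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class WordFilter:
    """Toy per-user word probability table (illustrative only)."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.spam_count = 0
        self.ham_count = 0

    def train(self, text, is_spam):
        """Record which words appeared in one known message."""
        if is_spam:
            self.spam_words.update(tokens(text))
            self.spam_count += 1
        else:
            self.ham_words.update(tokens(text))
            self.ham_count += 1

    def word_probability(self, word):
        """Estimate the chance that a message containing word is spam."""
        spam_rate = self.spam_words[word] / max(self.spam_count, 1)
        ham_rate = self.ham_words[word] / max(self.ham_count, 1)
        if spam_rate + ham_rate == 0:
            return 0.4  # unseen word: assumed mildly innocent
        return min(0.99, max(0.01, spam_rate / (spam_rate + ham_rate)))

    def score(self, text, interesting=15):
        """Combine the most decisive word probabilities into one score."""
        probs = sorted((self.word_probability(w) for w in tokens(text)),
                       key=lambda p: abs(p - 0.5), reverse=True)
        spam_product = 1.0
        ham_product = 1.0
        for p in probs[:interesting]:
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)
```

Even this toy version shows where the cost comes from: every word of every training message is counted, and every incoming message needs a lookup per word.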

Simple email filtering based on headers or body.
False positives can be avoided with appropriate care on the filter setup.
Damage depends on action taken on a failure.
Fairly well-known and optimised, but only successful if all your spam is easily spotted and can be classified by rules.
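
A minimal sketch of rule-based filtering on headers or body, assuming a first-match-wins rule list. The rule format, the folder names and the example patterns are invented for illustration and do not correspond to any particular filtering tool's syntax.

```python
import re
from email import message_from_string

# Each rule: (header name, or None to match the body; pattern; folder).
# These example rules are hypothetical.
RULES = [
    ("List-Id", re.compile(r"discuss\.example\.org", re.I), "lists"),
    ("Subject", re.compile(r"\bfree money\b", re.I), "spam"),
    (None, re.compile(r"click here to claim", re.I), "spam"),
]


def file_message(raw, rules=RULES, default="inbox"):
    """Return the folder chosen by the first matching rule."""
    msg = message_from_string(raw)
    # Only plain single-part bodies are searched in this sketch.
    body = "" if msg.is_multipart() else str(msg.get_payload())
    for header, pattern, folder in rules:
        if header is None:
            candidates = [body]
        else:
            candidates = [str(v) for v in msg.get_all(header, [])]
        if any(pattern.search(text) for text in candidates):
            return folder
    return default
```

Filing suspect mail into a folder rather than discarding it is one way to limit the damage when a rule misfires.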

Sending machine IPs are checked against lists of known spam senders, or known dialup machines.
False positives are fairly common in some lists, rare in others.
Often implemented on the SMTP machine receiving the email, where failing a DNSBL check can leave a legitimate sender without a way to contact the machine's owner. It can also be implemented on the mail client machine and used to filter into a spam trap, but that is less common.
Fairly cheap, generally requiring one extra DNS lookup for each blacklist used.
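
The single DNS lookup mentioned above conventionally works by reversing the octets of the sending IPv4 address and appending the blacklist's zone. The sketch below shows that; the zone `dnsbl.example.org` is a placeholder rather than a real list, and the function names are made up for the example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist zone has an entry for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such name (or the lookup failed): treat as not listed.
        return False
    return True
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` gives `99.2.0.192.dnsbl.example.org`. Checking several blacklists means calling `is_listed` once per zone, which is the one-lookup-per-list cost noted above.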