Fighting Spam

by Chris BeHanna

We all hate spam. Here are two ways to cut down on the amount of spam you receive. I use both, and I block more than 600 attempted spam deliveries per day on my home network, which is connected to the world via ADSL.

The first method is generally only available to people who run some form of UNIX on their home machines. The second method is available to everyone, now that there are Windows tools that use SpamAssassin technology.

Blocking Initial Delivery Attempts

The first technique, if it is available to you, is to reject spam the moment a delivery attempt first reaches your computer. I run the FreeBSD operating system (a free offshoot of BSD Unix; see http://www.freebsd.org), which includes a recent version of the venerable sendmail mail delivery system. This version, by default, does not allow relaying, that is, the delivery of mail from an outside sender to an outside recipient through your mail gateway. Open relays were the old way that spammers hijacked other people’s equipment to ply their trade.
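As a rough illustration of what "no relaying" means, here is a Python sketch of the decision a mail gateway makes for each recipient address. It is a simplified model, not sendmail's actual rule set; the domain list, the trusted-client list, and the function name are all hypothetical.

```python
# Hypothetical configuration: the domain this gateway receives mail for,
# and the hosts (e.g. machines on the home LAN) allowed to send mail out.
LOCAL_DOMAINS = {"example.org"}
TRUSTED_CLIENTS = {"127.0.0.1", "192.168.1.10"}


def accept_recipient(client_ip: str, recipient: str) -> bool:
    """Return True if the gateway should accept this recipient address."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in LOCAL_DOMAINS:
        return True  # inbound mail for us: accept from anyone
    # Any other recipient means relaying; only trusted clients may do that.
    return client_ip in TRUSTED_CLIENTS


print(accept_recipient("203.0.113.5", "chris@example.org"))     # True
print(accept_recipient("203.0.113.5", "victim@elsewhere.com"))  # False: relay attempt
print(accept_recipient("127.0.0.1", "friend@elsewhere.com"))    # True: local user sending out
```

An outside host can still deliver mail *to* you; it just cannot use your machine as a stepping stone to reach anyone else.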

The current version of sendmail also, by default, rejects incoming mail whose sender address belongs to a non-existent domain (the domain is the portion of an address that follows the @ sign). That measure alone blocks more than 95% of the spam that attempts to reach my inbox. It is important to note that this feature does not even allow the piece of spam to reach your hard disk: it is rejected when the delivery attempt is first made. I really like this. :-)
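The logic of the non-existent-domain check can be sketched in a few lines. This is a simplified model, not sendmail's code: `check_sender` is a hypothetical name, and `domain_exists` stands in for the real DNS lookup (for A or MX records) so the example runs without network access. The rejection text is illustrative, not sendmail's exact wording.

```python
from typing import Callable


def check_sender(address: str, domain_exists: Callable[[str], bool]) -> str:
    """Decide, when the sender address is announced, whether to go on.

    domain_exists is a stand-in for a DNS lookup of the sender's domain.
    """
    domain = address.rpartition("@")[2].lower()
    if domain_exists(domain):
        return "accept"
    # Refused during the SMTP conversation, before the message body is
    # transmitted, so nothing is ever written to the local disk.
    return "reject: domain of sender address does not exist"


known_domains = {"freebsd.org", "example.com"}
print(check_sender("someone@freebsd.org", lambda d: d in known_domains))
print(check_sender("bulk@no-such-domain.invalid", lambda d: d in known_domains))
```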

To make use of sendmail’s anti-spam features, you will either need to have your email delivered directly to your machine, or you will have to use some other tool, such as fetchmail, to poll your ISP, download your mail, and fake delivery of it to your local mail system. Setting that up is beyond the scope of this document.

Advanced Delivery Blocking: /etc/mail/access

sendmail has another nifty, easy-to-use feature: you can configure it to selectively allow or block mail from addresses and/or domains listed in a special file, /etc/mail/access. This file is a table with addresses or domains on the left and the action to perform for each one on the right. Each address or domain is paired with exactly one action.
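To make the two-column format concrete, here is a small Python sketch that reads access-file text into a dictionary of key → action. It only illustrates the text layout: sendmail itself reads a compiled database built from this file with makemap, not the text file directly. The sample entries and the `parse_access` name are hypothetical.

```python
from typing import Dict

# Hypothetical entries in the same "key  action" layout as /etc/mail/access.
SAMPLE = '''\
# lines starting with # are comments
bulkmailer.example        ERROR:"550 We don't accept mail from spammers"
friend@isp.example        OK
'''


def parse_access(text: str) -> Dict[str, str]:
    """Turn access-file text into a {key: action} dictionary."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # the key, then the rest of the line
        if len(parts) != 2:
            raise ValueError(f"entry has no action: {raw!r}")
        key, action = parts
        # makemap folds keys to lower case by default, so do the same here.
        table[key.lower()] = action.strip()
    return table


print(parse_access(SAMPLE))
```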

Example

    spammer.com                 ERROR:"550 We don't accept mail from spammers"
    generallybad.com            ERROR:"550 We don't accept mail from spammers"
    myfriend@generallybad.com   OK

The example will result in the following actions:

All mail originating from spammer.com will be bounced with SMTP reply code 550 (the standard code for a permanent refusal) and the message “We don’t accept mail from spammers.” You may use any message you like. I recommend being firm but polite.

Generally, mail originating from generallybad.com will also be bounced with the same error code and message used for mail from spammer.com.

My friend unfortunately has an account with the ISP generallybad.com, and is not himself a spammer, so I make an exception for him with the last rule, and explicitly allow mail through from myfriend@generallybad.com.
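Why does the friend’s OK win over the domain-wide ERROR? Because the more specific key is checked first. The Python sketch below is a simplified model of that precedence, assuming a lookup order of full address first, then the domain and each of its parent domains. sendmail’s real lookup has more steps (for example optional From:/To:/Connect: tags and IP-address keys), which this leaves out; `lookup` is a hypothetical name.

```python
from typing import Dict, Optional

# The example entries from above, copied as text.
ACCESS: Dict[str, str] = {
    "spammer.com": 'ERROR:"550 We don\'t accept mail from spammers"',
    "generallybad.com": 'ERROR:"550 We don\'t accept mail from spammers"',
    "myfriend@generallybad.com": "OK",
}


def lookup(sender: str, table: Dict[str, str]) -> Optional[str]:
    """Return the action for a sender address, or None if nothing matches."""
    sender = sender.lower()
    # 1. The full address wins; this is what lets the friend's OK
    #    override the domain-wide ERROR.
    if sender in table:
        return table[sender]
    # 2. Otherwise try the domain, then each parent domain in turn
    #    (mail.spammer.com -> spammer.com -> com).
    labels = sender.rpartition("@")[2].split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return None  # no entry: the message gets normal processing


for addr in ("myfriend@generallybad.com", "someone@generallybad.com",
             "bulk@mail.spammer.com", "pal@freebsd.org"):
    print(addr, "->", lookup(addr, ACCESS))
```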

Here is a recent copy of the anti-spam portion of my /etc/mail/access. Every entry in it was precipitated by receipt of an actual piece of spam, and was carefully verified to be from a likely spammer. I am not into blacklisting entire netblocks on the basis of a single bad actor. There are some jokers on the internet who think they’re cool when they do that, but I find that approach akin to throwing the baby out with the bath water.

Filtering Spam That Slips Through

Even if you are able to use the sendmail anti-spam features described above (and there are many more available, but they start getting into baby-and-bath-water territory very quickly), much spam will still get through, because spammers forge sender addresses belonging to real users at real domains, and those addresses pass the non-existent-domain check. (I have received spam from myself, so to speak, in this manner, and some of my friends have received spam that forged the address of a deceased friend of ours. Spammers are scum with no honor and no shame.) You can filter this spam in a number of ways, but the most efficient is to filter based upon content: compare the “signature” or “fingerprint” of each message’s content against known pieces of spam stored in a database. The most famous free anti-spam software that filters on content is SpamAssassin. Alternatives include, but are not limited to, bogofilter and CRM114, the last of which was written by Crash of “Junkyard Wars” fame.
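Here is a toy version of the fingerprint idea: normalize a message body, hash it, and look the hash up in a set of known-spam hashes. This exact-match approach is only an illustration. Shared services such as Vipul’s Razor, Pyzor, and DCC use fuzzier signatures, and SpamAssassin combines checks like those with its own rules and a Bayesian classifier. The function names and sample messages are hypothetical.

```python
import hashlib
import re


def fingerprint(body: str) -> str:
    """Hash a normalized body so trivial changes (case, spacing) don't matter."""
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# A stand-in for the "database" of previously reported spam.
known_spam = {fingerprint("Buy cheap widgets NOW!!!  Click here.")}


def looks_like_known_spam(body: str) -> bool:
    return fingerprint(body) in known_spam


print(looks_like_known_spam("buy cheap   widgets now!!!\nclick here."))  # True
print(looks_like_known_spam("Lunch on Friday?"))                         # False
```

A spammer who changes a single word defeats an exact hash like this one, which is why real signature systems tolerate small variations.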

I use SpamAssassin because it was quick and easy to set up, and it is reasonably efficient and accurate. All three filtering systems are trainable—they can learn what is spam and what is not, and thereby reduce the false-positive rate. All of them will let some small amount of spam through, because spammers keep finding new ways around the filters (but the filters catch up in a few days).
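"Trainable" here means the filter adjusts its scores from examples you label as spam or not spam. The sketch below is a toy naive Bayes word-count classifier with add-one smoothing, shown only to illustrate the training idea; it is not the algorithm used by SpamAssassin, bogofilter, or CRM114, and `ToyFilter` is a hypothetical name. Correcting a misfiled message by training it as "ham" (legitimate mail) is how such filters cut false positives.

```python
import math
import re
from collections import Counter


class ToyFilter:
    """Word-count classifier trained on labeled examples ("spam" or "ham")."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def spam_score(self, text):
        """Log-odds that text is spam: > 0 leans spam, < 0 leans ham."""
        # Smoothed prior: how often each kind of message has been seen.
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        for tok in self.tokens(text):
            # Add-one smoothing keeps unseen words from zeroing a probability.
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


f = ToyFilter()
f.train("cheap pills buy now", "spam")
f.train("meeting notes for friday", "ham")
print(f.spam_score("buy cheap pills") > 0)  # True
print(f.spam_score("friday meeting") < 0)   # True
```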

All of these tools come with their own configuration documentation, so I will not reproduce it here.

Conclusion

No anti-spam measure is foolproof, and those that try too hard often end up blocking large amounts of legitimate mail. The setup I’ve described above errs on the side of caution, yet still blocks roughly 600 spams per day from being delivered to my machine at all. Roughly 70 more spams per day get past sendmail but are caught by SpamAssassin and shunted off to spam folders, giving me a chance to check for false positives before I delete them en masse. Only two or three spams per day actually make it to my inbox, so I’m filtering more than 99.5% of the spam out of my life using completely free tools. Not bad, eh?
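The 99.5% figure follows from the rough daily numbers above. This quick check takes the upper end of "two or three" reaching the inbox; since more than 600 are blocked outright, the true fraction is, if anything, a little higher.

```python
blocked_by_sendmail = 600  # rejected before delivery
caught_by_filter = 70      # delivered, but shunted to spam folders
reached_inbox = 3          # upper end of "two or three"

total = blocked_by_sendmail + caught_by_filter + reached_inbox
filtered = (blocked_by_sendmail + caught_by_filter) / total
print(f"{filtered:.2%} of {total} spams filtered")  # 99.55% of 673 spams filtered
```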

If you would like to comment on this article, you can email me using the username chris. The domain name is behanna.org. I do not provide a clickable mailto link—that is a major way that spambots harvest addresses. Eventually, I’ll put up my send-me-mail form page, but it’s late and my real job beckons.

--Chris

$Date: 2003/07/29 04:03:39 $