Spam frittered

Introduction

If you use email, you know about spam: unsolicited email messages that clutter in-boxes (and hard drives) with advertisements, outlandish offers, and things we don't want the kids to see. Internet users have bemoaned spam for years, and legislators and software companies have tried to make it go away. But so far, they've failed - and the problem is getting worse.

Tools for keeping spam at bay fall into three categories: features built into email clients, stand-alone utilities that run on the desktop, and server-based subscription tools and services. These products won't make all spam disappear forever, but the right one will bring measurable relief. Once you understand the general concepts behind each type of tool, and the pros and cons of individual applications, you can choose the product that best fits your needs.

The goal of all antispam utilities is to automatically delete unwanted email - or at least stash it out of sight. Antispam programs intercept spam in two ways: by examining mail as it arrives at a mail server (usually at an ISP or employer) or by downloading mail to a Mac and examining it there. Each method has its advantages. Intercepting spam at the server means less junk is downloaded to your Mac or stored in your accounts (a big help if your storage space is limited or you connect by modem). Utilities that run on a Mac are usually easier to configure and work with most email programs.

Antispam programs identify junk in three ways: with Boolean filters, points-based filters, or Bayesian filters. The amount and nature of the legitimate email you receive will determine which method works best.

Boolean filters
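To make the yes-or-no idea concrete, here is a minimal Python sketch of a single Boolean rule plus a whitelist. It assumes a message can be reduced to a sender address and body text. The address, the function names, and the rule itself are hypothetical. Real email clients express rules through their own filter dialogs, not code.

```python
import re

# Hypothetical whitelist: senders whose mail is always accepted.
WHITELIST = {"polly@example.com"}

def contains_words(text, *words):
    """True if every word appears in text as a whole word (case-insensitive)."""
    return all(re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) for w in words)

def is_spam(sender, body):
    # Whitelisted senders bypass every other rule.
    if sender.lower() in WHITELIST:
        return False
    # One Boolean rule: the message either matches or it doesn't.
    return contains_words(body, "herbal", "Viagra")
```

Mail from the whitelisted address passes no matter what it says. Anyone else who mentions both words, including other relatives, is caught.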
Think of Boolean filters as black-or-white, yes-or-no rules. To them, a message either is or isn't spam - there's no middle ground. For instance, you can create a filter that automatically deletes any message containing the words herbal and Viagra. This filter will probably work well - at least until Aunt Polly writes a letter about her garden and Uncle Theo's new prescription.

Boolean filters are built into many email applications, including Qualcomm's Eudora, Microsoft Entourage, and Netscape Communicator. They work quickly, and they're well suited to organizing mail and creating whitelists - the addresses (usually of friends, family, and mailing lists) you always want to receive mail from, no matter what that mail contains. Relying solely on Boolean filters requires a lot of effort, because hundreds (or thousands) of filters must be created to cope with spam's ever-changing permutations.

Points-based filters
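The scoring approach this section describes can be sketched in Python with the section's own numbers: 50 points each for herbal and Viagra, a limit of 50, and a 250-point deduction for Aunt Polly. Her address and the rule layout are hypothetical.

```python
THRESHOLD = 50  # messages scoring MORE than this are set aside

# Each rule is (test, points). Scores add up; no single rule decides alone.
# The sender address is a hypothetical stand-in for Aunt Polly's.
RULES = [
    (lambda sender, body: "herbal" in body.lower(), 50),
    (lambda sender, body: "viagra" in body.lower(), 50),
    (lambda sender, body: sender.lower() == "polly@example.com", -250),
]

def score(sender, body):
    """Sum the points of every rule the message triggers."""
    return sum(points for test, points in RULES if test(sender, body))

def is_spam(sender, body):
    return score(sender, body) > THRESHOLD
```

A stranger's message mentioning both words scores 100 and is set aside. Aunt Polly's identical letter scores -150 and goes through. A message that mentions only herbal scores exactly 50, which is not over the limit.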
Points-based filters also apply fixed criteria to email messages. But these filters aren't of the all-or-nothing variety - instead, they keep score. For example, a filter can assign 50 points to the word herbal and 50 points to the word Viagra. The antispam utility can then set aside messages with more than 50 points. That would put Aunt Polly's message, at 100 points, over the limit. But an additional filter can subtract 250 points from any email she sends, virtually guaranteeing that all mail from her will pass through the filters unscathed.

Points-based systems are more flexible than Boolean systems, but they're often slower (since all rules must be applied to all messages). It can also be tough to work out how rules interact, because a lot of rules are needed to account for common forms of spam. Matterform Media's Spamfire and the open-source SpamAssassin are examples of programs that offer points-based filtering.

How we tested

To research this article, we tapped into an archive of more than 250,000 spam messages received between 1993 and 2002. Seventy-five per cent of the spam messages we used were collected between December 2001 and December 2002. We created sets of messages, including two sets of 10,000 spam messages - one to train Bayesian filters, and another to test them. For legitimate email, we used messages received between November 1997 and December 2002. In each test set, one-third of the messages were from friends, family, and acquaintances, one-third were related to work, and one-third came from mailing lists. For antispam programs that support whitelists or other processing exceptions for mailing lists and buddies, we entered information for all subscribed mailing lists, as well as the email addresses of everyone who appeared ten or more times in a message set.

Bayesian filters
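The word-catalogue idea this section describes can be sketched as a small naive Bayes classifier. This is one common textbook formulation, chosen for brevity. It counts which labeled messages contain each word and ignores words it has never seen. It is not the algorithm used by Mail, SpamSieve, or any other product, and the class and method names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

class BayesFilter:
    def __init__(self):
        # The two internal catalogues: how many good/bad messages contained each word.
        self.word_counts = {"good": Counter(), "bad": Counter()}
        self.message_counts = {"good": 0, "bad": 0}

    def train(self, text, label):
        """Record a message you've identified; label is "good" or "bad"."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        good, bad = self.message_counts["good"], self.message_counts["bad"]
        # Start from the (smoothed) share of spam among training messages.
        log_odds = math.log((bad + 1) / (good + 1))
        for word in tokenize(text):
            in_bad = self.word_counts["bad"][word]
            in_good = self.word_counts["good"][word]
            if in_bad == 0 and in_good == 0:
                continue  # never seen in training: no evidence either way
            # Laplace-smoothed chance that a bad (or good) message contains the word.
            log_odds += math.log((in_bad + 1) / (bad + 2))
            log_odds -= math.log((in_good + 1) / (good + 2))
        # Convert log-odds to a probability without overflowing math.exp.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1 + z)
```

A mail program built this way would move anything above some cut-off, such as 0.9, to a junk folder. In this simplified version, correcting a mistake just means training on the message again with the right label. A fuller implementation would also remove the message's words from the wrong catalogue.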
A different approach altogether - and the latest rage in antispam technology - is offered by products such as Apple's Mail and Michael Tsai's SpamSieve: Bayesian filters make a list of every word in an email message, and you tell the program whether the message is legitimate. The filter then adds that list of words to one of its two internal catalogues - good words and bad words. As the Bayesian filter adds words, the frequency with which particular terms appear in either legitimate mail or spam trains the filter to differentiate between the two kinds of email.

Most Bayesian filters come pre-trained to recognize common spam terms, and after you identify a few hundred good and bad messages, the filters can begin to assess whether a message is legitimate, solely by analyzing the words it contains. Bayesian filters can adapt to new types of spam and legitimate email - when they make a mistake, just correct them. As a result, they become highly individual, so Bayesian filters you've trained won't work as well for someone else. Bayesian filters often require more memory and processing than Boolean or points-based filters.

Accidents happen
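Measuring the two kinds of mistakes is simple bookkeeping. Here is a minimal sketch, assuming each test result is recorded as a pair of flags: was the message really spam, and did the filter flag it. The function name is hypothetical. The usage line feeds it figures shaped like our Mail results below, with about 75 per cent rounded to exactly 7,500 of 10,000 spam messages.

```python
def error_rates(results):
    """results: iterable of (is_really_spam, flagged_as_spam) pairs.

    Returns (false_positive_rate, false_negative_rate).
    """
    false_pos = false_neg = spam = legit = 0
    for really_spam, flagged in results:
        if really_spam:
            spam += 1
            if not flagged:
                false_neg += 1  # spam that got through the filters
        else:
            legit += 1
            if flagged:
                false_pos += 1  # legitimate mail wrongly marked as spam
    return (false_pos / legit if legit else 0.0,
            false_neg / spam if spam else 0.0)

results = ([(False, True)] * 2 + [(False, False)] * 9998
           + [(True, True)] * 7500 + [(True, False)] * 2500)
fp, fn = error_rates(results)
print(f"false positives: {fp:.2%}, false negatives: {fn:.0%}")
# prints "false positives: 0.02%, false negatives: 25%"
```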
No matter which filtering method you use, the more diverse your email is, the more likely it is that antispam software will produce false positives and false negatives - legitimate messages incorrectly identified as spam, and spam that gets through the filters. False positives are generally much worse than false negatives, but some people might not mind losing some legitimate email in exchange for eliminating all spam.

Bandwidth and storage
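For a rough sense of what spam costs a modem user, download time is simply total bits divided by line speed. The figures below (100 messages a day, about 10KB each, a 56-Kbps connection) are illustrative assumptions, not test results. Real modem connections seldom reach their rated speed, so actual times would be longer.

```python
def download_seconds(messages, avg_bytes, bits_per_second):
    """Time to transfer the messages, ignoring protocol overhead."""
    total_bits = messages * avg_bytes * 8  # 8 bits per byte
    return total_bits / bits_per_second

# Assumed figures: 100 spam messages a day, about 10KB each, a 56-Kbps modem.
seconds = download_seconds(messages=100, avg_bytes=10_000, bits_per_second=56_000)
print(f"About {seconds:.0f} seconds a day")  # prints "About 143 seconds a day"
```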
When choosing an antispam utility, consider its impact on bandwidth and storage. Text, images, and attachments in spam have to be transmitted and received just like every other email message. If you receive email via a modem, those bytes add up in a hurry and arrive slowly. Worse, with a metered Internet service - where the bill is determined by the amount of data transmitted and received - you pay to have junk mail delivered. Some antispam utilities don't reduce the bandwidth spam takes up, and some even increase it.

Also consider the storage spam consumes, both on your hard drive and on your mail server. You can't use space taken up by junk email, and if your mailbox at an ISP fills up with spam, the ISP may reject all email sent to you until some messages are deleted. Antispam utilities that keep spam off your hard drive, or out of your email account altogether, may be more useful than utilities that download it to your Mac or leave it sitting in the in-box at an ISP.

Email clients
You might think that the first place to look for spam-fighting tools is your email program - but although almost every email program offers rules that can perform Boolean filtering, at press time, Apple's Mail 1.2 and Microsoft Entourage X were the only OS X apps that promised features specifically for combating spam.

Mail
The only widely used Mac email client to include both Bayesian and Boolean filtering, Mail is easy to train: just identify spam messages with the Junk and Not Junk buttons in the mailbox window. Once you've trained the program, Mail's Automatic mode moves suspected spam to its Junk mailbox. But make training choices carefully - aside from repeated training, there's no way to view or modify the data Mail uses to filter junk mail.

We used 10,000 legitimate email messages and 10,000 spam messages to train Mail (the number of messages appropriate for testing differs from one application to another), and then we asked it to filter another 20,000 messages, half of which were spam. Mail correctly identified about 75 per cent of incoming spam, and it marked only two legitimate email messages as spam.

Mail must download messages from a mail server before applying its filters. It neither reduces the time spam takes to download nor keeps junk mail off your hard drive, but it can automatically delete junk mail after a day, a week, or a month, or when you quit the program.

Entourage
Microsoft Entourage X offers traditional Boolean filtering and the Junk Mail Filter, which is essentially a small collection of Boolean filters and points-based rules that function as a single unit in the program's normal mail filtering. You control the sensitivity of the Junk Mail Filter with a slider, but because there's no way to use the slider to control the sensitivity of individual rules, it's simple but imprecise.

In our testing, which involved 3,000 spam messages and 3,000 legitimate messages, Entourage X's Junk Mail Filter at its most sensitive setting identified just over 18 per cent of the spam messages correctly, while incorrectly identifying roughly 13 per cent of the legitimate messages as spam. Entourage must download messages before it can apply filters, so no bandwidth is saved, and filtered spam stays on your hard drive unless you create a rule that deletes it automatically - which we don't recommend with filters this inaccurate.

Other tools
Neither Bare Bones Software's Mailsmith nor Eudora has built-in spam filters. However, forthcoming versions of both products will offer improved integration with external spam utilities such as the ones we describe in the next section.

Client-side antispam utilities
Several antispam utilities offer sophisticated mail-filtering features and work with a variety of Mac email clients. However, using these tools can be awkward. Because they run as separate programs, they often require you to change your filters and the way you check email. Some also require that you install and use scripts. But for some users, the rewards may be worth the effort.

Spamfire
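The check-then-delete routine this section describes can be sketched with Python's standard poplib module, whose POP3 objects provide stat(), retr(), dele(), and quit() methods. The two function names, the is_spam test, and the holding-area list are hypothetical, and this is not how Spamfire itself is written. Every message is retrieved in full, which shows why legitimate mail can cross the wire twice: once for the check and once when your regular email program fetches it.

```python
import poplib

def sweep_mailbox(pop, is_spam, holding_area):
    """Move spam from a logged-in POP3 session into holding_area (a list).

    is_spam is any function that takes the raw message text.
    """
    count, _mailbox_size = pop.stat()
    for number in range(1, count + 1):  # POP message numbers start at 1
        _response, lines, _octets = pop.retr(number)  # downloads the whole message
        raw = "\r\n".join(line.decode("utf-8", "replace") for line in lines)
        if is_spam(raw):
            holding_area.append(raw)
            pop.dele(number)  # only marked; the server deletes it at QUIT

def sweep_account(host, user, password, is_spam):
    """Log in over SSL, sweep the mailbox, and return the held spam."""
    pop = poplib.POP3_SSL(host)  # host is a placeholder, e.g. "pop.example.com"
    pop.user(user)
    pop.pass_(password)
    held = []
    sweep_mailbox(pop, is_spam, held)
    pop.quit()  # QUIT commits the deletions marked with DELE
    return held
```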
Matterform Media's Spamfire 1.3.2 is an add-on utility that takes over the job of checking email. Spamfire logs in to a mail server and applies its points-based rules to mail stored there. (Spamfire comes with a large set of rules, and you can add your own.) The application identifies spam, downloads it to a holding area, and then deletes it from the server. Your regular email program then downloads the remaining messages. Spamfire works with any OS 9 or OS X email program.

Spamfire supports whitelists. Because it checks email independently of your main mail program, it needs your account user names, passwords, and server information. If you want to use Spamfire with more than one email account, keeping this information coordinated between your accounts and Spamfire can be problematic.

The Pro version of Spamfire comes with 12 months of online filter updates (essential for this sort of utility). All versions include a Revenge menu with several options, such as filling spammers' server logs with useless information, which makes it hard for them to collate the data they try to gather through identifiable links and images in their spam. But although we understand the satisfaction that revenge can bring, Spamfire's Revenge options are unlikely to have a measurable dampening effect on spam.

In our tests, with 5,000 legitimate messages and 5,000 spam messages, Spamfire correctly identified 76 per cent of the spam and incorrectly marked less than 3 per cent of the legitimate mail as spam. But Spamfire can be hard on bandwidth: it can download legitimate messages twice, and misidentified messages may make three trips.

SpamSieve
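SpamSieve's exact pruning rule isn't described here, so the following is only a sketch of the general idea. It assumes a Bayesian filter keeps two word-count tables, one for good mail and one for spam, and it drops any word seen fewer than a chosen number of times across both. The function name and the default threshold are hypothetical.

```python
def prune(good_counts, bad_counts, min_count=2):
    """Drop words seen fewer than min_count times across both tables.

    good_counts and bad_counts map word -> count (dicts or Counters)
    and are modified in place. Returns the number of words removed.
    """
    removed = 0
    for word in set(good_counts) | set(bad_counts):
        if good_counts.get(word, 0) + bad_counts.get(word, 0) < min_count:
            good_counts.pop(word, None)
            bad_counts.pop(word, None)
            removed += 1
    return removed
```

Smaller tables mean less to look up for every incoming message. The trade-off is that a rare word that later becomes common in spam has to be learned again.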
Like Spamfire, Michael Tsai's SpamSieve 1.2.2 is an add-on utility. It works with Entourage, Mailsmith, CTM Development's PowerMail, and Eudora 5.2. Unlike Spamfire, though, it operates from within your familiar email application, so there's no need to change the way you manage mail to take advantage of it. Supplied AppleScripts tell SpamSieve about good and bad messages. Once SpamSieve's Bayesian filters have been trained, the program automatically filters new mail as it comes in, and you use the scripts to continue training SpamSieve about new types of junk and legitimate email.

We trained SpamSieve with 10,000 legitimate messages and 10,000 junk messages. SpamSieve correctly identified just over 82 per cent of the spam it received, and it misidentified almost 1 per cent of the legitimate mail.

SpamSieve 1.2.2 doesn't let you edit its list of words and scores, but future versions will. Version 1.2.2 offers a pruning function to remove little-used terms, which may help SpamSieve's performance if it gets too slow. SpamSieve's documentation is weak, and integration with Eudora 5.2 is clumsy and unreliable. If you already use Eudora's filters, you'll have to edit the script of a second, helper application and rework your filtering to use SpamSieve effectively. However, these difficulties stem from Eudora's notification function, not from SpamSieve.

PostArmor
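The strengths and the blind spot of header-only filtering are easy to see in a sketch. This Python example applies points-based regular-expression rules to a message's headers using the standard email module. The patterns, scores, and threshold are hypothetical, not PostArmor's predefined rules.

```python
import re
from email import message_from_string

# (header name, regular expression, points): all hypothetical examples.
HEADER_RULES = [
    ("Subject", re.compile(r"!{3,}"), 30),
    ("Subject", re.compile(r"\b(?:viagra|mortgage)\b", re.IGNORECASE), 40),
    ("From", re.compile(r"@\d{1,3}(?:\.\d{1,3}){3}\b"), 50),  # raw IP address as domain
]
THRESHOLD = 50

def header_score(raw_message):
    """Score only the headers; the message body is never examined."""
    msg = message_from_string(raw_message)
    total = 0
    for name, pattern, points in HEADER_RULES:
        if pattern.search(msg.get(name, "")):
            total += points
    return total

def is_spam(raw_message):
    return header_score(raw_message) > THRESHOLD
```

A message from a raw IP address with "Low mortgage rates!!!" in the subject scores 120. But a message with bland headers scores zero however obvious the spam in its body, which is the weakness this section describes.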
A Java-based application that connects to POP and IMAP servers, P Manna's PostArmor 1.2 applies points-based filters to the headers of mail on servers, and it can delete anything it thinks is spam. The leftovers are downloaded to your email application. Java applications tended to be slow and unstable under OS 9, but PostArmor works well under OS X, as long as you bring a working knowledge of regular expressions (text matching using wildcards, patterns, and ranges of characters instead of fixed terms).

On the plus side, the program includes links to common DNS blacklists (see "Blacklist pros and cons"), the ability to check the validity of sender addresses by connecting to the sender's server, and email reports that show which rules are firing and what mail PostArmor has rejected.

In our testing, with 3,000 legitimate messages and 3,000 spam messages, PostArmor correctly identified just over 66 per cent of the spam, and it misidentified about 8 per cent of the legitimate email. However, PostArmor was slow, even on a local Ethernet network; if you routinely receive a lot of email, it can be frustrating. PostArmor's integration with DNS blacklists is automatic, and there's no way to selectively disable them.

PostArmor tries to save bandwidth by downloading only header information, rather than entire messages, from a mailbox. But because PostArmor's rules aren't applied to the bodies of incoming messages, obvious spam can slip through undetected. PostArmor comes with a set of predefined rules, and it's best to add your own to handle the specific types of legitimate mail you receive.

MailfilterX
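Subject normalization and size filtering can be illustrated with a short Python sketch. The regular expression, the size limit, and the single "free" rule are hypothetical, and this is neither Mailfilter's code nor its configuration syntax. The normalizer is deliberately crude. It joins runs of three or more single letters separated by punctuation or spaces, so it can occasionally merge innocent text.

```python
import re

# A letter followed by two or more "separator + single letter" pairs,
# e.g. "f-r-e-e" or "F R E E" (hypothetical rule).
SPACED_LETTERS = re.compile(r"\b[A-Za-z](?:[\W_]+[A-Za-z]\b){2,}")
MAX_BYTES = 100_000  # hypothetical size limit

def normalize_subject(subject):
    """Collapse spaced-out words such as 'f-r-e-e' into 'free', then lowercase."""
    joined = SPACED_LETTERS.sub(lambda m: re.sub(r"[\W_]+", "", m.group()), subject)
    return joined.lower()

def should_delete(subject, size_bytes):
    """Boolean checks: too large, or 'free' in the normalized subject."""
    if size_bytes > MAX_BYTES:
        return True
    return re.search(r"\bfree\b", normalize_subject(subject)) is not None
```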
Frank Blome's MailfilterX 0.2.0 adds an OS X interface to Mailfilter 0.40, a Unix utility that can log in to POP accounts, apply a series of Boolean filters to mail, and delete messages identified as spam from the server. Mailfilter supports whitelists and regular expressions. It can also normalize subjects (so it recognizes "f-r-e-e" as the word free, for example), remove duplicate messages, and filter messages by size. Mailfilter deletes spam from a server's mailbox while downloading as little as possible.

Mailfilter is not for those wary of OS X's Terminal application. Although MailfilterX puts a friendlier face on Mailfilter's text-based configuration file, you'll need Unix and regular-expression skills to get Mailfilter running and configured meaningfully. Neither Mailfilter nor MailfilterX ships with a default filter set (although some samples are provided), so we couldn't test out-of-the-box effectiveness: you'll have to write your own rules and hope they're successful.

Server-side utilities and services
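When a server labels spam rather than blocking it, your email program still has to act on the label. Here is a minimal sketch, assuming the server runs SpamAssassin with its standard headers, which mark suspected spam with an "X-Spam-Flag: YES" header line. Other services label mail differently, and the function names and folder names are hypothetical.

```python
from email import message_from_string

def server_labeled_spam(raw_message):
    """True if an upstream SpamAssassin-style filter flagged the message."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"

def route(raw_message):
    """Pick a destination folder based on the server's label."""
    return "Junk" if server_labeled_spam(raw_message) else "Inbox"
```

Because the server has already done the scoring, the client-side work shrinks to a single header check. The message itself has still been downloaded, though, so labeling saves no bandwidth the way outright blocking does.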
The desktop isn't the only place to combat spam: many ISPs and organizations can block or label unwanted email before it leaves their servers. Server-side spam filtering can be great for saving bandwidth and keeping an email account within its storage limit, because spam blocked by the server is never delivered to your account. And there's no need to manage an antispam utility: its care and maintenance are left to those who run the mail server.

But server-side spam filtering has its faults. Generally, server-side tools are much less configurable than antispam utilities running on your Mac. The sensitivity of some features may be adjustable, but you usually won't be able to see the rules the server applies to your mail, let alone enable and disable them. Also, you may get no indication that email was blocked. Some systems shunt potential spam to a Web-based holding area, which you must review regularly for legitimate mail misidentified as spam, while other systems don't notify you of blocked mail at all.

Though server-side filtering is not for everyone, it's a great option in some circumstances. Your ISP or mail provider may already offer some server-side spam-fighting tools, or you may want to set up an address with a provider that offers spam-protected addresses. Check out how SpamAssassin stacks up against protected forwarding addresses from Pobox and SpamCop.net.

Because spammers are always changing their methods, visit the following Web sites occasionally for late-breaking information on new spam trends and ways to fight them.

CAUCE: The Coalition Against Unsolicited Commercial Email (www.cauce.org) provides information on legislation and other industry news.

Spam Abuse: For a wide range of general information on the mailbox scourge, and tips aimed at administrators and even marketers who want to use email responsibly, visit http://spam.abuse.net.

The Spam-L FAQ: This page provides a good explanation of the technical details necessary to trace and report spam: www.claws-and-paws.com/spam-l/index.html.

MacinTouch Spam and Scam Resources: You can chronicle your own experiences and investigate reader reports at www.macintouch.com/spam.shtml.