Blocking newsletter spam

It's incredible how many people are of the misguided belief that just because I happen to run a company, I am automatically interested in their newsletter about whatever it is that they are doing, no matter how far it is removed from the kinds of things my company actually does.

Are these people spammers? Yes, definitely, and I don't want to do business with them. But there's a major difference between this kind of mail and your common Nigerian scammer or counterfeit blue pill "salesperson". Unlike the latter, some newsletter spammers are interested in forming a genuine business relationship with my company. They're going about it the wrong way, but that doesn't necessarily mean they're trying to trick me into doing something that isn't in my best interest; they're not just after my money.

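But wrong methods don't make them entirely clueless. Some of these unwanted newsletters are sent with VERP-style (Variable Envelope Return Path) return paths, which encode the recipient's address in the envelope sender so that the sender's mailing software can tell which subscriber a bounce belongs to. That suggests that if the mail bounces at SMTP time, their software will take me off the list and I'll stop receiving their junk. So I bounce them. Exim makes this very easy: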

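acl_check_mail:
  # Refuse the sender if the domain of the envelope sender matches a key
  # in the blacklist file (keys may use * as a wildcard).
  deny
	message     = Your domain has been blacklisted
	log_message = domain blacklisted
	condition   = ${lookup{$sender_address_domain}\
	              wildlsearch{/etc/exim4/blacklist-domains}\
	              {true}{false}}
  # Any other sender is accepted at this stage.
  accept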

What this does is use a wildlsearch lookup to check whether the domain of the envelope sender (i.e., the address given in the SMTP MAIL FROM: command) appears in the /etc/exim4/blacklist-domains file. Note that an Exim configuration line only continues onto the next line if it ends with a backslash, which is why the condition above is split that way. Because wildlsearch allows wildcards, a key of *grep.be matches grep.be and any of its subdomains, whereas *.grep.be matches only the subdomains of grep.be. (Strictly speaking, whatever follows the asterisk is compared with the end of the domain, so *grep.be would also catch an unrelated domain such as notgrep.be.) I need the wildcards for two reasons. At least one of the senders I've blacklisted this way sends their newsletter through a distributed service, and the domain in the VERP-style return path depends on the server that actually talks to my system; others use a dedicated subdomain for the newsletter but not for their regular mail (or use a different one for it). If I'm not interested in their spam, I'm probably not interested in their other mail either, hence the wildcard. Is this overkill? Maybe, but I don't care: I don't do business with spammers.
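
To make the difference between the two wildcard forms concrete, here is a small Python model of the matching. It assumes the rule the Exim specification gives for domain-list patterns, which wildlsearch keys follow: a leading * means the rest of the key is compared with the end of the domain, and any other key must match the whole domain. The function names and the sample file contents are hypothetical, and regular-expression keys (starting with ^), quoting, and the data part of lsearch lines are not modelled.

```python
from typing import Iterable


def wild_key_matches(key: str, domain: str) -> bool:
    """Simplified model of a wildlsearch key that may start with '*'.

    A leading '*' means "the rest of the key must match the end of the
    domain"; any other key must match the whole domain. Comparison is
    case-insensitive, since domain names are.
    """
    key, domain = key.lower(), domain.lower()
    if key.startswith("*"):
        return domain.endswith(key[1:])
    return domain == key


def is_blacklisted(domain: str, lines: Iterable[str]) -> bool:
    """Return True if any key in a (hypothetical) blacklist file matches."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key = line.split()[0].rstrip(":")  # keep the key, drop any "key: data"
        if wild_key_matches(key, domain):
            return True
    return False


# Hypothetical contents of /etc/exim4/blacklist-domains.
SAMPLE_BLACKLIST = """\
*grep.be
*.mailer.example
"""

if __name__ == "__main__":
    lines = SAMPLE_BLACKLIST.splitlines()
    for d in ["grep.be", "news.grep.be", "notgrep.be",
              "mailer.example", "mx3.mailer.example", "example.org"]:
        print(d, is_blacklisted(d, lines))
```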

The ACL is then activated for the SMTP MAIL FROM: command by setting the acl_smtp_mail option in the main configuration section to its name, i.e. acl_smtp_mail = acl_check_mail (see the Exim specification). A side effect is that the spammer can't reach my postmaster@ address from a sender address in that domain either, but that doesn't matter; they can always use a different address.
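
Why postmaster@ becomes unreachable follows from the order of the SMTP envelope commands. The Python toy model below is not Exim code; the class, the predicate and the vendor.test domain are hypothetical. It refuses the sender at MAIL FROM with a permanent 550 reply carrying the ACL's message, and answers any RCPT TO that follows with 503, the RFC 5321 code for a bad sequence of commands. Exim's exact wording for that second reply may differ.

```python
from typing import Callable, Optional, Tuple

Reply = Tuple[int, str]


class EnvelopeSketch:
    """Toy model of the envelope phase of an SMTP session (RFC 5321).

    The sender check runs at MAIL FROM, so a refused sender never reaches
    RCPT TO: no recipient address (postmaster@ included) is ever examined.
    """

    def __init__(self, domain_is_blocked: Callable[[str], bool]) -> None:
        self.domain_is_blocked = domain_is_blocked
        self.sender: Optional[str] = None  # set only after MAIL FROM succeeds

    def mail_from(self, address: str) -> Reply:
        domain = address.rpartition("@")[2]  # "" for the null sender <>
        if self.domain_is_blocked(domain):
            self.sender = None
            return 550, "Your domain has been blacklisted"
        self.sender = address
        return 250, "OK"

    def rcpt_to(self, address: str) -> Reply:
        if self.sender is None:
            # RCPT TO without an accepted MAIL FROM is out of sequence.
            return 503, "Bad sequence of commands"
        return 250, "OK"


if __name__ == "__main__":
    def blocked(domain: str) -> bool:
        return domain == "vendor.test" or domain.endswith(".vendor.test")

    session = EnvelopeSketch(blocked)
    print(session.mail_from("news@lists.vendor.test"))
    print(session.rcpt_to("postmaster@example.org"))
    print(session.mail_from("someone@other.test"))
    print(session.rcpt_to("postmaster@example.org"))
```

The last two calls reflect the caveat above: a spammer who really wants to reach postmaster@ only has to use a sender address outside the blacklisted domain.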

You might wonder why I'm using this kind of domain-based blacklisting rather than a regular Bayesian spam filter or something similar. The reason is simple: these newsletters look distinctly different from regular spam. For instance, some of the newsletter spammers are in fact competitors who didn't bother to check whom they were sending mail to. As a result, their newsletters contain keywords that also appear in mail I send to my own customers; if I trained my Bayesian classifier to treat them as spam, it would become more likely to misclassify a mail from an actual customer as spam. Likewise, most of them are very similar in format to newsletters I did consciously subscribe to, which are therefore not spam.
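
The effect on the classifier can be illustrated with a deliberately simplified per-word spam estimate: the share of spam messages containing a word, divided by that share plus the share of ham messages containing it. Real Bayesian filters use more elaborate estimates and combine many tokens, so this Python sketch is only an illustration; the sample messages and the keyword "hosting" are invented.

```python
from typing import List, Set


def tokens(message: str) -> Set[str]:
    """Very naive tokenizer: lower-cased, whitespace-separated words."""
    return set(message.lower().split())


def spamminess(word: str, spam: List[str], ham: List[str]) -> float:
    """Simplified per-word spam probability (no smoothing, no priors)."""
    s = sum(word in tokens(m) for m in spam) / len(spam)  # share of spam with word
    h = sum(word in tokens(m) for m in ham) / len(ham)    # share of ham with word
    if s + h == 0:
        return 0.5  # never seen: neutral
    return s / (s + h)


# Invented sample corpora.
HAM = ["quote for your hosting contract",
       "invoice for hosting and support",
       "lunch on friday"]
SPAM = ["cheap pills online", "win a prize now"]
COMPETITOR_NEWSLETTERS = ["our new hosting offers this month",
                          "hosting and support newsletter"]

if __name__ == "__main__":
    before = spamminess("hosting", SPAM, HAM)
    after = spamminess("hosting", SPAM + COMPETITOR_NEWSLETTERS, HAM)
    print(f"P(spam | 'hosting'): {before:.2f} before, {after:.2f} after")
```

Training the two competitor newsletters as spam moves the score for "hosting" from 0 to 3/7 (about 0.43), even though every ham message containing that word came from a real customer exchange.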

Finally, bouncing mail rather than blackholing it or filing it in a separate folder (as I have SpamAssassin do) has the added advantage of making it clear to a newsletter spammer that their junk is not wanted. Most (though certainly not all) will then remove me from their newsletter, saving me bandwidth and processing power. And since the rejection happens at MAIL FROM: time, before any RCPT TO: or DATA command has been processed, I'm not giving away any information they don't already have, such as which of my addresses exist.
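
Why a bounce gets me off the list at all comes down to the VERP-style return path. The Python sketch below shows the general idea under assumed conventions, similar in spirit to what common list managers do: the prefix news-bounces+, the host lists.vendor.test, the = standing in for the recipient's @, and all function names are hypothetical. Each copy of the newsletter gets its own envelope sender, so the bounce produced when my server rejects it is addressed to a return path that identifies me.

```python
from typing import Optional, Set

BOUNCE_PREFIX = "news-bounces+"  # hypothetical; each list manager picks its own
LIST_HOST = "lists.vendor.test"  # hypothetical sending domain


def verp_encode(recipient: str) -> str:
    """Per-recipient envelope sender, e.g. for jane@example.org:
    news-bounces+jane=example.org@lists.vendor.test"""
    local, _, domain = recipient.rpartition("@")
    return f"{BOUNCE_PREFIX}{local}={domain}@{LIST_HOST}"


def verp_decode(return_path: str) -> Optional[str]:
    """Recover the original recipient from a VERP return path, or None."""
    local, _, host = return_path.rpartition("@")
    if host != LIST_HOST or not local.startswith(BOUNCE_PREFIX):
        return None
    encoded = local[len(BOUNCE_PREFIX):]
    # Split on the last '=': domains never contain one, local parts might.
    rcpt_local, sep, rcpt_domain = encoded.rpartition("=")
    if not sep or not rcpt_local or not rcpt_domain:
        return None
    return f"{rcpt_local}@{rcpt_domain}"


def handle_bounce(return_path: str, subscribers: Set[str]) -> Optional[str]:
    """Unsubscribe whoever the bounced copy was addressed to."""
    recipient = verp_decode(return_path)
    if recipient is not None:
        subscribers.discard(recipient)
    return recipient


if __name__ == "__main__":
    subscribers = {"jane@example.org", "bob@example.net"}
    return_path = verp_encode("jane@example.org")
    print(return_path)
    print(handle_bounce(return_path, subscribers), subscribers)
```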