Spam (e-mail)

Spam by e-mail is one type of spamming that involves sending identical or nearly identical messages to thousands (or millions) of recipients. Addresses of recipients are often harvested from Usenet postings or web pages, obtained from databases, or simply guessed by using common names and domains. By definition, spam is sent without the permission of the recipients.

The terms unsolicited commercial email (UCE) and unsolicited bulk email (UBE) are sometimes used as more precise or less slang-like expressions for spam. Most US legislative efforts against spam are tailored to address UCE. A small but noticeable proportion of unsolicited bulk email is not, in fact, also commercial; examples include political advocacy spam and chain letters.

Overview

Sending spam is a violation of the Acceptable Use Policy (AUP) of almost all ISPs, and can lead to the termination of the sender's account. In many jurisdictions spamming is a crime or an actionable tort; in the United States, for example, it is regulated by the CAN-SPAM Act of 2003.

Spammers engage in deliberate fraud to send out their messages. Spammers frequently use false names, addresses, phone numbers, and other contact information to set up "disposable" accounts at various Internet service providers. They also often use falsified or stolen credit card numbers to pay for these accounts. This allows them to quickly move from one account to the next as each one is discovered and shut down by the host ISPs.

Spammers go to great lengths to conceal the origin of their messages. They do this by spoofing email addresses (similar to Internet protocol spoofing): the spammer forges the message headers so that the message appears to come from another email address. Some ISPs and domains require the use of SMTP-AUTH, which allows the specific account from which an email originates to be positively identified.

It is not possible to spoof an email completely, since the recipient's own mail server records the IP address of the last mail server that actually connected to it; the rest of the history of mail servers the email passed through, however, can be forged by spammers. Even so, tracing an email message's route is usually fruitless, since many ISPs have thousands of customers and identifying just one spammer among them is tedious.
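The point about relay history can be illustrated with a short Python sketch that lists a message's Received headers in the order they appear. The sample message, host names, and addresses below are invented for illustration. The sketch relies only on the convention that each mail server prepends its own Received header, so the topmost entry is the one written by the recipient's own server and the entries beneath it are increasingly easy to forge.

```python
from email import message_from_string

# Invented sample message; every host name and address is a placeholder.
RAW_MESSAGE = """\
Received: from mail.example.net (mail.example.net [192.0.2.10])
\tby mx.recipient.example with ESMTP; Mon, 5 Jan 2004 10:00:02 +0000
Received: from forged.example.org (unknown [198.51.100.7])
\tby mail.example.net with SMTP; Mon, 5 Jan 2004 10:00:00 +0000
From: someone@example.org
To: user@recipient.example
Subject: hello

body
"""


def received_chain(raw):
    """Return the Received headers, topmost (most recently added) first."""
    msg = message_from_string(raw)
    # Collapse folded header lines into single lines for easier reading.
    return [" ".join(value.split()) for value in msg.get_all("Received", [])]


chain = received_chain(RAW_MESSAGE)
for hop, header in enumerate(chain):
    note = "written by our own server" if hop == 0 else "could be forged"
    print(f"hop {hop} ({note}): {header}")
```

Only the first entry in the list comes from a server the recipient controls; everything after it was supplied by the sending side.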

Spammers frequently seek out and make use of vulnerable third-party systems such as open mail relays and open proxy servers. The SMTP system, used to send email across the Internet, forwards mail from one server to another; mail servers that ISPs run commonly require some form of authentication that the user is a customer of that ISP. Open relays, however, do not properly check who is using the mail server and pass all mail to the destination address, making it quite a bit harder to track down spammers.
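The difference between a properly restricted mail server and an open relay comes down to one policy decision, which the following Python sketch tries to capture. The function name, the LOCAL_DOMAINS set, and the open_relay flag are hypothetical and do not correspond to any real mail server's configuration.

```python
# Hypothetical set of domains this server delivers mail for.
LOCAL_DOMAINS = {"example.com"}


def should_accept(recipient, client_authenticated, open_relay=False):
    """Decide whether to accept a message addressed to `recipient`."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in LOCAL_DOMAINS:
        return True  # final delivery to one of our own users
    if client_authenticated:
        return True  # relaying on behalf of a known customer
    # An open relay skips both checks and forwards mail for anyone.
    return open_relay


# A stranger asking a restricted server to forward mail elsewhere is refused.
print(should_accept("victim@elsewhere.example", client_authenticated=False))
# The same request succeeds against an open relay.
print(should_accept("victim@elsewhere.example", False, open_relay=True))
```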

Spoofing can have serious consequences for legitimate email users. Their inboxes can become clogged with "undeliverable" emails in addition to volumes of spam, and they can be mistakenly identified as spammers. They may receive irate email from spam victims, and if those victims report the email address owner to the ISP, the ISP may terminate the owner's service for spamming.

Gathering of addresses

In order to send spam, spammers need to obtain the email addresses of the intended recipients. Toward this end, both spammers themselves and list merchants gather huge lists of potential email addresses. Since spam is, by definition, unsolicited, this address harvesting is done without the consent (and frequently against the expressed will) of the address owners. As a consequence, spammers' address lists are of remarkably poor accuracy. A single spam run may target tens of millions of possible addresses -- many of which are invalid, malformed, or undeliverable.

Spam differs from legitimate direct marketing in many ways, one of them being that it costs no more to send to a larger number of recipients than a smaller number. For this reason, there is little pressure upon spammers to limit the number of addresses targeted in a spam run, or to restrict it to persons likely to be interested. One consequence of this fact is that many people receive spam written in languages they cannot read -- a good deal of spam sent to English-speaking recipients is in Chinese or Korean, for instance. Likewise, lists of addresses sold for use in spam frequently contain malformed addresses, duplicate addresses, and addresses of role accounts such as postmaster. [1] (http://rejo.zenger.nl/abuse/emailcd.php)

Email addresses may be harvested from a number of sources. A popular method has been to use email addresses which their owners have published for other purposes. Usenet posts, especially those in archives such as Google Groups, are a frequent target. Simply searching the Web for pages with addresses -- such as corporate staff directories -- can yield thousands of addresses, most of them deliverable. Spammers have also subscribed to discussion mailing lists for the purpose of gathering the addresses of posters. The DNS and WHOIS systems require the publication of technical contact information for all Internet domains; spammers have illegally trawled these resources for email addresses.

Because spammers offload the bulk of their costs onto others, they can also use more computationally expensive means to generate addresses. A dictionary attack is an attempt to gain access to a resource by systematically trying a large list of likely credentials -- usually, usernames and passwords. Spammers have applied this principle to guessing email addresses -- as by taking common names and generating likely email addresses for them at each of thousands of domain names. [2] (http://www.wired.com/news/infostructure/0,1377,57132,00.html)

A recent, controversial tactic is called "e-pending" -- for the appending of email addresses to direct-marketing databases. Direct marketers normally obtain lists of prospects from sources such as magazine subscriptions and customer lists. By searching the Web and other resources for email addresses corresponding to the names and street addresses in their records, direct marketers can send targeted spam email. However, as with most spammer "targeting", this is imprecise: Users have reported, for instance, receiving solicitations to mortgage their house at a specific street address -- with the address being clearly a business address including mail stop and office number!

Spammers sometimes use various means to confirm addresses as deliverable. For instance, including a Web bug in a spam message written in HTML may cause the recipient's mail client to transmit the recipient's address, or any other unique key, to the spammer's Web site. [3] (http://archive.infoworld.com/articles/hn/xml/00/12/05/001205hnwebbug.xml?p=br&s=3) Likewise, spammers sometimes operate Web pages which purport to remove submitted addresses from spam lists. In several cases, these have been found to subscribe the entered addresses to receive more spam. [4] (http://www.spamhaus.org/removelists.html)
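From the recipient's side, a Web bug is usually just a remote image whose URL carries a unique key. The Python sketch below lists remote images in an HTML message that have a query string and might therefore be trackers. The class and function names and the sample HTML are invented for illustration. Treating any query string as a warning sign is a simplifying assumption, since a key could equally be hidden in the path.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> that is fetched over HTTP(S)."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if urlparse(src).scheme in ("http", "https"):
            self.remote_images.append(src)


def find_tracking_candidates(html):
    """Return remote image URLs that carry query parameters."""
    finder = RemoteImageFinder()
    finder.feed(html)
    finder.close()
    return [u for u in finder.remote_images if parse_qs(urlparse(u).query)]


# Invented sample: a 1x1 image whose URL embeds a per-recipient key.
SAMPLE = (
    '<p>Great offer!</p>'
    '<img src="http://tracker.example/pixel.gif?id=a1b2c3" width="1" height="1">'
)
candidates = find_tracking_candidates(SAMPLE)
print(candidates)
```

A mail client that declines to load such images automatically gives the spammer no confirmation that the address is live.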

Delivering spam messages

Internet users and system administrators have deployed a vast array of techniques to block, filter, or otherwise banish spam from users' mailboxes. Almost all Internet service providers forbid the use of their services to send spam or to operate spam-support services. Both commercial firms and volunteers run subscriber services dedicated to blocking or filtering spam, such as Brightmail, Postini, and the various DNSBLs. How, then, do spammers still manage to deliver messages which users wish not to receive and network owners wish not to carry?

Using other people's computers

Early on, spammers discovered that if they sent large quantities of spam directly from their ISP accounts, recipients would complain and ISPs would shut their accounts down. Thus, one of the basic techniques of sending spam has been to send it from someone else's computer and network connection. By doing this, spammers protect themselves in several ways: they hide their tracks, get others' systems to do most of the work of delivering messages, and direct the efforts of investigators towards the other systems rather than the spammers themselves.

In the 1990s, the most common way spammers did this was to use open mail relays. An open relay is an MTA, or mail server, which is configured to pass along messages sent to it from any location, to any recipient. In the original SMTP mail architecture, this was the default behavior: a user could send mail to practically any mail server, which would pass it along towards the intended recipient's mail server.

The standard was written in an era before spamming, when there were few hosts on the Internet and those hosts abided by a certain level of conduct. While this cooperative, open approach was useful in ensuring that mail was delivered, it was vulnerable to abuse by spammers -- and abused it soon was. Spammers could forward batches of spam through open relays, leaving the job of delivering the messages up to the relays. In response, mail system administrators concerned about spam began to demand that other mail operators configure MTAs to cease being open relays. The first DNSBLs, such as MAPS RBL (http://www.mail-abuse.org/rbl/) and the now-defunct ORBS, aimed chiefly at allowing mail sites to refuse mail from known open relays.
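A DNSBL is normally consulted with an ordinary DNS lookup. The client reverses the octets of the connecting IP address, appends the blocklist's zone name, and treats a successful address lookup as "listed". The Python sketch below follows that convention. The zone name dnsbl.example is a placeholder rather than a real blocklist, and the function names are invented for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the DNSBL zone has an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not on the list (or the lookup failed).
        return False


# Placeholder zone; substitute the zone of a blocklist you actually use.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
```

A mail server would call something like is_listed on the address of each connecting client and refuse the connection when it returns True.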

Within a few years, open relays became rare and spammers resorted to other tactics. Chief among these was the use of open proxies. A proxy is a network service for making indirect connections to other network services. The client connects to the proxy and instructs it to connect to a server. The server perceives an incoming connection from the proxy, not the original client. Proxies have many purposes, including Web-page caching, protection of privacy, filtering of Web content, and selectively bypassing firewalls. An open proxy is one which will create connections for any client to any server, without authentication. Like open relays, open proxies were once relatively common, as many administrators did not see a need to restrict access to them.

A spammer can direct an open proxy to connect to a mail server, and send spam through it. The mail server logs a connection from the proxy -- not the spammer's own computer. This provides an even greater degree of concealment for the spammer than an open relay, since most relays log the client address in the headers of messages they pass. Open proxies have also been used to conceal the sources of attacks against other services besides mail, such as Web sites or IRC servers.

Besides relays and proxies, spammers have used other insecure services to send spam. One example is the now-infamous FormMail.pl, a CGI script to allow Web-site users to send email feedback from an HTML form. [5] (http://www.scriptarchive.com/formmail.html) Several versions of this program, and others like it, allowed the user to redirect email to arbitrary addresses. Spam sent through open FormMail scripts is frequently marked by the program's characteristic opening line: "Below is the result of your feedback form."

As spam from proxies and other "spammable" resources grew, DNSBL operators started targeting these as well as open relays. Blocklists such as Blitzed Open Proxy Monitor (http://opm.blitzed.org/info) and Composite Blocking List (http://cbl.abuseat.org/) chiefly target open proxies.

In 2003, spam investigators saw a radical change in the way spammers sent spam. Rather than searching the global network for exploitable services such as open relays and proxies, spammers began creating "services" of their own. By commissioning computer viruses designed to deploy proxies and other spam-sending tools, spammers could harness hundreds of thousands of end-user computers. Most of the major Windows email viruses of 2003, including the Sobig (http://www.symantec.com/avcenter/venc/data/w32.sobig.a@mm.html) and Mimail (http://www.symantec.com/avcenter/venc/data/w32.mimail.a@mm.html) virus families, were spammer viruses: viruses designed expressly to make infected computers available as spamming tools. [6] (http://www.cnn.com/2003/TECH/internet/08/22/sobig.culprit/) [7] (http://www.infoworld.com/article/03/07/11/HNtorjanpeddle-1.html)

Besides sending spam, spammer viruses serve spammers in other ways. Beginning in July 2003, spammers started using some of these same viruses to perpetrate distributed denial-of-service (DDoS) attacks upon DNSBLs and other anti-spam resources. [8] (http://www.spamhaus.org/news.lasso?article=13) Although this was by no means the first time that illegal attacks have been used against anti-spam sites, it was perhaps the first wave of effective attacks. In August of that year, engineering company Osirusoft ceased providing DNSBL mirrors of the SPEWS and other blocklists, after several days of unceasing attack from virus-infected hosts. [9] (http://www.zdnet.com.au/news/communications/0,2000061791,20277794,00.htm) The very next month, DNSBL operator Monkeys.com succumbed to the attacks as well. [10] (http://www.dnsbl.info/forums/topic.asp?TOPIC-ID=12) Other DNSBL operators, such as Spamhaus (http://www.spamhaus.org/), have deployed global mirroring and other anti-DDoS methods to resist these attacks.

As of early 2004, virus-infected hosts remain a major source of spam.

Legality

Accessing privately owned computer resources without the owner's permission is illegal under computer crime statutes in most nations. Deliberate spreading of computer viruses is also illegal in the United States and elsewhere. Thus, some of spammers' most common behaviors are criminal quite independently of the legal status of spamming per se. Even before the advent of laws specifically banning or regulating spamming, spammers have been successfully prosecuted under computer fraud and abuse laws for wrongfully using others' computers.

Consequences for anti-spam methods

The fact that spammers use other people's computers has been an obstacle to some efforts to fight spam. For instance, a number of persons including Microsoft's Bill Gates have proposed "email postage" systems, under which email senders would be required to pay for each message sent. The intention of email postage is to deter spam by making it too expensive to send a large number of messages. However, since spammers already use other people's computers, there is every reason to believe that they would offload the postage charge onto others as well. This would render email postage ineffective at stopping spam.

Using Webmail services

Another common practice of spammers is to create accounts on free webmail services, such as Hotmail, to send spam or to receive emailed responses from potential customers. Because of the amount of mail they send, spammers require several email accounts, and they use web bots to automate the creation of these accounts. In an effort to cut down on this abuse, many of these services have adopted a system called the captcha: users attempting to create a new account are presented with a graphic of a word, rendered in a distorted font on a difficult-to-read background. Humans are able to read these graphics, and are required to enter the word to complete the application for a new account, while computers are unable to get accurate readings of the words using standard OCR techniques. Blind users are typically offered an audio sample instead.

Spammers have, however, found a means of circumventing this measure. They have set up sites offering free pornography: to get access to the site, a user is presented with a captcha graphic taken from one of these webmail sites, and required to enter the word. The bot passes the user's answer back to the webmail site, and once it has successfully created the account, the user is given access to the porn.

Obfuscating message content

Many spam-filtering techniques work by searching for patterns in the headers or bodies of messages. For instance, a user may decide that all email she receives with the word "Viagra" in the subject line is spam, and instruct her mail program to automatically delete all such messages. To defeat such filters, the spammer may misspell commonly-filtered words, or insert other characters, as in the following examples:

 V1agra  Via'gra  V I A G R A  Vaigra  \/iagra 

The principle of this method is to leave the word readable to a human, but not recognizable to a literally-minded computer program. This is effective up to a point. Eventually, filter patterns become generic enough to recognize the word "Viagra" no matter how misspelled -- or else they target the obfuscation methods themselves, such as insertion of punctuation into unusual places in a word.
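A more generic filter pattern can be approximated by normalizing the text before matching. The Python sketch below undoes a few character substitutions and strips everything that is not a letter. The substitution table is a small assumed sample, and the function names are invented for illustration.

```python
import re

# Assumed sample of look-alike substitutions; a real filter would need more.
SUBSTITUTIONS = [("\\/", "v"), ("1", "i"), ("0", "o"), ("@", "a")]


def normalize(text):
    """Lowercase, undo look-alike substitutions, and drop non-letters."""
    text = text.lower()
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)
    return re.sub(r"[^a-z]", "", text)


def mentions(text, word="viagra"):
    """Return True if the normalized text contains the filtered word."""
    return word in normalize(text)


SAMPLES = ["V1agra", "Via'gra", "V I A G R A", "Vaigra", "\\/iagra"]
results = {sample: mentions(sample) for sample in SAMPLES}
for sample, caught in results.items():
    print(f"{sample!r}: {'caught' if caught else 'missed'}")
```

Of the five examples, this catches every one except the transposed "Vaigra", which would need fuzzy matching. Because it strips spaces, it can also match a word that only forms across a word boundary, so a real filter would have to weigh that risk of false positives.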

HTML-based email gives the spammer more tools to obfuscate text. Inserting HTML comments between letters can foil some filters, as can including text made invisible by setting the font color to white on a white background, or shrinking the font size to the smallest fine print.
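A filter can counter these HTML tricks by reducing the message to the text a reader would actually see before matching. The Python sketch below drops comments and skips text inside a font element colored white. It assumes a white background and checks only the font tag's color attribute, so hiding done through style sheets or tiny font sizes is not covered. The class name and sample markup are invented for illustration.

```python
from html.parser import HTMLParser

# Color values treated as invisible, assuming a white background.
HIDDEN_COLORS = ("white", "#ffffff", "#fff")


class VisibleText(HTMLParser):
    """Collect text, ignoring comments and white-on-white font runs."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.font_stack = []  # one flag per open <font>: True if it hides text

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            color = (dict(attrs).get("color") or "").lower()
            self.font_stack.append(color in HIDDEN_COLORS)

    def handle_endtag(self, tag):
        if tag == "font" and self.font_stack:
            self.font_stack.pop()

    def handle_data(self, data):
        # Comments never reach handle_data, so they vanish automatically.
        if not any(self.font_stack):
            self.parts.append(data)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


SAMPLE = (
    'Buy Vi<!-- xq -->ag<!-- zz -->ra '
    '<font color="white">meeting agenda notes</font>today'
)
text = visible_text(SAMPLE)
print(text)
```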

As Bayesian filtering has become popular as a spam-filtering technique, spammers have started using methods to weaken it. To a rough approximation, Bayesian filters rely on word probabilities. If a message contains many words which are only used in spam, and few which are never used in spam, it is likely to be spam. To weaken Bayesian filters, some spammers now include lines of irrelevant, random words alongside the sales pitch. A variant on this tactic may be borrowed from the Usenet abuser known as "Hipcrime" -- to include passages from books taken from Project Gutenberg, or nonsense sentences generated with "dissociated press" algorithms. Randomly generated phrases can create spamoetry (spam poetry) or spam art.
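The effect of padding a message with irrelevant words can be seen in a toy calculation. The Python sketch below combines per-word spam probabilities under a naive independence assumption, which is one common simplification and not necessarily how any particular filter works. The word probabilities are invented for illustration; a real filter would learn them from the user's own mail.

```python
import math

# Invented per-word probabilities that a message containing the word is spam.
WORD_SPAM_PROB = {
    "viagra": 0.99, "offer": 0.90, "free": 0.85,
    "meeting": 0.10, "garden": 0.15, "thursday": 0.10, "chapter": 0.20,
}


def spam_probability(words, default=0.5):
    """Combine word probabilities, treating words as independent evidence."""
    log_spam = 0.0
    log_ham = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word.lower(), default)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


pitch = "free viagra offer".split()
padded = pitch + "meeting garden thursday chapter".split()

print(f"sales pitch alone:       {spam_probability(pitch):.3f}")
print(f"padded with other words: {spam_probability(padded):.3f}")
```

With these assumed numbers, each innocuous word pulls the combined score down, which is the weakening effect that the random-word tactic aims for.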

Avoiding Spam

Spam can be avoided in several ways.

  • Education: Basic computer literacy should include an understanding of the basics of spamming and spam avoidance.
  • End users should take reasonable precautions in using their email addresses.
  • System administrators should use appropriate tools to keep spam off of their systems.

Perhaps the best way to avoid spam is to avoid giving your email address to spammers, directly or indirectly. Never place your email address on any Web site. Never reply to a spam email, or click an "opt-out" link (this simply confirms that your email address is valid).

If a Web site requires registration before allowing useful operations, such as posting in Internet forums, a user may give a temporary address that is used only for that purpose, and periodically delete such temporary accounts and replace them with new ones. Users should be sure to notify such forums of the replacement addresses so they can continue to be contacted for valid purposes.

Several tools have been released, both for end users and systems administrators, which automate spam removal by scanning through all emails in search of traits typical of spam.

Tools for end users range in capabilities from tracing and reporting spam to hiding email addresses from spammers to removing and/or blocking spam. These tools include SpamCop (http://www.spamcop.net/), NoSpam (http://www.vrooms.net/), SpamGuard (http://spamguard.net/), and even mail clients, such as the one built in to Mozilla.

Tools for systems administrators allow them to block incoming email from particular spamming IPs, block Usenet spam, block formmail spam, and determine if mail is spam. One of the most popular amongst systems administrators is SpamAssassin. One of the statistically most accurate on the spam corpus is CRM114, which can be integrated into SpamAssassin.

Spamgourmet, little known but very powerful, takes a completely different approach: it offers free disposable e-mail addresses. The project was "created by folks who've been driven rabid by spam since 1993 or so" (quote from their FAQ (http://www.spamgourmet.com/disposableemail.pl?printpage=faq.html)). All the code they've written is open source.

 
