Spam Detection Concepts

The following article is excerpted from Chapter 11 of Postfix: The Definitive Guide. This is the introductory section of the chapter and is meant to give you the conceptual background you need to create your own spam-blocking configuration. The rest of the chapter (not included here) goes into the Postfix configuration details more fully.

As long as you're not operating an open relay, you can be confident that your systems are not being used to harm other systems. Your next consideration is to protect yourself and your users by limiting the spam your network receives. Ideally, your mail server could simply reject any message that looks like spam. Unfortunately, whereas humans can look at a message and know instantly that it's spam, computers have a tougher time detecting it without making mistakes. The ugly truth is that once you start to reject spam, there is always a risk that you will block legitimate correspondence.

Misidentifying a legitimate message as spam is referred to as a false-positive identification. Your anti-spam efforts are an attempt to detect as much spam as you can with the fewest possible false-positives. You have to weigh the size of your spam problem against the possibility of rejecting real email when deciding how aggressive to be in implementing your anti-spam measures. The extremes range from permitting all spam to accepting mail only from pre-approved individuals. Pre-approval may seem severe, but the problem is getting bad enough for some people that whitelist applications, where any correspondent you receive mail from must be identified ahead of time, are becoming more common.

There are two primary ways of detecting spam: identifying a known spamming client and inspecting the contents of a message for tell-tale phrases or other clues that reveal the true nature of a spam message. Despite the difficulties, postmasters can achieve some success with minimal false-positives by implementing various spam detection measures.

Client-based Spam Detection

Client blocking techniques use the IP addresses, hostnames, or email addresses that clients supply when they connect to deliver a message. Each piece of information can be compared against lists of items from known spamming systems. Spamming systems might be owned by actual spammers, but they might also be unintentional open relays managed by hapless, (almost) innocent mail administrators. In either case, if a system regularly sends you spam, you will probably decide to block messages from it. One problem with identifying spam by IP address, hostname, or email address is that these items can be forged. While the IP address of the connecting system requires some sophistication to spoof, envelope email addresses are trivial to fake.

DNS-based blacklists

In a grass-roots effort to stem the tide of spam on the Internet, various anti-spam services, generally called DNS-based Blacklists (DNSBLs) or Realtime Blacklists, have emerged. These services maintain large databases of systems that are known to be open relays or that have been used for spam. A newer and increasingly common problem is systems that have been hijacked by spammers, who install their own proxy software so they can relay messages through them. These hijacked systems can also be used in distributed denial-of-service attacks. Some DNSBLs are dedicated to listing these unwitting spam relays. The idea is that by pooling the information from hundreds or thousands of postmasters, legitimate sites can try to stay ahead of spammers.

Usually these systems work by adding a DNS entry to their domain space for each IP address in their database that has been identified as a spam-friendly open relay. For example, if the host at IP address 192.168.254.31 has been identified as an open relay, the (fictitious) DNSBL service No Spam Unlimited, using the domain name nospam.example.com, creates a DNS entry like 31.254.168.192.nospam.example.com (the octets of the address appear in reverse order, as in a reverse DNS lookup). When a client connects to your Postfix system, Postfix can query the No Spam Unlimited DNS server to see whether there is an entry for the client's IP address. If the IP address has been identified as an open relay, Postfix can reject the message.
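The lookup itself is ordinary DNS. The Python sketch below builds the query name for an IPv4 client by reversing its octets and appending the DNSBL zone, then treats any successful address lookup as "listed". It assumes the fictitious nospam.example.com zone from the example; the `resolve` parameter is a hypothetical hook (defaulting to `socket.gethostbyname`) so the logic can be exercised without a real DNSBL. Postfix performs this query itself, so this only illustrates the mechanism.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(client_ip: str, dnsbl_zone: str) -> str:
    """Reverse the octets of an IPv4 address and append the DNSBL zone."""
    addr = ipaddress.IPv4Address(client_ip)  # ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{dnsbl_zone}"


def is_listed(client_ip: str, dnsbl_zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the DNSBL publishes an address record for this client."""
    try:
        resolve(dnsbl_query_name(client_ip, dnsbl_zone))
        return True
    except socket.gaierror:
        # No record for the name (for example NXDOMAIN): not listed.
        return False


print(dnsbl_query_name("192.168.254.31", "nospam.example.com"))
# 31.254.168.192.nospam.example.com
```

Reversing the octets puts the most specific part of the address on the left, which is how DNS names are read, so the DNSBL operator can publish entries under its own zone.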

Consider very carefully before you decide to use a DNSBL service. Many open relays used to forward spam also provide mail service for non-spamming users, so you are very likely to block legitimate mail along with the spam. Also keep in mind that you are handing off to a third party the responsibility for important decisions about who can and cannot send mail to your users. On the other hand, if you're buried in spam, DNSBL services can definitely help. If you decide to use one, review its service options and policies very carefully. Again, you have to balance your aggressiveness and the likelihood of losing legitimate mail against the magnitude of your spam problem.

Content-based Spam Detection

In addition to identifying clients, you can often recognize spam by its contents. Certain strings within email messages mark them as likely spam ("Our Rates Have Never Been Lower!!"). But trying to distinguish spam by message content can be problematic. Imagine that you receive lots of spam offering new house mortgages. You figure you can eliminate most of it by blocking messages that contain phrases like "really low interest rate on a new mortgage." This may indeed block many spam messages, but you might also block a message from a friend (or one of your users' friends) who just got a great deal on a new house and wrote to tell you about it.
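The false-positive problem is easy to reproduce. This Python sketch applies a naive, case-insensitive phrase match (a hypothetical filter list containing only the mortgage phrase) to a spam message and to a friend's legitimate note; both are flagged.

```python
# Hypothetical filter list: one phrase taken from the mortgage-spam example.
BLOCKED_PHRASES = ["really low interest rate on a new mortgage"]


def looks_like_spam(body: str) -> bool:
    """Naive content check: case-insensitive substring match on any phrase."""
    text = body.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


spam = "Get a REALLY LOW INTEREST RATE ON A NEW MORTGAGE today!!"
friend = ("We closed on the house! The bank gave us a really low interest "
          "rate on a new mortgage, so come visit soon.")

print(looks_like_spam(spam), looks_like_spam(friend))  # True True
```

The filter has no way to tell intent from wording, which is why phrase lists alone tend to trade missed spam for lost mail.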

Detection Difficulties

The problem with both client- and content-based techniques to identify spam is that spammers are constantly finding ways to get around them. There is a sort of arms race going on between legitimate users of email and spammers. You can compile lists of open relays, but spammers expend a great deal of effort seeking out new open relays or proxy servers to abuse (and there always seem to be more of them).

You may discover that you receive a lot of spam with the same return address. You can block messages that use that return address, but spammers use hit-and-run tactics. They obtain an email address from one of the free email sites and use that address to send thousands or millions of spam messages, and then discard it for another. Within a couple of days, you'll never see the address you listed again.

Even content filters have to adjust to spammers' escalating tactics. Some spammers embed HTML codes within the words of their messages to break up phrases you might filter against. Others encode the entire message so that when Postfix scans it for recognized spam phrases, there are no intelligible phrases to find. Most email clients oblige users by automatically rendering such messages, decoding the content or ignoring the extraneous HTML codes, so recipients often don't even notice that the message was obfuscated.
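To see why this defeats a simple filter, the Python sketch below runs a substring check against three versions of one phrase: plain, split up by empty HTML comments (assumed here as one example of embedded HTML), and base64-encoded. The phrase and messages are hypothetical. The last two calls undo each trick before matching, which illustrates the general idea of decoding before scanning; it is not a description of how Postfix itself inspects messages.

```python
import base64
import re

PHRASE = "never been lower"  # hypothetical phrase a content filter looks for


def naive_match(body: str) -> bool:
    """Case-insensitive substring search on the raw text, with no decoding."""
    return PHRASE in body.lower()


def strip_html_comments(body: str) -> str:
    """Remove <!-- ... --> comments, which a rendering mail client never shows."""
    return re.sub(r"<!--.*?-->", "", body, flags=re.DOTALL)


plain = "Our Rates Have Never Been Lower!!"
# Empty HTML comments split the words; a mail client renders nothing for them.
obfuscated = "Our Rates Have Nev<!-- x -->er Be<!-- y -->en Lo<!-- z -->wer!!"
# The whole body base64-encoded, as in a Content-Transfer-Encoding: base64 part.
encoded = base64.b64encode(plain.encode("ascii")).decode("ascii")

print(naive_match(plain), naive_match(obfuscated), naive_match(encoded))
# True False False
print(naive_match(strip_html_comments(obfuscated)),
      naive_match(base64.b64decode(encoded).decode("ascii")))
# True True
```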

Anti-Spam Actions

Broadly speaking, you have three choices once you have detected spam:

  1. Reject spam immediately during the SMTP conversation. Rejecting spam outright is an attractive idea because you never have to store a copy of the message and worry about what to do with it. The sender of the message is responsible for handling the error. If your site has a low tolerance for rejecting legitimate messages, you might prefer to accept suspect messages and develop a process to review them periodically to make sure that there are no good messages in with the bad.

  2. Save spam into a suspected-spam repository. If you save suspect messages and review them periodically, you can be sure that you don't miss any legitimate mail. The task is cumbersome and usually requires frequent reviews, so you may not gain much over simply letting suspect messages into users' mailboxes.

  3. Label spam and deliver it with some kind of spam tag. This option gives users the flexibility to set their own balance between tolerance for spam and sensitivity to missing real messages. Postfix doesn't currently have a built-in mechanism for labeling spam, but you can easily have Postfix work with an external content filter to handle the labeling (see Chapter 14 "Content Filtering"). If the content filter delivers tagged messages to individual users, they can configure their email software to handle them according to their own preferences.

When using an MTA for spam detection, the rejection option is usually best. If you want more flexibility, consider filtering spam at the MDA or MUA level. Combining approaches is also a good alternative: you can configure Postfix to reject the obvious spam while allowing suspicious messages through to the next level, where another agent can take the most appropriate action.

Postfix really excels in its tools to help you identify spam clients and reject them. Rejecting messages with Postfix requires fewer system resources than invoking external filters after the message has been accepted. If you are concerned about losing legitimate mail, there are still a couple of safety measures available that we'll look at when configuring Postfix.

Postfix Configuration

The rest of this chapter discusses the various types of UBE (unsolicited bulk email) checks Postfix provides, grouped into the four categories of spam detection listed below. When configuring Postfix to detect spam, you also specify what to do with messages identified as spam. In general, Postfix can reject them outright, set them aside in a separate queue, or pass them along to an external filter.

  1. Client-detection rules. Parameters (rules) that work with pieces of the client identity. Each rule is assigned a list of one or more restrictions that can explicitly reject or accept a message or take no position either way (commonly indicated as DUNNO). For example, you can configure a rule that includes a restriction to reject a particular client IP address.

  2. Syntax-checking parameters. Parameters that check for strict adherence to the standards. Since spammers often don't follow the published standards, you can reject messages that come from misconfigured or poorly implemented systems. Some of the client restrictions also fall under this category.

  3. Content checks. You can check the headers and the body of each message for tell-tale regular expressions that indicate probable spam.

  4. Restriction classes. You can define complex client-detection rules with restriction classes, which let you combine restrictions into groups that act as new restrictions.

Client-Detection Rules

Postfix provides the following rules that are assigned restrictions based on client information.

  1. smtpd_client_restrictions

  2. smtpd_helo_restrictions

  3. smtpd_sender_restrictions

  4. smtpd_recipient_restrictions

  5. smtpd_data_restrictions

Each one corresponds to a step of the SMTP transaction. At each step, the client provides a piece of information. Using the client-supplied information, Postfix considers one or more restrictions that you assign to each rule. The figure below shows an SMTP conversation along with the client rule applied at each step. The header_checks and body_checks are discussed later in the chapter.

SMTP Conversation

Let's review the SMTP conversation to see where each of the parameters fits in.

The SMTP Conversation (briefly)

The SMTP conversation in the figure should be familiar to you from Chapter 1, and the logging example below shows the log entries for the transaction. First, an SMTP client connects to Postfix over a socket. Because of the way sockets work, Postfix learns the client's IP address when the connection is established. You don't see the client IP address in the figure, but Postfix logs it. With smtpd_client_restrictions, you can accept or reject a message based on the client hostname or IP address, blocking specific hostnames or IP and network addresses.

Once connected, the client sends a HELO command with an identifying hostname. The hostname provided can be used to accept or reject a message using smtpd_helo_restrictions.

In the next step, the client issues a MAIL FROM command to indicate the sender's email address (checked by smtpd_sender_restrictions), followed by one RCPT TO command for each recipient's email address (checked by smtpd_recipient_restrictions).

If everything is acceptable up to the DATA command (checked by smtpd_data_restrictions), the client is permitted to send the contents of the message, which consist of message headers followed by the message body. Postfix then has another opportunity to reject the message based on its contents (see the content checking section later in the chapter). If the header and body checks are acceptable, the message is delivered.
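To make the sequence concrete, here is a Python sketch using the standard smtplib module. It issues each command in turn on an already-connected client and records the reply code next to the restriction parameter associated with that step. The host, addresses, and the `demo` helper are hypothetical; run it only against a test server you control. Keep in mind that with Postfix's default delayed rejection (smtpd_delay_reject, covered later in this chapter), a rejection decided by an earlier rule is typically reported at RCPT TO, so the step where you see an error is not always the step whose rule caused it.

```python
import smtplib

# Restriction parameter associated with each client command, per the
# walk-through above. (smtpd_client_restrictions relates to the connection.)
RULE_AT_STEP = {
    "HELO": "smtpd_helo_restrictions",
    "MAIL FROM": "smtpd_sender_restrictions",
    "RCPT TO": "smtpd_recipient_restrictions",
    "DATA": "smtpd_data_restrictions",
}


def walk_conversation(smtp, helo_name, sender, recipient, message):
    """Send HELO, MAIL FROM, RCPT TO, and DATA on a connected smtplib.SMTP.

    Returns (step, rule, reply_code) tuples, stopping at the first 4xx/5xx.
    """
    steps = [
        ("HELO", lambda: smtp.helo(helo_name)),
        ("MAIL FROM", lambda: smtp.mail(sender)),
        ("RCPT TO", lambda: smtp.rcpt(recipient)),
        ("DATA", lambda: smtp.data(message)),
    ]
    results = []
    for step, send in steps:
        try:
            code, _reply = send()
        except smtplib.SMTPDataError as err:
            # smtplib raises instead of returning when DATA itself is refused.
            code = err.smtp_code
        results.append((step, RULE_AT_STEP[step], code))
        if code >= 400:
            break  # the server refused this step; stop the transaction
    return results


def demo(host="localhost", port=25):
    """Hypothetical usage against a test Postfix server you control."""
    with smtplib.SMTP(host, port) as client:  # the connect step
        message = b"Subject: test\r\n\r\nhello\r\n"
        for step, rule, code in walk_conversation(
                client, "client.example.com", "info@example.com",
                "user@example.com", message):
            print(f"{step:<10}{rule:<32}{code}")
```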

Postfix indicates to the client that it has rejected a message by sending reply codes. Standard reply codes are described in Chapter 1. In this chapter, we consider codes in the 4xx and 5xx range. More information appears in a sidebar later in the chapter.

SMTP Logging

postfix/smtpd[866062]: connect from mail.ora.com[10.143.23.45]
postfix/smtpd[866062]: D694B20DD5B: client=[10.143.23.45]
postfix/cleanup[864868]: D694B20DD5B: \
  message-id=<20030106185403.D694B20DD5B@smtp.example.com>
postfix/qmgr[861396]: D694B20DD5B: from=<info@ora.com>, \
  size=486, nrcpt=1 (queue active)
postfix/local[864857]: D694B20DD5B: to=<kdent@smtp.example.com>, \
  relay=local, delay=98, status=sent (mailbox)
postfix/smtpd[866062]: disconnect from mail.ora.com[10.143.23.45]
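Every entry after the connection carries the queue ID (D694B20DD5B here), so you can follow one message through smtpd, cleanup, qmgr, and local by grouping on that ID. The Python sketch below does this for the sample entries. The regular expression is an assumption fitted to this example's format (short hexadecimal queue IDs; real syslog lines also have a timestamp and host prefix, which `search` skips over), not a complete parser for Postfix logs.

```python
import re
from collections import defaultdict

# "postfix/<daemon>[<pid>]: <QUEUE ID>: <details>". Assumes short hexadecimal
# queue IDs like D694B20DD5B; connect/disconnect lines carry no queue ID.
LOG_LINE = re.compile(
    r"postfix/(?P<daemon>\w+)\[\d+\]: (?P<qid>[0-9A-F]+): (?P<details>.*)")


def group_by_queue_id(lines):
    """Map each queue ID to the (daemon, details) entries logged for it, in order."""
    by_qid = defaultdict(list)
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            by_qid[match["qid"]].append((match["daemon"], match["details"]))
    return dict(by_qid)


# The log entries above, with the continuation lines joined.
SAMPLE = [
    "postfix/smtpd[866062]: connect from mail.ora.com[10.143.23.45]",
    "postfix/smtpd[866062]: D694B20DD5B: client=[10.143.23.45]",
    "postfix/cleanup[864868]: D694B20DD5B: "
    "message-id=<20030106185403.D694B20DD5B@smtp.example.com>",
    "postfix/qmgr[861396]: D694B20DD5B: from=<info@ora.com>, "
    "size=486, nrcpt=1 (queue active)",
    "postfix/local[864857]: D694B20DD5B: to=<kdent@smtp.example.com>, "
    "relay=local, delay=98, status=sent (mailbox)",
    "postfix/smtpd[866062]: disconnect from mail.ora.com[10.143.23.45]",
]

for qid, entries in group_by_queue_id(SAMPLE).items():
    print(qid)
    for daemon, details in entries:
        print(f"  {daemon:<8} {details}")
```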

Listing Restrictions

When you assign restrictions to Postfix UBE rules, you don't need to use all of the rules. You can define restrictions for the ones you need and leave out the others. If none of these rules is set in main.cf, the defaults look like this:

smtpd_client_restrictions =
smtpd_helo_restrictions =
smtpd_sender_restrictions =
smtpd_recipient_restrictions =
     permit_mynetworks, reject_unauth_destination
This prevents your system from being an open relay by allowing any computer on your network to relay while rejecting all others unless they are sending messages destined for one of your users.

There are many restrictions available. The following table lists each one along with the client information it operates on. One important point that confuses many people at first is that any of these restrictions can be used in any rule. While it may seem logical that check_helo_access should be assigned to smtpd_helo_restrictions, it could equally be assigned to smtpd_sender_restrictions or any of the others. This gives you a lot of flexibility in ordering your restrictions when deciding what to accept and what to block.

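SMTP rules and restrictions

Restriction                             Client-supplied information
--------------------------------------  -----------------------------
check_client_access maptype:mapname     Client IP address or hostname
reject_rbl_client                       Client IP address or hostname
reject_rhsbl_client                     Client IP address or hostname
reject_unknown_client                   Client IP address or hostname
check_helo_access maptype:mapname       HELO hostname
permit_naked_ip_address                 HELO hostname
reject_invalid_hostname                 HELO hostname
reject_non_fqdn_hostname                HELO hostname
reject_unknown_hostname                 HELO hostname
check_sender_access maptype:mapname     MAIL FROM address
reject_non_fqdn_sender                  MAIL FROM address
reject_rhsbl_sender                     MAIL FROM address
reject_unknown_sender_domain            MAIL FROM address
check_recipient_access maptype:mapname  RCPT TO address
permit_auth_destination                 RCPT TO address
permit_mx_backup                        RCPT TO address
reject_non_fqdn_recipient               RCPT TO address
reject_unauth_destination               RCPT TO address
reject_unknown_recipient_domain         RCPT TO address
reject_unauth_pipelining                DATA command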

You'll notice from the table that some restrictions take an argument of the form maptype:mapname. The mapname refers to a normal Postfix lookup table whose left-hand side key is matched against the piece of client information and whose right-hand side value is the action to perform. Access maps are discussed in a later section along with the restriction definitions.
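In an access map, the left-hand side holds the key and the right-hand side holds the action. For client IP addresses, Postfix's access(5) documentation describes trying the full address and then successively shorter network prefixes (1.2.3.4, then 1.2.3, 1.2, and 1). The Python sketch below models that lookup with an in-memory dictionary standing in for a real maptype:mapname table. The entries are hypothetical, and hostname and email-address matching, which follow their own rules, are not modeled.

```python
# Hypothetical access map contents: key (left-hand side) -> action (right-hand side).
CLIENT_ACCESS = {
    "192.168.254.31": "REJECT",   # one known spamming host
    "10.143": "OK",               # a trusted network prefix
}


def lookup_client_ip(access_map: dict, client_ip: str) -> str:
    """Try the full IPv4 address, then shorter octet prefixes; DUNNO if none match."""
    octets = client_ip.split(".")
    for length in range(len(octets), 0, -1):
        key = ".".join(octets[:length])
        if key in access_map:
            return access_map[key]
    return "DUNNO"  # no entry: the restriction takes no position


for ip in ("192.168.254.31", "10.143.23.45", "172.16.0.1"):
    print(ip, lookup_client_ip(CLIENT_ACCESS, ip))
# 192.168.254.31 REJECT
# 10.143.23.45 OK
# 172.16.0.1 DUNNO
```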

How restrictions work

Each restriction other than an access map evaluates to one of three possible values that determine what action Postfix takes with the message: OK, REJECT, or DUNNO. (Access maps can return the same values, but they provide additional actions as well.) Restrictions are evaluated in the order you list them. If a restriction returns REJECT, the message is rejected immediately. If a restriction returns OK, Postfix stops evaluating that rule's remaining restrictions and moves on to the next rule, continuing until all of the rules have been evaluated or a rejection occurs. It's important to note that one rule might explicitly accept a message while another rule's restrictions still reject it. If a rule comes to no definite conclusion (all of its restrictions return DUNNO), the default is to accept. Any single rule can reject a message, but all of them must accept it, explicitly or by default, for the message to get through.
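The evaluation order can be summarized in a short Python model. It is a simplification under stated assumptions: restrictions are plain functions returning "OK", "REJECT", or "DUNNO", all rules are checked in one pass (real Postfix evaluates each rule at its SMTP step), and the extra access-map actions are ignored. `from_inside` and `always_reject` are hypothetical restrictions used only for the example.

```python
from typing import Callable, Dict, List

OK, REJECT, DUNNO = "OK", "REJECT", "DUNNO"
Restriction = Callable[[dict], str]  # takes client info, returns OK/REJECT/DUNNO


def evaluate(rules: Dict[str, List[Restriction]], client: dict) -> str:
    """Simplified model of the evaluation order described above."""
    for restrictions in rules.values():       # rules in configured order
        for restriction in restrictions:      # restrictions in listed order
            result = restriction(client)
            if result == REJECT:
                return REJECT                 # any REJECT ends everything
            if result == OK:
                break                         # OK ends only this rule
            # DUNNO: go on to the next restriction in this rule
    return "ACCEPT"                           # nothing rejected the message


# Hypothetical restrictions for illustration.
def from_inside(client: dict) -> str:
    return OK if client["ip"].startswith("192.168.") else DUNNO


def always_reject(client: dict) -> str:
    return REJECT


rules = {
    "smtpd_client_restrictions": [from_inside],
    "smtpd_sender_restrictions": [always_reject],
}
print(evaluate(rules, {"ip": "192.168.1.5"}))  # REJECT: OK in one rule is not enough
```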

There are generic restrictions such as permit and reject that return explicit OK or REJECT values without considering any of the client information. These restrictions are described in the access maps section later in the chapter with the restriction definitions.

When a rule evaluates to REJECT, by default Postfix does not actually reject the message until after the client has sent the RCPT TO command. Even though it may know at the HELO command that it's going to reject this client, it waits until after it receives the RCPT TO command before returning the reject code. The reason for this default is that some SMTP clients do not check that they have been rejected during the transaction and continue trying to deliver the message. In such a case, you end up with connections that last longer than they should and several warning messages in your log file. Another advantage to the default is that you get more complete information in your log. If you want to change the default to have a rejection take effect as soon as possible, set the parameter smtpd_delay_reject in main.cf:

smtpd_delay_reject = no
You might want to do this in a controlled environment where you know all of the connecting SMTP clients are well-behaved; otherwise, the default makes sense for most situations.
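As a simplified model of the timing just described, the Python sketch below computes the stage at which the client learns of a rejection, depending on smtpd_delay_reject. `STAGES` and `reject_reported_at` are illustrative names, not Postfix internals, and with the setting off the sketch simply assumes the rejection takes effect at the stage where it was decided.

```python
STAGES = ["connect", "HELO", "MAIL FROM", "RCPT TO", "DATA"]


def reject_reported_at(decided_at: str, delay_reject: bool = True) -> str:
    """Stage at which the client sees a rejection decided at `decided_at`.

    With smtpd_delay_reject = yes (the default), a rejection decided before
    RCPT TO is held until RCPT TO; with "no", it takes effect right away.
    """
    if not delay_reject:
        return decided_at
    return STAGES[max(STAGES.index(decided_at), STAGES.index("RCPT TO"))]


for stage in STAGES:
    print(f"{stage:<10} yes -> {reject_reported_at(stage):<10} "
          f"no -> {reject_reported_at(stage, delay_reject=False)}")
```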

Testing new restrictions

A useful parameter for testing new restrictions is soft_bounce.

soft_bounce = yes
When it is set, hard reject responses (5xx) are converted to soft reject responses (4xx). When you add a restriction you're not sure about, you might turn soft_bounce on and watch your logs for what's rejected, so that you can fine-tune your settings before the client makes another delivery attempt.
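The effect on reply codes can be sketched in Python as follows. This is a simplified illustration that rewrites only the leading digit of the three-digit code; the reply texts are examples, and the sketch does not try to reproduce everything Postfix does when soft_bounce is enabled. Because a 4xx reply tells a well-behaved client to retry later, the message stays in the sender's queue instead of bouncing while you review your logs.

```python
def apply_soft_bounce(reply: str, soft_bounce: bool = True) -> str:
    """Rewrite a permanent 5xx SMTP reply as a temporary 4xx reply."""
    if soft_bounce and reply.startswith("5"):
        return "4" + reply[1:]  # e.g. 554 -> 454: the client keeps the mail queued
    return reply


print(apply_soft_bounce("554 Relay access denied"))  # 454 Relay access denied
print(apply_soft_bounce("250 Ok"))                   # 250 Ok (unchanged)
```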

Another useful option for testing restrictions is the warn_if_reject qualifier. Simply precede any restriction with it to have that restriction log a warning instead of rejecting a message. If you're not sure what effect a new restriction will have in your environment, you can try it out with warn_if_reject, and then implement it completely only if it works as you expect.

smtpd_recipient_restrictions =
        permit_mynetworks
        reject_unauth_destination
        warn_if_reject reject_invalid_hostname
        reject_unknown_recipient_domain
        reject_non_fqdn_recipient

In this example, if a client uses an invalid HELO hostname when delivering a message, Postfix logs a warning but still delivers the message (assuming it's not blocked for other reasons).
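Conceptually, the qualifier turns a REJECT into a logged warning and lets evaluation continue as if the restriction had returned DUNNO. The Python sketch below models that. `invalid_helo_check` is a hypothetical, deliberately simple stand-in for reject_invalid_hostname (Postfix's own check differs in detail), and the log message format is invented for illustration rather than copied from Postfix.

```python
import logging
import re

# A DNS label: letters, digits, and hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def invalid_helo_check(client: dict) -> str:
    """Hypothetical stand-in for reject_invalid_hostname: a simple syntax check."""
    labels = client["helo"].split(".")
    return "DUNNO" if all(LABEL.fullmatch(label) for label in labels) else "REJECT"


def warn_if_reject(restriction, name: str):
    """Wrap a restriction so that REJECT is logged and downgraded to DUNNO."""
    def wrapped(client: dict) -> str:
        result = restriction(client)
        if result == "REJECT":
            logging.warning("%s would have rejected HELO %r", name, client["helo"])
            return "DUNNO"
        return result
    return wrapped


check = warn_if_reject(invalid_helo_check, "reject_invalid_hostname")
print(check({"helo": "bad_host!"}))         # logs a warning, then prints DUNNO
print(check({"helo": "mail.example.com"}))  # DUNNO, nothing logged
```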

A simple example

Before moving on to the restriction definitions, let's consider a simple example.

smtpd_recipient_restrictions =
     permit_mynetworks
     reject_unauth_destination
     reject_invalid_hostname
     reject_unknown_sender_domain
This example expands on the default configuration with two additional restrictions. When a client connects from your own network, permit_mynetworks returns OK, so the client is allowed to send mail and the other restrictions are not checked. If the client is outside your network, permit_mynetworks returns DUNNO (it never returns REJECT), and Postfix moves on to reject_unauth_destination. If the message is not addressed to somebody at one of your destination domains, reject_unauth_destination returns REJECT; otherwise, it returns DUNNO. Postfix then checks reject_invalid_hostname, which returns REJECT if the hostname supplied with the HELO command is not valid, and DUNNO otherwise. Finally, Postfix checks reject_unknown_sender_domain, which returns REJECT if the domain of the address supplied with the MAIL FROM command does not have a valid DNS entry. If none of the restrictions has rejected the message, Postfix accepts it for delivery.
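The same walk-through can be traced in Python. The functions below are hypothetical, simplified stand-ins named after the four restrictions: MYNETWORKS, MY_DESTINATIONS, and DOMAINS_IN_DNS are made-up site settings standing in for main.cf values and live DNS lookups, and the HELO check is only a basic syntax test. The sketch shows the decision flow, not how Postfix implements each check.

```python
import ipaddress
import re

# Hypothetical site settings standing in for main.cf values and live DNS.
MYNETWORKS = [ipaddress.ip_network("192.168.0.0/16")]
MY_DESTINATIONS = {"example.com"}
DOMAINS_IN_DNS = {"example.com", "example.org"}
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()


def permit_mynetworks(s: dict) -> str:
    ip = ipaddress.ip_address(s["client_ip"])
    return "OK" if any(ip in net for net in MYNETWORKS) else "DUNNO"


def reject_unauth_destination(s: dict) -> str:
    return "DUNNO" if domain_of(s["rcpt_to"]) in MY_DESTINATIONS else "REJECT"


def reject_invalid_hostname(s: dict) -> str:
    labels = s["helo"].split(".")
    return "DUNNO" if all(LABEL.fullmatch(label) for label in labels) else "REJECT"


def reject_unknown_sender_domain(s: dict) -> str:
    return "DUNNO" if domain_of(s["mail_from"]) in DOMAINS_IN_DNS else "REJECT"


SMTPD_RECIPIENT_RESTRICTIONS = [
    permit_mynetworks,
    reject_unauth_destination,
    reject_invalid_hostname,
    reject_unknown_sender_domain,
]


def decide(session: dict):
    """Return (decision, deciding restriction, or None if all said DUNNO)."""
    for restriction in SMTPD_RECIPIENT_RESTRICTIONS:
        result = restriction(session)
        if result == "OK":
            return "ACCEPT", restriction.__name__
        if result == "REJECT":
            return "REJECT", restriction.__name__
    return "ACCEPT", None


outsider = {"client_ip": "203.0.113.7", "helo": "mail.example.org",
            "mail_from": "info@example.org", "rcpt_to": "kdent@example.com"}
print(decide(outsider))                                   # ('ACCEPT', None)
print(decide({**outsider, "rcpt_to": "someone@elsewhere.net"}))
# ('REJECT', 'reject_unauth_destination')
```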

...
