Apache SpamAssassin is a computer program used for e-mail spam filtering. It uses a variety of spam-detection techniques, including DNS-based and fuzzy checksum techniques, Bayesian filtering, external programs, blacklists and online databases. It is released under the Apache License 2.0 and has been part of the Apache Software Foundation since 2004.
The program can be integrated with the mail server to automatically filter all mail for a site. It can also be run by individual users on their own mailbox and integrates with several mail programs. Apache SpamAssassin is highly configurable; if used as a system-wide filter it can still be configured to support per-user preferences.
History
Apache SpamAssassin was created by Justin Mason, who had maintained a number of patches against an earlier program named ''filter.plx'' by Mark Jeftovic, which in turn was begun in August 1997. Mason rewrote all of Jeftovic's code from scratch and uploaded the resulting codebase to SourceForge on April 20, 2001.
In summer 2004 the project became an Apache Software Foundation project and was later officially renamed ''Apache SpamAssassin''.
Methods of usage
Apache SpamAssassin is a Perl-based application (available from CPAN) which is usually used to filter all incoming mail for one or several users. It can be run as a standalone application, as a subprogram of another application (such as a Milter, SA-Exim, Exiscan, MailScanner, MIMEDefang or Amavis), or as a client that communicates with a daemon. The client/server or embedded mode of operation has performance benefits, but under certain circumstances may introduce additional security risks.
Typically either variant of the application is set up in a generic mail filter program, or it is called directly from a mail user agent that supports this whenever new mail arrives. Mail filter programs such as procmail can be made to pipe all incoming mail through Apache SpamAssassin with an adjustment to the user's procmail configuration file.
Operation
Apache SpamAssassin comes with a large set of rules which are applied to determine whether an email is spam or not. Most rules are based on regular expressions that are matched against the body or header fields of the message, but Apache SpamAssassin also employs a number of other spam-fighting techniques. The rules are called "tests" in the SpamAssassin documentation.
Each test has a score value that is assigned to a message if it matches the test's criteria. Scores can be positive or negative, with positive values indicating spam and negative values indicating "ham" (non-spam messages). A message is matched against all tests, and Apache SpamAssassin combines the results into a global score which is assigned to the message. The higher the score, the more likely it is that the message is spam.
Apache SpamAssassin has an internal (configurable) score threshold to classify a message as spam. Usually a message is only considered spam if it matches multiple criteria; matching just a single test will not usually be enough to reach the threshold.
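To make the scoring model concrete, here is a minimal Python sketch built on a handful of made-up regular-expression tests. Each test is matched against either the headers or the body, the scores of all matching tests are summed, and the total is compared with a threshold. The rule names and scores are hypothetical. The default threshold of 5.0 reflects what we believe is SpamAssassin's usual default and should be read as an assumption. Real SpamAssassin rules are written in its own configuration format, not in Python.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamTest:
    name: str            # hypothetical rule name
    target: str          # "header" or "body"
    pattern: re.Pattern
    score: float         # positive pushes towards spam, negative towards ham


# Illustrative tests only; they are not SpamAssassin's real rules or scores.
TESTS = [
    SpamTest("SUBJ_FREE_MONEY", "header",
             re.compile(r"^Subject:.*free money", re.IGNORECASE | re.MULTILINE), 2.5),
    SpamTest("BODY_CLICK_HERE", "body", re.compile(r"click here", re.IGNORECASE), 1.5),
    SpamTest("BODY_UNSUBSCRIBE", "body", re.compile(r"unsubscribe", re.IGNORECASE), 0.5),
    SpamTest("FROM_KNOWN_DOMAIN", "header",
             re.compile(r"^From:.*@example\.org", re.IGNORECASE | re.MULTILINE), -3.0),
]


def score_message(headers: str, body: str, threshold: float = 5.0):
    """Return (is_spam, total_score, names_of_matching_tests)."""
    hits = [t for t in TESTS
            if t.pattern.search(headers if t.target == "header" else body)]
    total = sum(t.score for t in hits)
    return total >= threshold, total, [t.name for t in hits]


headers = "From: promo@spam.test\nSubject: FREE MONEY inside\n"
body = "Click here to claim it. To unsubscribe, reply STOP."
print(score_message(headers, body))
# (False, 4.5, ['SUBJ_FREE_MONEY', 'BODY_CLICK_HERE', 'BODY_UNSUBSCRIBE'])
```

In this example three tests match, but their combined 4.5 points still fall short of the 5.0 threshold. A negative-scoring test such as the hypothetical `FROM_KNOWN_DOMAIN` can pull a message back towards ham.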
If Apache SpamAssassin considers a message to be spam, it can rewrite the message. In the default configuration, the original mail is attached as a MIME attachment. The new message body contains a brief excerpt of it together with a description of the tests which resulted in the mail being classified as spam. If the score is below the threshold, information about the matched tests and the total score is still added to the email headers by default. This information can be used in post-processing for less severe actions, such as tagging the mail as suspicious.
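The default rewriting behaviour can be sketched with Python's standard `email` package, under simplifying assumptions. The caller supplies the score, threshold and matched test names. The `X-Spam-Flag` and `X-Spam-Status` header names follow SpamAssassin's naming convention, but the exact value format here is only an approximation, and the wording of the report body is invented for illustration. The original message is carried unchanged as a `message/rfc822` attachment.

```python
from email.message import EmailMessage
from typing import Sequence


def wrap_as_spam(original: EmailMessage, score: float, threshold: float,
                 hits: Sequence[str]) -> EmailMessage:
    """Build a report message that carries `original` as an attachment."""
    report = EmailMessage()
    # Copy the addressing headers so the report lands in the same mailbox.
    for name in ("From", "To", "Subject", "Date"):
        if original[name] is not None:
            report[name] = original[name]
    # Approximation of SpamAssassin-style verdict headers.
    report["X-Spam-Flag"] = "YES"
    report["X-Spam-Status"] = (
        f"Yes, score={score:.1f} required={threshold:.1f} tests={','.join(hits)}"
    )

    # Brief excerpt of the plain-text body, if there is one.
    body = original.get_body(preferencelist=("plain",))
    excerpt = body.get_content()[:200] if body is not None else ""
    report.set_content(
        "This message was classified as spam.\n\n"
        f"Excerpt:\n{excerpt}\n\n"
        "Tests that matched:\n" + "".join(f"  {h}\n" for h in hits)
    )
    # Attach the untouched original; the report becomes multipart/mixed.
    report.add_attachment(original)
    return report
```

Keeping the original as an attachment, rather than editing it in place, means a user can still recover the exact message if it turns out to be a false positive.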
Apache SpamAssassin allows for a per-user configuration of its behavior, even if installed as system-wide service; the configuration can be read from a file or a database. In their configuration users can specify individuals whose emails are never considered spam, or change the scores for certain rules. The user can also define a list of languages which they want to receive mail in, and Apache SpamAssassin then assigns a higher score to all mails that appear to be written in another language.
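Per-user preferences can be thought of as a layer applied on top of the site-wide scores. The following Python sketch uses a hypothetical `UserPrefs` structure with three of the options described above: trusted senders, score overrides and accepted languages. The language penalty of 2.0 and the -100 adjustment for trusted senders are illustrative values, not SpamAssassin's option names or defaults. Language detection is assumed to happen in a separate step.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class UserPrefs:
    """Hypothetical per-user settings; not SpamAssassin's real option names."""
    trusted_senders: Set[str] = field(default_factory=set)  # lower-case addresses
    score_overrides: Dict[str, float] = field(default_factory=dict)
    ok_languages: Set[str] = field(default_factory=set)     # empty = accept any
    foreign_language_penalty: float = 2.0                    # illustrative value


def user_score(sender: str, rule_hits: Dict[str, float], language: str,
               prefs: UserPrefs) -> float:
    """Re-score one message for one user.

    `rule_hits` maps each matching test to its site-wide score; `language`
    is assumed to come from a separate language-detection step.
    """
    # The user's own score for a test replaces the site-wide one.
    total = sum(prefs.score_overrides.get(name, score)
                for name, score in rule_hits.items())
    # Mail that appears to be in a language the user did not list scores higher.
    if prefs.ok_languages and language not in prefs.ok_languages:
        total += prefs.foreign_language_penalty
    # Trusted senders get a large negative adjustment that keeps them below
    # any usual threshold.
    if sender.lower() in prefs.trusted_senders:
        total -= 100.0
    return total
```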
Apache SpamAssassin is based on heuristics (pattern recognition), and like all such software it produces both false positives and false negatives.
Network-based filtering methods
Apache SpamAssassin also supports:
* DNS-based blacklists and DNS-based whitelists
* Fuzzy-checksum-based spam detection filters such as the Distributed Checksum Clearinghouse, Vipul's Razor and the Cloudmark Authority plugins (commercial)
* Hashcash email stamps based on proof-of-work
* Sender Policy Framework and DomainKeys Identified Mail
* URI blacklists such as SURBL or URIBL, which track spam websites
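DNS-based blacklists are queried with ordinary DNS lookups. This Python sketch assumes the widespread convention for IPv4 lists: reverse the address's octets, append the list's zone, and perform a lookup. An answer inside 127.0.0.0/8 is treated as "listed", and a non-existent name as "not listed". The zone `dnsbl.example.net` is a placeholder. Real lists differ in their return codes and usage policies.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a blocklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Perform the lookup; any 127.x.x.x answer is taken to mean 'listed'."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name: not listed (or the lookup failed)
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.net"))
# 99.2.0.192.dnsbl.example.net
```

Reusing the DNS infrastructure this way lets a list answer many queries cheaply, because resolvers along the way cache the results.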
More methods can be added reasonably easily by writing a Perl plug-in for Apache SpamAssassin.
Bayesian filtering
Apache SpamAssassin reinforces its rules through Bayesian filtering, where a user or administrator "feeds" examples of good (ham) and bad (spam) mail into the filter so that it learns the difference between the two. For this purpose, Apache SpamAssassin provides a command-line tool that can be instructed to learn a single mail or an entire mailbox as either ham or spam.
Typically, the user moves unrecognized spam to a separate folder and then runs the tool on the folder of non-spam and on the folder of spam separately. Alternatively, if the mail user agent supports it, the tool can be called for individual emails. Regardless of the method used to perform the learning, SpamAssassin's Bayesian test uses what it has learned to help score future e-mails and improve accuracy.
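The learning step can be illustrated with a deliberately simplified naive Bayes classifier in Python. The sketch uses a crude tokenizer and counts how many ham and spam messages contain each token. It then combines per-token likelihood ratios with add-one smoothing. SpamAssassin's own Bayes implementation differs in its tokenization and in how it combines token probabilities, so this shows only the principle of learning from labelled examples.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokenize(text: str) -> set:
    """Crude tokenizer: lower-case words, each counted once per message."""
    return set(TOKEN_RE.findall(text.lower()))


class NaiveBayesFilter:
    def __init__(self) -> None:
        self.token_counts = {True: Counter(), False: Counter()}  # True = spam
        self.message_counts = {True: 0, False: 0}

    def learn(self, text: str, is_spam: bool) -> None:
        """Record one labelled example (the 'feeding' step)."""
        self.token_counts[is_spam].update(tokenize(text))
        self.message_counts[is_spam] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | tokens) from the examples learned so far."""
        n_spam, n_ham = self.message_counts[True], self.message_counts[False]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))  # smoothed prior
        for token in tokenize(text):
            p_spam = (self.token_counts[True][token] + 1) / (n_spam + 2)
            p_ham = (self.token_counts[False][token] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)


bayes = NaiveBayesFilter()
bayes.learn("cheap pills, buy now", is_spam=True)
bayes.learn("buy cheap watches now", is_spam=True)
bayes.learn("meeting agenda for monday", is_spam=False)
bayes.learn("lunch on monday?", is_spam=False)
print(round(bayes.spam_probability("buy cheap pills"), 3))  # 0.947
```

Feeding more examples sharpens the per-token counts. This is why retraining on misclassified mail, as described above, tends to improve accuracy over time.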
Licensing
Apache SpamAssassin is free/open source software, licensed under the Apache License 2.0. Versions prior to 3.0 are dual-licensed under the Artistic License and the GNU General Public License.
Many commercially available anti-spam packages integrate SpamAssassin as part of their products, such as SpamKiller by McAfee and Kerio MailServer by Kerio.
sa-compile
sa-compile is a utility distributed with Apache SpamAssassin that compiles a SpamAssassin ruleset into a deterministic finite automaton (DFA), which allows Apache SpamAssassin to use processor power more efficiently.
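Compiling many patterns into one automaton lets a message be scanned once, one character at a time, instead of once per rule. As a rough illustration only, this Python sketch compiles a set of literal strings (not full regular expressions) into a DFA using the Aho-Corasick construction. It does not reproduce sa-compile's implementation, and the pattern list is invented.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

DFA = Tuple[List[Dict[str, int]], List[Set[int]]]


def compile_patterns(patterns: List[str]) -> DFA:
    """Compile literal patterns into a complete DFA (Aho-Corasick construction)."""
    alphabet = sorted({ch for p in patterns for ch in p})
    trie: List[Dict[str, int]] = [{}]  # trie edges: state -> {char: state}
    output: List[Set[int]] = [set()]   # pattern indexes accepted in each state
    for index, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in trie[state]:
                trie.append({})
                output.append(set())
                trie[state][ch] = len(trie) - 1
            state = trie[state][ch]
        output[state].add(index)

    # Breadth-first pass: fill in every missing transition using failure
    # links, so each (state, char) pair has exactly one successor.
    delta: List[Dict[str, int]] = [{} for _ in trie]
    fail = [0] * len(trie)
    queue = deque()
    for ch in alphabet:
        nxt = trie[0].get(ch, 0)
        delta[0][ch] = nxt
        if nxt:
            queue.append(nxt)  # depth-1 states fail back to the start state
    while queue:
        state = queue.popleft()
        output[state] |= output[fail[state]]  # inherit matches ending here
        for ch in alphabet:
            if ch in trie[state]:
                nxt = trie[state][ch]
                fail[nxt] = delta[fail[state]][ch]
                delta[state][ch] = nxt
                queue.append(nxt)
            else:
                delta[state][ch] = delta[fail[state]][ch]
    return delta, output


def scan(text: str, patterns: List[str], dfa: DFA) -> Set[str]:
    """Run the DFA over `text` once and report every pattern that occurs."""
    delta, output = dfa
    state, found = 0, set()
    for ch in text:
        # Characters that occur in no pattern send the automaton back to start.
        state = delta[state].get(ch, 0)
        found |= output[state]
    return {patterns[i] for i in found}


rules = ["free money", "click here", "here", "viagra"]
dfa = compile_patterns(rules)
print(sorted(scan("please click here for free money", rules, dfa)))
# ['click here', 'free money', 'here']
```

The compilation cost is paid once per ruleset, while the per-message cost is a single table lookup per character, independent of how many patterns there are.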
Testing
Apache SpamAssassin is designed to trigger on GTUBE (the Generic Test for Unsolicited Bulk Email), a 68-byte string similar to the EICAR antivirus test file. If this string is inserted into an RFC 5322-formatted message that is passed through the Apache SpamAssassin engine, Apache SpamAssassin triggers with a weight of 1000.
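A GTUBE test message can be assembled with Python's standard `email` package. The GTUBE string below is reproduced as we understand the SpamAssassin project's published test string; its length matches the 68 bytes stated above. The addresses and subject are placeholders. How the serialized message is then handed to SpamAssassin depends on the local installation.

```python
from email.message import EmailMessage

# GTUBE test string (68 ASCII bytes, kept on a single line of the body).
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def gtube_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build an RFC 5322 message whose body contains the GTUBE string."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "GTUBE spam filter test"
    msg.set_content("This is a spam filter test message.\n\n" + GTUBE + "\n")
    return msg


# Serialized form, ready to be passed to a local SpamAssassin setup.
raw = gtube_test_message("tester@example.org", "postmaster@example.org").as_bytes()
```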
See also
* Anti-spam techniques
External links
* Apache SpamAssassin Wiki
* Apache SpamAssassin Rule Updates Wiki: Automatically updating Apache SpamAssassin
* KAM.cf: KAM Ruleset for Apache SpamAssassin