Bogofilter

Bogofilter is a mail filter that classifies e-mail as spam or ham (non-spam) by a statistical analysis of the message's header and content (body). The program is able to learn from the user's classifications and corrections.

It was originally written by Eric S. Raymond after he read Paul Graham's article "A Plan for Spam", and is now maintained by David Relson, Matthias Andree and Greg Louis together with a group of contributors. The statistical technique used is known as Bayesian filtering. Bogofilter's primary algorithm uses the f(w) parameter and the Fisher inverse chi-square technique described by Gary Robinson.

Bogofilter may be run by a
mail delivery agent (MDA) or mail client to classify messages as they are delivered to recipient mailboxes, or be used by a mail transfer agent (MTA) to classify messages as they are received from the sending SMTP server. Bogofilter examines tokens in the message body and header, and refers to wordlists stored by Berkeley DB, SQLite, or QDBM to calculate a probability score that a new message is spam. Bogofilter processes both plain text and HTML, and supports reading multi-part MIME messages, including base64, quoted-printable, and uuencoded text or HTML. Bogofilter ignores non-text attachments, such as images.

It is possible to tune Bogofilter's statistical algorithms by modifying various
coefficients and other settings in its configuration file, or by using the automated bogotune utility included with the software, which attempts to optimise these coefficients to maximise filtering efficiency for a particular corpus of spam and non-spam. Standard tests at TREC 2005 show that Bogofilter compares well with its competitors SpamBayes, CRM114 and DSPAM. Other competitors include SpamProbe and QSF.

Bogofilter is written in C and runs on Linux, FreeBSD, NetBSD, OpenBSD, Solaris, Mac OS X, HP-UX, AIX and other platforms. It is released under the GNU GPL.
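The f(w) and Fisher inverse chi-square steps mentioned above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not Bogofilter's implementation: a hypothetical in-memory `Wordlist` class stands in for the Berkeley DB, SQLite or QDBM wordlist, the tokenizer is a deliberately crude regular expression, and the strength `s`, assumed probability `x` and neutral-token cut-off `min_dev` are illustrative values, not Bogofilter's defaults. Following Gary Robinson's formulation, a token's raw spam probability p(w) is smoothed towards x as f(w) = (s*x + n*p(w)) / (s + n), where n is the number of registered messages containing the token; the remaining f(w) values are then combined with Fisher's method into a single score, where values near 1 suggest spam and values near 0 suggest ham.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer (assumption): lowercase runs of 3+ word-like characters."""
    return re.findall(r"[a-z0-9$'!-]{3,}", text.lower())


class Wordlist:
    """Hypothetical in-memory stand-in for Bogofilter's on-disk wordlist."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    def register(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token at most once per message
        if is_spam:
            self.spam_msgs += 1
            self.spam_counts.update(tokens)
        else:
            self.ham_msgs += 1
            self.ham_counts.update(tokens)


def robinson_f(wl, token, s=1.0, x=0.5):
    """Robinson's f(w): p(w) smoothed towards x with strength s (illustrative values)."""
    b, g = wl.spam_counts[token], wl.ham_counts[token]
    n = b + g
    if n == 0:
        return x  # never-seen token: fall back to the assumed probability x
    b_rate = b / max(wl.spam_msgs, 1)  # fraction of spam messages containing the token
    g_rate = g / max(wl.ham_msgs, 1)   # fraction of ham messages containing the token
    p = b_rate / (b_rate + g_rate)
    return (s * x + n * p) / (s + n)


def chi2q(chi2, dof):
    """P(X >= chi2) for a chi-square variable X with an even number of degrees of freedom."""
    m = chi2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(fs):
    """Combine f(w) values with Fisher's method; near 1 means spam, near 0 means ham."""
    n = len(fs)
    if n == 0:
        return 0.5  # no evidence either way
    h = chi2q(-2.0 * sum(math.log(f) for f in fs), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - f) for f in fs), 2 * n)
    return (1.0 + h - s) / 2.0


def score(wl, text, min_dev=0.1):
    """Score a message, ignoring near-neutral tokens (min_dev is an assumed cut-off)."""
    fs = [robinson_f(wl, t) for t in set(tokenize(text))]
    return fisher_score([f for f in fs if abs(f - 0.5) >= min_dev])


if __name__ == "__main__":
    wl = Wordlist()
    wl.register("cheap pills, buy now, limited offer", is_spam=True)
    wl.register("buy cheap watches now", is_spam=True)
    wl.register("meeting notes for the project review", is_spam=False)
    wl.register("lunch tomorrow to review the project", is_spam=False)
    print(round(score(wl, "buy cheap pills now"), 3))
    print(round(score(wl, "project review notes"), 3))
```

Because s > 0 and 0 < x < 1, every f(w) stays strictly between 0 and 1, so the logarithms in `fisher_score` are always defined. Dropping tokens whose f(w) lies close to 0.5 is an assumption reflecting the common practice of ignoring near-neutral evidence; with this toy training data, the spam-like test message scores close to 1 and the ham-like one close to 0.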


Email clients that can use Bogofilter

The following email clients are known to support Bogofilter as a spam filtering backend:
* GNOME Evolution
* Claws Mail
* KMail
* Mutt
* Alpine
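As a rough illustration of how a client or delivery script can drive Bogofilter as a backend, the Python sketch below pipes a raw message to the `bogofilter` command on standard input and interprets its exit status, and registers user corrections with the `-s` (spam) and `-n` (ham) options. It assumes `bogofilter` is on the `PATH` and uses its default wordlist; the helper names `classify`, `train` and `label_for_status` are hypothetical, and the exit-status mapping (0 spam, 1 ham, 2 unsure, 3 error) follows the bogofilter(1) manual page and should be checked against the installed version.

```python
import subprocess

# Exit statuses as documented in bogofilter(1); verify against your installed version.
# 0 = spam, 1 = ham (non-spam), 2 = unsure, 3 = I/O or other error.
_EXIT_LABELS = {0: "spam", 1: "ham", 2: "unsure"}


def label_for_status(status):
    """Map a bogofilter exit status to a label; anything unexpected counts as an error."""
    return _EXIT_LABELS.get(status, "error")


def classify(raw_message, binary="bogofilter"):
    """Pipe one raw message (bytes) to bogofilter on stdin and return its verdict."""
    proc = subprocess.run([binary], input=raw_message,
                          stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return label_for_status(proc.returncode)


def train(raw_message, is_spam, binary="bogofilter"):
    """Register a message as spam (-s) or ham (-n), e.g. after the user corrects a verdict."""
    flag = "-s" if is_spam else "-n"
    subprocess.run([binary, flag], input=raw_message, check=True,
                   stdout=subprocess.DEVNULL)
```

A filter script might call `classify(message_bytes)` and file the message according to the returned label, then call `train(message_bytes, is_spam=True)` when the user marks a missed spam, which is how the program learns from the user's corrections.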


See also

* Blacklist
* Greylisting
* Whitelist
* Tarpit



External links


* Official homepage

* "A Plan for Spam", an essay by Paul Graham discussing the main ideas behind this program

This article, or an earlier revision of it, was edited from bogofilter's homepage.