Xapian is a free and open-source probabilistic information retrieval library, released under the GNU General Public License (GPL). It is a full-text search engine library for programmers.
It is written in C++, with bindings to allow use from Perl, Python (2 and 3), PHP (5 and 7), Java, Tcl, C#, Ruby, Lua, Erlang, Node.js and R.
Xapian is highly portable and runs on Linux, OS X, FreeBSD, NetBSD, OpenBSD, Solaris, HP-UX, AIX, Windows, OS/2 and Hurd, as well as Tru64.
Xapian grew out of the Muscat search engine, written by Dr. Martin F. Porter at the University of Cambridge. The first official release of Xapian was version 0.5.0 on September 20, 2002.
Xapian allows developers to add advanced indexing and search facilities to their own applications.
Organisations and projects using Xapian include the Library of the University of Cologne, Debian, Die Zeit, MoinMoin, and One Laptop per Child.
Features
* Supports Unicode 9.0 (including codepoints beyond the BMP) and stores indexed text in UTF-8.
* Transactions: if a database update fails in the middle of a transaction, the database is guaranteed to remain in a consistent state.
* Simultaneous search and update, with new documents being immediately visible.
* Support for large databases: Xapian has been proven to scale to hundreds of millions of documents.
* Accurate probabilistic ranking: more relevant documents are listed first.
* Phrase and proximity searching.
* Relevance feedback, which improves ranking and can expand a query, find related documents, categorise documents, etc.
* Structured Boolean queries, e.g. "race AND condition NOT horse"
* Wildcard search, e.g. "wiki*"
* Spelling correction
* Synonyms
* Omega, a packaged solution for adding a search engine to a web site or intranet. Omega can easily be extended and adapted to fit changing requirements.
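As a rough illustration of the structured Boolean and wildcard queries listed above, the following sketch evaluates "race AND condition NOT horse" and "wiki*" against a toy inverted index. It is a standalone Python example and does not use Xapian's API or reflect its internals; the corpus, the whitespace tokenisation, and the helper names `build_index` and `wildcard` are all assumptions made for the example.

```python
from collections import defaultdict

# Toy corpus; document ids and texts are invented for illustration.
DOCS = {
    1: "race condition in the thread scheduler",
    2: "horse race condition report",
    3: "wiki markup and wikipedia articles",
    4: "condition of the race track",
}


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def wildcard(index, prefix):
    """Union the postings of every indexed term that starts with `prefix`."""
    matches = set()
    for term, postings in index.items():
        if term.startswith(prefix):
            matches |= postings
    return matches


index = build_index(DOCS)

# "race AND condition NOT horse": intersect the two posting sets,
# then subtract the documents that mention "horse".
boolean_hits = (index["race"] & index["condition"]) - index["horse"]

# "wiki*": any document containing a term that begins with "wiki".
wildcard_hits = wildcard(index, "wiki")

print(sorted(boolean_hits))
print(sorted(wildcard_hits))
```

A real search library would typically add stemming, positional data for phrase searching, and relevance weighting on top of this kind of set arithmetic; the sketch only shows the Boolean matching step.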
GUI front-ends
* Recoll, written using Qt
See also
* List of information retrieval libraries
* Recoll