HOME

TheInfoList



OR:

Xapian is a
free and open-source Free and open-source software (FOSS) is software available under a Software license, license that grants users the right to use, modify, and distribute the software modified or not to everyone free of charge. FOSS is an inclusive umbrella term ...
probabilistic
information retrieval Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form ...
library, released under the
GNU General Public License The GNU General Public Licenses (GNU GPL or simply GPL) are a series of widely used free software licenses, or ''copyleft'' licenses, that guarantee end users the freedom to run, study, share, or modify the software. The GPL was the first ...
(GPL). It is a full-text
search engine A search engine is a software system that provides hyperlinks to web pages, and other relevant information on World Wide Web, the Web in response to a user's web query, query. The user enters a query in a web browser or a mobile app, and the sea ...
library for programmers. It is written in C++, with bindings to allow use from
Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...
, Python (2 and 3),
PHP PHP is a general-purpose scripting language geared towards web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by the PHP Group. ...
(5 and 7),
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
,
Tcl TCL or Tcl or TCLs may refer to: Business * TCL Technology, a Chinese consumer electronics and appliance company ** TCL Electronics, a subsidiary of TCL Technology * Texas Collegiate League, a collegiate baseball league * Trade Centre Limited ...
, C#,
Ruby Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
, Lua, Erlang, Node.js and R. Xapian is highly portable and runs on
Linux Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
,
OS X macOS, previously OS X and originally Mac OS X, is a Unix, Unix-based operating system developed and marketed by Apple Inc., Apple since 2001. It is the current operating system for Apple's Mac (computer), Mac computers. With ...
,
FreeBSD FreeBSD is a free-software Unix-like operating system descended from the Berkeley Software Distribution (BSD). The first version was released in 1993 developed from 386BSD, one of the first fully functional and free Unix clones on affordable ...
,
NetBSD NetBSD is a free and open-source Unix-like operating system based on the Berkeley Software Distribution (BSD). It was the first open-source BSD descendant officially released after 386BSD was fork (software development), forked. It continues to ...
,
OpenBSD OpenBSD is a security-focused operating system, security-focused, free software, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by fork (software development), forking NetBSD ...
, Solaris,
HP-UX HP-UX (from "Hewlett Packard Unix") is a proprietary software, proprietary implementation of the Unix operating system developed by Hewlett Packard Enterprise; current versions support HPE Integrity Servers, based on Intel's Itanium architect ...
, AIX,
Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
,
OS/2 OS/2 is a Proprietary software, proprietary computer operating system for x86 and PowerPC based personal computers. It was created and initially developed jointly by IBM and Microsoft, under the leadership of IBM software designer Ed Iacobucci, ...
and
Hurd GNU Hurd is a collection of microkernel servers written as part of GNU, for the GNU Mach microkernel. It has been under development since 1990 by the GNU Project of the Free Software Foundation, designed as a replacement for the Unix kernel, and ...
, as well as
Tru64 Tru64 UNIX is a discontinued 64-bit UNIX operating system for the Alpha instruction set architecture (ISA), currently owned by Hewlett-Packard (HP). Previously, Tru64 UNIX was a product of Compaq, and before that, Digital Equipment Corporation (DE ...
. Xapian grew out of the Muscat search engine, written by Dr. Martin F. Porter at the University of Cambridge. The first official release of Xapian was version 0.5.0 on September 20, 2002. Xapian allows developers to add advanced indexing and search facilities to their own applications. Organisations and projects using Xapian include the Library of the University of Cologne,
Debian Debian () is a free and open-source software, free and open source Linux distribution, developed by the Debian Project, which was established by Ian Murdock in August 1993. Debian is one of the oldest operating systems based on the Linux kerne ...
,
Die Zeit (, ) is a German national weekly newspaper published in Hamburg in Germany. The newspaper is generally considered to be among the German newspapers of record and is known for its long and extensive articles. History The first edition of was ...
, MoinMoin, and One Laptop per Child.


Features

* Supports Unicode 9.0 (including codepoints beyond the BMP) and stores indexed text in
UTF-8 UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode Transformation Format 8-bit''. Almost every webpage is transmitted as UTF-8. UTF-8 supports all 1,112,0 ...
. * Transactions: if database update fails in the middle of a transaction, the database is guaranteed to remain in a consistent state. * Simultaneous search and update, with new documents being immediately visible. * Support for large databases: Xapian has been proven to scale to hundreds of millions of documents. * Accurate probabilistic ranking: more relevant documents are listed first. * Phrase and proximity searching. * Relevance feedback, which improves ranking and can expand a query, find related documents, categorise documents etc. * Structured Boolean queries, e.g. "race AND condition NOT horse" * Wildcard search, e.g. "wiki*" * Spelling correction * Synonyms * Omega, a packaged solution for adding a search engine to a web site or intranet. Omega can easily be extended and adapted to fit changing requirements.


GUI front-ends

* Recoll written using Qt


See also

* List of information retrieval libraries * Recoll


References


External links

* {{official website Free search engine software