Recoll is a desktop search tool that provides full-text search in a GUI with a few mandatory external dependencies. It runs on many Unix-like operating systems and is mostly independent of the desktop environment. Recoll has been ported to OS/2, and is planned for integration into the OS/2-based ArcaOS.
Recoll was designed not to require a permanent daemon; on Linux systems, it can make use of inotify. Recoll updates its index at scheduled intervals (for example, through cron jobs), but if desired, the indexing task can run as a file-system monitoring daemon for real-time index updates.
Features
* Qt GUI.
* Xapian backend.
* Indexes the contents of many document types: text, HTML, email stores of all kinds, OpenDocument, Microsoft Office and Office Open XML, AbiWord, KWord, Gaim, LyX, Scribus, PDF, WordPerfect, PostScript, RTF, TeX, DVI, DjVu, MP3 and other audio file formats, JPEG and other image file formats.
* Recursively processes embedded documents (email attachments, zip archives) to arbitrary depths.
* Query facilities with Boolean searches, wildcards, phrases, proximity, and filters on file types and directory trees.
* GUI Boolean search build tool.
* Xesam query language support.
* Word stemming is performed at query time (you can switch stemming language after indexing).
* Multiple indexes are selectable at query time (e.g., personal + system indexes).
* Natively based on Unicode. Supports many languages and character sets, including good support for East Asian texts (CJK).
* MD5 document hashes for the elimination of duplicates in results.
* Batch and real-time indexing modes.
* Python API.
* GNOME Shell search provider, web interface, and Firefox history extensions.
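The duplicate elimination mentioned in the feature list can be illustrated with a short sketch. This is only an assumption about the general technique (hash each document's bytes with MD5 and keep one result per hash), not Recoll's own code; the function names and the `(path, content)` result shape are hypothetical and chosen for the example.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a document's raw bytes."""
    return hashlib.md5(data).hexdigest()


def drop_duplicates(results):
    """Keep the first result for each distinct content hash.

    `results` is a list of (path, content_bytes) pairs. This shape is an
    assumption made for the example, not Recoll's result format.
    """
    seen = set()
    unique = []
    for path, content in results:
        digest = md5_digest(content)
        if digest in seen:
            continue  # identical bytes were already returned under another path
        seen.add(digest)
        unique.append(path)
    return unique


if __name__ == "__main__":
    hits = [
        ("/home/user/report.txt", b"quarterly report"),
        ("/home/user/backup/report.txt", b"quarterly report"),
        ("/home/user/notes.txt", b"meeting notes"),
    ]
    for path in drop_duplicates(hits):
        print(path)
```

Because only the hash is compared, two files with identical bytes collapse into one result regardless of where they are stored.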
File types supported
File types indexed natively
* Text.
* HTML.
* Maildir, MH, and mailbox (Mozilla, Thunderbird, and Evolution). To index local copies of IMAP mail with Evolution, .cache must be removed from the skippedNames list in the Local Parameters pane of the GUI indexing preferences.
* Gaim and purple log files.
* Scribus files.
* Man pages (needs Groff).
* Mimehtml web archive format (support based on the mail filter).
* All the following need Python 3:
** Dia diagrams.
** Excel and PowerPoint (pre-open XML).
** Tar archives. Tar file indexing is disabled by default because tar archives do not typically contain the kind of documents that people search for; it must be enabled explicitly with "application/x-tar=execm rcltar" in a $HOME/.recoll/mimeconf file.
** Zip archives.
** Konqueror web archive format (uses the tarfile Python standard library module).
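The list above notes that archive formats are handled in Python and that the Konqueror web archive support relies on the tarfile standard library module. As a rough illustration of what enumerating archive members with that module looks like, here is a minimal sketch. It is not Recoll's rcltar handler; `list_tar_members` and `build_sample_tar` are hypothetical helpers written for this example.

```python
import io
import tarfile


def list_tar_members(fileobj):
    """Return (name, size) for every regular file in a tar archive."""
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz).
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


def build_sample_tar():
    """Create a small in-memory tar archive for the demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        samples = [("docs/readme.txt", b"hello"), ("docs/notes.txt", b"search me")]
        for name, text in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(text)
            archive.addfile(info, io.BytesIO(text))
    buffer.seek(0)  # rewind so the archive can be read back
    return buffer


if __name__ == "__main__":
    for name, size in list_tar_members(build_sample_tar()):
        print(name, size)
```

An indexer built along these lines would pass each member's contents on to the handler for its own file type, which is how nested documents end up searchable.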
File types indexed with external helpers
* PDF files.
* MS-Word files.
* WordPerfect files.
* RTF files.
* Image and audio file tags.
* AbiWord files.
* FB2, EPUB, and CHM ebooks.
* KWord files.
* Microsoft Office traditional and Open XML files.
* OpenOffice files.
* SVG files.
* Okular annotations files.
* HWP files (without page numbering).
See also
* Desktop search
* List of desktop search engines