
Poliqarp is an open source search engine designed to process text corpora, among others the National Corpus of Polish created at the Institute of Computer Science, Polish Academy of Sciences.


Features

* Custom query language
* Two-level regular expressions:
** operating at the level of characters in words
** operating at the level of words in statements/paragraphs
* Good performance
* Compact corpus representation (compared to similar projects)
* Portability across operating systems: Linux/BSD/Win32
* Lack of portability across endianness (current release works only on little-endian devices)
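
The idea of two-level regular expressions listed above can be illustrated with a minimal Python sketch. It is not Poliqarp's query syntax or implementation. The `Element` tuple format and the `match_at` and `find_all` helpers are hypothetical names introduced only for this example. Each word-level element carries a character-level regular expression that a single word must match in full, plus a repetition range that applies to consecutive words.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical representation of one word-level element:
# (character-level regex, minimum repeats, maximum repeats or None for unbounded).
Element = Tuple[str, int, Optional[int]]


def match_at(tokens: List[str], start: int, pattern: List[Element]) -> Optional[int]:
    """Return the end index of a match of `pattern` beginning at `start`, or None."""
    if not pattern:
        return start
    char_regex, lo, hi = pattern[0]
    # Level 1: count consecutive words that fully match the character-level regex.
    count = 0
    while (start + count < len(tokens)
           and (hi is None or count < hi)
           and re.fullmatch(char_regex, tokens[start + count])):
        count += 1
    # Level 2: greedy repetition over words, backtracking to shorter runs.
    for used in range(count, lo - 1, -1):
        end = match_at(tokens, start + used, pattern[1:])
        if end is not None:
            return end
    return None


def find_all(tokens: List[str], pattern: List[Element]) -> List[Tuple[int, int]]:
    """Return every non-empty (start, end) token span where the pattern matches."""
    spans = []
    for start in range(len(tokens)):
        end = match_at(tokens, start, pattern)
        if end is not None and end > start:
            spans.append((start, end))
    return spans


tokens = "the very very old cats were sleeping".split()
# One or more words starting with "ver", followed by one word starting with "o".
pattern = [(r"ver.*", 1, None), (r"o.*", 1, 1)]
spans = find_all(tokens, pattern)
print(spans)                                  # [(1, 4), (2, 4)]
print([" ".join(tokens[s:e]) for s, e in spans])
```

The sketch uses simple recursive backtracking for clarity. A corpus search engine working on large indexed corpora would presumably rely on a much more efficient matching strategy.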




External links


* Polish corpus website (in English)
* Project website on SourceForge
* Search plugin for Firefox