Xaira is an XML Aware Indexing and Retrieval Architecture developed at Oxford University. It was funded by the Mellon Foundation
between 2005 and 2006. It is based on SARA,
an SGML-aware (Standard Generalized Markup Language) text-searching system originally developed for searching the British National Corpus
. Xaira has been redeveloped as a generic XML
system for constructing query systems for any kind of XML data, in particular for use with the Text Encoding Initiative (TEI). The current Windows implementation is intended for non-specialist users. A more sophisticated, open-source version is currently under development. This version supports cross-platform working using standards such as XML-RPC
and SOAP
.
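As a rough illustration of why XML-RPC suits cross-platform working, the sketch below encodes a remote call as plain XML and decodes it again using Python's standard `xmlrpc.client` module. The method name `corpus.search` and its parameters are hypothetical and are not taken from Xaira's actual interface; only the encoding mechanism is shown.

```python
import xmlrpc.client

# Hypothetical method name and parameters (assumptions, not Xaira's real API).
METHOD = "corpus.search"
PARAMS = ("colour", 10)  # (query word, maximum number of hits)

# Encode the call as the XML payload a client would send over HTTP.
request_xml = xmlrpc.client.dumps(PARAMS, methodname=METHOD)
print(request_xml)

# A server on any platform can decode the same payload into native values.
decoded_params, decoded_method = xmlrpc.client.loads(request_xml)
print(decoded_method, decoded_params)
```

Because the request is ordinary XML carried over HTTP, the client and server do not need to share an operating system or programming language.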
See also
* Corpus linguistics
* Lemmatisation
External links
Home page
Preliminary documentation
Sourceforge site
Xaira: an XML Aware Indexing and Retrieval Architecture - Lou Burnard
The British National Corpus