UIMA, short for Unstructured Information Management Architecture, is an OASIS standard for content analytics, originally developed at IBM. It provides a component software architecture for the development, discovery, composition, and deployment of multi-modal analytics for the analysis of unstructured information and for integration with search technologies.


Structure

The UIMA architecture can be thought of in four dimensions:
1. It specifies component interfaces in an analytics pipeline.
2. It describes a set of design patterns.
3. It suggests two data representations: an in-memory representation of annotations for high-performance analytics and an XML representation of annotations for integration with remote web services.
4. It suggests development roles, so that tools can be used by people with diverse skills.
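To make the first and third points concrete, here is a minimal Python sketch, assuming a toy model rather than UIMA itself. Components share one interface (a function that reads a shared document object and adds annotations to it), a pipeline runs them in order over a single in-memory representation, and that representation can then be serialized to XML. In Apache UIMA the shared structure is the Common Analysis Structure (CAS), accessed through its Java API, and its standard XML form is XMI; every name below (`Document`, `Annotation`, `token_annotator`, `capitalized_annotator`, `run_pipeline`, `to_xml`) and the XML layout are hypothetical and are not that API or format.

```python
from dataclasses import dataclass, field
from typing import Callable, List
import xml.etree.ElementTree as ET


@dataclass
class Annotation:
    """A typed span over the document text (hypothetical, not UIMA's API)."""
    type_name: str
    begin: int
    end: int


@dataclass
class Document:
    """Hypothetical stand-in for a CAS: the text plus the annotations on it."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


# The shared component interface in this sketch: an annotator reads the
# document and adds annotations to it in place.
Annotator = Callable[[Document], None]


def token_annotator(doc: Document) -> None:
    """Adds a Token annotation for each whitespace-separated word."""
    pos = 0
    for word in doc.text.split():
        begin = doc.text.index(word, pos)
        end = begin + len(word)
        doc.annotations.append(Annotation("Token", begin, end))
        pos = end


def capitalized_annotator(doc: Document) -> None:
    """Marks Token annotations whose text starts with an uppercase letter.

    It relies only on annotations left by earlier components, not on the
    components themselves.
    """
    for ann in list(doc.annotations):  # copy: we append while iterating
        if ann.type_name == "Token" and doc.covered_text(ann)[:1].isupper():
            doc.annotations.append(Annotation("Capitalized", ann.begin, ann.end))


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    """Runs each annotator in order over one shared in-memory document."""
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    return doc


def to_xml(doc: Document) -> str:
    """Serializes the in-memory annotations to a simple, made-up XML layout
    (hypothetical; UIMA's standard XML form is XMI, not this)."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = doc.text
    anns = ET.SubElement(root, "annotations")
    for ann in doc.annotations:
        ET.SubElement(anns, ann.type_name, begin=str(ann.begin), end=str(ann.end))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = run_pipeline("IBM developed UIMA", [token_annotator, capitalized_annotator])
    print(to_xml(doc))
```

Running the pipeline on "IBM developed UIMA" yields three Token annotations and two Capitalized ones ("IBM" and "UIMA"). The second annotator never calls the first; it only reads the Token annotations already stored in the shared document, which is the kind of loose coupling a common annotation store is meant to allow.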


Implementations and uses

Apache UIMA, a reference implementation of UIMA, is maintained by the Apache Software Foundation.

UIMA is used in a number of software projects:
* IBM Research's Watson uses UIMA for analyzing unstructured data.
* The Clinical Text Analysis and Knowledge Extraction System (Apache cTAKES) is a UIMA-based system for information extraction from medical records.
* DKPro Core is a collection of reusable UIMA components for general-purpose natural language processing.


See also

* Data Discovery and Query Builder
* Entity extraction
* General Architecture for Text Engineering (GATE)
* IBM OmniFind
* LanguageWare


External links


Apache UIMA home page