
UIMA, short for Unstructured Information Management Architecture, is an OASIS standard for content analytics, originally developed at IBM. It provides a component software architecture for the development, discovery, composition, and deployment of multi-modal analytics for the analysis of unstructured information and integration with search technologies.


Structure

The UIMA architecture can be thought of in four dimensions:
# It specifies component interfaces in an analytics pipeline.
# It describes a set of design patterns.
# It suggests two data representations: an in-memory representation of annotations for high-performance analytics and an XML representation of annotations for integration with remote web services.
# It suggests development roles allowing tools to be used by users with diverse skills.
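
The component-interface and data-representation points can be illustrated with a minimal sketch. It does not use the Apache UIMA API or the OASIS XML schema: every name in it (`Annotation`, `Document`, `TokenAnnotator`, `CapitalizedAnnotator`, `run_pipeline`, `to_xml`) and the XML layout are hypothetical, chosen only to show the idea of components sharing one interface, writing annotations to a common in-memory structure, and exporting that structure as XML.

```python
from dataclasses import dataclass, field
import re
import xml.etree.ElementTree as ET


@dataclass
class Annotation:
    """A typed span over the document text (hypothetical structure)."""
    type: str
    begin: int
    end: int


@dataclass
class Document:
    """Shared in-memory container handed from one component to the next."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


class TokenAnnotator:
    """Marks every run of word characters as a Token."""

    def process(self, doc):
        for match in re.finditer(r"\w+", doc.text):
            doc.annotations.append(Annotation("Token", match.start(), match.end()))


class CapitalizedAnnotator:
    """Builds on earlier Token annotations: flags tokens starting in uppercase."""

    def process(self, doc):
        tokens = [a for a in doc.annotations if a.type == "Token"]
        for tok in tokens:
            if doc.covered_text(tok)[0].isupper():
                doc.annotations.append(Annotation("Capitalized", tok.begin, tok.end))


def run_pipeline(doc, components):
    """Every component exposes the same process(doc) interface."""
    for component in components:
        component.process(doc)
    return doc


def to_xml(doc):
    """Serialize text and annotations to an ad hoc XML layout (an assumption)."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = doc.text
    anns = ET.SubElement(root, "annotations")
    for a in doc.annotations:
        ET.SubElement(anns, "annotation",
                      type=a.type, begin=str(a.begin), end=str(a.end))
    return ET.tostring(root, encoding="unicode")


doc = run_pipeline(Document("Watson reads medical records"),
                   [TokenAnnotator(), CapitalizedAnnotator()])
xml_text = to_xml(doc)
```

Because each component only reads and appends annotations on the shared document, components can be reordered or replaced without changing the others, and the XML form can be passed to a remote service that has no access to the in-memory objects.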


Implementations and uses

Apache UIMA, a reference implementation of UIMA, is maintained by the Apache Software Foundation.

UIMA is used in a number of software projects:
* IBM Research's Watson uses UIMA for analyzing unstructured data.
* The Clinical Text Analysis and Knowledge Extraction System (Apache cTAKES) is a UIMA-based system for information extraction from medical records.
* DKPro Core is a collection of reusable UIMA components for general-purpose natural language processing.


See also

* Data Discovery and Query Builder
* Entity extraction
* General Architecture for Text Engineering (GATE)
* IBM Omnifind
* LanguageWare


External links


Apache UIMA home page