In computer science, a u-form is an abstract data type comprising a collection of attribute–value pairs associated with a universally unique identifier (UUID). A u-form essentially comprises an associative array augmented with a UUID and with keys limited to strings.
The UUID associated with a u-form is immutable; however, all data "contained" in the u-form are mutable (including the keys/names).
The mutability of contained data combined with an immutable identifier makes implementations of fully mutable, replicable digital objects possible.
This has applications in distributed computing, non-relational database systems, information visualization, and knowledge representation systems.
Navigational databases, as well as entity and associative entity relationships, can be implemented by using a UUID, or multiple UUIDs, as attribute values.
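As a rough illustration of using UUIDs as attribute values, the sketch below keeps u-form-like records in an in-memory dictionary and follows a UUID-valued attribute from one record to another. The names `store`, `new_uform`, and `follow` are hypothetical, and the use of 16-byte random UUIDs is an assumption made only for this example.

```python
import uuid

# Hypothetical in-memory store: UUID bytes -> {attribute name: byte value}.
store = {}

def new_uform(**attrs):
    """Create a u-form-like record and return its identifier (assumed 16 bytes)."""
    uid = uuid.uuid4().bytes
    store[uid] = dict(attrs)
    return uid

alice = new_uform(name=b"Alice")
bob = new_uform(name=b"Bob")

# A direct reference: the attribute value is simply the target's UUID.
store[alice]["manager"] = bob

# An associative relationship modelled as its own record holding two UUIDs.
link = new_uform(source=alice, target=bob, kind=b"reports-to")

def follow(uid, attr):
    """Navigate from one record to another through a UUID-valued attribute."""
    return store[store[uid][attr]]

print(follow(alice, "manager")["name"])
print(follow(link, "source")["name"])
```

Because the reference is only an identifier, the target's attributes can change freely without invalidating the link.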
The u-form's design goals center around supporting an open, extensible distributed information space, emphasizing the unambiguous identity of data objects and the separation between data storage, data characterization, and schema development.
The use of non-semantic UUIDs combined with a simple attribute–value model draws a clear distinction between identity and data.
Although u-forms share certain design characteristics with serialization formats such as XML, they should not be confused with such representational formats. Since u-forms are abstract, they do not specify any particular representational format. Indeed, they may be stored as or communicated via XML or other types of serialization.
Operations
The operations defined for a u-form are similar to associative arrays:
* Set_Attribute: Bind an attribute name to a value (replacing any existing binding to that name)
* Delete_Attribute: Unbind an attribute name from a value and remove the name from the u-form
* Get_Attribute: Find the value (if any) that is bound to a name.
* List_Attributes: Find all names that have a non-empty value.
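The four operations above can be sketched as a small class. This is a minimal illustration, not a reference implementation: the class name `UForm`, the method names, and the choice of a random 16-byte identifier are all assumptions made for the example.

```python
import uuid

class UForm:
    """Illustrative u-form: an immutable identifier plus mutable attributes."""

    def __init__(self, uid=None):
        # Assumption: a random 16-byte UUID when none is supplied.
        self._uuid = uid if uid is not None else uuid.uuid4().bytes
        self._attrs = {}

    @property
    def uuid(self):
        # Read-only: the identifier never changes after creation.
        return self._uuid

    def set_attribute(self, name, value):
        """Bind name to value, replacing any existing binding."""
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("values are arrays of bytes")
        self._attrs[name] = bytes(value)

    def delete_attribute(self, name):
        """Unbind name and remove it from the u-form (no error if absent)."""
        self._attrs.pop(name, None)

    def get_attribute(self, name):
        """Return the value bound to name, or None if there is none."""
        return self._attrs.get(name)

    def list_attributes(self):
        """Return the unordered set of names that have a non-empty value."""
        return {name for name, value in self._attrs.items() if value}

u = UForm()
u.set_attribute("title", b"draft")
u.set_attribute("title", b"final")   # replaces the earlier binding
u.set_attribute("notes", b"")        # bound, but empty
print(u.get_attribute("title"), u.list_attributes())
```

Returning a set from `list_attributes` reflects that the pairs are unordered; whether a missing name yields `None` or an error is a design choice the description leaves open.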
Properties
U-forms have the following properties:
* A UUID is defined as an array of bytes that is intended to be unique in the universe. Note that these are not limited to the standards for ISO, Microsoft, or DCE UUIDs, though those are examples of acceptable sources of UUIDs.
* Attribute names are case-folded and normalized strings of Unicode characters
* Values are arbitrary-length arrays of bytes (BLOBs, though not necessarily "large")
* Each attribute has only one value (though the bytes may be interpreted to represent a vector of data)
* The number of attribute–value pairs is arbitrary and extensible at any time
* The attribute–value pairs are treated as a set (i.e., they are unordered)
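Two of the properties above lend themselves to a short sketch: name normalization and byte-array values. The text does not say which Unicode normalization form is used, so NFC here is an assumption, as are the helper name `normalize_name` and the little-endian float layout used to pack a vector into a single value.

```python
import struct
import unicodedata

def normalize_name(name: str) -> str:
    """Case-fold and normalize an attribute name (NFC is an assumption)."""
    folded = unicodedata.normalize("NFC", name).casefold()
    # Normalize again, since case folding can yield non-normalized text.
    return unicodedata.normalize("NFC", folded)

attrs = {}
attrs[normalize_name("Title")] = b"first"
attrs[normalize_name("TITLE")] = b"second"   # same name, so the binding is replaced

# Each attribute has one value, but its bytes may encode a vector of data.
attrs[normalize_name("Position")] = struct.pack("<3f", 1.0, 2.0, 3.0)
x, y, z = struct.unpack("<3f", attrs["position"])
print(sorted(attrs), (x, y, z))
```

How a value's bytes are interpreted is left to the reader of the u-form; the u-form itself only stores them.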
Copying vs replication
An important characteristic of u-forms, of significance to distributed database systems, is that they support a clear distinction between copying and replication of data objects. Copying a u-form involves the creation of a new u-form (i.e., one with a different UUID), but with all attribute–value pairs identical to those of the original u-form. Replicating a u-form involves creating a new instance of the u-form with the same UUID as the original. Note that in a distributed system, two instances of the same u-form may be inconsistent (i.e., they may contain different attribute–value pairs). However, the fact that they have the same UUID means that they are intended to eventually be identical.
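The distinction between copying and replication can be sketched as follows. The dictionary layout and the function names `make_uform`, `copy_uform`, and `replicate_uform` are hypothetical; how replicas are later reconciled is outside the scope of this sketch.

```python
import uuid

def make_uform(attrs, uid=None):
    """Build a u-form-like record; a random UUID is assumed if none is given."""
    return {"uuid": uid if uid is not None else uuid.uuid4().bytes,
            "attrs": dict(attrs)}

def copy_uform(u):
    """Copy: a new u-form (different UUID) with identical attribute-value pairs."""
    return make_uform(u["attrs"])

def replicate_uform(u):
    """Replicate: another instance of the same u-form (same UUID)."""
    return make_uform(u["attrs"], uid=u["uuid"])

original = make_uform({"title": b"draft"})
copied = copy_uform(original)
replica = replicate_uform(original)

# Replicas may temporarily diverge in a distributed system...
replica["attrs"]["title"] = b"final"

# ...but the shared UUID marks them as instances of the same u-form.
print(replica["uuid"] == original["uuid"], copied["uuid"] == original["uuid"])
```

A copy is a separate object from the moment it is created, so later changes to it say nothing about the original; a divergent replica, by contrast, is a pending inconsistency.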
History
U-forms were developed at MAYA Design as part of the Visage Information Visualization System, a joint project of MAYA and Carnegie Mellon University funded by DARPA and the Army Research Laboratory. The name "u-form" derives from the term "e-form", a hypothetical "electronic form" proposed by Michael Dertouzos in his 1997 book "What Will Be". In addition to their continuing use in Visage, they have been used as the basis of a number of significant research and large-scale production systems, most notably the US Army's Command Post of the Future.
External links
* http://www.maya.com/portfolio/maya-universal-database
* http://www.bio-itworld.com/issues/2006/july-aug/infocommons/
* http://www.asis.org/Bulletin/Jun-07/Bulletin_JunJul07.pdf
* http://www.biotech-online.com/fileadmin/artimg/the-universal-genetics-database_-information-sharing-in-genetics-and-beyond.pdf
* https://books.google.com/books?id=oDYEAAAAMBAJ&lpg=PA20&vq=u-form&pg=PA20#v=onepage&q&f=false