A machine-readable document is a document whose content can be readily processed by computers. Such documents are distinguished from machine-readable data by virtue of having sufficient structure to provide the necessary context to support the business processes for which they are created.
Definition
Data without context is meaningless and lacks the four essential characteristics of trustworthy business records specified in ISO 15489 Information and documentation -- Records management:
* Reliability
* Authenticity
* Integrity
* Usability
The vast bulk of information is unstructured data and, from a business perspective, that means it is "immature", i.e., Level 1 (chaotic) of the Capability Maturity Model. Such immaturity fosters inefficiency, diminishes quality, and limits effectiveness. Unstructured information is also ill-suited for records management functions, provides inadequate evidence for legal purposes, drives up the cost of discovery in litigation, and makes access and usage needlessly cumbersome in routine, ongoing business processes.
There are at least four aspects to machine-readability:
* First, words or phrases should be discretely delineated (tagged) so that computer software and/or hardware logic can be applied to them as individual conceptual elements.
* Second, the semantics of each element should be specified so that computers can help human beings achieve a common understanding of their meanings and potential usages.
* Third, if the relationships among the individual elements are also specified, computers can automatically apply inferences to them, thereby further relieving human beings of the burden of trying to understand them, particularly for purposes of inquiry, discovery, and analysis.
* Fourth, if the structures of the documents in which the elements occur are also specified, human understanding is further enhanced and the data becomes more reliable for legal and business-quality purposes.
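The four aspects above can be illustrated with a small sketch. It assumes a hypothetical XML record: the element names (`plan`, `goal`, `objective`, `name`) and the `supports` attribute are invented for illustration and are not taken from any particular standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element and attribute names are illustrative only.
DOC = """\
<plan id="p1">
  <goal id="g1">
    <name>Improve records management</name>
    <objective id="o1" supports="g1">
      <name>Publish reports in machine-readable format</name>
    </objective>
  </goal>
</plan>
"""

root = ET.fromstring(DOC)

# Aspects 1 and 2: each phrase is delineated by a tag, and the tag name
# tells software what kind of thing the phrase is (here, a goal's name).
goal_names = [goal.findtext("name") for goal in root.iter("goal")]

# Aspect 3: relationships are explicit, so software can follow them.
# The assumed 'supports' attribute links an objective to a goal id.
links = {obj.get("id"): obj.get("supports") for obj in root.iter("objective")}

# Aspect 4: the document structure itself can be checked.
def has_expected_structure(plan):
    """True if the root is a plan and every goal directly under it has a name."""
    return plan.tag == "plan" and all(
        goal.find("name") is not None for goal in plan.findall("goal")
    )
```

In practice the structural check would usually be delegated to a schema language rather than written by hand; the function here only shows what such a check amounts to.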
As early as 1983, the U.S. Government Accountability Office (GAO) began emphasizing the benefits of machine-readable information. Even earlier, in 1981, GAO began reporting on the problem of inadequate record-keeping practices in the U.S. federal government. Such deficiencies are not unique to government, and advances in information technology mean that most information is now "born digital" and thus potentially far more easily managed by automated means. However, in testimony to Congress in 2010, GAO highlighted problems with managing electronic records, and as recently as 2015, GAO continued to report inadequacies in the performance of Executive Branch agencies in meeting records management requirements.
Moreover, more than a decade after a major and formerly highly respected auditing firm, Arthur Andersen, met its demise due to a records destruction scandal, record-keeping practices became a central issue in the 2016 Presidential election.
On January 4, 2011, President Obama signed H.R. 2142, the Government Performance and Results Act (GPRA) Modernization Act of 2010 (GPRAMA), into law as P.L. 111-352. Section 10 of GPRAMA requires U.S. federal agencies to publish their strategic and performance plans and reports in searchable, machine-readable format. Additionally, in 2013, he issued Executive Order 13642, "Making Open and Machine Readable the New Default for Government Information", which applies to government information in general.
On July 28, 2016, the Office of Management and Budget (OMB) followed up by including in the revised issuance of Circular A-130 direction for agencies to use open, machine-readable formats, and to publish "public information online in a manner that promotes analysis and reuse for the widest possible range of purposes", meaning that the information is both publicly accessible and machine-readable. On January 14, 2019, President Trump signed into law H.R. 4174, the OPEN Government Data Act (OGDA), which codifies in law the requirement for agencies to make their public data assets available in machine-readable format. On June 28, 2019, in Circular A-11, OMB expressed intent to begin complying with section 10 of GPRAMA.
In support of such policy direction, technological advancement is enabling more efficient and effective management and use of machine-readable electronic records.
Document-oriented databases have been developed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) Recommendation setting forth rules for encoding documents in a format that is both human-readable and machine-readable. Many XML editor tools have been developed and most, if not all, major information technology applications support XML to a greater or lesser degree. The fact that XML itself is an open, standard, machine-readable format makes such support relatively easy for application developers to provide.
The W3C's accompanying XML Schema (XSD) Recommendation specifies how to formally describe the elements in an XML document. With respect to the specification of XML schemas, the Organization for the Advancement of Structured Information Standards (OASIS) is a leading standards-developing organization. However, many technical developers prefer to work with JSON, and to define the structure of JSON data for validation, documentation, and interaction control, JSON Schema was developed by the Internet Engineering Task Force (IETF).
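The idea of declaring a structure and validating data against it can be sketched in a few lines. The example below is a simplified illustration only: the schema is a hand-written Python dictionary, not JSON Schema, and the field names (`title`, `created`, `pages`) are assumptions made up for the example.

```python
import json

# Hypothetical, hand-written schema: field name -> required Python type.
# This is NOT JSON Schema; it only mimics the idea of declaring structure.
RECORD_SCHEMA = {"title": str, "created": str, "pages": int}

def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

good = json.loads('{"title": "Annual report", "created": "2019-06-28", "pages": 12}')
bad = json.loads('{"title": "Annual report", "pages": "twelve"}')

print(validate(good, RECORD_SCHEMA))  # no problems reported
print(validate(bad, RECORD_SCHEMA))   # reports the missing and mistyped fields
```

A real deployment would more likely rely on a schema language and an existing validator than on a hand-rolled check like this one.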
The Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of the presentation of the document, including the text, fonts, graphics, and other information needed to display it. PDF/A is an ISO-standardized version of the PDF specialized for use in the archiving and long-term preservation of electronic documents. PDF/A-3 allows embedding of other file formats, including XML, into PDF/A conforming documents, thus potentially providing the best of both human- and machine-readability. The W3C's XSL-FO (XSL Formatting Objects) markup language is commonly used to generate PDF files.
Metadata, data about data, can be used to organize electronic resources, provide digital identification, and support the archiving and preservation of resources. In well-structured, machine-readable electronic records, the content can be repurposed as both data and metadata. In the context of electronic record-keeping systems, the terms "management" and "metadata" are virtually synonymous. Given proper metadata, records management functions can be automated, thereby reducing the risk of spoliation of evidence and other fraudulent manipulations of records. Moreover, such records can be used to automate the process of auditing data maintained in databases, thereby reducing the risk of single points of failure associated with the Machiavellian concept of a single source of truth.
Blockchain is a technology for maintaining continuously growing lists of records secured against tampering and revision. A key feature is that every node in a decentralized system has a copy of the blockchain, so there is no single point of failure subject to manipulation and fraud.
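The tamper-evidence property can be sketched with a simple hash chain. This is a minimal illustration under stated assumptions, not a blockchain implementation: it shows only how each record is bound to its predecessor by a hash, and omits the replication across nodes and the consensus mechanism that a decentralized system requires. The function names and record payloads are invented for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append(chain, payload):
    """Add a record, linking it to the hash of the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })

def is_valid(chain):
    """Recompute every hash and link; any altered record breaks the chain."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["payload"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True

chain = []
append(chain, "record created")
append(chain, "record approved")
```

Changing the payload of an earlier record without recomputing every later hash causes `is_valid` to return `False`, which is the sense in which such a list is secured against revision.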
See also
* Budapest Declaration on Machine Readable Travel Documents
* Comparison of XML editors
* Four corners (law)
* Integrity and particularly Data integrity
* Linked data
* Machine-readable passport
* Markup language
* Open data
* Reliability (statistics), Data integrity, Reliability (computer networking), and Reliability (research methods)
* Strategy Markup Language (StratML)
* Structured document
* Tag (metadata)
* Universal Business Language (UBL)
* XBRL (eXtensible Business Reporting Language)
External links
* OMB M-13-13 Open Data Policy: Managing Information as an Asset, which requires agencies to use open, machine-readable data format standards
* January 2005, which outlines the characteristics of trustworthy records.
* Driving a Stake in the Heart of the Capone Consultancy Method of Records Management: Best Practices for Correcting Non-Records Non-Policy Nonsense, March 9, 2015
* The U.S. Code, which includes the term "machine-readable" over 50 times as of September 10, 2016