
A machine-readable document is a document whose content can be readily processed by computers. Such documents are distinguished from machine-readable data by having sufficient structure to provide the context needed to support the business processes for which they are created.


Definition

Data without context is meaningless and lacks the four essential characteristics of trustworthy business records specified in ISO 15489 (Information and documentation -- Records management):
* Reliability
* Authenticity
* Integrity
* Usability

The vast bulk of information is unstructured data and, from a business perspective, that means it is "immature", i.e., at Level 1 (chaotic) of the Capability Maturity Model. Such immaturity fosters inefficiency, diminishes quality, and limits effectiveness. Unstructured information is also ill-suited for records management functions, provides inadequate evidence for legal purposes, drives up the cost of discovery in litigation, and makes access and usage needlessly cumbersome in routine, ongoing business processes.

There are at least four aspects to machine-readability:
* First, words or phrases should be discretely delineated (tagged) so that computer software and/or hardware logic can be applied to them as individual conceptual elements.
* Second, the semantics of each element should be specified so that computers can help human beings achieve a common understanding of their meanings and potential usages.
* Third, if the relationships among the individual elements are also specified, computers can automatically apply inferences to them, thereby further relieving human beings of the burden of trying to understand them, particularly for purposes of inquiry, discovery, and analysis.
* Fourth, if the structures of the documents in which the elements occur are also specified, human understanding is further enhanced and the data becomes more reliable for legal and business-quality purposes.

As early as 1983, the U.S. Government Accountability Office (GAO) began emphasizing the benefits of machine-readable information. Earlier still, in 1981, GAO began reporting on the problem of inadequate record-keeping practices in the U.S. federal government. Such deficiencies are not unique to government, and advances in information technology mean that most information is now "born digital" and thus potentially far more easily managed by automated means. However, in testimony to Congress in 2010, GAO highlighted problems with managing electronic records, and as recently as 2015, GAO continued to report inadequacies in the performance of Executive Branch agencies in meeting records management requirements. Moreover, more than a decade after a major and formerly highly respected auditing firm,
Arthur Andersen, met its demise due to a records destruction scandal, record-keeping practices became a central issue in the 2016 Presidential election.

On January 4, 2011, President Obama signed H.R. 2142, the Government Performance and Results Act (GPRA) Modernization Act of 2010 (GPRAMA), into law as P.L. 111-352. Section 10 of GPRAMA requires U.S. federal agencies to publish their strategic and performance plans and reports in searchable, machine-readable format. Additionally, in 2013, he issued Executive Order 13642, "Making Open and Machine Readable the New Default for Government Information", covering government information in general. On July 28, 2016, the Office of Management and Budget (OMB) followed up by including in the revised issuance of Circular A-130 direction for agencies to use open, machine-readable formats, and to publish "public information online in a manner that promotes analysis and reuse for the widest possible range of purposes", meaning that the information is both publicly accessible and machine-readable. On January 14, 2019, President Trump signed into law H.R. 4174, the OPEN Government Data Act (OGDA), which codifies in law the requirement for agencies to make their public data assets available in machine-readable format. On June 28, 2019, in Circular A-11, OMB expressed intent to begin complying with section 10 of GPRAMA.

In support of such policy direction, technological advancement is enabling more efficient and effective management and use of machine-readable electronic records. Document-oriented databases have been developed for storing, retrieving, and managing document-oriented information, also known as semi-structured data.

Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) Recommendation setting forth rules for encoding documents in a format that is both human-readable and machine-readable. Many XML editor tools have been developed, and most, if not all, major information technology applications support XML to greater or lesser degrees. The fact that XML itself is an open, standard, machine-readable format makes it relatively easy for application developers to do so. The W3C's accompanying XML Schema (XSD) Recommendation specifies how to formally describe the elements in an XML document. With respect to the specification of XML schemas, the Organization for the Advancement of Structured Information Standards (OASIS) is a leading standards-developing organization. However, many technical developers prefer to work with JSON, and JSON Schema was developed by the Internet Engineering Task Force (IETF) to define the structure of JSON data for validation, documentation, and interaction control.

The Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of the presentation of the document, including the text, fonts, graphics, and other information needed to display it.
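
As an illustration of the XML and JSON formats discussed above, the following minimal Python sketch parses a small tagged document and re-expresses the same content as JSON. The `plan`, `goal`, `name`, and `deadline` elements and the `supports` attribute are hypothetical names invented for this example; they are not drawn from any particular schema or standard. The tags delineate individual elements, and the `supports` attribute records a relationship between two of them.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document: all element and attribute names are invented
# for illustration and do not follow any published schema.
XML_DOC = """\
<plan>
  <goal id="g1">
    <name>Publish plans in machine-readable format</name>
    <deadline>2030-01-01</deadline>
  </goal>
  <goal id="g2" supports="g1">
    <name>Adopt an open schema</name>
    <deadline>2031-01-01</deadline>
  </goal>
</plan>
"""


def goals_from_xml(text):
    """Extract each tagged goal as a dictionary of its named parts."""
    root = ET.fromstring(text)
    return [
        {
            "id": goal.get("id"),
            "name": goal.findtext("name"),
            "deadline": goal.findtext("deadline"),
            # Relationship to another element; None when absent.
            "supports": goal.get("supports"),
        }
        for goal in root.findall("goal")
    ]


goals = goals_from_xml(XML_DOC)

# The same content serialized as JSON and read back again.
as_json = json.dumps(goals, indent=2)
round_trip = json.loads(as_json)
```

Because each element is individually tagged, software can address the name or deadline of a goal directly rather than searching free text. A schema language such as XSD or JSON Schema would additionally constrain which elements may appear; that step is not shown here.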
PDF/A is an ISO-standardized version of PDF specialized for use in the archiving and long-term preservation of electronic documents. PDF/A-3 allows embedding of other file formats, including XML, into PDF/A-conforming documents, thus potentially providing the best of both human- and machine-readability. The W3C's XSL-FO (XSL Formatting Objects) markup language is commonly used to generate PDF files.

Metadata, data about data, can be used to organize electronic resources, provide digital identification, and support the archiving and preservation of resources. In well-structured, machine-readable electronic records, the content can be repurposed as both data and metadata. In the context of electronic record-keeping systems, the terms "management" and "metadata" are virtually synonymous. Given proper metadata, records management functions can be automated, thereby reducing the risk of spoliation of evidence and other fraudulent manipulations of records. Moreover, such records can be used to automate the process of auditing data maintained in databases, thereby reducing the risk of single points of failure associated with the Machiavellian concept of a single source of truth.

Blockchain is a new technology for maintaining continuously growing lists of records secured from tampering and revision. A key feature is that every node in a decentralized system has a copy of the blockchain, so there is no single point of failure subject to manipulation and fraud.
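
The tamper-evidence property described above can be sketched in Python using a simple hash chain, in which each entry stores the hash of the entry before it. This is only an illustration of the linking idea under simplifying assumptions: it runs in a single process, it omits the decentralized copies and consensus mechanisms of an actual blockchain, and the function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def record_hash(record, previous_hash):
    """Hash a record together with the hash of the preceding entry."""
    payload = json.dumps({"record": record, "prev": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records so that each entry depends on all earlier ones."""
    chain = []
    previous_hash = GENESIS_HASH
    for record in records:
        entry_hash = record_hash(record, previous_hash)
        chain.append({"record": record, "prev": previous_hash, "hash": entry_hash})
        previous_hash = entry_hash
    return chain


def verify_chain(chain):
    """Return True only if every stored hash still matches its content."""
    previous_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != previous_hash:
            return False
        if record_hash(entry["record"], previous_hash) != entry["hash"]:
            return False
        previous_hash = entry["hash"]
    return True


# Hypothetical records, invented for illustration.
chain = build_chain([
    {"id": 1, "title": "Strategic plan"},
    {"id": 2, "title": "Performance report"},
])
intact = verify_chain(chain)

# Altering an earlier record invalidates its stored hash.
chain[0]["record"]["title"] = "Altered plan"
tampered = verify_chain(chain)
```

In this sketch `intact` is true and `tampered` is false, because the altered record no longer matches the hash stored alongside it. In a decentralized system, other nodes holding unaltered copies of the chain would be able to detect the discrepancy in the same way.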


See also

* Budapest Declaration on Machine Readable Travel Documents
* Comparison of XML editors
* Four corners (law)
* Integrity and particularly Data integrity
* Linked data
* Machine-readable passport
* Markup language
* Open data
* Reliability (statistics), Data integrity, Reliability (computer networking), and Reliability (research methods)
* Strategy Markup Language (StratML)
* Structured document
* Tag (metadata)
* Universal Business Language (UBL)
* XBRL (eXtensible Business Reporting Language)




External links


* OMB M-13-13, Open Data Policy: Managing Information as an Asset, which requires agencies to use open, machine-readable data format standards
* January 2005, which outlines the characteristics of trustworthy records.
* Driving a Stake in the Heart of the Capone Consultancy Method of Records Management: Best Practices for Correcting Non-Records Non-Policy Nonsense, March 9, 2015
* The U.S. Code, which includes the term "machine-readable" over 50 times as of September 10, 2016