In computing and data management, data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks, including:
* Data transformation or data mediation between a data source and a destination
* Identification of data relationships as part of data lineage analysis
* Discovery of hidden sensitive data, such as the last four digits of a social security number hidden in another user ID, as part of a data masking or de-identification project
* Consolidation of multiple databases into a single database, including identification of redundant columns of data for consolidation or elimination
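The sensitive-data case can be sketched as a simple check, assuming each user ID and the same person's social security number sit in one row. The field names and sample values below are hypothetical, and a real data-masking project would scan across tables and use stricter rules to limit false positives.

```python
import re

# Hypothetical sample rows pairing a user ID with the same person's SSN.
# All values are made up for illustration.
USERS = [
    {"user_id": "jsmith4821", "ssn": "123-45-4821"},
    {"user_id": "mlopez07", "ssn": "987-65-4320"},
    {"user_id": "4320kchen", "ssn": "555-12-4320"},
]


def last_four(ssn: str) -> str:
    """Return the last four digits of an SSN, ignoring separators."""
    digits = re.sub(r"\D", "", ssn)
    return digits[-4:]


def find_embedded_ssn_fragments(rows):
    """Flag user IDs that contain the last four digits of that row's SSN."""
    flagged = []
    for row in rows:
        fragment = last_four(row["ssn"])
        if fragment and fragment in row["user_id"]:
            flagged.append(row["user_id"])
    return flagged


print(find_embedded_ssn_fragments(USERS))  # ['jsmith4821', '4320kchen']
```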
For example, a company that wants to exchange purchase orders and invoices with other companies might use data mapping to create data maps from its own data to standardized ANSI ASC X12 messages for those documents.
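A data map of this kind can be sketched as a list of target elements, each paired with a rule that reads the source record. The sketch below fills only the BEG (beginning) segment of an X12 850 purchase order; the internal field names are hypothetical, and the `*` element separator and `~` segment terminator are common choices assumed here, since the actual delimiters are declared in the interchange envelope.

```python
from datetime import date

# Hypothetical internal purchase-order record (field names are illustrative).
internal_po = {
    "order_no": "4500012345",
    "order_date": date(2024, 1, 15),
    "order_kind": "standalone",
}

# The data map: each target element is paired with a function that reads the
# source record and returns the element value (positions BEG01..BEG05).
BEG_MAP = [
    ("BEG01", lambda r: "00"),                                   # purpose: original
    ("BEG02", lambda r: {"standalone": "SA"}[r["order_kind"]]),  # PO type code
    ("BEG03", lambda r: r["order_no"]),                          # PO number
    ("BEG04", lambda r: ""),                                     # release number, unused
    ("BEG05", lambda r: r["order_date"].strftime("%Y%m%d")),     # date as CCYYMMDD
]


def build_segment(segment_id, data_map, record, sep="*", term="~"):
    """Apply a data map to one record and serialize the resulting segment."""
    values = [fn(record) for _, fn in data_map]
    return segment_id + sep + sep.join(values) + term


print(build_segment("BEG", BEG_MAP, internal_po))
# BEG*00*SA*4500012345**20240115~
```

Keeping the map as data, separate from the serializer, mirrors how mapping tools store data maps: the same `build_segment` routine could serve other segments given their own maps.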
Standards
X12 standards are generic Electronic Data Interchange (EDI) standards designed to allow a company to exchange data with any other company, regardless of industry. The standards are maintained by the Accredited Standards Committee X12 (ASC X12), which is accredited by the American National Standards Institute (ANSI) to set standards for EDI. The X12 standards are often called ANSI ASC X12 standards.
The W3C introduced R2RML as a standard for mapping data in a relational database to data expressed in terms of the Resource Description Framework (RDF).
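The effect of an R2RML triples map can be imitated in plain Python: a subject IRI template (the role of `rr:template`), a class (`rr:class`), and column-to-predicate pairs (`rr:predicateObjectMap` with `rr:column`). This is a hypothetical sketch, not an R2RML processor; the `EMP` table and every IRI are made up, and it skips the IRI-safe percent-encoding that R2RML applies to template values.

```python
import sqlite3

# Hypothetical relational source: a small EMP table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, JOB TEXT)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [(7369, "SMITH", "CLERK"), (7499, "ALLEN", "SALESMAN")],
)

# Plain-Python stand-in for a triples map; all IRIs are illustrative.
TRIPLES_MAP = {
    "table": "EMP",
    "subject_template": "http://data.example.com/employee/{EMPNO}",
    "class": "http://example.com/ns#Employee",
    "predicates": {
        "ENAME": "http://example.com/ns#name",
        "JOB": "http://example.com/ns#role",
    },
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def generate_ntriples(conn, tmap):
    """Yield N-Triples lines produced by applying the map to each table row."""
    cursor = conn.execute(f"SELECT * FROM {tmap['table']}")
    columns = [d[0] for d in cursor.description]
    for values in cursor:
        row = dict(zip(columns, values))
        subject = "<" + tmap["subject_template"].format(**row) + ">"
        yield f"{subject} <{RDF_TYPE}> <{tmap['class']}> ."
        for column, predicate in tmap["predicates"].items():
            # Escape backslashes and quotes so the literal is valid N-Triples.
            literal = str(row[column]).replace("\\", "\\\\").replace('"', '\\"')
            yield f'{subject} <{predicate}> "{literal}" .'


for line in generate_ntriples(conn, TRIPLES_MAP):
    print(line)
```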
In the future, tools based on semantic web languages such as RDF, the Web Ontology Language (OWL), and standardized metadata registries will make data mapping a more automatic process. This process will be accelerated if each application performs metadata publishing. Fully automated data mapping is a very difficult problem (see semantic translation).
Hand-coded, graphical manual
Data mappings can be created in a variety of ways: by writing procedural code, by creating XSLT transforms, or by using graphical mapping tools that automatically generate executable transformation programs. These graphical tools allow a user to "draw" lines from fields in one set of data to fields in another. Some graphical data mapping tools allow users to "auto-connect" a source and a destination; this feature depends on the source and destination data element names being the same. The tools generate the transformation programs automatically in languages such as SQL, XSLT, Java, or C++. Graphical tools of this kind are found in most ETL (extract, transform, and load) tools as the primary means of entering data maps to support data movement. Examples include SAP BODS and Informatica PowerCenter.
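The auto-connect feature, followed by generation of a simple SQL transformation program, might look like the sketch below. It uses exact name matching as described above (some tools may also normalize case or spacing, which is not assumed here), and the field and table names are hypothetical.

```python
def auto_connect(source_fields, target_fields):
    """Pair fields whose names are identical; report the rest as unmapped."""
    target_set = set(target_fields)
    mapping = {name: name for name in source_fields if name in target_set}
    unmapped_source = [f for f in source_fields if f not in mapping]
    unmapped_target = [f for f in target_fields if f not in mapping.values()]
    return mapping, unmapped_source, unmapped_target


def to_sql(mapping, src_table, tgt_table):
    """Emit an INSERT ... SELECT statement that moves the mapped columns."""
    tgt_cols = ", ".join(mapping.values())
    src_cols = ", ".join(mapping.keys())
    return f"INSERT INTO {tgt_table} ({tgt_cols}) SELECT {src_cols} FROM {src_table};"


source = ["CustomerID", "FirstName", "LastName", "Email"]
target = ["CustomerID", "PersonGivenName", "LastName", "EmailAddress"]

mapping, left_src, left_tgt = auto_connect(source, target)
print(mapping)   # {'CustomerID': 'CustomerID', 'LastName': 'LastName'}
print(left_src)  # ['FirstName', 'Email']
print(left_tgt)  # ['PersonGivenName', 'EmailAddress']
print(to_sql(mapping, "src_customer", "dw_customer"))
```

The unmapped lists are what a user would still have to connect by hand, since names such as FirstName and PersonGivenName differ.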
Data-driven mapping
This is a newer approach to data mapping. It evaluates actual data values in two data sources simultaneously, using heuristics and statistics to automatically discover complex mappings between the two data sets. The approach finds the transformations that relate the data sets, discovering substrings, concatenations, arithmetic, case statements, and other kinds of transformation logic. It also discovers data exceptions: values that do not follow the discovered transformation logic.
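A heavily simplified sketch of data-driven mapping follows. It assumes the source and target rows are already aligned (row *i* of each describes the same entity), enumerates a small, hypothetical family of candidate transformations (column copy, one-character substring, space-joined concatenation, and scaling by a ratio guessed from the first row), scores each by the fraction of rows it reproduces, and reports the rows the best candidate misses as exceptions. The 0.75 threshold is an arbitrary assumption; case statements and richer statistics are not covered.

```python
from itertools import permutations

# Hypothetical aligned samples: row i of SOURCE matches row i of TARGET.
SOURCE = [
    {"first": "Ada", "last": "Lovelace", "price": 12.5},
    {"first": "Alan", "last": "Turing", "price": 3.0},
    {"first": "Grace", "last": "Hopper", "price": 7.25},
    {"first": "Edsger", "last": "Dijkstra", "price": 1.0},
]
TARGET = [
    {"full_name": "Ada Lovelace", "initial": "A", "price_cents": 1250},
    {"full_name": "Alan Turing", "initial": "A", "price_cents": 300},
    {"full_name": "Grace Hopper", "initial": "G", "price_cents": 725},
    {"full_name": "Edsger W. Dijkstra", "initial": "E", "price_cents": 100},
]


def candidates(source_rows, target_rows, target_col):
    """Generate (description, function) pairs for plausible transformations."""
    cols = list(source_rows[0])
    for c in cols:
        yield f"{c}", lambda r, c=c: r[c]                     # direct copy
        yield f"{c}[:1]", lambda r, c=c: str(r[c])[:1]        # substring
    for a, b in permutations(cols, 2):                        # concatenation
        yield f"{a} + ' ' + {b}", lambda r, a=a, b=b: f"{r[a]} {r[b]}"
    first_tgt = target_rows[0][target_col]
    for c in cols:                                            # arithmetic
        x = source_rows[0][c]
        if isinstance(x, (int, float)) and isinstance(first_tgt, (int, float)) and x:
            k = first_tgt / x  # ratio guessed from the first row only
            yield f"{c} * {k:g}", lambda r, c=c, k=k: round(r[c] * k, 6)


def discover(source_rows, target_rows, target_col, threshold=0.75):
    """Return (description, match rate, exception rows) for the best candidate."""
    best = None
    for desc, fn in candidates(source_rows, target_rows, target_col):
        misses = [i for i, (s, t) in enumerate(zip(source_rows, target_rows))
                  if fn(s) != t[target_col]]
        score = 1 - len(misses) / len(source_rows)
        if best is None or score > best[1]:
            best = (desc, score, misses)
    if best is None or best[1] < threshold:
        return None  # no candidate explains enough rows
    return best


for col in ("full_name", "initial", "price_cents"):
    print(col, discover(SOURCE, TARGET, col))
```

On this sample, the concatenation of `first` and `last` explains three of the four `full_name` values, and row 3 ("Edsger W. Dijkstra") is reported as an exception rather than causing the mapping to be rejected.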
Semantic mapping
Semantic mapping is similar to the auto-connect feature of data mappers, except that a metadata registry can be consulted to look up data element synonyms. For example, if the source system lists "FirstName" but the destination lists "PersonGivenName", the mapping will still be made if these data elements are listed as synonyms in the metadata registry. Semantic mapping can only discover exact matches between columns of data; it does not discover any transformation logic or exceptions between columns.
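Semantic mapping can be sketched by resolving every field name to a registered data element before matching. The registry below is a hypothetical dictionary of canonical names and synonyms; a real metadata registry stores far more (definitions, representation terms, ownership). As the text notes, only exact column-to-column matches are found.

```python
# Hypothetical metadata registry: each registered data element (canonical
# name) lists the synonyms under which it appears in different systems.
REGISTRY = {
    "PersonGivenName": {"FirstName", "GivenName", "FName"},
    "PersonFamilyName": {"LastName", "Surname"},
    "CustomerIdentifier": {"CustomerID", "CustNo"},
}


def canonical(name, registry):
    """Return the registered data element for a field name, or the name itself."""
    for element, synonyms in registry.items():
        if name == element or name in synonyms:
            return element
    return name


def semantic_map(source_fields, target_fields, registry):
    """Pair source and target fields that resolve to the same data element."""
    by_element = {canonical(t, registry): t for t in target_fields}
    return {
        s: by_element[canonical(s, registry)]
        for s in source_fields
        if canonical(s, registry) in by_element
    }


result = semantic_map(
    ["FirstName", "Surname", "CustNo", "Email"],
    ["PersonGivenName", "LastName", "CustomerID", "Phone"],
    REGISTRY,
)
print(result)
# {'FirstName': 'PersonGivenName', 'Surname': 'LastName', 'CustNo': 'CustomerID'}
```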
Data lineage tracks the life cycle of each piece of data as it is ingested, processed, and output by an analytics system. This provides visibility into the analytics pipeline and simplifies tracing errors back to their sources. It also enables replaying specific portions or inputs of the data flow for step-wise debugging or for regenerating lost output. Database systems already use such information, called data provenance, to address similar validation and debugging challenges.
[De, Soumyarupa. (2012). Newt : an architecture for lineage based replay and debugging in DISC systems. UC San Diego: b7355202. Retrieved from: https://escholarship.org/uc/item/3170p7zn]
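Record-level lineage can be sketched by having every processing step store, for each output, the IDs of the inputs it came from, so that any output can be traced back to its source records. This is a hypothetical illustration of the idea, not the architecture described in the cited work; the step names and sample values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A value plus the IDs of the records it was derived from."""
    rid: str
    value: object
    parents: list = field(default_factory=list)


def run_step(name, inputs, fn, store):
    """Apply fn to each input record (one-to-one), recording lineage."""
    outputs = []
    for i, rec in enumerate(inputs):
        out = Record(f"{name}:{i}", fn(rec.value), [rec.rid])
        store[out.rid] = out
        outputs.append(out)
    return outputs


def aggregate(name, inputs, fn, store):
    """Combine many input records into one output (many-to-one), recording lineage."""
    out = Record(f"{name}:0", fn([r.value for r in inputs]), [r.rid for r in inputs])
    store[out.rid] = out
    return out


def trace(rid, store):
    """Walk lineage links back to the original source record IDs."""
    rec = store[rid]
    if not rec.parents:
        return [rid]
    sources = []
    for p in rec.parents:
        sources.extend(trace(p, store))
    return sources


store = {}
raw = [Record(f"src:{i}", v) for i, v in enumerate(["12", "7", "x9"])]
for r in raw:
    store[r.rid] = r

cleaned = run_step("strip", raw, lambda v: v.strip("x"), store)
numbers = run_step("parse", cleaned, int, store)
total = aggregate("sum", numbers, sum, store)

print(total.value)                 # 28
print(trace(numbers[2].rid, store))  # ['src:2']
print(trace(total.rid, store))     # ['src:0', 'src:1', 'src:2']
```

If the total looked wrong, `trace` would point directly at the source records to inspect or replay.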
See also
* Data integration
* Data wrangling
* Identity transform
* ISO/IEC 11179 - The ISO/IEC metadata registry standard
* Metadata
* Metadata publishing
* Schema matching
* Semantic heterogeneity
* Semantic mapper
* Semantic translation
* Semantic web
* Semantics
* XSLT - Extensible Stylesheet Language Transformations
References