Data Mesh

Data mesh is a sociotechnical approach to building a decentralized data architecture using a domain-oriented, self-serve design (from a software development perspective). It borrows from Eric Evans’ theory of domain-driven design and from Manuel Pais’ and Matthew Skelton’s theory of team topologies. Data mesh is mainly concerned with the data itself, treating the data lake and the pipelines as secondary concerns. Its main proposition is scaling analytical data through domain-oriented decentralization. With data mesh, responsibility for analytical data shifts from the central data team to the domain teams, supported by a data platform team that provides a domain-agnostic data platform. This reduces data disorder and the number of isolated data silos, because a centralized system ensures that fundamental principles are applied consistently across the nodes of the data mesh and allows data to be shared across different areas.


History

The term "data mesh" was first defined by Zhamak Dehghani in 2019 while she was working as a principal consultant at the technology company Thoughtworks. She provided greater detail on its principles and logical architecture throughout 2020. The approach was predicted to be a “big contender” for companies in 2022. Data meshes have been implemented by companies such as Zalando, Netflix, Intuit, Vistaprint, PayPal and others. In 2022, Dehghani left Thoughtworks to found Nextdata Technologies to focus on decentralized data.


Principles

Data mesh is based on four core principles:
* domain ownership;
* data as a product;
* self-serve data platform;
* federated computational governance.

In addition to these principles, Dehghani writes that the data products created by each domain team should be discoverable, addressable, trustworthy, self-describing in semantics and syntax, interoperable, secure, and governed by global standards and access controls. In other words, the data should be treated as a product that is ready to use and reliable.
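
One way to picture these data product qualities is as metadata that a domain team publishes alongside its data. The following is a minimal sketch in Python, not a standard or an API of any data mesh tool: the `DataProduct` class, its field names, the `mesh://` address scheme, and the mapping from each quality to a field are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor for one domain-owned data product."""
    name: str
    domain: str                                   # owning domain team
    address: str = ""                             # stable location (addressable)
    description: str = ""                         # catalog text (discoverable)
    schema: Dict[str, str] = field(default_factory=dict)    # column -> type (self-describing)
    freshness_slo_hours: Optional[int] = None     # published promise (trustworthy)
    allowed_roles: List[str] = field(default_factory=list)  # access control (secure)


def missing_attributes(product: DataProduct) -> List[str]:
    """Return the qualities for which the descriptor carries no metadata."""
    checks = {
        "discoverable": bool(product.description),
        "addressable": bool(product.address),
        "self-describing": bool(product.schema),
        "trustworthy": product.freshness_slo_hours is not None,
        "secure": bool(product.allowed_roles),
    }
    return [quality for quality, present in checks.items() if not present]


# Example: a product that has not yet published a freshness promise or access roles.
orders = DataProduct(
    name="orders",
    domain="checkout",
    address="mesh://checkout/orders",  # made-up address scheme
    description="One row per confirmed customer order.",
    schema={"order_id": "string", "total": "decimal"},
)
print(missing_attributes(orders))  # ['trustworthy', 'secure']
```

A check like this only verifies that metadata exists, not that the data lives up to it. Under federated computational governance, such a check could be one of the global rules that a platform applies automatically to every domain's products.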


In practice

After its introduction in 2019, multiple companies started to implement a data mesh and share their experiences. Challenges (C) and best practices (BP) for practitioners include:
* C1. Federated data governance: Companies report difficulties adopting a federated governance structure for activities and processes that were previously centrally owned and enforced. This is especially true for security, privacy, and regulatory topics.
* C2. Responsibility shift: In a data mesh, individuals within domains are responsible for data products end to end. This new responsibility can be challenging because it is rarely compensated and usually benefits other domains.
* C3. Comprehension: Research has shown a severe lack of comprehension of the data mesh paradigm among employees of companies implementing a data mesh.
* BP1. Cross-domain unit: Addressing C1, organizations should introduce a cross-domain steering unit responsible for strategic planning, use case prioritization, and the enforcement of specific governance rules, especially concerning security, regulatory, and privacy-related topics. Nevertheless, a cross-domain steering unit can only complement and support the federated governance structure and may become obsolete as the data mesh matures.
* BP2. Track and observe: Addressing C2, organizations should observe and score data product quality, because tracking and ranking key data products can encourage high-quality offerings, motivate domain owners, and support budget negotiations.
* BP3. Conscious adoption: Addressing C3, organizations should thoroughly assess their existing data systems, consider organizational factors, and weigh the potential benefits before implementing a data mesh. When introducing data mesh, they are advised to introduce its terminology carefully and consciously to ensure a clear understanding of the concept.
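
The scoring and ranking described in BP2 could be sketched as follows. This is an illustrative Python example under stated assumptions, not a method taken from the cited practice reports: the `QualityReport` fields, the two quality metrics, the equal weights, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QualityReport:
    """Hypothetical quality measurements for one data product."""
    product: str
    domain: str
    completeness: float  # share of required fields that are filled, 0..1
    freshness: float     # share of updates delivered on time, 0..1
    consumers: int       # number of other domains reading the product


# Assumed weights; a real organization would agree on these centrally.
WEIGHTS = {"completeness": 0.5, "freshness": 0.5}


def quality_score(report: QualityReport) -> float:
    """Weighted average of the quality metrics, rounded for display."""
    score = (WEIGHTS["completeness"] * report.completeness
             + WEIGHTS["freshness"] * report.freshness)
    return round(score, 3)


def rank(reports: List[QualityReport]) -> List[QualityReport]:
    """Order products by score, breaking ties by number of consumers."""
    return sorted(reports,
                  key=lambda r: (quality_score(r), r.consumers),
                  reverse=True)


# Made-up sample data for three products.
reports = [
    QualityReport("orders", "checkout", completeness=0.9, freshness=0.8, consumers=4),
    QualityReport("shipments", "logistics", completeness=1.0, freshness=0.9, consumers=2),
    QualityReport("returns", "service", completeness=0.6, freshness=0.7, consumers=1),
]

for r in rank(reports):
    print(f"{r.product:<10} {r.domain:<10} {quality_score(r):.2f}")
```

Publishing such a ranking gives domain owners a visible measure of their contribution, which is the motivation and budget argument BP2 refers to. The choice of metrics matters more than the arithmetic, since a score that domains can inflate cheaply would undermine its purpose.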


Community

Scott Hirleman started a data mesh community that has over 7,500 members in its Slack channel.


See also

* Data product
* Data management
* Data platform
* Data vault modeling, a method of data modeling that stores data from various operational systems and traces data origin, facilitating auditing, loading speed and resilience
* Data warehouse, a well-established type of database system for organizing data in a thematic way
* ETL and ELT

