Data Mesh
A guide to Data Mesh, the decentralized sociotechnical approach to analytical data architecture that distributes data ownership to domain teams, treating data as a product with federated governance and shared platform infrastructure.
The Centralized Team Bottleneck
As organizations scale their data initiatives, a recurring failure pattern emerges: a centralized data engineering team becomes a bottleneck for all analytical data needs. Every new dataset request, schema change, or pipeline modification must go through the central team’s backlog, so domain teams with deep business context wait weeks for data engineering support. The central team, trying to serve every domain at once, lacks the business context to model the data correctly and ends up maintaining pipelines for source systems it does not understand.
Data Mesh, introduced by Zhamak Dehghani in 2019, proposes a fundamentally different organizational and architectural approach. Rather than centralizing data ownership in a single team, Data Mesh distributes data ownership to the domain teams who understand the data best, empowers them to build and serve their own data products, and provides shared platform infrastructure that makes self-service data product creation practical.
Data Mesh is a sociotechnical approach: it addresses both the organizational structure (who owns what) and the technical architecture (how data products are built and accessed) simultaneously. Technical architecture changes without organizational changes (or vice versa) deliver partial benefits at best.
The Four Principles of Data Mesh
1. Domain Ownership: Data is owned by the business domain that produces and understands it. The sales domain owns sales data; the marketing domain owns marketing attribution data; the product domain owns product usage data. Domain teams are responsible for the quality, freshness, and governance of their data products, with the business context to make informed decisions about data modeling and semantics.
2. Data as a Product: Each domain’s data assets are treated as first-class products with defined interfaces, SLAs, documentation, discoverability, and quality commitments. A data product has an owner who is accountable for its reliability, a versioned schema that consumers can depend on, freshness guarantees that enable downstream planning, and quality metrics that are monitored and reported.
3. Self-Serve Data Platform: A central platform team provides the infrastructure, tooling, and standards that make it practical for domain teams to build, deploy, and operate data products without deep platform engineering expertise. In the Iceberg-based implementation described in this guide, the self-serve platform provides a catalog for data product discovery, an Iceberg lakehouse for shared storage, governed access control through Apache Polaris, observability tooling for monitoring data product health, and query infrastructure (Dremio) for data product consumption.
4. Federated Computational Governance: Governance policies (data quality standards, security requirements, access control rules, retention policies, interoperability standards) are defined centrally by a governance team and enforced programmatically across all data products through the platform’s automated governance capabilities. Domain teams operate autonomously within the governance guardrails; they do not need individual approval for every schema change or pipeline modification.
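To make the second and fourth principles more concrete, here is a minimal Python sketch of a data product descriptor and a set of centrally defined policies evaluated against it. Everything in it is an illustrative assumption rather than part of any platform's API: the DataProduct fields, the policy functions, and the thresholds (a 24-hour freshness SLA, a 730-day retention cap) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes with its tables."""
    name: str
    domain: str
    owner: str                  # accountable person or team
    schema_version: str         # version consumers can depend on
    freshness_sla: timedelta    # maximum acceptable age of the data
    retention_days: int
    last_updated: datetime
    pii_columns: Tuple[str, ...] = ()
    masked_columns: Tuple[str, ...] = ()


# A policy returns a violation message, or None when the product complies.
Policy = Callable[[DataProduct], Optional[str]]


def has_owner(product: DataProduct) -> Optional[str]:
    return None if product.owner.strip() else "no accountable owner"


def fresh_as_of(now: datetime) -> Policy:
    def check(product: DataProduct) -> Optional[str]:
        overdue = (now - product.last_updated) - product.freshness_sla
        if overdue > timedelta(0):
            return f"freshness SLA exceeded by {overdue}"
        return None
    return check


def retention_at_most(max_days: int) -> Policy:
    def check(product: DataProduct) -> Optional[str]:
        if product.retention_days > max_days:
            return f"retention {product.retention_days}d exceeds {max_days}d"
        return None
    return check


def pii_is_masked(product: DataProduct) -> Optional[str]:
    unmasked = sorted(set(product.pii_columns) - set(product.masked_columns))
    return f"unmasked PII columns: {unmasked}" if unmasked else None


def evaluate(product: DataProduct, policies: List[Policy]) -> List[str]:
    """Run every central policy and collect the violations."""
    results = (policy(product) for policy in policies)
    return [message for message in results if message is not None]


# Policies are defined once by the governance team (assumed values).
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
central_policies = [has_owner, fresh_as_of(now), retention_at_most(730), pii_is_masked]

# A domain team's product, described by the domain itself.
orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team",
    schema_version="1.2.0",
    freshness_sla=timedelta(hours=24),
    retention_days=365,
    last_updated=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    pii_columns=("email",),
    masked_columns=(),
)
violations = evaluate(orders, central_policies)
```

The design point is the split of responsibilities: the domain team fills in the descriptor, while the policy list is written once and applied to every product automatically, so no individual approval step is needed for a product that passes.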

Data Mesh in the Iceberg Lakehouse
The Iceberg REST Catalog specification and Apache Polaris provide the technical foundation for a Data Mesh implementation. Each domain operates its own namespace in a shared Polaris catalog. RBAC policies grant each domain team write access to its own namespace and read access to other domains’ data products; where data products live in separate catalogs, catalog federation can make them reachable as well.
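The namespace-per-domain access pattern can be sketched as a small in-memory model. This is not the Polaris API: the Catalog class, the Privilege names, and the "<domain>_engineers" role naming are hypothetical and only illustrate the grant structure (write on your own namespace, read on everyone else's).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Privilege(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class Catalog:
    """Toy stand-in for a shared catalog holding (role, namespace) grants."""
    grants: Dict[Tuple[str, str], Set[Privilege]] = field(default_factory=dict)

    def grant(self, role: str, namespace: str, privilege: Privilege) -> None:
        self.grants.setdefault((role, namespace), set()).add(privilege)

    def is_allowed(self, role: str, namespace: str, privilege: Privilege) -> bool:
        held = self.grants.get((role, namespace), set())
        # In this sketch, write access implies read access.
        if privilege is Privilege.READ and Privilege.WRITE in held:
            return True
        return privilege in held


def onboard_domains(domains: List[str]) -> Catalog:
    """Give each domain role write on its own namespace, read on the others."""
    catalog = Catalog()
    for domain in domains:
        role = f"{domain}_engineers"
        catalog.grant(role, domain, Privilege.WRITE)
        for other in domains:
            if other != domain:
                catalog.grant(role, other, Privilege.READ)
    return catalog


catalog = onboard_domains(["sales", "marketing", "product"])
can_write_own = catalog.is_allowed("sales_engineers", "sales", Privilege.WRITE)
can_read_other = catalog.is_allowed("sales_engineers", "marketing", Privilege.READ)
can_write_other = catalog.is_allowed("sales_engineers", "marketing", Privilege.WRITE)
```

In a real deployment these grants would be expressed through the catalog's own role and privilege model; the sketch only shows that onboarding a new domain can be a mechanical, repeatable step rather than a per-request negotiation.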
Domain teams build Iceberg tables as data products, managed through their own dbt projects or Spark pipelines. The central platform team defines governance rules in Polaris (retention policies, row-level security templates, quality monitoring standards) that apply automatically to all tables across all domain namespaces.
Dremio’s Semantic Layer sits above the Polaris catalog, providing governed consumption interfaces (Virtual Datasets with business semantics, column masking, row-level security) for BI tools and AI agents that consume data products across all domains. The Semantic Layer is the data mesh’s consumption plane: consumers query business-ready data products without knowledge of which domain produced them or how the underlying Iceberg tables are structured.
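As a rough illustration of what a governed consumption interface does, the following Python sketch applies row-level security and column masking to rows before a consumer sees them. It does not use Dremio's API or syntax; in practice such rules would be defined in the semantic layer itself. The Viewer type, the role names (global_analyst, pii_reader), and the masking rule are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Viewer:
    """Hypothetical consumer identity passed to the view."""
    name: str
    roles: FrozenSet[str]
    region: str


def mask_email(value: object) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = str(value).partition("@")
    return f"{local[:1]}***@{domain}"


def customer_view(rows: List[Row], viewer: Viewer) -> List[Row]:
    """Return the rows and columns this viewer is entitled to see."""
    visible_rows: List[Row] = []
    for row in rows:
        # Row-level security: non-global viewers only see their own region.
        if "global_analyst" not in viewer.roles and row["region"] != viewer.region:
            continue
        visible = dict(row)  # copy so the underlying data is never mutated
        # Column masking: only PII readers see the raw email address.
        if "pii_reader" not in viewer.roles:
            visible["email"] = mask_email(row["email"])
        visible_rows.append(visible)
    return visible_rows


customers: List[Row] = [
    {"id": 1, "region": "EMEA", "email": "ana@example.com"},
    {"id": 2, "region": "APAC", "email": "kenji@example.com"},
]
emea_analyst = Viewer(name="emea_bi", roles=frozenset({"analyst"}), region="EMEA")
result = customer_view(customers, emea_analyst)
```

The consumer calls one view and never needs to know which domain owns the underlying table or how it is laid out; the entitlement logic travels with the view rather than with each BI tool or agent.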