Data mesh is a hot topic in the data and analytics community. Introduced in 2020 by Zhamak Dehghani in her paper “Data Mesh Principles and Logical Architecture”, data mesh is a new distributed model for organizing analytics teams to deliver data products and is meant to address the challenges of both centralized and decentralized data. But is it truly the best approach for today’s enterprises?
Organization models for analytics
Over the years, we’ve seen both centralized and decentralized organizational models for delivering analytics to the business. While both models have their advantages, each has some severe drawbacks that make them inadequate for meeting the needs of today’s data-hungry consumers.
1. Centralized model
The data warehouse allows enterprises to store data in a single, curated location so, in theory, everyone can find and query their data with confidence. With central control over the data platform and standards, data can be defined consistently and delivered reliably.
In practice, however, there are a few big problems with this approach. First, the data has to be so carefully curated and loaded that only IT has the required skills to build the data warehouse. This sets up IT to be a bottleneck for integrating new data. Second, since the IT team typically doesn’t understand the business, they struggle to translate business requirements into technical requirements and therefore exacerbate the bottleneck, frustrating their customers. Finally, business users struggle to parse through thousands of data warehouse tables, making the centralized data warehouse appealing to only the most sophisticated users.
2. Decentralized model
Driven by end-user frustration and the explosion in popularity of visualization tools like Tableau, business users have taken matters into their own hands with a decentralized approach. Instead of waiting for IT to deliver data, business users have created their own data extracts, data models and reports. By decentralizing data preparation, business users broke free from IT and avoided the “lost in translation” issue associated with the centralized, IT-led approach.
In practice, however, the decentralized approach introduced major challenges of its own. First, with a lack of control over business definitions, business users created their own versions of reality with every dashboard they authored. As a result, competing business definitions and results destroyed management’s confidence and trust in analytics outputs. Second, the decentralized approach drove a proliferation of competing and often incompatible platforms and tooling, making integrating analytics across business units difficult or impossible.
The data mesh
Data mesh is meant to address the challenges of both models. It accepts that today’s data is distributed and allows all users in an organization to access and analyze business insights from virtually any data source, without intervention from expert data teams. It is based more on people and organization than on technology, which is why it is so compelling. The distributed architecture of a mesh decentralizes data ownership to each business domain. This means every domain has control over the quality, privacy, freshness, accuracy and compliance of data for analytical and operational use cases.
The data mesh approach, however, advocates for a fully decentralized organizational model by abolishing the centralized team altogether. I’d like to suggest an alternative to this approach that introduces a center of excellence to make a decentralized model of data management viable for most enterprises.
Hub-and-spoke model: An alternative to data mesh
It’s clear that neither approach, centralized or decentralized, can deliver agility and consistency at the same time. These goals are in conflict. There is a model, however, that can deliver the best of both worlds if implemented with proper tooling and processes.
The “hub-and-spoke” model is an alternative to the data mesh architecture with some critical differences. Namely, the hub-and-spoke model introduces a central data team, or center of excellence (the “hub”). This team owns the data platform, tooling and process standards whereas the business domain teams (the “spokes”) own the data products for their domains. This approach solves the “anything goes” phenomenon of the decentralized model, while empowering subject matter experts (SMEs), or data stewards, to autonomously create data products that meet their needs.
The critical link: The data model
Supporting a decentralized, hub-and-spoke model for creating data products requires that teams speak a common data language, and it’s not SQL. What’s needed is a logical way of defining data relationships and business logic that’s separate and distinct from the physical representation of the data. A semantic data model is an ideal candidate to serve as the Rosetta Stone for disparate data domain teams because it can be used to create a digital twin of the business by mapping physical data into business-friendly terms. Domain experts can encode their business knowledge into digital form for others to query, connect and enhance.
For this approach to work at scale, it’s critical to implement a common semantic layer platform that supports data model sharing, conformed dimensions, collaboration and ownership. With a semantic layer, the central data team (hub) can define common models and conformed dimensions (e.g., time, product, customer) while the domain experts (spokes) own and define their business process models (e.g., “billing,” “shipping,” “lead gen”). With the ability to share model assets, business users can combine their models with models from other domains to create new mashups for answering deeper questions.
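To make the division of labor concrete, here is a minimal Python sketch of a hub publishing conformed dimensions and spokes publishing domain models against them. It is an illustration only, not the API of any real semantic layer product: the class names (Dimension, DomainModel, SemanticLayer), the join keys, the measures and the owning teams are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """A conformed dimension published by the hub (hypothetical structure)."""
    name: str
    key: str            # join key shared by every model that uses the dimension
    owner: str = "hub"


@dataclass
class DomainModel:
    """A business process model owned by a spoke (domain team)."""
    name: str
    owner: str
    measures: list[str]
    dimensions: list[str]   # names of the conformed dimensions the model uses


class SemanticLayer:
    """Toy registry: the hub publishes dimensions, spokes publish models."""

    def __init__(self) -> None:
        self.dimensions: dict[str, Dimension] = {}
        self.models: dict[str, DomainModel] = {}

    def publish_dimension(self, dim: Dimension) -> None:
        self.dimensions[dim.name] = dim

    def publish_model(self, model: DomainModel) -> None:
        # Reject models that reference dimensions the hub has not published.
        unknown = [d for d in model.dimensions if d not in self.dimensions]
        if unknown:
            raise ValueError(f"{model.name} uses non-conformed dimensions: {unknown}")
        self.models[model.name] = model

    def shared_dimensions(self, left: str, right: str) -> list[str]:
        """Dimensions on which two domain models can be combined."""
        right_dims = set(self.models[right].dimensions)
        return [d for d in self.models[left].dimensions if d in right_dims]


layer = SemanticLayer()

# Hub: conformed dimensions (assumed keys).
for dim in (
    Dimension("time", "date_key"),
    Dimension("product", "product_key"),
    Dimension("customer", "customer_key"),
):
    layer.publish_dimension(dim)

# Spokes: domain-owned business process models (assumed measures and owners).
layer.publish_model(
    DomainModel("billing", "finance", ["invoice_amount"], ["time", "customer", "product"])
)
layer.publish_model(
    DomainModel("shipping", "logistics", ["units_shipped"], ["time", "product"])
)

print(layer.shared_dimensions("billing", "shipping"))  # ['time', 'product']
```

In this sketch, the check in publish_model stands in for the hub's standards: a spoke can define whatever measures it likes, but only against dimensions the hub has published. That constraint is what lets billing and shipping be combined on time and product without either team needing to understand the other's model.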
The hub-and-spoke model succeeds because it plays to the strengths of the centralized and business domain teams: the centralized team owns and operates the technical platform and publishes shared models, while the business teams create domain-specific data products using a consistent set of business definitions and without the need for understanding other domains’ business models.
How to get there
Moving to a hub-and-spoke model for delivering data products doesn’t need to be disruptive. There are two paths to success, depending on your existing model for analytics delivery.
If your current analytics organization is centralized, the central team and business teams should collectively identify key data domains, assign data stewardship and embed an analytics engineer into each. The analytics engineer may come from the central team or the business team. Using a semantic layer platform, the embedded analytics engineer can work inside the business domain team to create data models and data products for that domain. The embedded analytics engineer works with the central data team to set standards for tooling and process while identifying common models.
If your current organization is decentralized, you can create a central data team to establish standards for tooling and process. In addition to managing the semantic layer platform and its shared objects and models, the central data team may manage data pipelines and data platforms shared by the domain teams.
Building for scale
The optimal organizational model for analytics will depend on your organization’s size and maturity. However, it’s never too early to build for scale. No matter how small your organization is, investing in a decentralized, hub-and-spoke model for creating data products will pay dividends now and in the future. When domain experts take on data stewardship and ownership and work with a common set of tools and semantic definitions, your entire organization is empowered to create data products at scale.
David P. Mariani is CTO and cofounder of AtScale, Inc.