As more and more processes move online during the pandemic, businesses are adopting analytics to gain greater insight into their operations. According to a 2021 survey commissioned by Starburst and Red Hat, 53% of companies believe that data access became “more critical” throughout the pandemic. The results agree with findings from ManageEngine, the IT division of Zoho, which found in a 2021 poll that more than 20% of organizations boosted their usage of business analytics compared with the global average.
Thirty-five percent of respondents to the Starburst and Red Hat survey said that they’re looking to analyze real-time business risks, while 36% said that they’re seeking growth and revenue generation through “more intelligent” customer engagements. But underlining the challenges in analytics, more than 37% of respondents said that they weren’t confident in their ability to access “timely, relevant data for decision-making,” whether because of disparate storage sources or problems with developing data pipelines.
Two emerging concepts have been pitched as the answer to hurdles in data analytics and management. One is a “data fabric,” a data integration approach that includes an architecture — and services running on that architecture — to help organizations orchestrate data. The other is a “data mesh,” which aims to mitigate the challenges of data availability by providing a decentralized connectivity layer that allows companies to access data from different sources across locations.
Both data fabrics and data meshes can serve a broad array of business, technical and organizational purposes. For example, they can save data scientists time by automating repetitive data transformation tasks while powering self-service data access tools. Data fabrics and data meshes can also integrate and augment data management software already in use for increased cost-effectiveness.
A combination of technologies including AI and machine learning, data fabric is akin to a weave that stretches to connect sources of data, types and locations with methods for accessing the data. Gartner describes it as analytics over “existing, discoverable and inferenced metadata assets” to support the “design, deployment and utilization” of data across local, edge and data center environments.
Data fabric continuously identifies, connects, cleanses and enriches real-time data from different applications to discover relationships between data points. For example, a data fabric might monitor various data pipelines — the set of actions that ingest raw data from a source and move it to a destination — to suggest better alternatives before automating the most repeatable tasks. A data fabric might also “heal” failed data integration jobs, handle more complicated data management aspects like creating — and profiling — datasets and offer ways to govern and secure data by limiting who can access what data and infrastructure.
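To make the pipeline idea concrete, here is a minimal Python sketch of a job that ingests data, transforms it and moves it to a destination, retrying when a step fails. It is only an illustration of the “healing” behavior described above, not how any particular data fabric product works; the function names (`run_pipeline`, `flaky_extract`) and the sample data are hypothetical.

```python
def run_pipeline(extract, transform, load, max_attempts=3):
    """Run extract -> transform -> load, retrying the whole job on failure.

    Returns the number of attempts that were needed.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            load(transform(extract()))
            return attempt
        except Exception as error:  # a real system would log and classify this
            last_error = error
    raise RuntimeError(f"pipeline failed after {max_attempts} attempts") from last_error


# Hypothetical source that is unavailable on the first call only.
calls = {"count": 0}

def flaky_extract():
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("source unavailable")
    return [" 42 ", "7"]


destination = []
attempts_used = run_pipeline(
    extract=flaky_extract,
    transform=lambda rows: [int(r.strip()) for r in rows],  # clean raw strings
    load=destination.extend,                                # write to the target
)
print(attempts_used, destination)
```

A blanket retry is the simplest possible recovery strategy. Commercial tools presumably go further, for example by rerouting to an alternative source or recommending a different pipeline design.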
To uncover the relationships between data, a data fabric builds a graph that stores interlinked descriptions of data such as objects, events, situations and concepts. Algorithms can use this graph for different business analytics purposes, like making predictions and surfacing previously hard-to-find dataset stores.
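As a rough illustration of how such a graph can surface related datasets, the Python sketch below links datasets to the concepts and events they describe, then walks the links to find neighbors. The class name `MetadataGraph`, the node labels and the two-hop limit are all assumptions made for this example, not part of any vendor’s design.

```python
from collections import defaultdict, deque


class MetadataGraph:
    """Toy undirected graph linking datasets to the things they describe."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, a, b):
        # Store the relationship in both directions.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def related(self, start, max_hops=2):
        """Return nodes reachable from start within max_hops (breadth-first)."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
        seen.discard(start)
        return sorted(seen)


graph = MetadataGraph()
graph.link("dataset:crm_accounts", "concept:customer")
graph.link("dataset:web_orders", "concept:customer")
graph.link("dataset:web_orders", "event:purchase")
graph.link("dataset:store_inventory", "concept:product")

# Both datasets describe customers, so the orders data is two hops away.
print(graph.related("dataset:crm_accounts"))
```

Here the CRM and web order datasets are connected only through the shared “customer” concept, which is the kind of indirect relationship a metadata graph is meant to expose.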
As K2 View, a data fabric solutions vendor, explains: “The data fabric continually provisions … data based on a 360-view of business entities, such as a certain segment of customers, a line of company products or all retail outlets in a specific geography … Using this data, data scientists create and refine machine learning models, while data analysts use business intelligence to analyze trends, segment customers and perform root-cause analysis. The refined machine learning model is deployed into the data fabric, to be executed in real-time for an individual entity (customer, product, location, etc.) — thus ‘operationalizing’ the machine learning algorithm. The data fabric executes the machine learning model on demand, in real time, feeding it the individual entity’s complete and current data. The machine learning output is instantly returned to the requesting application and persisted in the data fabric, as part of the entity, for future analysis.”
Data fabrics often work with a range of data types including technical, business and operational data. In the ideal scenario, they’re also compatible with many different data delivery “styles” like replication, streaming and virtualization. Beyond this, the best data fabric solutions provide robust visualization tools that make their technical infrastructure easy to interpret, enabling companies to monitor storage costs, performance and efficiency — plus security — regardless of where their data and applications live.
In addition to analytics, a data fabric affords a number of advantages to organizations, including minimizing disruptions from switching between cloud vendors and compute resources. Data fabric also allows enterprises, along with the data analysis, sales, marketing, network architecture and security teams working at them, to adapt their infrastructure based on changing technology needs, connecting infrastructure endpoints regardless of the location of data.
In a 2020 report, Forrester found that IBM’s data fabric solution could accelerate data delivery by 60 times while leading to a 459% increase in returns on investment. But data fabric has its downsides — chief among them implementation complexity. For example, data fabrics require exposing and integrating different data and systems, which can often format data differently. This lack of native interoperability can add friction like the need to harmonize and deduplicate data.
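The harmonization and deduplication step can be sketched in a few lines of Python. The example assumes two hypothetical systems, a CRM and a billing tool, that store the same customer under different field names and formatting. The field names and the choice of email as the matching key are assumptions for illustration only.

```python
def harmonize(record):
    """Map a source-specific record onto one shared shape."""
    email = (record.get("email") or record.get("Email_Address") or "").strip().lower()
    name = (record.get("name") or record.get("FullName") or "").strip()
    return {"email": email, "name": name}


def deduplicate(records):
    """Keep the first record seen for each email address."""
    unique = {}
    for record in records:
        unique.setdefault(record["email"], record)
    return list(unique.values())


# Hypothetical rows from two systems that format data differently.
crm_rows = [{"email": "Ana@Example.com ", "name": "Ana Diaz"}]
billing_rows = [
    {"Email_Address": "ana@example.com", "FullName": "Ana Diaz"},
    {"Email_Address": "bo@example.com", "FullName": "Bo Chen"},
]

customers = deduplicate([harmonize(r) for r in crm_rows + billing_rows])
print(customers)
```

Matching on a normalized email is a deliberately simple rule. Real integrations often need fuzzier matching, which is part of the friction described above.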
A data mesh, on the other hand, breaks large enterprise data architectures into subsystems, each managed by a dedicated team. Unlike a data fabric, which relies on metadata to drive recommendations for things like data delivery, data meshes leverage the expertise of subject-matter experts who oversee “domains” within the mesh.
“Domains” are independently deployable clusters of related microservices that communicate with users or other domains through different interfaces. In a microservices architecture, an application is composed of many loosely coupled and independently deployable smaller services.
Domains usually include code, workflows, a team and a technical environment, and teams working within domains treat data as a product. Clean, fresh and complete data is delivered to any data consumer based on permissions and roles, while “data products” are created to be used for a specific analytical and operational purpose.
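One way to picture a data product is as a small object that bundles the data with its owning domain and the roles allowed to read it. The Python sketch below is a simplified illustration under that assumption. The `DataProduct` class, its fields and the role names are hypothetical and do not come from any data mesh specification.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset published by one domain, readable only by permitted roles."""
    name: str
    owner_domain: str
    rows: list
    allowed_roles: set = field(default_factory=set)

    def read(self, role):
        # Deliver data only to consumers whose role has been granted access.
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.name!r}")
        return list(self.rows)  # return a copy so consumers cannot mutate it


orders = DataProduct(
    name="daily_orders",
    owner_domain="sales",
    rows=[{"order_id": 1, "total": 40.0}],
    allowed_roles={"analyst", "finance"},
)

print(orders.read("analyst"))
try:
    orders.read("intern")
except PermissionError as error:
    print(error)
```

Keeping the access rule next to the data reflects the idea that the owning domain team, rather than a central group, decides who may consume its product.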
To add value to a data mesh, engineers must develop a deep understanding of datasets. They become responsible for servicing data consumers and organizing around the domain, which means testing, deploying, monitoring and maintaining it. Beyond this, they must ensure that different domains remain connected by a layer of interoperability and consistent data governance, standards and observability.
On the plus side, data meshes promote decentralization, enabling teams to focus on specific sets of problems. They can also bolster analytics by leading with business context instead of jargony, technical knowledge.
But data meshes have their downsides. For example, domains can unwittingly duplicate data — wasting resources. The distributed structure of data meshes can — if the data mesh isn’t sufficiently infrastructure-agnostic — require more technical experts to scale than centralized approaches. And technical debt can increase as domains create their own data pipelines.
Using data meshes and fabrics
When weighing the pros and cons, it’s important to keep in mind that data mesh and data fabric are concepts — not technologies — and aren’t mutually exclusive. An organization can adopt both a data mesh and data fabric approach across certain, or all, departments as appropriate. To James Serra, previously a big data and data warehousing solution architect at Microsoft, the difference between the two concepts lies in which users are accessing data.
“A data fabric and a data mesh both provide an architecture to access data across multiple technologies and platforms, but a data fabric is technology-centric, while a data mesh focuses on organizational change,” he writes in a blog post (via Datanami). “[A] data mesh is more about people and process than architecture, while a data fabric is an architectural approach that tackles the complexity of data and metadata in a smart way that works well together.”
Eckerson Group analyst David Wells cautions against obsessing over the differences, which he argues are far less important than the components that must be in place to achieve the sought-after business objectives. “They are architectural frameworks, not architectures,” Wells writes in a recent blog post (also via Datanami). “You don’t have architecture until the frameworks are adapted and customized to your needs, your data, your processes and your terminology.”
That’s all to say that data fabrics and data meshes will remain equally relevant for the foreseeable future. While each involves different elements, they work toward the same goal of bringing greater analytics to an organization with a sprawling, and growing, data infrastructure.