How observability designed for data teams can unlock the promise of DataOps

These days, it’s no exaggeration to say that every company is a data company, and those that aren’t need to be. That’s why more organizations are investing in the modern data stack (think: Databricks, Snowflake, Amazon EMR, BigQuery, Dataproc).

However, these new technologies and the increasing business-criticality of their data initiatives introduce significant challenges. Not only must today’s data teams deal with the sheer volume of data being ingested on a daily basis from a wide array of sources, but they must also be able to manage and monitor the tangle of thousands of interconnected and interdependent data applications. 

The biggest challenge comes down to managing the complexity of the intertwined systems that we call the modern data stack. And as anyone who has spent time in the data trenches knows, deciphering data app performance, getting cloud costs under control and mitigating data quality issues are no small tasks.

When something breaks down in these Byzantine data pipelines, without a single source of truth to refer back to, the finger-pointing begins: data scientists blame operations, operations blames engineering, engineering blames developers, and so on in perpetuity.

Is it the code? Insufficient infrastructure resources? A scheduling coordination problem? Without a single source of truth for everyone to rally around, each team works in silos with its own tool, and different tools give different answers. Untangling the wires to get to the heart of the problem takes hours, even days.

Why modern data teams need a modern approach

Data teams today are facing many of the same challenges that software teams once did: fractured teams working in silos, under the gun to deliver more, faster, without enough people, in an increasingly complex environment.

Software teams successfully tackled those obstacles via the discipline of DevOps. A big part of what enables DevOps teams to succeed is the observability provided by the new generation of application performance management (APM). Software teams are able to accurately and efficiently diagnose the root cause of problems, work collaboratively from a single source of truth, and enable developers to address problems early on — before software goes into production — without having to throw issues over the fence to the Ops team. 

So why are data teams struggling when software teams aren’t? They’re using basically the same tools to solve essentially the same problem.

Because, despite the surface similarities, observability for data teams is a completely different animal than observability for software teams.

Cost control is critical

First off, consider that in addition to understanding a data pipeline’s performance and reliability, data teams must also grapple with the question of data quality — how can they be assured that they are feeding their analytics engines with high-quality inputs? And, as more workloads move to an assortment of public clouds, it’s also vital that teams are able to understand their data pipelines through the lens of cost.

Unfortunately, data teams find it difficult to get the information they need. Different teams have different questions they need answered, everybody is myopically focused on solving their own piece of the puzzle with their preferred tool, and different tools yield different answers.

Troubleshooting issues is challenging. The problem could be anywhere along a highly complex and interconnected application/pipeline for any one of a thousand reasons. And, while web app observability tools have their purpose, they were never intended to absorb and correlate the performance details buried within a modern data stack’s components or “untangle the wires” among a data application’s upstream or downstream dependencies. 

Moreover, as more data workloads migrate to the cloud, the cost of running data pipelines can quickly spiral out of control. An organization with 100,000-plus data jobs in the cloud has innumerable decisions to make about where, when, and how to run these jobs. And each decision carries a price tag. 

As organizations cede centralized control over infrastructure, it’s essential for both data engineers and FinOps to understand where the money is going and identify opportunities to reduce/control costs.
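
To make the guardrail idea concrete, here is a minimal, hypothetical sketch in Python of the kind of rollup a FinOps view performs: it aggregates per-team spend from job run records and flags any team that blows past its budget. The record shape, rates, and budgets are illustrative assumptions, not the API of any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class JobRun:
    job_id: str
    team: str
    compute_hours: float
    rate_per_hour: float  # hypothetical instance price for this run

def cost_report(runs: list[JobRun], budgets: dict[str, float]) -> dict[str, float]:
    """Roll up spend per team and flag any team over its budget guardrail."""
    spend: defaultdict[str, float] = defaultdict(float)
    for run in runs:
        spend[run.team] += run.compute_hours * run.rate_per_hour
    for team, total in spend.items():
        budget = budgets.get(team)
        if budget is not None and total > budget:
            print(f"ALERT: {team} spent ${total:,.2f}, over its ${budget:,.2f} budget")
    return dict(spend)

# Example: the data-science team exceeds its guardrail
runs = [
    JobRun("etl-orders", "analytics", compute_hours=120.0, rate_per_hour=2.50),
    JobRun("ml-train", "data-science", compute_hours=800.0, rate_per_hour=3.10),
]
cost_report(runs, {"analytics": 500.0, "data-science": 2000.0})
```

In practice, an observability platform would pull these figures from cloud billing data and cluster telemetry rather than hand-built records, but the guardrail logic is the same.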

A lot of observability is hidden in plain sight

To get fine-grained insight into performance, cost, and data quality, data teams are forced to cobble together information from a variety of tools. And, as organizations scale their data stacks, the vast amount of information (and sources) makes it extraordinarily difficult to see the entirety of the data forest when you’re sitting in the trees. 

Most of the granular details needed are available — unfortunately, they’re often hidden in plain sight. Each tool provides some of the information required, but not all. What’s needed is observability that pulls together all these details and presents them in a context that makes sense and speaks the language of data teams.

Observability that is designed from the ground up specifically for data teams allows them to see how everything fits together holistically. And while there is a slew of cloud-vendor-specific, open-source, and proprietary data observability tools that provide details about one layer or system in isolation, ideally, a full-stack observability solution can stitch it all together into a workload-aware context. Solutions that leverage deep AI are further able to show not just where and why an issue exists but how it affects other data pipelines — and, finally, what to do about it.

Just as DevOps observability provides the foundational underpinnings to improve the speed and reliability of the software development lifecycle, DataOps observability can do the same for the data application/pipeline lifecycle. But — and this is a big but — DataOps observability has to be designed from the ground up to meet the different needs of data teams.

DataOps observability cuts across multiple domains:

  • Data application/pipeline/model observability ensures that data analytics applications/pipelines are running on time, every time, without errors.
  • Operations observability enables data teams to understand how the entire platform is running end to end, offering a unified view of how everything is working together, both horizontally and vertically. 
  • Business observability has two parts: profit and cost. The first is about ROI: it monitors and correlates the performance of data applications with business outcomes. The second is FinOps observability, where organizations use real-time data to govern and control their cloud costs, understand where the money is going, set budget guardrails, and identify opportunities to optimize the environment to reduce costs.
  • Data observability looks at the datasets themselves, running quality checks to ensure correct results. It tracks lineage, usage, and the integrity and quality of data (a minimal sketch of such checks follows this list).
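
As an illustration, here is a minimal Python sketch of the kind of checks a data observability layer automates: volume, completeness, and validity rules over a batch of records. The column names and thresholds are hypothetical assumptions for the example, not part of any specific product.

```python
def run_quality_checks(rows: list[dict]) -> list[str]:
    """Run a few basic checks on a batch of records; return any failures."""
    failures = []
    # Volume: an empty batch usually signals an upstream break
    if len(rows) == 0:
        return ["volume: batch is empty"]
    # Completeness: at most 1% of records may be missing a customer_id
    missing = sum(1 for r in rows if r.get("customer_id") is None)
    if missing / len(rows) > 0.01:
        failures.append(f"completeness: {missing}/{len(rows)} rows missing customer_id")
    # Validity: order totals must be non-negative
    negative = [r for r in rows if r.get("order_total", 0) < 0]
    if negative:
        failures.append(f"validity: {len(negative)} rows with negative order_total")
    return failures

# Example batch with one bad record
batch = [
    {"customer_id": "c1", "order_total": 42.0},
    {"customer_id": None, "order_total": -5.0},
]
for failure in run_quality_checks(batch):
    print("QUALITY CHECK FAILED:", failure)
```

A production system would run such rules continuously against live tables and tie failures back to the upstream pipeline that produced them; the sketch shows only the check logic itself.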

Data teams can’t be singularly focused because problems in the modern data stack are interrelated. Without a unified view of the entire data sphere, the promise of DataOps will go unfulfilled.

Observability for the modern data stack

Extracting, correlating, and analyzing everything at a foundational layer in a data team–centric, workload-aware context delivers five capabilities that are the hallmarks of a mature DataOps observability function:

  • End-to-end visibility correlates telemetry data and metadata from across the full data stack to give a unified, in-depth understanding of the behavior, performance, cost, and health of your data and data workflows. 
  • Situational awareness puts this aggregated information into a meaningful context.
  • Actionable intelligence tells you not just what’s happening but why. Next-gen observability platforms go a step further and provide prescriptive AI-powered recommendations on what to do next.
  • A high degree of automation underpins everything: each capability either happens through, or enables, automation.
  • Proactive governance puts the above into action: the system applies the recommendations automatically — no human intervention is needed.

As more and more innovative technologies make their way into the modern data stack — and ever more workloads migrate to the cloud — it’s increasingly necessary to have a unified DataOps observability platform with the flexibility to comprehend the growing complexity and the intelligence to provide a solution. That’s true DataOps observability.

Chris Santiago is VP of solutions engineering for Unravel.
