Data can be a company’s most valued asset — it can even be more valuable than the company itself. But if the data is inaccurate or constantly delayed because of delivery problems, a business cannot properly utilize it to make well-informed decisions.
Having a solid understanding of a company’s data assets isn’t easy. Environments are changing and becoming increasingly complex. Tracking the origin of a dataset, analyzing its dependencies and keeping documentation up to date are all resource-intensive responsibilities.
This is where data operations (dataops) come in. Dataops — not to be confused with its cousin, devops — began as a series of best practices for data analytics. Over time, it evolved into a fully formed practice all on its own. Here’s its promise: Dataops helps accelerate the data lifecycle, from the development of data-centric applications up to delivering accurate business-critical information to end-users and customers.
Dataops came about because most companies had inefficiencies within their data estate. IT silos weren't communicating effectively, if they communicated at all. Tooling built for one team to use the data for a specific task often left other teams without visibility. Data source integration was haphazard, manual and often problematic. The sad result: The information delivered to end-users fell below expectations in quality and value, or was outright inaccurate.
While dataops offers a solution, those in the C-suite may worry it could be high on promises and low on value. Upsetting processes already in place can seem like a risk. Do the benefits outweigh the inconvenience of defining, implementing and adopting new processes? In my own organizational debates on the topic, I often cite the Rule of Ten: It costs ten times as much to complete a job when the data is flawed as when the data is good. By that argument, dataops is vital and well worth the effort.
You may already use dataops, but not know it
In broad terms, dataops improves communication among data stakeholders and rids companies of their burgeoning data silos. Dataops isn't something new. Many agile companies already practice dataops constructs, even if they don't use the term or aren't aware of it.
Dataops can be transformative, but like any great framework, achieving success requires a few ground rules. Here are the top three real-world must-haves for effective dataops.
1. Commit to observability in the dataops process
Observability is fundamental to the entire dataops process. It gives companies a bird’s-eye view across their continuous integration and continuous delivery (CI/CD) pipelines. Without observability, your company can’t safely automate or employ continuous delivery.
In a mature devops environment, observability systems provide that holistic view, and that view must be accessible across departments and incorporated into those CI/CD workflows. Committing to observability means shifting it left in your data pipeline: monitoring and tuning your systems, and how they communicate, before data enters production. Begin this process when you design your database, and observe your nonproduction systems along with the different consumers of that data. Doing so shows how well apps interact with your data before the database moves into production.
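As a rough illustration of observing a nonproduction system, the sketch below uses Python's built-in sqlite3 module. Its `Connection.set_trace_callback` hook receives each SQL statement SQLite executes. The in-memory database, the `customers` table and the helpers `attach_query_observer` and `run_sample_app` are hypothetical stand-ins for a test database and the application under test.

```python
import sqlite3
from collections import Counter


def attach_query_observer(conn: sqlite3.Connection) -> Counter:
    """Count statements by leading keyword (SELECT, INSERT, ...) as the connection runs them."""
    counts: Counter = Counter()

    def on_statement(sql: str) -> None:
        words = sql.strip().split()
        if words:
            counts[words[0].upper()] += 1

    # SQLite invokes this callback with the text of every statement it executes.
    conn.set_trace_callback(on_statement)
    return counts


def run_sample_app(conn: sqlite3.Connection) -> int:
    """Hypothetical application workload running against a nonproduction database."""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO customers (email) VALUES (?)",
        [("a@example.com",), ("b@example.com",)],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    counts = attach_query_observer(conn)
    run_sample_app(conn)
    # Statement mix the app sent; may also include implicit BEGIN/COMMIT.
    print(dict(counts))
```

The same idea scales up to a real test environment: capture the statement mix an app sends before release, and compare it against what you expect.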
Monitoring tools can help you stay more informed and perform more diagnostics. In turn, your troubleshooting recommendations will improve and help fix errors before they grow into issues. Monitoring gives data pros context. But remember to abide by the “Hippocratic Oath” of Monitoring: First, do no harm.
If your monitoring creates so much overhead that your performance is reduced, you’ve crossed a line. Ensure your overhead is low, especially when adding observability. When data monitoring is viewed as the foundation of observability, data pros can ensure operations proceed as expected.
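One common way to keep monitoring overhead low is sampling. You time only a fraction of calls, then check the measured slowdown against a budget. The sketch below assumes a hypothetical `SampledMonitor` helper and an arbitrary 5% overhead budget. Neither comes from this article or from a specific monitoring product.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class SampledMonitor:
    """Hypothetical helper that times only a random fraction of calls."""

    def __init__(self, sample_rate: float, seed: int = 0) -> None:
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0 and 1")
        self.sample_rate = sample_rate
        self._rng = random.Random(seed)  # seeded so runs are reproducible
        self.latencies: List[float] = []

    def run(self, fn: Callable[[], T]) -> T:
        if self._rng.random() >= self.sample_rate:
            return fn()  # unsampled call: no timing work at all
        start = time.perf_counter()
        result = fn()
        self.latencies.append(time.perf_counter() - start)
        return result


def overhead_ratio(baseline_s: float, monitored_s: float) -> float:
    """Relative slowdown from monitoring; 0.02 means the monitored run took 2% longer."""
    return (monitored_s - baseline_s) / baseline_s


def within_budget(baseline_s: float, monitored_s: float, max_overhead: float = 0.05) -> bool:
    """True if monitoring stays within the (assumed) overhead budget."""
    return overhead_ratio(baseline_s, monitored_s) <= max_overhead


def time_workload(workload: Callable[[], None]) -> float:
    """Wall-clock seconds for one run of a workload."""
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start
```

In practice you would time the same workload twice with `time_workload`: once calling your operation directly and once routing it through `SampledMonitor.run`. Treat a `False` from `within_budget` as the signal that monitoring has crossed the line. The right sample rate and budget depend on your system.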
2. Map your data estate
You must know your schemas and your data. This is fundamental to the dataops process.
First, document your overall data estate to understand changes and their impact. As database schemas change, you need to gauge their effects on applications and other databases. This impact analysis is only possible if you know where your data comes from and where it’s going.
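Once lineage is documented, impact analysis becomes a graph walk. The minimal sketch below assumes lineage is kept as a simple mapping from each asset to the assets that read from it. The asset names in `LINEAGE` are hypothetical. It answers both questions from the paragraph above: where an asset's data goes, and where it comes from.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical table-level lineage: asset -> assets that read from it directly.
LINEAGE: Dict[str, List[str]] = {
    "crm.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["reports.churn_dashboard", "app.billing_service"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["reports.revenue_report"],
}


def downstream_impact(changed: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk returning every asset that consumes `changed`, directly or not."""
    impacted: Set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in lineage.get(node, []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted


def upstream_sources(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Reverse the edges, then reuse the same walk to find where `asset` comes from."""
    reverse: Dict[str, List[str]] = {}
    for source, consumers in lineage.items():
        for consumer in consumers:
            reverse.setdefault(consumer, []).append(source)
    return downstream_impact(asset, reverse)
```

For example, `downstream_impact("crm.customers", LINEAGE)` returns the dimension table, the churn dashboard and the billing service. Those are the three assets to check before changing the customers schema.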
Beyond database schema and code changes, you need a full view of data lineage to control data privacy and compliance. Tag the location and type of data, especially personally identifiable information (PII), and know where all your data lives and everywhere it goes. Where is sensitive information stored? What other apps and reports does that data flow through? Who can access it in each of those systems?
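One way to answer those three questions together is to tag PII where it enters the estate and propagate the tag along column-level lineage. The sketch below does this under loose assumptions. The column names, the `ACCESS` map of systems to reader groups, and the convention that a column's system is the first segment of its name are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical column-level lineage: column -> columns derived directly from it.
COLUMN_LINEAGE: Dict[str, List[str]] = {
    "crm.customers.email": ["warehouse.dim_customer.email"],
    "warehouse.dim_customer.email": ["reports.marketing_list.email"],
    "crm.customers.signup_date": ["warehouse.dim_customer.signup_date"],
}

# Columns tagged as PII where they first enter the data estate.
PII_SOURCES: Set[str] = {"crm.customers.email"}

# Hypothetical access map: system (first name segment) -> groups that can read it.
ACCESS: Dict[str, Set[str]] = {
    "crm": {"crm_admins"},
    "warehouse": {"data_engineers"},
    "reports": {"marketing", "data_engineers"},
}


def propagate_pii(sources: Set[str], lineage: Dict[str, List[str]]) -> Set[str]:
    """Tag every column that is derived, directly or transitively, from a PII source."""
    tagged = set(sources)
    stack = list(sources)
    while stack:
        column = stack.pop()
        for derived in lineage.get(column, []):
            if derived not in tagged:
                tagged.add(derived)
                stack.append(derived)
    return tagged


def pii_exposure(tagged: Set[str], access: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each PII-bearing column to the groups that can read the system holding it."""
    return {column: access.get(column.split(".")[0], set()) for column in sorted(tagged)}
```

Here the email column turns out to flow into a marketing report, which widens the audience for that PII to the `marketing` group. A non-PII column such as `signup_date` is never tagged.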
3. Automate data testing
The widespread adoption of devops has brought about a common culture of unit testing for code and applications. Often overlooked is testing of the data itself: its quality and how it works (or doesn't) with code and applications. Effective data testing requires automation. It also requires constant testing with your newest data. New data isn't tried and true; it's volatile.
To ensure you have the most stable system possible, test using the most volatile data you have. Break things early. Otherwise, you'll push inefficient routines and processes into production, and you'll get a nasty surprise when it comes to costs.
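Data tests can be as plain as rules applied to each freshly landed batch. The minimal sketch below assumes a hypothetical orders feed with `order_id`, `amount` and `currency` fields and an illustrative currency allowlist. In a CI job you might wrap it in a test runner such as pytest and fail the build whenever the returned list is non-empty.

```python
from typing import Any, Dict, List

# Hypothetical rule for a hypothetical "orders" feed.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}


def check_batch(rows: List[Dict[str, Any]]) -> List[str]:
    """Apply simple quality rules to the newest batch and return readable failures."""
    failures: List[str] = []
    seen_ids = set()
    for i, row in enumerate(rows):
        order_id = row.get("order_id")
        if order_id is None:
            failures.append(f"row {i}: order_id is missing")
        elif order_id in seen_ids:
            failures.append(f"row {i}: duplicate order_id {order_id}")
        else:
            seen_ids.add(order_id)

        amount = row.get("amount")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
            failures.append(f"row {i}: amount {amount!r} is not a non-negative number")

        if row.get("currency") not in ALLOWED_CURRENCIES:
            failures.append(f"row {i}: unknown currency {row.get('currency')!r}")
    return failures
```

Running these rules against each new batch, rather than a static fixture, is what surfaces the volatile cases early.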
The product you use to test that data, whether a third-party tool or scripts you write yourself, needs to be solid, and it must be part of your automated test and build process. As the data moves through the CI/CD pipeline, you should perform quality, access and performance tests. In short, you want to understand what you have before you use it.
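To make the three kinds of tests concrete, here is a hedged sketch of a pipeline gate built on Python's sqlite3 module. It assumes a hypothetical `orders` table, a reporting view `orders_report` that should not expose the PII column `customer_email`, and an index that a range query is expected to use. SQLite's `PRAGMA table_info` and `EXPLAIN QUERY PLAN` serve as the access and performance probes. A production database would need its own equivalents.

```python
import sqlite3
from typing import List


def build_fixture(conn: sqlite3.Connection) -> None:
    """Hypothetical schema plus a small batch of the newest data for the test run."""
    conn.executescript(
        """
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_email TEXT,
            amount REAL NOT NULL
        );
        CREATE INDEX idx_orders_amount ON orders(amount);
        CREATE VIEW orders_report AS SELECT order_id, amount FROM orders;
        """
    )
    conn.executemany(
        "INSERT INTO orders (customer_email, amount) VALUES (?, ?)",
        [("a@example.com", 12.5), ("b@example.com", 40.0)],
    )


def run_gate(conn: sqlite3.Connection) -> List[str]:
    """Run quality, access and performance checks; return a list of failures."""
    failures: List[str] = []

    # Quality: amounts must not be negative.
    bad = conn.execute("SELECT COUNT(*) FROM orders WHERE amount < 0").fetchone()[0]
    if bad:
        failures.append(f"quality: {bad} row(s) with a negative amount")

    # Access: the reporting view must not expose the PII column.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(orders_report)")}
    if "customer_email" in columns:
        failures.append("access: orders_report exposes customer_email")

    # Performance: the range query should SEARCH via an index, not SCAN everything.
    plan_rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT order_id FROM orders WHERE amount > ?", (20,)
    ).fetchall()
    plan = " ".join(str(row[-1]) for row in plan_rows)
    if "SEARCH" not in plan.upper():
        failures.append(f"performance: expected an index search, got {plan!r}")

    return failures


def main() -> int:
    """Exit code for a CI step: 0 when every check passes, 1 otherwise."""
    conn = sqlite3.connect(":memory:")
    build_fixture(conn)
    problems = run_gate(conn)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A CI job could run this with `raise SystemExit(main())`, so that any failing check stops the data from moving further down the pipeline.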
Dataops is vital to becoming a data business. It’s the ground floor of data transformation. These three must-haves will allow you to know what you already have and what you need to reach the next level.
Douglas McDowell is the general manager of database at SolarWinds.