Data quality, a subset of data intelligence, is a concern for many enterprise executives, with 82% citing data quality as a barrier for their businesses. With many data quality solutions on the market, each taking a different approach, how do you choose?
Alation CEO and cofounder Satyen Sangani said the company's newly announced Alation Open Data Quality Initiative (ODQI) for the modern data stack is designed to give customers freedom of choice and flexibility when selecting the data quality and data observability vendors that best fit the needs of their modern, data-driven organizations.
Alation’s Open Data Quality Framework (ODQF) opens up Alation Data Catalog to any data quality vendor in the data management ecosystem and modern data stack. Initially, data quality and data observability providers such as Acceldata, Anomalo, Bigeye, Experian, FirstEigen, Lightup and Soda have joined, as well as industry partners including Capgemini and Fivetran.
Some of those were already Alation partners, while others are new, drawn to the idea of having a standard to coalesce around. The company hopes the ODQF will become the de facto standard.
From data catalogs to data intelligence
Sangani, who has a background in economics and stints in financial analytics and product management at Oracle, cofounded Alation in 2012. However, the company stayed in stealth until 2015, working with a handful of customers to define what the product would be, what the company was really out to achieve, and for whom.
Sangani’s experience informed Alation’s approach, too. He said that selling large-scale packages to big companies to help them analyze their data resulted in the companies not really understanding the data themselves:
“Two years, hundreds of millions of dollars would be spent … and often a lot of that time was spent locating which systems have the right data, how the data was used, what the data meant,” Sangani said. “Often there were multiple copies of the data and conflicting records. And the people who understand the systems and the data models were often outside of the company.”
The realization was that data modeling, schemas and the like present more of a knowledge management problem than a technical one. Sangani believes it also incorporates aspects of human psychology, as well as a didactic aspect: enabling and teaching people to use quantitative reasoning and thinking.
Over time, Alation’s trajectory has been associated with a number of terms and categories. The most prominent among them included metadata management, data governance and data cataloging. However, today Sangani says these three are all coming together in a broader market space: what was originally identified by IDC as data intelligence.
For a couple of years after its launch in 2015, the company worked to create the data catalog category, which was new to many, according to Sangani. Then other players from metadata management and data governance also started converging on building data catalogs.
In parallel, the period from 2012 to today also saw developments on the technology side, such as the democratization of big data via the Hadoop ecosystem, as well as the enactment of regulations such as HIPAA and GDPR. All of these fed the need to create inventories focused on facilitating data use by people, which Alation sees as a competitive differentiator.
Alation as a platform for data quality
For Alation, the data catalog is the platform for the broader data intelligence category. Sangani says data intelligence has many components: master data management, privacy data management, reference data management, data transformation, data quality, data observability and more. Alation’s strategy isn’t to “own one box of every single one of these things,” as Sangani put it.
“The real problem in this space isn’t whether or not you have the capability to tag data. The biggest problem is engagement and adoption. Most people don’t use data properly. Most people don’t have an understanding of what data exists. Most people don’t engage with the data. Most of the data is under-documented,” Sangani said.
“The idea of the data catalog is really all about engaging people into the data sets. But if that’s our strategy, to focus on engagement and adoption, that means that there are some things that strategically we’re not doing,” he said. “What we’re not doing is building a data quality solution. What we’re not doing is building a data observability solution or a master data management solution.”
Alation considered expanding its offering into the data quality market but decided against it: the market is fast-moving and densely populated, and the approaches solutions take vary greatly. Sangani said that Alation doesn't have a massive competitive differentiation there outside the information in its data catalog, but added that sharing that information can turn Alation into a platform for data quality, which is what the Open Data Quality Initiative aims to achieve.
However, whether standards live or die is really driven by customer adoption, Sangani said. This initiative is a follow-up to Alation’s Open Connector framework, which allows third parties to build connectors for metadata for any data system.
Plumbing as the foundation for value-add applications
Sangani said that Alation will continue building open integrations and frameworks over time, because in the world of data management there needs to be a consistent way to share metadata. In a way, Sangani added, what Alation has been building up to now is plumbing, and the ODQF is an example of more plumbing.
However, while plumbing is essential, the company has already started moving up the stack to offer value-add features: for example, using natural language processing (NLP) to perform named entity recognition for recommendations, or letting people write plain English sentences that are converted into SQL for interactive interrogation of queryable datasets.
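Text-to-SQL features of the kind Sangani describes are typically backed by trained language models over catalog metadata. Purely as an illustration of the interface, and not of Alation's actual implementation, a minimal pattern-based sketch (table, column and question patterns here are hypothetical) might look like:

```python
import re


def naive_nl_to_sql(question: str) -> str:
    """Toy translator: maps one narrow English question pattern to SQL.

    Real text-to-SQL systems use trained language models plus schema
    metadata; this regex sketch only illustrates the general idea.
    """
    # Expected shape: "how many <things> in <table> where <column> is <value>"
    m = re.match(
        r"how many \w+ in (\w+) where (\w+) is (\w+)",
        question.strip().lower(),
    )
    if not m:
        raise ValueError(f"unsupported question: {question!r}")
    table, column, value = m.groups()
    return f"SELECT COUNT(*) FROM {table} WHERE {column} = '{value}'"


print(naive_nl_to_sql("How many orders in sales where region is emea"))
# SELECT COUNT(*) FROM sales WHERE region = 'emea'
```

The value of a catalog in this setting is that it supplies the table and column names, and their meanings, that a real translator would need to resolve ambiguous English into the right query.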
Sangani referred to technologies such as knowledge graphs, AI and machine learning as ingredients for building a more intelligent data intelligence layer.
“I’m probably more excited about what we’ll be able to do in the next five years than what we’ve done in the past five, because all of it lays the foundation for some really cool applications that we’ll start seeing in the near term,” he said.