Today, DataStax announced that it is acquiring privately held AI vendor Kaskada, which develops a feature engineering platform that can help organizations use data for AI applications.
Of course, effective machine learning (ML) and artificial intelligence (AI) must begin with good data, typically stored in a database for querying. Event streaming data sources are another foundation of effective ML and AI, enabling real-time data to stream from any number of different locations.
Database and real-time streaming vendor DataStax has been building out its data platform since 2010, and is a leading contributor to the open-source Apache Cassandra database. In 2021, DataStax acquired Apache Pulsar vendor Kesque and launched a streaming data service. Demand for both database and event streaming has helped DataStax grow, with the company announcing a $115 million round of funding in June 2022.
The next phase of the company’s growth will be fueled, in part, by the growing demand for AI and ML, powered by a real-time data platform.
“Machine learning is transformative to businesses, and it has to be something that you leverage daily in your business processes and in your applications,” Chet Kapoor, CEO of DataStax, told VentureBeat. “We think that we can make it possible for all types of customers to overlay AI pipelines to make it part of their business apps and business processes.”
AI is about more than just unstructured data
A good deal of the hype around modern AI is related to use cases that involve unstructured data. However, while it’s true that generative AI tools for text and images tend to work with unstructured data, that’s not the case for all AI workloads.
Ed Anuff, chief product officer at DataStax, explained to VentureBeat that package delivery, logistics, ride sharing, video streaming and other use cases rely on structured data and AI to work effectively. In those areas, organizations are tracking event-based data as interactions occur, or as locations change, all in a tabular, structured data format.
“The reality is that the majority of applications that we interact with where ML is actually being used to make our interactions more productive, on a daily basis, are the structured data use cases,” Anuff said.
Structured data is what the Apache Cassandra database works with. Companies such as Uber and Netflix use Cassandra to help power operations. Taking structured data that’s already stored in Cassandra and using it to train AI models is where the process of feature engineering comes in.
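To make the idea of feature engineering concrete, here is a minimal Python sketch of turning tabular event rows into model-ready features. It is an illustration only, not Kaskada's or DataStax's method: the `TripEvent` record, the `build_features` function and the one-hour window are all hypothetical, and a plain Python list stands in for rows that would normally be read from a database such as Cassandra.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable


@dataclass(frozen=True)
class TripEvent:
    """Hypothetical structured event row, e.g. one completed ride."""
    rider_id: str
    occurred_at: datetime
    fare: float


def build_features(
    events: Iterable[TripEvent],
    as_of: datetime,
    window: timedelta,
) -> Dict[str, Dict[str, float]]:
    """Aggregate each rider's events that fall in (as_of - window, as_of].

    Only events at or before `as_of` are used, so a feature never sees
    data from the future relative to the moment it describes.
    """
    window_start = as_of - window
    totals: Dict[str, tuple] = {}
    for event in events:
        if window_start < event.occurred_at <= as_of:
            count, fare_sum = totals.get(event.rider_id, (0, 0.0))
            totals[event.rider_id] = (count + 1, fare_sum + event.fare)

    # Turn the running totals into named feature values per rider.
    return {
        rider_id: {"trip_count": float(count), "avg_fare": fare_sum / count}
        for rider_id, (count, fare_sum) in totals.items()
    }
```

Calling `build_features(events, as_of, timedelta(hours=1))` returns one small dictionary of features per rider, which is the kind of input a model would be trained or scored on.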
What Kaskada brings to DataStax and the Apache Cassandra database
Kaskada has developed feature engineering technology that DataStax expects will be an ideal fit with its real-time data platform.
Anuff said that Kaskada has built a concise description language that enables a data engineer to simply describe what is needed from a dataset in order to feed an AI model. He added that the Kaskada technology is able to operate at the high throughput that’s necessary for real-time applications.
DataStax’s aim is to fit into an ML workflow, providing the data foundation and feature engineering that can be used to power inference engines for AI. Anuff emphasized that the flow of data is bi-directional, such that predictions and outcomes from AI inference can then be loaded back into Cassandra, where the result can be served to application users.
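The bi-directional flow Anuff describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DataStax's implementation: two in-memory dictionaries stand in for database tables, and `toy_model` is a hypothetical rule used in place of a trained model.

```python
from typing import Callable, Dict

FeatureRow = Dict[str, float]


def toy_model(features: FeatureRow) -> float:
    """Hypothetical stand-in for a trained model: flags frequent riders."""
    return 1.0 if features["trip_count"] >= 2 else 0.0


def score_and_write_back(
    feature_table: Dict[str, FeatureRow],
    prediction_table: Dict[str, float],
    model: Callable[[FeatureRow], float],
) -> Dict[str, float]:
    """Read features, run inference, and store each prediction by key.

    The dictionaries are assumptions standing in for database tables;
    an application would later read `prediction_table` to serve users.
    """
    for key, features in feature_table.items():
        prediction_table[key] = model(features)
    return prediction_table
```

The point of the sketch is the round trip: features flow out of the data store to the model, and predictions flow back into the same store, where the application can look them up by key.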
For Kapoor, the overall goal is to enable a real-time data stack that allows organizations to use operational data to help improve business outcomes.
“Our customers have a disproportionately high amount of real-time data and we are giving them an opportunity to leverage it so that they can create excellent experiences for their customers,” Kapoor said.