Streaming graph analytics: ThatDot’s open-source framework Quine is gaining interest


What do you get when you combine two of the most up-and-coming paradigms in data processing, streaming and graphs? Potentially a game-changer. At least, that’s what is being hinted at by the likes of DARPA and now CrowdStrike’s Falcon Fund, which are betting on ThatDot and its open-source framework Quine.

The CrowdStrike Falcon Fund is an investment vehicle managed by CrowdStrike, in partnership with Accel, that makes cross-stage private investments within cybersecurity and adjacent markets.

DARPA is also known to have an interest in cybersecurity, which, according to ThatDot, is what motivated the agency to fund the development of the framework that ThatDot recently released as an open-source project.

While many solutions exist on the market for both streaming data processing and graph analytics, often working in tandem, ThatDot cofounder and CEO Ryan Wright claims that Quine’s technology is unique, enabling it to scale orders of magnitude beyond the capabilities of other systems.

Wright discussed with VentureBeat the key premises behind Quine and ThatDot, as well as the practical aspects of using Quine and the next steps in its evolution.

Graph analytics and stream processing

“Graph Relates Everything” is how Gartner framed the reasoning behind including graphs in its top 10 data and analytics technology trends for 2021. Streaming is on the rise as well: according to MarketsandMarkets, the streaming analytics market is projected to grow from $15.4 billion in 2021 to $50.1 billion in 2026, a compound annual growth rate (CAGR) of 26.5% over the forecast period.

Still, Wright said that processing the massive volumes of data flowing through the enterprise doesn’t fit well into either paradigm on its own. Quine is designed to combine event streaming and graph data technologies: it connects to existing data streams and builds the data into a stateful graph.

“It’s like a graph database, but it’s really meant for stream-processing applications. Graph databases have been known to be among the slowest in the data storage world. New technology means that Quine can enter this space with capabilities that had previously been impossible,” Wright said.
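To make “build data into a stateful graph” concrete, here is a minimal Python sketch of the idea, assuming a hypothetical event schema with `src`, `dst`, `type` and `ts` fields. `StreamingGraph` and `Node` are illustrative names, not Quine’s API; the point is only that each arriving event updates persistent node state instead of being processed and forgotten.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """A graph node that accumulates state as events mention it."""
    node_id: str
    properties: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)  # outgoing (edge_type, other_node_id) pairs


class StreamingGraph:
    """Toy stateful graph built incrementally from an event stream (hypothetical)."""

    def __init__(self):
        self.nodes: Dict[str, Node] = {}

    def node(self, node_id: str) -> Node:
        # Create a node lazily the first time an event refers to it.
        if node_id not in self.nodes:
            self.nodes[node_id] = Node(node_id)
        return self.nodes[node_id]

    def ingest(self, event: dict) -> None:
        # Hypothetical event schema: {"src", "dst", "type", "ts"}.
        src, dst = self.node(event["src"]), self.node(event["dst"])
        src.edges.add((event["type"], dst.node_id))
        # Remember the most recent time each endpoint was seen.
        for n in (src, dst):
            n.properties["last_seen"] = max(n.properties.get("last_seen", 0), event["ts"])


# Two events arriving on a stream become three connected nodes.
events = [
    {"src": "host-1", "dst": "proc-42", "type": "SPAWNED", "ts": 100},
    {"src": "proc-42", "dst": "10.0.0.5", "type": "CONNECTED_TO", "ts": 105},
]
g = StreamingGraph()
for e in events:
    g.ingest(e)
```

A real system would also shard nodes across machines and persist them; this sketch keeps everything in one in-memory dictionary for clarity.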

Illustration of the tool’s process provided by ThatDot

According to Wright, where previous graph technologies could handle perhaps a couple of thousand events per second in an event stream processing system, ThatDot customers have used Quine to process over a million events per second.

Quine’s statefulness also makes it suitable for some critical, hard-to-solve challenges. Wright said this is why cybersecurity is a prime application domain for Quine, and why the project received DARPA funding.

“The goal was to create new techniques and technologies for detecting advanced persistent threats. The challenge with advanced persistent threats [is that] a sophisticated attacker gets into an enterprise environment and stays there quietly. What’s hard about that [is that there is] a huge volume of data all the time.

“We’ve got tools that can process data, but to find the attacker, you have to take new data that just arrived, about what the attacker is doing right now, and you have to combine it with data that might be weeks or months old. The needle in the haystack has to be joined in real time with the incoming needle in the event streaming haystack that just arrived,” Wright said.
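The “needle joined with a needle” problem Wright describes can be sketched in a few lines of Python. The rule below (flag an outbound connection from a host that, at any point in the past, wrote a file with a known-suspicious hash) is purely hypothetical and is not Quine’s detection logic; it only shows why the system must keep long-lived state so that an event arriving now can be joined with evidence that may be weeks old.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

WEEK = 7 * 24 * 3600  # seconds


@dataclass
class HostState:
    # file hash -> timestamp it was first written on this host
    dropped_files: Dict[str, int] = field(default_factory=dict)


class LongDwellDetector:
    """Joins each new event with state that may be weeks old (hypothetical rule)."""

    def __init__(self, suspicious_hashes):
        self.suspicious = set(suspicious_hashes)
        self.hosts: Dict[str, HostState] = {}

    def on_event(self, event: dict) -> Optional[dict]:
        state = self.hosts.setdefault(event["host"], HostState())
        if event["type"] == "file_write":
            # Old evidence is kept, not discarded after a short window.
            state.dropped_files.setdefault(event["hash"], event["ts"])
        elif event["type"] == "outbound_conn":
            # The join: a connection happening now vs. files written long ago.
            for file_hash, first_ts in state.dropped_files.items():
                if file_hash in self.suspicious:
                    return {"host": event["host"], "hash": file_hash,
                            "dwell_seconds": event["ts"] - first_ts}
        return None


detector = LongDwellDetector({"abc123"})
detector.on_event({"host": "h1", "type": "file_write", "hash": "abc123", "ts": 0})
alert = detector.on_event({"host": "h1", "type": "outbound_conn", "ts": 3 * WEEK})
quiet = detector.on_event({"host": "h2", "type": "outbound_conn", "ts": 3 * WEEK})
```

A windowed stream processor that only remembers, say, the last hour would miss this alert, because the file write happened three weeks before the connection.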

Although no benchmarks or client names have been shared at this point, the metrics Wright cites are impressive, and the investors’ vote of confidence is real. Prior to the CrowdStrike investment and others, ThatDot raised $2 million in seed funding. The company is not disclosing the amount of the CrowdStrike investment and plans to raise a Series A later in 2022.

In addition to cybersecurity, other use cases for Quine include blockchain analysis, monitoring and analysis of CDNs and of MLOps at scale with Kubernetes, as well as applications at both traditional financial institutions and fintech companies. So what is the innovation that enables Quine to outperform existing systems and unlock these use cases?

Quine under the hood

ThatDot’s whitepaper identifies three design choices that define Quine: a graph-structured data model, an asynchronous actor-based graph computational model and standing queries, Quine’s solution to the challenges time presents in distributed systems. As the graph data model is well understood and also shared with many other solutions, let’s examine the actor model and standing queries.

Computation in Quine is built on the Actor Model, using Akka. In this model, first described by Carl Hewitt in 1973, an actor is a lightweight process that encapsulates state, handles one message at a time and communicates with the outside world only through message passing. An actor receives messages in its mailbox and performs the corresponding small-scale computation for each one.
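The actor idea can be illustrated without Akka. The following is a minimal Python `asyncio` analogue, not Akka’s API: `CounterActor`, `tell` and `ask` are illustrative names. The actor owns private state, pulls messages from a mailbox queue and handles them strictly one at a time, so its state never needs a lock.

```python
import asyncio


class CounterActor:
    """Minimal actor: private state, a mailbox, one message handled at a time."""

    def __init__(self):
        self._count = 0                  # state is only touched inside _run
        self._mailbox = asyncio.Queue()
        self._task = None

    def start(self):
        self._task = asyncio.create_task(self._run())

    def tell(self, kind):
        # Fire-and-forget message; no reply expected.
        self._mailbox.put_nowait((kind, None))

    async def ask(self, kind):
        # Request/response layered on message passing via a reply future.
        reply = asyncio.get_running_loop().create_future()
        self._mailbox.put_nowait((kind, reply))
        return await reply

    async def stop(self):
        self.tell("stop")
        await self._task

    async def _run(self):
        while True:
            kind, reply = await self._mailbox.get()   # messages processed in FIFO order
            if kind == "increment":
                self._count += 1
            if reply is not None:
                reply.set_result(self._count)         # every ask gets the current count
            if kind == "stop":
                break


async def main():
    actor = CounterActor()
    actor.start()
    for _ in range(3):
        actor.tell("increment")
    count = await actor.ask("get")  # queued behind the three increments
    await actor.stop()
    return count


result = asyncio.run(main())
```

Because the `get` request sits in the same FIFO mailbox as the increments, it observes all three of them. In an actor-based graph like the one described here, the analogous unit would be a graph node, with messages flowing between nodes instead of between a caller and a single counter.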

Standing queries are the central innovation at the heart of Quine. A standing query is formulated once and then lives inside the graph, as Wright explained: “You drop it in and it automatically propagates through the graph. It means that answers come back to you. You don’t have to go ask over and over and over again: Do you have my answer now? Do you have my answer now?”
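The contrast between polling and a standing query can be sketched in Python. This is a single-process toy, not how Quine propagates queries across node actors, and the pattern (“a node with a `CONNECTED_TO` edge to a blocklisted node”) is hypothetical. The query is registered once; afterward, each update re-checks only the nodes it could affect, and matches are pushed to a callback. For simplicity the sketch assumes predicates look only at a node and its direct outgoing neighbors, and it never retracts a match.

```python
from typing import Callable, Dict, List, Set, Tuple


class Graph:
    """Toy graph: standing queries are re-checked only where an update lands."""

    def __init__(self):
        self.props: Dict[str, dict] = {}
        self.out_edges: Dict[str, Set[Tuple[str, str]]] = {}
        self.in_nodes: Dict[str, Set[str]] = {}
        self.queries: List[Tuple[Callable, Callable]] = []
        self.reported: Set[Tuple[int, str]] = set()

    def _ensure(self, node_id):
        self.props.setdefault(node_id, {})
        self.out_edges.setdefault(node_id, set())
        self.in_nodes.setdefault(node_id, set())

    def register(self, predicate, callback):
        """Drop the query in once; matches are pushed to callback from then on."""
        self.queries.append((predicate, callback))
        for node_id in list(self.props):       # catch up on existing data once
            self._check(len(self.queries) - 1, node_id)

    def set_prop(self, node_id, key, value):
        self._ensure(node_id)
        self.props[node_id][key] = value
        # A property change can affect this node and nodes pointing at it.
        for n in {node_id} | self.in_nodes[node_id]:
            self._recheck(n)

    def add_edge(self, src, label, dst):
        self._ensure(src)
        self._ensure(dst)
        self.out_edges[src].add((label, dst))
        self.in_nodes[dst].add(src)
        self._recheck(src)

    def _recheck(self, node_id):
        for idx in range(len(self.queries)):
            self._check(idx, node_id)

    def _check(self, idx, node_id):
        predicate, callback = self.queries[idx]
        if (idx, node_id) not in self.reported and predicate(self, node_id):
            self.reported.add((idx, node_id))
            callback(node_id)


def talks_to_blocklisted(g: Graph, node_id: str) -> bool:
    # Hypothetical pattern: node has a CONNECTED_TO edge to a blocklisted node.
    return any(label == "CONNECTED_TO" and g.props[dst].get("blocklisted")
               for label, dst in g.out_edges[node_id])


matches = []
g = Graph()
g.register(talks_to_blocklisted, matches.append)
g.add_edge("proc-7", "CONNECTED_TO", "203.0.113.9")
before = list(matches)                       # no match yet
g.set_prop("203.0.113.9", "blocklisted", True)
```

Nobody asks “do you have my answer now?”: the match for `proc-7` is delivered at the moment the blocklist flag arrives, even though the connection was recorded earlier.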

As Wright put it, Quine is fully asynchronous and distributed, and it runs in a graph-structured fashion that matches its graph-structured data model. Akka and the actor model are not the average developer’s cup of tea, but users don’t need to know them to use the system: queries and data ingestion patterns can be expressed in Cypher, one of the most widely used graph query languages.

The Quine community also shares so-called recipes, i.e., packaged configurations of data streaming in, building a graph, monitoring that graph and data streaming out. An example could be ingesting server logs, building a graph out of them, monitoring activity and displaying results in a dashboard. According to Wright, there is a growing repository of recipes that make using Quine effortless.
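The recipe pattern (stream in, build a graph, monitor it, stream out) can be illustrated with the server-log example above. Actual Quine recipes are configuration files; the `Recipe` class, its field names and the log format below are hypothetical stand-ins chosen only to show how the four stages plug together.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

# Assumed log format: "<client_ip> <method> <path> <status>"
LOG_LINE = re.compile(r"(?P<ip>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})")


@dataclass
class Recipe:
    """Hypothetical recipe: ingest -> build graph -> monitor -> output."""
    ingest: Callable[[], Iterable[str]]
    build: Callable[[Dict[str, dict], str], Optional[str]]  # returns the node it touched
    monitor: Callable[[dict], bool]                           # check on the touched node
    output: Callable[[str], None]

    def run(self) -> Dict[str, dict]:
        graph: Dict[str, dict] = {}
        alerted = set()
        for record in self.ingest():
            node_id = self.build(graph, record)
            if node_id and node_id not in alerted and self.monitor(graph[node_id]):
                alerted.add(node_id)
                self.output(node_id)
        return graph


def build_from_log(graph, line):
    m = LOG_LINE.match(line)
    if m is None:
        return None                      # skip malformed lines
    client = graph.setdefault(m["ip"], {"requested": set(), "not_found": 0})
    client["requested"].add(m["path"])   # edge: client -> requested path
    if m["status"] == "404":
        client["not_found"] += 1
    return m["ip"]


logs = [
    "198.51.100.7 GET /admin 404",
    "198.51.100.7 GET /wp-login.php 404",
    "192.0.2.1 GET /index.html 200",
    "198.51.100.7 GET /.env 404",
]
dashboard = []                           # stand-in for a real dashboard sink
recipe = Recipe(
    ingest=lambda: iter(logs),
    build=build_from_log,
    monitor=lambda node: node["not_found"] >= 3,
    output=dashboard.append,
)
graph = recipe.run()
```

Packaging the four stages as one unit is what makes a recipe shareable: swapping the log source or the monitoring rule changes one field, not the whole pipeline.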

To combine incoming data with historical data in real time, Quine needs an underlying storage layer. It can be used with several options, ranging from RocksDB for local storage to Apache Cassandra and Amazon S3.
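Supporting several storage backends usually means coding against a small interface. The Python sketch below assumes a hypothetical `PersistenceBackend` protocol with an event-sourced design: each node appends its changes to a per-node journal and rebuilds its state by replaying that journal. This is one common way to persist graph state, not a description of Quine’s actual persistence API; a RocksDB, Cassandra or S3 adapter would implement the same two methods.

```python
import json
from typing import Dict, List, Protocol


class PersistenceBackend(Protocol):
    """Hypothetical storage interface; durable adapters would implement it."""
    def append(self, node_id: str, event: bytes) -> None: ...
    def read_journal(self, node_id: str) -> List[bytes]: ...


class InMemoryBackend:
    """Stand-in backend for tests; a real deployment would use durable storage."""

    def __init__(self):
        self._journals: Dict[str, List[bytes]] = {}

    def append(self, node_id: str, event: bytes) -> None:
        self._journals.setdefault(node_id, []).append(event)

    def read_journal(self, node_id: str) -> List[bytes]:
        return list(self._journals.get(node_id, []))


class PersistentNode:
    """Node whose state is rebuilt by replaying its journal from the backend."""

    def __init__(self, node_id: str, backend: PersistenceBackend):
        self.node_id = node_id
        self.backend = backend
        self.properties: dict = {}
        for raw in backend.read_journal(node_id):   # wake up: replay history
            self._apply(json.loads(raw))

    def set_property(self, key, value):
        change = {"op": "set", "key": key, "value": value}
        self.backend.append(self.node_id, json.dumps(change).encode())  # persist first
        self._apply(change)

    def _apply(self, change):
        if change["op"] == "set":
            self.properties[change["key"]] = change["value"]


backend = InMemoryBackend()
n = PersistentNode("host-1", backend)
n.set_property("os", "linux")
n.set_property("first_seen", 1650000000)
del n                                   # node evicted from memory
restored = PersistentNode("host-1", backend)
```

Because the node can be dropped from memory and restored on demand, only the currently active part of the graph needs to stay in RAM while older history remains available for joins.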

Although there is no fully managed version of Quine at this time, ThatDot offers an enterprise version. It focuses on resilient clustering and on scaling the system to arbitrarily large data volumes, so that it can handle millions of events per second or more, as Wright noted.

The focus for ThatDot in the immediate future is on serving Quine’s open-source community. As Wright shared, Quine is seeing great adoption and lots of exciting use cases coming out of that community. ThatDot aims to create more educational resources and promote developer advocacy. The Portland, Oregon-based company doubled its headcount in 2021 and is aggressively hiring, with plans to double its nationwide workforce again by the end of 2022.

As for the roadmap, Wright positioned Quine as “a platform for the next generation of AI that is just emerging and starting to leave the research labs: the Graph AI generation.” Wright pointed to new techniques such as graph recommender systems, graph neural networks and graph anomaly detection, and invited enterprise users with applications for this emerging generation of technologies to try Quine.

Originally appeared on: TheSpuzz