Starburst, the commercial entity behind the open source Presto-based SQL query engine Trino, has announced a new fully managed, cross-cloud analytics product that allows companies to query data hosted on the infrastructure of any of the “big three” cloud providers (AWS, Microsoft Azure, and Google Cloud) without moving the data from its original location.
While many of the big cloud data analytics vendors support the burgeoning multicloud movement by making their products available on each platform, making data stored across multiple environments easy to access remains a problem. Companies still have to find a way to “pool” data from these different silos, typically by moving it into a single cloud or data warehouse. That is not only time-consuming but can also incur so-called “egress” fees for transferring data out of a cloud. This is the problem Starburst is now addressing, by extending its fully managed software-as-a-service (SaaS) product to let customers analyze data across the major clouds with a single SQL query.
From Presto to Trino
Starburst has followed a rather circuitous route to where it is today. The company’s foundations can be traced back to 2012, when a group of Facebook engineers developed a distributed SQL query engine called Presto to help the social network’s in-house data scientists and data analysts run faster queries on huge data sets. Facebook open-sourced Presto the following year, but after a long-running disagreement with the powers-that-be at Facebook, the Presto creators eventually left the company and launched a fork called PrestoSQL, which was rebranded as Trino last December.
As with many similar open source projects, Trino now has a commercial counterpart, Starburst, whose founders include the original Presto creators along with other early Presto adopters. Initially, Starburst was offered in a single “enterprise” flavor that could be self-managed and hosted on-premises or on any public cloud. Earlier this year, Starburst launched a fully managed SaaS offering called Starburst Galaxy, which includes an integrated SQL editor for querying data out of the box, along with connectors for integrating with data sources.
Starburst Galaxy was originally available only on AWS, but to support Starburst’s push into cross-cloud analytics, the company is now extending support to Microsoft Azure and Google Cloud Platform (GCP). Starburst had previously introduced a cross-cloud analytics product called Stargate for its self-managed offering. Now it is bringing the same functionality to its fully managed service, where Starburst handles all the infrastructure and the customer doesn’t have to worry about what’s going on under the hood.
“This allows us to extend cross-cloud analytics capabilities to anyone and any department without the help of central IT,” Starburst cofounder Matt Fuller told VentureBeat. “This allows domain experts to take ownership of the data they know best and deliver it as a product to the rest of the organization.”
So what is the big brouhaha over multicloud anyway? Isn’t it easier for companies to pick a public cloud and stick with it? In some cases, that might well be true, but companies will often pursue a multicloud approach for any number of reasons.
Some clouds are better at certain things than others, in which case it might make sense to use GCP for one workload and AWS for another. Cost and compliance considerations might also push a company toward a multicloud or hybrid-cloud approach, mixing on-premises infrastructure with one or more public clouds. And sometimes companies end up in a multicloud world by happenstance, either by acquiring companies that use different clouds or because different internal departments select the cloud that best suits their needs.
Cross-cloud analytics goes some way toward helping these companies work around the data silos that these scenarios create.
“By having data in these different clouds, it creates a further extension of the data silo problem where data not only exists in different data sources, but it is also now in very different locations,” Fuller said. “That is why cross-cloud analytics is needed — otherwise, data has to be moved to a single cloud. Much like the previous solution to the problem of attempting to move all data into a single data warehouse.”
It’s also worth noting that even in situations where a company does use a single cloud provider, the company may have to store data in different cloud “regions” to satisfy local data residency requirements. In such cases, using alternative analytics solutions that involve transferring data between systems or locations isn’t an option — which is where Starburst’s latest solution could really shine.
“Cross cloud analytics allow for processing to be pushed to the region where the data resides and only have aggregated insights leave,” Fuller explained. “If restricted data must leave, it can be masked to adhere to the requirements.”
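To make the pattern Fuller describes more concrete, here is a minimal Python sketch of the general idea: each region computes partial aggregates next to its own data, and only those small aggregates travel to a coordinator that merges them. A masking step handles restricted values that must leave a region. This is not Starburst’s implementation. The `Order` record, the region datasets, the function names, and the masking rule are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type; customer_email stands in for restricted data.
    customer_email: str
    country: str
    amount: float


def local_aggregate(rows):
    """Runs where the data resides: sum and count of amount per country.

    Only this small dict of aggregates leaves the region, never the raw rows.
    """
    partial = defaultdict(lambda: [0.0, 0])
    for row in rows:
        partial[row.country][0] += row.amount
        partial[row.country][1] += 1
    return {country: (total, count) for country, (total, count) in partial.items()}


def merge(partials):
    """Runs at the coordinator: combine per-region partial aggregates."""
    combined = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for country, (total, count) in partial.items():
            combined[country][0] += total
            combined[country][1] += count
    return {country: (total, count) for country, (total, count) in combined.items()}


def mask_email(email):
    """Hypothetical masking rule for restricted values that must leave a region."""
    local_part, _, domain = email.partition("@")
    return local_part[:1] + "***@" + domain


# Hypothetical data held in two separate cloud regions.
eu_rows = [
    Order("anna@example.com", "DE", 10.0),
    Order("ben@example.com", "DE", 30.0),
    Order("carl@example.com", "FR", 5.0),
]
us_rows = [
    Order("dana@example.com", "US", 20.0),
    Order("ed@example.com", "DE", 40.0),
]

# Each region aggregates locally; only the aggregates are merged centrally.
result = merge([local_aggregate(eu_rows), local_aggregate(us_rows)])
for country, (total, count) in sorted(result.items()):
    print(country, total, count, total / count)

# If a restricted value must leave its region, send a masked version instead.
print(mask_email("anna@example.com"))
```

The sketch ships sums and counts rather than per-region averages on purpose. Averages from different regions cannot be combined correctly without knowing how many rows each one represents, whereas sums and counts merge exactly, and the final average can be computed once at the coordinator.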