Bluesky helps curb machine learning costs with cost governance algorithms

Query optimization isn’t necessarily new. Cost governance in the cloud to identify and control expenses for queries isn’t new, either. What is new, however, is Bluesky, a cloud-based workload optimization vendor, focused on Snowflake, that launched earlier this month to help organizations achieve these objectives.

One of the critical elements in the company’s approach is “the algorithms that we created ourselves, based on each of our past 15 years’ experience tuning workloads at Google, Uber, and so on,” said Mingsheng Hong, Bluesky CEO.

Hong is the former head of engineering for Google’s machine learning runtime capabilities, a role in which he worked extensively with TensorFlow. Bluesky was cofounded by Hong and CTO Zheng Shao, a former distinguished engineer at Uber, where he specialized in big data architecture and cost reduction.

The algorithms Hong referenced analyze queries at scale, predominantly in cloud settings, and determine how to optimize their workloads, thereby decreasing their costs. “Individual queries rarely have business value,” Hong observed. “It’s a combination of them that together achieve certain business goals, like transforming data and providing business insights.”   


What’s particularly interesting is that Bluesky combines statistical and symbolic artificial intelligence (AI) approaches for this task, a concrete illustration of how their fusion may influence AI’s future in the enterprise.

Cost governance of machine learning queries

Bluesky reinforces cost governance in several ways by reducing the time and resources spent querying popular cloud sources. The solution can curb query redundancy via incremental materialization, a useful function for queries that recur at set intervals, such as hourly, daily or weekly.

According to Hong, when analyzing monthly revenue figures, for example, this capability enables systems to “materialize the prior computation and only compute the incremental part,” or the delta since the last computation. When applied at scale, this feature can conserve a considerable amount of fiscal and IT resources.
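The idea can be sketched in a few lines of Python. This is an illustration of incremental materialization in general, not Bluesky's implementation: the `MonthlyRevenue` class, its `refresh` method and the `(order_date, amount)` row shape are all hypothetical, and the sketch assumes each refresh runs only after a day's orders are complete, so no rows arrive late for a date that was already folded in.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MonthlyRevenue:
    """Hypothetical materialized result: monthly totals plus a high-water mark."""
    totals: dict = field(default_factory=dict)  # "YYYY-MM" -> revenue so far
    last_seen: date = date.min                  # newest order date already folded in

    def refresh(self, orders):
        """Fold in only rows newer than last_seen; return how many were new."""
        # In a warehouse this filter would be pushed into the query itself,
        # so older rows would never be read at all.
        delta = [(d, amount) for d, amount in orders if d > self.last_seen]
        for d, amount in delta:
            key = f"{d.year:04d}-{d.month:02d}"
            self.totals[key] = self.totals.get(key, 0.0) + amount
        if delta:
            self.last_seen = max(d for d, _ in delta)
        return len(delta)


orders = [(date(2022, 8, 30), 100.0), (date(2022, 8, 31), 50.0)]
view = MonthlyRevenue()
view.refresh(orders)                      # first run folds in both rows
orders.append((date(2022, 9, 1), 25.0))
view.refresh(orders)                      # second run folds in only the new row
```

The stored totals play the role of the prior computation, and each refresh touches only the delta since the last one.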

Tuning recommendations

Bluesky provides detailed visibility into query patterns and their resource consumption. The solution offers an ongoing list of the most expensive query patterns, as well as other techniques to “show people how much they’re spending,” Hong said. “We break it down to individual users, teams, projects, call centers and so on, so everybody knows how much everybody else is spending.”
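A breakdown of this kind might look like the following sketch. The log format (dictionaries with `user`, `team`, `sql` and `cost` keys), the `query_pattern` normalizer and the `spend_by` helper are assumptions made for illustration; the article does not describe how Bluesky derives patterns or stores costs.

```python
import re
from collections import defaultdict


def query_pattern(sql):
    """Collapse literals so queries that differ only in constants share a pattern."""
    sql = re.sub(r"'[^']*'", "?", sql)           # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)   # numeric literals
    return " ".join(sql.lower().split())         # normalize case and whitespace


def spend_by(query_log, dimension):
    """Total cost per user, team, or query pattern, most expensive first."""
    totals = defaultdict(float)
    for entry in query_log:
        if dimension == "pattern":
            key = query_pattern(entry["sql"])
        else:
            key = entry[dimension]
        totals[key] += entry["cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


log = [
    {"user": "ana", "team": "growth", "sql": "SELECT * FROM orders WHERE id = 42", "cost": 4.0},
    {"user": "raj", "team": "growth", "sql": "SELECT * FROM orders WHERE id = 7", "cost": 2.5},
    {"user": "lee", "team": "finance", "sql": "SELECT SUM(amount) FROM orders", "cost": 1.0},
]
by_team = spend_by(log, "team")
by_pattern = spend_by(log, "pattern")
```

Grouping by pattern rather than by raw query text is what lets two lookups with different IDs count as one recurring expense.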

Bluesky incorporates algorithms that involve statistical and non-statistical AI approaches for profile-driven query cost attribution. Query profiles are based on how much time, CPU and memory specific queries require. The algorithms use this information to reduce the resources queries consume, via tuning recommendations for modifying the query code, data layout and more. “Optimization is not just the compute,” Hong noted. “Also, we organize the storage: the table indices, how you lay out the tables, and then there are warehouse settings and system settings that we tweak.”

Rules and supervised machine learning 

Significantly, the algorithms providing such recommendations and analyzing the factors Hong mentioned involve rules-based approaches and machine learning. As such, they combine AI’s classic knowledge-representation foundation with its statistical one. There are abundant use cases of such a tandem (termed neuro-symbolic AI) for natural language technologies. Gartner has referred to the inclusion of both of these forms of AI as part of a broader composite AI movement. According to Hong, rules are a natural fit for query optimization.

“This is like query optimization starting with rules and you enrich them with the cost model,” he reflected. “There are cases where trying to run a filter is always a good idea. So that’s a good rule. To eliminate a full table scan, that’s always good. That’s a rule.”

Supervised learning is added when implementing rules based on cost conditions or the cost model. For instance, eliminating queries with a poor ROI is a useful rule. Supervised learning techniques can ascertain which queries fit this classification by scrutinizing the past week’s worth of queries, for example, before eliminating them via rules. “If a query is failing more than 98% of the time over the last seven days, you can put such a query pattern into a penalty box,” Hong remarked.
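The rule half of Hong's example can be sketched as below. This covers only the threshold rule, not any learned model, and the `penalty_box` function, its parameters and the `(pattern, finished_at, succeeded)` record shape are hypothetical; only the 98% threshold and the seven-day window come from the quote.

```python
from datetime import datetime, timedelta


def penalty_box(runs, now, window=timedelta(days=7), threshold=0.98):
    """Return query patterns whose failure rate within the window exceeds the threshold.

    runs: iterable of (pattern, finished_at, succeeded) tuples (assumed shape).
    """
    counts = {}  # pattern -> (total runs, failed runs) inside the window
    for pattern, finished_at, succeeded in runs:
        if now - window <= finished_at <= now:
            total, failed = counts.get(pattern, (0, 0))
            counts[pattern] = (total + 1, failed + (0 if succeeded else 1))
    return {
        pattern
        for pattern, (total, failed) in counts.items()
        if failed / total > threshold
    }


now = datetime(2022, 9, 15)
yesterday = now - timedelta(days=1)
runs = [("nightly_export", yesterday, False)] * 99 + [("nightly_export", yesterday, True)]
flagged = penalty_box(runs, now)  # 99 of 100 runs failed, above the 98% threshold
```

A pattern that fails exactly 98% of the time is not flagged, since the quote says "more than 98%".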

Curbing costs

The need to lower enterprise costs, particularly in multicloud and hybrid cloud settings, is likely to grow over the coming years. Cost governance and workload optimization methods that optimize queries help organizations understand where costs are increasing and how to reduce them. Relying on automation that uses both statistical and non-statistical AI to identify these areas, while offering suggestions for rectifying these issues, may be a harbinger of where enterprise AI is going.

Originally appeared on: TheSpuzz