Exclusive: Microsoft and Google join forces on OneTable, an open-source solution for data lake challenges

In a new open-source partnership announced today, Microsoft is joining Google and Onehouse in supporting the OneTable project, which could reshape the cloud data lake landscape for years to come.

Over the last several years, organizations have had to decide which data lake table format to use. It’s a decision that could lead to vendor lock-in and compatibility challenges for data analytics and AI workloads. Among the primary data lake table formats are the Apache Iceberg and Apache Hudi technologies, as well as the Databricks-led Delta Lake.

The OneTable project, which was started by Onehouse, is an attempt to create a new layer that sits on top of the data lake table formats and enables omni-directional conversions and access across Iceberg, Hudi and Delta Lake.

Onehouse first announced OneTable in February, alongside a $25 million funding raise. Now the effort is being significantly expanded as an open-source project that has the support of Microsoft and Google, with other vendors, including Amazon, in discussions about future participation.

“Throughout this year, we’ve been working with our customers as well as with Google and Microsoft and a bunch of different folks to broaden the idea and bring more form and shape to it,” Onehouse founder and CEO Vinoth Chandar told VentureBeat. “I think we are now at this point where we are ready to open source OneTable as our contribution to the community and make sure there’s a place for cross-format interoperability backed by some of the key influencers adopting these [data lake table] formats.”

Microsoft ignites data fabric and embraces OneTable

Microsoft has its own data lake approach, called Fabric, that supports the Delta Lake table format and is a key part of Microsoft’s drive to create a single, open framework for its customers (see today’s other announcements about this). Joining the effort to support OneTable is all about helping to enable openness.

“We want a pathway where people can buy into our ecosystem without feeling blocked,” Raghu Ramakrishnan, CTO for data at Microsoft, told VentureBeat.

Ramakrishnan noted that there is diversity across the data lake landscape today. Databricks’ Delta Lake has a growing base of users; Iceberg is supported by multiple vendors, including Snowflake and Cloudera; and Hudi has its fair share of users and supporters too, including retail giant Walmart. Being able to use and query data across data lake table formats is a critical capability.

“Not having this [OneTable] be proprietary is going to be super helpful to our customers and frankly, to us,” Ramakrishnan said. “Ultimately, my real hope here is that together, we can create an ecosystem where customers can go to whatever is the best solution without being shackled by the underlying data.”

Google sees OneTable as a data lake ‘Babelfish’

Google has been developing its own data lake platform technology with BigLake tables among other efforts. Supporting OneTable as an open source effort is seen by Google as being key to enabling the goal of having an open data architecture.

“We built BigLake, because we really see the benefits of open data architecture,” Gerrit Kazmaier, VP of data and analytics at Google Cloud, told VentureBeat.

Kazmaier noted that, to date, organizations have faced a real challenge in having to make tough choices about which table format to use. Depending on the technology, an organization could be locked into a way of managing, accessing and governing data that could have long-term consequences.

“There are free and open formats like Iceberg, but then there may be other workloads running that depend on a different format that is not your chosen primary file format,” he said. “That’s where OneTable helps, it’s kind of like a Babelfish.”

A Babelfish is a fictional creature from the science fiction classic The Hitchhiker’s Guide to the Galaxy that enables people to automatically translate and understand different languages. Kazmaier said that OneTable will not replace the different data lake table formats, but it will relieve organizations of the burden of having to choose a format they might get locked into.

The ability to enable interoperability across formats is critical for Google as it expands the availability of its BigQuery Omni data analytics technology. Kazmaier said that Omni basically extends BigQuery to AWS and Microsoft Azure, and that it’s a service that has been growing rapidly. As organizations look to do data processing and analytics across clouds, they can encounter different formats, and a frequently asked question is how the data landscape can be interconnected and how potential fragmentation can be stopped.

“OneTable we think is a great approach to that and it is really aligned with our principle of openness,” Kazmaier said.

Originally appeared on: TheSpuzz
