To further strengthen our commitment to providing industry-leading coverage of data technology, VentureBeat is excited to welcome Andrew Brust and Tony Baer as regular contributors. Watch for their articles in the Data Pipeline.
This is the second of a two-part series. Read part 1 dissecting how Databricks and Snowflake are approaching head-to-head competition.
As we noted yesterday, June was quite a month by post-lockdown standards, as back to back, MongoDB, Snowflake and Databricks each held their annual events in rapid succession. Historically, each of these vendors might have crossed paths in the same enterprises, but typically with different constituencies. So, they didn’t directly compete against each other.
Recent declines in financial markets notwithstanding, each of these companies is considered among the hottest growth players on the cloud data platform side, with valuations (private or market) ranging into the tens of billions of dollars. While Databricks is still private, MongoDB and Snowflake have their IPOs well behind them.
They are each positioning themselves as default destination platforms for the enterprise. Databricks and Snowflake at this point are on each other’s competitive radars and yesterday, we gave our take on the chess game that they are playing. In this installment, we look at what each player must do to appeal to the broader enterprise. While there are differences in target markets, especially with MongoDB, there is a common thread for all three: to grow further, they are going to have to spread beyond their comfort zones.
So, what are those comfort zones? Databricks and Snowflake come from different parts of the analytics world, while MongoDB has focused on operational use cases. Historically, each appealed to a different audience: Databricks to data engineers and data scientists, Snowflake to business and data analysts, and MongoDB to app developers.
But recent moves from all three providers are starting to breach those silos. Let’s start with deployment. Of the three, MongoDB is the only one with on-premises presence (the other two are cloud pure plays), but barely five years into its Atlas cloud database service, the company’s revenues are now mostly cloud-driven. While MongoDB will likely never be a cloud pure play, the cloud is distinctly driving its future.
Next is operations. With Snowflake adding a lightweight transaction processing engine and MongoDB making early moves to start addressing analytics beyond visualization, we were prompted to ask a few weeks back whether they are on a collision course. Our take? In the short term, they are still in separate universes, but in the long run, never say never.
As for analytics, as we noted yesterday, Databricks and Snowflake are more vocal about expanding into each other’s turf.
Nonetheless, while MongoDB remains the most vocal about sticking to its knitting as an operational database, beneath the surface it’s making the first moves to come to terms with the relational database folks and dip its toes into analytics.
The starting points
Let’s look at the messages coming out of each of the summits last month. MongoDB’s was about doubling down on developers. In CTO Mark Porter’s keynote, he spoke of the mounting volume of new applications that would be coming forth over the next few years and, with it, the need for expedient approaches enabling developers to overcome the hurdles to getting apps into production. At Snowflake, it was all about reinforcing the “data cloud” as a destination by expanding its reach, both into transaction processing and machine learning. And for Databricks, it was all about benchmarks, governance and lineage capabilities showing that the data lakehouse is ready for prime time and capitalizing on its open-source strategy.
The starting point for each player places its ambitions in perspective. MongoDB’s official mission is enabling businesses to operate as “software companies.” That reflects the fact that MongoDB’s constituency has traditionally been software developers, and that those developers must be productive if their organizations are to operate at software-company velocity. A recurring message of that strategy is that traditional databases have proven to be hurdles, owing to the rigid nature of relational schema and the inability to scale them out.
For Snowflake, it is about targeting business and data analysts who rely on data warehouses with a cloud-native reinvention tackling the barriers of ease of use, scaling and data sharing.
And for Databricks, it is about harnessing the breadth and scale of the data lake with a soup-to-nuts development and execution environment powered by Apache Spark, Photon and Delta Lake.
The next steps
This is where getting outside the comfort zone becomes critical. Let’s examine each provider individually.
For MongoDB, it’s not just about app developers, but also the database folks, as we outlined in our piece last month. For MongoDB to become the default operational data platform for new applications, it must go beyond being a developer company to also becoming a data company.
MongoDB has made some early moves in this direction, such as upping its security game and writing a bona fide SQL query engine. The company needs to make deeper cultural shifts, such as retiring messaging that paints SQL and traditional database practices as obsolete. MongoDB responds that relational database developers should also pivot, or at least accept that the document model doesn’t mean walking away from the skillsets and disciplines they’ve developed. The MongoDB platform does support schema validation. But schema tends to be variable in most MongoDB implementations, so we would like to see more focused efforts to develop data lineage capabilities that could track schema evolution.
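To make the schema-validation point concrete: MongoDB lets teams attach a JSON Schema-style validator to a collection so the server rejects documents that don’t conform. The sketch below is a simplified, client-side illustration in plain Python of the kind of rule such a validator enforces; the collection name, field names and the `validate` helper are hypothetical examples, not MongoDB’s API (in a real deployment the rule would be attached server-side, e.g. via `db.create_collection("users", validator={"$jsonSchema": USER_SCHEMA})`).

```python
# Simplified illustration of the kind of rule a MongoDB $jsonSchema
# validator enforces. Field names and the helper below are hypothetical;
# the real check runs server-side when a document is inserted or updated.

USER_SCHEMA = {
    "bsonType": "object",
    "required": ["email", "signup_year"],
    "properties": {
        "email": {"bsonType": "string"},
        "signup_year": {"bsonType": "int"},
    },
}

# Map a few BSON type names to Python types for this sketch.
_BSON_TYPES = {"string": str, "int": int, "object": dict}

def validate(doc, schema=USER_SCHEMA):
    """Return True if doc has the required fields with the declared types."""
    for field in schema.get("required", []):
        if field not in doc:
            return False
    for field, rule in schema.get("properties", {}).items():
        if field in doc and not isinstance(doc[field], _BSON_TYPES[rule["bsonType"]]):
            return False
    return True

print(validate({"email": "a@example.com", "signup_year": 2022}))  # True
print(validate({"email": "a@example.com"}))  # False: missing required field
```

The catch, as noted above, is that validation is optional and schema varies across documents in practice, which is why lineage tooling that tracks how schema evolves would be a welcome complement.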
Either way, our message to MongoDB remains: Don’t alienate a key constituency (SQL database developers) that you will need to extend your enterprise footprint. We would like to see more positive outreach in the future.
For Snowflake, it’s convincing data scientists that Snowpark can be an effective execution environment for their models. The company has a new partnership with Anaconda, which curates Python libraries, to optimize them for execution in Snowpark. But doubters remain; H2O.ai, for instance, contends that it is more efficient to bite the bullet and run machine learning models in its own clusters, which can multithread processes, and then feed the results back to Snowflake.
Since introducing Snowpark a couple of years ago, Snowflake has improved its ability to optimally scale resources for user-defined functions (UDFs) written in languages such as Java or Python.
Of course, the recent announcement of Unistore places operational analytics within Snowflake’s sights. However, we don’t view this as a vast land grab for a new constituency: the company is not going after the SQL Servers, Oracles or MongoDBs of the world.
For Databricks, it’s about making the data lakehouse friendlier to business and data analysts. These folks work with data modeling and BI tools, not notebooks; there needs to be another entry path providing a view that makes Delta Lake look more like a data warehouse.
And business analysts expect consistent performance for both interactive queries and batch reporting. The TPC-DS benchmarks are designed around analytics/decision support workloads, but as with EPA gas mileage ratings, your results will vary. Significantly, the next stage for Photon is reducing latencies under more typical query conditions, along with broadening support of table and file formats beyond Delta Lake and Parquet, respectively.
Bringing it all together for a game-winning strategy
The common thread is that, coming from different starting points, each provider must connect with new constituencies. The key won’t be technology alone, but culture and the structuring of the core business. Each must recruit go-to-market, field and support teams who can speak to those different constituencies. Debates over purity must go out the door.
Can MongoDB talk to relational database people as well as developers? Will Snowflake talk the language of data scientists, and can Databricks cultivate the BI crowd? These are not talking points that you’ll see in a press release.