Relational databases and SQL were invented in the 1970s, yet they still dominate the data world today. Why? Relational calculus, data consistency, and logical data representation are all strengths a relational database advocate might cite. But the success of relational databases can be boiled down to two practical considerations: momentum and the power of the SQL query language.
So-called “NoSQL” technology seems to run counter to those strengths. But in reality, NoSQL is building momentum of its own, and providing the familiarity and power of SQL is how it’s being done.
The power of SQL
Let’s review the power of SQL by supposing that it doesn’t exist: there is no declarative language for working with data. Instead, we have to work imperatively. Instead of specifying what data we want, we have to specify how to get it.
With this strategy, each step of a database query requires verbose instructions: matching, grouping, projecting, and sorting. Some steps are processed by the client, others by the server. Compare that to a declarative SQL query, where how to match, how to project, how to sort, and where the processing happens are all left to the database. What we’re left with is a language that is easier to read and write and that gets us the data we want. And it’s a standard language, so someone working with data can pick it up and use it with any other relational database. It’s no wonder relational and SQL dominate.
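To make the contrast concrete, here is a minimal sketch using Python’s built-in sqlite3 module and a hypothetical orders table. It fetches the same result twice: once imperatively, with the matching, projecting, and sorting done step by step in client code, and once declaratively, with a single SQL statement that leaves all of that to the database.

```python
import sqlite3

# In-memory database with a small hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "ada", 50.0), (2, "bob", 120.0), (3, "ada", 75.0)])

# Imperative: pull every row, then match, project, and sort in client code.
rows = conn.execute("SELECT id, customer, total FROM orders").fetchall()
matched = [r for r in rows if r[1] == "ada"]          # matching
projected = [(r[0], r[2]) for r in matched]           # projecting
imperative = sorted(projected, key=lambda r: -r[1])   # sorting

# Declarative: state *what* we want; the database decides *how*.
declarative = conn.execute(
    "SELECT id, total FROM orders WHERE customer = 'ada' ORDER BY total DESC"
).fetchall()

assert imperative == declarative  # same result, far less client-side work
print(declarative)  # [(3, 75.0), (1, 50.0)]
```

The declarative version is not only shorter; it also hands the database the freedom to use indexes or other optimizations the client code can’t.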
The limits of relational
So, why does NoSQL exist? Gartner found that the nonrelational DBMS market was the fastest-growing segment in 2020, expanding by 34.5% (more than double the growth of relational). Relational databases were not designed to deal with the scale of the internet. Want a relational server to handle more work? You need to scale it vertically, which simply means getting a bigger, faster server.
What happens when that becomes impossible or wildly expensive? If you’re Amazon or Google, you have to go outside of the relational model. You have to horizontally scale, which means you have to join multiple servers together over a network. That introduces a whole new world of challenges to solve. Amazon and Google had the resources to tackle those problems, do the research, and release the technical papers, leading to a whole new generation of open-source databases and database-focused vendors, in a movement dubbed “NoSQL.”
Should I use NoSQL or not?
As NoSQL took off, so did microservices (a distributed approach for horizontal scaling of applications). Each microservice could use its own database, and in many cases, this meant that a full system could be using a patchwork of multiple databases.
Sounds like a good approach, but there are challenges. Each microservice has its own domain of data, which is good, encapsulated design. But now the data is spread out, not only among different databases but across different technologies. In this new landscape, not only does your team need to buy, license, maintain, upgrade, patch (log4j, anyone?), and learn different database technologies, it also has to build, maintain, patch (log4j again?), and learn the data pipelines and integrations between those technologies. This is known as “database sprawl.”
Solutions: Single model, cloud, and multimodel
Three approaches can help reduce database sprawl:
- Standardize on a single database
- Lock into a cloud provider
- Use a multimodel approach
Standardize on a single database
This approach means dictating to your organization: “Use this one database for everything.” The momentum of the relational database makes it a popular choice: it may not be the best fit for search or caching or graph, but “no one ever got fired for buying IBM,” as the saying used to go.
Pros: Huge talent pool, can usually “make it work” with enough time or money
Cons: Expensive, less agile
For organizations working in a standardized domain that doesn’t change often and doesn’t need to handle large scale, this costly approach is one to consider.
Lock into a cloud provider
Popular cloud providers (Azure, AWS, GCP) have gathered open-source databases, APIs, and their own proprietary database technologies “as a service.” They can offer a wide range of databases to pair with microservices, and because they control the cloud, they can offer the integrations, patching, and maintenance between all of them. It’s still database sprawl, but it’s less work.
Pros: One-stop shop, a buffet of database choices
Cons: Can get very costly, vendor lock-in, open-source compatibility lags behind, still sprawling
This approach is popular, but it has risks. If your applications are built solely on AWS, for instance, what happens when the price increases or a feature is removed? Your switching costs can be enormous (not just in dollars, but opportunity costs).
Use a multimodel approach
How can a NoSQL database compete with the titanic ecosystems of Azure, AWS, and GCP and still help you avoid database sprawl? The answer is “multimodel” databases: databases built on a single data storage technology that offer multiple ways to read, write, and access the same data.
Pros: One-stop shop, a buffet of data interaction options, can be used in multiple clouds
Cons: Relatively new
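To illustrate the concept (this is a sketch, not any particular vendor’s API), the example below keeps one copy of the data in an embedded SQLite table and reads it through two access models: a key-value style point lookup and a declarative SQL query with filtering and aggregation. The table and keys are hypothetical.

```python
import sqlite3

# One storage engine; two access models over the same data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value INTEGER)")
db.executemany("INSERT INTO kv VALUES (?, ?)",
               [("a", 1), ("b", 2), ("c", 3)])

# Access model 1: key-value style point lookup, like GET("b").
value, = db.execute("SELECT value FROM kv WHERE key = 'b'").fetchone()

# Access model 2: a declarative query with filtering and aggregation.
total, = db.execute("SELECT SUM(value) FROM kv WHERE value >= 2").fetchone()

print(value, total)  # 2 5
```

The point is that neither access model requires a second copy of the data or a pipeline between two systems: both read the same storage.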
Wait a minute, did you say SQL?
Yes, SQL. It’s in NoSQL databases now. Nonrelational databases are turning to the most successful and well-known database language to put it to work on nonrelational data (like JSON). It’s known as SQL++, and it’s an emerging standard that is being championed by Couchbase, Amazon (PartiQL), and Microsoft (CosmosDB SQL).
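SQL++ itself extends SQL with constructs for nested, schemaless data. As a rough stand-in for the flavor of it, the sketch below uses Python’s built-in sqlite3 module and SQLite’s json_extract function (assumed available, as it is in most modern SQLite builds) to run a familiar SQL-shaped query over JSON documents. The table and field names are hypothetical.

```python
import json
import sqlite3

# Store schemaless JSON documents in a single-column table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (body TEXT)")
docs = [
    {"name": "Ada", "city": "London", "tags": ["admin"]},
    {"name": "Bob", "city": "Paris"},  # no "tags" field: schemaless is fine
]
db.executemany("INSERT INTO users VALUES (?)",
               [(json.dumps(d),) for d in docs])

# A familiar SQL shape (SELECT ... WHERE ...) applied to fields
# *inside* the JSON documents.
rows = db.execute(
    "SELECT json_extract(body, '$.name') FROM users "
    "WHERE json_extract(body, '$.city') = 'London'"
).fetchall()
print(rows)  # [('Ada',)]
```

A developer who knows SQL can read that query immediately, even though the underlying data is nonrelational, which is exactly the familiarity SQL++ is meant to deliver.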
We’re seeing a fusion of the best of relational and the best of NoSQL start to emerge: fast and flexible like NoSQL, familiar like relational, and future-proofed by a multimodel approach, all coming together to make your database story more affordable.
Matthew Groves is a developer and database enthusiast at Couchbase.