An open data lakehouse will maintain and grow the value of your data

March 25, 2023

Are recession fears eating at you? Worried about all your digital transformation investments evaporating like so much dew in the morning sun? That’s a natural way to feel. After all, the digital transformation journey is fraught with obstacles. And the challenging task of extracting value from growing repositories of data sometimes gets put on the back burner.

Fortunately, you don’t need to be stuck in fear and worry; these suggestions can help prevent your company’s precious data from going to waste.

Step 1: Get your data out of the cost center

Even though “everyone” says that data is the big shiny key that will unlock productivity and competitiveness and all the trappings of business success, in practice (that is, in action, not just words) data and data analytics are relegated to the “cost of doing business” side of the ledger.

This categorization triggers a race to the bottom, as organizations try to find the cheapest ways to wring value from their data. In most cases, it means outsourcing this business-critical function to lower and lower bidders.

Resist this trend. Start treating data, and the systems and people that work with it, as the business assets they are. How? Try exposing sanitized or carefully curated versions of your data to customers and clients, as dashboards, for instance. Make your data useful to them, and they will pay you for access.

Using the low-cost, high-availability object stores and robust built-in security frameworks that the cloud vendors provide makes this a much simpler and more cost-effective undertaking than it has ever been.
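
To make the idea of a curated, customer-facing view concrete, here is a minimal sketch in Python. The record fields (customer_email, region, amount) and the function name are assumptions made up for illustration; the point is only that the aggregate leaves the building while the identifying details stay behind.

```python
from collections import defaultdict

# Hypothetical raw order records; every field name here is an assumption.
RAW_ORDERS = [
    {"customer_email": "a@example.com", "region": "west", "amount": 120.0},
    {"customer_email": "b@example.com", "region": "west", "amount": 80.0},
    {"customer_email": "c@example.com", "region": "east", "amount": 50.0},
]


def curated_revenue_by_region(orders):
    """Aggregate revenue per region and drop customer-identifying fields."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["region"]] += order["amount"]
    # Only the aggregate is returned; email addresses never leave this function.
    return [{"region": region, "revenue": totals[region]} for region in sorted(totals)]


dashboard_rows = curated_revenue_by_region(RAW_ORDERS)
print(dashboard_rows)
```

A real dashboard would sit on top of a query engine and an access-control layer rather than a list in memory, but the shape of the transformation is the same.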

When you’re no longer merely spending money to generate and store and move and analyze data, you can put your data to work. You’ll probably find it’s really good at earning its keep.

Step 2: Keep your data options (and your infrastructure) open

I know this one might sound scary. Too often, people think open — as in open-source — means unprotected, unmanageable or just too much effort.

With technological advancements hammering us from all directions, the advantages of openness are hard to argue against. They include:

No vendor lock-in, which can save you beaucoup money over time.
Flexibility to adopt — and, just as importantly, jettison — technologies or solution pieces according to what you need and when you need them.
Futureproofing, because unless you’ve found a perfect crystal ball somewhere (and if so, what are you doing reading this article?), there’s no way to predict what will happen next year or next decade or even next week.
Communities with open governance in which you and your company can participate and actually help shape the future.

And yes, these benefits of openness apply in full measure to data and databases. An open data format coupled with an open source query engine delivers the reliability and performance of a data warehouse; the flexibility and better price/performance of a data lake; the freedom of non-proprietary SQL query processing and data storage; and the governance, discovery, quality and security you need.

Unlike in the early database days of the 1970s, when companies had to choose among a handful of SQL-based relational database management systems, you are no longer tied to a single vendor. By decoupling storage from compute, data lakes let you piece together a solution that takes best advantage of the amount and types of data you actually use. In addition to SQL processing, you can do machine learning (ML) and AI, if that’s your thing. A data lake is flexible, elastically scalable and cost-effective, which makes now pretty much a golden era of data analytics.
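
Here is a small sketch of what separating storage from compute looks like in practice, using only the Python standard library. Everything in it is a stand-in chosen for illustration: a temporary directory plays the object store, a CSV file plays the open data format (a real lake would more likely use something like Parquet), and the two functions play independent engines. Neither engine owns the data; both simply read the same file.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def write_events(storage_dir):
    """Write a small dataset to shared storage in an open, text-based format."""
    path = Path(storage_dir) / "events.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["visitor", "clicks"])
        writer.writerows([["ann", 3], ["bob", 5], ["ann", 4]])
    return path


def total_clicks_sql(path):
    """Engine 1: load the file into an in-memory SQL engine and aggregate."""
    with path.open(newline="") as f:
        rows = [(r["visitor"], int(r["clicks"])) for r in csv.DictReader(f)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (visitor TEXT, clicks INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT visitor, SUM(clicks) FROM events GROUP BY visitor"
    ).fetchall()
    conn.close()
    return dict(result)


def total_clicks_python(path):
    """Engine 2: read the same file with plain Python, no SQL involved."""
    totals = {}
    with path.open(newline="") as f:
        for r in csv.DictReader(f):
            totals[r["visitor"]] = totals.get(r["visitor"], 0) + int(r["clicks"])
    return totals


with tempfile.TemporaryDirectory() as storage:
    data_path = write_events(storage)
    sql_totals = total_clicks_sql(data_path)
    py_totals = total_clicks_python(data_path)

print(sql_totals, py_totals)
```

Because the file format is open, either engine could be swapped out without touching the stored data, which is the practical meaning of avoiding lock-in.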

But — and you knew there was going to be a “but” — the flexibility of data lakes can make them disorganized and hard to manage. Plus, the lack of data consistency in data lakes makes it hard to enforce reliability and security. Here’s the analogy: A data warehouse is a group of sled dogs tied together and moving along snowy terrain in the same direction, while a data lake is more like a menagerie of various breeds of dogs running around in different directions.

And sure, these latest databases can scale like crazy, yet they still don’t solve all the cost issues because they link data storage with compute. So as your data grows, so do your processing and/or cloud infrastructure costs. And the complexity of managing these systems? Forget about it if you don’t have an army of IT admins and acres of data centers brimming with millions of twinkling lights.
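
A back-of-the-envelope sketch shows why linking storage with compute hurts as data grows. All prices and capacities below are made-up numbers for illustration, not vendor quotes, and the model assumes the query workload stays the same while the data volume grows.

```python
def coupled_cost(data_tb, tb_per_node, node_price):
    """Coupled: each node bundles disk and CPU, so more data means more nodes."""
    nodes = -(-data_tb // tb_per_node)  # ceiling division
    return nodes * node_price


def decoupled_cost(data_tb, storage_price_per_tb, compute_nodes, compute_price):
    """Decoupled: storage is billed per TB; compute is sized for the workload."""
    return data_tb * storage_price_per_tb + compute_nodes * compute_price


# Hypothetical monthly prices and sizes (assumptions, not real figures).
TB_PER_NODE = 10
NODE_PRICE = 1000
STORAGE_PRICE_PER_TB = 25
COMPUTE_NODES = 4
COMPUTE_PRICE = 600

for data_tb in (40, 400):
    coupled = coupled_cost(data_tb, TB_PER_NODE, NODE_PRICE)
    decoupled = decoupled_cost(data_tb, STORAGE_PRICE_PER_TB, COMPUTE_NODES, COMPUTE_PRICE)
    print(f"{data_tb} TB: coupled={coupled}, decoupled={decoupled}")
```

Under these assumed numbers, a tenfold increase in data multiplies the coupled bill by ten, while the decoupled bill grows only by the added storage.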

Step 3: Employ a data lakehouse

So here’s how to take advantage of all the data flowing through your organization’s digital transformation pipelines and bring together open-source systems and the cloud to maximize the utility of the data.

Use an open data lakehouse designed to meld the best of data warehouses with the best of data lakes. That means storage for any data type, suitable for both data analytics and ML workloads, cost-effective, fast, flexible and with a governance or management layer that provides the reliability, consistency and security needed for enterprise operations.
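
What does a governance or management layer actually add on top of a pile of files? The toy sketch below is one way to picture it, using only the Python standard library. It is not how any particular lakehouse project works internally; the class name, the JSON files and the commit log layout are all assumptions made for illustration. It shows two ideas: a batch is rejected if it breaks the table's schema, and readers see only the files recorded in a commit log.

```python
import json
import tempfile
from pathlib import Path


class ToyLakehouseTable:
    """Toy table: data files plus a commit log that defines what readers see."""

    def __init__(self, root, schema):
        self.root = Path(root)
        self.schema = schema  # column name -> required Python type
        self.log = self.root / "_commits.json"
        self.log.write_text("[]")

    def append(self, rows):
        # Governance step: reject the whole batch if any row breaks the schema.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"unexpected columns: {sorted(row)}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"{col} must be {typ.__name__}")
        commits = json.loads(self.log.read_text())
        data_file = f"part-{len(commits):05d}.json"
        (self.root / data_file).write_text(json.dumps(rows))
        # The batch becomes visible only once it is recorded in the log.
        commits.append(data_file)
        self.log.write_text(json.dumps(commits))

    def read(self):
        rows = []
        for name in json.loads(self.log.read_text()):
            rows.extend(json.loads((self.root / name).read_text()))
        return rows


with tempfile.TemporaryDirectory() as root:
    table = ToyLakehouseTable(root, {"sensor": str, "reading": float})
    table.append([{"sensor": "s1", "reading": 20.5}])
    try:
        table.append([{"sensor": "s2", "reading": "hot"}])  # wrong type
        rejected = False
    except ValueError:
        rejected = True
    visible_rows = table.read()

print(rejected, visible_rows)
```

The bad batch never reaches storage, so every reader gets the same consistent view. That is the warehouse-style reliability the lakehouse is meant to bring to lake-style storage.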

Keeping it “open” (using open-source technologies and standards like PrestoDB, Apache Parquet and Apache Hudi) not only saves money on license costs, but also gives your organization the reassurance that the technology that backs these critical systems is being continuously developed by companies that use it in production and at scale. And as technology advances, so will your infrastructure.

Remember, you’ve already invested mightily in data transformation initiatives to remain competitively nimble and power your long-term success.

By shifting your relationship to data from a cost center to a profit center and by employing an open data lakehouse in your operations, you will increase the chances of your data ecosystem paying dividends.

Rachel Pedreschi is head of technical services at Decodable.

Originally appeared on: TheSpuzz