To further strengthen our commitment to providing industry-leading coverage of data technology, VentureBeat is excited to welcome Andrew Brust as a regular contributor. Watch for his articles in the Data Pipeline.
The year 2023 is here, and enterprises are set to make the most of it. From startups to major conglomerates, every company has moved into the new year with the same mission – driving growth with a focus on operational efficiency, productivity, and resilience.
Since data will play a key role in achieving this mission, leading industry experts and vendors have shared predictions on how the data space will take shape in the coming months.
1. CIOs will look to consolidate data and simplify architecture
“Speaking with other CIOs, I’ve noticed that companies are growing exponentially without a plan to organize their data. When a company considers scaling at all costs but doesn’t invest in the right technology to support that growth, there will be issues. Part of the problem is that CIOs today have to manage too many systems. Too many disjointed data pools lead to duplicated, siloed, and locked-up data, which is not only time-consuming and costly to manage and analyze but also leads to security issues. For a company to truly move forward with digital transformation, it needs to combine data science and data analytics and draw from a single source of truth. We’ll see more CIOs cutting back on vendor spending to simplify their data architecture. Companies that implement an architecture that combines hindsight and predictive analytics to deliver efficient and intelligent solutions will win in the end.”
— Naveen Zutshi, CIO of Databricks
2. Broader adoption of data contracts
“Designed to prevent data quality issues that occur upstream when data-generating services unexpectedly change, data contracts are very much en vogue. Why? Software engineers can unknowingly make updates with ramifications for the downstream data pipeline, and the rise of data modeling gives data engineers the option to deliver data into the warehouse pre-modeled. 2023 will see broader data contract adoption as practitioners attempt to apply these frameworks.”
— Lior Gavish, co-founder and CTO of Monte Carlo
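To make the idea concrete, here is a minimal sketch of what a data contract check could look like. The contract, the field names, and the `contract_violations` helper are all hypothetical and chosen only for illustration; real deployments typically rely on a schema registry or a dedicated validation tool rather than a hand-rolled function.

```python
# Hypothetical contract for an "orders" record: field name -> expected Python type.
# These fields are illustrative assumptions, not taken from any real system.
ORDERS_CONTRACT = {
    "order_id": str,
    "amount_cents": int,
    "currency": str,
}


def contract_violations(record: dict, contract: dict) -> list:
    """Return human-readable violations; an empty list means the record conforms."""
    problems = []
    # Every field promised by the contract must be present with the right type.
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Fields the contract does not mention are flagged as well, since an
    # upstream rename usually shows up as one missing plus one unexpected field.
    for field in record:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems
```

In this sketch, an upstream service that renames `amount_cents` to `amount` would be caught before the record reaches the warehouse, because the check reports both the missing and the unexpected field.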
3. Availability will be the key to winning in 2023
“One thing we have learned in recent years is outages can be crippling for a business. In 2023, availability will be the secret sauce differentiating the winners from the losers. Companies need to avoid lock-in and have the flexibility to scale. By diversifying cloud environments, companies will minimize the impact of outages on their ability to continue operations.”
— Patrick Bossman, product manager for MariaDB
4. 2023 will be the year of the data app
“In the past ten years we’ve seen the rise of the web app and the phone app, but 2023 is the year of the data app. Reliable, high-performing data applications will be a critical tool for success as businesses seek new solutions to improve customer-facing applications and internal business operations. With on-demand data apps like Uber, Lyft and Doordash available at our fingertips, there’s nothing worse for a customer than to be stuck with the spinning wheel of doom and a request not going through. Powered by a foundation of real-time analytics, we will see increased pressure on data applications to not only be real-time but to be fail-safe.”
— Dhruba Borthakur, co-founder and CTO at Rockset
5. The rise of data processing agreements (DPAs)
“How organizations process data within on-premises systems has historically been a very controlled process that requires heavy engineering and security resources. However, using today’s SaaS data infrastructure, it’s never been easier to share and access data across departments, regions, and companies. With this in mind, and as a result of the increase in data localization/sovereignty laws, the rules as to how one accesses, processes, and reports on data use will need to be defined through contractual agreements – also known as data processing agreements (DPA).
In 2023, we’ll see DPAs become a standard element of SaaS contracts and data-sharing negotiations. How organizations handle these contracts will fundamentally change how they architect data infrastructure and will define the business value of the data. As a result, it will be in data leaders’ best interest to fully embrace DPAs in 2023 and beyond. These lengthy documents will be complex, but the digitization of DPAs and the involvement of legal teams will make them far easier to understand and implement.”
— Matt Carroll, co-founder & CEO of Immuta
6. No-copy data exchanges will take hold
“In 2023, as data sharing continues to grow, and data and IT teams are strapped to keep up, no-copy data exchanges will become the new standard. As organizations productize their modern data stack, there will be an explosion in the size and number of data sets. Making copies before sharing just won’t be feasible anymore. In 2023, enterprises will flock to established platforms, like Snowflake’s Data Exchange and Databricks’ Delta Sharing protocol, to make it easier to share and monetize their data securely.”
— Matt Carroll, co-founder & CEO of Immuta
7. AI-based automation for unstructured data management will gain traction
“Data management for file and object data is getting more sophisticated with adaptive machine learning and AI-based automation to intelligently guide data placement, lifecycle management, search and movement. Solutions can adapt based on the customer’s cost profile, data profile, and target provisioning and learn over time to refine recommendations. For example, an AI algorithm could be used to proactively identify sensitive data sets, such as files with extensions or tags related to financial documents, which have been stored out of compliance–such as in the CMO’s directory rather than a read-only directory owned by the CFO.”
— Kumar Goswami, CEO and co-founder of Komprise
8. Synthetic data will accelerate AI innovation
“In 2023, synthetic data will be a game-changer in accelerating the development and deployment of AI while guarding against algorithmic bias. One of the significant challenges in developing AI is getting the right amount and diversity of data to train machine learning-based algorithms. These algorithms require massive amounts of data that are representative of the different people that will interact with it and the contexts in which it will be used. It is difficult, time-consuming and costly to acquire this breadth and depth of data. Data synthesis enables AI companies to rapidly augment their existing datasets and simulate scenarios that are difficult to generate in the real world.
For example, in automotive, synthetic data tools can use a source image of a driver to create synthetic variations that use varying lighting conditions or head movements. It could even simulate a driver falling asleep behind the wheel – data that is rare and very dangerous to capture in real life. Deploying synthetic data tools is key to not only solve these complex challenges of data collection but also to combat algorithmic bias, by ensuring datasets are truly diverse.”
— Dr. Rana el Kaliouby, deputy CEO at Smart Eye
9. In a multi-cloud world, object storage is primary storage
“Right now, databases are converging on object storage as their primary storage solution. This is driven by performance, scalability and open table formats. One key advantage in the rise of open table formats (Iceberg, Hudi, Delta) is that they allow multiple databases and analytics engines to coexist. This, in turn, creates the requirement to run anywhere – something that modern object storage is well suited for.
The early evidence is powerful: both Snowflake and Microsoft will GA external tables functionality in late 2023. Now companies will be able to leverage object storage for any database without ever needing to move those objects directly into the database; they can query in place.”
— Anand Babu Periasamy, co-founder and CEO of MinIO
10. Data hoarding will be thrust into the limelight
“Data hoarding is one of the biggest hidden secrets in the industry today. With 14.4 billion connection points in 2022, companies are sitting on treasure troves of data with no real use for all of it. The thought is that they will be able to use their data in the future in ways that they cannot access today, but it’s quite the opposite.
Each piece of data is also becoming bigger as technology continues to advance. Everything is becoming richer, from higher-res cameras to higher-quality microphones – this is all taking up massive amounts of space.
I expect companies and consumers alike to begin paying attention to the data that they are starting to hoard unconsciously.”
— Renen Hallak, founder and CEO of VAST Data
11. The rise of hybrid ‘bring-your-own-database’ (BYODB) cloud deployments
“The benefits of moving certain data-driven projects to the cloud are undisputed — quicker deployment, reduced infrastructure and maintenance costs, built-in support and SLAs, and instant scalability when you need it. However, there will always be use case obligations that require keeping data on-premises, including performance, security, regulatory compliance, local development, and air-gapped hardware (to name a few). A more flexible solution is for modern data vendors to support hybrid “bring-your-own-database” (BYODB) cloud deployments in addition to the more common on-premises and fully-managed cloud service options.
This new approach will catch on in the years ahead, allowing data to be kept in situ and unaltered but remotely connected to SaaS services that layer on top from nearby data centers. This provides all the benefits of the cloud, while still allowing for full authority and control over the company’s most precious resource… its data.”
— Ben Haynes, CEO and co-founder of Directus
12. Pipelines will get more sophisticated
“A data pipeline is how data gets from its original source into the data warehouse. With so many new data types, and data pouring in continuously, these pipelines are becoming not only more essential but potentially more complex. In 2023, users should expect data warehouse vendors to offer new and better ways to extract, transform, load, model, test, and deploy data. And vendors will do so with a focus on integration and ease of use.”
— Chris Gladwin, CEO and co-founder of Ocient
13. Vector databases take hold to unleash the value of untapped unstructured data
“As businesses embrace the AI era and attempt to make full use of its benefits in production, there is a significant spike in the volume of unstructured data, taking all sorts of forms, that needs to be made sense of. To cope with the challenges of extracting tangible value from unstructured data, vector databases – a new type of database management technology purpose-built for unstructured data processing – are on the rise and will take hold in years to come.”
— Frank Liu, director of operations at Zilliz
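For readers unfamiliar with the concept, the core operation of a vector database can be sketched in a few lines: unstructured items are represented as numeric vectors (embeddings), and a query returns the items whose vectors are most similar. The sketch below is an assumption-laden illustration, not any vendor's API: the file names and three-dimensional vectors are made up, and it scans every entry, whereas production systems generally use approximate nearest-neighbor indexes to avoid a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, index, k=2):
    """Return the keys of the k stored vectors most similar to the query (brute-force scan)."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical embeddings; in practice a trained model would produce these.
INDEX = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "contract.docx": [0.8, 0.3, 0.1],
    "holiday.jpg": [0.0, 0.2, 0.9],
}
```

Calling `nearest([1.0, 0.0, 0.0], INDEX)` on this toy index ranks the two document-like items ahead of the photo, which is the kind of similarity lookup the prediction refers to.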
14. Data observability will become a critical industry
“In today’s economy, it’s critical to constantly calculate ROI and prioritize ways that we can do more with less. I believe engineering teams have an opportunity to lean in and work towards increasing the capacity of the company to win. I predict we’ll increasingly see engineers and data teams becoming facilitators of enabling companies to make data-driven decisions by building the infrastructure and providing tools needed to enable other teams (especially non-technical teams). One of the ways they’ll enable this shift is to help teams understand how to access their data in a self-serving manner, rather than being constantly at the center of answering questions. Instead of hiring more data scientists, I expect data teams to increase data engineering roles to build lasting infrastructures that enable folks on all sides of the business to answer questions independently.”
— Shadi Rostami, SVP of engineering at Amplitude