Why data has a sustainability problem

This article is part of a VB special issue. Read the full series here: Intelligent Sustainability.

Every year in the U.S., an estimated 5,130 million metric tons of energy-related carbon dioxide are added to the atmosphere. Among tech enterprises, the explosion of data hasn’t helped matters as innovation in the sector continues at a rapid pace.

Some experts like Sanjay Podder, managing director and global lead of technology sustainability innovation at Accenture, say that if left unchecked, the exponential growth in data could result in increased energy demand and carbon emissions, counteracting progress on climate change.

The last two years have only added to the problem. As a result of COVID-19, cloud adoption, AI deployment and, consequently, data all increased exponentially as demand for accelerated digital transformation heated up.

Accelerated adoption of these technologies may have helped companies adapt, kept business afloat, allowed employees to keep their jobs during a volatile time and paved the way for future innovation, but what did it do to the environment? 

Data collection and storage, cloud compute and AI all contribute significantly to carbon emissions. But how much, and what can enterprises do to mitigate the impacts while pressing forward with innovation? And if data fuels these innovations, what is being done right, and what could companies do better when it comes to data sustainability?

“Hopefully, people move from focusing on data at rest to data in motion,” said Phil Tee, CEO of Moogsoft, an AI-driven observability company. “There’s a sort of a culture that has built up around the idea of throwing away nothing and maintaining every bit of data that you ever received. The trouble is when that gets turned into taking that approach to data that you don’t need to keep. Then what happens is that data is instead of it just being thrown away, or minimally retained, it gets maximally retained. So, in other words, there’s a sort of a knee-jerk reaction that because we don’t throw anything else away, we mustn’t throw that away, even if it’s data that’s purely got real-time significance — like literally six milliseconds — and after receiving that data you’ve got no further use of it. I think that is ultimately if you like the lowest hanging fruit on this tree.”

Defining the data sustainability problem

Technological innovations aren’t going to slow down, and, in fact, they’re booming. A report by Activate Consulting affirms that data and automation in the enterprise are driving the explosion. And while some of these innovations will likely aim to create a better, more efficient reality, their environmental impacts may not be so pretty.

A Stanford Magazine article states that “saving and storing 100 gigabytes of data in the cloud per year would result in a carbon footprint of about 0.2 tons of CO2, based on the usual U.S. electric mix.” The cloud and its data centers also come with a broader set of environmental issues.
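
To put the Stanford Magazine figure above in context, it works out to about 0.002 tons of CO2 per gigabyte per year. The sketch below scales that figure to other storage volumes. Linear scaling is an assumption made here for illustration, and the function and constant names are hypothetical; neither comes from the Stanford Magazine estimate.

```python
# Figure quoted in the article: 100 GB stored in the cloud for one year
# is estimated at roughly 0.2 tons of CO2 on the usual U.S. electric mix.
QUOTED_GB = 100.0
QUOTED_TONS_PER_YEAR = 0.2

# Assumption: the footprint scales linearly with the amount stored.
TONS_PER_GB_YEAR = QUOTED_TONS_PER_YEAR / QUOTED_GB


def storage_footprint_tons(gigabytes: float, years: float = 1.0) -> float:
    """Estimate CO2 (tons) for keeping `gigabytes` in cloud storage for `years`."""
    if gigabytes < 0 or years < 0:
        raise ValueError("gigabytes and years must be non-negative")
    return gigabytes * years * TONS_PER_GB_YEAR


if __name__ == "__main__":
    # 50 TB (50,000 GB) retained for three years under the linear assumption.
    print(f"{storage_footprint_tons(50_000, years=3):.1f} tons of CO2")
```

Under these assumptions, the estimate is only as good as the quoted figure and will differ for other electricity mixes.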

MIT reported that “the Cloud now has a greater carbon footprint than the airline industry. A single data center can consume the equivalent electricity of 50,000 homes. At 200 terawatt hours (TWh) annually, data centers collectively devour more energy than some nation-states.” 

The piece goes on to explain that although data centers account for 0.3% of overall carbon emissions, broadening the calculation to include the devices that make these innovations possible, such as laptops, smartphones and tablets, brings the total to 2% of carbon emissions worldwide.

And AI, which uses vast amounts of data and often leans on the cloud, has its own share of issues, partly because the datasets used to train AI are increasingly large and take a great deal of energy to process. An article from McKinsey makes the same point, stating that “researchers discovered that the environmental costs of training increased in direct proportion to model size.” Similarly, MIT found that “training a single AI model can emit as much carbon as five cars in their lifetimes.”

Innovating while mitigating

But it’s not all doom and gloom. George Kamiya, analyst with the International Energy Agency (IEA), says that while it’s important to pay attention to sustainability issues, it is also worth keeping in mind that “tech companies have different types of effects on emissions: 1) direct emissions from operations (i.e., their footprint); 2) positive indirect effects utilizing their technologies to reduce emissions; 3) negative indirect effects where their technologies actually result in net increase in emissions.”

He argues that while much of the attention so far has focused on companies’ direct carbon footprints, these emissions are relatively small compared with the effects on emissions from the use of digital technologies, services and platforms.

“We certainly need companies to cut their emissions footprints, but companies and policymakers should not lose sight of the fact that the use of these technologies could have much larger impacts in terms of both reducing emissions and increasing emissions in other sectors and services,” Kamiya said. “For example, videoconferencing could help cut emissions from aviation by ‘substituting’ for some business trips, but some uses of machine learning could promote more consumption or increase the competitiveness of fossil fuels, resulting in higher emissions overall. Focusing only on the ‘footprint’ risks missing opportunities — (and risks) — of larger emissions impacts in other sectors and services.”

Stanford Ph.D. and Juris Doctor candidate Peter Henderson, a researcher on natural language processing, reinforcement learning, machine learning, artificial intelligence, computer vision and AI ethics, agrees that there are reasonable actions execs can take to keep innovation flowing while reducing environmental harm. 

“Any generation task requires a lot of data, especially if you don’t have constraints on the topic or the subject matter that the model has to deal with. So, it is true that some areas just need a lot of data. But when you’re building a model, you have a target task in mind, right? And in those cases, where you have a target task in mind, you don’t need all the data in the world. What you need is sort of shown in ML benchmarks,” said Henderson. “A lot of benchmarks are already close to superhuman accuracy on sentiments, like classification or analysis … and so, in those cases, it’s very clear you don’t need all the data in the world because we’re able to solve those with much less. I think people really need to think about the target tasks they are using, and think about how you can constrain the amount of data you’re using to still get your benefit while reducing the amount of costs. That being said, it’s not clear how that interacts with scale.”

Stanford has taken steps itself with a tool specifically designed to measure AI and ML’s hidden carbon costs.

Additionally, in a paper titled Energy and Policy Considerations for Deep Learning in NLP, researchers Emma Strubell, Ananya Ganesh and Andrew McCallum examined four deep learning NLP models (Transformer, ELMo, BERT and GPT-2) that have been responsible for the most significant recent improvements in performance, and estimated the energy and carbon costs of training them.

Another way to mitigate the impact of the data explosion is to look at how that impact is measured. Tools like Microsoft Cloud for Sustainability, SustainLife and Salesforce’s Net Zero Cloud measure a company’s carbon footprint and sustainability impact, and store the data companies need to visualize potential missteps and opportunities to improve.

“We continuously look for ways to advance our carbon emissions reporting and improve our carbon accounting process to deliver faster, better, and more accurate data,” said Ari Alexander, general manager of Salesforce’s Net Zero Cloud. “The vast majority of an organization’s emissions come from its value chain — also known as scope 3 emissions. That includes carbon emissions from partners like data and cloud service providers. With Net Zero Cloud, customers can track scope 1, 2 and 3 emissions, and streamline how they track their supply chain carbon footprint data to effectively engage with suppliers to align on sustainability efforts — all in one place.”

Of course, how and where data is ultimately stored, even when in the cloud, also makes a difference.

“A lot of the time, the biggest machine learning jobs are run in the cloud and many times that can be moved around to different parts of the world,” Henderson said. “A lot of the carbon emissions from the energy costs can be mitigated by just moving your jobs to a carbon friendly region like Montreal, for example, which has a lot of data centers that run on almost all hydroelectricity. So, running all your machine learning jobs there would lower emissions.” 

Henderson notes, however, that if everyone moved their machine learning jobs to Montreal, it could overwhelm the region’s energy grid. Still, taking small steps to think about where workloads run can make a difference when it comes to climate impact.
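
The idea of moving jobs to a cleaner grid comes down to one multiplication: emissions equal the energy a job uses times the carbon intensity of the grid that supplies it. The sketch below is a minimal illustration of that reasoning, not any cloud provider’s scheduler. The region names, the intensity values and the job size are placeholders invented for the example, not measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    grams_co2_per_kwh: float  # grid carbon intensity (placeholder value)


def job_emissions_kg(energy_kwh: float, region: Region) -> float:
    """Emissions = energy used x carbon intensity of the local grid."""
    return energy_kwh * region.grams_co2_per_kwh / 1000.0


def lowest_carbon_region(regions: list[Region]) -> Region:
    """Pick the candidate region with the lowest grid carbon intensity."""
    if not regions:
        raise ValueError("need at least one candidate region")
    return min(regions, key=lambda r: r.grams_co2_per_kwh)


# Placeholder intensities for illustration only; not measured data.
CANDIDATES = [
    Region("hydro-heavy-region", 30.0),
    Region("mixed-grid-region", 400.0),
    Region("coal-heavy-region", 800.0),
]

if __name__ == "__main__":
    job_kwh = 1_000.0  # hypothetical energy for one training job
    for region in CANDIDATES:
        print(region.name, job_emissions_kg(job_kwh, region), "kg CO2")
    print("run in:", lowest_carbon_region(CANDIDATES).name)
```

A real placement decision would also need to account for the capacity concern Henderson raises, along with latency and data residency, none of which this sketch models.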

Data center providers like Equinix are innovating toward green data storage. The company, which provides data center services to enterprise customers such as Zoom, Netflix, Salesforce, AT&T and Verizon, also focuses on sustainable data center design and construction, and has been at this work for more than a decade.

“It seems likely that energy efficiency in data centers will continue to improve, but the key question is whether it can keep pace with the increase in demand for data services – in other words, whether overall data center energy use will continue to stay relatively flat (as we’ve seen over the past 10 years) or if it will start to increase more quickly because energy efficiency can’t keep pace with demand growth,” Kamiya notes. “Some of the easier (i.e., low-hanging fruit) efficiency opportunities have already been tapped (notably the shift from less efficient enterprise data centers to more efficient cloud and hyperscale data centers), so it’s possible that we could see a moderate increase in total data center energy use over the next few years. But how much and how quickly is uncertain, and how quickly these can be powered with low-carbon electricity (to keep emissions flat or decreasing).”
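
Kamiya’s question can be framed as simple arithmetic: if total energy use is demand multiplied by the energy needed per unit of service, it stays flat only while efficiency gains offset demand growth. The sketch below illustrates this under stated assumptions. The 200 TWh starting point is the figure quoted earlier in this article; the growth and efficiency rates are hypothetical and chosen only to show the two outcomes.

```python
def project_energy(base_twh: float, demand_growth: float,
                   efficiency_gain: float, years: int) -> list[float]:
    """Project yearly energy use when demand grows by `demand_growth` and
    energy per unit of service falls by `efficiency_gain` each year."""
    factor = (1.0 + demand_growth) * (1.0 - efficiency_gain)
    return [base_twh * factor ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical rates: efficiency fully offsets demand growth (1.25 * 0.80 = 1).
    flat = project_energy(200.0, demand_growth=0.25, efficiency_gain=0.20, years=5)
    # Hypothetical rates: efficiency gains slow down while demand keeps growing.
    rising = project_energy(200.0, demand_growth=0.25, efficiency_gain=0.10, years=5)
    print("efficiency keeps pace:", [round(v, 1) for v in flat])
    print("efficiency lags:      ", [round(v, 1) for v in rising])
```

Emissions add one more factor, the carbon intensity of the electricity used, which is why Kamiya also raises how quickly data centers can be powered with low-carbon electricity.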

To keep perspective, Kamiya also noted that development of data centers is not uniform across the world. 

“Even though globally the energy use has been mostly flat, there have been huge increases in data center hubs such as Ireland, as noted by their central statistics office. Similarly, in the U.S., there will be states where data center energy use will go up a lot and others where it will remain flat and others where it could fall,” he said.

Industry predictions for data and energy efficiency

While the SEC is moving ahead with regulatory ESG reporting requirements, this is a new area for many companies to navigate, particularly in the U.S., where this type of reporting has not been mandated before.

“I do think it’s important that the SEC and other regulatory agencies take action in terms of this, but I think companies are already acting without it,” Henderson noted. “At the same time, it’s important to have that baseline level of transparency across the board. I think there are other regulatory actions that can be taken to help move this process along in terms of making sure that carbon footprints and environmental friendliness are mitigated.”

As for what’s next while regulatory actions are pending, Kamiya suggests companies can also focus efforts on reducing environmental impacts across their supply chains, as well as “utilizing their platforms and tools to inform consumers on how they can reduce emissions (e.g., providing information on the environmental impacts of different products or shipping options; suggesting low-carbon travel options in map apps; and addressing climate misinformation/disinformation on social media). These core services that these companies provide are where important, additional emissions impacts could be realized.”

Originally appeared on: TheSpuzz