Minimizing the carbon footprint of data analysis, maximizing sustainability for data centers

Faster total time to insights is kinder to the environment.

Executives face more pressure than ever to reduce their companies’ environmental impact. This is especially true for data centers because of their contribution to global warming. If all the data centers in the world were a country, they would rank as the fifth-largest energy consumer in the world. In 2020, data centers accounted for about 1% of global electricity demand and 0.3% of all CO2 emissions.

Today, companies are required to provide transparency about their carbon footprint, and the race is on for data centers to improve their efficiency ranking. There is a list of data centers around the world ranked by PUE (power usage effectiveness), and Greenpeace has created a cleantech industry ranking of centers based on their carbon footprint.

The need for greener code

Many data center sustainability initiatives focus on using renewable energy for cooling or on optimizing cooling systems to reduce power consumption. However, beyond the energy required to maintain environmental controls, the data analytics software itself also has a significant effect on the amount of electricity consumed. How much? Quite a bit.

Based on current research, one large machine learning (ML) model, such as Meena, consumes as much energy as a passenger vehicle driven 242,231 miles. Researchers at the University of Massachusetts at Amherst estimated that training a large deep-learning model can produce as much as 626,000 pounds of CO2, roughly equal to the lifetime emissions of five cars.

As a result, there is increased interest in, and commitment to, creating more efficient code. The Green Software Foundation (GSF), with members such as VMware, Microsoft, Accenture and GitHub, has a mission to promote software that is designed, architected and coded to consume less energy.

Tips for sustainable machine learning

There are several academic articles about how to write greener algorithms for AI/ML models, but here are a few basic tips.

One way to reduce computing resources is to minimize the number of training experiments. Hundreds of pretrained ML models and blueprints are available, so developers only need to bring their own data to add AI capabilities to applications, which significantly reduces the time needed to develop and train models.

It’s also important to have visibility into an algorithm’s carbon footprint in order to decide how best to optimize performance. Researchers from several universities have created tools for that purpose. For example, Green Algorithms calculates your cloud computing carbon footprint. Another example is CodeCarbon, a software package that integrates into a Python codebase and estimates the amount of CO2 produced by the computing resources used to execute the code.
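To make the idea concrete, here is a rough sketch of the kind of back-of-the-envelope estimate such calculators make: energy drawn by the hardware, scaled by a PUE-style overhead factor (total facility energy divided by IT equipment energy), multiplied by the carbon intensity of the local grid. This is not the actual method or API of Green Algorithms or CodeCarbon. The function name and the default values for `pue` and `grid_intensity_g_per_kwh` are illustrative assumptions, not measured figures.

```python
def estimate_emissions_kg(runtime_hours, avg_power_watts,
                          pue=1.5, grid_intensity_g_per_kwh=400.0):
    """Roughly estimate kg of CO2-equivalent for one compute job.

    All defaults are placeholder assumptions; substitute values
    for your own hardware, data center and electricity grid.
    """
    if runtime_hours < 0 or avg_power_watts < 0:
        raise ValueError("runtime and power must be non-negative")
    # Energy used by the IT equipment itself, in kWh.
    it_energy_kwh = runtime_hours * avg_power_watts / 1000.0
    # Scale by PUE to include cooling and other facility overhead.
    facility_energy_kwh = it_energy_kwh * pue
    # Convert energy to emissions (grams -> kilograms).
    return facility_energy_kwh * grid_intensity_g_per_kwh / 1000.0
```

Under these assumed defaults, a 10-hour job on hardware averaging 300 watts comes out to about 1.8 kg of CO2-equivalent: `estimate_emissions_kg(10, 300)`. Halving the run time halves the estimate, which is why shorter training runs matter.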

Automation can also be used to reduce training run time. It’s possible to minimize the number of experiments, the amount of data analyzed, or both, while still maintaining accuracy. More efficient data sampling by itself can speed up model run time by a factor of 5.8.
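One simple way to analyze less data without skewing it is stratified sampling: keep the same fraction of rows from every class so that rare classes are not lost. The sketch below is a minimal illustration using only the Python standard library. The function and parameter names are hypothetical, and it makes no claim to reproduce the 5.8x figure above.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Keep roughly `fraction` of the rows from each class.

    `label_of` is a caller-supplied function that returns a row's class.
    At least one row per class is always kept.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for repeatable experiments
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    sample = []
    for group in groups.values():
        keep = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, keep))
    return sample
```

For example, with 90 rows labeled "ok" and 10 labeled "fraud", `stratified_sample(rows, lambda r: r[0], 0.2)` keeps 18 and 2 rows respectively, so the model trains on a fifth of the data while the class balance is preserved. Whether accuracy holds at a given fraction still has to be checked on a held-out set.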

The software that actually performs the computations can also help reduce the amount of computing resources required. Some databases are specifically designed for processing massive amounts of data and can optimize the utilization of memory and storage to reduce energy consumption. These databases also have the advantage that it’s not necessary to limit the amount of data that’s analyzed, which lowers the risk of compromising the model’s accuracy in an attempt to speed up run time.

Reducing model run time, in addition to increasing energy efficiency, reduces total time to insights for business-critical applications such as fraud detection, cybersecurity solutions, quality control, etc. More efficient code is not only better for the environment, but it’s also good for business. 

More potential customers want transparency into a company’s commitment to its green strategies, and having a “green” code standard could be an important first step. Employees want to work for an ecologically sensitive company that makes responsible decisions regarding the environment. In the future, cloud vendors might require visibility into a workload’s carbon footprint, with fines for processing that is considered excessive or unnecessary.

With the huge number of calculations required to extract meaning from data and make better business decisions, being socially responsible isn’t just a nice-to-have. It has become a necessity.

Ohad Shalev is a strategic analyst at SQream.

Originally appeared on: TheSpuzz