If you’ve followed some of my recent blog posts, you’ll know I’ve been talking about the increasing compute requirements and carbon footprint of deep models, as well as some simple interventions one can make to adopt greener practices. In this post, I’m going to shift gears a bit and consider green AI beyond environmental impact.
This is because sustainability includes more than just the environment. It is widely held that sustainability captures three domains of public interest: the environment, the economy, and society as a whole [1]. A discussion of sustainability in machine learning should thus include all three.
So how does the increasing compute demand of machine learning affect sustainability beyond the environment? While the capabilities of these systems have grown, so has the cost of using them. As models get larger and larger, access to them becomes limited to those who have the resources to run them: namely, money and infrastructure. As such, they become confined to the massive data centers of a few wealthy organizations, and fewer and fewer people are able to interact with them beyond the text prompts of public APIs. How do we alleviate this situation?
Infrastructure
First, there’s the infrastructure required to run large models. Whereas in previous decades one could run and train machine learning models with relative ease using just a single decent GPU, this has long since ceased to be the case. If you don’t have the luxury of owning your own massive rig with 384 A100 GPUs (the setup used to train the 176B parameter BLOOM model [2]), there’s one primary viable option: cloud compute.
Cloud compute has a number of attractive features. You can scale up or down as needed. You can choose exactly which hardware to run your models on. You can often select when and where to train your models in order to minimize carbon emissions. Cloud data centers also tend to be built with environmental sustainability in mind.
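To make that last point concrete, here is a minimal sketch of carbon-aware job placement. Everything in it is an illustrative assumption: `get_carbon_intensity` is a stand-in for a live query to a data source such as Electricity Maps or WattTime, and the region names and intensity values are made up.

```python
# Minimal sketch of carbon-aware job placement. All names and numbers
# below are illustrative assumptions, not a real API.
def pick_greenest_region(regions, get_carbon_intensity):
    """Return the candidate region with the lowest current carbon intensity."""
    return min(regions, key=get_carbon_intensity)

# Stand-in for a live query to a source like Electricity Maps or WattTime.
ILLUSTRATIVE_INTENSITY = {  # gCO2e/kWh, made-up values
    "europe-north1": 120,
    "us-central1": 380,
    "asia-northeast1": 450,
}

region = pick_greenest_region(
    list(ILLUSTRATIVE_INTENSITY),
    get_carbon_intensity=ILLUSTRATIVE_INTENSITY.get,
)
print(f"Launching training job in {region}")
```

In practice the same idea extends to *when* you train: if your job is deferrable, waiting for a low-carbon window on the grid can be just as effective as choosing a greener region.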
But requiring all large scale compute related to machine learning to go through cloud data centers, which are largely operated by three companies, risks a concentration of power. While there are currently a handful of available publicly operated clouds, most ML cloud compute goes through Microsoft Azure, Amazon AWS, and Google Cloud Platform. If this increases going forward due to the need for more compute resources, how should these companies be regulated in order to ensure a fair distribution of these resources in addition to improving environmental sustainability?
It also might not be possible to fully exploit the flexibility of cloud compute, or to use it at all, depending on the constraints of the project at hand. The data and model life cycle are a factor here. For example, GDPR places restrictions on where data can be stored, and inference restricts when and where models can be run in order to keep the user experience acceptable (e.g. by keeping latency low). One possible solution to geographic constraints is to build more publicly owned cloud compute, as is being done in Europe. But this is likely only achievable by very wealthy nations: as an example, the recently created Jean Zay compute cluster in France cost around 25M EUR to build.
Scaling up infrastructure can allow one to use massive models, but the compute is still confined to data centers. What if one needs to run models in a resource-constrained setting?
Scaling down compute requirements
In many cases, one is prevented from using ever-ballooning state-of-the-art models by various constraints. Maybe the monetary cost of the required infrastructure is too great. Perhaps your use case requires you to run models on resource-constrained devices. Reducing the compute requirements of models can help democratize machine learning by reducing the infrastructure, and thus cost, needed to train and run large models.
There are a number of ways to accomplish this: quantization, pruning, knowledge distillation, dataset condensation, and more. Certain models and operations can be compressed quite aggressively with almost no loss in performance: optimizer states can be quantized to make training more efficient [3], graph neural net activations can be compressed to single-bit precision with negligible losses [4], language model weights can be compressed down to 1/48th of their original parameter count using tensor decomposition without much performance degradation [5], and the list goes on.
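As a taste of how lightweight some of these interventions can be, here is a minimal PyTorch sketch of two of them: post-training dynamic quantization and magnitude pruning. The toy architecture is arbitrary and purely for illustration; compression settings should always be validated against your own task.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model standing in for something much larger.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Post-training dynamic quantization: Linear weights are stored as int8
# and dequantized on the fly, shrinking memory and often speeding up
# CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# L1 (magnitude) pruning: zero out the 50% of weights with the smallest
# magnitude. Note the sparsity only saves compute if the runtime exploits it.
prune.l1_unstructured(model[0], name="weight", amount=0.5)

x = torch.randn(1, 784)
print(quantized_model(x).shape)  # torch.Size([1, 10])
```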
But as with many things in life, there’s no free lunch. Scaling down compute needs to be balanced against the potential unintended consequences of doing so. For example, tolerable levels of performance degradation depend on the task being performed: certain compute-reducing operations, such as image downsampling, might not be appropriate in settings like medical image analysis [6], where accuracy is critical. Additionally, it has been demonstrated that compression techniques can amplify algorithmic bias [7], which has implications for fairness. In a nutshell: there are a number of ways to reduce the compute requirements of your models, but you should think about the potential negative impacts in order to select the most appropriate set of methods.
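One cheap guardrail, in the spirit of [7], is to compare a compressed model against its uncompressed parent not just on aggregate accuracy but per subgroup. A minimal sketch follows; the `(input, label, group)` data format and the `predict` callables are assumptions for illustration, not a fixed API.

```python
from collections import defaultdict

def accuracy_by_group(predict, examples):
    """Per-subgroup accuracy over (input, label, group) triples."""
    correct, total = defaultdict(int), defaultdict(int)
    for x, y, group in examples:
        correct[group] += int(predict(x) == y)
        total[group] += 1
    return {g: correct[g] / total[g] for g in total}

def compression_gaps(original_predict, compressed_predict, examples):
    """How much accuracy each subgroup loses (or gains) under compression."""
    before = accuracy_by_group(original_predict, examples)
    after = accuracy_by_group(compressed_predict, examples)
    return {g: after[g] - before[g] for g in before}
```

A large accuracy drop concentrated in one subgroup is a red flag that the chosen compression method is amplifying bias rather than degrading performance uniformly.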
This post isn’t meant to enumerate all of the tradeoffs involved in making our largest state-of-the-art models more accessible and less environmentally damaging. But we can start to think beyond environmental impact in order to mitigate social and economic impact as well. There is an opportunity here for new research in machine learning that looks into the mutual interplay of all three components of sustainability.
References
[1] Mensah, Justice. "Sustainable development: Meaning, history, principles, pillars, and implications for human action: Literature review." Cogent Social Sciences 5, no. 1 (2019): 1653531.
[2] Luccioni, Alexandra Sasha, Sylvain Viguier, and Anne-Laure Ligozat. "Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model." arXiv preprint arXiv:2211.02001 (2022).
[3] Dettmers, Tim, Mike Lewis, Sam Shleifer, and Luke Zettlemoyer. "8-bit Optimizers via Block-wise Quantization." In International Conference on Learning Representations. 2022.
[4] Liu, Zirui, Kaixiong Zhou, Fan Yang, Li Li, Rui Chen, and Xia Hu. "EXACT: Scalable graph neural networks training via extreme activation compression." In International Conference on Learning Representations. 2022.
[5] Wang, Benyou, Yuxin Ren, Lifeng Shang, Xin Jiang, and Qun Liu. "Exploring extreme parameter compression for pre-trained language models." In International Conference on Learning Representations. 2022.
[6] Selvan, Raghavendra, Julian Schön, and Erik B. Dam. "Operating critical machine learning models in resource constrained regimes." arXiv preprint arXiv:2303.10181 (2023).
[7] Hooker, Sara, Nyalleng Moorosi, Gregory Clark, Samy Bengio, and Emily Denton. "Characterising bias in compressed models." arXiv preprint arXiv:2010.03058 (2020).