4 Comments

I've been tracking some of the progress in hardware design that could address much of the energy loss in current GPU- and TPU-based architectures, mainly the NVIDIA A100. The most advanced seems to be the WSE (Wafer Scale Engine) architecture from Cerebras. Rather than printing 49-50 usable NVIDIA GPUs on a wafer and deploying them into separate racks, Cerebras prints a single wafer with 850,000 cores, all connected through extremely short conductors. The short distances relative to rack-based systems reduce the inductance losses of conventional circuits, saving huge amounts of energy while allowing higher clock speeds and better watts/MFLOP efficiency.

Of course, this comes with a price tag of "several millions" for their latest CS-2 system, but with efficient sharing and scheduling of a central system this cost can be spread over many users.

A very interesting publication on this comparison is available at this site:

https://khairy2011.medium.com/tpu-vs-gpu-vs-cerebras-vs-graphcore-a-fair-comparison-between-ml-hardware-3f5a19d89e38

Thanks! It is easy, but as you say, you have to care enough to spend the 10 minutes learning how. How do we get people to care enough? Even I find it unreasonably hard to get my students to follow these simple steps, since it would mean having less time for things they *are* rewarded for. What do you do, for example, when you work on or supervise a project that's not itself about efficiency, carbon, etc.? How do you motivate yourself and others to follow the best practices?

This is a good question and one I'm not sure I have a good answer for. It's probably similar to how some people take steps on their own to be more environmentally conscious while others don't. It has to be incentivized somehow, either by explicitly making it a requirement or by enough people starting to do it that it adds social pressure. And more friction can be removed by making some of these things available by default, e.g. making mixed precision the default for training in PyTorch, integrating carbon tracking tools into experiment trackers, etc.
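
To make the "carbon tracking by default" idea a bit more concrete, here is a rough sketch of the kind of estimate a tracker could log for every run. It is not how any particular tool does it: the function name `estimate_footprint`, the `RunFootprint` record, and all of its parameters are hypothetical, and the only thing it relies on is that energy is average power times time, scaled by a data-center overhead factor (PUE) and a grid carbon intensity that you would have to supply yourself.

```python
from dataclasses import dataclass


@dataclass
class RunFootprint:
    """Hypothetical record an experiment tracker could store per run."""
    energy_kwh: float
    co2_kg: float


def estimate_footprint(avg_power_watts: float,
                       hours: float,
                       pue: float,
                       grid_kg_co2_per_kwh: float) -> RunFootprint:
    """Estimate energy use and emissions for one training run.

    All inputs are assumptions supplied by the caller:
      avg_power_watts      average draw of the hardware during the run
      hours                wall-clock duration of the run
      pue                  power usage effectiveness of the facility (>= 1.0)
      grid_kg_co2_per_kwh  carbon intensity of the local grid
    """
    if avg_power_watts < 0 or hours < 0:
        raise ValueError("power and duration must be non-negative")
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    if grid_kg_co2_per_kwh < 0:
        raise ValueError("carbon intensity must be non-negative")

    # watts * hours = watt-hours; divide by 1000 for kWh, then add overhead.
    energy_kwh = avg_power_watts * hours / 1000.0 * pue
    co2_kg = energy_kwh * grid_kg_co2_per_kwh
    return RunFootprint(energy_kwh=energy_kwh, co2_kg=co2_kg)


if __name__ == "__main__":
    # Illustrative placeholder numbers only, not measurements.
    footprint = estimate_footprint(avg_power_watts=300.0, hours=10.0,
                                   pue=1.5, grid_kg_co2_per_kwh=0.4)
    print(f"{footprint.energy_kwh:.2f} kWh, {footprint.co2_kg:.2f} kg CO2")
```

If something like this ran automatically at the end of every job, nobody would have to remember to do it, which is the whole point of removing the friction.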

Makes sense. Though making it a requirement seems to be purely about politics rather than research, so it's important to keep bringing the topic up. In NLP we made a proposal for a minimal requirement list, so the lack of a standard is not the issue (https://aclanthology.org/2022.emnlp-main.159/).
