Facebook disclose the carbon footprint of their new LLaMA models
Facebook used 2.6 million kWh of electricity and emitted around 1,000 tonnes of CO2e when developing their new LLaMA models.
Facebook recently released four new language models called LLaMA, which outperform GPT-3 — ChatGPT’s underlying language model — on a number of tasks.
Following a recent trend towards transparency about resource consumption in LLM training, Facebook estimate the electricity consumption and carbon footprint of their models in the paper published on arXiv [1].
Facebook estimate that to train the four different sizes of LLaMA, they used 2,048 Nvidia A100-80GB GPUs for a period of approximately 5 months.
The energy consumption from this is estimated at 2,638,000 kWh (2,638 MWh), roughly as much electricity as 1,648 Danes use in a year on average.
Producing that amount of electricity is estimated to have led to the emission of 1,015 tCO2e – roughly the annual carbon footprint of 92 Danes.
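As an aside, the arithmetic behind this kind of estimate is simple: multiply GPU-hours by the GPU’s power draw and the data center’s power usage effectiveness (PUE) to get energy, then multiply by a grid carbon intensity factor to get emissions. Below is a minimal sketch in Python. The constants (a 400 W power draw per A100-80GB, a PUE of 1.1, and 0.385 kgCO2e per kWh) are assumptions roughly in line with what papers in this space use, not an exact reproduction of Facebook’s inputs.

```python
# Back-of-the-envelope estimate of the energy use and carbon emissions of an
# LLM training run: GPU-hours x power draw x PUE, then x grid carbon intensity.
# The constants below are assumptions, not Facebook's exact inputs.

GPU_POWER_KW = 0.400     # assumed per-GPU power draw in kW (A100-80GB TDP)
PUE = 1.1                # assumed data-center power usage effectiveness
KG_CO2E_PER_KWH = 0.385  # assumed grid carbon intensity (kgCO2e per kWh)


def training_footprint(num_gpus: int, hours: float) -> tuple[float, float]:
    """Return (energy in kWh, emissions in tCO2e) for a training run."""
    gpu_hours = num_gpus * hours
    energy_kwh = gpu_hours * GPU_POWER_KW * PUE
    emissions_tco2e = energy_kwh * KG_CO2E_PER_KWH / 1000  # kg -> tonnes
    return energy_kwh, emissions_tco2e


# Roughly Facebook's setup: 2,048 GPUs for ~5 months. The exact number of
# hours is my assumption; the paper only says "approximately 5 months".
energy_kwh, emissions_tco2e = training_footprint(num_gpus=2048, hours=5 * 30 * 24)
print(f"{energy_kwh:,.0f} kWh, {emissions_tco2e:,.0f} tCO2e")
```

Plugging in 2,048 GPUs running around the clock for five 30-day months gives roughly 3.2 GWh and about 1,250 tCO2e, somewhat above the 2,638 MWh and 1,015 tCO2e Facebook report, which suggests their effective GPU-hour count is lower than this naive assumption. The point is the shape of the calculation, not the exact figures.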
What’s really interesting about Facebook’s numbers is that they appear to account for all the GPU compute used in the entire model development process, not just the compute used to train the final version(s), which otherwise seems to be the norm.
Although the reported energy consumption covers the development of all four LLaMA models, it is still surprising that it exceeds the energy used to train GPT-3, which is estimated at 1,287 MWh [2].
The likely explanation, I venture, is that Facebook report the electricity consumption of the entire development process, including experimentation, failed runs, etc. The GPT-3 estimate, on the other hand, probably only covers the training of the final model: GPT-3 was reported to take “only” 14.8 days to train [4, table 4], whereas Facebook report spending approximately 5 months developing their models.
The screenshot below shows a table from Facebook’s paper [1]. For comparison purposes, the numbers for BLOOM and OPT are based on the same assumptions used to estimate LLaMA’s resource consumption. Luccioni et al. have, however, already provided the actual numbers for BLOOM in another paper [3].
Resource consumption transparency in LLM development – a new trend?
The LLaMA paper is not the only paper in which Facebook address the resource consumption and carbon footprint of their machine learning models, and Google’s PaLM paper also discloses the amount of resources used to train that model [6].
Some papers even address the life cycle carbon footprint of the models they describe.
In a paper called “Sustainable AI: Environmental Implications, Challenges and Opportunities” [5], Facebook estimate the life cycle carbon footprint of a number of their machine learning models.
And in a paper from November 2022 [3], the authors estimate the carbon footprint of training and running inference with the large language model BLOOM, and in doing so also consider the environmental impact of producing the hardware on which the model was trained.
These papers follow the publication of two papers from Google [2][4] which estimate the energy consumption and carbon footprint of a number of large language models, including GPT-3 and Evolved Transformer.
Let’s hope the trend continues and that OpenAI in particular will open up so we don’t have to guesstimate the electricity consumption and the carbon footprint of ChatGPT.
That’s it! I hope you enjoyed this post 🤞
Follow for more posts related to sustainable data science or subscribe to my email list to receive an email whenever I publish something.
I also write about time series forecasting like here or here.
And feel free to connect with me on LinkedIn.
References
[1] https://arxiv.org/pdf/2302.13971.pdf
[2] https://arxiv.org/ftp/arxiv/papers/2204/2204.05149.pdf
[3] https://arxiv.org/pdf/2211.02001.pdf
[4] https://arxiv.org/ftp/arxiv/papers/2104/2104.10350.pdf
[5] https://arxiv.org/pdf/2111.00364.pdf
[6] https://arxiv.org/pdf/2204.02311.pdf