Why and how of scaling large language models

Nicholas Joseph

PyTorch, January 4, 2022

Abstract

Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems. Over the past decade, the amount of compute used for the largest training runs has increased at an exponential pace. We’ve also seen in many domains that larger models attain better performance, following precise scaling laws. The compute needed to train these models can only be attained by using many coordinated machines that communicate data with one another. In this talk, Nicholas Joseph (Technical Staff, Anthropic) goes through why and how to scale up training runs to use these machines efficiently.