Drivers of large language model diffusion: incremental research, publicity, and cascades
December 21, 2022
Abstract
This article examines the diffusion of large language models, focusing on GPT-3-like models. The diffusion process began with incremental research, in which researchers made small modifications to existing methods to develop GPT-3-like models, none of which were publicly released. However, the open publication of OPT-175B in May 2022 made open release the prevailing diffusion mechanism, as OPT-175B was more accessible than previous GPT-3-like models. The article identifies several factors that have hindered or accelerated the diffusion of GPT-3-like models. Access to sufficient compute resources has been the primary obstacle, followed by the difficulty of acquiring the necessary machine learning and engineering expertise. On the other hand, publicity surrounding GPT-3's capabilities, sponsorship of compute resources, and the release of open-source tools for large-scale model training have significantly accelerated diffusion. The article also introduces the concept of a diffusion cascade, where the publication of artifacts relevant to a specific model (such as datasets, smaller models, specialized software tools, and method details) can accelerate the diffusion of the model itself. Finally, the article argues that while OpenAI's and DeepMind's decisions to delay publication of the GPT-3 and Gopher papers, respectively, likely slowed diffusion, future developments will likely see more closed publication practices and increased incentives for model theft, leading to a more challenging diffusion landscape. – AI-generated abstract.
