Trading Off Compute in Training and Inference

Pablo Villalobos

Epoch AI, July 28, 2023

Abstract

Several techniques induce a tradeoff between compute spent on training and inference for machine learning models. Adjusting model parameter count and training data size allows a 0.7 order-of-magnitude reduction in inference compute in exchange for a 1.2 order-of-magnitude increase in training compute. Monte Carlo Tree Search allows trading 1.6 orders of magnitude of inference compute for one order of magnitude of training compute at low performance levels, with the tradeoff reversing at higher performance levels. Pruning reduces inference compute by an order of magnitude at the cost of a 0.7 order-of-magnitude increase in training compute. Repeated sampling with filtering in generative models saves one order of magnitude of training compute at the cost of 1.5 orders of magnitude of additional inference compute. Combining these techniques yields tradeoffs spanning two to three orders of magnitude. Current large language models are optimized for low inference compute at the cost of higher training compute. Consequently, more capable, higher-inference-compute versions of deployed models likely exist but are not publicly available due to cost. This has implications for AI governance, particularly model evaluation and safety research, as it suggests the potential for faster, smaller-scale AI progress using augmented models. – AI-generated abstract.
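
The first tradeoff mentioned above (shrinking the model and overtraining it on more data) can be sketched with a Chinchilla-style scaling law. The sketch below is illustrative and not from the paper: it assumes the loss form L(N, D) = E + A/N^α + B/D^β with the constants fitted by Hoffmann et al. (2022), training compute ≈ 6ND FLOP, and inference compute ≈ 2N FLOP per generated token; the target loss of 2.1 is an arbitrary choice for the example.

```python
# Illustrative sketch (assumptions, not the paper's code):
# Chinchilla-style loss L(N, D) = E + A / N**ALPHA + B / D**BETA,
# with training compute ~ 6*N*D FLOP and inference compute ~ 2*N FLOP/token.
# Constants are the fits reported by Hoffmann et al. (2022).

E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def tokens_for_target_loss(n_params: float, target: float) -> float:
    """Training tokens needed for an n_params model to reach `target` loss."""
    gap = target - E - A / n_params**ALPHA
    if gap <= 0:
        return float("inf")  # this model size cannot reach the target at all
    return (B / gap) ** (1 / BETA)

target = 2.1  # arbitrary target loss, chosen for illustration
for n_params in [1e9, 2e9, 4e9, 8e9]:
    d = tokens_for_target_loss(n_params, target)
    train_flop = 6 * n_params * d   # total training compute
    infer_flop = 2 * n_params       # inference compute per generated token
    print(f"N={n_params:.0e}  D={d:.2e}  train={train_flop:.2e}  infer/token={infer_flop:.1e}")
```

Sweeping model size at a fixed target loss reproduces the qualitative pattern in the abstract: halving the parameter count halves per-token inference compute, but the smallest models need disproportionately more training data, and hence training compute, to reach the same loss.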
