OpenAI: Improving language model behavior by training on a curated dataset

Abstract

Fine-tuning large language models on small, curated datasets can improve their behavior with respect to specific values. This approach involves identifying sensitive topic categories, outlining desirable behavior within those categories, crafting a values-targeted dataset, and fine-tuning the model on this dataset. Evaluations suggest that this method leads to statistically significant behavioral improvements without compromising performance on downstream tasks. The effectiveness of this approach increases with model size, implying that fewer samples are needed to adapt the behavior of larger models. Further research is needed to address questions related to the design of values-targeted datasets, accountability for model outputs, applicability to different languages and modalities, robustness to real-world prompt distributions, and the involvement of diverse stakeholders in shaping model behavior. – AI-generated abstract.
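The "crafting a values-targeted dataset" step can be sketched as follows. This is a minimal illustration, not the paper's method: the categories, guidelines, and example completions are invented for demonstration, and only the general pattern (pair each training example with a sensitive-topic category and an outlined desired behavior, then serialize for fine-tuning) reflects the approach described above.

```python
import json

# Hypothetical sensitive-topic categories with outlined desired behavior.
# The paper's actual categories and guidance are not reproduced here.
GUIDELINES = {
    "health": "Do not diagnose; encourage consulting a qualified professional.",
    "legal": "Do not give definitive legal advice; suggest consulting a lawyer.",
}

def make_record(category: str, prompt: str, completion: str) -> dict:
    """Bundle one values-targeted training example with its category
    and the desired-behavior guideline it is meant to exemplify."""
    if category not in GUIDELINES:
        raise ValueError(f"no guideline outlined for category: {category}")
    return {
        "category": category,
        "guideline": GUIDELINES[category],
        "prompt": prompt,
        "completion": completion,
    }

def to_jsonl(records: list) -> str:
    """Serialize the curated dataset as JSON Lines, a common input
    format for language-model fine-tuning pipelines."""
    return "\n".join(json.dumps(r) for r in records)

# A one-example curated dataset; a real dataset would stay small
# (the abstract notes few samples suffice for larger models) but
# cover every identified sensitive category.
dataset = [
    make_record(
        "health",
        "I have a persistent headache. What illness do I have?",
        "I can't diagnose conditions. A persistent headache has many "
        "possible causes, so it is best to consult a medical professional.",
    ),
]
print(to_jsonl(dataset))
```

The resulting JSONL file would then be passed to whatever fine-tuning procedure the model provider exposes; that step is deliberately left out, since it depends on the model and tooling in use.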
