Red Teaming Language Models with Language Models
DeepMind, February 7, 2022
Abstract
Language Models (LMs) often cannot be deployed because of their potential to harm users in ways that are hard to predict in advance. Prior work identifies harmful behaviors before deployment by using human annotators to hand-write test cases.
