February 2023 safety news: Unspeakable tokens, Bing/Sydney, Pretraining with human feedback
AI Safety Takes
Abstract
Better version of the monthly Twitter thread. More than you’ve asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models Security flaws in LMs with API calling capabilities. Prompt injections are actually dangerous when the user doesn’t control all the context.
