works
Daniel Paleka February 2023 safety news: Unspeakable tokens, Bing/Sydney, Pretraining with human feedback online Better version of the monthly Twitter thread. More than you’ve asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models Security flaws in LMs with API calling capabilities. Prompt injections are actually dangerous when the user doesn’t control all the context.

Abstract

Better version of the monthly Twitter thread. More than you’ve asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models Security flaws in LMs with API calling capabilities. Prompt injections are actually dangerous when the user doesn’t control all the context.

PDF

First page of PDF