Thoughts on the Impact of RLHF Research
LessWrong, January 25, 2023
Abstract
In this post I’m going to describe my basic justification for working on RLHF in 2017-2020, which I still stand behind. I’ll discuss various arguments that RLHF research had an overall negative impac…
