Large Language Models Share Human Vulnerability: White Bear Phenomenon
Researchers develop a prompt-based attack method and defense strategies inspired by cognitive therapy techniques; the defenses mitigate attacks by up to 48.22%.
This is a Plain English Papers summary of a research paper called "Do not think pink elephant!".

Overview

Large AI models such as the text-to-image generators Stable Diffusion and DALL-E3 have raised expectations for general AI because they appear to exhibit capabilities akin to human intelligence. However, this paper shows that these recent models also share a vulnerability of human intelligence: the "white bear phenomenon." The researchers investigate the causes of this phenomenon...