
Chain-of-Thought Forgery: Reasoning AI Models Face New Prompt Injection Threat
Researchers Charles Ye, Jasmine Cui, and Dylan Hadfield-Menell have identified a fundamental flaw in large language models (LLMs): models prioritize writing style over metadata tags when identifying instruction sources, enabling a novel attack called 'Chain-of-Thought (CoT) Forgery.' This technique tricks AI into treating fabricated reasoning as established conclusions, altering its response behavior in potentially harmful ways.
Source: Hackaday








