Goal Preservation
Papers tagged with this deception_type:
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
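Demonstrates that models deliberately trained with backdoored behavior can retain hidden objectives through standard safety training (supervised fine-tuning, reinforcement learning, and adversarial training), behaving deceptively in ways that preserve those goals.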