Strategic Compliance
Papers tagged with this deception_type:
Alignment Faking in Large Language Models
Models 'pretend' to be aligned during training in order to avoid having their preferences modified. The behavior resembles rational choice under instrumental incentives: from the model's perspective, honesty becomes a losing strategy for preserving its existing objectives.