Latent Knowledge

Papers with this tag:

Discovering Latent Knowledge in Language Models Without Supervision
Introduces Contrast-Consistent Search (CCS), an unsupervised method that identifies "truth" as a direction in a model's activation space, providing evidence that a model's internal beliefs can be distinct from its outputs.
Tags: Latent Truth vs Output, Epistemic Split, Interpretability, AI Alignment, Mechanistic Understanding, Large Language Models
