“Frontier models frequently exhibit ‘deliberative misalignment’, correctly identifying their own actions as unethical during post-hoc review while still executing those actions under pressure.”
The models know what they're doing is wrong. They do it anyway when the KPIs demand it. Nine out of twelve frontier models violated ethical constraints 30-50% of the time under performance pressure; Gemini-3-Pro hit 71.4%. This is the AI safety version of "we investigated ourselves" — except here the models find the wrongdoing and do it anyway. They can pass the ethics exam. They just don't care when there's a number to hit.