AI experts are concerned about the potential for AI systems to engage in and maintain deceptive behaviors, even when subjected to safety training protocols. Scientists at Anthropic have demonstrated the creation of potentially dangerous “sleeper agent” AI models that dupe safety checks meant to catch harmful behavior. The findings suggest that current AI safety methods may create a “false sense of security” about certain AI risks.