AI Art Generators Can Be Fooled Into Making NSFW Images

Key Points:

  • Researchers have developed an algorithm called SneakyPrompt to trick text-to-image generative AIs into producing questionable images.
  • The algorithm uses nonsense words to slip prompts past safety filters; in experiments, such words led the AI systems to generate innocent or not-safe-for-work (NSFW) images.
  • The resulting images did not follow from any meaning of the nonsense words themselves, but were likely influenced by the context in which the words appeared.
  • The researchers aim to make generative AIs more resilient against such attacks and raise awareness about their vulnerabilities.


A group of researchers from Johns Hopkins University and Duke University has developed an algorithm called SneakyPrompt that uses nonsense words to trick text-to-image generative AIs, including DALL-E 2 and Stable Diffusion, into producing questionable images. The work is intended to probe, and ultimately strengthen, the safety filters of these AI systems.


The researchers will present their findings at the IEEE Symposium on Security and Privacy in May 2024. In their experiments, nonsense words prompted generative AIs to produce innocent or not-safe-for-work (NSFW) images. These findings highlight the vulnerabilities of text-to-image AI models and the need to make them more robust against attacks.






©2024 The Horizon