Microsoft’s new safety system can catch hallucinations in its customers’ AI apps

Key Points:

  • Prompt Shields feature to block prompt injections or malicious prompts
  • Groundedness Detection feature to find and block hallucinations
  • Safety evaluations to assess model vulnerabilities

Summary:

Microsoft’s Chief Product Officer of responsible AI, Sarah Bird, has unveiled new safety features designed to enhance Azure AI security without the need for specialized red teams. These tools, leveraging the power of LLM technology, aim to identify vulnerabilities, prevent plausible but unsupported outcomes, and block malicious prompts in real-time.

 

The introduced tools include Prompt Shields, which safeguard against malicious prompt injections, Groundedness Detection to thwart hallucinations, and safety evaluations to assess model vulnerabilities. These features are currently available in preview on Azure AI, with additional functionalities for directing safe outputs and tracking potentially problematic users in the pipeline.

 

The safety mechanisms scrutinize user inputs and third-party data for banned language or hidden prompts before processing them through the model. This stringent monitoring aims to prevent instances like the controversial generation of fake or historically inaccurate content seen in previous AI models.

 

Microsoft emphasizes the importance of tailored control in safety measures, allowing Azure customers to toggle hate speech and violence filters as they see fit. Furthermore, users can expect a forthcoming feature that reports individuals triggering unsafe model outputs, aiding administrators in distinguishing between genuine red team testers and malicious users.

 

Bird underscores that these safety features immediately apply to popular models like GPT-4 and Llama 2 within Azure’s model repertoire. However, users of less mainstream open-source models may need to manually integrate the safety tools with their systems..

DAILY LINKS TO YOUR INBOX

PROMPT ENGINEERING

Prompt Engineering Guides

ShareGPT

 

©2024 The Horizon