Google DeepMind unveils ‘superhuman’ AI system that excels in fact-checking, saving costs and improving accuracy

Key Points:

  • AI system outperforms human fact-checkers in evaluating the accuracy of information
  • The method, called Search-Augmented Factuality Evaluator (SAFE), breaks generated text into individual facts and checks each one against Google Search results
  • SAFE's assessments match human ratings 72% of the time and cost far less than human fact-checkers


Google’s DeepMind research unit has introduced a new method, the Search-Augmented Factuality Evaluator (SAFE), that uses artificial intelligence to assess the accuracy of information generated by large language models. SAFE breaks text down into individual facts and cross-references each one with Google Search results to determine whether it is correct. In the study, SAFE’s assessments matched human ratings 72% of the time, and in cases where SAFE and the human raters disagreed, SAFE’s verdict was judged correct 76% of the time.
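To make the "break down, then search" idea concrete, here is a minimal offline sketch of a SAFE-style loop. It is not DeepMind's implementation: the real system uses a language model to split text into facts, issue Google Search queries, and reason over the results. Here, naive sentence splitting, an in-memory dictionary standing in for Google Search (`FAKE_SEARCH_INDEX`), and a verbatim substring check are assumptions chosen so the example runs without network access. All function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for Google Search: keyword -> result snippets.
# SAFE itself queries Google Search; this offline dict is an assumption.
FAKE_SEARCH_INDEX = {
    "eiffel tower": ["The Eiffel Tower is located in Paris, France."],
    "python": ["Python was created by Guido van Rossum."],
}


@dataclass
class FactRating:
    fact: str
    supported: bool


def split_into_facts(text: str) -> list[str]:
    """Naively treat each sentence as one atomic fact.

    SAFE uses a language model for this step; sentence splitting is a crude substitute.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s]


def fake_search(fact: str) -> list[str]:
    """Return snippets whose keyword appears in the fact (hypothetical helper)."""
    lowered = fact.lower()
    results: list[str] = []
    for keyword, snippets in FAKE_SEARCH_INDEX.items():
        if keyword in lowered:
            results.extend(snippets)
    return results


def rate_fact(fact: str) -> FactRating:
    """Mark a fact as supported if a retrieved snippet contains it verbatim.

    Case and trailing punctuation are ignored. This is a deliberately simple
    check; SAFE instead has a language model reason over the search results.
    """
    needle = fact.lower().rstrip(".!?")
    snippets = fake_search(fact)
    supported = any(needle in s.lower() for s in snippets)
    return FactRating(fact, supported)


def evaluate(text: str) -> list[FactRating]:
    """Split a response into facts and rate each one independently."""
    return [rate_fact(f) for f in split_into_facts(text)]


if __name__ == "__main__":
    ratings = evaluate("The Eiffel Tower is located in Paris. Python was created by Larry Wall.")
    for r in ratings:
        print(r.supported, r.fact)
    # True The Eiffel Tower is located in Paris.
    # False Python was created by Larry Wall.
```

Rating each fact separately is what lets a single long response be scored fact by fact, rather than being judged right or wrong as a whole.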


While the paper asserts that large language model agents can surpass human fact-checkers, some experts, such as AI researcher Gary Marcus, question what “superhuman” means here. Marcus argues that a meaningful comparison would benchmark SAFE against expert human fact-checkers rather than crowdsourced workers.


Despite these concerns, SAFE is far cheaper than human fact-checking: the researchers found it to be 20 times cheaper than human annotators, an advantage that grows in importance as the volume of language model output increases. The DeepMind team also used SAFE to evaluate leading language models and found that larger models generally made fewer factual errors. Even so, the risk of misinformation underscores the need for reliable, scalable fact-checking tools.
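One simple way to compare models once a SAFE-style checker has labelled their facts is to rank them by the share of facts that were supported. The sketch below assumes per-model counts of supported and unsupported facts; the `ModelCounts` structure and function names are hypothetical. The paper's own metric also factors in how many supported facts a response provides, so precision alone is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ModelCounts:
    """Per-model totals of facts rated by a SAFE-style checker (hypothetical structure)."""
    name: str
    supported: int
    not_supported: int


def factual_precision(c: ModelCounts) -> float:
    """Share of a model's checked facts that were supported by search evidence."""
    total = c.supported + c.not_supported
    return c.supported / total if total else 0.0


def rank_models(counts: list[ModelCounts]) -> list[tuple[str, float]]:
    """Return (name, precision) pairs, most factually precise first."""
    scored = [(c.name, factual_precision(c)) for c in counts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented toy counts for illustration only, not figures from the study.
    toy = [ModelCounts("model-small", 60, 40), ModelCounts("model-large", 90, 10)]
    print(rank_models(toy))  # [('model-large', 0.9), ('model-small', 0.6)]
```

Returning 0.0 for a model with no checked facts avoids a division by zero; whether such a model should instead be excluded is a design choice left open here.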


The release of SAFE’s code and the LongFact dataset on GitHub allows other researchers to scrutinize and build on the work. However, more transparency about the human baselines used in the study is needed to judge how well SAFE would perform in real-world scenarios.


As language models continue to advance, automated fact-checking tools like SAFE may play a crucial role in ensuring accountability and trust in these systems. Progress will depend on open development with input from diverse stakeholders, along with rigorous benchmarks against human experts.






©2024 The Horizon