Audio-jacking: Using generative AI to distort live audio transactions

Key Points:

  • Threat actors are increasingly exploiting large language models for phishing and scams
  • Generative AI can be used to silently manipulate live conversations
  • Combining different types of generative AI enables more sophisticated attacks


Recent research has uncovered alarming scenarios in which large language models (LLMs) are used to silently hijack live conversations, altering critical details such as financial information without the speakers' knowledge. This sophisticated attack, which combines several generative AI technologies, shows how malicious actors could control and distort communication, turning victims into unwitting puppets.


By chaining speech-to-text, an LLM, text-to-speech, and voice cloning, threat actors can modify a live conversation dynamically and replace keywords in real time, so that information shared during a call is altered without authorization. One striking demonstration replaced a genuine bank account number with a fake one, highlighting the potential financial consequences of such attacks. Although technically complex, this attack method proves alarmingly easy to execute and poses significant risks, especially in scenarios involving financial transactions, sensitive data, or critical decisions.
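
To make the keyword-replacement step concrete, here is a minimal text-only sketch of what happens to a single transcribed segment between the speech-to-text and text-to-speech stages. It is an illustration under stated assumptions, not the researchers' implementation: a regular expression stands in for the LLM's keyword detection, and the names ACCOUNT_PATTERN and rewrite_segment, the "account" trigger word, and the 8-to-12-digit format are all hypothetical.

```python
import re

# Assumption: an account number is a run of 8-12 digits, optionally
# separated by single spaces or dashes. Real formats vary.
ACCOUNT_PATTERN = re.compile(r"\b\d(?:[ -]?\d){7,11}\b")


def rewrite_segment(transcript: str, substitute: str) -> tuple[str, bool]:
    """Return the (possibly altered) transcript and whether it changed.

    The segment is only touched when it mentions the trigger word
    "account"; every other segment passes through unmodified, which is
    why the rest of the conversation sounds normal to both speakers.
    """
    if "account" not in transcript.lower():
        return transcript, False
    altered, count = ACCOUNT_PATTERN.subn(substitute, transcript)
    return altered, count > 0
```

In this sketch, only segments that mention the trigger word and contain a digit run are changed. That selectivity is what makes the alteration hard to notice: nearly all of the call is relayed exactly as spoken.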


The development of generative AI capabilities has lowered the barrier to creating sophisticated attacks, enabling threat actors to blend seamlessly into conversations and manipulate information without detection. Because voice cloning and text-to-speech technologies are easy to use, attackers can create authentic-sounding voices to deceive victims further. Some limitations remain, such as latency and voice cloning accuracy, but these hurdles are not insurmountable, which underscores the urgency for organizations and individuals to strengthen their security measures against evolving threats.


As we navigate this era of distorted realities and potential censorship through AI manipulation, it becomes imperative to adopt practices that safeguard against silent hijacks and malicious alterations in communication. Recommendations include scrutinizing language closely in sensitive conversations, relying on evolving security technologies to detect deepfake audio, adhering to cybersecurity best practices, and using trusted devices and services to limit attack vectors. By staying proactive, informed, and security-conscious, individuals and organizations can better defend against the growing risks posed by advanced generative AI attacks.
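
One way to make scrutiny of sensitive details concrete is to compare what was heard on a call against a reference copy obtained through a separate trusted channel. The sketch below assumes such a reference exists (for example, a number from a signed document or a verified portal); the function names normalize_digits and matches_trusted_reference are hypothetical, and the check only complements the recommendations above.

```python
import re


def normalize_digits(value: str) -> str:
    """Strip everything except digits so spacing and dashes do not matter."""
    return re.sub(r"\D", "", value)


def matches_trusted_reference(heard: str, trusted: str) -> bool:
    """Return True only if the number heard on the call equals the
    reference received through a separate, trusted channel.

    An empty reference never matches, so a missing reference cannot be
    mistaken for a successful verification.
    """
    heard_digits = normalize_digits(heard)
    trusted_digits = normalize_digits(trusted)
    return bool(trusted_digits) and heard_digits == trusted_digits
```

A mismatch does not prove that a call was tampered with, but it is a cheap signal to pause the transaction and confirm the details another way.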






©2024 The Horizon