Alibaba’s new AI system ‘EMO’ creates realistic talking and singing videos from photos

Key Points:

  • The EMO system can animate a single portrait photo and generate lifelike videos of a person talking or singing.
  • It utilizes a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks.
  • The system is built on a diffusion model and was trained on a dataset of more than 250 hours of talking head videos.


Alibaba’s Institute for Intelligent Computing has unveiled EMO (Emote Portrait Alive), an artificial intelligence system that brings portrait photos to life with realistic animation. Described in a research paper on arXiv, EMO generates lifelike facial movements and head poses synchronized with an audio track, addressing a long-standing challenge in audio-driven talking head video generation.


Lead author Linrui Tian points to the limitations of traditional methods, which struggle to capture the full range of human expressions and individual facial styles. The EMO framework instead uses a direct audio-to-video synthesis approach that does not rely on intermediate 3D models or facial landmarks, with the aim of producing more emotionally expressive animated portraits.


EMO is powered by a diffusion model, a class of generative AI known for producing highly realistic synthetic visuals, and was trained on a dataset of more than 250 hours of talking head videos drawn from a variety of media sources. Unlike prior techniques that use 3D face models or blend shapes to simulate facial movements, EMO translates audio waveforms directly into video frames, which allows it to depict natural speech patterns and the facial characteristics specific to each individual.
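
To make the idea of audio-conditioned diffusion more concrete, here is a minimal toy sketch in Python. It is not EMO's implementation: the paper's network is not reproduced here, and every name below (make_alpha_bars, audio_to_target, toy_noise_predictor, sample_frame) is hypothetical. The trained neural network is replaced by a stand-in "oracle" that already knows the answer, and a "frame" is just a short list of numbers. The sketch only shows the general shape of the process, assuming a standard linear noise schedule and deterministic DDIM-style updates: start from random noise and repeatedly denoise it, with each step conditioned on audio features.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def audio_to_target(audio_features):
    """Hypothetical mapping from audio features to 'pixel' values (assumption)."""
    return [0.5 * f for f in audio_features]

def toy_noise_predictor(noisy_frame, audio_features, alpha_bar):
    """Stand-in for a trained network: an oracle that predicts the noise.

    A real system would learn this function from video data; here we
    cheat by computing the noise from a known target frame.
    """
    target = audio_to_target(audio_features)
    scale = math.sqrt(1.0 - alpha_bar)
    return [(x - math.sqrt(alpha_bar) * y) / scale
            for x, y in zip(noisy_frame, target)]

def sample_frame(audio_features, num_steps=50, seed=0):
    """Denoise from pure noise to a frame, conditioned on audio features."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    frame = [rng.gauss(0.0, 1.0) for _ in audio_features]  # start from noise
    for t in reversed(range(num_steps)):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = toy_noise_predictor(frame, audio_features, a_t)
        # Estimate the clean frame, then step to the next lower noise level.
        x0 = [(x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
              for x, e in zip(frame, eps)]
        frame = [math.sqrt(a_prev) * c + math.sqrt(1.0 - a_prev) * e
                 for c, e in zip(x0, eps)]
    return frame
```

Calling sample_frame([0.2, 0.8, -0.4]) returns a three-value "frame" determined by the audio features rather than by the random starting noise. That is the property an audio-driven generator needs, although a real model has to learn the noise predictor from data instead of being handed the target.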


Alibaba’s research on EMO marks a notable advance in AI-generated video animation, with potential applications in creative work and other fields.






©2024 The Horizon