Open source AI voice cloning arrives with MyShell’s new OpenVoice model

Key Points:

  • OpenVoice, a new open-source voice cloning technology developed by researchers at MIT, Tsinghua University, and Canadian AI startup MyShell, offers near-instantaneous voice cloning from short audio clips, with granular control over emotion, accent, rhythm, pauses, and intonation. Its creators position it as more accessible and more precise than existing voice cloning tools.
  • The accompanying scientific paper describes a two-part approach, a text-to-speech model paired with a tone converter model, which requires significantly less compute than other methods, including Meta’s Voicebox. MyShell’s platform has already attracted over 400,000 users and significant investment.
  • Although OpenVoice itself is open source, MyShell monetizes its platform through monthly subscriptions for users and third-party bot creators and by charging for AI training data, positioning itself as a comprehensive platform for discovering, creating, and staking AI-native apps.

Summary:

OpenVoice, developed by researchers at MIT, Tsinghua University, and Canadian AI startup MyShell, is disrupting the market with open-source voice cloning. The technology clones a voice almost instantly from a short audio clip and gives granular control over emotion, accent, rhythm, pauses, and intonation.

MyShell has released a research paper, not yet peer reviewed, detailing the development of OpenVoice, and has made the model available to try through the MyShell web app and Hugging Face. The model generates convincing voice clones rapidly, without the need for specific text prompts. Users can also adjust the style of the output, changing the tone to convey emotions such as cheerful, sad, friendly, or angry.

In their paper, the creators of OpenVoice describe a two-part approach: a text-to-speech (TTS) model and a tone converter model. The TTS model, trained on 30,000 sentences of audio from diverse speakers, controls style parameters, language, intonation, rhythm, and pauses. The tone converter, trained on over 300,000 audio samples from various speakers, then reproduces the user’s voice on top of the TTS output while keeping the emotional expression the TTS model set. The authors describe the approach as conceptually simple yet effective, and it uses significantly less compute than other methods.

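The two-stage design described above can be illustrated with a toy Python sketch. Everything in it is hypothetical: the names `Utterance`, `base_tts`, `extract_tone_color`, and `convert_tone`, the lists of floats standing in for audio, and the "subtract one speaker embedding, add another" rule are assumptions made for illustration, not OpenVoice’s actual code, API, or model. What it tries to show is the decoupling: the style is fixed in the first (TTS) stage, and only the second stage looks at the reference clip, so the cloned output keeps the requested emotion.

```python
"""Toy sketch of a decoupled, two-stage voice-cloning pipeline.

HYPOTHETICAL: none of these names come from OpenVoice. Audio is reduced to
short lists of floats so the data flow is easy to follow.
"""
from dataclasses import dataclass

# Hypothetical style offsets that the toy TTS stage applies to its features.
STYLE_OFFSETS = {"neutral": 0.0, "cheerful": 1.0, "sad": -1.0}


@dataclass
class Utterance:
    text: str
    style: str               # chosen in stage 1 (TTS)
    features: list[float]    # stand-in for acoustic features
    tone_color: list[float]  # stand-in for a speaker's tone color embedding


def base_tts(text: str, style: str) -> Utterance:
    """Stage 1 (hypothetical): a fixed base speaker says `text` in `style`."""
    offset = STYLE_OFFSETS.get(style, 0.0)
    features = [ord(ch) / 100 + offset for ch in text]
    return Utterance(text, style, features, tone_color=[0.5, 0.5])


def extract_tone_color(reference_clip: list[float]) -> list[float]:
    """Hypothetical embedding: mean and range of a short reference clip.

    It takes no transcript, only the clip's samples.
    """
    mean = sum(reference_clip) / len(reference_clip)
    spread = max(reference_clip) - min(reference_clip)
    return [mean, spread]


def convert_tone(utt: Utterance, target: list[float]) -> Utterance:
    """Stage 2 (hypothetical): swap tone color, keep text and style.

    Toy rule: remove the source speaker's embedding and add the target's,
    averaged into a single shift. A real converter is a learned model.
    """
    diffs = [t - s for t, s in zip(target, utt.tone_color)]
    shift = sum(diffs) / len(diffs)
    shifted = [f + shift for f in utt.features]
    return Utterance(utt.text, utt.style, shifted, tone_color=list(target))


if __name__ == "__main__":
    reference = [0.0, 1.0, 0.5, 0.5]        # a "short reference clip"
    speech = base_tts("hello", "cheerful")  # style is decided here
    clone = convert_tone(speech, extract_tone_color(reference))
    print(clone.style, clone.tone_color)
```

In this sketch, changing the reference clip alters only `tone_color` and the shifted features, while `text` and `style` pass through stage 2 untouched. The linear shift merely stands in for a learned converter.
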
MyShell, a Canadian startup founded in 2023, has already attracted over 400,000 users and substantial investment. Besides OpenVoice, its web app offers a range of text-based AI characters and bots with different “personalities,” an animated GIF maker, and user-generated text-based RPGs. Although OpenVoice is open source, MyShell monetizes its platform through monthly subscriptions for users and third-party bot creators and by charging for AI training data. This business model positions MyShell as a comprehensive platform for discovering, creating, and staking AI-native apps.

©2024 The Horizon