Xiaomi open-sources OmniVoice, a voice cloning TTS model covering more than 600 languages.

PANews reported on May 7 that Xiaomi AI Labs has released OmniVoice, a multilingual voice cloning TTS model. Built on a minimalist architecture consisting of a single bidirectional Transformer, it supports speech synthesis in 646 languages and outperforms mainstream models in both Chinese and English in synthesis quality and inference speed. The model was trained on approximately 580,000 hours of data from 50 open-source datasets, with a dynamic upsampling strategy applied to low-resource languages. In tests covering 24 and 102 languages, its speech similarity and intelligibility surpass those of many commercial systems, with some metrics approaching or even exceeding those of real speech. OmniVoice supports cross-lingual voice cloning, custom timbres, adaptation to noisy reference audio, paralinguistic control, and pronunciation correction. The training and inference code, along with the model weights, has been open-sourced on platforms such as GitHub and Hugging Face.

Author: PA一线

This content is for market information only and is not investment advice.
