When will we get "Stable Diffusion for voice"?
Never closes
We already have it
Later this year (2024)
2025
2026
2027 or later

I understand that modern text-to-speech (TTS) uses AI to generate voices. But current TTS models are usually trained on a single speaker and can only imitate the voice of the person they were trained on. If you want a different voice, you have to use a different model, trained on a different speaker.

As far as I can tell, there's no TTS software that can mix and match voices to create new voices that aren't in its training set. For example, I'd like to be able to ask a model for a "female chainsmoker with an Australian accent" and have it generate speech in that voice, even though there were no female chainsmoking Australians in its training data (there would of course be women, chainsmokers, and Australians in the training set, but not the intersection of all three).

I expect this technology will arrive soon, or perhaps it already has! Let me know if you have any leads.

© Manifold Markets, Inc.