This question resolves "yes" if there is a text-2-audio tool that produces sound which:
generates imitations of real recordings, i.e. it may, but must not solely, generate
musical sounds, and/or
voice sounds, and/or
synth-like sounds;
will, with regard to its quality, be compared to current state-of-the-art text-2-image generators.
I rendered this song using Synthesizer V Pro, hear for yourself:
https://www.youtube.com/watch?v=nDuk47gW-_E&feature=youtu.be
This is a tool that generates a human voice that sings like a human voice. You put in the notes and the phonemes, it does the rest. This isn't just synthesizer "oohs" and "aaahs" either, it's full lyrical pronunciation.
It's getting closer:
https://twitter.com/FelixKreuk/status/1575846953333579776?s=20&t=kdJwocVEAtAnnyQWUwjv0A
@FelixKreuk
We present "AudioGen: Textually Guided Audio Generation"! AudioGen is an autoregressive transformer LM that synthesizes general audio conditioned on text (Text-to-Audio).
Paper: https://tinyurl.com/audiogen-text2audio/paper.pdf…
Samples: https://tinyurl.com/audiogen-text2audio…
Code & models - soon!
----
https://twitter.com/_akhaliq/status/1582825597059104769
Mubert-Text-to-Music
Colab notebooks demonstrating prompt-based music generation via the Mubert API. GitHub: https://github.com/MubertAI/Mubert-Text-to-Music
Yes. The datasets to enable this already exist (AudioSet is a good start, and web scraping will also do plenty) and the tech is clearly there. See for example this 2022 "audio captioning" contest: https://dcase.community/challenge2022/task-automatic-audio-captioning-and-language-based-audio-retrieval (the tasks are not the same as generation, but they demonstrate related ideas)