
Based purely on my subjective opinion: by the end of next year, will any foundation model (e.g. a top AI for general reasoning) be able to consistently produce good songs that make the songs currently generated by services like Suno, Udio, and ElevenLabs seem like an inferior product? (The musical prowess of future music-specialist AIs is irrelevant to this question.)
Current music-generating models have a variety of flaws, such as muddy vocals, inconsistent lyrical flow, and weak outros. Cherry-picking helps a lot, but I still prefer human music. I can imagine a future where next-gen versions of Gemini, Llama, etc. are able to outclass current music specialists by virtue of scale. GPT-4o can already sing (albeit poorly). But I can also imagine that no frontier lab finds it worth training their multimodal generalist on music, and thus the specialist AIs remain >18 months ahead. Help me see the future! 🔮
I won't bet.