By the end of 2025, will general AIs like GPT-4o make better music than specialist AIs like Udio did in the spring of 2024?

Based purely on my subjective opinion, will any foundation model (i.e. a top AI for general reasoning), at the end of next year, be able to consistently produce good songs that make the songs currently generated by services like Suno, Udio, and ElevenLabs seem like an inferior product? (The musical prowess of future music-specialist AIs is irrelevant to this question.)

Current music-generating models have a variety of flaws, such as muddy vocals, inconsistent lyrical flow, and weak outros. Cherry-picking helps a lot, but I still prefer human music. I can imagine a future where next-gen versions of Gemini, Llama, etc. are able to outclass current music specialists by virtue of scale. GPT-4o can already sing (albeit poorly). But I can also imagine that no frontier lab finds it worth training their multimodal generalist on music, and thus the specialist AIs remain >18 months ahead. Help me see the future! 🔮

I won't bet.


What counts as a general AI making music? If you can ask ChatGPT to make a song and it prompts the new music model that OpenAI developed to produce a song, does that count? Or would it have to be an end-to-end network like the new GPT-4o model?

@Nat Good question. I'd say that it has to be integrated into the general model such that, AFAICT, the general model's internal representation (e.g. the residual stream in a transformer) is involved in crafting the music, rather than the general model merely prompting a sub-AI. GPT-4o is integrated. DALL-E is not.

For reference, here are a few of my top picks of AI-generated songs, selecting solely for general quality:

@MaxHarms If a generalist AI in January 2026 can consistently produce songs this good without cherry-picking (or better songs with cherry-picking), I will resolve YES.
