03-26-2024, 11:32 AM
A ChatGPT for Music Is Here. Inside Suno, the Startup Changing Everything -- Suno wants everyone to be able to produce their own pro-level songs — but what does that mean for artists?
Afghanistan is sometimes called the graveyard of empires, and for good reason. My prediction: Music will turn out to be the Afghanistan of AI. When it comes to music, the tastes of the public are outrageously arbitrary and bespoke. If you've ever skimmed through the genres of modern music, the sheer number of them is truly staggering. That's not to say that an AI song-generator couldn't imitate all of those genres. Of course it can, if it is given enough training data; current-generation AI has thoroughly proved that. But music, even more than images and video, is a medium that is truly intangible. You can't see music. You can't even really visualize it, not in its essence. You can't touch it. You can't weigh it. You can't really apply mathematical reasoning to it, except in some theoretical sense that is not directly connected to the essence of what people care about in music (the aforementioned intangibles).
My challenge to anyone who thinks that "AI will solve music" -- that is, that AI is going to write music that people generally prefer to human-created music -- is this: explain to me the mathematical theory of melody. Not harmony. Not chord progressions. Melody. Explain to me why and when a melody should move up or down, or even stay the same. Explain to me why it should go fast or slow, why it should be in 2-time, 3-time, 4-time, 6-time, and so on. You don't even have to spell out the details, just point me to the body of theory that explains this. There is nothing in the body of music theory itself that explains why a melody is the way it is. There are principles, no doubt. There are known reasons why certain things work especially well. But there is no general theory, not even a framework of a theory.
And that has important implications for AI. In the case of text, images and video, there are very general mathematical theories that explain them, that is, explain their encoded structure. Text might seem random at first, until you realize that words occur in patterns, and those patterns have structures. And while music also has patterns, and those patterns also have structure, we go back to the intangibility of these structures. Part of what gives a melody a certain "feel" is how common or widespread the elements from which it is constructed are. If you use very simple intervals like fourths, fifths and steps, your melody can have a simplistic or youthful vibe, like Twinkle, Twinkle Little Star. Or a melody may make heavy use of chromatic and chaotic elements, or have almost no discernible structure at all, yet still seem compelling. Music is not objective in the way that images and video are, nor does it communicate definite ideas, as language does. So there is no definite "target" to shoot at when training AI neural nets, and any choice that is made in the training dataset is really just an arbitrary stricture that will dye or fingerprint the resulting neural net in a way that listeners are going to notice.
Trying to "please everybody" won't work, either, because music is inherently biased. That is, part of what makes any particular genre/style so compelling is what it doesn't do. The pentatonic scale is a great example of this. A pentatonic scale leaves out two of the seven notes of the diatonic scale. The characteristic "sound" of the pentatonic scale comes from not using those two notes. As soon as you add those "missing" notes back into the scale, that characteristic sound is gone. "Music is sound painted on a canvas of silence" -- in music, what you don't do is just as important as what you do, sometimes even more important.
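To make the pentatonic point concrete, here is a minimal Python sketch. It assumes the major pentatonic specifically, derived from the major scale by dropping the 4th and 7th degrees (other pentatonic scales exist), and the function names are just illustrative. It shows one commonly cited consequence of the omission: the two semitone steps of the major scale disappear.

```python
# Semitone offsets from the tonic for the major (diatonic) scale.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def major_pentatonic(scale=MAJOR_SCALE):
    """Drop the 4th and 7th degrees of the major scale (assumed derivation)."""
    omitted_degrees = {4, 7}  # 1-based scale degrees
    return [s for degree, s in enumerate(scale, start=1)
            if degree not in omitted_degrees]


def semitone_steps(scale):
    """Return adjacent note pairs one semitone apart, including the step up to the octave."""
    extended = scale + [scale[0] + 12]
    return [(a, b) for a, b in zip(extended, extended[1:]) if b - a == 1]


def names(scale, tonic=0):
    """Map semitone offsets to note names, starting from the given tonic (0 = C)."""
    return [NOTE_NAMES[(tonic + s) % 12] for s in scale]


if __name__ == "__main__":
    pentatonic = major_pentatonic()
    print(names(MAJOR_SCALE), semitone_steps(MAJOR_SCALE))
    print(names(pentatonic), semitone_steps(pentatonic))
```

In C major the omitted notes are F and B. With them gone, no two neighboring notes of the scale are a semitone apart, which is one way to put a number on "what it doesn't do."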