On one hand this is impressive, and I've been wondering when something like this would appear. On the other hand, I am -- like others here have expressed -- saddened by the impact this has on real musicians. Music is human, music theory is deeply mathematical and fascinating -- "solving" it with a big hammer like generative AI is rather unsatisfying.
The other very real aspect here is that "training data" has to come from somewhere, and the copyright implications of this are far from solved.
In the past I worked on real algorithmic music composition: an algorithmic sequencer paired with hardware or software synthesizers. I could give it feedback and it'd evolve the composition, all without training data. It was computationally cheap, didn't infringe anyone's copyright, and a human still had very real creative influence (which instruments, scale, tempo, etc.). Message me if anyone's still interested in "dumb" AI like that. :-)
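For readers wondering what a feedback-driven sequencer without training data might look like, here is a minimal sketch. It is not the commenter's actual system, which isn't described in detail. The scale, the mutation rate, and the `smoothness` function (a stand-in for a human listener's rating) are all assumptions made for illustration.

```python
import random

C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI note numbers (assumed scale)

def random_pattern(length, scale, rng):
    """Start from a random sequence of notes drawn from the scale."""
    return [rng.choice(scale) for _ in range(length)]

def mutate(pattern, scale, rng, rate=0.25):
    """Copy the pattern, re-rolling each step with probability `rate`."""
    return [rng.choice(scale) if rng.random() < rate else note for note in pattern]

def evolve(pattern, scale, rate_fn, rounds, rng):
    """Keep a mutated variant only when the listener rates it at least as high."""
    best, best_score = pattern, rate_fn(pattern)
    for _ in range(rounds):
        candidate = mutate(best, scale, rng)
        score = rate_fn(candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best

def smoothness(pattern):
    """Hypothetical stand-in for human feedback: prefer small melodic steps."""
    return -sum(abs(a - b) for a, b in zip(pattern, pattern[1:]))

rng = random.Random(0)
start = random_pattern(8, C_MAJOR, rng)
result = evolve(start, C_MAJOR, smoothness, rounds=50, rng=rng)
print(start, "->", result)
```

In a real setup `rate_fn` would be the human pressing "better" or "worse", and the note list would be sent to a synthesizer rather than printed.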
Computer-assisted music is nothing new, but taking away the creativity completely is turning music into noise -- noise that sounds like music.
> "solving" it with a big hammer like generative AI is rather unsatisfying.
The reason is greed. They jump on the bandwagon to get rich, not to bring art. They don't care about long term effects on creativity. If it means that it kills motivation to create new music, or even learn how to play an instrument, that's fine by these people. As long as they get their money.
Anyone with ears can find music satisfying. You don't need an artist's backstory or blessing for that. By all means use slow AI to get the same point fast AI can get to, but don't ask me to value it differently.
I don't see any contact info in your profile, but I have an email in mine. I am interested in hearing more about your process and if you have music for sale anywhere, I like to support electronic artists doing interesting stuff.
Interesting that Suno et al miss out on the obvious problem that actual musicians need extra musicians for their own projects.
For instance, a guitarist will have a track they wish they had vocals (and lyrics) for, and if they could pay for that they would.
Literally: highlight a section of a tune in your DAW, prompt it, and get vocals + lyrics generated, possibly with different versions or harmonies for existing parts, etc. Musicians already pay for plugins, but the singing ones are awful to use so far.
We're super interested in working on this (and melody conditioning) and even have some of the code written to generate the training data, but we want our base model to get a bit better before this becomes our main focus. Check back in a few months!
I really wish this trend of prompting gen AI models with text would stop. It's really meaningless. Musicians need gen AI they can prompt with a melody on their keyboard. Or a bit of whistling into the microphone. Or a beat they can tap on the table. That is what allows humans to unleash their creativity. Not AI generating random bits that fit a distribution of training data. English language is not the right input for anything except for information retrieval tasks.
> Not AI generating random bits that fit a distribution of training data
How is that specific to text prompting? If you tap your fingers to a model and it generates a song from your tapping, it's still just fitting the training data as you say.
Why is it "music for developers"? I was expecting one of those Lofi music videos designed to enhance concentration or similar. These are typically instrumentals, ostensibly because they are less distracting, something like this:
it's because the thing we're launching today is an API for developers to use. If you want instrumental type stuff you should check out my bossa nova channel: https://sonauto.ai/radio
One thing I've been thinking about is how to do a better hobbyist plan system. It would be cool to do a flat rate unlimited plan, but we wouldn't want that to then be abused by larger customers/companies. Are there existing API providers you think solve this particularly well?
I don't think it meets your ask of "solve this particularly well" but the unlimited plans in video that I am familiar with have a fast/slow queue system. This effectively limits the plan. It seems, as well, that these kind of queue systems are tiered. So you can have N number of fast queued items, X number of tier one slow queue, Y number of tier two slow queue, etc. On the backend this is probably just some kind of weighted priority queue where the number of requests in some time duration determines some weight scaling factor.
I think this is a good start, X high speed queries per hour then unlimited low-priority ones after. Do you know of any specific companies that do this we could take a look at?
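To make the fast/slow idea concrete, here is a rough sketch of one way it could work: each user gets a number of full-priority requests per sliding hour, and anything beyond that is still accepted but sorts behind everyone who is under quota. The class name, the quota, and the "priority equals overuse count" rule are all assumptions for illustration, not how any particular provider does it.

```python
import heapq
import itertools
from collections import defaultdict, deque

class TieredQueue:
    """Fast lane for the first `fast_per_hour` requests per user, slow lane after."""

    def __init__(self, fast_per_hour=10, window_s=3600.0):
        self.fast_per_hour = fast_per_hour
        self.window_s = window_s
        self._recent = defaultdict(deque)  # user -> timestamps of recent requests
        self._heap = []                    # (priority, arrival order, user, job)
        self._order = itertools.count()    # FIFO tie-breaker within a priority

    def submit(self, user, job, now):
        """Queue a job; returns its priority (0 = fast lane, higher = slower)."""
        recent = self._recent[user]
        # Forget requests that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        # Priority 0 while under quota; beyond it, priority grows with overuse.
        overuse = max(0, len(recent) + 1 - self.fast_per_hour)
        recent.append(now)
        heapq.heappush(self._heap, (overuse, next(self._order), user, job))
        return overuse

    def pop(self):
        """Return (user, job) for the next request to run, or None if empty."""
        if not self._heap:
            return None
        _, _, user, job = heapq.heappop(self._heap)
        return user, job

q = TieredQueue(fast_per_hour=2)
for i, t in enumerate([0.0, 1.0, 2.0]):
    q.submit("studio", f"track-{i}", now=t)  # third request exceeds the quota
q.submit("hobbyist", "demo", now=3.0)        # under quota, so it jumps ahead
print([q.pop() for _ in range(4)])
```

Because heavy users are never rejected, only deprioritized, the plan stays "unlimited" while a large customer hammering the API mostly ends up waiting behind lighter users.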
Remember that you’ve also got a nice natural limitation here: if it’s a hobbyist and not a (commercial) API consumer, there’s only so fast they can listen to the output. Even if they’re rapidly tweaking knobs in a DAW, you can use the play/pause signal to help prioritize the queue, depending on how expensive it is to serialize the GPU state and rehydrate it again. You also might not need to complete generation until the user reaches the play point, so you can shuffle around the queue a lot. For example, if the user skips after ten seconds you might not need to generate the rest until they try to play that track again, and when they do you usually have enough time before they reach the previous stopping point to generate some more sections.
It might also be helpful to come up with some ways to segregate customers so that “prosumer” users get faster “cold starts” (so that they can iterate faster) at the expense of sometimes having to wait for generation to start back up again.
E.g., in the case of a future "LibreMusic" open source UI or an integration into their DAW they work with on the weekends. I'd get pretty annoyed if I had to keep putting a coin in the machine to adjust Logic Pro effects.
I'm not sure to what extent AI music is copyrightable (I think it depends on a case-by-case amount of human influence) but our TOS assigns any rights we may have to the user.
8. OUTPUT
As between You and the Services, and to the extent permitted by applicable law,
You own any right, title, or interest that may exist in the musical and/or audio
content that You generate using the Services ("Outputs"). We hereby assign to
You all our right, title, and interest, if any, in and to Your Outputs.
This assignment does not extend to other users' Outputs, regardless of similarity
between Your Outputs and their Outputs.
You grant to us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive,
transferable, royalty-free, fully-paid, worldwide license to use Your Output to
provide, maintain, develop, and improve the Services, to comply with applicable
law, and/or to enforce our terms and policies.
You are solely responsible for Outputs and Your use of Outputs, including ensuring
that Outputs and Your use thereof do not violate any applicable law or these terms
of service. We make no warranties or representations regarding the Outputs,
including as to their copyrightability or legality. By using the Services,
You warrant that You will use Outputs only for legal purposes.
You own the rights, but Sonauto is granted the rights to use it as well.
By far the best use case here is generating "Weird Al" style parody covers of pop songs by just changing the lyrics. Songs that everyone knows but with custom lyrics are way more interesting than songs nobody has heard before generated at random.
i've been occasionally trying to get some usable ambience tracks from these various models, but none of them seem to be able to produce looping tracks
based on results so far it also looks like a more flexible approach to ai generation would be to generate a set of stems/samples based on the user's description and let them actually compose instead of producing complete audio (maybe this is already happening somewhere)
- in either case, these models will most likely need to produce properly looped tracks at some point
Okay. I know these guys IRL. BUT, I genuinely think they have the best music model out there. Hands down. The songs are just more unique, and have a wider range of musical variation. With Suno/Udio, the songs just sound the same after a while (just with different lyrics).
That could just be me though. I am curious what users of Udio/Suno think?
I'm not going to comment on the technical side of things, which is way beyond my technical comprehension skills, and I'm sure it required a considerable amount of brain, time and energy to reach similar results.
But music production and distribution is (actually, was) my home turf, so here's my two cents on the topic:
I've already heard music qualitatively on par with the tracks available on your demo page. I've heard it way more than I truly wanted or felt it was necessary, at least once a day while tracking on pro tools hundreds of albums you've never ever heard of, in studios in France and LA, for years.
It was made by people with the best intentions, coming from all sorts of walks of life, and yet it was obvious from the first note they played that they were condemned to oblivion, their music destined to be basically never heard by anyone.
And this has been done every day, multiple times a day, in every studio around the world, since the '60s.
20% of Spotify music has never been played once. IIRC less than 40% has been played more than once.
There's a genuinely humbling scene in the 2002 documentary "Scratch" where DJ Shadow, a world-renowned DJ and producer, wades through stacks of EPs out of a record store in NY that have never, ever been played once[1], which perfectly captures how little of the musical output being recorded we actually get to listen to.
Making music is very easy. Making music people want to listen to is hard, mind-bogglingly so. For every whitebread pop track you've heard on the radio, there's thousands of other similar tracks that have been discarded by an A&R, a radio DJ, some label, or simply by the audience.
I'm saying this with no ill feelings towards you or your work, but I can't conceive even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
The benefit of AI generated music is that you can make it for yourself. The goal shouldn't be to get other people to listen to it. It's very personal and that should be the end goal.
Is it very personal if it was generated from a third party? Making music and, more importantly, playing it is an incredibly personal, physical experience. Especially if you're doing it for yourself rather than being a gun-for-hire session musician.
Getting it from a service like this is the equivalent of buying an already assembled lego kit. Putting aside ethical concerns (we're talking about the music industry after all, there were none even before AI arrived) is there a viable business in it?
The benefit of real human-made music is that you can make it for yourself. The goal shouldn't be to get other people to listen to it. It's very personal and that should be the end goal.
> I can't conceive even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
Easy: Independent/single-dev operations needing some quick background music for a project (game, whatever)
Also: People (especially kids, students, etc) who want to make music but don't have the technical expertise to (yet?).
Obviously these tools don't do everything necessary to make great music, but the barrier to entry for making music is being lowered, and the quality floor is being raised -- and that'll result in a lot more would-be "musicians"[0] creating music that wouldn't otherwise exist[1].
[0] I leave the argument of whether these generative musicians count as "real" musicians to the Scotsmen in the audience.
[1] Bonus question: does art still hold value if no one sees it?
This is already easily solved by using royalty-free music or by licensing pre-made music from numerous publicly available sound libraries online -- with the added benefit of supporting actual musicians instead of plagiarist tech middlemen.
> I'm saying this with no ill feelings towards you or your work.
I can. It’s predatory behavior, performed by people looking to steal and cash in on something they have neither the skill, understanding, nor love to make on their own.
I have mixed feelings on AI music.
When I make music, it's relaxing, it's fun, maybe 3 or 4 people listen to it.
Then it kinda just sits on Soundcloud. One of my partners did enjoy my music though...
AI music for the sake of just having it in the background removes that human element. It's just more stuff. To be fair, like you said, making generic music isn't anything new. But everything is turning into this.
Games are using AI-generated music, which isn't, by definition, able to try anything new, and AI art, which is just regurgitated from other artists.
The enshittification of Spotify is here. Why pay artists $100 for 500k plays when you can just push AI music and pay $1 for every 500k plays? As it is, music (really any entertainment) is a horrible way to make money.
So I guess I'll just keep working on my beats, with Maschine (the only software that keeps me on Windows!), and sharing them with a few people every now and then.
This is super cool! Thanks for the hard work you've clearly put into this.
My dream product in this space (...that I didn't know existed until I discovered your site about 10 minutes ago LOL):
I listen to music when I work/code, and I used to loooooove Spotify Playlist Radio (a feature they killed for reasons I will never understand) because it helped me discover new music in the style of music I already enjoyed working to. Liked a song? Add it to the seed list and click play to fine-tune the radio station.
So what I really want is just a fine-tuneable infinite stream of novel music to work to. And by fine-tuneable, I mean I'd love to be able to nudge the generation (Pandora style) with thumbs ups / thumbs downs, or other more specific guidance/feedback (more bass, faster tempo, etc.) until I have this perfectly crafted, customized-for-me stream of music.
I'd probably listen to it all day and happily pay $$ for this.
Thanks! We used SkyPilot (an open source cloud GPU worker management tool) to help out with both our small (single node) and large (many node) training runs.
Audio models are actually quite similar to image models, but there are a few key differences. First, the autoencoder needs to be designed much more carefully, as human hearing is insanely good and music requires orders of magnitude more spatial compression (image AEs do 8x8 downsampling, audio AEs need to downsample by a factor of thousands). Second, the model itself needs to be really good at placing lyrics/beats (similar to placing text in image diffusion): a sixth finger in an image model is fine, but a missed beat can ruin a song. That's why language model approaches (which have a stronger sequential inductive bias than diffusion models, which is good for rhythm and lyric placement) have been really popular in audio.
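To put rough numbers on the compression gap, here is a back-of-the-envelope calculation. The sample rate, song length, and the 2048x factor are assumptions picked to land in the "thousands" range mentioned above, not the actual settings of any specific model.

```python
def latent_length(num_samples: int, downsample: int) -> int:
    """Number of latent frames after temporal downsampling (ceiling division)."""
    return -(-num_samples // downsample)

SAMPLE_RATE = 44_100     # assumed: CD-quality mono audio
SONG_SECONDS = 180       # assumed: a 3-minute song
AUDIO_DOWNSAMPLE = 2048  # hypothetical factor in the "thousands" range

samples = SAMPLE_RATE * SONG_SECONDS               # 7,938,000 raw samples
frames = latent_length(samples, AUDIO_DOWNSAMPLE)  # 3876 latent frames
frames_per_second = SAMPLE_RATE / AUDIO_DOWNSAMPLE # about 21.5 frames per second

# Image case for comparison: 8x downsampling per side of a 512x512 image.
image_positions = (512 // 8) * (512 // 8)          # 4096 latent positions
image_reduction = (512 * 512) // image_positions   # 64x fewer positions

print(frames, round(frames_per_second, 1), image_positions, image_reduction)
```

Under these assumptions each latent frame has to summarize about 46 ms of audio, including any beat or syllable onset that falls inside it, which is one way to see why the autoencoder design matters so much.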
If you're interested in papers (IMO not good for new people as they make everything seem more complicated than it is):
The first 80s song I heard was a literal copy of Phil Collins. But there are no emotions attached to it (for me), and the lyrics are random. It’s more like supermarket background music IMHO, not something I would pay for, especially when we have centuries of music to discover already, why make fake stuff like that?
Edit: I have just heard the funniest most ridiculous metal song ever without a touch of metal inside. Breathe of Death, it’s like a bad joke.
If that's the future of anything, I’m going back to plain C (code) when I retire and I’ll never approach the internet ever again.
In my opinion training on all music is no more theft than Taylor Swift listening to the radio growing up (as long as we don't regurgitate existing songs which would be bad and useless anyway). I think an alternative legal interpretation where all of humanity's musical knowledge and history are controlled by three megacorporations (UMG/Sony/Warner) would be kinda depressing. If the above is true we might as well shutdown OpenAI and delete all LLM weights while we're at it, losing massive value to humanity.
It’s intellectual property laundering. A company selling a button that launders the blood sweat and tears of generations of artists is not the same as a person being inspired and dedicating themselves to mastery.
Humans create value. AI consumes and commoditizes that value, stealing it from the people and selling it back to their customers.
It’s unethical and will be detrimental in the long run. All profit should be distributed to all artists in the training set.
It won't be detrimental to consumers, who ultimately decide the value. If I could AI gen a better tasting Coca-Cola for cheaper, that would be beneficial to consumers and Coke wouldn't deserve a cut. Get gud, as they say.
I’m skeptical about how much value AI art is going to really contribute to humanity but as a lifelong opponent of copyright I have to roll my eyes when I see people arguing against it on behalf of real artists, all of whom are thieves in the best case and imitators in the worst.
Yeah every musician has a story of writing a new song, bringing it to the band, and they say "oh, this sounds just like [song]." It's almost impossible to make something truly novel.
But beyond the originality !== novelty discussion, I'm not sure how we've come to equate 'creativity' (and the rights to retaining it) to a sort of fingerprint encoding one's work. As if a band, artist or creator should stick to a certain brand once invented, and we can sufficiently capture that brand in dense legalese or increasingly, stylistic prompts.
How many of today's artists just 'riffing' off existing motifs will remain, if the end result of their creative endeavours will be absorbed into generative tools in some manner? What's the incentive for indies to distribute digitally, beyond the guarantee their works will provide the (auditory) fingerprints for the next content generation system?
I have written and performed many songs over many bands. At no point did anybody compare my work to any other artist's work, because it is genuinely unique.
The difference being that a musician being influenced by other musicians still has to work to develop the skills necessary to distill those influences into a final product, and colors that output with their own subjective experiences and taste. This feels like a conveniently naive interpretation to justify stealing artists' work and using it to create derivative generative slop. The final line in your comment is pretty telling of how seriously you take this issue (which is near-universally decried by artists) -- some other massive company is doing a bad thing, so why shouldn't I?
edit: I have to add how disingenuous I find calling out corporations owning "all of humanity's musical knowledge and history" as if generative AI music trained on unlicensed work from artists is somehow a moral good. At least the contracts artists make with these corporations are consensual and have the potential to yield the artist some benefit which is more than you can say for these gen-AI music apps.
Law should be considered to be artificial rules optimized for the collective good of society.
What's the worst that can happen if we allow unregulated AI training on existing music? Musician as a job won't exist anymore except for the greatest artists. But it makes creating music much more accessible to billions of people. Is it good music? Let the market decide. And people will still make music because the creative process is enjoyable.
The animus towards AI-generated music stems deeply from job security. I work in software, and I see it as more likely that AI will eventually be able to replace software devs. I may lose my job if that happens. But I don't care. Find another career. Humanity needs to progress instead of stagnating for the sake of a few interest groups.
I don't see how the amount of work that went into it changes the core fact that all art is influenced by that which came before, and we don't call that stealing (unless you truly believe that "all art is theft").
My point re: LLMs wasn't meant to exclusively be a "they're doing it" one, the hope was to give an example of something many people would agree is super useful and valuable (I work much faster and learned so much more in college thanks to LLMs) that would be impossible in the proposed strict interpretation of copyright.
edit responding to your edit:
Re: moral good: I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good. Music production software costs >$200 and studios cost thousands and majoring in music costs hundreds of thousands, but we can make getting started so much easier.
Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market? To be clear, artists absolutely have a right to benefit from reproduction of their recordings. I just don't think anyone should have rights to the knowledge built into those creations since in most cases it wasn't theirs to begin with (if their right to this knowledge were affirmed, every new song someone creates could hypothetically have a conga line of lawyer teams clamoring for "their cut" of that chord progression/instrument sample/effect/lyrical theme/style).
1. Anthropomorphizing the kind of “influence” and “learning” these tools are doing, which is quite unrelated to the human process
2. Underrepresenting the massive differences in scale when comparing the human process of learning vs. the massive data centers training the AI models
3. Ignoring that this isn’t just about influence, it’s about the fact that the models would not exist at all, if not for the work of the artists it was trained on
I think we intuitively allow for artists to derive and interpolate from their influences because of a baseline understanding that A) it is impossible to create art without influence and B) that there is an inherent value in a human creating art and expressing themselves. How that relates to someone using unlicensed music from actual humans to train an AI model in order to profit off of the collective work of thousands of actual human artists, I have no idea.
edit:
> I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good
Generative AI music isn't in any way accomplishing this goal. A free Spotify account with ads accomplishes this goal -- being able to generate a passable tune using a mish-mash of existing human works isn't bringing musical knowledge to the masses, it's just enabling end users to entertain themselves and you to profit from that.
> Is it really consent for those artists signing to labels
Yes? Ignoring the fact that there are independent labels outside the ownership of the Big Three you mention, artists enter into contracts with labels consensually because of the benefits the label can offer them. You train your model on these artists' output without their consent, credit or notification, profit off of it and offer nothing in return to the artists.
A) Agreed! B) So I guess the argument here is that this doesn't apply to AI music. I think that if someone really pours their soul into the lyrics of a song and regenerates/experiments with prompts until it's just right, and maybe even contributes a melody or starting point, that's still a human creating art and expressing themselves. It's definitely not as difficult as creating a song from scratch, but I've been told similar arguments were made regarding whether photography was art when that became a thing.
btw, if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
> if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
Am I understanding right that the point here is that while you are able to get away with using copyrighted material to turn a profit, your end users cannot, so no worries?
> Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market?
This premise is false. I have made plenty of money busking on the street, for example. Or selling audio recordings at shows.
> To be clear, artists absolutely have a right to benefit from reproduction of their recordings.
This is correct. Artists benefit when you pay them for the right to reproduce. When you don't (like what you are doing), you get sued. Here's a YouTube video covering 9 examples:
> I have made plenty of money busking on the street
That's why I specified mass market. However, given a choice between literally being on the street and working with a record label I'd probably choose the label, though I don't know about others.
> pay them for the right to reproduce
My point is learning patterns/styles does not equate to reproducing their recordings. If someone wants to listen to "Hey Jude" they cannot do so with our model, they must go to Spotify. There are cases where models from our competitors were trained for too long on too small a dataset and were able to recite songs, but that's a bug they admit is wrong and are fighting against, not a feature.
> in most cases it wasn't theirs to begin with
In most cases they did not invent the chord progression they're using or instruments they're playing or style they're using or even the lyrical themes they're singing. All are based on what came before and the musicians that come after them are able to use any new knowledge they contribute freely. It's all a fork of a fork of a fork of a fork, and if everyone along the line decided they were entitled to a cut we'd have disaster.
> In my opinion training on all music is no more theft than Taylor Swift listening to the radio growing up (as long as we don't regurgitate existing songs which would be bad and useless anyway).
I beg of you, speak to some real life musicians. A human composing or improvising is not choosing notes based on a set of probabilities derived from all the music they’ve heard in their life.
> I think an alternative legal interpretation where all of humanity's musical knowledge and history are controlled by three megacorporations (UMG/Sony/Warner) would be kinda depressing.
Your impoverished worldview of music as an artistic endeavor is depressing. Humanity’s musical knowledge extends far beyond the big 3.
> If the above is true we might as well shutdown OpenAI and delete all LLM weights while we're at it
Now we’re talking.
> losing massive value to humanity.
Nothing of value would be lost. In fact it would refund massive value to humanity that was stolen by generative AI.
Megacorporations owning copyrights to the majority of IPs (music, games, etc.) is a capitalism/monopoly problem. How does getting rid of copyright and allowing your company to profit off other people's work in any way solve that issue?
No one can actually explain the value OpenAI adds to humanity. What massive loss? What have we gained from this entity other than another billionaire riding a hype cycle?
These high-quality music models require pirating many, many terabytes of music. Torrents are the main way to do it, but they likely scraped sites like Bandcamp, Soundcloud and YouTube.
AI music is a weird business model. They hope that there's enough money peddling music slop after paying off the labels (and maybe eventually the independent music platforms) whose music you stole. Meanwhile, not even Spotify can figure out how to be reliably profitable, serving music people want to hear.
Not related to this post, but I was wondering about AI music generators and I don't have experience with their capabilities. The ones I know seem catered to making entire songs.
I was having a discussion with a friend who writes a lot of guitar music but can also play bass and sing. However, getting good drums is a problem. What he'd like is a service to upload his songs in some form (just guitar, or a mixed version with bass and vocals) and get an output that layers a drum track without altering the input. Ideally with appropriate fills, etc. I mean, just getting an in-time drum stem would probably be even better.
Is there any GenAI service to do this kind of incremental additive drums?
Not sure about GenAI, but Logic Pro has the ability to add a Session Drummer which can be set to track a given bass stem and produce passable drums for a song.
Suno's RVQ-token-based language model is tuned to give you an acceptable song that most of their userbase would prefer every single time, but isn't very diverse. Our diffusion model is much more diverse and has higher vocal audio quality, but the results aren't always consistent (just like Flux et al). However, since we have unlimited generations this can be worked around. We're also never going to preference tune our model because I think the stuff that is lost in that process is valuable.
For the consumer stuff: It's fun, and IMO that's enough. Not every song has to be peak artistic quality pushing the world forward, sometimes it's enough to bring a smile to a friend's face by making a song about them. If you think their art is slop you shouldn't have to listen to it (IMO Spotify et al should have an optional "no AI music" filter for now).
For the API: I think this could be integrated into artists' workflows in lots of ways we can't even imagine right now as it gets better. One example I gave above was generating transitions between songs.
You know that postcard they sent you is not actually a Photoshop-free picture taken by them. They just addressed it to you. You should totally dump them from your life.
The reason anything makes anyone happy is completely subjective, as evidenced by the many people who have told us our app made them and/or their friends and family happy.
I made little gift songs for friends for awhile. It was nice and fun. Making a roadtrip theme song for friends on a vacation is way fun, and kinda locks in the moment
I also used it when I was living in New Orleans to help a friend come up with a riff for a live set he had, which had some unusual constraints (only had a singer, drummer and trombone, but no others, in an echoey space). He used the generated song hook as inspiration for that night's arrangement
There's lots of stuff, and some of it supports artists who have tight timelines and want creative support
There's so much real independent music out there that actually has meaning. I hope you didn't tell your friend you wrote the song, because if someone tricked me into listening to generated not-art and I found out afterwards, I would consider them a liar.
What your friend did, using generation for inspiration for real music he creates is fine. But if someone gifted me an AI generated song I would ask why they didn't pay a few dollars -- honestly not much more -- to a real artist to do the same.
Ten years ago a friend of mine did that, hired a real person, and it cost less than $20 to write a ditty. That's comparable to the cost in tokens for an AI except you could support a real human artist instead of megalomaniac Yarvinists Sam Altman and friends.
And the song would have real meaning. You gave your friend a non-gift. The Let Me Google That For You of gifts. Honestly if one of my friends did that I'd wonder if they even like me.
The problem with AI music, and in fact AI in general, is that we've spent the last few decades aggressively attacking the idea that art should get paid for at all, and yet people still do it, because they love it. So musicians work for pennies, and yet people still need to replace them with a machine.
So even if you just pay someone else to make you a song, it's not really any more expensive than this. Same with painting. What does this AI bring to the table, at all? It grosses me out.
People on this site should go pick up a guitar and write a 3 chord song about someone. It'll take you a day, if that. It's not hard! It's fun!
The problem with real music, is that it requires a hefty amount of musicians to establish a genre. This amount could be somewhere in the range of 100 to 1000 musicians.
When this critical number is not amassed then the genre effectively dies.
With A.I. we can resurrect dead genres, but not only that, we can combine genres together, popular genres with one another, also popular and unpopular genres or popular and dead genres.
Using A.I. for music is easier and much faster than traditional means, and this could greatly reduce the critical mass of musicians to support a genre. It could be reduced as much as 10 times, or 100 times, like one person creating 10000 songs or something similar.
By trying to compare A.I. music to traditional music, you are comparing the 10 songs a real band makes with the 10000 songs an A.I. (human) musician makes. It's an apples-and-oranges comparison.
I don't see why human music cannot be a genre, the best of all genres but just one, and an innumerable amount of A.I. genres which may not be so good, but they are infinite.
The real human music genre might be the best forever or just for the next 3 years, but so what? Let there be more genres some good some bad. No one is gonna listen to a cheap copy of an already existing song of an already existing genre, but songs already in existence should be used to train A.I. weights.
Regarding A.I. weights, smaller models forget much of the information they are trained on, and they are cheaper, faster and easier to fine-tune, and probably also easier to apply RL reasoning to. That way, A.I. musicians (or real musicians) could run the model on their own computers and use it as an instrument instead of relying on companies with big, slow and expensive models.
And sometimes big and inefficient models copy text/code/music verbatim from the training data. But this is a bug; when small models become competitive enough, most people are gonna use those. They might even carry them around, like a personal band always ready to make melodies for them.
I’m a pretty big music fan and I have no idea what you’re on about. Where did you get this theory?
> The problem with real music, is that it requires a hefty amount of musicians to establish a genre.
Why is establishing genre a goal in the first place?
> This amount could be somewhere in the range of 100 to 1000 musicians.
This is demonstrably false. Genre is defined by critical consensus, and it can arise around one or a handful of bands.
> With A.I. we can resurrect dead genres
What dead genre are you after? I’d imagine there are folk styles that haven’t been kept alive, but I question whether AI recreations would satisfy anyone. I’d rather listen to authentic recordings instead. And if the genre doesn’t have a significant recorded catalog, you can’t train a generative AI to produce it anyway.
On one hand this is impressive, and I've been wondering when something like this would appear. On the other hand, I am -- like others here have expressed -- saddened by the impact this has on real musicians. Music is human, music theory is deeply mathematical and fascinating -- "solving" it with a big hammer like generative AI is rather unsatisfying.
The other very real aspect here is "training data" has to come from somewhere, and the copyright implications of this are beyond solved.
In the past I worked on real algorithmic music composition: algorithmic sequencer, paired with hardware- or soft- synthesizers. I could give it feedback and it'd evolve the composition, all without training data. It was computationally cheap, didn't infringe anyone's copyright, and a human still had very real creative influence (which instruments, scale, tempo, etc.). Message me if anyone's still interested in "dumb" AI like that. :-)
Computer-assisted music is nothing new, but taking away the creativity completely is turning music into noise -- noise that sounds like music.
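For anyone wondering what a training-data-free, feedback-driven approach like the one described above might look like, here is a minimal sketch. It is not the commenter's actual system (no details are given in the thread). The C major scale, the mutation rate, and the `smoothness` score standing in for human feedback are all assumptions made for illustration.

```python
import random

# Assumption: melodies are lists of MIDI note numbers drawn from C major.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]


def random_melody(rng, length=8, scale=C_MAJOR):
    """Start from a random melody; no training data involved."""
    return [rng.choice(scale) for _ in range(length)]


def mutate(melody, rng, scale=C_MAJOR, rate=0.25):
    """Replace each note with a random scale note with probability `rate`."""
    return [rng.choice(scale) if rng.random() < rate else note
            for note in melody]


def evolve(melody, score, rng, generations=20, children=8):
    """Keep the best-scoring variant each generation.

    `score` stands in for the human giving feedback; the current best is
    always kept as a candidate, so the score never gets worse.
    """
    best = melody
    for _ in range(generations):
        candidates = [best] + [mutate(best, rng) for _ in range(children)]
        best = max(candidates, key=score)
    return best


def smoothness(melody):
    """Hypothetical feedback: prefer small steps between adjacent notes."""
    return -sum(abs(a - b) for a, b in zip(melody, melody[1:]))


rng = random.Random(0)  # seeded so runs are repeatable
start = random_melody(rng)
result = evolve(start, smoothness, rng)
```

In a real setup the `score` function would be replaced by the listener's reaction, and the note list would be sent to a hardware or software synthesizer.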
> "solving" it with a big hammer like generative AI is rather unsatisfying.
The reason is greed. They jump on the bandwagon to get rich, not to bring art. They don't care about long term effects on creativity. If it means that it kills motivation to create new music, or even learn how to play an instrument, that's fine by these people. As long as they get their money.
Anyone with ears can find music satisfying. You don't need an artist's backstory or blessing for that. By all means use slow AI to get the same point fast AI can get to, but don't ask me to value it differently.
> Message me
I don't see any contact info in your profile, but I have an email in mine. I am interested in hearing more about your process and if you have music for sale anywhere, I like to support electronic artists doing interesting stuff.
> Message me if anyone's still interested in "dumb" AI like that. :-)
Not sure how to reach out, but I'm definitely interested in reading about procedural methods in music synthesis. Any links describing your approach?
Interesting that Suno et al miss out on the obvious problem that actual musicians need extra musicians for their own projects.
For instance, a guitarist will have a track they wish they had vocals (and lyrics) for, and if they could pay for that they would.
Literally, if you could highlight a tune section in your DAW, prompt it, and have vocals + lyrics generated, possibly different versions or harmonies for existing parts, etc. Musicians already pay for plugins, but the singing ones are awful to use so far.
We're super interested in working on this (and melody conditioning) and even have some of the code written to generate the training data, but we want our base model to get a bit better before this becomes our main focus. Check back in a few months!
Without disclosing your training data, this should be considered piracy and removed from HN.
I really wish this trend of prompting gen AI models with text would stop. It's really meaningless. Musicians need gen AI they can prompt with a melody on their keyboard. Or a bit of whistling into the microphone. Or a beat they can tap on the table. That is what allows humans to unleash their creativity. Not AI generating random bits that fit a distribution of training data. English language is not the right input for anything except for information retrieval tasks.
> Not AI generating random bits that fit a distribution of training data
How is that specific to text prompting? If you tap your fingers to a model and it generates a song from your tapping, it's still just fitting the training data as you say.
Why is it "music for developers"? I was expecting one of those Lofi music videos designed to enhance concentration or similar. These are typically instrumentals, ostensibly because they are less distracting, something like this:
https://www.youtube.com/watch?v=M5QY2_8704o
It's because the thing we're launching today is an API for developers to use. If you want instrumental type stuff you should check out my bossa nova channel: https://sonauto.ai/radio
One thing I've been thinking about is how to do a better hobbyist plan system. It would be cool to do a flat rate unlimited plan, but we wouldn't want that to then be abused by larger customers/companies. Are there existing API providers you think solve this particularly well?
I don't think it meets your ask of "solve this particularly well", but the unlimited plans in video that I am familiar with have a fast/slow queue system. This effectively limits the plan. It seems, as well, that these kinds of queue systems are tiered. So you can have N fast-queued items, X tier-one slow-queue items, Y tier-two slow-queue items, etc. On the backend this is probably just some kind of weighted priority queue where the number of requests in some time window determines a weight scaling factor.
I think this is a good start, X high speed queries per hour then unlimited low-priority ones after. Do you know of any specific companies that do this we could take a look at?
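A rough sketch of the "X fast requests per hour, then low priority" idea discussed above, using a plain heap. This is only an illustration and not how any particular provider does it. The class name `TieredQueue`, the limit of 3 fast requests, and the one-hour window are all assumptions.

```python
import heapq
import itertools
from collections import defaultdict, deque

FAST_PER_HOUR = 3       # hypothetical "X": fast-lane requests per user per hour
WINDOW_SECONDS = 3600   # rolling window used to count recent requests


class TieredQueue:
    """Jobs from users over their hourly allowance sink in the queue."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()     # tie-breaker: first come, first served
        self._recent = defaultdict(deque)   # user -> timestamps inside the window

    def submit(self, user, job, now):
        recent = self._recent[user]
        # Forget requests that have fallen out of the rolling window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        recent.append(now)
        # 0 means fast lane; each request over the allowance is pushed further back.
        over = max(0, len(recent) - FAST_PER_HOUR)
        heapq.heappush(self._heap, (over, next(self._order), user, job))
        return over

    def pop(self):
        """Return the (user, job) pair that should be generated next."""
        _, _, user, job = heapq.heappop(self._heap)
        return user, job
```

With this scheme a heavy user's fourth and fifth requests in an hour wait behind a light user's first request, even if the light user submitted later. Nothing is ever rejected, which is what makes the plan "unlimited".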
Remember that you’ve also got a nice natural limitation here: if it’s a hobbyist and not a (commercial) API consumer, there’s only so fast they can listen to the output. Even if they’re rapidly tweaking knobs in a DAW, you can use the play/pause signal to help prioritize the queue, depending on how expensive it is to serialize the GPU state and rehydrate it again. You also might not need to complete generation until the user reaches the play point, so you can shuffle around the queue a lot. For example, if the user skips after ten seconds you might not need to generate the rest until they try to play that track again, and when they do you usually have enough time before they reach the previous stopping point to generate some more sections.
It might also be helpful to come up with some ways to segregate customers so that “prosumer” users get faster “cold starts” (so that they can iterate faster) at the expense of sometimes having to wait for generation to start back up again.
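To make the "only generate what the listener is about to hear" idea above concrete, here is a small sketch. It assumes the track is generated in fixed-length sections, which the thread does not confirm. `SECTION_SECONDS`, `LOOKAHEAD_SECONDS` and the function name are hypothetical.

```python
import math

SECTION_SECONDS = 10     # assumption: audio is generated in 10-second sections
LOOKAHEAD_SECONDS = 15   # assumption: stay this far ahead of the playhead


def sections_to_generate(playhead_seconds, generated_sections, total_sections):
    """How many more sections are needed to stay ahead of the listener."""
    target_time = playhead_seconds + LOOKAHEAD_SECONDS
    target_sections = min(total_sections,
                          math.ceil(target_time / SECTION_SECONDS))
    return max(0, target_sections - generated_sections)
```

A user who skips after ten seconds with three sections already generated needs no further work, so the GPU can move on to someone else's job until playback resumes.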
runway.ai (video gen) is what I was thinking when I suggested this.
Why would a hobbyist need an unlimited plan?
E.g., in the case of a future "LibreMusic" open source UI or an integration into their DAW they work with on the weekends. I'd get pretty annoyed if I had to keep putting a coin in the machine to adjust Logic Pro effects.
So if I make a song using this API, who owns the copyright? Is it me or Sonauto?
I'm not sure to what extent AI music is copyrightable (I think it depends on a case-by-case amount of human influence) but our TOS assigns any rights we may have to the user.
From their terms (https://sonauto.ai/tos):
8. OUTPUT As between You and the Services, and to the extent permitted by applicable law, You own any right, title, or interest that may exist in the musical and/or audio content that You generate using the Services ("Outputs"). We hereby assign to You all our right, title, and interest, if any, in and to Your Outputs. This assignment does not extend to other users' Outputs, regardless of similarity between Your Outputs and their Outputs. You grant to us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide license to use Your Output to provide, maintain, develop, and improve the Services, to comply with applicable law, and/or to enforce our terms and policies. You are solely responsible for Outputs and Your use of Outputs, including ensuring that Outputs and Your use thereof do not violate any applicable law or these terms of service. We make no warranties or representations regarding the Outputs, including as to their copyrightability or legality. By using the Services, You warrant that You will use Outputs only for legal purposes.
You own the rights, but Sonauto is granted the rights to use it as well.
>You own any right, title, or interest that may exist
>We hereby assign to You all our right, title, and interest, if any
>You are solely responsible for Outputs and Your use of Outputs
I love how it clearly laid out the scenario that the rights don't exist, yet you are responsible.
By far the best use case here is generating "Weird Al" style parody covers of pop songs by just changing the lyrics. Songs that everyone knows but with custom lyrics are way more interesting than songs nobody has heard before generated at random.
i've been occasionally trying to get some usable ambience tracks from these various models, but none of them seem to be able to produce looping tracks
based on results so far it also looks like a more flexible approach to ai generation would be to generate a set of stems/samples based on the user's description and let them actually compose, instead of producing complete audio (maybe this is already happening somewhere)
- in either case, these models will most likely need to produce properly looped tracks at some point
You can create a looped track by combining the song generation and transition generation examples from our API example repo!
This is pretty cool! It's noticeably better than any of the other similar music generation tools I've tried, kudos!
The transition between two songs demo is super cool! I often need to do this when editing videos but used to have no way to do it.
Not to mention that now you can have playlists that transition seamlessly between two songs. Low-cost party DJ?
Okay. I know these guys IRL. BUT, I genuinely think they have the best music model out there. Hands down. The songs are just more unique, and have a wider range of musical variation. With Suno/Udio, the songs just sound the same after a while (just with different lyrics).
That could just be me though. I am curious what users of Udio/Suno think?
Quality has improved so much too, I tried it a few months ago at Demo Day and I’m blown away by how good it is now.
I'm not going to comment on the technical side of things, which is way beyond my technical comprehension skills, and I'm sure it required a considerable amount of brain, time and energy to reach similar results.
But music production and distribution is (actually, was) my home turf, so here's my two cents on the topic:
I've already heard music qualitatively on par with the tracks available on your demo page. I've heard it way more than I truly wanted or felt it was necessary, at least once a day while tracking on pro tools hundreds of albums you've never ever heard of, in studios in France and LA, for years.
It was made by people with the best intentions, coming from all sorts of walks of life, and yet it was obvious from the first note they played that they were condemned to oblivion, their music destined to be basically never heard by anyone.
And this has been done every day, multiple times a day, in every studio around the world, since the '60s.
20% of Spotify music has never been played once. IIRC less than 40% has been played more than once.
There's a genuinely humbling scene in the 2002 documentary "Scratch" where DJ Shadow, a world-renowned DJ and producer, wades through stacks of EPs out of a record store in NY that have never, ever been played once[1], which perfectly captures how little of the musical output being recorded we actually get to listen to.
Making music is very easy. Making music people want to listen to is hard, mind-bogglingly so. For every whitebread pop track you've heard on the radio, there's thousands of other similar tracks that have been discarded by an A&R, a radio DJ, some label, or simply by the audience.
I'm saying this with no ill feelings towards you or your work, but I can't conceive even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
[1]https://www.youtube.com/watch?v=1gpKYnRdf0A&t=6s
The benefit of AI generated music is that you can make it for yourself. The goal shouldn't be to get other people to listen to it. It's very personal and that should be the end goal.
Is it very personal if it was generated from a third party? Making music and, more importantly, playing it is an incredibly personal, physical experience. Especially if you're doing it for yourself rather than being a gun-for-hire session musician.
Getting it from a service like this is the equivalent of buying an already assembled lego kit. Putting aside ethical concerns (we're talking about the music industry after all, there were none even before AI arrived) is there a viable business in it?
The benefit of real human-made music is that you can make it for yourself. The goal shouldn't be to get other people to listen to it. It's very personal and that should be the end goal.
> I can't conceive even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
Easy: Independent/single-dev operations needing some quick background music for a project (game, whatever)
Also: People (especially kids, students, etc) who want to make music but don't have the technical expertise to (yet?).
Obviously these tools don't do everything necessary to make great music, but the barrier to entry for making music is being lowered, and the quality floor is being raised -- and that'll result in a lot more would-be "musicians"[0] creating music that wouldn't otherwise exist[1].
[0] I leave the argument of whether these generative musicians count as "real" musicians to the Scotsmen in the audience.
[1] Bonus question: does art still hold value if no one sees it?
The creator sees/hears it! (and if they don't it really shouldn't have been generated lol, waste of compute)
This is already easily solved by using royalty-free music or by licensing pre-made music from numerous publicly available sound libraries online -- with the added benefit of supporting actual musicians instead of plagiarist tech middlemen.
> I'm saying this with no ill feelings towards you or your work.
I can. It’s predatory behavior, performed by people looking to steal and cash in on something they have neither the skill, understanding, nor love to make on their own.
Can humans generate a song based on custom lyrics and style in a matter of minutes?
Whose Line Is It Anyway?
https://www.youtube.com/watch?v=XwIsvKpEgOA
I have mixed feelings on AI music. When I make music, it's relaxing, it's fun, maybe 3 or 4 people listen to it. Then it kinda just sits on Soundcloud. One of my partners did enjoy my music though...
AI music for the sake of just having it in the background removes that human element. It's just more stuff. To be fair, like you said, making generic music isn't anything new. But everything is turning into this. Games are using AI-generated music, which by definition isn't able to try anything new, and AI art, which is just regurgitated from other artists.
The enshittification of Spotify is here. Why pay an artist $100 for 500k plays when you can just push AI music and pay $1 for every 500k plays? As it is, music (really any entertainment) is a horrible way to make money.
So I guess I'll just keep working on my beats, with Maschine (the only software that keeps me on Windows!), and sharing them with a few people every now and then.
This is super cool! Thanks for the hard work you've clearly put into this.
My dream product in this space (...that I didn't know existed until I discovered your site about 10 minutes ago LOL):
I listen to music when I work/code, and I used to loooooove Spotify Playlist Radio (a feature they killed, for reasons I will never understand) because it helped me discover new music in the style of music I already enjoyed working to. Liked a song? Add it to the seed list and click play to fine-tune the radio station.
So what I really want is just a fine-tuneable infinite stream of novel music to work to. And by fine-tuneable, I mean I'd love to be able to nudge the generation (Pandora style) with thumbs ups / thumbs downs, or other more specific guidance/feedback (more bass, faster tempo, etc.) until I have this perfectly crafted, customized-for-me stream of music.
I'd probably listen to it all day and happily pay $$ for this.
Is this a pipe dream?
I am with you. I want this too. Maybe somebody can make it with their API?
Congrats on the API launch (from SkyPilot)!
Thanks! We used SkyPilot (an open source cloud GPU worker management tool) to help out with both our small (single node) and large (many node) training runs.
I'm familiar with video and image diffusion model architectures, but know almost nothing about music models.
Are there any good papers or writeups on them?
Are there any open source implementations to play with?
There are!
Audio models are actually quite similar to image models, but there are a few key differences. First, the autoencoder needs to be designed much more carefully, as human hearing is insanely good and music requires orders of magnitude more spatial compression (image AEs do 8x8 downsampling; audio AEs need to downsample by a factor of thousands). Second, the model itself needs to be really good at placing lyrics/beats (similar to placing text in image diffusion): a sixth finger in an image model is fine, but a missed beat can ruin a song. That's why language model approaches (which have a stronger sequential inductive bias than diffusion models, which is good for rhythm and lyric placement) have been really popular in audio.
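To put rough numbers on the downsampling comparison above, here is a back-of-the-envelope calculation. The 512x512 image, the 3-minute 44.1 kHz track and the audio factor of 2048 are assumed values picked for illustration, not figures confirmed for any specific model.

```python
import math


def image_latent_positions(width, height, factor=8):
    """Latent grid size after factor x factor spatial downsampling."""
    return (width // factor) * (height // factor)


def audio_latent_frames(seconds, sample_rate=44_100, factor=2048):
    """Latent frames after downsampling the waveform in time by `factor`."""
    return math.ceil(seconds * sample_rate / factor)


image_pixels = 512 * 512                            # 262,144 pixels
image_latents = image_latent_positions(512, 512)    # 64 x 64 grid = 4,096

audio_samples = 180 * 44_100                        # 7,938,000 samples per channel
audio_latents = audio_latent_frames(180)            # 3,876 latent frames

# Input elements folded into each latent position.
image_ratio = image_pixels // image_latents         # 64 (8 x 8)
audio_ratio = 2048                                  # the assumed audio factor
```

Under these assumptions both end up with a few thousand latent positions, but the audio autoencoder has to fold about 32 times more input elements into each one.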
If you're interested in papers (IMO not good for new people as they make everything seem more complicated than it is):
Stable Audio (similar to our architecture): https://arxiv.org/abs/2402.04825 (code: https://github.com/Stability-AI/stable-audio-tools)
MusicGen (Suno-style architecture): https://arxiv.org/abs/2306.05284 (code: https://github.com/facebookresearch/audiocraft/tree/main)
how did you create this without committing grand theft musica
The first 80s song I heard was a literal copy of Phil Collins. But there are no emotions attached to it (for me), and the lyrics are random. It’s more like supermarket background music IMHO, not something I would pay for, especially when we have centuries of music to discover already, why make fake stuff like that?
Edit: I have just heard the funniest most ridiculous metal song ever without a touch of metal inside. Breathe of Death, it’s like a bad joke.
If that's the future of anything, I’m going back to plain C (code) when I retire and I’ll never approach the internet ever again.
In my opinion training on all music is no more theft than Taylor Swift listening to the radio growing up (as long as we don't regurgitate existing songs which would be bad and useless anyway). I think an alternative legal interpretation where all of humanity's musical knowledge and history are controlled by three megacorporations (UMG/Sony/Warner) would be kinda depressing. If the above is true we might as well shut down OpenAI and delete all LLM weights while we're at it, losing massive value to humanity.
It’s intellectual property laundering. A company selling a button that launders the blood sweat and tears of generations of artists is not the same as a person being inspired and dedicating themselves to mastery.
Humans create value. AI consumes and commoditizes that value, stealing it from the people and selling it back to their customers.
It’s unethical and will be detrimental in the long run. All profit should be distributed to all artists in the training set.
It won't be a detriment to consumers, who ultimately decide the value. If I could AI gen a better tasting Coca-Cola for cheaper, that would be beneficial to consumers and Coke wouldn't deserve a cut. Get gud, as they say.
I’m skeptical about how much value AI art is going to really contribute to humanity but as a lifelong opponent of copyright I have to roll my eyes when I see people arguing against it on behalf of real artists, all of whom are thieves in the best case and imitators in the worst.
Yeah every musician has a story of writing a new song, bringing it to the band, and they say "oh, this sounds just like [song]." It's almost impossible to make something truly novel.
> almost impossible to make something truly novel
But beyond the originality !== novelty discussion, I'm not sure how we've come to equate 'creativity' (and the rights to retaining it) to a sort of fingerprint encoding one's work. As if a band, artist or creator should stick to a certain brand once invented, and we can sufficiently capture that brand in dense legalese or increasingly, stylistic prompts.
How many of today's artists just 'riffing' off existing motifs will remain, if the end result of their creative endeavours will be absorbed into generative tools in some manner? What's the incentive for indies to distribute digitally, beyond the guarantee their works will provide the (auditory) fingerprints for the next content generation system?
I have written and performed many songs over many bands. At no point did anybody compare my work to any other artist's work, because it is genuinely unique.
Citation needed. Where can I hear some of your work?
The difference being that a musician being influenced by other musicians still has to work to develop the skills necessary to distill those influences into a final product, and colors that output with their own subjective experiences and taste. This feels like a conveniently naive interpretation to justify stealing artists' work and using it to create derivative generative slop. The final line in your comment is pretty telling of how seriously you take this issue (which is near-universally decried by artists) -- some other massive company is doing a bad thing, so why shouldn't I?
edit: I have to add how disingenuous I find calling out corporations owning "all of humanity's musical knowledge and history" as if generative AI music trained on unlicensed work from artists is somehow a moral good. At least the contracts artists make with these corporations are consensual and have the potential to yield the artist some benefit which is more than you can say for these gen-AI music apps.
Law should be considered to be artificial rules optimized for the collective good of society.
What's the worst that can happen if we allow unregulated AI training on existing music? Musician as a job won't exist anymore, except for the greatest artists. But it makes creating music much more accessible to billions of people. Is it good music? Let the market decide. And people will still make music because the creative process is enjoyable.
The animus towards AI generated music deeply stems from job security. I work in software, and I think it is even more likely that AI will eventually be able to replace software devs. I may lose my job if that happens. But I don't care. Find another career. Humanity needs to progress instead of stagnating for the sake of a few interest groups.
I don't see how the amount of work that went into it changes the core fact that all art is influenced by that which came before, and we don't call that stealing (unless you truly believe that "all art is theft").
My point re: LLMs wasn't meant to exclusively be a "they're doing it" one, the hope was to give an example of something many people would agree is super useful and valuable (I work much faster and learned so much more in college thanks to LLMs) that would be impossible in the proposed strict interpretation of copyright.
edit responding to your edit:
Re: moral good: I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good. Music production software costs >$200 and studios cost thousands and majoring in music costs hundreds of thousands, but we can make getting started so much easier.
Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market? To be clear, artists absolutely have a right to benefit from reproduction of their recordings. I just don't think anyone should have rights to the knowledge built into those creations since in most cases it wasn't theirs to begin with (if their right to this knowledge were affirmed, every new song someone creates could hypothetically have a conga line of lawyer teams clamoring for "their cut" of that chord progression/instrument sample/effect/lyrical theme/style).
I think there are a few fallacies at play here:
1. Anthropomorphizing the kind of “influence” and “learning” these tools are doing, which is quite unrelated to the human process
2. Underrepresenting the massive differences in scale when comparing the human process of learning vs. the massive data centers training the AI models
3. Ignoring that this isn’t just about influence, it’s about the fact that the models would not exist at all, if not for the work of the artists it was trained on
I think we intuitively allow for artists to derive and interpolate from their influences because of a baseline understanding that A) it is impossible to create art without influence and B) that there is an inherent value in a human creating art and expressing themselves. How that relates to someone using unlicensed music from actual humans to train an AI model in order to profit off of the collective work of thousands of actual human artists, I have no idea.
edit:
> I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good
Generative AI music isn't in any way accomplishing this goal. A free Spotify account with ads accomplishes this goal -- being able to generate a passable tune using a mish-mash of existing human works isn't bringing musical knowledge to the masses, it's just enabling end users to entertain themselves and you to profit from that.
> Is it really consent for those artists signing to labels
Yes? Ignoring the fact that there are independent labels outside the ownership of the Big Three you mention, artists enter into contracts with labels consensually because of the benefits the label can offer them. You train your model on these artists' output without their consent, credit or notification, profit off of it and offer nothing in return to the artists.
A) Agreed! B) So I guess the argument here is that this doesn't apply to AI music. I think that if someone really pours their soul into the lyrics of a song and regenerates/experiments with prompts until it's just right, and maybe even contributes a melody or starting point, that's still a human creating art and expressing themselves. It's definitely not as difficult as creating a song from scratch, but I've been told similar arguments were made regarding whether photography was art when that became a thing.
btw, if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
> if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
Am I understanding right that the point here is that while you are able to get away with using copyrighted material to turn a profit, your end users cannot, so no worries?
> Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market?
This premise is false. I have made plenty of money busking on the street, for example. Or selling audio recordings at shows.
> To be clear, artists absolutely have a right to benefit from reproduction of their recordings.
This is correct. Artists benefit when you pay them for the right to reproduce. When you don't (like what you are doing), you get sued. Here's a YouTube video covering 9 examples:
https://www.youtube.com/watch?v=IIVSt8Y1zeQ
> I just don't think anyone should have rights to the knowledge built into those creations since in most cases it wasn't theirs to begin with
What?
> I have made plenty of money busking on the street
That's why I specified mass market. However, given a choice between literally being on the street and working with a record label I'd probably choose the label, though I don't know about others.
> pay them for the right to reproduce
My point is learning patterns/styles does not equate to reproducing their recordings. If someone wants to listen to "Hey Jude" they cannot do so with our model, they must go to Spotify. There are cases where models from our competitors were trained for too long on too small a dataset and were able to recite songs, but that's a bug they admit is wrong and are fighting against, not a feature.
> in most cases it wasn't theirs to begin with
In most cases they did not invent the chord progression they're using or instruments they're playing or style they're using or even the lyrical themes they're singing. All are based on what came before and the musicians that come after them are able to use any new knowledge they contribute freely. It's all a fork of a fork of a fork of a fork, and if everyone along the line decided they were entitled to a cut we'd have disaster.
> In my opinion training on all music is no more theft than Taylor Swift listening to the radio growing up (as long as we don't regurgitate existing songs which would be bad and useless anyway).
I beg of you, speak to some real life musicians. A human composing or improvising is not choosing notes based on a set of probabilities derived from all the music they’ve heard in their life.
> I think an alternative legal interpretation where all of humanity's musical knowledge and history are controlled by three megacorporations (UMG/Sony/Warner) would be kinda depressing.
Your impoverished worldview of music as an artistic endeavor is depressing. Humanity’s musical knowledge extends far beyond the big 3.
> If the above is true we might as well shutdown OpenAI and delete all LLM weights while we're at it
Now we’re talking.
> losing massive value to humanity.
Nothing of value would be lost. In fact it would refund massive value to humanity that was stolen by generative AI.
Megacorporations owning copyrights to the majority of IP (music, games, etc.) is a capitalism/monopoly problem. How does getting rid of copyright and allowing your company to profit off other people's work in any way solve that issue?
No one can actually explain the value OpenAI adds to humanity. What massive loss? What have we gained from this entity other than another billionaire riding a hype cycle?
These high-quality music models require pirating many, many terabytes of music. Torrents are the main way to do it, but they likely scraped sites like Bandcamp, Soundcloud and YouTube.
AI music is a weird business model. They hope that there's enough money peddling music slop after paying off the labels (and maybe eventually the independent music platforms) whose music you stole. Meanwhile, not even Spotify can figure out how to be reliably profitable, serving music people want to hear.
Not related to this post, but I was wondering about AI music generators and I don't have experience with their capabilities. The ones I know seem catered to making entire songs.
I was having a discussion with a friend who writes a lot of guitar music but can also play bass and sing. However, getting good drums is a problem. What he'd like is a service to upload his songs in some form (just guitar, or a mixed version with bass and vocals) and get an output that layers a drum track without altering the input. Ideally with appropriate fills, etc. I mean, just getting an in-time drum stem would probably be even better.
Is there any GenAI service to do this kind of incremental additive drums?
There's work in that area, it's sometimes called "accompaniment generation."
https://arxiv.org/abs/2301.12662
https://fastsag.github.io/
Not sure about GenAI, but Logic Pro has the ability to add a Session Drummer which can be set to track a given bass stem and produce passable drums for a song.
how is this better or different from suno besides api? I'm assuming since you are smaller the quality is not as good and the depth not as wide.
Suno's RVQ-token-based language model is tuned to give you an acceptable song that most of their userbase would prefer every single time, but isn't very diverse. Our diffusion model is much more diverse and has higher vocal audio quality, but the results aren't always consistent (just like Flux et al). However, since we have unlimited generations this can be worked around. We're also never going to preference tune our model because I think the stuff that is lost in that process is valuable.
I use both. Sonauto sounds more "real" and varied than what I can get with Suno.
What is the point of generating this low quality AI slop music, what real use case do you have in mind?
For the consumer stuff: It's fun, and IMO that's enough. Not every song has to be peak artistic quality pushing the world forward, sometimes it's enough to bring a smile to a friend's face by making a song about them. If you think their art is slop you shouldn't have to listen to it (IMO Spotify et al should have an optional "no AI music" filter for now).
For the API: I think this could be integrated into artists workflows in lots of ways we can't even imagine right now as it gets better. One example I gave above was generating transitions between songs.
the reason a song from a friend makes you happy is directly related to the effort behind it, this is totally meaningless.
You know that postcard they sent you is not actually a photoshop-free picture taken by them. They just addressed it to you, so you should totally dump them from your life.
The reason anything makes anyone happy is completely subjective, as evidenced by the many people who have told us our app made them and/or their friends and family happy.
that's just your opinion.
I made little gift songs for friends for a while. It was nice and fun. Making a roadtrip theme song for friends on a vacation is way fun, and kinda locks in the moment.
I also used it when I was living in New Orleans to help a friend come up with a riff for a live set he had, which had some unusual constraints (only had a singer, drummer and trombone, but no others, in an echoey space). He used the generated song hook as inspiration for that night's arrangement.
There's lots of stuff, and some of it supports artists who have tight timelines and want creative support.
There's so much real independent music out there that actually has meaning. I hope you didn't tell your friend you wrote the song, because if someone tricked me into listening to generated not-art and I found out afterwards, I would consider them a liar.
What your friend did, using generation for inspiration for real music he creates is fine. But if someone gifted me an AI generated song I would ask why they didn't pay a few dollars -- honestly not much more -- to a real artist to do the same.
Ten years ago a friend of mine did that, hired a real person, and it cost less than $20 to write a ditty. That's comparable to the cost in tokens for an AI except you could support a real human artist instead of megalomaniac Yarvinists Sam Altman and friends.
And the song would have real meaning. You gave your friend a non-gift. The Let Me Google That For You of gifts. Honestly if one of my friends did that I'd wonder if they even like me.
The problem with AI music, and in fact AI in general, is that we've spent the last few decades aggressively attacking the idea that art should get paid for at all and yet people still do it, because they love it. So musicians work for pennies, and yet people still need to replace them with a machine.
So even if you just pay someone else to make you a song, it's not really any more expensive than this. Same with painting. What does this AI bring to the table, at all? It grosses me out.
People on this site should go pick up a guitar and write a 3 chord song about someone, it'll take you a day if that. It's not hard! It's fun!
The problem with real music, is that it requires a hefty amount of musicians to establish a genre. This amount could be somewhere in the range of 100 to 1000 musicians.
When this critical number is not amassed then the genre effectively dies.
With A.I. we can resurrect dead genres, but not only that, we can combine genres together, popular genres with one another, also popular and unpopular genres or popular and dead genres.
Using A.I. for music is easier and much faster than traditional means, and this could greatly reduce the critical mass of musicians to support a genre. It could be reduced as much as 10 times, or 100 times, like one person creating 10000 songs or something similar.
By trying to compare A.I. music to traditional music, you are comparing 10 songs a real band makes with 10000 songs an A.I. (human) musician makes. It's an apples-and-oranges comparison.
I don't see why human music cannot be a genre, the best of all genres but just one, and an innumerable amount of A.I. genres which may not be so good, but they are infinite.
The real human music genre might be the best forever or just for the next 3 years, but so what? Let there be more genres some good some bad. No one is gonna listen to a cheap copy of an already existing song of an already existing genre, but songs already in existence should be used to train A.I. weights.
Regarding A.I. weights, smaller models forget much of the information they are trained on, and they are cheaper, faster and easier to fine-tune, and probably also easier to apply RL reasoning to. That way, A.I. musicians (or real musicians) could run the model on their own computers and use it as an instrument instead of relying on companies with big, slow and expensive models.
And sometimes big and inefficient models copy text/code/music verbatim from the training data. But this is a bug, and when small models become competitive enough, most people are gonna use those. They might even carry them around, like a personal band always ready to make melodies for them.
I’m a pretty big music fan and I have no idea what you’re on about. Where did you get this theory?
> The problem with real music, is that it requires a hefty amount of musicians to establish a genre.
Why is establishing genre a goal in the first place?
> This amount could be somewhere in the range of 100 to 1000 musicians.
This is demonstrably false. Genre is defined by critical consensus, and it can arise around one or a handful of bands.
> With A.I. we can resurrect dead genres
What dead genre are you after? I’d imagine there are folk styles that haven’t been kept alive, but I question whether AI recreations would satisfy anyone. I’d rather listen to authentic recordings instead. And if the genre doesn’t have a significant recorded catalog, you can’t train a generative AI to produce it anyway.
Some kind of Dadaist movement I guess. Listen to Breathe of Death, it’s hilarious and then you cry.
Signed up with Gmail, and get 'Generation Failed' with every attempt. Please don't email me or add me to your marketing list.
There was a single unhealthy worker that didn't get caught, we just killed it.