Hi listeners and welcome to No Priors. Today we're talking to Mikey Schulman, the co-founder and CEO of Suno, an AI music generation tool trying to democratize music making. Users can make a song complete with lyrics just by entering a text prompt. For example, I was playing with it this morning and you guys all get to hear koto boom bap with lo-fi intricate beats. Okay, so feeling really excited about quality here for a company that is just under two years old but is making waves in the AI music industry since you came out of stealth mode late last year, Mikey. That's right. All right, well, we're excited to talk to you about AI music models and how it's been going since launch. Thanks so much for doing this. Welcome. Thank you. I'm super excited to be here.
Maybe just start us off with a little bit of background. You're a kid who loved music, playing in bands. How do you go from that to a Harvard physics PhD, building, you know, a couple of AI companies? Yeah, I guess a bit of a circuitous route. I've been playing music for a really long time, since I started playing piano when I was four. I played in a lot of bands in high school and college growing up. And the dirty secret is I'm not that good. And so the smart move, I suppose, for me was to pursue the thing that I was relatively better at, which was physics. I went to college and then to grad school and did a PhD in physics. I studied quantum computing. Maybe for your next podcast, I can tell you about why you shouldn't go into quantum computing.
What did you think you were going to do? Like, did you think you were going to be like a theoretical physicist or like an academic? Oh, goodness. Well, two things like I've never had a master plan. So I don't think I thought what I was going to do or not going to do. But I am certainly not great at physics. You know, I think I had a reasonably successful PhD. Not because I'm good at physics. The quantum mechanics that I studied was worked out in like the 50s. There was a lot of very tricky low temperature microwave engineering. That turns out to be really important for actually doing this stuff. I got lucky that I was relatively good at that compared to all the other physicists.
So, you know, kind of something on the boundary between two disciplines. I enjoyed every second of that. I would do it all again, even knowing what, you know, what I would be when I grew up or when I grew out of that. Still very close with my PhD advisor. I still live walking distance from my old lab. You know, it's kind of a fun place to just walk around, Cambridge, Massachusetts. But yeah, quantum computing is cool. Not what I wanted to do with my life. I found a company called Kensho by accident, not founded, found. They were local and I met them, and they were probably 10 people at the time, and I met all 10 and I really, really liked them. And I said, let's go do this. And I was hired as a software engineer.
And I think I got really, really lucky in terms of timing. About a month after I joined, the machine learning opportunities came along. And in 2014, a guy with a PhD in physics is what passed for a machine learning engineer. And so I took full advantage of that opportunity. Worked a ton, got to build a team, got to build some fun products. We were acquired by S&P Global in 2018. And got to pursue a lot of fun stuff after that acquisition as well. So I guess I found my way into AI somewhat by accident, but I really like it. It's a lot of fun.
You guys actually started with this open source model Bark. Can you talk about what the idea was at the very beginning and how you ended up in music generation? We were doing all text at Kensho, and we did our first audio project after we were acquired by S&P Global, which was learning to transcribe earnings calls. So I'm sure both of you have read an earnings call transcript; exceedingly likely it was done by S&P Global. It used to be done completely manually, very painful, and we could lend a lot of speed and scale by bringing automation to that. And we fell in love with doing audio AI. Like, we happened to be musicians, but it kind of took this very, honestly, non-sexy project of earnings call transcription to show us how much we loved it.
And we also realized that, certainly compared to images and text, audio was really, really far behind. And this was in 2020. And I think that's maybe even more true now if you just look at everything that's happened in images and text in the last couple of years. Like I said, I never had a master plan. We made Bark as an open source project. And even before we released Bark, we knew we wouldn't be focusing on speech. I think if I'm honest, a lot of people told us go build a speech company. It is more straightforward. You'll build a great B2B product and people will love it. And we couldn't help ourselves. We just love music too much.
And so we decided to build a music company. Why did you know you weren't going to focus on speech? Speech is super interesting, but the inherent creativity that we were so drawn to was not really present in speech. Speech just needs to be right. Just like, read me this New York Times article. And if it's a tiny bit inexpressive or a tiny bit robotic, that'll still get the job done. And the real creativity was happening in a totally different part of audio, which is music, where all I care about is how it makes me feel. That's really cool.
And then what's the approach you've taken? Because I guess there are two main architectures that people have used for different forms of audio models. I mean, a lot of them are traditionally diffusion models. I know there's been more work on the transformer side. And then there's obviously a few other types of architectures. Is there anything you could tell us about sort of the technical approaches you've taken or how you think about it?
And one of the reasons I ask is obviously for a lot of the transformer models, people just look at scaling laws and how things will sort of adapt with scale. And I'm a little bit curious how that applies to music and how you think about that future relative to models and approaches. We don't make it a secret that these are just transformers. That's partly our background, doing text before, but also transformers scale nicely.
A lot of work ends up being done for you by the open source text community, which is always really nice. We can really be choosy with where we innovate. And where we end up innovating a lot is how do you tokenize audio? Audio does not do us the favor of being nicely discretized. It's sampled very, very quickly, approximately 50,000 samples per second. It's a continuous signal. And so you have to use a set of heuristics or models in order to turn that into a manageable set of tokens.
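To make the tokenization idea concrete, here is a toy sketch of one way a continuous waveform can become discrete tokens: chop it into frames and map each frame to its nearest entry in a small codebook (vector quantization). This is purely illustrative, with a random codebook; real neural audio codecs learn the codebook, and nothing here is Suno's actual method.

```python
import numpy as np

def tokenize_audio(wave, frame_size=320, n_codes=256, codebook=None, seed=0):
    """Toy tokenizer: split a waveform into fixed-size frames and
    replace each frame with the id of its nearest codebook vector."""
    # Pad so the waveform divides evenly into frames
    pad = (-len(wave)) % frame_size
    frames = np.pad(wave, (0, pad)).reshape(-1, frame_size)
    if codebook is None:
        # Illustrative random codebook; real codecs learn this end-to-end
        rng = np.random.default_rng(seed)
        codebook = rng.standard_normal((n_codes, frame_size))
    # Nearest-neighbour lookup: one discrete token per frame
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# One second of a 440 Hz sine at a 16 kHz sample rate becomes
# 16000 / 320 = 50 integer tokens a transformer could consume.
wave = np.sin(np.linspace(0, 440 * 2 * np.pi, 16000))
tokens = tokenize_audio(wave)
print(len(tokens))  # 50
```

The point of the sketch is the compression: tens of thousands of raw samples per second turn into a far shorter sequence of discrete ids, which is what makes transformer-style modeling tractable.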
And that's where we expend, I think, a lot of our innovation cycles, is really understanding that. As you said, the thing that matters is how it makes you feel. And so, like, how did you measure quality in your own models? Like, what do you know about how to train something that creates great generations? Is it just all like Mikey as human eval? It's definitely not all Mikey as human eval, but, you know, one thing we say here is that aesthetics matter.
And I think that is a recognition that in all branches of AI, we become slaves to our metrics. And you say, I did this accuracy on this benchmark and this accuracy on this benchmark. And in the real world, sometimes it doesn't necessarily matter. And these benchmarks are extra terrible in audio just because the field is so new. And so, aesthetics matter is a way of saying that you have to use your ears in order to evaluate things. You can look at things like what your final loss is or something like that. But ultimately, it's definitely more tedious to evaluate than you want it to be.
I think the good news is everybody here really loves music. And so, evaluating your models, which means listening to a lot of things and getting people to listen to a lot of things and doing a lot of A/B tests, turns out to be fun. But I think we have a long way to go in this journey on how we're actually going to evaluate these things. And I think we learn a lot about human beings and human emotions while we learn to evaluate these things.
Yeah, it's interesting because, as an analog, I know that in the early days of Midjourney, one of the ways it really stood out is people just felt that there was better taste exhibited, better aesthetics, versus, hey, there's a much better eval function that they're optimizing against. Although obviously there were things they were doing there as well. And so it feels similar here, where that sort of taste component really matters, particularly early on.
Are there other ways that your music background has impacted the development of Suno, or really helped sort of facilitate some of the things that you're doing? There's this cliche about it being really important to look at your results and look at your data in machine learning and in AI. And if that is pleasurable, it is not nearly as tedious. And that's not just for me. That's kind of everybody here. And that ends up mattering a lot. I've learned a lot about music actually since starting this company, and just the exposure to different genres that I never knew existed and exposure to hybrids of genres that have yet to be created by people has been really, really eye opening.
But it's funny because you ask, like, okay, maybe the stuff that I know about music — we actually try very hard not to put that implicit bias in the model. The model shouldn't know about music theory. You don't tell GPT, this is a noun and this is a verb; GPT figures it out. If I tell my model there are only 12 tones, my model will only know how to output 12 tones. If I tell my model there are 50 different instruments, I will never get that unique sound. And so we've really tried very hard not to do anything like that. And honestly, I don't think this is so smart of us. This is something that we've stolen from the text world. There's something beautiful about next token prediction that ends up being very, very powerful.
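The "no music theory in the model" idea can be illustrated with a toy next-token predictor over arbitrary token ids. Nothing below knows about notes, scales, or instruments; any structure it exhibits is learned purely from which tokens tend to follow which. This is a count-based bigram sketch for intuition only, not Suno's architecture.

```python
from collections import Counter, defaultdict

def train_bigram(token_seq):
    """Count, for each token, how often every other token follows it.
    No priors about what the tokens 'mean' are built in."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(token_seq, token_seq[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    # Greedy prediction: the most frequent successor seen in training
    return counts[token].most_common(1)[0][0]

# Toy token stream: 7 follows 3 three times, 9 follows 3 once,
# so the model learns that 7 is the likeliest successor of 3.
seq = [3, 7, 3, 7, 3, 9, 3, 7]
model = train_bigram(seq)
print(predict_next(model, 3))  # 7
```

Real systems replace the counts with a transformer predicting a distribution over the next audio token, but the principle is the same: structure emerges from the data rather than from hand-coded rules.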
Mikey, what's hard in AI music? I know less about what this frontier looks like. Where do you want to push in terms of things that are really hard for the model to get right? In visual models or video, like human hands, object permanence, there's lots of things that are more intuitive to me there. Yeah, that's a really good question. I confess I've not really thought about that too much. There are the easy things or the easy to describe things like, did you get the stereo right? Did you get the bit rate high enough, et cetera? Again, I think the reason music is so special is because it makes you feel a certain way. And like to the extent that any of this is difficult, it is because you are really targeting human emotions in some way.
And that's not terribly well understood by anyone. And it is also super, super diverse and super culturally dependent and super age dependent or demographic dependent. So, you know, I think what we're doing is so far from objective truth. And it's very easy for people who spend all their days in text LLMs to be thinking about things like, this is how well I did on the LSAT. You know, I can pass the bar with this size model, like the law bar. And none of that exists for us. It's really just like I made a song and it made me feel a certain way. And it may have been grainy audio that made me feel a certain way. It may have been a long song, a short song. I think there's a lot more unanswerable questions in this domain.
One of the things that you all did quite early is, I believe, you have like a free tier so people can make up to 10 songs a day. And then you have a subscription based approach. How do you think about your users over time in terms of consumer versus prosumer versus business users? And is it too early to tell? Is it a specific area that you're most focused on? Like how do you think about all that stuff? Yeah, that's a great question. I would say, you know, we are trying to change how the entire globe interacts with music and to open new experiences for people. And so what that means is that this is a consumer product. This is not sprinkling AI into Ableton or Logic or Pro Tools. This isn't for the person already staying up all night as a hobbyist trying to produce music. This is for everybody. This is for like my mom.
And, you know, I think on the business side of things, it may not be conventional wisdom to say start charging immediately for your product. But it's actually really important, as we are trying to create something that is a set of behaviors that does not exist, to be able to understand what actually makes people want to part with their hard earned dollars. If I'm being honest, people ask about the business model of generative AI a ton. And I think everybody's doing kind of something that looks like SaaS pricing. And it's kind of done very crudely. And we are certainly no exception to that. But I don't know if this is right in the long term. And it strikes me as probably just a vestige of it being the same types of people who were building SaaS companies five years ago.
And the same investors who were investing in SaaS companies five years ago who are building it and investing in it this time around. And so it kind of feels like a bit of a vestige. No offense to you guys, you are both great investors. But this feels like something that's not totally worked out yet. Yeah. It's interesting because I remember talking to some people who were very active in the 90s as the web browser was really coming to the forefront. And they were trying to figure out the right business model for web pages. And a lot of the emphasis was actually, should we do micropayments. So every time you read a New York Times article, you pay a fraction of a cent, instead of ad-based models. And of course the world ended up collapsing on that side to ad-based models.
But nobody that I've talked to from that era actually thinks that was necessarily the right answer. They just think it was the easiest thing to do in the short run. And so I think there's a really interesting question here, to your point, in terms of subscriptions. There's ads. There's other sorts of placement. There's a variety of things you could do over time. There's microtransactions. And so there's reselling things in a marketplace and letting people take a cut of subscribers, you know, almost like a next-gen Spotify or something. So it's super interesting to wonder how all this evolves and where you take it. So it's really cool that you're thinking deeply about it right now. Yeah. It's actually funny to hear you say that because I remember back in the 90s, my older brother was a beta tester for AOL. And I actually remember some of these things happening; I remember actually watching him beta test these things. Yeah. It's cool. Are there any ways that people have started to use the product that were very unexpected for you, or surprising use cases or applications or other things people have done with it?
I think so much has been really fulfilling and cool to see and definitely surprising. And you know, one thing I'm constantly reminding everyone is that we are eliciting a set of behaviors that are not common and that are not regular for people to do. And so it's not going to be surprising when we see stuff that comes out. It's maybe not surprising that people love to feel creative and they love to feel ownership over what they produce and they love to share it with others. If you want to be a little bit more reductive about it, they love to feel famous. But I think it's not the same way that famous people are famous. It's a little bit different. And so we've seen that people will spend a lot of time in front of their computers, enjoying making songs. This is really cool. And it is different from, I think, the way music is done now: sometimes painfully, but only in service of the final product. And I think when you open this up to people, sure, you definitely care about the final product, about what the song sounds like on the other end. But you also really care about the journey, and people will really enjoy making music regardless of the final product.
And I can tell you, personally, the most fun I have ever had doing music is playing music with friends, jam sessions, even when you're not recording. And I think there's something that's very, very akin to that, that we are able to open up with some of these technologies. It's such a magical experience. And I feel like everybody should feel some of that joy of creation with other people. Maybe you already see it in the product, but are you imagining that you get that collaboration joy from, you know, the creation, or working by yourself, feeling like you are more skilled, collaborating with AI with Suno, or are people jamming? Do you see, like, mixtape-like sharing behaviors today you can talk about?
We see all of that, which is super cool. Like a video game, music is fun by yourself and maybe more fun in multiplayer mode. And so we see people enjoying this by themselves, but we see people basically hacking multiplayer mode into this in lots of fun ways where you can have people co-writing lyrics together, trading off words, trading off verses. I'll write the verse, you write the chorus or I'll write the lyrics and you pick all the styles and I'll make a song and then I'll send it to you and you'll, you know, make a song back. And so it's not surprising. I think humans really evolved to resonate strongly with music and want to do music together.
Every culture basically has music. And so it really shouldn't be surprising that we see all of this, but it is really fulfilling from our perspective because it really brings people together. It makes people smile. I don't pretend like we're curing cancer at Suno, but it is really cool to make a lot of people smile. One of the things that you and I talked about previously was, in creation platforms, you have a very skewed ratio in general, and it varies by, you know, what the platform is, of creators and people who are listening, absorbing, viewing, whatever, right. There, of course, are a lot of people who make music today, but you listen to the creations of a relatively few number of people, right? How much do you think that changes with something like Suno?
I think a lot. I will say, you know, I'm speculating here. It's still super, super early, but I think of us opening a few important avenues. The first is, I guess, all of the sort of smaller niche micro sharing that is possible, where we can make songs that the three of us are going to listen to because it is capturing a moment that the three of us had, the same way we might take a selfie. And that is sharing dynamics that just are completely absent in music right now. But I think — let's do it. Sorry to interrupt you. I love it. Okay. I need some genres. What should we make a song about? My favorite genre, but I don't know that it's supported yet, is phonk, P-H-O-N-K.
Yeah, I think so. Maybe too obscure. Okay. That'd be very exciting. No, I think we can do some, but let's do some hybrid also, like, I don't know, a reggae song. How about some, like, yeah, or like Hawaiian R&B? Ooh. Why not? You want to choose like an instrument to add in there? Yeah. You said koto before. Koto. Or sitar. Something random. Sitar, sitar sounds cool. Yeah. I have heard a lot of really good sitar trap on Suno. Yeah. It goes really well together. That's my second favorite. Okay. Priors in statistics. Yes, we have no priors here. Let's see how we do. Just learning from the world, ground up. I've learned a lot. I've learned a lot about a lot of new genres since starting this. What's your favorite new genre, by the way, that came out of that? Gosh, there's some recency bias here, but sitar trap is freaking fantastic. Yes. That sounds good. Try the other one. Rish is going to have to change this to the intro for No Priors going forward. I like it. We're going to have to get an image — I don't even own a Hawaiian shirt, but we're going to have to get an image where we're all wearing the Hawaiian shirts. The Palmer Luckey look? Yeah, fine. I'll just get Palmer to do it.
I'm maybe the only person who does this with Suno, but every time I create a song, I can imagine what the artist would look like that creates this visualized it, where I'm like, it's the big dude with the Hawaiian shirt, and he's got the sitar with him. I love that. I will tell you one very cool and unexpected thing we saw is we shipped a very simple feature of you can edit your song title, maybe you fat-fingered it or something. As soon as we did that, people started to put their names in their song titles and hit our trending page. People like to feel good about their creations, and you should know, in hindsight, it's obvious, and people will hack your product and tell you what they want out of it. Just one thing back to your point before, though, Sarah, I think we talk a lot about how asymmetric the creation versus consumption is on different platforms, and TikTok is famously very creation-heavy, although still most of TikTok is consumption. I think these set of technologies have the ability to skew that much, much farther, because the creation process is so enjoyable.
I actually think if we do this right, in the future these are not going to be the terms that we use to describe what we're doing. We're not going to say, I'm creating or I'm consuming; these things will bleed into one another. We'll have a lot of lean-in consumption, we'll have a lot of lean-out creation, and I think we will eventually decline to draw the line of how many people are creating, how many people are consuming, and we'll just say, people are enjoying all of this music stuff. That's a really interesting vision of the future. I guess that has pretty deep implications as well in terms of how you think about music, the music industry, how it permeates society. Do you have a view in terms of what all this looks like five years from now? If we are correct that there are just modes of experience around music that people don't have access to, that we can get a billion people much more engaged with music than they are now, then just in terms of the number of dollars or the amount of time people are spending doing music, both of those are going to go up dramatically. That I feel quite confident about.
The exact nature of how this looks, I think, is up for some more debate. This is just an opinion. Because music is so human and so much emotional connection involved in it, I don't really see people losing connection with their favorite artists at all. In fact, if you labor around music and you understand the process, you feel a much deeper connection with the artists that you love.
Another thing I think is likely to happen, if we look at the last wave of technologies to enter music, let's say the DAW, this really accelerates how quickly music can change and how quickly culture can change. Music is really just a reflection of culture. The way that happened is the DAW really let a lot of people start making music who could never make music. You could do this from your dorm room if you had a good pair of headphones and you had a good ear and you were willing to put in the work to learn the tool.
I think if we can give this to so many more people, yes, a lot more people will create, a lot more people will become tastemakers, but the rate at which culture changes, the rate at which the styles of music change, the rate at which new styles of music are uncovered is likely to go up a lot. I think even if you were just going to only ever listen to music, which some people will, that will get so much more interesting. Things are going to change so much more quickly. You will not have people really, I think, cribbing off of one another in the same way. I'm really excited about that.
Just because not every listener will know a DAW — a digital audio workstation is something like Ableton. You can generate music and create sound, as Mikey was saying, in your dorm room, in your apartment, cheaply. That was pretty revolutionary, when it turned out you didn't need a $500,000 SSL mixer and a staff of 10 people to cut an album. That was really revolutionary. People made tremendous contributions to our collective culture when that happened. There were 15-year-olds who got discovered, and that was extremely rare before that. I actually think it's really an untold story. I'm not the right person. But somebody with a really rich understanding of musical history should explain what happened with the digitization of music.
We were like, ah, I have an infinite set of every snare drum sound in the world I can think of. Just the ability to completely unconstrain, as you said, something that's much cheaper than traditional tooling, where you don't need to know how to play any instrument. I think some of what Suno is doing is making the assembly of that another magnitude easier. I think that's right. There's one other thing that I'm really excited about getting unlocked, which is that if you look at the last 10 years of music, a lot of the changes are, let's say, sonic.
It's like interesting sounds and maybe slightly less so evolving how interesting songs are. It's a function of the technology that got unlocked, like a lot of digitization of things. I'm actually really excited for the opposite. Like AI is certainly able to produce interesting sounds that we've never heard before, but putting these tools in people's hands, we can unlock song structures and chord changes and borrow different styles and mix them with other styles and make stuff that is not only sonically new, but kind of melodically new.
And I think that has the ability to really keep people listening to stuff. And, you know, on my most optimistic days, I'll say un-TikTok music, like get us listening to stuff for more than 30 seconds at a time. Maybe I'm a little bit naive and optimistic, but I think it's very possible. Yeah. Okay. Before we wrap, I played a song at the beginning. We made a song. You got to play one that's your favorite creation. Oh, let me find it. I'm tempted to play a song that's at the top of our showcase. And it's by an artist called Oliver McCann. It's got a lot of plays. It's a really interesting song. It is certainly the public's favorite. So I can play it now. Oh, my love. My friend, you know, it's been a while without thinking of you, but the thought makes me smile. I'm so tempted, wanting more than this. I know it, but what am I to do? I need some stress to breathe. So give me a song. Oh, my love.
It's unbelievable. The amazing thing about this, by the way, which, you know, just for a listener's sake is the vocals are completely machine created. The music is completely machine created. The lyrics are machine created. And so this truly is a synthetic song, which I think is pretty amazing. Yeah, it certainly is easy to lose sight of that fact when you do this day in and day out, but it is incredible. I'll say one step further. And the machine doesn't know that there is even a concept of voice. Like it's just all sound. And somehow it's able to produce the sounds that we have been evolved and a culture related to resonate with. So all of that makes me think I have the coolest job in the world. Not bad for a quantum physicist, a failed one, I guess. Exactly.
Mikey, how big is Suno? It's obviously very popular now. You're growing the team. What are you looking for? Yeah, we are always on the hunt for the best people, people who love technology, people who deeply love music, people who are excited about bringing more music to the world. We're hiring primarily on the East Coast, Cambridge, Massachusetts, or New York. Come drop us a line: careers@suno.com. Great. Well, thank you so much for joining us today. I think we covered a lot of great things. I had a great time. Thanks so much for having me. Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.