
No Priors Ep. 64 | With Suno CEO and Co-Founder Mikey Shulman

Published: 2024-05-16 10:00:23

Summary

Mikey Shulman, the CEO and co-founder of Suno, can see a future where the Venn diagram of music creators and consumers ...


Transcript (English and Chinese)

Hi listeners and welcome to No Priors. Today we're talking to Mikey Shulman, the co-founder and CEO of Suno, an AI music generation tool trying to democratize music making. Users can make a song complete with lyrics just by entering a text prompt. For example, I was playing with it this morning and you guys all get to hear koto boom bap with lo-fi intricate beats. Okay, so feeling really excited about quality here for a company that is just under two years old but has been making waves in the AI music industry since you came out of stealth mode late last year, Mikey. That's right. All right, well, we're excited to talk to you about AI music models and how it's been going since launch. Thanks so much for doing this. Welcome. Thank you. I'm super excited to be here.
大家好,欢迎收听"No Priors"节目。今天我们邀请到了Suno的联合创始人兼CEO Mikey Shulman。Suno 是一个AI音乐生成工具,致力于让音乐创作变得更加大众化。用户只需输入一个文本提示,就可以创作出一首包含歌词的歌曲。例如,今天早上我玩了一下这个工具,大家可以听到一首融合了日本筝(koto)、boom bap 和 lo-fi 复杂节拍的歌曲。对于一个成立还不到两年的公司来说,能有这样的高质量作品让人非常兴奋。自从去年底你们结束隐身模式以来,Suno 在AI音乐产业已经掀起了一股热潮,Mikey。没错。好的,我们非常期待和你聊聊AI音乐模型以及自发布以来的进展。非常感谢你参与我们的节目。欢迎你。谢谢,我也非常激动能来到这里。

Maybe just start us off with a little bit of background. You're a kid who loved music, playing in bands. How do you go from that to Harvard physics, PhD, building, you know, couple AI companies? Yeah, I guess a bit of a circuitous route. I've been playing music for a really long time since I started playing piano when I was four. I played in a lot of bands in high school and college growing up. And the dirty secret is I'm not that good. And so the smart move I suppose for me was to pursue the thing that I was relatively better at, which was physics. I went to college and then to grad school and did a PhD in physics. I studied quantum computing. Maybe for your next podcast, I can tell you about why you shouldn't go into quantum computing.
也许我们可以从一点背景介绍开始。你是个热爱音乐的小孩,在乐队里演奏。那么,你是如何从那一步走到哈佛物理学博士,甚至创办几家AI公司的呢? 嗯,我想我的路程有点曲折。我从四岁开始弹钢琴,到现在已经演奏音乐很长时间了。高中和大学时,我在很多乐队里演出。说实话,我并不是那么擅长音乐,所以对我来说,聪明的选择是追求我相对更擅长的东西——物理。我上了一所大学,然后读了研究生,最后取得了物理学博士学位,我的研究方向是量子计算。也许在你下一个播客节目中,我可以告诉你为什么不应该进入量子计算领域。

What did you think you were going to do? Like, did you think you were going to be like a theoretical physicist or like an academic? Oh, goodness. Well, two things like I've never had a master plan. So I don't think I thought what I was going to do or not going to do. But I am certainly not great at physics. You know, I think I had a reasonably successful PhD. Not because I'm good at physics. The quantum mechanics that I studied was worked out in like the 50s. There was a lot of very tricky low temperature microwave engineering. That turns out to be really important for actually doing this stuff. I got lucky that I was relatively good at that compared to all the other physicists.
你觉得你当初打算做什么?像是打算成为一个理论物理学家或者学术界人士吗?哦,天哪。嗯,有两点,我从来没有过一个详细的计划。所以我并没想着自己会做什么或者不会做什么。而且我肯定不是物理学方面的高手。你知道吗,我觉得自己博士做得还算成功,但不是因为我物理学好。我研究的量子力学其实在五十年代就已经有了很多成果。这里面有很多非常复杂的低温微波工程,这对实际操作来说非常重要。我很幸运,因为相比其他物理学家,我在这方面相对较为擅长。

So, you know, kind of something on the boundary between two disciplines. I enjoyed every second of that. I would do it all again, even knowing what, you know, what I would be when I grew up or when I grew out of that. Still very close with my PhD advisor. I still live walking distance from my old lab. You know, it's kind of a fun place to just walk around Cambridge, Massachusetts. But yeah, quantum computing is cool. Not what I wanted to do with my life. I found a company called Kensho by accident, not founded, found. They were local and I met them, probably 10 people at the time, and I met all 10 and I really, really liked them. And I said, let's go do this. And I was hired as a software engineer.
所以,你知道,有点像是跨越两个学科边界的东西。我每一秒都很享受。如果再来一次,即使知道自己将来会成为什么样的人或者最终会做什么,我还是会这样选择。至今我仍与我的博士导师关系密切,我依然住在离旧实验室步行距离的地方。你知道,马萨诸塞州的剑桥是一个让人愉快的地方。不过是的,量子计算确实很酷,但不是我想追求的事业。我无意间发现了一家公司叫Kensho,不是创办,是发现。他们是本地公司,当时大概有10个人,我见了所有10个人,非常喜欢他们。我就想,咱们一起干吧。然后我就被雇佣为软件工程师。

And I think I got really, really lucky in terms of timing about a month after I joined the machine learning opportunities came along. And in 2014, Guy with PhD in physics is what passes for machine learning engineer. And so I took full advantage of that opportunity. Worked a ton, got to build a team, got to build some fun products. We were acquired by S&P Global in 2018. And got to pursue a lot of fun stuff after that acquisition as well. So I guess I found my way into AI somewhat by accident, but I really like it. It's a lot of fun.
我认为我非常非常幸运,因为就在我加入公司大约一个月后,机器学习的机会就来了。而在2014年,有物理学博士学位的人也能被认为是机器学习工程师。所以我充分利用了这个机会,投入了大量的工作,组建了一个团队,开发了一些有趣的产品。我们的公司在2018年被标普全球收购,之后我也有幸继续追求许多有趣的事情。所以,我算是偶然进入了人工智能领域,但我真的很喜欢这个领域,它非常有趣。

You guys actually started with this open source model, Bark. Can you talk about like what the idea was at the very beginning and how you ended up in music generation? We were doing all text at Kensho, and we did our first audio project after we were acquired by S&P Global, which was learning to transcribe earnings calls. So I'm sure both of you have read an earnings call transcript; exceedingly likely it was done by S&P Global. It used to be done completely manually, very painful, and we could lend a lot of speed and scale by bringing automation to that. And we fell in love with doing audio AI. Like, we happened to be musicians, but it kind of took this very, honestly, non-sexy project of earnings call transcription to show us how much we loved it.
你们最开始是从这个开源模型Bark入手的。能谈谈最初的想法是什么,以及你们最后是怎么走到音乐生成这一步的吗?我们最初都在做文本相关的内容,直到我们被S&P Global收购后才开始第一个音频项目, 就是学习转录财报电话会议。我相信你们俩肯定都读过财报电话会议的转录内容,很可能是S&P Global做的。以前这些都是完全靠人工完成的,非常费力,但我们引入自动化后,可以大大提升速度和规模。我们由此爱上了音频AI。虽然我们本来就是音乐人,但是真正让我们发现对音频AI的热爱,反而是这个看起来非常不性感的财报电话会议转录项目。

And we also realized that, certainly compared to images and text, audio was really, really far behind. And this was in 2020. And I think that's maybe even more true now if you just look at everything that's happened in images and text in the last couple of years. Like I said, I never had a master plan. We made Bark as an open source project. And even before we released Bark, we knew we wouldn't be focusing on speech. I think if I'm honest, a lot of people told us go build a speech company. It is more straightforward. You'll build a great B2B product and people will love it. And we couldn't help ourselves. We just love music too much.
我们还意识到,与图像和文本相比,音频领域显然落后很多。这是在2020年。而且如果你看看过去几年图像和文本领域的发展,这一点可能更加明显。正如我说的,我从没有一个详细的计划。我们创建了Bark,并作为一个开源项目发布。甚至在发布Bark之前,我们就知道自己不会专注于语音领域。我必须坦诚地说,很多人建议我们去做一家语音公司,这样更为直接,可以开发出优秀的B2B产品并受到欢迎。但我们就是控制不住自己,我们太热爱音乐了。

And so we decided to build a music company. Why did you know you weren't going to focus on speech? Speech is super interesting, but the inherent creativity that we were so drawn to was like not really present in speech. Speech just needs to be right. Just like read me this New York Times article. And if it's a tiny bit non expressive or tiny bit robotic, that'll still get the job done. And the real creativity was happening in a totally different part of audio, which is music, which all I care about is how it makes me feel. That's really cool.
于是我们决定创建一家音乐公司。为什么你知道自己不会专注于语音技术呢?语音确实很有趣,但我们所追求的那种内在创意在语音中并不存在。语音只需要正确地传达信息,比如读一篇《纽约时报》的文章,即使声音有点缺乏表现力或有点机械化,也能完成任务。而真正的创造力则存在于音频的另一个完全不同的领域——音乐,我关心的只是它如何让我感受到情感,这真的很酷。

And then, is there a particular approach you've taken? Because I guess there are two main architectures that people have used for different forms of audio models. I mean, a lot of them are traditionally diffusion models. I know there's been more work on the transformer side. And then there's obviously a few other types of architectures. Is there anything you could tell us about sort of the technical approaches you've taken or how you think about it?
那么,你们采用了什么特定的方法吗?因为我猜测人们用于不同类型的音频模型的主要架构有两个。我的意思是,很多传统上是扩散模型。我知道最近在Transformer架构上有更多的研究。当然,还有其他一些架构类型。你能否告诉我们一些你们所采用的技术方法或你们的思考方式呢?

And one of the reasons I ask is obviously for a lot of the transformer models, people just look at scaling laws and how things will sort of adapt with scale. And I'm a little bit curious how that applies to music and how you think about that future relative to models and approaches. We don't make it a secret that these are just transformers. This is somewhat our backgrounds doing text before, but also transformers scale nicely.
我问这个问题的原因之一显而易见,因为对于很多变换器模型,人们会关注缩放规律以及这些模型在规模上的变化。我有点好奇这种情况如何适用于音乐领域,以及你们对模型和方法的未来有怎样的看法。我们并不隐瞒这些模型其实只是变换器。这和我们之前从事文本处理的背景有关,但另一方面也是因为变换器在缩放方面表现得很好。

A lot of work ends up being done for you by the open source text community, which is always really nice. We can really be choosy with where we innovate. And where we end up innovating a lot is how do you tokenize audio? Audio does not give us the good favor of being nicely discretized. It's sampled very, very quickly, approximately 50,000 samples per second. It's a continuous signal. And so you have to use a set of heuristics or models in order to turn that into a manageable set of tokens.
很多工作最终是由开源文本社区完成的,这真的很好。这样我们就可以在创新方向上更加挑剔。而我们最终在创新上投入最多的地方是音频的分词。音频并不像文本那样可以轻松离散化。其采样速度非常快,约为每秒50,000次,这是一个连续的信号。因此,必须使用一套启发式方法或模型将其转换为可管理的标记集合。
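
To make the tokenization problem concrete, here is a minimal sketch of one classic way to turn a continuous waveform into discrete tokens: 8-bit mu-law companding. This is an illustration only and not Suno's tokenizer, which the conversation does not describe; the 48,000 Hz sample rate and the function names `encode` and `decode` are assumptions made for the example.

```python
import math

MU = 255              # 8-bit mu-law: 256 possible token ids (0..255)
SAMPLE_RATE = 48_000  # assumed sample rate, roughly the "50,000 per second" mentioned

def encode(sample: float) -> int:
    """Map one sample in [-1.0, 1.0] to an integer token id in [0, 255]."""
    x = max(-1.0, min(1.0, sample))  # clip out-of-range input
    # Compress logarithmically so quiet samples get finer resolution.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * MU))

def decode(token: int) -> float:
    """Approximately invert encode(): token id back to a sample in [-1.0, 1.0]."""
    y = 2.0 * token / MU - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# One second of a 440 Hz sine wave at half amplitude.
wave = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
tokens = [encode(s) for s in wave]
max_error = max(abs(decode(t) - s) for t, s in zip(tokens, wave))
```

Note that this per-sample scheme still produces one token per sample, so one second of audio becomes 48,000 tokens. That sequence length is the reason a more compact, "manageable" token set matters for a transformer; learned audio codecs are one common way to get there.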

And that's where we spend, I think, a lot of our kind of innovation cycles, is really understanding that. As you said, the thing that matters is how it makes you feel. And so, like, how did you measure quality in your own models? Like, what do you know about how to train something that creates great generations? Is it just all like Mikey as human eval? It's definitely not all Mikey as human eval, but, you know, one thing we say here is that aesthetics matter.
我认为,我们把很多创新周期都花在了这里,关键在于真正理解这一点。正如你所说,重要的是它让你感觉如何。那么,你是如何评估自己模型的质量的?你知道怎样训练以产生优秀结果的模型吗?是否仅仅依靠像Mikey这样的人工评估呢?虽然不仅仅只是依靠人工评估,但我们这里有一句话,那就是美学很重要。

And I think that is a recognition that, in all branches of AI, we become slaves to our metrics. And you say, I did this accuracy on this benchmark and this accuracy on this benchmark. And in the real world, sometimes it doesn't necessarily matter. And these benchmarks are extra terrible in audio just because the field is so new. And so, "aesthetics matter" is like a way of saying that you have to use your ears in order to evaluate things. You can look at things like what your final loss is or something like that. But ultimately, it's definitely more tedious to evaluate than you want it to be.
我认为这是一个共识,即在所有的人工智能领域中,我们往往成为我们指标的奴隶。你会说,我在这个基准测试上达到了这种准确率,在那个基准测试上达到了那种准确率。然而在现实世界中,这些指标有时并不一定那么重要。尤其是在音频领域,这些基准测试显得尤为糟糕,因为这个领域还很新。因此,"美学重要"就像是在说,你必须用耳朵来评估成果。你可以看一些最终的损失值或类似的指标,但最终,评估过程确实比你期望的要繁琐。

I think the good news is everybody here really loves music. And so, evaluating your models, which means listening to a lot of things and getting people to listen to a lot of things and doing a lot of A/B tests, turns out to be fun. But I think we have a long way to go in this journey on how we're actually going to evaluate these things. And I think we learn a lot about human beings and human emotions while we learn to evaluate these things.
我认为,好消息是这里的每个人都非常热爱音乐。因此,评估我们的模型,这意味着要听很多东西,让人们听很多东西,进行许多A/B测试,这实际上是很有趣的。但我认为在如何真正评估这些事情上,我们还有很长的路要走。在这个过程中,我们还会学到很多关于人类和人类情感的知识。
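
As a rough illustration of what scoring a listening A/B test can look like, the sketch below summarizes pairwise preferences between two models with a win rate and a two-sided sign test. The conversation does not say how Suno analyzes its tests; the function name `ab_summary` and the vote counts are hypothetical.

```python
from math import comb

def ab_summary(wins_a: int, wins_b: int) -> tuple[float, float]:
    """Return (win rate of A, two-sided sign-test p-value), ignoring ties.

    Each vote is one listener preferring the clip from model A or model B.
    Under the null hypothesis, each preference is a fair coin flip.
    """
    n = wins_a + wins_b
    if n == 0:
        raise ValueError("need at least one non-tied vote")
    k = max(wins_a, wins_b)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return wins_a / n, p_value

# Hypothetical result: 70 of 100 listeners preferred model A's clip.
win_rate, p_value = ab_summary(70, 30)
```

A small p-value only says the preference is unlikely to be chance; it does not say why listeners preferred one clip, which is the harder question raised above.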

Yeah, it's interesting because as an analog, I know that in the early days of Midjourney, one of the ways it really stood out is people just felt that there was better taste exhibited. It's better aesthetics versus, hey, there's a much better eval function that they're optimizing against. Although obviously, there were things that they were doing there as well. And so it feels similar here, where that sort of taste component really matters, particularly early on.
是的,这很有趣,因为作为一个类比,我知道在Midjourney的早期阶段,它的突出表现之一是人们感到它展示了更好的品味和美感,而不是因为他们在优化某个更好的评估(eval)函数。虽然显然他们在那方面也做了一些工作。但在这里也有类似的感觉,特别是在初期阶段,品味这个因素真的很重要。

Are there other ways that your music background has impacted the development of Suno, or really helped sort of facilitate some of the things that you're doing? There's this cliche about it being really important to look at your results and look at your data in machine learning and in AI. And if that is pleasurable, it is not nearly as tedious. And that's not just for me. That's kind of everybody here. And that ends up mattering a lot. I've learned a lot about music actually since starting this company, and just the exposure to different genres that I never knew existed and exposure to hybrids of genres that have yet to be created by people has been like really, really eye opening.
你的音乐背景是否以其他方式影响了Suno的发展,或者说是否真的帮助你促进了一些你正在做的事情?有一句老话说,在机器学习和人工智能中,查看你的结果和数据是非常重要的。如果这是令人愉快的,那么它就不会显得那么繁琐。这不仅仅是对我,对这里的每个人都是如此。这点非常重要。自从创建这家公司以来,我实际上学到了很多关于音乐的知识,接触到了一些我以前不知道存在的不同音乐类型,还有一些尚未被人们创造出来的混合音乐类型,这真是大开眼界。

But it's funny because you ask like, okay, maybe the stuff that I know about music, we actually try very hard not to put too much implicit bias in the model. The model shouldn't know about music theory. You don't tell GPT, this is a noun and this is a verb; GPT figures it out. If I tell my model there are only 12 tones, my model will only know how to output 12 tones. If I tell my model there are 50 different instruments, I will never get that unique sound. And so we've really tried very hard not to do anything like that. And honestly, I don't think this is so smart of us. This is something that we've stolen from the text world. There's something beautiful about next token prediction that ends up being very, very powerful.
但有趣的是,你问的这个问题,好像我对音乐了解很多,但实际上我们非常努力地避免在模型里加入太多隐含偏见。模型不应该懂音乐理论。你不会告诉GPT这是名词,那是动词,GPT会自己弄明白。如果我告诉我的模型,只有12个音符,那模型就只能输出12个音符。如果我告诉模型,有50种不同的乐器,我就永远无法得到独特的声音。所以我们真的非常努力地避免这样做。坦白说,我不觉得这是我们有多聪明,这是我们从文本领域借鉴来的。在预测下一个词时,有某种美妙的能力,这最终变得非常强大。
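
The point about next-token prediction can be seen in even the simplest possible predictor. The sketch below is a count-based bigram model, not a transformer and not Suno's model; the function names and the toy token sequences are made up for illustration. Nothing in it encodes what the integer tokens mean, so any structure it reproduces comes only from the training sequences.

```python
from collections import Counter, defaultdict

def fit_bigram(sequences):
    """Count, for every token, which tokens follow it in the training data."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if it was never seen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy "songs" as sequences of opaque token ids; the ids carry no built-in meaning.
training = [[3, 7, 3, 7, 3, 9], [7, 3, 7, 3]]
model = fit_bigram(training)
after_3 = predict_next(model, 3)
after_7 = predict_next(model, 7)
after_9 = predict_next(model, 9)
```

The same objective scales from this lookup table to a large transformer: the model is only ever asked to guess what comes next, and rules such as "12 tones" or "50 instruments" are never written down anywhere.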

Mikey, what's hard in AI music? I know less about what this frontier looks like. Where do you want to push in terms of things that are really hard for the model to get right? In visual models or video, like human hands, object permanence, there's lots of things that are more intuitive to me there. Yeah, that's a really good question. I confess I've not really thought about that too much. There are the easy things or the easy to describe things like, did you get the stereo right? Did you get the bit rate high enough, et cetera? Again, I think the reason music is so special is because it makes you feel a certain way. And like to the extent that any of this is difficult, it is because you are really targeting human emotions in some way.
米奇,关于人工智能音乐,什么地方比较难?我对这个领域了解得不多。你在这方面想要突破的难点有哪些?在视觉模型或视频中,比如人的手的表现、物体的持久性,这些对我来说更直观一些。是的,这是个非常好的问题。说实话,我并没有特别深入地考虑过这个问题。有一些比较简单或者容易描述的问题,比如立体声效果是否正确?码率是否足够高,等等。但我认为音乐特别的原因在于它能让你产生某种情感,从这个角度来看,任何困难都是因为你在某种程度上瞄准了人类的情感。

And that's not terribly well understood by anyone. And it is also super, super diverse and super culturally dependent and super age dependent or demographic dependent. So, you know, I think what we're doing is so far from objective truth. And it's very easy for people who spend all their days in text LLMs to be thinking about things like, this is how well I did on the LSAT. You know, I can pass the bar with this size model, like the law bar. And none of that exists for us. It's really just like I made a song and it made me feel a certain way. And it may have been grainy audio that made me feel a certain way. It may have been a long song, a short song. I think there's a lot more unanswerable questions in this domain.
那是非常难以被任何人完全理解的。而且它也是超级多样化的,超级依赖文化背景、年龄或人口统计。因此,我认为我们目前所做的事情离客观真理还很远。那些整天沉浸在文本大型语言模型(LLM)中的人很容易认为,这是我在LSAT(法学院入学考试)中表现如何,或者我可以通过这种规模的模型通过律师资格考试。这些都与我们无关。对我们来说,这真的就像是我创作了一首歌,这首歌让我有了一定的感觉。可能是模糊的音频让我产生了一定的感觉,可能是长歌,也可能是短歌。在这个领域,我认为有更多无法回答的问题。

One of the things that you all did quite early is, I believe, you have like a free tier so people can make up to 10 songs a day. And then you have a subscription-based approach. How do you think about your users over time in terms of consumer versus prosumer versus business users? And is it too early to tell? Is it a specific area that you're most focused on? Like how do you think about all that stuff? Yeah, that's a great question. I would say, you know, we are trying to change how the entire globe interacts with music and to open new experiences for people. And so what that means is that this is a consumer product. This is not sprinkling AI into Ableton or Logic or Pro Tools. This isn't for the person already staying up all night as a hobbyist trying to produce music. This is for everybody. This is for like my mom.
你们很早就做了一件事,我相信你们有一个免费层,让用户每天可以制作最多10首歌曲。然后你们有一个基于订阅的模式。你们如何看待用户随着时间的推移在消费者、专业消费者和企业用户之间的变化?现在判断这些情况是否为时过早?你们最关注的具体领域是什么呢?你们如何看待这些问题呢? 对,这是个很好的问题。我想说的是,我们正在试图改变全球人与音乐的互动方式,并为人们开启新的体验。因此,这是一款面向普通消费者的产品。这不是往Ableton、Logic或Pro Tools里撒些AI。这不是针对那些已经整夜不眠不休地作为业余爱好者制作音乐的人。这是面向所有人的,像我的妈妈也可以使用。

And, you know, I think on the business side of things, it may not be conventional wisdom to say start charging immediately for your product. But it's actually really important, as we are trying to create something that is a set of behaviors that does not exist, to be able to understand what actually makes people want to part with their hard-earned dollars. If I'm being honest, people ask about the business model of generative AI a ton. And I think everybody's doing kind of something that looks like SaaS pricing. And it's kind of done very crudely. And we are certainly no exception to that. But I don't know if this is right in the long term. And it strikes me as probably just a vestige of the fact that it is the same types of people who were building SaaS companies five years ago.
你知道,我认为在经营方面,立即开始为你的产品收费可能不是传统的智慧。但实际上这非常重要,因为我们正在试图创造一种不存在的行为模式,所以要理解什么才能真正让人们愿意支付他们辛苦赚来的钱。如果我坦诚一点,很多人问关于生成式AI的商业模式的问题。我觉得大家基本上都在模仿SaaS定价,而且做得相当粗糙。我们当然也不例外。但我不确定这在长期来看是否正确。我觉得这可能只是一种遗留现象,因为做这件事的还是五年前开发SaaS公司的那类人。

And the same investors who were investing in SaaS companies five years ago, who are building it and investing in it this time around. And so it kind of feels like a bit of a vestige. No offense to you guys; you are both great investors. But like this feels like something that's not totally worked out yet. Yeah. It's interesting because like I remember talking to some people who were very active in the 90s as the web browser was really coming to the forefront. And they were trying to figure out the right business model for web pages. And a lot of the emphasis was actually, should we do micropayments? So every time you read a New York Times article, you pay a fraction of a cent, instead of ad-based models. And of course the world ended up collapsing on that side to ad-based models.
五年前投资SaaS公司的那些投资者现在依然在构建并投资这些公司。因此这感觉有点像是过去的遗留现象。当然,不是说你们不是好投资者,你们都是很优秀的投资者。只是说这种感觉好像还没有完全成型。 这很有趣,因为我记得在90年代,当网络浏览器刚刚开始普及时,我和一些非常活跃的人聊天,他们试图找到合适的网页商业模式。当时很多人都在讨论是否应该采用微支付模式,比如每读一篇《纽约时报》的文章,支付几分之一分钱,而不是依赖广告模式。当然,最终世界还是选择了广告模式。

But nobody that I've talked to from that era actually thinks that was necessarily the right answer. They just think it was the easiest thing to do in the short run. And so I think there's a really interesting question here, to your point, in terms of subscriptions. There's ads. There's other sorts of placement. There's a variety of things you could do over time. There's microtransactions. And so there's reselling things in a marketplace and letting people take a cut of subscribers, you know, almost like a next-gen Spotify or something. So it's super interesting to wonder how all this evolves and where you take it. So it's really cool that you're thinking deeply about it right now. Yeah. It's actually funny to hear you say that because I remember back in the 90s, my older brother was a beta tester for AOL. And I actually remember some of these things happening, and I remember actually watching him beta test these things. Yeah. It's cool. Are there any ways that people have started to use the product that were very unexpected for you, or surprising use cases or applications or other things people have done with it?
但我和那个时代的人谈过,没有一个人真正认为这是正确的答案。他们只是觉得,在短期内这是最简单的办法。所以我觉得,这里有一个非常有趣的问题与你谈到的订阅服务相关。还有广告和其他类型的投放方式。你可以做很多事情,比如微交易,再比如在市场上转售物品并让人们从订阅者中分一杯羹,就像下一代的Spotify之类的。因此,思考所有这些如何演变以及你会如何引导它,是非常有趣的。很高兴看到你现在在深入思考这些问题。其实挺有趣的是,你这么说让我想起90年代,我的哥哥曾是AOL的测试员。我记得那些事情确实发生过,还记得看他测试这些东西。很酷。有没有一些用户对产品的使用方式是你意想不到的,或者有没有一些令人惊讶的用例或应用,用户还做了什么其他奇特的事情?

I think so much has been really fulfilling and cool to see and definitely surprising. And you know, one thing I'm constantly reminding everyone is that we are eliciting a set of behaviors that are not common and that are not regular for people to do. And so it's not going to be surprising when we see stuff that comes out. It's maybe not surprising that people love to feel creative and they love to feel ownership over what they produce and they love to share it with others. If you want to be a little bit more reductive about it, they love to feel famous. But I think it's not the same way that famous people are famous. It's a little bit different. And so we've seen that people will spend a lot of time in front of their computers, enjoying making songs. This is really cool. And it is different from, I think, the way music is done now. Music is done now sometimes painfully, but only in service of the final product. And I think when you open this up to people, sure, you definitely care about the final product, about what the song sounds like on the other end. But you also really care about the journey, and people will really enjoy making music regardless of the final product.
我认为有很多事情真的让人感到充实和酷,而且肯定也很令人惊喜。你知道,我一直在提醒大家,我们正在引导出一系列不常见的、不是人们日常会做的行为。所以,当我们看到一些出现的现象时,并不该感到惊讶。人们热爱创作,渴望对自己的作品有归属感和认同感,并喜欢与他人分享,如果你想更简单地说,他们热爱成为焦点。但我觉得这与名人的那种成名方式不同,有点差别。因此,我们看到人们会花很多时间坐在电脑前,享受制作歌曲的过程,这真的很酷。这与现在的音乐制作方式不同,现在的音乐制作有时痛苦,但只为最终的成品服务。而当你向大众开放这个过程时,是的,你肯定关心最终的成品,即歌曲最后的效果。但你同样也会关心制作的过程,这样人们会真正享受音乐创作,不管最终的成品如何。

And I can tell you, personally, the most fun I have ever had doing music is playing music with friends, jam sessions, even when you're not recording. And I think there's something that's like very, very akin to that, that we are able to open up with some of these technologies. It's such like a magical experience. And I feel like everybody should feel some of that joy of creation with other people. Maybe you already see it in the product, but are you imagining that you get that collaboration joy from, you know, the creation, or working with yourself, feeling like you are more skilled, you're collaborating with AI, with Suno, or are people jamming? Do you see like mixtape-like sharing behaviors today you can talk about?
我可以告诉你,个人而言,我做音乐最开心的时候是和朋友一起玩音乐,进行即兴演奏,甚至不录音的时候。我觉得有些技术能让我们体验到类似的快乐,这是一种非常神奇的体验。我希望每个人都能感受到这种与他人共同创作的喜悦。或许你已经在产品中看到了,但你有没有想过你能从中获得那种合作的快乐,或者说是与自己合作的感觉,感觉自己更有技巧,与Suno的AI合作,或者人们在即兴演奏呢?你今天有没有看到可以分享的混音带之类的行为,可以谈谈吗?

We see all of that, which is super cool. Like a video game, music is fun by yourself and maybe more fun in multiplayer mode. And so we see people enjoying this by themselves, but we see people basically hacking multiplayer mode into this in lots of fun ways where you can have people co-writing lyrics together, trading off words, trading off verses. I'll write the verse, you write the chorus or I'll write the lyrics and you pick all the styles and I'll make a song and then I'll send it to you and you'll, you know, make a song back. And so it's not surprising. I think humans really evolved to resonate strongly with music and want to do music together.
我们看到这一切,这真的非常酷。就像视频游戏一样,音乐一个人玩也很有乐趣,但在多人模式下可能会更有趣。因此,我们看到人们自己享受音乐的同时,也通过各种有趣的方式把音乐变成多人模式。例如,人们可以一起写歌词,交换单词或段落。比如,我写主歌,你写副歌,或者我写歌词,你来选风格。我做一首歌然后发给你,你再做一首歌回给我。所以,这并不奇怪。我认为人类真的进化到对音乐有强烈的共鸣,并且愿意一起做音乐。

Every culture basically has music. And so it really shouldn't be surprising that we see all of this, but it is really fulfilling from our perspective because it really brings people together. It makes people smile. I don't pretend like we're curing cancer at Suno, but it is really cool to make a lot of people smile. One of the things that you and I talked about previously was, in creation platforms, you have like a very skewed ratio in general, and then it varies by, you know, what the platform is, of creators and people who are listening, absorbing, viewing, whatever, right. There, of course, are a lot of people who make music today, but you listen to the creations of a relatively small number of people, right? How much do you think that changes with something like Suno?
每个文化基本上都有音乐。因此,我们看到这些现象真的不应该感到惊讶,但从我们的角度看,这确实令人满足,因为它真正地把人们联系在一起,让人们微笑。我不会假装我们在Suno能治愈癌症,但能让很多人微笑确实很酷。我们之前谈到的一件事是,在创作平台上,创作者和听众、观众等的比例通常非常不平衡,而且还会根据不同平台有所不同。当然,现在有很多人创作音乐,但我们基本上只听相对少数人的作品,对吧?你觉得像Suno这样的东西会在多大程度上改变这种情况?

I think a lot. I will say, you know, I'm speculating here. It's still super, super early, but I think of us opening a few important avenues. The first is, I guess, all of the sort of smaller niche micro sharing that is possible, where we can make songs that the three of us are going to listen to because it is capturing a moment that the three of us had, the same way we might take a selfie. And those are sharing dynamics that just are completely absent in music right now. But I think let's do it. Sorry to interrupt you. I love it. Okay. I need some genres. What should we make a song about? My favorite genre, but I don't know that it's supported yet, is phonk, P-H-O-N-K.
我认为会改变很多。我得说,我这是在猜测。现在还非常非常早,但是我觉得我们在开辟一些重要的途径。第一个途径,我想,是所有那些可能的小众微分享,我们可以制作一些只有我们三个人会听的歌曲,因为它捕捉到的是我们三个人一起度过的某个时刻,就像我们拍自拍一样。而这种分享动态在现在的音乐中是完全没有的。但我觉得我们应该试一试。不好意思打断了,我真的很喜欢这个想法。好吧,我需要一些音乐类型的建议。我们该做什么风格的歌曲呢?我最喜欢的类型是Phonk,拼作P-H-O-N-K,但我不知道这种风格是否已经被支持。

Yeah, I think so. Maybe too obscure. Okay. That'd be very exciting. No, I think we can do some, but let's do some hybrid also, like, I don't know, a song, reggae song. How about some, like, yeah, or like Hawaiian R&B? Who? Why? You want to choose like an instrument to add in there? Yeah. You said koto before. Koto. Or sitar. Something random. Sitar, sitar sounds cool. Yeah. I have heard a lot of really good sitar trap on Suno. Yeah. It goes really well together. That's my second favorite. Okay. Priors in statistics. Yes, we have no priors here. Let's see how we do. Just learning from the world. Ground up. I've learned a lot. I've learned a lot about a lot of new genres since starting this. What's your favorite new genre, by the way, that came out of that? Gosh, there's some recency bias here, but sitar trap is freaking fantastic. Yes. That sounds good. [music plays] Try the other one. Rish has to change this to the intro for No Priors going forward. I like it. We're going to have to get an image where, I don't even own a Hawaiian shirt, but we're going to have to get an image where we're all wearing the Hawaiian shirts. Like Palmer Luckey? Yeah, fine. I'll just get Palmer to do it.
对,我觉得是这样。也许太小众了。好吧,那会很令人兴奋。不,我觉得我们可以做一些,但我们也可以做一些混合的,比如,我不知道,一首歌,雷鬼歌曲。怎么样,比如,嗯,或者像夏威夷R&B?谁?为什么?你想选择一个乐器加入进去吗?对。你之前说过koto(日本筝)。Koto,或者锡塔琴。随便什么。锡塔琴,锡塔琴听起来很酷。对。我在Suno上听过很多非常好的锡塔琴trap音乐。对,它们结合得很好。这是我第二喜欢的。好吧。统计学里的先验(priors)。是的,我们这里没有先验。看看我们能做成什么样。只是从世界中学习。从零开始。我学到了很多。自从开始这个以来,我学到了很多关于新音乐类型的东西。顺便问一下,从中诞生的新类型里你最喜欢哪个?天啊,这里面有一些近因偏差,但锡塔琴trap音乐实在是太棒了。是的,听起来不错。[音乐播放] 再试试另外一个。Rish 得把这个换成 No Priors 今后的片头。我喜欢。我们需要一张图片,我甚至没有夏威夷衬衫,但我们需要一张所有人都穿着夏威夷衬衫的图片。像Palmer Luckey那样?好的,我会让Palmer来做。

I'm maybe the only person who does this with Suno, but every time I create a song, I can imagine what the artist would look like that creates this. I visualize it, where I'm like, it's the big dude with the Hawaiian shirt, and he's got the sitar with him. I love that. I will tell you one very cool and unexpected thing we saw: we shipped a very simple feature where you can edit your song title, maybe you fat-fingered it or something. As soon as we did that, people started to put their names in their song titles and hit our trending page. People like to feel good about their creations, and you know, in hindsight, it's obvious, and people will hack your product and tell you what they want out of it. Just one thing back to your point before, though, Sarah. I think we talk a lot about how asymmetric the creation versus consumption is on different platforms, and TikTok is famously very creation-heavy, although still most of TikTok is consumption. I think this set of technologies has the ability to skew that much, much farther, because the creation process is so enjoyable.
或许我是在用 Suno 的时候唯一这样做的人,但每次我创作一首歌曲时,我都能想象出这位艺术家的样子,好像在我脑海中浮现出一个画面:他是个穿着夏威夷衬衫的壮汉,拿着一把锡塔琴。我非常喜欢这种感觉。我告诉你一个很酷且意想不到的事情。我们推出了一个很简单的功能,就是你可以编辑歌曲标题,也许你一开始输入的时候按错了。结果一发布这个功能,人们就开始把他们的名字放在歌曲标题里,并出现在我们的流行页上。人们喜欢为自己的创作感到自豪,事后再想想这件事就很显而易见了,人们会利用你的产品并告诉你他们想从中得到什么。回到你之前提到的点,Sarah,我觉得我们经常讨论不同平台上创作和消费之间的非对称性,TikTok 就因为创作量大而著称,尽管 TikTok 上大部分内容还是以消费为主。我认为这类技术有能力把这个情况拉得更远,因为创作过程实在是太享受了。

I actually think if we do this right in the future, these are not going to be the terms that we use to describe what we're doing. We're not going to say, I'm creating or I'm consuming; these things will bleed into one another. We'll have a lot of lean-in consumption, we'll have a lot of lean-out creation, and I think we will eventually decline to draw the line of how many people are creating, how many people are consuming, and we'll just say, people are enjoying all of this music stuff. That's a really interesting vision of the future. I guess that has pretty deep implications as well in terms of how you think about music, the music industry, how it permeates society. Do you have a view in terms of what all this looks like five years from now? If we are correct that there are just modes of experience around music that people don't have access to, that we can get a billion people much more engaged with music than they are now, then just in terms of the number of dollars or the amount of time people are spending doing music, both of those are going to go up dramatically. That I feel quite confident about.
实际上,我认为如果我们将来能够正确处理这件事情,我们将不会使用这些术语来描述我们在做什么。我们不会说,“我在创作”或“我在消费”,这些活动将会相互融合。会有很多参与式的消费,也会有很多超脱的创作。我认为最终我们不会再去界定有多少人在创作,有多少人在消费,而只会说,人们在享受所有这些音乐的东西。这是一个非常有趣的未来愿景。我猜这也将对你如何看待音乐、音乐产业以及它如何渗透到社会中产生深远的影响。你觉得五年后这会是什么样子呢?如果我们认为有一些音乐体验模式是人们目前无法接触到的,我们可以让十亿人比现在更加投入到音乐中去,那么无论是从人们在音乐上花费的钱数还是时间来看,这两者都会大幅增加,我对此很有信心。

The exact nature of how this looks, I think, is up for some more debate. This is just an opinion. Because music is so human and so much emotional connection involved in it, I don't really see people losing connection with their favorite artists at all. In fact, if you labor around music and you understand the process, you feel a much deeper connection with the artists that you love.
我认为具体情况可能还需要进一步讨论。这只是一个观点。因为音乐本身非常人性化,而且与人的情感有很大的联系,所以我不觉得人们会对自己喜欢的艺术家失去感情连接。事实上,如果你深入了解音乐创作的过程,你会与喜欢的艺术家产生更深的共鸣。

Another thing I think is likely to happen, if we look at the last wave of technologies to enter music, let's say the DAW, this really accelerates how quickly music can change and how quickly culture can change. Music is really just a reflection of culture. The way that happened is the DAW really let a lot of people start making music who could never make music. You could do this from your dorm room if you had a good pair of headphones and you had a good ear and you were willing to put in the work to learn the tool.
我认为还有一件很可能发生的事情是,如果我们看看上一波进入音乐领域的技术,比如数字音频工作站(DAW),这实际上加快了音乐和文化变化的速度。音乐只是文化的反映。数字音频工作站让许多人开始制作音乐,这些人原本可能无法涉足音乐创作。只要你有一副好的耳机、好的听力,并且愿意投入时间学习这个工具,你甚至可以在宿舍里进行音乐创作。

I think if we can give this to so many more people, yes, a lot more people will create, a lot more people will become tastemakers, but the rate at which culture changes, the rate at which the styles of music change, the rate at which new styles of music are uncovered is likely to go up a lot. I think even if you were just going to only ever listen to music, which some people will, that will get so much more interesting. Things are going to change so much more quickly. You will not have people really, I think, cribbing off of one another in the same way. I'm really excited about that.
我认为,如果我们能够把这个机会带给更多的人,是的,会有更多人开始创作,更多人成为潮流引领者,但与此同时,文化变化、音乐风格变化和新音乐风格被发掘的速度可能会大大加快。我觉得,即使你只是单纯地听音乐(有些人确实会这样),这也会变得更加有趣。事物将会变化得更快。我认为人们不会再像以前那样互相模仿。我对此感到非常兴奋。

Just because not every listener will know what a DAW is: a digital audio workstation is something like Ableton. You can generate music, put it on a timeline and create sound, as Mikey was saying, in your dorm room, in your apartment, cheaply. That was pretty revolutionary when it turned out you didn't need a $500,000 SSL mixer and a staff of 10 people to cut an album. That was really revolutionary. People made tremendous contributions to our collective culture when that happened. There were 15-year-olds who got discovered, and that was extremely rare before that. I actually think it's really an untold story. I'm not the right person, but somebody with a really rich understanding of music history should explain what happened with the digitization of music.
只是因为并不是每个听众都会使用DAW(数字音频工作站),比如Ableton之类的软件,你可以在宿舍或公寓里便宜地生成音乐、定时并创建声音,就像Mikey说的那样。当你不再需要一个价值五十万美元的SSL混音器或一个由10人组成的团队来制作一张专辑时,那真的是一种革命性变化。当时,很多人对我们共同的文化做出了巨大贡献。有很多15岁的年轻人因此被发现,这在以前是非常罕见的。我实际上认为这是一个鲜为人知的故事。我不是那个合适的人选,但应该有一个真正了解丰富音乐历史的人来解释数字化音乐带来的变化。

We were like, ah, I have an infinite set of every snare drum sound in the world. I can think of just the ability to completely unconstrain, as you said, with something that's much cheaper than traditional tooling, where you don't need to know how to play any instrument. I think some of what Suno is doing is making the assembly of that another order of magnitude easier. I think that's right. There's one other thing that I'm really excited about getting unlocked, which is that if you look at the last 10 years of music, a lot of the changes are, let's say, sonic.
我们当时感觉就像是,啊,我拥有了世界上所有军鼓声的无限集合。我认为正如你所说的,这种完全不受限制的能力比传统乐器要便宜得多,而且你不需要会演奏任何乐器。我觉得Suno公司正在做的事情让这个过程变得更加容易,这点我同意。另外,还有一件让我非常兴奋的事,就是如果你回顾过去十年的音乐,很多变化主要是在音效方面。

It's like interesting sounds and maybe slightly less so evolving how interesting songs are. It's a function of the technology that got unlocked, like a lot of digitization of things. I'm actually really excited for the opposite. Like AI is certainly able to produce interesting sounds that we've never heard before, but putting these tools in people's hands, we can unlock song structures and chord changes and borrow different styles and mix them with other styles and make stuff that is not only sonically new, but kind of melodically new.
这就像创造有趣的声音,虽然在提升歌曲的有趣层面上稍微少了一些。不过,这是技术解放的结果,比如很多东西的数字化。实际上,我对相反的方向更感兴趣。虽然人工智能肯定能够生成我们从未听过的有趣声音,但如果把这些工具交到人们手中,我们可以解锁新的歌曲结构和和弦变化,借鉴不同的风格并混合其他风格,创作出不仅在音效上新颖,而且在旋律上也别具一格的作品。

And I think that has the ability to really keep people listening to stuff. And, you know, on my most optimistic days, I'll say un-TikTok music, like get us listening to stuff for more than 30 seconds at a time. Maybe I'm a little bit naive and optimistic, but I think it's very possible. Yeah. Okay. Before we wrap, like, I played a song at the beginning. We made a song. You got to play one that's your favorite. That's a creation. Oh, that's a, let me find it. I'm tempted to play a song that's at the top of our showcase. And it's by an artist called Oliver McCann. It's got a lot of plays. It's a really interesting song. It is certainly the public's favorite. So I can play it now. Oh, my love. My friend, you know, it's been a while without thinking of you, but the thought makes me smile. I'm so tempted, wanting more than this. I know it, but what am I to do? I need some stress to breathe. So give me a song. Oh, my love.
我认为这能够真正让人们持续地听音乐。在我最乐观的时候,我会说,这可以打破音乐的快餐文化,让我们听一首歌超过30秒。也许我有点天真和乐观,但我觉得这是非常有可能的。好的,在结束之前,我在开头播放了一首歌。我们创作了一首歌,你可以播放你最喜欢的作品。这是一首……让我找找看。我很想播放一首我们展会上最受欢迎的歌。这是由一位名叫奥利弗·麦坎的艺术家创作的,播放次数非常多,非常有趣的一首歌。这无疑是大众的最爱。那我现在就播放它吧。 哦,我的爱人。我亲爱的朋友,你知道的,已经有一段时间没有想起你了,但这个念头让我微笑。我非常想要更多,但我知道该怎么做呢?我需要一些压力来呼吸。所以,给我一首歌吧。哦,我的爱人。

It's unbelievable. The amazing thing about this, by the way, just for the listeners' sake, is that the vocals are completely machine created. The music is completely machine created. The lyrics are machine created. And so this truly is a synthetic song, which I think is pretty amazing. Yeah, it certainly is easy to lose sight of that fact when you do this day in and day out, but it is incredible. I'll go one step further. The machine doesn't know that there is even a concept of voice. Like it's just all sound. And somehow it's able to produce the sounds that we have evolved and been acculturated to resonate with. So all of that makes me think I have the coolest job in the world. Not bad for a quantum physicist, a failed one, I guess. Exactly.
这简直难以置信。顺便提一下,这件事最令人惊奇的地方在于,出于听众的考虑,这首歌的演唱完全由机器生成,音乐完全由机器生成,歌词也是机器生成。所以这确实是一首完全由人工合成的歌曲,我觉得这真的很了不起。是的,当你每天都在做这件事时,确实很容易忽略这个事实,但它的确是不可思议的。我再进一步说,机器甚至不知道“声音”这个概念存在,对它来说只是一种声音。但它却能够生成我们在进化和文化中产生共鸣的声音。所以,这一切让我觉得我拥有世界上最酷的工作。对于一个“失败”的量子物理学家来说,这也不错,不是吗?确实如此。

Mikey, how big is Suno? It's obviously very popular now. You're growing the team. What are you looking for? Yeah, we are always on the hunt for the best people, people who love technology, people who deeply love music, people who are excited about bringing more music to the world. We're hiring primarily on the East Coast, in Cambridge, Massachusetts, or New York. Come drop us a line: careers@suno.com. Great. Well, thank you so much for joining us today. I think we covered a lot of great things. I had a great time. Thanks so much for having me. Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Mikey,Suno 现在有多大?显然非常受欢迎。你们在扩充团队。你们在寻找什么样的人? 是的,我们一直在寻找最优秀的人才,热爱科技、深深爱着音乐、对将更多音乐带给世界充满激情的人。我们主要在美国东海岸招聘,地点包括剑桥(马萨诸塞州)或纽约。欢迎给我们留言:careers@suno.com。 太好了,非常感谢你今天和我们一起。我觉得我们讨论了很多很棒的内容,我真的很开心。非常感谢你邀请我。 大家可以在 Twitter 上关注我们:@NoPriorsPod。如果想看我们的面容,可以订阅我们在 YouTube 上的频道。可以在 Apple Podcasts、Spotify 或其他任何你收听的平台关注我们的节目,这样你每周都会收到新的一集。同时可以在 no-priors.com 注册接收邮件或查找每一集的文字记录。