OpenAI’s CPO on how AI changes must-have skills, moats, coding, startup playbooks, more | Kevin Weil
发布时间 2025-04-10 11:01:19 来源
在这个播客节目中,Lenny Rachitsky 采访了 OpenAI 的首席产品官 Kevin Weil,讨论了公司的运营、人工智能的影响以及如何在人工智能生态系统中构建产品。Weil 强调了人工智能发展的快速步伐,指出人工智能模型不断改进,以至于现在的模型是“你未来一生将使用的最糟糕的人工智能模型”。这种持续变化要求产品构建者快速适应并以不同的方式思考他们的工作。
Weil 讨论了 OpenAI 的内部文化,强调其专注于快速行动和拥抱实验。他强调了授权的、自下而上的团队的重要性,以及接受错误是学习过程的一部分。他提到了 OpenAI 的迭代部署理念,即他们尽早发布产品,与公众共同发展,并根据用户反馈快速迭代。
一个中心主题是“evals”(评估)在人工智能产品开发中的日益重要性。Evals 是用于衡量模型在特定任务或主题材料上的表现的测试。Weil 认为,编写 evals 正在成为产品经理和人工智能开发人员的一项核心技能,因为它们有助于确定模型的优势和劣势。了解模型的性能(例如,60% 的准确率与 95% 或 99.5%)对于设计合适的产品至关重要。定制的 evals 对于衡量公司特定或用例特定场景中的性能至关重要。
Weil 消除了人们对 OpenAI“扼杀”初创公司的担忧,他强调公司外部聪明的人比内部多。他还指出,OpenAI 的重点是构建一个强大的 API,以赋能各个行业的开发人员。这为初创公司创造了巨大的机会,可以构建改进现有技术水平的基于人工智能的产品。
Weil 强调了人工智能产品开发的一些反直觉的方面,例如将人工智能视为人类进行推理。这种方法对于设计用户界面和交互模式来说可能出奇地有效。他还认为,“聊天”将成为人工智能的多功能界面,因为它具有多功能性,并且可以适应各种交流方式。
展望未来,Weil 预测人工智能将深入整合到产品的各个方面。这将导致微调过程,因此研究人员将被纳入产品团队以进行持续改进。他还建议,人们将越来越关注针对特定应用进行定制调整的模型。这将是对定制训练和其他工具日益普及的回应。
关于未来,Weil 对人工智能的影响持乐观态度,同时他也承认对工作岗位流失和其他挑战的担忧。他对个性化辅导非常乐观。人工智能将使辅导更易于获得,并为所有人降低成本。然而,他认为总体而言,技术推动着经济和地缘政治的进步。他还认为优质教育可以改变世界。
Weil 提到了 Facebook 的 Libra 加密货币的失败,表示由于监管障碍和 Facebook 的声誉,它从未发布而感到失望。他希望 Meta 现在可以重新考虑构建它。
对话以一个闪电问答结束,涵盖了 Weil 最喜欢的书籍、电视节目和产品。他建议持续不断地完成伟大的工作,因为他相信要专注于每天的工作,而不是最终的结果。关于提示的最后提示是,给模型提供你想要的东西的例子,它们会听从你的指示并学习,你不需要成为提示方面的专家就能得到你想要的东西。
In this podcast episode, Lenny Rachitsky interviews Kevin Weil, Chief Product Officer at OpenAI, about the company's operations, the implications of AI, and how to build products in the AI ecosystem. Weil emphasizes the rapid pace of AI development, stating that AI models are constantly improving, rendering the current models "the worst AI model you will ever use for the rest of your life." This constant change requires product builders to adapt quickly and think differently about their work.
Weil discusses OpenAI's internal culture, highlighting its focus on moving fast and embracing experimentation. He stresses the importance of empowered, bottoms-up teams and the acceptance of mistakes as part of the learning process. He mentioned OpenAI's iterative deployment philosophy, where they ship products early, co-evolve with the public, and rapidly iterate based on user feedback.
A central theme is the growing importance of "evals" (evaluations) in AI product development. Evals are tests used to gauge how well a model performs on specific tasks or subject material. Weil argues that writing evals is becoming a core skill for product managers and AI developers, as they help determine a model's strengths and weaknesses. Understanding a model's performance (e.g., 60% accuracy vs. 95% or 99.5%) is critical for designing appropriate products. Custom evals are vital for measuring performance in company-specific or use-case-specific scenarios.
Weil dispels concerns about OpenAI "squashing" startups, emphasizing that there are more smart people outside the company than inside. He also notes that OpenAI's focus is on building a robust API to empower developers across various industries. This creates immense opportunities for startups to build AI-based products that improve upon the state-of-the-art.
Weil highlights some counterintuitive aspects of AI product development, such as reasoning about AI as if it were a human. This approach can be surprisingly effective for designing user interfaces and interaction patterns. He also suggests that "chat" will be a versatile interface for AI due to its versatility and ability to accommodate diverse communication styles.
Looking ahead, Weil predicts that AI will become deeply integrated into every facet of the products. This will result in the fine tuning process, therefore researchers will be included into the product teams for continuous improvement. He also suggested that there'll be an increasing focus on custom-tuned models for specific applications. This will be a response to the growing accessibility to custom training and other tools.
Regarding the future, Weil is optimistic about the impact of AI, acknowledging concerns about job displacement and other challenges. He is very optimistic towards personalized tutoring. AI will make tutoring more accessible and cheaper for all. However, he believes that technology, in general, drives economic and geopolitical advancements. He also believes that quality education can change the world.
Weil touches on the failure of Facebook's Libra cryptocurrency, expressing disappointment that it never launched due to regulatory hurdles and Facebook's reputation. He hopes that Meta might reconsider building it now.
The conversation concludes with a lightning round covering Weil's favorite books, TV shows, and products. He advises to do great work over a sustained period of time, as he believes in doing the daily work and not focusing on the final result. His final tip about prompting is give models examples of the kinds of things you want and they will listen to you and learn and you do not need to be an expert in prompting to get what you want.
摘要
Kevin Weil is the chief product officer at OpenAI, where he oversees the development of ChatGPT, enterprise products, and the ...
GPT-4正在为你翻译摘要中......
中英文字稿 
The AI models that you're using today is the worst AI model you will ever use for the rest of your life. And when you actually get that in your head, it's kind of wild. Everywhere I've ever worked before this, you kind of know what technology you're building on. But that's not true at all with AI. Every two months, computers can do something they've never been able to do before, and you need to completely think differently about what you're doing. Your chief product officer of maybe the most important company in the world right now, I want to chat about what it's just like to be inside the center of the storm.
你今天使用的AI模型将是你这一生中用过的最差的AI模型。当你真正意识到这一点时,感觉会很震撼。在我以前工作的地方,总是很清楚我们正在构建什么样的技术。但在AI领域,情况完全不同。每两个月计算机就能做以前从未做到的事情,你需要彻底改变思维方式去看待你正在做的事情。作为可能是当今世界上最重要的公司的首席产品官,我想和你聊聊在这场风暴的中心工作是什么样的体验。
Our general mindset is in two months, there's going to be a better model and it's going to blow away whatever the current set of limitations are. And we say this to developers too. If you're building, and the product that you're building is kind of right on the edge of the capabilities of the models, keep going because you're doing something right. Give it another couple months, and the models are going to be great. And suddenly the product that you have that just barely worked is really going to sing.
我们的总体思路是,两个月后会有一个更好的模型出现,它将突破当前的各种限制。我们也会这样告诉开发者。如果你正在开发的产品正处于模型能力的边缘,继续前进,因为你做的事情是对的。再给它几个月,模型将会变得非常出色。这样一来,你的产品就不再是勉强可行,而是能够真正发挥出色。
Famously you led this project at Facebook called Libra. Libra is probably the biggest disappointment of my career. It fundamentally disappoints me that this doesn't exist in the world today because the world would be a better place if we'd been able to ship that product. We tried to launch a new blockchain, it was a basket of currencies originally, it was integration into WhatsApp and Messenger. I would be able to send you 50 cents in WhatsApp for free. It should exist.
您曾在Facebook领导一个名为Libra的项目,这一点广为人知。Libra可能是我职业生涯中最大的遗憾。我非常失望,因为这个项目未能在世界上实现,而如果我们能够发布这个产品,世界将会更美好。我们尝试推出一个新的区块链系统,最初是一个由多种货币组成的货币篮子,并计划整合到WhatsApp和Messenger中。我本来可以在WhatsApp里免费给您发送50美分。这个项目本该存在的。
To be honest, the current administration is super friendly to crypto. Facebook's reputation is in a very different place. Maybe they should go build it now. Today, my guest is Kevin Wheel. Kevin is chief product officer at OpenAI, which is maybe the most important and most impactful company in the world right now, being at the forefront of AI, and AGI, and maybe someday superintelligence. He was previously head of product at Instagram and Twitter. He was co-creator of the Libra cryptocurrency at Facebook, which we chat about.
坦白说,现任政府对加密货币非常友好。而Facebook的声誉则处于一个截然不同的位置。或许他们应该现在就去建立(加密货币)。今天,我的嘉宾是Kevin Wheel。Kevin是OpenAI的首席产品官,该公司可能是目前全球最重要、影响力最大的公司之一,因为它处于人工智能、通用人工智能,甚至可能未来的超级智能的前沿。他之前是Instagram和Twitter的产品负责人,也是在Facebook参与创建Libra加密货币的联合创始人,我们会聊聊这个话题。
He's also on the boards of Planet and Strava and the Black Product Managers Network, and the Nature Conservancy. He's also just a really good guy, and he has so much wisdom to share, which had about how OpenAI operates, implications of AI and how we will all work and build product, which markets within the AI ecosystem, companies like OpenAI won't likely go after, and thus are good places for startups to own.
他还担任Planet、Strava、黑人产品经理网络和大自然保护协会的董事会成员。他也是一个很好的人,有很多智慧可以分享,比如关于OpenAI的运作方式、AI的影响,以及我们如何工作和构建产品;在AI生态系统中,哪些市场像OpenAI这样的公司可能不会涉足,因此是适合初创公司发展的好地方。
Also, why learning the craft of writing eVALs is quickly becoming a core skill for product builders, what skills will matter most in an AI era and what he's teaching his kids to focus on, and so much more, this is a very special episode, and I am so excited to bring you to you. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. If you become an annual subscriber of my newsletter, you get a year free of Proplexity Pro, Linear, Notion, Superhuman, and Ragnola. Check it out at Lenny's newsletter.com and click Bundle.
此外,我们将探讨为什么学习撰写评估报告的技巧正在迅速成为产品开发者的核心技能,以及在人工智能时代哪些技能最重要,还有他教孩子们专注于哪些内容,等等。这是一集非常特别的节目,我很高兴为你呈现。如果你喜欢这期播客,别忘了在你喜欢的播客应用或YouTube上订阅和关注。如果你成为我新闻通讯的年费订阅者,你将免费获得一年的Proplexity Pro、Linear、Notion、Superhuman和Ragnola。请访问Lenny的newsletter.com并点击Bundle查看详情。
With that, I bring you Kevin Wheel. This episode is brought to you by Epo. Epo is a next generation AB testing and feature management platform built by alums of Airbnb and Snowflake for modern growth teams. Companies like Twitch, Miro, ClickUp, and DraftKings rely on Epo to power their experiments. Experimentation is increasingly essential for driving growth and for understanding the performance of new features. Epo helps you increase experimentation velocity while unlocking rigorous deep analysis in a way that no other commercial tool does.
接下来,我将为您介绍凯文·威尔。本期节目由Epo赞助。Epo是一个由Airbnb和Snowflake前员工开发的新一代AB测试和功能管理平台,专为现代化增长团队设计。像Twitch、Miro、ClickUp和DraftKings这样的公司依靠Epo来支持他们的实验。实验对于推动增长和理解新功能的表现变得越来越重要。Epo帮助您提高实验速度,同时以其他商业工具无法实现的方式进行深入的严谨分析。
When I was at Airbnb, one of the things that I left most was our experimentation platform. Break it set up experiments easily, troubleshoot issues, and analyze performance all on my own. Epo does all that and more with advanced statistical methods that can help you shave weak-sop experiment time and accessible UI for diving deeper into performance and out of the box reporting that helps you avoid annoying, prolonged, analytic cycles. Epo also makes it easy for you to share experiments inside through their team, sparking new ideas for the AB testing flywheel.
当我在Airbnb工作时,我最喜欢的一件事就是我们的实验平台。这个平台让我可以轻松设置实验、排查问题,并自行分析性能。而Epo不仅能够做到这些,还提供更先进的统计方法,帮助你缩短实验时间。它拥有易于使用的界面,便于深入分析性能,也提供开箱即用的报告功能,帮助你避免繁琐冗长的分析周期。此外,Epo还让你能够轻松地与团队内部共享实验,激发AB测试的创新思路。
Epo powers experimentation across every use case, including product, growth, machine learning, monetization, and email marketing. Check out Epo at getepo.com slash Lenny and 10X your experiment velocity that's getepo.com slash Lenny. This episode is brought to you by Persona, the adaptable identity platform that helps businesses fight fraud, meet compliance requirements, and build trust. While you're listening to this right now, how do you know that you're really listening to me, Lenny?
Epo可以用于各种实验场景,包括产品开发、增长、机器学习、盈利化和电子邮件营销。在getepo.com/lenny了解更多信息,加速您的实验速度,实现10倍增长。本节目由Persona赞助,Persona是一个灵活的身份平台,帮助企业打击欺诈、满足合规要求并建立信任。当你现在在听这个节目的时候,你如何确认你真的在听我的声音呢,Lenny?
These days, it's easier than ever for fraudsters to steal PII, faces, and identities. That's where Persona comes in. Persona helps leading companies like LinkedIn, Etsy, and Twilio securely verify individuals and businesses across the world. What sets Persona apart is its configurability. Every company has different needs, depending on its industry, use cases, risk tolerance, and user demographics. That's why Persona offers flexible building blocks that allow you to build tailored collection and verification flows that maximize conversion while minimizing risk. Plus Persona's orchestration tools automate your identity process so that you can fight rapidly shifting fraud and meet new waves of regulation. Whether you're a startup or an enterprise business, Persona has a plan for you. Learn more at widthpersona.com slash Lenny. Again, that's width, P-E-R-S-O-N-A dot com slash Lenny.
如今,骗子比以往更容易窃取个人身份信息、面部特征和身份。在这种情况下,Persona公司应运而生。Persona帮助像LinkedIn、Etsy和Twilio这样的知名企业,在全球范围内安全地验证个人和企业。Persona的独特之处在于其高度可定制性。每个公司根据其行业、使用场景、风险承受能力和用户群体的不同而需求各异。因此,Persona提供灵活的构建模块,让您打造量身定制的信息采集和验证流程,在最大化转化率的同时,将风险降到最低。此外,Persona的协调工具可以自动化您的身份认证流程,帮助您应对迅速变化的欺诈行为和新法规的浪潮。无论您是初创企业还是大型企业,Persona都有适合您的计划。了解更多信息,请访问 widthpersona.com/Lenny。请记住,是 width,P-E-R-S-O-N-A 点 com 斜杠 Lenny。
Kevin, thank you so much for being here and welcome to the podcast. Thank you so much for having me. We've been talking about doing this forever and we made it happen. We did it. I can't imagine how insane your life is, so I really appreciate you that you made time for this. And we're actually recording this the week that you guys launched your new image model, which is a happy coincidence. My entire social feed is filled with skibbley, vacations of everyone's life and family photos and everything. So good job.
凯文,非常感谢你来到这里,欢迎参加我们的播客。谢谢你邀请我。我们一直在谈论要录这期节目,现在终于实现了。你的生活一定特别忙,所以我非常感谢你能抽出时间来参与。而且,碰巧的是,我们正好在你们发布新图像模型的这一周录制节目,真是个巧合。我的整个社交媒体上全是有关大家生活和家庭照的精彩内容。干得不错!
Yep, mine too. My wife Elizabeth sent me one of her. So I'm right there with you. Let me just ask, did you guys expect this kind of reaction? It feels like this is the most viral thing that's happened in AI, which is a high bar since I don't know, JTPT launch. Just like did you guys expect it to go this well? What does it feel like internally? You know, there have been a handful of times in my career when you're working on a product internally. And the internal usage just explodes. This was true, by the way, when we were building stories at Instagram, more than anything else in my career, we could feel it was going to work because we were all using it internally and we'd go away for a weekend before it launched. We were all using it and we'd come back after a weekend and we would know what was going on. And be like, oh, hey, I saw you at that camping trip. How was that? You were like, man, this thing really works.
是的,我的情况也是这样。我的妻子伊丽莎白给我发了她的一条信息。所以我和你一样感同身受。我想问一下,你们是否预料到会有这样的反响?感觉这是自JTPT发布以来,AI领域最火爆的事情。这确实是很高的标准。你们是否预料到会有这么好的效果?内部的感觉是怎样的呢?在我的职业生涯中,有几次在内部研发某个产品时,内部使用量激增。例如在Instagram开发“故事”功能的时候,甚至超过了我职业生涯中的其他任何时刻。我们能感受到这个功能会成功,因为在推出前的周末,我们所有人都在内部使用它。我们度过周末后回来,就会知道正在发生什么,还会聊起:“嘿,我看到你去野营了,感觉怎么样?”,类似这样的话题。那时我们就觉得,这个功能真的很有效。
Image Gen was definitely one of those. So we've been playing with it for, I don't know, a couple months. And when it first went live internally to the company, there was kind of a little gallery where you could generate your own. You could also see whatever one else was generating. And it was just like non-stop buzz. So yeah, we had a sense that this was going to be a lot of fun for people to play with. That's a really cool. That should be a measure of just like confidence and something going well that you're launching is internally everyone's going crazy for it.
Image Gen 绝对是其中之一。所以我们已经玩了一段时间,大概几个月吧。当它第一次在公司内部上线时,有一个小型画廊,你可以在那儿生成自己的图像,你也可以看到其他人生成的作品。那时真的是无休止的讨论和热议。因此,我们觉得这会是一个很有趣的工具,大家都愿意来玩。这真是太酷了。这也应该是对一个新产品充满信心的标志,说明它的发展很顺利,因为内部的每个人都为之疯狂。
Yeah, especially social things because you have a very tight network as a company socially. So you know each other and your experts in your product hopefully. And so there's some sense in which if you're doing something social and it's not taking off internally, you might question what you're doing. Yeah. And by the way, the Ghibli thing is that something you're seated or how did that even start? Was that like an intentional example? I think it's just the style people love and the model is really capable at emulating style or understanding what, you know, it's very good in instruction following.
是的,特别是在社交方面,因为你们公司内部的社交网络非常紧密。每个人都彼此了解,并且希望都是产品方面的专家。所以如果你们在做某些社交活动,但没有在内部获得反响,那么你可能需要质疑一下你在做什么。对了,关于吉卜力的事情,那是你们主动发起的吗?还是它是怎么开始的?这是一种有意为之的示例吗?我认为这只是人们喜欢的风格,而模型确实能很好地模仿这种风格或理解这一点,它非常善于遵循指令。
That's actually something that I think people, I'm starting to see people discover with it. But you do very complex things. You can give it to images, you know, what is your living room? And the other is a whole bunch of photos or memorabilia or things you want. And you say like, tell me how you would arrange these things. Or you can say, I'd like you to show me what this will look like. If you put this over here and this thing to the right of that and this one to the left of this, under that one, and the model actually will understand all of that and do it. It's incredibly powerful. So I'm, I'm, I'm as excited about all the different things people are going to figure out.
这实际上是我开始看到人们在不断发现的一个特性。你可以使用它来完成非常复杂的事情。比如,你可以给它一些图像,比如你的客厅,还有一堆照片、纪念品或你想放置的东西,然后你说:“告诉我你会怎么布置这些东西。”或者,你可以说:“我想让你展示一下,如果把这个放在这里,把那个放在它的右边,再把另一个放在它的左边或者在它的下面,这一切会是什么样子。”这个模型实际上能够理解并实现这些操作,这真的非常强大。所以,我对人们将要探索出的各种可能性感到非常兴奋。
Yeah. All right. Well, good job. Good job team open AI. Let's get serious here. And let's kind of zoom out a little bit. The way I see it is your chief product officer of maybe the most important company in the world right now. Just not to set the bar too high, but you guys are ushering in AI, AGI at some point, super intelligence at some point. No big deal. I've had, I have more questions for you than I've had for any other guest actually put out a call out on Twitter and like did in my community, just like, what would you want to ask Kevin?
好的。你们干得不错,OpenAI团队。让我们认真一点,把视野放大一点。在我看来,你是可能是现今全球最重要公司的首席产品官。并不是要给你们设立太高的标准,但你们正在引领人工智能的发展,未来某天可能会达到通用人工智能,甚至超级智能。这真的不简单。我对你的好奇超过对其他任何嘉宾的。我在推特上发了消息,也在我的社群中询问大家:“你们想问Kevin什么问题?”
And I had 300 over 300 well formed questions. And we're going to go through every single one. So let's just get started. I'm just, I picked out the best and there's a lot of stuff I'm really curious about. It's, it's 1 p.m. here. It doesn't get dark for a while. So let's do it. Okay. Here we go. Okay. So first of all, I'm just going to take notes here.
我有300多个精心设计的问题。我们将逐一讨论每一个。所以让我们开始吧。我挑选了一些最好的问题,其中有很多是我非常感兴趣的。现在是下午1点,还有很长时间才天黑,所以我们开始吧。好的,我们走吧。首先,我会在这里做一些笔记。
When is AGI launching? When is this not running? I mean, we just launched a good image, GenMald. Is that count? It's, it's getting there. It's getting there. There's this, there's this quote I love, which is, A.I. is whatever hasn't been done yet. Because once it's been done when it kind of works, then you call it machine learning. And once it's kind of ubiquitous and it's everywhere, then it's just an algorithm. So I've always loved that, that we call things A.I. when they still don't quite work. And then, you know, by the time it's like, an A.I. algorithm that's recommending you follow, you know, oh, that's just an algorithm. But this new thing, like self-driving cars, that's it.
AGI(人工通用智能)什么时候会推出呢?什么时候它还没运行呢?我是说,我们刚推出了一个很不错的图像生成器,GenMald。这算数吗?它正在接近目标。这里有一个我很喜欢的说法,就是“人工智能就是那些还没有实现的东西”。因为一旦实现并稍微运作正常了,你就称它为机器学习。而当它变得无处不在时,就只是一个算法。因此,我一直喜欢这种观点,就是当某个东西尚未完全运行时,我们称其为人工智能。然后,当变成像一个给你推荐关注的算法时,你会说,哦,这只是个算法。但像自动驾驶汽车这样的新东西,那才是人工智能。
Some degree we're always going to be there. And the next thing is always going to be A.I. And the current thing that we, you know, use every day and is just a part of our lives, that's an algorithm. And so interesting, because yeah, like in the Bay Area, you see self-driving cars driving around. And so normal now, when like four years ago, and I know three years ago, you would have thought, you would have seen this thing, you'd be like, holy shit. What is, how we're in the future, and now we're just so taken for granted. It's, I mean, there's something like that with everything.
在某种程度上,我们总会参与其中。而接下来会成主流的永远是人工智能。我们每天使用的事物已经融入我们的生活习惯,那就是算法。这非常有趣,比如在湾区,你会看到自动驾驶汽车在路上行驶。而现在这已经变得如此平常,而若是在四年前,可能三年前,你见到这些车时会感叹,天哪,我们已经进入未来。而如今我们对此习以为常。类似的情况在其他事情上也一样。
If I showed you, when GPT-3 launched, right, I wasn't at OpenAi then, I was just a user, but it was mind-blowing. And if I gave you GPT-3 now, I just plugged that into chat GPT for you when you started using it, you'd be like, what is this thing? Like, there's like mess. It's slop. Slop. I had the same experience when I first got into a Waymo, right? You're very first ride, at least my very first ride. My first like 10 seconds in a Waymo, it starts driving. And you're like, oh my god, watch out for that bike. You're holding on to whatever you can.
如果我在 GPT-3 刚推出时给你展示,那个时候我还不在 OpenAI,只是个用户,但它真是令人震惊。如果我现在给你用上 GPT-3,把它插到 ChatGPT 里给你用,估计你会觉得,这是什么东西?就像是一团乱糟糟的东西。我第一次乘坐 Waymo 的经历也是这样,你第一次坐上的时候,好比是我的第一次乘坐。Waymo 开始行驶的头十秒,你会想,天哪,小心那个自行车,然后紧紧抓住任何可以抓的东西。
And then like five minutes in, you've calmed down, and you realize that you're getting driven around the city without a driver and it's working. You're just like, oh my god, I am living in the future right now. And then like another 10 minutes, you're bored, you're doing email on your phone, answering Slack messages, and suddenly this miracle of human invention is just an expected part of your life from then on. And there is really something in the way that we all are adapting to AI that's kind of like that.
五分钟过去后,你平静下来,意识到自己正乘坐一辆没有司机的车在城市里行驶,而且车子运作良好。你会感叹:“天啊,我这是在亲身体验未来!”又过了十分钟,你开始觉得无聊,拿出手机处理邮件,回复Slack消息,这时这个人类发明的奇迹似乎已经成为你生活中的习以为常。我们适应人工智能的方式就有点像这样。
These miraculous things happen and computers can do something they've never been able to do before. And it blows our mind collectively for like a week. And then we're like, oh yeah. Like, oh yeah. Now it's just machine learning on its way to being an algorithm. The craziest thing about what you just shared actually is like, I don't know, chat GPT, which is like now feels terrible 3.5 was like a couple years ago and imagined what life will be like in a couple years from now.
这些神奇的事情发生了,计算机能够做到以前从未做到的事情。这让我们集体感到震惊,持续了大约一周。然后我们就觉得,哦,对,这就是机器学习正逐步变成一个算法。你刚刚分享的事情中最疯狂的一点是,比如说,我不知道,ChatGPT。感觉3.5版本简直不怎么样,而那就是几年前的事情。想象一下,再过几年生活会是什么样子。
We're gonna get to that where things are going, what you think is gonna be in the next big leap. But I wanna start with the beginning of your journey at OpenAI. So you worked at Twitter, you worked at Facebook, you worked at Planet Instagram. At some point you got recruited to go and come work at OpenAI. I'm curious just what that story was like of the recruiting process of joining OpenAI as CPO. Is there any fun stories there?
我们会谈到未来的发展方向,以及你认为下一个重大进展会是什么。但我想从你在 OpenAI 的旅程开始聊起。你之前在推特、脸书和 Instagram 工作过。后来你被招募到 OpenAI 工作。我很好奇你是如何被招募到 OpenAI 担任首席产品官的,这个过程中有什么有趣的故事吗?
If I'm remembering the timeline right, we communicated a planet I was leaving. And I was planning to just go take some time. Like I wasn't gonna stop working, but I was also happy to take the summer. This is like maybe April or something. It was like cool. I'm gonna have the summer with my kids. We're gonna go to Tahoe or something and I'll actually get to hang out rather than what I usually do going up and down and all that. And then Sam and I had known each other lightly for a bunch of years.
如果我没记错时间线的话,我们谈到了一个我将要离开的星球。我计划花些时间休息。虽然我不是要停止工作,但我也很高兴能在夏天放松一下。大概是在四月左右,我觉得很棒。我可以和孩子们一起度过夏天。我们打算去塔霍湖或者其他地方,我可以好好陪伴他们,而不是像平时那样忙来忙去。在此之前,Sam和我已经有好几年保持着轻松的朋友关系。
And he's always involved in so many interesting things. Companies building fusion and all these things. So he'd always been somebody that I would call occasionally if I was starting to think about my next thing. Because I like working on big tech forward sort of next wave kind of things. And so I called him. I think Vinod also helped put us in touch again. And this time it wasn't like, oh you should go talk to these guys working on fusion. It he said actually, we're thinking about something. You should come talk to us.
他总是参与许多有趣的事情,比如建设融合技术的公司等等。所以每当我开始考虑自己的下一个计划时,我时不时会给他打电话。因为我喜欢从事那些引领科技前沿的下一波潮流的项目。所以这次我联系了他。我想Vinod也帮助我们重新建立了联系。这次他并不是说,你应该去跟那些研究融合技术的人聊聊。而是他告诉我,我们正在考虑一些事情,你应该来跟我们谈谈。
I was like, okay, that sounds amazing. Let's do it. And it goes really fast, really, really fast. Like I met most of the management team in a brief period of time in a few days. And they were telling me, look, we're basically gonna move as fast as we want to move. And if you talk to everyone, everyone likes you already to go. Sam came over for dinner and we had a great evening together just talking about OpenAI in the future and getting to know each other better. And at the end, I was like, I was gonna go in the next day for like a bigger round of interviews. And Sam was saying, hey, it's going really well. We're really excited. And I said, cool. So how do I think about tomorrow? And he said, oh, you'll be fine. Don't worry about it. And if it goes well, like we're basically there.
我当时想,好的,这听起来太棒了。我们就这么做吧。一切进展得非常快,真的非常非常快。在短短几天内,我见到了大多数管理团队成员。他们告诉我,我们基本上会以我们想要的速度推进。如果你和所有人交流过,你会发现大家都已经很喜欢你并希望你加入。之后,Sam过来和我共进晚餐,我们度过了一个愉快的夜晚,一起聊了很多关于OpenAI的未来,并更加了解彼此。在晚餐结束时,我准备第二天参加更大规模的面试环节。Sam对我说,这一切进展得很顺利,他们非常兴奋。我说,很好,那我明天该怎么准备呢?他告诉我,你会没问题的,不用担心。如果一切顺利,我们基本上就可以确定下来了。
And so I go in the next day, meet a bunch of people. Have a great time. Like I really enjoyed everybody I met with. In any interview, you can always second guess yourself. Like, oh, I shouldn't have said that thing. Or that thing I gave a bad answer on. I wish I could read you. But I came away feeling like I think that went pretty well. And I was expecting to hear like that weekend basically because they'd sort of said expectations. So this goes well, we're ready to go. And I didn't hear anything. And then it was like Monday, Tuesday, Wednesday, I still didn't hear anything. And I reached out to folks on the opening, I sat a couple of times, still nothing. And I was like, oh my god, I screwed it up. Like I don't know where I screwed it up, but I totally screwed it up. I can't believe it. And I was going back to Elizabeth, my wife, and being like, what did I do? Like where do you think I, you know, getting all crazy about it? And then it's still nothing.
第二天我就去见了一群人,玩得很开心,我真的很喜欢和他们见面。在任何面试中,你总会对自己后悔,比如"啊,我不该说那个"或者"我那个问题回答得不好"。真希望能重来一次。但是总体上我感觉还不错。当时我期待能在周末听到消息,因为他们之前说如果一切顺利,就可以马上进行下一步。然而,我什么消息也没收到。到了星期一、星期二、星期三,还是没有任何消息。我试着联系几次相关的人,还是没有回应。我心想,天哪,我搞砸了。我不知道哪里出了问题,但我肯定是搞砸了。我无法相信这是真的。我不断跟我的妻子伊丽莎白倾诉,问她我哪里做错了,对这一切陷入了疯狂。然而,依然没有任何消息。
And finally, it was like nine days later. They finally got back to me. And it turned out, you know, there was like a bunch of stuff happening internally and this, that and the other thing. And, you know, there's just a million things happening. And they finally were like, oh yeah, that went well, let's do this. And I was like, oh, okay, cool, let's do it. But it was like nine days of agony. And they were just super busy on some internal stuff. And there I was like fretting every single day and re-going over every line of our interview process. It makes me think about when you're like dating someone and you texted them and you're just, you're not hearing anything back. And all like you assume something is wrong. Yeah, totally. They might just be busy.
最后,大概过了九天,他们才终于联系我。结果发现,他们内部有很多事情要处理,各种各样的事情接连不断。他们终于说,哦,对,那次进行得不错,我们来做这个吧。我心里想着,好吧,太好了,就这么办。但这九天真是折磨人,他们一直忙于内部事务,而我每天都在担心,反复回忆我们的面试过程。这让我想起与某人约会时,你给他们发了短信,却迟迟没有收到回复的感觉,总觉得哪里出了问题。没错,他们可能只是很忙而已。
I give her a hard time about it still. So. That's wild. I love that it worked out. And I guess the lesson there is don't jump to conclusions. Yeah, have a little bit of chill. Speaking of that, I want to chat about what it's just like to be inside the center of the storm. Again, you worked at a lot of, let's say, traditional companies, even though they're not that traditional Twitter and Instagram and Facebook and Planet. And now you work at OpenAI. I'm curious what is most different about how things work in your day-to-day life at OpenAI.
我仍然对她的那件事情开玩笑。真是不可思议。我很高兴事情最后有了好的结果。我想从中得到的教训是——不要草率下结论,要放松一点。说到这点,我想聊聊在风暴中心工作的感觉。你曾在许多所谓的传统公司工作过,尽管它们不那么传统,比如推特、Instagram、Facebook和Planet。现在你在OpenAI工作。我很好奇,在OpenAI,你的日常工作中有什么不同之处。
I think it's probably the pace. Maybe it's two things. One is it's the pace. The second is everywhere I've ever worked before this, you kind of know what technology you're building on. So you spend your time thinking about what problems are you solving, who are you building for? How are you gonna make their lives better? How are you gonna, is this a big enough problem that you're gonna be able to change habits? Do people care about this problem being solved? All those good product things. But the stuff that you're building on is kind of fixed. You're talking about databases and things. And I bet the database you use this year is probably 5% better than the database you used two years ago. But that's not true at all with AI.
我想这可能是节奏的问题。也许有两个原因。第一个是节奏。第二个是,在我之前工作的所有地方,你大致都知道你所使用的是什么技术平台。所以,你会花时间去思考你在解决哪些问题,你为谁而建这个产品?你将如何改善他们的生活?这个问题是否足够重要,以至于你能改变他们的习惯?人们是否在意这个问题的解决?所有这些都属于优秀的产品思维。然而,你所使用的技术平台是相对固定的,比如数据库。今年用的数据库可能比两年前用的好5%。但在人工智能领域,这种情况完全不同。
It's like every two months, computers can do something they've never been able to do before and you need to completely think differently about what you're doing. There's like something fundamentally interesting about that makes life fun here. There's also something, we'll maybe talk about eVals later, but it also really, in this world of, everything we're used to with computers is about giving a computer very defined inputs. If you look at Instagram, for example, there are buttons that do specific things and you know what they do. And then when you give a computer defined inputs, you get very defined outputs. You're confident that if you do the same thing three times, you're gonna get the same output three times.
这就像每两个月,计算机都能做一些以前从未做到的事情,你需要彻底改变对自己工作方式的看法。这里有一些基本有趣的东西,让生活变得有趣。我们以后可能会谈到eVals,但在这个世界里,我们习惯于给计算机提供非常明确的输入。如果你看看Instagram,例如,那上面有一些按钮,可以执行特定的操作,你知道它们会做什么。当你给计算机明确的输入时,你会得到非常明确的输出。你确信如果你做同样的事情三次,你会三次得到相同的输出。
LLMs are completely different than that, right? They're good at fuzzy, subtle inputs than all the nuances of human language and communication. They're pretty good at. And also, they don't really give you the same answer. You probably get spiritually the same answer for the same question, but it's certainly not the same set of words every time. And so you're much more, it's fuzzier inputs and fuzzier outputs. And when you're building products, it really matters whether there's some use case that you're trying to build around. If the model gets it right 60% of the time, you build a very different product than if the model gets it right 95% of the time versus if the model gets it right 99.5% of the time. And so there's also something that you have to get really into the weeds on your use case and the evals and things like that in order to understand the right kind of product to build.
大型语言模型(LLMs)完全不同,对吧?它们擅长处理模糊、微妙的输入,能够理解人类语言和交流中的各种细微差别。通常,它们的表现相当不错。此外,它们并不会每次都给出同样的答案。对于相同的问题,你可能会得到在精神上类似的回答,但每次使用的词语肯定有所不同。因此,你会意识到它们的输入和输出更具有模糊性。在构建产品时,是否有明确的使用场景非常重要。如果模型能够正确地回答60%的问题,那么构建的产品会与模型能正确回答95%或99.5%的问题时完全不同。因此,深入研究你的使用场景、评估以及相关细节,才能了解需要构建的产品类型。
So that is just fundamentally different. If your database works once, it works every time and that's not true in this world. Let's actually follow this thread on evals. I definitely wanted to talk about this. So we had this legendary panel at the Leningrad Summit, as Yoon Mike Krieger and Sir Guo Ma Rating. So fun. And the thing that I heard that kind of stuck with people from that panel was a comment you made where you said that writing evals is gonna become a core skill for product managers. Yeah. And I feel like that probably applies further than just product managers. A lot of people know what evals are. A lot of people have no idea what I'm talking about.
这在根本上是不同的。如果你的数据库能正常工作一次,那它就能一直正常工作,但在这个世界中并非如此。让我们实际来深入探讨一下关于评估的话题。我确实想要谈论这个。所以我们在列宁格勒峰会有一个传奇的小组,由尹迈克·克里格和郭马·雷廷主持,非常有趣。那次讨论中让我印象深刻的是你的一句话,你说撰写评估将成为产品经理的一项核心技能。是的,我觉得这可能不仅仅适用于产品经理。很多人知道评估是什么,但也有很多人不知道我在说什么。
So could you just briefly explain what is an eval? And then just why do you think this is gonna be so important for people building products in the future? Yeah, sure. I think the easiest way to think about it is almost like a quiz for a model, a test to gauge how well it knows a certain set of subject material or how good it is at responding to a certain set of questions. So in the same way, you take a calculus class and then you have calculus tests that see if you've learned what you're supposed to learn, you have evals that test how good is the model at creative writing, how good is the model at graduate level science, how good is the model at competitive coding?
当然可以。我认为最简单的理解方法是把它看作模型的测验,一种测试模型在特定主题材料上的掌握程度或回答特定问题的能力的方式。就像你上微积分课后会有微积分考试来检验你是否学到了应该学的内容,评估(eval)也是一样的。它们用于测试模型在创意写作、研究生水平的科学以及竞争性编程等方面的表现。评估对于未来产品的开发者来说非常重要,因为它可以帮助他们更好地了解和改进模型的能力。
And so you have these set of evals that basically perform as benchmarks for how smart or capable the model is. Is it like a simple way to think about it like unit tests for model of? Unit tests in general for models totally. Great, great. Okay, and then why is this so important for people that don't totally understand what the hell's going on here with evals? Why is this so key to building AI products? Well, it gets back to what I was saying. You need to know whether your model is going to, there are certain things that models will get right 99.95% of the time and you can just be confident. There are things that they're gonna be 95% right on and things that are gonna be 60% right on.
这段英文大意是:
这些评估相当于衡量模型智能或能力的基准。是否可以把它简单地理解为模型的单元测试?对于模型来说,确实可以这样理解。好的,那么对于那些不太理解评估重要性的人来说,为什么这对构建人工智能产品如此关键呢?这回到了我之前说的一个点。你需要知道你的模型在正确率方面能达到什么水平。某些事情,模型99.95%的情况下都做对了,你可以很有信心。但是也有些事情模型只有95%或60%的正确率。
If the model's 60% right on something, you're gonna need to build your product totally differently. And by the way, these things aren't static either. So a big part of evals is if you know, you're building for some use case. So let's say, let's take our deep research product, which is one of my favorite things that we've released maybe ever. All right, the idea is with deep research for people who haven't used it, you can give ChatGPT now an arbitrarily complex query. It's not about returning you an answer from a search query, which we can also do. It's, here's a thing that if you were gonna answer it yourself, you'd go off and do two hours of reading on the web and then you might need to read some papers and then you would come back and start writing up your thoughts and realize you had some gaps in your thinking so you go out and do more research.
如果一个模型在某个问题上有60%的正确率,你就需要完全不同地构建你的产品。而且,这些事情也不是一成不变的。因此,评估的一个重要部分是,如果你知道你正在为某个特定的用例进行构建。举个例子,比如说我们的深度研究产品,这可能是我最喜欢的产品之一。对那些还没有使用过的人来说,深度研究的理念是,你可以向ChatGPT提出一个复杂的查询请求。这不是通过搜索查询返回一个答案的问题,我们也可以做到这一点。而是你提出一个问题,如果你自己要回答,你可能需要在网上阅读两个小时,然后需要查看一些论文,之后再开始写下你的想法,可能会发现思考中有一些空白,于是你再去进行更多的研究。
You might take you a week to write some 20-page answer to this question. You can let ChatGPT just like chug for you for 25-30 minutes. It's not that immediate answers you're used to, but it might go work for 25-30 minutes and do work that would have taken you a week. So as we were building that product, we were designing evals, at the same time as we were thinking about how this product was gonna work and we were trying to go through hero use cases. Here's a question you wanna be able to ask. Here's an amazing answer for that question and then turning those into evals and then hill climbing on those evals.
翻译如下:
你可能需要花费一周时间才能写出对这个问题的大约20页的回答。但是,你可以让ChatGPT花25-30分钟为你处理。虽然这不是你习惯的即时回答,但使用ChatGPT的话,它可能工作25-30分钟,就能完成需要你花一周时间才能完成的工作。在构建这个产品时,我们同时在设计评估机制,并且思考产品的运作方式。我们尝试通过一些极具代表性的使用场景:这里是一个你想问的问题,这里是一个令人惊叹的答案,然后将这些变成评估标准,并在这些评估上不断优化。
So it's not just that the model is static and we hope it does okay on a certain set of things, you can teach the model. You can make this a continuous learning process. And so as we were fine-tuning our model for deep research to be able to answer these things, we were able to test, is it getting better on these evals that we said were important measures of how the product was working? And it's when you start seeing that and you start seeing performance on evals going up, you start saying, okay, I think we have a product here.
所以,这不仅仅是一个静态的模型,我们希望它在某些事情上表现良好。你可以对模型进行训练,让它不断地学习。当我们为深入研究而微调我们的模型,使其能够回答这些问题时,我们能够测试模型是否在我们认为重要的评估上表现得更好。当你开始注意到这一点,并看到在评估中的表现提升时,你会开始认为:好的,我觉得我们有一个成功的产品了。
You made a comment along these same lines around evals that AI is almost like capped in how amazing it can be by how good we are at evals. Does that resonate any more thoughts along those lines? These models are their intelligence and intelligence is so fundamentally multi-dimensional. So you can talk about a model being amazing at competitive coding, which may not be the same as that model being great at front end coding or back end coding or taking a whole bunch of code that's written in cobalt and turning it into Python. Like, and that's just within the software engineering world.
你之前也提到过,AI 的出色程度几乎受到我们评估方式的限制。这让人想到了更多类似的想法吗?这些模型的智能实际上是多维度的。比如,一个模型在竞技编程方面表现得非常出色,但这并不意味着这个模型在前端编程、后端编程,或者将一大堆用 COBOL 编写的代码转换成 Python 方面同样优秀。而这仅仅是在软件工程领域中的例子。
And so I think there's a sense in which you can think of these models as incredibly smart, very like factually aware intelligences. But still most of the world's data, knowledge, process is not public. It's behind the walls of companies or governments or other things. And same way, if you were going to join a company, you would spend your first two weeks onboarding. You'd be learning the company specific processes. You'd get access to company specific data. It's, you can teach these models are smart enough. You can teach them anything. But they need to have the sort of the raw data to learn from.
我认为可以把这些模型看作非常聪明且掌握大量事实知识的智能体。然而,世界上大多数数据、知识和流程并不是公开的,它们隐藏在公司、政府或其他机构的内部。就像你加入一家公司时,需要花前两周进行入职培训,学习公司特有的流程,并获取公司专属的数据。这些模型非常聪明,可以被教会任何东西,但前提是要有原始数据供它们学习。
And so there's a sense in which, now I think the future is really going to be incredibly smart, broad base models that are fine tuned and tailored with company specific or use case specific data so that they perform really well on company specific or use case specific things. And you're going to measure that with custom e-vals. And so what I was referring to is just like these models are really smart. You need to still teach them things if the data's not in their training set. And there's a huge amount of use cases that are not going to be in their training set because they're relevant to one industry or one company.
我认为未来的发展将会是非常智能的基础模型,这些模型将通过公司特定或应用场景特定的数据进行微调和定制,从而在这些特定领域表现得特别出色。你将通过定制的评估来衡量它们的表现。虽然这些模型本身已经很聪明,但如果某些数据不在它们的训练集中,你仍然需要对它们进行针对性的教学,因为有大量与特定行业或公司相关的用例不在它们原有的训练数据中。
I'm just going to keep following the thread that you're leading us down. And but I'm going to come back because I want more questions around some of these things. So you came to a space that I think a lot of AI founders are thinking about is just where's opening eye not going to come squash me in the future or one of the other foundational models. And so it's unclear to a lot of people just like should I build a startup in the space or not? Is there any advice you have or any guidance for where you think opening eye or just foundational models in general likely won't go and where you have an opportunity to build a company?
我会继续跟随你带领我们的思路。不过,我会回来,因为我想对其中的一些问题提出更多的疑问。你提到的一个领域是许多人工智能创始人正在思考的,即将来开放AI或其他基础模型不会进入并对我构成威胁的领域。因此,很多人不清楚是否应该在这个领域创建初创公司。你有什么建议或指导意见,可以指出开放AI或基础模型可能不会涉足的地方,从而让我们可以在那个领域创建公司的机会吗?
Well, one of my, so this is something that Ev Williams used to say back at Twitter that's always stuck with me, which is no matter how big your company gets, no matter how incredible the people are, they're way more smart people outside your walls than there are inside your walls. And that's why we are so focused on building a great API. We have three million developers using our API. No matter how ambitious we are, how big we grow, by the way, we don't want to grow super big, there are going to be, there are so many use cases, places in the world where AI can fundamentally make our lives better.
好吧,我有一个观点,这是Ev Williams在Twitter时常说的,并一直让我印象深刻,那就是不管你的公司有多大,不管公司里的人多么出色,墙外总有更多聪明的人。因此,我们非常注重打造一个优秀的API。目前有三百万开发者在使用我们的API。不管我们的目标多么宏伟,不管我们公司成长到多大,顺便说一句,我们并不希望公司变得过于庞大,世界上还有许多地方可以利用AI从根本上改善我们的生活。
We're not going to have the people, we're not going to have the know how to build most of these things. And I think like I was saying, the data is industry-specific, use case-specific, behind certain company walls, things like that. And there are immense opportunities in every industry and every vertical in the world to go build AI-based products that improve upon the state of the art. And there's just no way we could ever do that ourselves. We don't want to, we couldn't if we did want to. And we're really excited to power that for three million plus developers and way more in the future.
我们不会有人力和技术去构建大部分这些东西。正如我所说,许多数据是特定于行业、用例,并隐藏在某些公司的内部。但在世界上的每一个行业和领域,都有巨大的机会可以去开发基于人工智能的产品,以改进现有的技术水平。我们不可能单靠自己实现这一切,即使我们想做也做不到。我们非常高兴能够为超过三百万开发者及未来更多的人提供这种动力。
Coming back to your earlier point about the tech changing constantly and getting faster, not exactly knowing what you'll have by the time you launch something in terms of the model. I'm curious what allows you to ship quickly and consistently and such great stuff. And it sounds like one answer is bottoms up, empowered teams, versus a very top-down roadmap that's planned out for a quarter. What are some of those things that allow you to ship such great stuff so often so quickly?
回到您之前提到的关于技术不断变化并加速发展的观点,我们在推出产品时并不完全清楚最终会有什么样的模型。我很好奇,是什么让你们能够快速、一贯地推出如此出色的产品。听起来其中一个答案是依靠自下而上的、拥有自主权的团队,而不是按季度规划的高度自上而下的路线图。还有哪些因素让你们能够如此频繁而迅速地推出如此出色的产品呢?
Yeah, we try and have a sense of where we're trying to go. Point ourselves in a direction so that we have some rough sense of alignment. Like, thematically. I don't, for a second. And we do quarterly road mapping. We laid out sort of a year-long strategy. I don't for a second believe that what we write down in these documents is what we're going to actually ship three months from now, let alone six or nine. But that's OK. There's a, I think it's like an Eisenhower quote, plans are useless, planning is helpful, which I totally subscribed to, especially in this world.
是的,我们试图了解我们的目标方向。给自己指引一个方向,以便在大体上保持一致,比如在主题上。我并不认为这些文档中写下的内容就是我们三个月后,甚至六个月或九个月后实际上会推出的东西。但这没关系。我想这是艾森豪威尔的一句话,大意是计划是无用的,但规划是有帮助的,我对此深信不疑,尤其是在这个世界里。
It's really valuable, if you think about quarterly road mapping, for example, it's really valuable to have a moment where you stop and go, OK, what did we do? What worked? What went well? What didn't go well? What did we learn? And now what do we think we're going to do next? And by the way, everybody has some dependencies. You need the infrastructure team to do the following things. Partnership with research here. And so you want to have a second to kind of check your dependencies, make sure you're good to go, and then start executing.
如果你考虑季度规划的话,这样的时刻是非常有价值的。比如,停下来问自己:我们做了什么?哪些做得好?哪些做得不理想?我们学到了什么?接下来我们打算做什么?值得一提的是,每个人都有一些相互依赖的部分。你可能需要基础设施团队完成一些任务,与研究部门的合作等等。因此,你需要花点时间检查这些相互依赖的环节,确保一切准备就绪,然后再开始执行。
We try and keep that really lightweight, because it's not going to be right. We're going to throw it out halfway, because we will have learned new things. So the moment of planning is helpful, even if you're only going to, it's only partially right. So that's, I think, just expecting that you're going to be super agile, and that there's no sense writing a three-month road map, let alone a year-long road map, because the technology is changing underneath you so quickly.
我们尽量保持计划简单轻便,因为计划不可能是完全正确的。我们会在计划进行到一半时抛弃它,因为我们会学到新的东西。因此,规划这个过程本身是有帮助的,即使它只是部分正确。所以,我认为我们需要准备好保持高度的灵活性,没有必要写一个三个月的计划,更不用说一年了,因为技术在快速变化。
We really do try and go very strongly bottoms up, kind of subject to our overall directional alignment. We have great people. We have engineers and PMs and designers and researchers who are passionate about the products they're building, and have strong opinions about them. And are also the ones building them. And so they have a real sense of what the capabilities are, too, which is super important. And so I think you want to be more bottoms up in this way. And so we operate that way.
我们确实非常注重从下而上的方法,同时遵循整体方向的统一。我们拥有出色的人才,包括工程师、产品经理、设计师和研究员,他们对自己正在开发的产品充满热情,并且对产品有坚定的见解。而且他们正是那些亲手打造产品的人,因此对产品的实际能力有着真实的理解,这一点非常重要。因此,我认为我们应该更注重这样的从下而上的方法,这也是我们的运作方式。
We are happy making mistakes. We make mistakes all the time. It's one of the things I really appreciate about Sam. He pushes us really hard to move fast. But he also understands that with moving fast comes, we didn't quite get this right. Or the wheel off this thing, it didn't work. We'll roll it back. Look at our naming. Our naming is horrible. There's a lot of questions people have for you. Yeah, model names. Yeah. It's absolutely atrocious. And we know it. And we'll get around to fixing it at some point. But it's not the most important thing. And so we don't spend a lot of time on it. But it also shows you how it doesn't matter. Again, Chats GPT, the most popular fastest-growing product in history, models are, it's the number one AI API and model.
我们乐于犯错误,我们一直在犯错误。这是我特别欣赏Sam的一点。他督促我们快速行动,但也明白快速行动的同时,难免会有做得不够好的地方。有时事情的发展有些偏差,那就暂时撤回。比如我们的命名,真的很糟糕,很多人对此有疑问。对,模型命名,确实糟糕透顶。我们知道这一点,并会在某个时候去解决,但这并不是最重要的事情,所以我们不会花太多时间在这上面。但这也表明,这其实没那么重要。比如,ChatGPT,历史上增长最快、最受欢迎的产品之一,模型是头号AI API和模型。
So clearly it doesn't matter that much. And we name things like O3 Mini High. And it's just my stuff. Oh, man, I love it. OK, so you talked about road mapping and bottom's up. And I'm really curious, is there like a cadence or ritual of aligning with you or Sam or you? Review everything that's going out? Like is there a meeting every week or every month where you guys see what's happening? On key projects, so we do product reviews and things like that, like you would expect. There isn't a ritual because there isn't.
所以显然这并不那么重要。我们给东西起名字,比如O3 Mini High。这些只是我的东西。哦,伙计,我真的喜欢它。好吧,你刚才谈到路线图和自下而上。我真的很好奇,你们或者Sam之间有没有一个协调的节奏或仪式?比如,你们有没有每周或每月开会,查看正在进行的一切?在关键项目上,我们确实会做产品评审之类的事情,就像你所想的那样。之所以没有一个仪式,是因为确实没有。
I would never want us to be blocked on launching something, waiting for a review with me or Sam if we can't get there. If I'm traveling or Sam's busy or whatever, that's a bad reason for us not to ship. So obviously for the biggest, most high priority stuff, we have a pretty close beat on it. But we really try not to, frankly. Like we want to empower teams to move quickly. And I think it's more important to ship and iterate.
我绝对不希望因为我或Sam无法进行审核而导致我们无法发布某个东西。如果我在旅行或者Sam很忙等等,这都不应该成为我们不能发布的理由。当然,对于那些最重要和最优先的事情,我们会密切关注。但说实话,我们真的尽量避免这种情况。我们希望给团队足够的自主权,让他们能够快速行动。我认为能够发布并在其后进行迭代更为重要。
So we have this philosophy that we call iterative deployment. And the idea is like we're all learning about these models together. So there's a real sense in which it's way better to like ship something even when you don't know the full set of capabilities and iterate together, like in public. And we kind of co-evolve together with the rest of society as we learn about these things and where they're different and where they're good and bad and weird. I really like that philosophy.
这段话的意思是“我们有一个被称为‘迭代部署’的理念。这个想法是我们一同在学习这些模型。所以就算我们对全部功能还不完全了解,尽早推出产品并在公众中进行迭代,要好得多。这样,我们可以与社会其他部分共同进化,一起了解这些模型的独特之处、优缺点以及奇怪的地方。我个人非常喜欢这种理念。”
There's also a bit of, I think the other thing that like ends up being a part of our product philosophy is the sense of like model maximalism. The models are not perfect. They're going to make mistakes. You could spend a lot of time building all kinds of different scaffolding around them. And by the way, sometimes we do because sometimes there are things, you know, kinds of errors that you just don't want to make. But we don't spend that much time building scaffolding around the parts that don't match that.
我们的产品理念中还有一个元素,我认为可以称之为“模型极大化”。模型并不完美,它们会犯错。你可能会花很多时间在模型周围构建各种支架。顺便说一下,有时候我们确实这样做,因为有些错误是绝对不能犯的。但是,对于那些不属于这个情况的部分,我们不会花那么多时间去为其构建支架。
Because our general mindset is in two months, there's going to be a better model and it's going to blow away whatever, you know, the current set of limitations are. And so if you're building, and we say this to developers too, if you're building and the product that you're building is kind of right on the edge of the capabilities of the models, keep going. Because you're doing something right. Because you give it another couple months. And the models are going to be great. And suddenly the product that you have that just barely worked is really going to sing. And you know, that's kind of how you make sure that you're really pushing the envelope and building new things. I had a founder of Bolt on the podcast, Stack Blitz is the company name. And he’s shared the story that they've been working on this product for seven years. You have scenes and it was failing. Nothing was happening.
我们的普遍心态是,再过两个月,就会有一个更好的模型出现,它将突破当前的各种限制。所以如果你正在开发产品,而这个产品正好在模型能力的边缘发展,那就继续吧。因为你做得对。再过几个月,模型会变得更强大,那个原本勉强能用的产品就会变得非常出色。这就是确保你不断突破极限、创造新事物的方法。我在播客中采访了Bolt的创始人,他们的公司名字是Stack Blitz。他分享了一个故事:他们在这个产品上已经努力了七年。起初,一切都不顺利,仿佛停滞不前。
And then Nelson, I was sorry to mention a competitor but Claude came out or a sonnet 3.5 came out. And Nelson, everything worked. And they've been building all this time and finally worked. And I hear that a lot with YC just like things that never were possible now are just becoming possible every few months with the updates and the models. Yeah, absolutely. Let me actually ask this. I wasn't planning to ask this, but I'm curious if you have any quick thoughts. Why is a sonnet so good at coding and kind of thoughts on your stuff getting as good and better at actual coding? Yeah. I mean, kudos to Anthropic. They've built very good coding models, no doubt. We think that we can do the same. Maybe by the time this podcast is shipped, we'll have more to say. But either way, all credit to them.
然后,Nelson,我很抱歉提到一个竞争对手,但Claude或者sonnet 3.5推出了。而Nelson,一切都运作良好。他们一直在不断改进,而现在终于奏效了。我在YC这边听到很多类似的情况,以前不可能的事情现在随着更新和模型的升级每隔几个月就成为可能。是的,绝对如此。我其实并没有计划问这个问题,但我很好奇你的快速看法。为什么sonnet在编程方面如此出色,以及关于你们的产品在编程能力上变得同样出色甚至更好的想法?是的,我要向Anthropic致敬,他们确实建立了非常优秀的编程模型,毋庸置疑。我们认为我们也能做到这点。也许在这个播客播出的时候,我们会有更多要说的。不管怎样,所有的功劳都归功于他们。
I think this intelligence is really multi-dimensional. And so I think there's the model providers, it used to be that OpenAI had this massive model 12 months or something ahead of everybody else. That's not true anymore. I like to think we still have a lead. I'd argue that we do. But it's certainly not a massive one. And that means that there are going to be different places where the Google models are really good or where Anthropics models are really good. Or where we're really good and our competitors are like, ah, we got to get better at that. And it actually is easier to get better at a certain thing once someone's proved it possible than it is to forge a path through the jungle and doing something brand new.
我认为这种智能确实是多维的。过去,OpenAI 的模型领先其他人差不多有12个月。这种情况已经不再是真的了。我喜欢认为我们仍然有一些领先优势,我也认为我们确实有。但绝对不是巨大的差距。这意味着在某些方面,可能是谷歌的模型表现特别好,或者是Anthropics的模型表现特别好,或者是我们表现特别好,然后我们的竞争对手就会觉得,他们必须在这方面提高。而实际上,一旦某个方面已经被证明是可行的,那么在这个方面提升就要比在未知领域开创全新事物来得容易得多。
So I just think, yeah, as an example, it was like nobody could break four minutes in the mile. And then finally, somebody did. And the next year, 12 more people did it. I think there's that all over the place. And it just means that competition is really intense. And consumers are going to win. And developers are going to win. And businesses are going to win in a big way from that. It's part of why the industry moves so fast. But all respect to the other big model providers, models are getting really good. We're going to move as fast as we can. And I think we've got some good stuff coming. Exciting.
所以我认为,这就像一个例子,以前没人能在一英里跑进四分钟,然后最终有人做到了。接下来的一年,又有12个人也做到了。我觉得这样的情况普遍存在,这也意味着竞争非常激烈。消费者将从中受益,开发者将从中受益,企业也将大幅受益。这也是行业发展如此迅速的原因之一。尽管我们尊重其他的大型模型提供者,但我们的模型也在变得非常优秀。我们会尽可能快地发展下去,我相信我们会推出一些很不错的东西,令人期待。
This makes me also think about in many ways other models are better at certain things. But somehow, ChatGPT is like the, like if you look at all the awareness numbers and usage numbers, it's like no matter where you guys are in the rankings, people seem to just like think of AI and ChatGPT almost as the same. What do you think you did right to kind of win and the consumer mindset, at least at this point, and awareness in the world? I think being first helps, which is one of the reasons why we're so focused on moving quickly. We like being the first to launch new capabilities, things like deep research. We've also, our models are very, they can do a lot of things.
这让我想到,在很多方面,其他模型在某些领域可能表现得更好。但不知为何,ChatGPT就像是,如果你看所有的知名度和使用率数据,不管你们在排行榜上的位置如何,人们似乎总是把AI和ChatGPT视为同一回事。你认为你们做对了什么,才能在消费者心目中赢得这个位置,并在全球范围内提高知名度?我认为先行一步有帮助,这也是我们如此专注于快速发展的原因之一。我们喜欢率先推出新功能,比如深入研究。我们的模型也非常全面,可以处理很多事情。
So they can take real-time video input. They can, you have speech to speech. You can do speech to text and text to speech. They can do deep research. They can operate on a canvas. They can write code. And so ChatGPT can kind of be this one-stop shop where all the things that you want to do are possible. And as we go forward in it, we have more agentic tools like operator where it's browsing for you and doing things for you on the web, more and more you're going to be able to come to this one place to ChatGPT, give it instructions, and have it accomplish real things for you in the world. There's just like something fundamentally valuable in that.
他们可以实时接收视频输入,支持语音转换语音、语音转换文本和文本转换语音,能够进行深入研究,也可以在画布上操作,还能编写代码。所以,ChatGPT可以成为一个几乎万能的平台,帮助你实现各种需求。随着技术的发展,我们引入了更多自主工具,比如“操作员”,它可以为你浏览网页并执行各种任务。你会发现越来越多的事情可以在这个平台上通过ChatGPT来完成,只需给出指令,它就能为你在现实世界中实现。这里面蕴含着一些非常有价值的东西。
And so we think a lot about that. We think, and we move, we try to move really fast so that we are always the most useful place for people to come to. What would you say is the most counterintuitive thing that you've learned after building AI products or working at OpenAI, something that's just like, I did not expect that. I don't know, maybe I should have expected this. But one of the things that's been funny for me is the extent to which you can kind of reason. When you're trying to figure out how some product should work with AI, you can often, or even why some AI thing happens to be true, you can often reason about it, the way you would reason about another human. And it kind of works.
我们对此思考了很多。我们思考并迅速采取行动,努力始终成为对人们最有用的地方。你认为在构建人工智能产品或在OpenAI工作后学到的最出乎意料的事情是什么?有没有让你觉得意想不到的东西?也许我应该能预料到这些。但令我感到有趣的一点是,你很大程度上可以通过推理去理解。当你试图搞清楚某个产品应该如何与人工智能搭配工作时,或者为什么某个人工智能现象会是真实的,你常常可以像推理人类一样去推理它,并且这种方法还挺有效的。
So maybe a couple examples. When we were first launching our reasoning model, we were the first to build a model that could reason, that could, instead of giving you just a quick, system one answer right away to every question you asked. It was the third emperor of the Holy Roman Empire. Here's an answer. You could ask it hard questions and it would reason. The same way that if I asked you to do a crossword puzzle, you couldn't just snap, fill in everything. You would be, well, this one across, I think it could be one of these two, but that means there's an A here. So that one has to be this. A way, backtrack, kind of step-by-step build up from where you are.
也许我可以举几个例子。当我们第一次推出我们的推理模型时,我们是第一个构建能够进行推理的模型,而不仅仅是对你提出的每个问题立刻给出一个快速的、简单的答案。例如,若你问它神圣罗马帝国的第三任皇帝是谁,它能给出答案。但我们这次构建的模型能接受更复杂的问题并进行推理。就像如果我让你做填字游戏,你不可能一下子就全部填好。你可能会说,这个横向的词可能是这两个中的一个,因此这里应该有一个"A",所以那个词必须是某个特定的。这个过程是一种回溯的、逐步推进的方法。
Same way you answer any difficult logistical problem, any scientific problem. So this reasoning breakthrough was big, but it was also the first time that a model needed to sit and think, and that's a weird paradigm for a consumer product. You don't normally have something where you might need to hang out for 25 seconds after you ask a question. And so we were trying to figure out, what's the UI for this? Because it's also not like with deep research, where the model is going to go and think for 25 minutes sometimes. It's actually not that hard, because you're not going to sit and watch it for 25 minutes. You're going to go do something else.
在处理复杂的物流问题或科学问题时,我们通常使用同样的方式来解决。这种推理突破很重要,但这也是首次需要一个模型坐下来“思考”,这对消费产品来说是一个新奇的模式。通常情况下,我们并不习惯在问完问题后等待25秒。因此,我们努力思考,这种情况下的用户界面应该怎么设计?因为这也不像深入研究那样需要模型花25分钟去思考。其实,这也不是特别难,因为在那25分钟里,你不可能一直坐在那里盯着它看,你会去做别的事情。
You're going to go to another tab or go get lunch or whatever. And then you'll come back and it's done. When it's like 20, 25 seconds or 10 seconds, it's a long time to wait, but it's not long enough to go do something else. And so you actually need, and so you can think, like if you asked me something that I needed to think for 20 seconds to answer, what would I do? I wouldn't just go mute and not saying anything and shut down for 20 seconds and then come back. So we shouldn't do that. We shouldn't just have a slider sitting there. That's annoying. But I also wouldn't just start babbling every single thought that I had. So we probably shouldn't just expose the whole chain of thought as the model's thinking.
你会去打开另一个标签页、去吃午饭或者做其他事情。然后,当你回来时,它就完成了。当等待时间是20、25秒或10秒时,这似乎很长,但又不足以去做其他事情。所以实际上你需要想一想,比如如果你问我一个需要思考20秒才能回答的问题,我会怎么做?我不会只是默不作声地停下来20秒然后再说话。所以我们不应该那样做,我们不应该只是放着一个进度条在那,这是很烦人的。但我也不会开始随机地说出每一个想法,所以我们可能也不应该在模型思考时,暴露整个思考过程。
But I might go, that's a good question. All right, I might approach it like that and then think, you know, you're sort of like maybe giving little updates. And that's actually what we ended up shipping. You have similar things where you can find situations where you get better thinking sometimes out of a group of models that all try and attack the same problem. And then you have a model that's looking at all their outputs and integrating it and then giving you a single answer at the end. I mean, sounds a little bit like brainstorming, right?
我可能会去,这是个好问题。好吧,我可能会这样去处理它,然后想想看,你知道,你有点像是在提供一些小更新。实际上,这就是我们最后发布的产品。你会发现类似的情况,有时候一组模型一起解决同一个问题时,你可以得到更好的思考。然后有一个模型负责查看所有这些输出,进行整合,最后给出一个单一的答案。听起来有点像头脑风暴,对吧?
Like I certainly have better ideas when I get in a room and brainstorm with other people because they think differently than me. And so anyways, there's just like all these situations where you can actually kind of reason about it like a group of humans or an individual human. And it sort of works, which I don't know, maybe I shouldn't have been surprised, but I was. That is so interesting. Because when I see these models operate, I like, I never even thought about you guys designing that experience.
当然,当我与其他人一起进入一个房间进行头脑风暴时,我的想法会更好,因为他们的思维方式和我不同。因此,在很多情况下,你可以把情况看作是一群人或者一个人来进行思考。这种方式居然有效果,这让我感到惊讶,也许我本不该感到惊讶,但我确实感到惊讶。这真是很有趣。因为当我看到这些模式运作时,我从未想到过你们是如何设计这种体验的。
Like to me just feels like this is what the LLM does. It just sits there and tells me what it's thinking. And I love this point you're making of like, we like, let's make it feel like a human operating. And how does human operate? Well, they just talk a lot, they think, here's the thing I should explore. And I love that deep sequence, like to the extreme of that, right? Where they're just like, here's everything I'm doing and thinking in. And people actually like that too, I guess was that, was that surprising to you?
在我看来,这就是大型语言模型(LLM)所做的事情。它就像坐在那里,告诉我它在想什么。你提到的一点我很喜欢,就是让它感觉像人类在操作。人类是怎么做的呢?他们会多说话,并思考“这是我该探索的事情。” 我很喜欢这种深入的过程,甚至达到极致,就是把他们正在做和思考的所有事情都展示出来。我猜人们也喜欢这样,对你来说,这会让你感到惊讶吗?
Like, oh, maybe that could work too. People seem to like everything. Yeah, we learned from that actually. Because when we first launched it, we kind of gave you like the subheadings of what the model was looking about, but not much more. And then deep sea launched and it was a lot. And we kind of went, you know, I don't know if everyone wants like that. There's some novelty effect to seeing what the model's really thinking about. We felt that too when we were looking at it internally.
就像,“哦,也许那也可以。”人们似乎对所有东西都很喜欢。是的,我们实际上从中学到了很多。因为当我们刚刚推出它的时候,我们给你展示了模型正在关注的副标题,但没有更多内容。然后当深海模式发布时,信息量很大。我们在想,也许并不是每个人都想要那样的信息。有些人就喜欢看到模型真正思考什么的新奇感。当我们内部查看这一点时,我们也有同样的感觉。
It's interesting to see the model's chain of thought. But it's not, you know, I think at the scale of like, 400 million people, you don't want to see the model kind of like babble a bunch of things. And so what we ended up doing was summarizing it in interesting ways. So instead of just getting the subheadings, you're kind of getting like one or two sentences about how it's thinking about it. And you can learn from that. So we kind of tried to find a middle ground that we thought was an experience that would be meaningful for most people. But, you know, showing everybody like three paragraphs is probably not the right answer.
看到模型的思维过程是很有趣的。但你知道,对于4亿人这样的大规模来讲,你并不想看到模型只是胡乱说一堆东西。所以,我们最后选择以一种有趣的方式对其进行总结。这样一来,你不只是得到小标题,而是能看到一两句话,了解模型是怎么思考的,你可以从中学习。所以我们尽量找到一个折中的方法,希望能为大多数人提供有意义的体验。不过,给每个人看三大段的文字可能并不是最好的选择。
This reminds me of something else. You said at the summit that has really stuck with me, this idea that chat, people always make fun of like, chat is not like the future interface for how we interact with AI. But you made this really interesting point that may argue the other side, which is like as humans, we interface by talking and IQ of a human can span from really low to really high. And it all works because we're talking to them and chat is the same thing. And it can work on all kinds of intelligence levels. Maybe just share, maybe I just shared it, but I guess anything there about just why chat actually ends up being such an interesting interface for all of us.
这让我想起了其他事情。您在峰会上说的一句话让我印象深刻,就是关于聊天这个观点。人们总是取笑说聊天不会是我们与人工智能互动的未来界面,但您提出了一个很有趣的看法,可能会支持相反的观点:我们人类是通过交流来互动的,人类的智商范围可以从很低到很高,而这一切之所以有效,是因为我们通过交谈交流,聊天也是如此。它可以在各种智力水平上起作用。也许我刚才已经分享过这个观点,但其中为什么聊天最终成为我们所有人都感兴趣的界面,还有没有其他要补充的?
Yeah, I don't know if maybe I'm, maybe this is one of those things I believe that most people don't believe. But I think chat is an amazing interface because it's so versatile. People tend to go, oh, chat, yeah, well, that's just like, you know, we'll figure out something better. And I kind of think, I kind of think this is, it's incredibly universal because it is the way we talk. Like I can talk to you verbally, like we're talking now, I can, you know, we can see each other and interact. We can talk on WhatsApp and, you know, be texting each other. But all of these things is this sort of like unstructured, you know, method of communication. And that's how we operate. If I had to, and if I had some more rigid interface that I was allowed to use when we spoke, I would be able to speak to you about, you know, far fewer things and it would actually get in the way of us having like maximum communication bandwidth.
是的,我不知道这是不是那种我相信而大多数人不相信的事情之一。但我认为聊天是一种很棒的界面,因为它非常多功能。人们往往会说,哦,聊天嘛,总会有更好的东西出现。然而,我觉得聊天是极其普遍的交流方式,因为这就是我们交流的方式。就像我们现在这样可以进行口头交流,我们也可以在WhatsApp上互发信息。这些都是非结构化的沟通方式,而这正是我们运作的方式。如果我必须使用某种更严格的界面来交谈,我能够讨论的事情会少得多,这反而会妨碍我们最大程度地进行沟通。
So there's something magical. And by the way, in the past, it never worked because models, there wasn't a model that was good at understanding all of the complexity and nuances of human speech and that's the magic of LLM's. So to me, it's like an interface that's exactly fit to the power of these things. And that doesn't mean that it always has to be just like, I don't necessarily always want to type. But if you do want that very open-ended, flexible communication medium, it may be that we're speaking and the model's speaking back to me, but you still want that, like that very sort of lowest common denominator, no restrictions way of interacting.
所以这是某种神奇的东西。顺便说一下,以前这个方法行不通,因为没有一个模型能够很好地理解人类语言的复杂性和细微差别,而这正是大型语言模型(LLM)的神奇之处。对我来说,这就像是一个完美契合这些事物功能的界面。而且这并不意味着我们总是必须通过打字来互动。如果你想要那种非常开放、灵活的交流方式,可能是我们通过说话进行交流,模型也在和我们对话,但你还是希望这种互动方式能保持在一个没有限制、最简单的层面上。
That is so interesting. That's really changed the way I think about the stuff is that point that chat is just so good for this very specific problem of talking to a superintelligence, basically. By the way, I think there are like, it's not that it's only chat either. Like there are, if you have high volume use cases where they're more prescribed and the, you don't actually need the full generality, there are many use cases where it's better to have something that's less flexible, more prescribed, faster to specific tasks. And those are great too. And you can build all sorts of those. And, but you still want chat as like this baseline for anything that falls out of whatever vertical you happen to be building for. It's like a catch-all for like every possible thing you'd ever want to express to a model.
这真是太有趣了。这真的改变了我对于某些事情的看法,特别是聊天在与超级智能对话这个非常具体的问题上真的是非常有效的。另外,我认为不仅仅是聊天适用,如果你有一些高使用量且较为明确的应用场景,并且不需要完全的通用性,有很多情况下使用那些不太灵活但更专注于特定任务、更快速的工具会更好。这些工具也非常出色,并且可以构建出各种用途的工具。但是,聊天仍然可以作为一个基础工具,适用于你构建的任何垂直领域之外的情况。它就像是一个万用工具,可以处理任何你想对模型表达的事情。
I'm excited to chat with Christina Gilbert, the founder of OneSkima, whenever a long-time podcast sponsor. Hi, Christina. Yes, thank you for having me on, Lenny. What is the latest with OneSkima? I know you now work with some of my favorite companies like Ram, Vanta, Scale, and Watershed. I heard that you just launched a new product to help product teams import CSVs from especially tricky systems like ERPs.
我很高兴能和OneSkima的创始人Christina Gilbert聊天,她是一位长期支持我们播客的赞助商。你好,Christina。非常感谢你邀请我,Lenny。请问OneSkima最近有什么新动态?我知道你们现在已经和一些我最喜欢的公司合作了,比如Ram、Vanta、Scale和Watershed。我听说你们新推出了一款产品,帮助产品团队从复杂的系统(比如ERP)中导入CSV文件。
Yes, so we just launched OneSkima file feeds, which allows you to build an integration with any system in 15 minutes, as long as you can export a CSV to an SOT peopleder. We see our customers all the time getting stuck with hacks and workarounds. And the product teams that we work with don't have to turn down prospects because their systems are too far to integrate with. We allow our customers to offer thousands of integrations without involving their engineering team at all.
好的,我们刚刚推出了 OneSkima 文件传输功能,只要你能将 CSV 文件导出到 SOT 文件夹,就可以在 15 分钟内与任何系统建立集成。我们经常看到客户因各种临时解决方案和应对措施而陷入困境。与我们合作的产品团队不再需要因为系统难以对接而拒绝潜在客户。我们的服务让客户提供数千种集成,而无需联系他们的工程团队。
I can tell you that if my team had to build integrations like this, how nice would it be to be able to take the software roadmap and instead use something like OneSkima and not just to build it, but also to maintain it forever? Absolutely, Lenny. We've heard so many four stories of multi-day outages from even just a handful of ad records. We have laser-focused on integration reliability to help teams end all of those distractions that come up with integrations. We have a built-in validation layer that stops any bad data from entering your system. And OneSkima will notify your team immediately of any data that looks incorrect. I know that importing incorrect data can cause all kinds of pain for your customers and quickly lose their trust.
我可以告诉你,如果我的团队需要构建这样的集成系统,那么能够利用像OneSkima这样的工具,而不仅仅是建立它,还能永远维护它,该有多好啊!确实是这样,Lenny。我们听说过很多关于仅仅因为几条错误的数据记录,导致的长达几天的系统中断的故事。我们非常关注集成的可靠性,以帮助团队减少集成过程中出现的各种麻烦。我们有一个内置的验证层,可以防止任何错误数据进入你的系统。而且,OneSkima会立即通知你的团队任何看起来不正确的数据。我知道导入错误数据会给客户带来各种问题,并迅速失去他们的信任。
Christina, thank you for joining us and if you wanna learn more, head on over to oneskima.co. That's oneskima.co. I wanna come back to the you talked about researchers and their relationship with product teams. I imagine a lot of innovation comes from researchers just like having an inkling and then building something amazing and then releasing it and some ideas come from PMs and engineers. How did those teams collaborate? Does every team have a PM? Is it a lot of research-let stuff? Just like what give us a sense of just where ideas and products come from mostly? It's an area where we're evolving a lot. I'm really excited about it, frankly.
克里斯蒂娜,谢谢你加入我们。如果你想了解更多信息,可以访问oneskima.co。那是oneskima.co。我想回到我们之前谈论的关于研究人员和产品团队之间的关系。我想象很多创新来自于研究人员的一点灵感,他们创造出惊人的东西并发布出来,而有些想法则来自产品经理和工程师。那些团队是如何合作的?每个团队都有产品经理吗?是很多由研究主导的吗?给我们一个大致的感觉,想法和产品主要来自哪里?在这个领域,我们有很多进展,这让我十分兴奋。
I think if you go back a couple years when ChatGPT was just getting started, obviously I wasn't an open AI, so. We were more of a pure research company at the time. ChatGPT, if you remember, was a low-key research preview. For many years. Yeah, it wasn't a thing that the team launched thinking it was gonna be this massive product. Oh, ChatGPT, yeah. And it was just a way that we were gonna let people play with and iterate on the models. And so we were primarily a research company at a world-class research company.
我觉得如果回到几年前,当时ChatGPT刚刚起步,显然我不是OpenAI中的一员。那时我们更多是一个纯粹的研究公司。大家可能还记得,ChatGPT当时只是一个低调的研究预览版本,持续了很多年。对,它并不是团队推出时就预想的那种大型产品。哦,ChatGPT,对。它只是让人们可以体验和改进模型的一种方式。所以当时我们主要是一个世界级的研究公司。
And as ChatGPT has grown and as we built our B2B products and our APIs and other things, now we're more of a product company than we were. I still think we're, OpenAI should never be a pure product company. We need to be both a world-class research company and a world-class product company. And the two need to really work together. And that's the thing that I think we've been getting much better at over the last six months. If you treat those things separately, and the researchers go do amazing things and build models, and then they get to some state, and then the product and engineering teams go take them and do something with them.
随着ChatGPT的发展,我们建立了B2B产品、API等,现在比以前更像一家产品公司。但我认为OpenAI绝不应该仅仅是一家产品公司。我们需要同时成为一家世界级的研究公司和世界级的产品公司,两者必须紧密合作。我认为在过去的六个月里,我们在这方面有了很大的进步。如果将这两方面分开处理,研究人员可能会做出很棒的事情并构建出模型,然后等到模型达到某种状态后,产品和工程团队再去处理和应用这些模型。
We're effectively just an API consumer of our own models. The best products, though, are gonna be, is like I was talking about with deep research. It's a lot of iterative feedback. It's understanding the products you're trying to solve, or the problems you're trying to solve, building eVals for them, using those eVals to go gather data and fine-tune models to get them to be better at these use cases that you're looking to solve. It's a huge amount of back and forth to do it well. And I think the best products are gonna be end product design and research working together as a single team to build novel things.
我们其实就像是自己模型的API使用者。不过,最好的产品就像我之前提到的深入研究一样,需要大量的反复反馈。这包括理解您要解决的产品或问题,为其建立评估(eVals),利用这些评估去收集数据并微调模型,以便模型在这些用例上表现更好。做好这件事情需要大量的反复交流和反馈。而我认为,最好的产品将是终端产品设计和研究作为一个团队一起合作,创造出新颖的东西。
So that's actually how we're trying to operate with basically anything that we build. It's a new muscle for us, because we're kind of new as a product company, but it's one that people are really excited about, because we've seen every time we do it, we build something awesome. And so now every product starts like that. How many product managers do you have at OpenAI? I don't know if you share that number, but if you do. Not that many, actually, I don't know, 25. Maybe it's a little more than that, but my personal belief is that you want to be pretty PM light as an organization, just in general.
所以,实际上我们试图用这种方式来进行我们所做的一切。对于我们来说,这是一种新的尝试,因为我们作为一家产品公司还比较新手,但这种方式让人们感到非常兴奋,因为每次我们这样做的时候,都能创造出很棒的东西。所以现在每个产品都是这样开始的。你们在OpenAI有多少产品经理?不知道这个数字能不能分享,如果可以的话。实际上并不多,大概25个左右。也许稍微多一些,但我个人认为,作为一个组织,整体上应该尽量少一些产品经理。
I say this with love because I am a PM, but too many PMs causes problems. You know, we'll like fill the world with decks and ideas versus execution. So I think that the, I think it's a good thing when you have a PM that is working with maybe slightly too many engineers because it means that they're not going to get in and micromanage. You're going to leave a lot of influence and responsibility with the engineers to make decisions. It means you want to have really product focus engineers, which we're fortunate to have. We have an amazingly product focused like high agency engineering team.
我以充满善意的态度说这番话,因为我自己就是一名产品经理,但产品经理太多会引发问题。你知道的,我们可能更多地填满世界的是各种幻灯片和想法,而不是实施。所以我认为如果有一名产品经理同时与相对较多的工程师一起工作,这其实是一件好事。因为这样产品经理就不会过度干预,我们可以把更多的影响力和责任留给工程师,让他们自行做决定。这意味着你需要的是非常专注于产品的工程师,而我们恰好拥有这样的团队。他们在产品上有着惊人的专注和较强的自主决策能力。
But when you have something like that, you have a team that feels super empowered. You have a PM that's trying to really understand the problems and kind of gently guide the team a little bit, but has too much going on to get too far into the details. And you end up being able to move really fast. So that's kind of the philosophy we take. We want producty engine leads and producty engineers all the way through. We want not too many PMs, but really awesome high quality ones. And so far that seems to be working pretty well.
但当你拥有这样的情况时,你就有了一支非常有动力的团队。你的产品经理(PM)会努力去理解问题,并在一定程度上轻轻地引导团队,但由于事情太多,他们无法深入细节。因此,我们能够快速推进。这就是我们采取的理念。我们希望从头到尾都有具有产品思维的工程负责人和工程师。我们不需要太多的产品经理,但要确保他们非常优秀且高质量。目前,这种方法似乎运作良好。
I imagine being a PM at OpenAI is like a dream come true for a lot of people. At the same time, I imagine it's not a fit for a lot of people. There's researchers involved, very product minded engineers. What do you look for in the PMs that you hire there for folks that are like maybe a pro, I shouldn't go work there, I shouldn't even think about that. I think I've said this a few times, but high agency is something that we really look for. People that are not going to come in and kind of wait for everyone else to allow them to do something. They're just going to see a problem and go do it. That's, it's just a core part of how we work.
在OpenAI担任产品经理(PM)对于很多人来说可能就像梦想成真。但与此同时,我想这份工作对很多人来说可能并不合适。因为这里涉及到研究人员,以及非常有产品思维的工程师。那么,在招聘PM时,你们看重哪些特质呢?对于那些在考虑是否要在这里工作、甚至有些犹豫的人,你有什么建议吗?我想我已经多次提到过,我们非常看重的是高自主性。我们寻找的,是那些不会等别人许可才去做事情的人。他们会主动发现问题,然后立即着手解决。这是我们工作方式的核心之一。
I think people that are happy with ambiguity because there is a massive amount of ambiguity here. It is not the kind of place. And we have trouble sometimes with more junior PMs because of this because it's just not the place where someone is going to come in and say, okay, here's the landscape, here is your area. I want you to go do this thing. And that's what you want as an early career PM. We just, I mean, no one here has time. And nobody, the problems are too ill-formed and we're figuring them all out as we go. And so high agency, very comfortable with ambiguity. Ready to come in and help execute and move really quickly. That's kind of our recipe.
我认为那些能够接受不确定性的人会在这里感到满意,因为这里存在大量的不确定因素。这不是一个明确规定任务的地方。有时候,对于较资浅的产品经理来说,这会是个难题,因为这里不是一个有人会给你一幅全景图,并明确指示你负责哪个区域、要去做些什么的环境。这在职业生涯早期的产品经理来说是他们想要的,但是我们这里真的没有人有时间做这些事情。再说,问题本身都不够明确,我们都是在边走边摸索。因此,我们需要高度自主,能够轻松应对不确定性的人,随时准备进来帮助执行并迅速推进。这就是我们的方法。
And I think also happy leading through influence because I mean, it's usual, it is a PM. People don't report to you. Your team doesn't report to you, et cetera. But you also have the complexity of a research function which is even more self-directed and it's really important to build a good rapport with the research team. And so, I think the EQ side of things is also super important for us. I know that most companies at PM comes in and they're just like, why do we need you? And as a PM, you have to earn trust and help people see the value.
我认为通过影响力来领导也是非常令人愉快的,因为一般情况下,作为产品经理(PM),团队成员不会直接向你汇报工作,你的团队也不直接受你管辖。而研究部门的工作性质更加自我导向,与研究团队建立良好的关系就显得尤为重要。因此,我认为情商方面的能力对我们来说也非常重要。我了解到,在大多数公司,一开始产品经理加入时,别人可能会质疑他们的作用,他们会问:"我们为什么需要你?" 作为产品经理,你需要赢得大家的信任,并帮助他们看到你的价值所在。
And I feel like it opened it. It's probably a very extreme version of that where they're like, why do we need this person? Maybe researchers, engineers, what are you gonna do here? Yeah, I think people appreciate it, done right. But you bring people along. I think one of the most important things a PM can do well is be decisive. So it's, there's a real fine line. You don't want to be making, I mean, it's kind of like, I don't love the PM as the CEO of the product, illusion all the time. But just like Sam and his role would be making mistakes, if he made every single decision in every meeting that he was in.
我感觉这打开了一个极端版本的问题,他们可能会想:我们为什么需要这个人呢?也许是研究人员和工程师,他们会问你能在这里做什么?是的,我认为如果做得好,人们会欣赏它。但你需要带领大家一起前进。我认为产品经理(PM)能做得很好的其中一件重要事情就是果断决策。不过,这是一条微妙的界限。你不想总是让PM像产品的CEO一样做出所有决策,就像Sam在他的职位上可能会犯错,如果他在每个会议中做出每一个决定,结果也不会理想。
And he would also be making mistakes if he made no decisions in any meetings that he was in, right? It's the, it's understanding when to defer to your team and to let people innovate. And when there is a decision to be made that people either don't feel comfortable with or don't feel empowered to make or a decision that has too many different disparate pros and cons that are spread out across a big group and someone needs to be decisive and make a call. It's a really important trade of a CEO. It's something Sam does well.
如果他在参加的所有会议中都不做任何决策,那也是一种错误,对吧?关键是要明白什么时候需要依赖团队,让他们进行创新,而什么时候需要做出决策:比如团队不太愿意或没有权力做决策,或者存在太多不同的利弊权衡,导致没有人能够做出决定,这时就需要一个果断的人来拍板。这是一个CEO非常重要的特质,而Sam在这方面做得很好。
And it's, it's also a really important trade of a PM kind of at a more microscopic level. And so because there's so much ambiguity, it's not obvious what the answer is in a lot of cases. And so having a PM, they can come in and like, and by the way, this doesn't need to be a PM. I'm perfectly happy if it's anybody else. But I kind of looked to the PM to say like, if there's ambiguity and no one's making a call, you better make sure that we get a call made and we move forward.
这在某种程度上也是项目经理(PM)非常重要的一个特质。在很多情况下,因为存在太多模糊不清的地方,答案并不明确。因此,拥有一个PM来参与其中是很重要的。当然,这并不一定非得是PM,我完全乐意由其他人来做这件事。但是,我习惯性地把目光投向PM,如果有模糊不清的状况,而且没人做决定,那么你就必须确保做出决定并推动事情向前发展。
This touches on a few posts. I've done of just where is AI gonna take over work that we do versus help us with various work. So let me come at this question from a different direction of just how AI impacts product teams and hiring things like that. So first of all, there's always talk of LMS doing or coding for us and 90% of code is gonna be written by AI in a year, Dario and Anthropics said that. At the same time, you guys are all hiring engineers like crazy, PM's like crazy. Every function is dead, but you're still hiring every single one.
这涉及到几个帖子。我曾讨论过AI在哪些方面会接管我们的工作,在哪些方面会辅助我们的工作。所以让我从另一个角度来探讨这个问题,即AI如何影响产品团队和招聘等方面。首先,总是有关于学习管理系统(LMS)为我们编写代码的讨论,还有人说Dario和Anthropic提到一年后90%的代码会由AI编写。与此同时,你们仍在疯狂招聘工程师和项目经理等各种职能的人才。每个职能似乎都不需要了,但你们依然招人。
I guess just, first of all, let me just ask this, how do you and the team, like say, engineers, PM's use AI in your work? Is there anything that's like really interesting or things that you think people are sleeping on and how you use AI in your day to day work? We use it a lot. I mean, every one of us is in chat GPD all the time. It's summarizing docs using it to help write docs with GPDs that write product specs and things like that. All the stuff that you would imagine, I mean, talk about writing e-vals, like you can actually use models to help you write e-vals and they're pretty good at it. That all said, I still don't, I'm still sort of disappointed by us and despite, I really mean me. In, if I were to just like teleport my five year old self leading product at some other company into my day job, I would recognize it still.
首先,让我问一下,您和团队中的工程师、产品经理等在工作中是如何使用人工智能的?有没有什么特别有趣的用法或者是您认为大家还没意识到的应用?我们在工作中大量使用人工智能。几乎每个人都在使用Chat GPT。我们用它来总结文档,帮助撰写文件,生成产品规格等你能想到的任务。甚至在撰写评估报告的时候,我们都可以用AI模型帮助完成,它们的表现相当不错。尽管如此,我对我们目前的使用程度还是略感失望,这也包括我自己。如果我将五年前在另一家公司的自己放到今天的工作中,我可能还是会觉得与过去的工作方式差别不大。
And I think we should be in a world certainly a year from now, probably even more now that where I almost wouldn't recognize it because the workflows are so different and I'm using AI so heavily. And I'd still recognize it today. So I think in some sense, I'm not doing a good enough job of that. Yeah, just to give an example, like why shouldn't we be like vibe coding demos right left in center? Like instead of showing stuff in like Figma, we should be showing prototypes that people are vibe coding over the course of 30 minutes to illustrate proofs of concept into explore ideas. That's totally possible today and we're not doing it enough. Are actually our chief people officer, Julia was telling me the other day, she vibe coded an internal tool that she had at a previous job that she really wanted to have here at OpenAI and she opened, I don't know, Windsor for something and vibe coded it.
我认为一年后,甚至可能更快,我们应该生活在一个几乎认不出来的世界中,因为工作流程发生了巨大变化,我正在大量使用人工智能。而今天,这个变化还没有明显体现,所以在某种程度上,我觉得自己在这方面做得还不够好。例如,我们为什么不在各处展示"振动编码"(vibe coding)演示呢?与其在Figma中展示设计,我们应该通过振动编码在半小时内展示原型,以便更好地实现概念验证和探索创意。这在今天是完全可能的,但我们还没有充分利用。我们的首席人事官Julia最近告诉我,她曾在以前的工作中通过振动编码开发了一个她非常想在OpenAI使用的内部工具,她打开了某种工具(可能是Windsor),并用振动编码实现了它。
Like how cool is that? And if our chief people officer is doing it, we have no excuse to not be doing it more. That's an awesome story. Okay, and some people may not have heard this term vibe coding, can you describe what that means? Yeah, I think this was, I think this was Andre's term. Carpathy, yeah. Andre Carpathy, yeah. Whereas just, so you have these tools like cursor and Windsor and get up copilot that are very good at suggesting what code you might want to write so you can give them a prompt and it will write code and then as you go to edit it, it's suggesting what you might want to do. And the way that that everyone started using that stuff was give it a prompt, have it do stuff, you go edit it, give it a prompt, and you're kind of like really going back and forth with the model the whole time.
翻译如下:
这有多酷啊!如果我们的首席人力资源官都在做这件事,那我们就没有理由不多去尝试。这真是个了不起的故事。好,有些人可能没听过“vibe coding”这个词,能否解释一下是什么意思?嗯,我想这是Andre的用词,Carpathy,对,就是Andre Carpathy。他是这样说的:你有像cursor、Windsor和GitHub Copilot这样的工具,它们非常善于建议你可能想要编写的代码。你可以给它们一个提示,它们就会编写代码,然后当你去编辑时,它会建议你可能想要做的事情。大家开始使用这种工具的方法就是给它一个提示,让它做事,你再去编辑,然后再给它另一个提示,整个过程中你一直在和模型来回互动。
As the models are getting better and as people are getting more used to it, you can kind of just like let go of the wheel a little bit and when the model's suggesting stuff, it's just like tap, tap, tap, tap, tap, like keep going, yes, yes, yes, yes, yes. And of course the model makes mistakes or it does something that doesn't compile, but when it doesn't compile, you paste the error in and you say go, go, go, go, go, go, and then you test it out and it does one thing that you don't want it to do. So you enter in an instruction and say go, go, go, go, go, and you just kind of like let the model do its thing. And it's not that you would do that for production code that needed to be super tight today yet.
随着模型的改进以及人们对其使用越来越习惯,你可以稍微放松一点控制。当模型给出建议时,就像是不断点下“是的,是的,是的”一样,继续往前走。当然,模型有时会出错或生成无法编译的代码,但当出现这样的问题时,你只需将错误粘贴进去,然后让它继续尝试。之后进行测试时,如果模型做了一些你不想要的事情,你只需输入一个指令,然后让它继续工作。目前这种方式还不适用于需要特别严谨的生产环境代码。
But for so many things, you're trying to get to a proof of concept, you're getting to a demo and you can really take your hands off the wheel and the model will do an amazing job and that's what that's that's five coding. That's an awesome explanation. I think like the pro version of that, which is I think the way Andre even described it as you talk, you do like a there's a step like whisper, super whisper or something like that where you're like talking to the model, not even typing. Yeah, totally. Oh man. So let me let me just ask, I guess when you look at product teams in the future, you talked about how you guys should doing this more instead of designs, having prototypes.
但是对于很多事情,你只是想要得到一个概念验证,你正在接近一个演示,然后你可以真的把手从方向盘上拿开,模型会做得非常出色。这就是编程的第五类代码。这是一个很棒的解释。我想它的专业版本是这样的,Andre就是这样描述的——你和模型对话,而不是打字,就像是“低语”或“超级低语”这样的步骤。是的,绝对是这样的。哇,那让我问一下,我想知道未来的产品团队,你谈到了你们应该更多地这样做,而不是设计原型。
What do you think might be the biggest changes in how product teams are structured or built? Where do you think they're going in the next few years? I think you're definitely going to live in a world where you have more, where you have researchers built into every product team. And I don't even mean just at like foundation model companies because I think the future, I actually frankly one thing that I'm sort of surprised about about our industry in general is that there's not a greater use of fine tuned models. Like a lot of people, these models are very good. So our API does a lot of things really well. But when you have particular use cases, you can always make the model perform better on a particular use case by fine tuning it.
你认为在产品团队的结构或建设方面,未来最大的变化可能是什么?你认为在接下来的几年里会有什么样的发展?我认为你们肯定会迎来一个新世界,即每个产品团队都会包含研究人员。我不仅指那些基础模型公司,因为我实际上非常惊讶的是,在整个行业中,对微调模型的使用并没有更广泛。尽管这些模型本身表现得很优秀,比如我们的API在很多方面都做得很好,但当你有特定需求时,通过微调模型,总能让其在特定应用上表现得更好。
It's probably just a matter of time. Folks aren't quite comfortable yet with doing that in every case. But to me, there's no question that that's the future. Models are going to be everywhere, just like transistors or everywhere, AI is going to be just a part of the fabric of everything we do. But I think there are going to be a lot of fine tuned models because why would you not want to more specifically customize a model against a particular use case? And so I think you're going to want sort of quasi researcher machine learning engineer types as part of pretty much every team because fine tuning a model is just going to be part of the core workflow for building most products.
这可能只是时间问题。大家在每种情况下还不是很习惯这样做。但对我来说,这毫无疑问是未来的趋势。模型将无处不在,就像晶体管无处不在一样,人工智能将成为我们生活中一部分不可分割的内容。不过,我认为会有很多经过精细调校的模型,因为为什么不根据特定的使用场景来更具体地定制一个模型呢?所以我认为,几乎每个团队中都需要有类似研究员的机器学习工程师类型的人,因为对模型进行微调将成为大多数产品开发的核心工作流程的一部分。
So that's one change that maybe you're starting to see a foundation model companies that will propagate out to more teams over time. I'm curious if there's a concrete example that makes that real. And I'll share one that comes to mind as you talk, which is when you look at cursor and winser for something you learned from those founders is that they use like a saw in it. But then they also have a bunch of custom models that help along the edges that make the specific experience. That's not just generating code, even better. Like auto-complete and looking ahead to where things are going.
所以,这可能是您开始看到的一种变化,即基础模型公司会在一段时间后影响到更多团队。我很好奇是否有一个具体的例子能够对此进行说明。在您讲话时,我想到了一个例子,那就是在查看Cursor和Winser时,从这些创始人那里学到的是,他们像使用锯子一样使用基础模型。但同时,他们还有一系列自定义的模型,帮助完善其他细节,使得整个体验更加出色。这不仅仅是在生成代码方面变得更好,比如自动补全功能和预测代码的走向。
So is that one any other examples of what what is a fine tune model there? Do you think teams will be building with these researchers on their teams? Yeah, I mean, so when you're fine tuning a model, you're basically giving the model a bunch of examples of the kinds of things you want it to be better at. So it's here's a problem, here's a good answer. Here's a problem, here's a good answer, or here's a question, here's a good answer, times 1,000 or 10,000. And suddenly you're teaching the model to be much better than it was out of the gate at that particular thing.
所以,这就是一个关于微调模型的例子。你觉得团队会和他们的研究人员一起开发这样的模型吗?是的,我的意思是,当你在微调一个模型时,你基本上是在给这个模型提供大量你希望它更擅长的例子的训练数据。比如,这是一个问题,这是一个好的答案;这是一个问题,这是一个好的答案,或者这是一个问题,这是一个好的答案,这样重复一千次或一万次。这样一来,你就是在教这个模型在特定的任务上变得比原来更好。
We use it everywhere internally. We also, we use ensembles of models much more internally than people might think. So it's not, here is, I have 10 different problems. I'll just ask baseline GPT-40 about a bunch of these things. If we have 10 different problems, we might solve them using 20 different model calls, some of which are using specialized fine tune models. They're using models of different sizes, because maybe you have different latency requirements or cost requirements for different questions.
我们在内部各处都使用它。而且,我们比人们想象的更频繁地在内部使用模型集成。所以情况并不是,我有10个不同的问题,我就直接用基础的GPT-40来问这些东西。如果我们有10个不同的问题,我们可能会用20次不同的模型调用来解决,其中一些使用的是专门微调过的模型。我们使用不同规模的模型,因为可能不同的问题有不同的延迟要求或成本要求。
They are probably using custom prompts for each one. Basically, you want to teach the model to be really good. You want to break the problem down into more specific tasks versus some broader set of high level tasks. And then you can use models very specifically to get very good at each individual thing. And then you have an ensemble that sort of tackles the whole thing. I think a lot of good companies are doing that today. I still see a lot of companies giving the model single generic broad problems versus breaking the problem down.
他们可能为每一项任务都使用了定制的提示。基本上,你需要让模型变得非常出色。你需要将问题分解为更具体的任务,而不是一些笼统的大任务。然后,你可以非常有针对性地使用模型,让它在每个具体的事情上都表现得非常好。这样你就有一个综合系统来处理整个问题。我认为很多优秀的公司今天都在这样做。不过,我仍然看到许多公司给模型提出单一且笼统的大问题,而不是将问题细分。
And I think there will be more breaking the problem down using specific models for specific things, including fine tuning. And so in your case, because this is really interesting, is that you're using different levels of chat GPT, like a 103 and that's really, really cheaper. They'll be parts of our internal stack. So if you give you an example, customer support, with 400 plus million weekly active users, we get a lot of inbound tickets, right? I don't know how many customer support folks we have, but it's not very many. 30, 40, I'm not sure. Way smaller than you would have at any comparable company.
我认为未来会有更多的情况是针对特定问题使用特定模型进行解决,包括进行微调。在你的情况下,这很有趣,因为你使用了不同级别的ChatGPT,比如一个103,这是非常便宜的。它们会成为我们内部体系的一部分。举个例子来说,在客户支持方面,每周有超过4亿的活跃用户,所以我们会收到大量的用户请求。我不清楚我们有多少客户支持人员,但人数并不多,可能只有30或40个,远少于其他类似规模的公司。
And it's because we've automated a lot of our flows. We've got most questions using our internal resources, knowledge base, guidelines for how we answer questions, what kind of personality, et cetera. You can teach the model those things, and then have it do a lot of its answers automatically. Or where it doesn't have the full confidence to answer a particular question, it can still suggest an answer, request a human to look at it. And then that human's answer actually is its own sort of fine tuning data for the model. You're telling it the right answer in a particular case.
这是因为我们自动化了许多流程。我们通过内部资源、知识库和问题回答指南来处理大多数问题,比如我们用什么样的语气等。你可以教模型这些内容,然后让它自动回答很多问题。如果模型对某个问题没有足够的信心去回答,它仍然可以提出一个建议答案,并请求人工审核。人工的回答实际上为模型提供了微调的数据,这样你就相当于告诉它在特定情况下什么是正确答案。
And we're using it various places. Some of these places you want a little bit more reasoning, is not super latency sensitive. So you want a little more reasoning, and we'll use one of our O series models. In other places, you want to quick check on something. And so you're fine to use like 40 mini, which is super fast and super cheap. And in general, it's like specific models for specific purposes. And then you ensemble them together to solve problems.
我们在不同的地方使用它。在某些地方,你会需要更多的推理能力,对延迟不太敏感,所以你会使用我们的O系列模型。在其他地方,你可能只是想快速核对一些东西,那么使用40 mini就很合适,因为它速度非常快,成本也很低。总体来说,就是为特定的用途选择特定的模型,然后将它们组合在一起解决问题。
By the way, again, not unlike how we as humans solve problems. A company is arguably an ensemble of models that have all been fine tuned in based on what we study in college and what we have learned over the course of our careers. We've all been fine tuned to have different sets of skills. And you like group them together in different configurations and the output of the ensemble is much better than the output of anyone individual. Kevin, you're blowing my mind. That sounds exactly correct. And also different people, you pay them less. They cost less to talk to. Some people take a long time to answer. Some people hallucinating. This is a mental novel, but it really does work in thinking nothing. This is great. Some people are visually want to dry out their thinking. Some people want to talk. Word cell. Wow. This is a really good metaphor.
顺便说一下,这其实和我们人类解决问题的方式没有太大区别。可以说,一家公司就是一个经过精细调整的模型集合,这些模型基于我们在大学学习的知识和事业发展过程中获得的经验。我们每个人都被精心调整,拥有不同的技能组合。把这些技能按不同的方式组合在一起,集体的产出往往比任何个体的产出都要好。Kevin,你让我脑洞大开。这听起来完全正确。而且不同的人薪资也不同,与他们沟通的成本也各有差异。有些人回答问题需要很长时间,而有些人则可能会偏离主题。这简直就像一部思维小说,但实际上这样思考真的很有效。非常棒。有些人喜欢用视觉化的方法来整理思绪,而有些人则喜欢通过谈话来沟通。真是个很好的比喻。
So again, coming back to your advice here, because I love that we circled back to it. It's you're finding a really good way to think about how to design great. Yeah, experiences and LMs, I guess specifically, think about how a person would do this. Well, it's maybe not always the answer is to think about how a person would do it, but sometimes to gain intuition for how you might solve a problem, you think about what an equivalent human would do in those situations and use that to at least gain a different perspective on the problem. Wow. This is great. There's just like, because this really is talking to a model, there's a lot of prior art, because we talk to other humans all the time and encounter them in all sorts of different situations. And so there's a lot to learn from that.
所以,再次回到你的建议,因为我喜欢我们又回到了这个话题。你提到对于如何设计出色的体验和语言模型,找到了一个很好的思考方式。是的,也许答案并不是总要考虑一个人会怎么做,但有时候为了获得解决问题的直觉,你可以考虑在那些情况下一个相应的人会怎么做,从而至少可以用不同的视角来看待问题。哇,这太好了。因为这是在与一个模型对话,所以有很多现成的经验可以借鉴。我们经常在各种不同的场合与他人交流,因此从中可以学到很多东西。
OK, so speaking of humans, I want to chat about the future a little bit. So you have three kids. And so on, a community member asking me this hilarious question that I think gets at something a lot of people are thinking about. So this is Patrick Srell. I worked at them with a mirror being be asked. She says, ask what he's encouraging as a kid's to learn to prepare for the future. I'm worried my six-year-old by the year 2036 will face a lot of competition trying to get into the top roofing or plumbing programs and to back up land. That's funny. So our kids are, we have a 10-year-old and eight-year-old twins. So they're still pretty young. I mean, it's amazing how AI native they are. They just, it's completely normal to them that there are self-driving cars that they can talk to AI all day long. They have full conversations with JetGPT and Alexa and everything else. I don't know. I think who knows what the future holds? I think things like coding skills are going to be relevant for a long time.
好的,说到人类,我想聊聊未来。你有三个孩子。在社区里,有人问了我一个很有趣的问题,我认为这反映了许多人正在思考的问题。这位是Patrick Srell,他在一个带镜子的项目中工作,有人请他问一个问题:他鼓励孩子们学习哪些技能以为未来做好准备。我很担心到2036年,我六岁的孩子在进入顶尖的屋顶翻修或管道维修项目时会面临激烈的竞争,这很有趣。我们的孩子,一个10岁,两个8岁的双胞胎,他们还很小。让人惊讶的是,他们对人工智能十分熟悉。对于他们来说,自动驾驶汽车和全天与人工智能进行对话都是再正常不过的事情。他们可以与ChatGPT、Alexa等进行完整的对话。我觉得谁也无法预测未来会怎样,不过像编程这样的技能在未来很长一段时间内都会很有用。
Who knows? But I think if you teach your kids to be curious, to be independent, to be self-confident, you teach them how to think. I don't know what the future holds, but I think that those are going to be skills that are going to be important in any configuration of the future. And so it's not like we have all the answers, but that's how Elizabeth and I think about our kids. And do you find that AI? There's a lot of talk about AI tutoring. Is that something guys are doing anything? I know they're using JetGPT. I love all the photos you post. They're playing with prompts and stuff. But I guess is there anything there? You're experimenting why a third thing is going to become really important.
谁知道呢?不过我认为,如果你教孩子们保持好奇心、独立性和自信心,其实就是在教他们如何思考。我不知道未来会怎样,但我觉得这些技能无论在怎样的未来环境中都会很重要。我们可能没有所有答案,但这就是我和伊丽莎白关于孩子教育的想法。至于人工智能,你觉得怎么样?关于AI辅导的话题很多。你们在这方面有做什么尝试吗?我知道他们在用ChatGPT。我很喜欢你发的所有照片,他们在玩提示词什么的。不过我想知道你们是否在进行什么实验,第三种事物会变得非常重要。
This is something that. It's maybe the most important thing that AI could do. Maybe that's a grand statement. There are lots of important things that AI can do, including like speeding up the pace of fundamental science, research and discovery, which maybe is actually the most important thing AI can do. But one of the most important things would be personalized tutoring. And it blows my mind that there are a bunch of good products out there. Con Academy does great things. There are wonderful partner of ours. Vinod Kostler has a nonprofit that's doing some really interesting stuff in the space and was making an impact. But I'm surprised that there isn't like a two billion kid AI personalized tutoring thing because the models are good enough to do it now.
这是AI可能做到的一件事情,也许是最重要的事情。说这样的话可能有些夸张,因为AI可以做很多重要的事情,比如加速基础科学的研究和发现进程,这可能实际上是AI能做的最重要的事之一。但是,个性化辅导也是其中非常重要的一项。我很惊讶市面上有很多很不错的产品,比如可汗学院就做得很好,他们是我们优秀的合作伙伴。另外,维诺德·科斯特拉有个非营利组织在这方面也做了一些非常有趣的事情,并且已经产生了一定的影响。但让我感到惊讶的是,居然没有一个覆盖20亿孩子的AI个性化辅导系统,因为现在的模型已经足够优秀,可以实现这一点。
And every study out there that's ever been done seems to show that when you have. Classroom is still. Classroom education is still important. But when you combine that with personalized tutoring, you get like multiple standard deviation improvements in learning speed. And so it's just. It's uncontroversial. It's good for kids. It's free. Chat Gbt is free. You don't need to pay for. And the models are good enough. It still just blows my mind that there isn't something amazing out there that our kids are using and your future kids are using. And people in all sorts of places around the world that aren't as lucky as our kids to be able to have this sort of built-in solid education. Again, Chat Gbt is free. People have Android devices everywhere. I really just think this could change the world. And I'm surprised it doesn't exist and I want it to exist.
现有的每一项研究都表明,课堂教育依然重要。但当你将课堂教育与个性化辅导结合起来时,学习速度可以有显著提升。这毫无异议,对孩子们有好处。而且,ChatGPT是免费的,你不需要为此付费,并且这些模型已经足够好。令我感到震惊的是,竟然没有什么令人惊艳的东西可以供我们的孩子使用,以及未来的孩子们使用,还有那些不像我们的孩子那么幸运的地方的人们也能使用这种强大的教育工具。同样,ChatGPT是免费的,人们各地都有安卓设备。我真的相信这可以改变世界,我很惊讶这样的东西尚不存在,并且我希望它可以存在。
This kind of touches on something I want to spend a little time on, which is a lot of people also worry a lot about AI where it's going. They worry about jobs. It's going to take. They worry about the superintelligence squashing humanity in the future. What's your perspective on that and just the optimistic case that I think people need to hear? I mean, I'm a big technology optimist. I think if you look over the last 200 years, maybe more, technology has driven a lot of the advancements that have made us the world and the society that we are today. It drives economic advancements. It drives geopolitical advancements, quality of life, longevity, advancement. I mean, technologies at the root of just about everything.
这种情况触及到了我想花一点时间讨论的话题,即许多人对人工智能的发展感到担忧。他们担心工作机会会被取代,也担心未来超级智能会威胁人类的生存。对此,你有什么看法?我认为,人们需要听到乐观的论调。作为一名科技乐观主义者,我认为如果你回顾过去的二百年,可能更长时间,技术推动了许多进步,使我们的世界和社会成为今天的样子。技术推动了经济进步、地缘政治进步、生活质量的提高和寿命的延长。可以说,技术几乎是所有进步的根源。
So I think there are very few examples where this is anything but a great thing over the longer term. That doesn't mean that there aren't temporary dislocations or where there aren't individuals that are impacted. And that matters too. So it can't just be that the average is good. You've got to also think about how you take care of each individual person as best you can. So it's something that we think a lot about and as we work with the administration as we work with policy, like we try and help wherever we can. We do a lot with education. One of the benefits here is that chat GPT is also perhaps the best like reskilling app you could possibly want. It knows a lot of things. It can teach you a lot of things. If you're interested in learning new things.
因此,我认为在长远来看,这几乎总是件好事。然而,这并不意味着不会出现暂时的混乱,或者说不会有个人受到影响。这一点同样重要。所以不能仅仅因为平均来看是好的,就忽视个体的情况。你必须尽可能地关注每个人的状况。这是我们常常思考的问题,当我们与政府合作、制定政策时,我们努力在各个方面提供帮助。我们在教育领域投入了很多,其中一个好处是ChatGPT可能是最好的技能再培训应用程序。它知道很多东西,可以教你很多东西,如果你有兴趣学习新知识的话。
But these are very real issues. I'm super optimistic about the long run and we're going to need to do everything we can as a society to ensure that we make this transition as graceful and as well supported as we can. To give you a sense of where things might be going, that's a big question. A lot of people's minds. So someone asked this question that I love, which is AI is already changing, creative work in a lot of different ways, writing and design and coding. What do you think is the next big leap? What shall we be thinking is the next big leap in AI assisted creativity specifically? And then just broadly, where do you think things are going to be going in the next years?
这些都是非常现实的问题。从长远来看,我非常乐观,我们需要尽可能地作为一个社会去做所有我们能做的事情,以确保这个转变是优雅的,并且得到良好的支持。要让你了解事情可能发展的方向,这是一个重要的问题,很多人都在思考。因此,有人问了一个我非常喜欢的问题,那就是:人工智能已经在很多方面改变了创意工作,如写作、设计和编码。你认为下一个大的飞跃是什么?在人工智能辅助创作方面,我们应该认为下一个大的飞跃是什么?更广泛地说,你认为在接下来的几年里事情会如何发展?
Yeah. This is also an area where I'm a big optimist. If you look at Sora, for example, I mean, we talked about ImageGen earlier and the absolute like, fount of creativity that people are putting across Twitter and Instagram and other places. I am the world's worst artist. Like, the worst. Maybe the only thing I'm worse at than art is singing. And I, you know, I give me a pencil and a pad of paper and I can't draw better than my five, than our eight-year-old. You know, it's just like it's, but give me ImageGen. And, you know, I can think some creative thoughts and put something into the model and suddenly have output that I couldn't have possibly done myself. That's pretty cool.
是的,在这个领域,我是个非常乐观的人。拿Sora为例,我们之前谈到了ImageGen,人们在推特、Instagram和其他平台上展现出的创造力真是无穷无尽。我是世界上最糟糕的艺术家,可能唯一比我艺术更差的技能就是唱歌。给我一支铅笔和一张纸,我的画画水平还不如我们的八岁小孩。 但是,如果给我ImageGen,我可以想出一些创意,然后输入模型,突然间就能得到我自己完全无法完成的作品。这真的很酷。
Even you look at folks that are really talented, I was talking to a director recently about Sora, someone who's directed films that we would all know. And he was saying, you know, for a film that he's doing, like, say, say, take the example of some sort of sci-fi-ish, you know, think of like Star Wars. And you've got some scene where there's a plane zooming into some death star-like thing. And so you've got the plane looking at the whole planet and then you want to cut to a scene where the planes like, you know, kind of at the ground level and all of a sudden you see the city and everything else, right? How are we gonna manage that cut scene? And that transition.
即使你看那些非常有才华的人,我最近和一位导演聊到Sora,他曾拍摄过一些我们都熟知的电影。他举例说,假如他正在制作一部电影,比如科幻类的,可以想象像《星球大战》这样的场景。你有一个镜头,飞机快速飞向某个类似死星的地方。然后你想把镜头切换到一个飞机在地面上的场景,忽然间你看到城市和其他一切。那么,我们要如何处理这个剪辑场景和过渡呢?
And he was saying, you know, in the world of two years ago, I would have paid, you know, a 3D effects company, a hundred grand, and they would have taken a month and they would have produced two versions of this cut scene for me. And I would have evaluated them. We would have chosen one because what are you gonna do, like, pay another 50 grand a week at another month? And we would have just gone with it. And, you know, it would be fine. Like, movies are great. I love them and there have been, obviously, we can do great things with the technology that we've had.
他在说,你知道,两年前的世界里,我可能会花十万美元找一家3D特效公司,他们会花一个月时间为我制作两个版本的过场动画。我会评估这些版本,然后选择一个,因为你不可能再花五万美元再等一个月再做一个版本吧?我们就会决定用现有的版本。你知道吗,这也没什么问题。电影很棒,我非常喜欢它们,并且很显然,利用现有的技术,我们能够做出很棒的东西。
But you now look at what you can do with Sora and his point was, now I can use Sora, our video model, and I can get 50 different variations of this cut scene, just, you know, me brainstorming into a prompt in the model brainstorming a little bit with me. I've got 50 different versions. And then, of course, I can, like, iterate off of those and refine them and take different ideas. And now I'm still gonna go to that 3D effects studio to produce the final one. But I'm gonna go having brainstormed and, like, I had this at a much more creative approach with an outcome that's much better.
现在你可以看看用Sora能做些什么,他的意思是,我现在可以使用Sora,我们的视频模型,只需在模型中输入一个提示,然后与我一起头脑风暴,就能得到50种不同的剪辑场景的变体。我已经有了50个不同的版本。当然,我还可以在这些基础上迭代、完善,并采纳不同的创意。最终,我仍然会去3D特效工作室制作最终版本。但这次我带着已经头脑风暴过的想法去,并且采用了更有创意的方法,结果要好得多。
And, like, I did that assisted by AI. So my personal view on creativity in general is that no one's gonna, you don't type into Sora, like, make me a great movie. It requires creativity and ingenuity in all these things, but it can help you explore more. It can help you get to a better final result. So, you know, again, I tend to be an optimist in most things, but I'm actually, I think there's a very good story here. I know Sam Altman, I think it was him who tweeted recently the creative writing piece that you guys are working on, where to say, yeah, he's very bad at writing creative stuff.
我在AI的辅助下完成了这件事。关于创造力,我个人的观点是,没有人会直接输入指令到Sora,比如说“给我做一部好电影”。创造力和独创性在这些事情中是必不可少的,但AI可以帮助你进行更多的探索,并帮助你达到一个更好的最终结果。我通常对大多数事情持乐观态度,在这方面我认为有一个非常好的前景。我知道Sam Altman,最近他在推特上提到你们正在处理的创作写作项目,并且承认自己不擅长写作创意类的作品。
And he shared an example where it's actually really good. Imagine that's another area of investment. Yeah, there's some exciting stuff happening internally with some new research techniques. So we'll have more to say about that at some point, but yeah, Sam, Sam sometimes likes to show off some of the stuff that's coming. It just started out by the way. It's like very sort of indicative of this iterative deployment philosophy. We don't have some breakthrough and keep it to ourselves forever and then, you know, bestow it upon the world someday.
他说,他分享了一个例子,说明这其实非常好。想象一下,这是另一个投资领域。是的,内部有一些令人兴奋的新研究技术正在发生。所以我们将来会有更多的话要说,不过对,山姆,有时候喜欢展示一些正在进行的事情。顺便说一下,这刚刚开始。这非常体现了一种渐进式部署的理念。我们不会有了突破性进展就永远藏着不说,然后某天才将其贡献给世界。
We kind of just talk about the things we're working on and share when we can and launch early in often and then iterate in public. And I really like that philosophy. I love all these hints that a few things coming. I know he can't say too much. You talked about how there might be a coding leap coming in the near future. Maybe by the time this comes out, is there anything else people should be thinking about might be coming in the near future? Any things you can tease that are interesting, exciting.
我们通常会谈论我们正在进行的工作,并在可能的情况下分享,尽早发布并且频繁发布,然后在公开的平台上不断改进。我非常喜欢这种理念。我喜欢那些暗示着一些即将到来的事情的线索。我知道他不能透露太多内容。你提到过不久的将来可能会有一个编程方面的突破。可能在这一消息发布的时候会实现,还有其他人们应该注意的、可能在不久的将来到来的事情吗?你能分享一些有趣和令人兴奋的线索吗?
Man, this hasn't been enough for you. Oh, yeah, only everything is getting better every day. Yeah, I'm like, man, I hope we get some of the stuff out before the episode launches. This is your new time box. I don't piss people off. No, it's the amazing thing to me is we, we were talking earlier about how far models have come in just a couple of years. If you went back to GPT-3, you'd be disgusted by how bad it was, even though, plenty of two years ago, was mind blown by how good these were.
哥们儿,这对你来说还不够啊。不过,是的,一切都在每天变得更好。是啊,我就像是,希望我们在这集上线前能解决一些问题。这是你的新时间限制盒,我不会惹人生气。让我觉得惊讶的是,我们之前谈到了这些模型在短短几年内的进步有多大。如果你回到GPT-3的时代,你会觉得当时的表现很差劲,尽管两年前的很多人都对这些模型的优越表现感到目瞪口呆。
And for a long time, we were iterating every six to nine months on a new GPT model. It was like GPT-3, GPT-3.5, four. And now with this O-series of reasoning models, we're moving even faster. Where every roughly three months, maybe four months, there's a new O-series model. And each of them is a step up in capability. And so the capabilities of these models are increasing in a massive pace. They're also getting cheaper as they scale.
很长一段时间以来,我们大约每六到九个月就会推出一个新的GPT模型,像是GPT-3、GPT-3.5、GPT-4。不过现在有了这个O系列的推理模型,我们的发展速度更快了。大约每三到四个月就会有一个新的O系列模型面世。每一个新模型的能力都更强。因此,这些模型的能力正在大幅度提升,而且随着规模扩大,它们的成本也在降低。
You look at where we were even a couple of years ago, the original, I think the original, I don't know what was it, GPT-3.5 or something, was like 100x the cost of GPT-4 on Mini today in the API. So a couple of years, you've gone down two orders of magnitude in cost for much more intelligence. And so I don't know where there's another series of trends like that in the world. Models are getting smarter. They're getting faster. They're getting cheaper. And they're getting safer, too.
你看看我们几年前的情况,我记得最初的版本,我不太确定,是GPT-3.5还是什么,大概比今天API上的Mini版GPT-4要贵100倍。所以在这短短几年里,成本已经降低了两个数量级,同时智能提高了很多。我不知道世界上还有哪一种趋势能像这样发展。模型变得越来越聪明、越来越快、越来越便宜,而且也越来越安全。
They hallucinate less every iteration. And so there's just the Moore's Law and transistors becoming ubiquitous. That was a law around doubling the number of transistors on a chip every 18 months. If you're talking about something where you're getting 10x every year, that's a massively steeper exponential. And it just tells us that the future is going to be very different than today. The thing I try and remind myself is the AI models that you're using today is the worst AI model you will ever use for the rest of your life. And when you actually get that in your head, it's kind of wild.
每次迭代后,它们的幻觉越来越少。而摩尔定律和晶体管的普及非常类似。摩尔定律是指每18个月芯片上的晶体管数量会翻倍。如果你所谈论的事物每年提升10倍,那就是一个更加陡峭的指数增长。这表明未来将和今天非常不同。我试着提醒自己,你今天使用的AI模型将是你今后生活中会用到的“最差”的AI模型。当你真正理解这一点时,会感到非常惊讶。
I was going to actually say the same thing. That's the thing that always sticks with me when I watch this thing. You're talking about Sora. And I imagine many people hearing that are like, no, no, it's not actually ready. It's not good enough. It's not going to be as good as a movie I see in the theater. But the point is what you just made. This is the worst it's going to be. It will only get better. Yeah, model maximalism. Just keep building for the capabilities that are almost there.
我本来想说同样的话。每次看这个的时候,这一点总是让我印象深刻。你在谈论Sora。我想很多人听到这里可能会想,不,不,它还没准备好,不够好,达不到电影院电影的水平。但关键是你刚刚提到的。这已经是它最差的情况了,之后只会变得更好。对,就像模型最大化一样,只要继续构建,即使它的能力还差一点。
And the model's going to catch up and be amazing. Escape to where the puck's going to be. This reminds me I was just using as jibbly flying everything the other day. And I was just like, what has taken so long? I'm just like, what was that? I said, as one does. As one does these days, I was just like, it has taken a minute to generate this image of my family in this amazing way. Like, come on, let's say it's so long. You just get so used to magic happening in front of you. Yeah, totally.
模型将会不断进步,变得非常出色。我们应该预见未来。这让我想起前几天,我用了一种新技术,让一切都变得特别神奇。当时我就在想,为什么花了这么长时间?心里不禁感叹这技术真是太棒了!就像现在常常感受的那样,我觉得生成的这张家人合影真是太惊艳了。为什么会这么慢呢?我们早已习惯了眼前发生的奇迹,真是不可思议。
OK, file question. This is going to go in a completely different direction. A lot of people asked about this. So famously, you led this project at Facebook called Libra, which is now called Novi. A lot of people are always wondered what happened there. That was a really cool idea. And now some people have a sense. There's regulation, challenges, things like that. I don't know if you've talked about this much. So I guess just, can you just give people a brief summary of just like, what is Libra? They this project, they work out, and just what happened. And how you feel about it?
好的,提出一个问题。这将走向一个完全不同的方向。很多人对此感到好奇。大家都知道,你曾在Facebook领导一个名为Libra的项目,现在被称为Novi。许多人都想知道这个项目发生了什么。这是一个非常棒的创意。现在,有些人对其发展有了一些了解,比如遇到了监管挑战等等。我不确定你是否对此多有谈论。所以,我想你可以简单介绍一下Libra这个项目是什么,他们做了什么,结果如何,以及你对此有何感想?
Yeah, I mean, David Marcus led it. And I happily worked for him and with him. I think he's a visionary and also a mentor and a friend. Honestly, Libra is probably the biggest disappointment of my career. When I think about the problems we were solving, which are very real problems, if you look at, for example, the remittance space, people sending money to family members in other countries, it is maybe, I mean, it's incredibly regressive, by people that don't have the money to spend or having to pay 20% to send money home to their family.
是的,我的意思是,大卫·马库斯主导了这个项目。我非常乐意为他工作,与他合作。我认为他是一个有远见的人,同时也是我的良师益友。说实话,Libra 可能是我职业生涯中最大的失望。我们当时解决的问题是非常真实的问题,比如在汇款领域,人们将钱汇给其他国家的家人时,必须支付将近20%的手续费,这对那些本身就没有多余钱的人来说是非常不公平的。
So got radios fees. It takes multiple days. You have to go then pick up cash from, yeah, it's just, it's all bad. And here we are with like, three billion people using WhatsApp all over the world, talking to each other every day, especially friends and family, exactly the kind of people who've sent money to each other. Why can't you send money as immediately, as cheaply, as simply as you send a text message? I, it's just, it's one of those things when you, when you sit back and think about it, that should just exist.
所以,要付广播费用,需要花费好几天时间。你得去取现金,反正就是很麻烦。现在全世界有三十亿人在用WhatsApp,每天都在和朋友家人联系,特别是那些经常汇款给彼此的人。为什么汇款不能像发短信一样简单、快速、便宜呢?这是那种当你静下心来思考时,就会觉得理应存在的事情。
And that was what we set out to try and do. Yeah, I don't think we played all of our like cards perfectly. If I could go back and do things, there are a bunch of things I would do differently. We, we tried to kind of get it all at once. We tried to launch a new blockchain. It was a basket of currencies originally. It was integration into WhatsApp and Messenger. And I think the whole world kind of went like, oh my god, that's a lot of change at once.
这就是我们当初努力想要做的事情。我认为我们并没有把所有的牌都打得很完美。如果可以回到过去,我会选择很多不同的做法。我们试图一次性完成所有事情,我们尝试推出一条新的区块链,最初它是一篮子货币,并且计划与WhatsApp和Messenger整合。我觉得整个世界当时的反应是,哇,这一下子变化太大了。
And you know, it happened also to be at the time that Facebook was at the absolute like, native of its reputation. And so that didn't help, right? It was, it was also not the messenger that people wanted for this kind of change. We knew all that going in, but we went for it. I think if we, I think there are a bunch of ways that we could do that that would have introduced the change a little bit more gently, you know, maybe still gotten to that same outcome.
你知道,那时候恰好是 Facebook 声誉最差的时候。所以这对我们的帮助不大,对吧?实际上,人们也不希望 Facebook 成为那种变革的推动者。我们在开始时就知道这一切,但我们还是决定尝试。我认为有很多方法可以让这次变革更加温和地进行,也许最终仍能达到相同的结果。
But few are new things at once and introduced the new things one at a time. It, who knows, you know, those were decisions we made together. So we all own them, certainly I own them. But it just fundamentally disappoints me that this doesn't exist in the world today because the world would be a better place if we'd been able to ship that product. I would be able to send you, you know, 50 cents in WhatsApp for free. It would settle instantly.
这段文字的意思是:“但很少有事物是一下子全部推出新东西的,通常人们都是一次引入一样新东西。谁知道呢,这些都是我们一起做的决定,所以我们大家都有责任,当然我也有。但是让我感到非常失望的是,这个产品如今没有在世界上出现,因为如果我们能够成功推出这个产品,世界将会变得更好。例如,我能够在WhatsApp上免费给你发送50美分,并且能够瞬间结算。”
Everybody would have a balance in their WhatsApp account. We'd be trans, I mean, it was just, it should exist. I don't know, to be honest, like, I mean, the current administration is super friendly to crypto, Facebook's reputation, Meta's reputation is in a very different place. Maybe they should go build it now. I was looking at the history of it and apparently they sold the tech to some private equity company for 200 million bucks. Yeah, yeah. So, and kind of buy back. Very couple of current blockchains that are built on the tech because the tech was open source to many, aptos and mist and are two companies that are built off of this tech. So, you know, at least all of the work that we did did not die and lives on in these two companies and they're both doing really well, but still, you know, we should be able to send each other money in WhatsApp and we can't today.
每个人的WhatsApp账户里都应该有一个余额。我们的设想是这样,但实际上,这样的功能应该存在。老实说,我不太确定,因为目前的政府对加密货币非常友好,而Facebook(也就是现在的Meta)的声誉与以前大不一样,也许他们现在应该着手开发这个功能。我查看了相关历史,发现他们把技术卖给了一家私募股权公司,售价是2亿美元。对对,然后有机会再买回来。现在有几条区块链是基于这些技术开发的,因为这些技术是开源的,Aptos和Mist是其中两个利用这些技术的公司。所以,至少我们做过的所有工作没有白费,并在这两家公司得以延续,而它们运作得都很好。不过,即便如此,我们今天仍然不能通过WhatsApp互相转账,这点很遗憾。
Here, here. Well, thanks for sharing that story. Kevin, is there anything else you wanna share or maybe a last negative advice you're inside before we get to a very exciting lightning round? Ooh, the lightning round. Let's just go do that. Let's do it. With that, Kevin, where you start very exciting lightning round, are you ready? Yeah. Let's do it. Okay, what are two or three books that you find yourself recommending most other people? Co-intelligence by Ethan Mollick, a really good book about AI and how to use it in your daily life as a student as a teacher. He's super thoughtful. Also, by the way, a very good follow on Twitter. The accidental superpower by Peter Zion. Very good if you're interested in geopolitics and the forces that sort of shape the dynamics happening. And then I really enjoyed cable cowboy, I don't know who the author is, but the biography of John Malone, just fascinating if you like business, especially if you want to get into like, I mean, the man was an incredible deal maker and shaped a lot of the modern cable industry.
好的,好的。感谢你分享这个故事。凯文,在我们进入一个非常激动人心的快问快答环节之前,你还有其他想分享的内容或者最后的建议吗?哦,快问快答环节。我们开始吧。凯文,你准备好迎接这个激动人心的快问快答了吗?准备好了。那我们开始吧。
那么,有哪两三本书是你经常推荐给别人的呢?Ethan Mollick的《协同智能》,这是一本关于人工智能以及如何在日常生活中运用它的好书,很适合学生和老师。他的思想非常深刻,而且他在推特上也是一个值得关注的人。《意外的超级大国》由Peter Zion撰写,如果你对地缘政治和影响动态的力量感兴趣,这本书非常不错。另外,我非常喜欢《有线牛仔》,作者我不太清楚,但这是一本关于John Malone的传记。如果你对商业感兴趣,尤其是想了解有关现代有线电视行业的发展,这本书会让你觉得非常有趣。
So that was a good biography. These are all first time mentions, which is always great. Oh, great. Next question. Do you ever favorite recent movie or TV show that you really enjoyed? I wish I had time to watch a TV show. So I'm just so re videos. Yeah, right. I don't know. When I was a kid, I read the Wheel of Time series. And now Amazon has it as their like the third season of it. So I want to watch that. I haven't yet. Top Gun 2 was an awesome movie. I think that's no longer new, but you know, that shows my last time you watched the movie was. But I like the idea. Like I want more like Americana. I want more like being proud of being strong. And I thought Top Gun 2 did a really good job of that. Like, you know, pride and patriotism, I think the US could use more of that.
这是一本很不错的传记。这些内容都是首次提及,真的很棒。哦,太好了。下一个问题:最近有没有最喜欢的电影或电视节目,让你感到很享受呢?我希望我有时间去看电视节目,现在只能看看视频片段。是啊,我小时候读过《时光之轮》系列,现在亚马逊已经出了第三季,我想去看,但还没有看过。《壮志凌云2》是一部很棒的电影。我想这部电影已经不算新片了,但那说明了我上次看电影的时间。不过,我喜欢这种感觉,我想要更多的美式风格,想要更多以强大为傲的内容。我觉得《壮志凌云2》在这方面做得很好。比如说,自豪感和爱国精神,我觉得美国在这方面可以多一些。
Is there a favorite product that you've recently discovered that you really love other than your super intelligence internal tool that you all have access to them? I'm just joking. Right. Internally, DR. Well, I think like vibe coding with products like Windsor is just super. I'm having a great time doing that. I still just love that our chief people officer vibe coded some tools. Maybe the other one is Waymo. Every chance I get I'll take a Waymo. It's just a better way of writing and it still feels like the future. So I lived on an amazing job. It's awesome. By the way, I had the founder of Windsor on the podcast that might come out before this or after this. And also Cursor's CEO is coming on the podcast either before after this. Oh, cool. I have a ton of respect for what those guys are doing. Those are awesome products. Just changing the way everyone builds product. No big deal.
你有没有最近发现特别喜欢的产品,除了你们都可以使用的超智能内部工具?我只是开玩笑的。对了,说到内部工具,我觉得像Windsor这样的产品真是太棒了,用它来编写代码的感觉超赞。我特别喜欢我们的首席人事官也参与开发了一些工具。另一个我喜欢的产品是Waymo,每次有机会我都会选择Waymo。它提供了一种更好的出行方式,仍然让人感到未来感。我在这里的生活很美好,真的很棒。顺便说一下,我邀请了Windsor的创始人上播客,这集可能会在这之后上线,或者之前。而Cursor的CEO也会来上播客,可能在这之前或之后。我对他们在做的事情非常尊敬,这些产品真是很棒,改变了大家开发产品的方式。这实在是件大事。
A couple more questions. Do you have a favorite life motto that you often repeat yourself and really useful in work and life? Yeah. So actually this is interestingly enough. It's more of a philosophy, but then I thought Zuck encapsulated it one time on a Facebook earnings call. So I actually had this made into a poster. It sits in my room. But somebody was asking Mark, this is literally on an earnings call. So it's like an analyst on an earnings call asking him, you know, it was some quarter where Facebook had grown a lot. This was back in the 20 teens sometime, I think. But it's like, you know, so what did you do? What, you know, what was it that you launched? That was the one thing that drove all this growth for you. And he said something to the effect of, you know, sometimes it's not any one thing. It's just good work consistently over a long period of time.
当然可以翻译成中文。以下是这个段落的中文翻译:
有几个问题。你有没有一直重复的、对工作和生活非常有用的座右铭?有的。实际上,这更像是一种哲学,但我记得有一次扎克在一次 Facebook 的财报电话会上总结得非常好。所以我把这句话做成了一张海报,挂在我的房间里。当时有人在财报电话会上问马克(扎克伯格),就像是一个分析师问他:在那一个季度里,Facebook增长很快。这大概是在2010年代的某个时候。分析师问:“你们做了什么?推出了什么东西,让你们取得了如此大的增长?”他大意是说,有时候并不是某件特定的事情造成的,而是长期以来持续的良好工作带来了成功。
And that's always stuck with me. And I think it is, I mean, you know, I run all the marathons. It's like, it's just about grinding. I think people too often look for like the silver bullet when a lot of life is, and a lot of like excellence is actually showing up day in and day out, doing good work, getting a little bit better every single day. And you know, you may not notice it over a week or even a month. And a lot of people then, you know, kind of get like dismayed and stopped. But actually you keep doing it. The gains keep compounding. And over the course of a year, two years, five years, it adds up like crazy. So good work consistently over a long period of time. Damn, I love that. I got to make a poster of this now. That is, I resonate with that. Okay, that is so good.
这一直让我印象深刻。我认为,这就像是,我就是说,我参加了所有的马拉松比赛。这其实就是坚持不懈。我觉得人们常常在寻找灵丹妙药,但很多时候,生活和卓越其实是需要每天坚持不懈地投入,日复一日地做好工作,每天都变得更好一点。你可能在一周,甚至一个月内看不到变化,很多人因此灰心而放弃。但实际上,只要你继续坚持,进步就会不断积累。一年,两年,五年下来,那种累积的效果会让人惊讶。所以,在长时间内持续做好工作。天啊,我真喜欢这种说法,我得把它做成海报。对此我深有共鸣,太棒了。
Okay, final question. I'm going to ask if you have any prompting tricks and I'm going to set it up first. But think about it. If you have a trick that you could recommend to people for prompting LMS better. There's this, I had a guest Alex Kamorowski come on the podcast. I'm striped and writes as weekly reflections on what's happening in the world. A lot of them are AI related. And he wants described in LMS a zip file of all human knowledge. And all the answers are in there. And you just need to figure out the right question to ask to get the answer to every problem basically. And so just reminded me how important prompt engineering is and knowing how to prompt well. You're constantly prompting chat GPT. What's one tip? One trick they found to be helpful in helping you get what you want.
好的,最后一个问题。我想问问你有没有什么提示技巧,并且我会先做一些铺垫。请仔细考虑一下。如果你有一个可以推荐给大家的技巧,能让他们在使用语言模型时获得更好的结果。之前我的播客邀请了一位嘉宾,Alex Kamorowski。他是一名程序员,并每周撰写关于世界动态的反思,其中很多与人工智能相关。他形容语言模型为一个包含所有人类知识的压缩文件,所有问题的答案都在其中,而你需要做的就是找到正确的问题,才能获得所有问题的答案。这让我想起了提示工程的重要性,以及如何有效地进行提示。你不断地在提示ChatGPT。那么有什么建议或技巧,能帮助你更好地得到你想要的结果?
Well, I'll say first of all, I want to kill the idea that you have to be a good prompt engineer. I think if we do our jobs, that stops being true. It's just one of those sharp edges of models that experts can learn. But then you just over time you shouldn't need to know all that. The same way you used to have to get deep into what's your storage engine. And my SQL are using NODB 4.1. And there's still use cases for that if you're at the deep edge of my SQL performance but most people don't need to care. And you shouldn't need to care about my details of prompting if AI is really going to become broadly adopted. But today we're not totally there.
首先,我想消除这样一种观点:你必须是一个优秀的提示工程师。我认为,如果我们做好自己的工作,这一点就不再成立。这只不过是模型的一些棱角,专家们可以学习掌握。但随着时间的推移,你不应该再需要了解那些细节。这就像过去你需要深入了解自己的存储引擎,比如使用 MySQL 的 NODB 4.1。如果你是在 MySQL 性能的极限情况下,这些知识还有用,但大多数人不需要关心。同样地,如果人工智能真的要被广泛采用,你也不应该需要关心提示的细节。但目前我们还未完全达到那个阶段。
I think by the way, we are making progress there. I think there is less prompt engineering than there had to be before. But in line with some of the fine tuning stuff I was talking about and the importance of giving examples, you can do effectively poor man's fine tuning by including examples in your prompt of the kinds of things that you might want and a good answer. So here's an example and here's a good answer. Here's an example and here's a good answer. Now go solve this problem for me. And the model really will listen and learn from that. Not as well as if you do a full fine tune, but much more than if you don't provide any examples. And I think people don't do that often enough.
我觉得,顺便说一句,我们在进步。我认为现在需要的提示工程比以前少了。但根据我之前提到的一些微调内容和举例子的重要性,你可以通过在提示中加入你想要的例子和一个好的答案,来实现一种简单的“微调”。比如,给出一个例子和一个好的答案,然后再给出一个例子和一个好的答案。之后,让模型去解决这个问题。模型确实会通过这样的方式进行学习,不如完全的微调效果好,但比不提供任何例子要好得多。我认为人们往往没有这么做。
That's awesome. One tip that I heard, I'm curious if this works, is you tell it. This is very, very important to my career. Make it really understand. Someone will die if you don't answer the correctly. Does that work? It's really weird. There's probably a good explanation for this. But you can also say things. So yes, I think there is some validity to that. You can also say things like, I want you to be Einstein. Now answer this physics problem for me. Or you are the world's greatest marketer. The world's greatest brand marketer. Now here's a naming question. And there is something where it sort of shifts the model into a certain mindset that can actually be really positive. I use that tip all the time, actually. When I'm coming up with questions for interviews, and I use it occasionally to come up with things I haven't thought of. I actually type, you're the world's best podcast interviewer.
这真棒。我听说过一个小技巧,我很好奇它是否有效,那就是你告诉它:这对我的职业生涯非常非常重要。让它真正明白,如果你答错了,可能会有人因此丧命。这样做有效吗?这听起来很奇怪,可能有某种合理的解释。你还可以这样说:我想让你成为爱因斯坦,然后告诉我这个物理问题的答案。或者说你是世界上最伟大的营销专家,最出色的品牌营销师。现在这里有个命名问题。当你这样做时,它似乎会让模型进入某种特定的思维模式,这实际上可能非常有效。我经常使用这个技巧,当我为访谈策划问题时,我常会用它来想出一些我没想到的东西。我甚至会输入“你是世界上最优秀的播客采访者”。
Right. I have Kevin Wheel coming on the pot. Yeah, and actually works. Yeah, by the way, back to our other point that we made a few times, you do do that sometimes with people, right? You sort of put them, you frame things. You get them into a certain mindset and they have probably different. So I think there are human analogs of this one more time. Kevin, this was incredible. I was thinking about a way to end this. The way I feel like not only are you at the cutting edge of the future, like you and the team are actually the edge that is creating the future. And so it's a real honor to have you on here and to talk to you and to hear how you think things are, where do you think things are going? And what we need to be thinking about.
好的。Kevin Wheel 会来参加播客。是的,这是真实的。顺便说一下,回到我们之前多次提到的观点,你有时候的确会和人们这么做,对吧?你会以某种方式引导他们,把事情框定在一个特定的思维模式中,他们可能会改变想法。所以我觉得这其中有人类行为的类比。这次访谈太精彩了。Kevin,让我考虑如何结束这次谈话。我觉得你不仅是在未来的前沿,你和你的团队实际上就是创造未来的前沿。因此,非常荣幸能请到你来这里,与我们分享你的想法,以及你认为事态的发展方向以及我们需要关注的方面。
So thank you for being here, Kevin. Oh, thank you so much for having me. I feel real, I get to work with the world's best team and all credit to them, but really appreciate you having me on. It's been super fun. And forgot to ask you the two final questions. Working for Expanding Online if they want to reach out.
非常感谢你在这里,凯文。哦,非常感谢你邀请我。我感到非常荣幸,因为我能和世界上最优秀的团队一起工作,功劳都属于他们,不过我真的很感激你邀请我参加。这次交流非常有趣。我忘了问你最后两个问题。关于为Expanding Online工作,如果他们想联系你,该怎么做呢?
And how can Lister is useful to you? I am at Kevin Wheel, K-E-V-I-N-W-E-E-I-L, on pretty much every platform. I'm still a Twitter DAU after all these years. I guess an X-D-A-U. I've linked in wherever. And I think the thing I would love from people, give me feedback. People are using ChatGPT. Tell us where it's working really well for you and where you want us to double down. Tell me where it's failing.
“Lister 如何对你有用呢?你可以在几乎所有平台上找到我,我的用户名是 Kevin Wheel,拼作 K-E-V-I-N-W-E-E-I-L。在这么多年后,我依然是 Twitter 的每日活跃用户,也可以说是 X 平台的每日用户。我在 LinkedIn 上也有账号。我希望从大家那里得到反馈。很多人在使用 ChatGPT,告诉我们在哪些地方效果很好,哪些地方你希望我们更加投入。也请告诉我哪里出现了问题。”
I'm very active and engaged on Twitter. I love hearing from people what's working and what's not. So don't be shy. And I learned following you helps you figure out all the stuff that you're launching. Like you share all the things that are going out of you day or week. So that's also benefit. And by the way, 400 million weekly active users, all you know you feed back.
我在推特上非常活跃并且积极参与。我很喜欢听人们分享哪些事情有效,哪些无效。所以不要害羞。我发现,关注你们可以帮助我了解你们正在推出的各种事情。你们会分享每天或每周的动态,这对我也很有帮助。顺便提一下,推特有4亿活跃用户,你们的反馈对我很重要。
Here we go. Yes, let's do it. We're going to work that great. Well, thank you, Kevin. Thanks for being here. All right, man. Thanks so much. See you soon. Bye, everyone.
我们开始吧。好的,让我们行动起来。我们会做得很棒。谢谢你,Kevin。感谢你来到这里。好的,伙计,非常感谢你。很快再见。再见,大家。
Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or a leaving review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at Lenny'spodcast.com. See you in the next episode.
非常感谢您的收听。如果您觉得有价值,可以在 Apple 播客、Spotify 或您喜欢的播客应用程序上订阅我们的节目。另外,请考虑给我们评分或留下评论,这对其他听众发现这个播客有很大帮助。您可以在 Lenny'spodcast.com 找到所有往期剧集或了解更多关于节目的信息。下期再见!