From the podcast: Lenny's Podcast

The Godmother of AI on jobs, robots & why world models are next | Dr. Fei-Fei Li

Published: 2025-11-16 14:00:20
A lot of people call you the godmother of AI. The work you did actually was the spark that brought us out of AI winter. In the middle of 2015, middle of 2016, some tech companies avoid using the word AI because they were not sure if AI was a dirty word. 2017-ish was the beginning of companies calling themselves AI companies. There's a line I think this was when you're presenting to Congress. There's nothing artificial about AI. It's inspired by people. It's created by people and most importantly it impacts people. It's not like I think AI will have no impact on jobs or people. In fact, I believe that whatever AI does currently or in the future is up to us. It's up to the people. I do believe technology is a net positive for humanity. But I think every technology is a double-edged sword. If we're not doing the right thing as a society, as individuals, we can screw this up as well.
很多人称你为AI教母。你所做的工作实际上点燃了AI复苏的火花。在2015年到2016年中期,一些科技公司避免使用“人工智能”这个词,因为他们不确定这个词是否有负面含义。大约在2017年,公司开始自称AI公司。有一句话,我想是在你向国会展示时提到的:人工智能并非真的"人工",它是受人启发,由人创造的,最重要的是,它影响着人类。我并不认为人工智能不会对工作或人类产生影响。事实上,我相信人工智能目前或将来的作为取决于我们,取决于人类。我相信,科技对人类是总体积极的,但我也认为每项技术都有两面性。如果我们作为一个社会或作为个体没有做对的事情,我们也可能把事情搞砸。

You had this breakthrough insight of just, okay, we can train machines to think like humans, but it's just missing the data that humans have to learn as a child. I chose to look at artificial intelligence through the lens of visual intelligence because humans are deeply visual animals. We need to train machines with as much information as possible on images of objects. But objects are very, very difficult to learn. A single object can have infinite possibilities in how it is shown on an image. In order to train computers with tens of thousands of object concepts, you really need to show it millions of examples.
你有一个发现:虽然我们可以训练机器像人类一样思考,但它们缺乏人类从小学习的丰富数据。我选择通过视觉智能的视角来研究人工智能,因为人类是非常依赖视觉的动物。我们需要通过丰富的物体图像信息来训练机器。然而,学习物体是非常困难的,因为一个物体在图像上可以有无数种呈现方式。为了让计算机学会成千上万的物体概念,你确实需要给它展示数百万个例子。

Today, my guest is Dr. Fei-Fei Li, who's known as the godmother of AI. Fei-Fei has been responsible for and at the center of many of the biggest breakthroughs that sparked the AI revolution that we're currently living through. She spearheaded the creation of ImageNet, which was basically her realizing that AI needed a ton of clean, labeled data to get smarter. And that dataset became the breakthrough that led to the current approach to building and scaling AI models.
今天,我的嘉宾是被誉为“人工智能教母”的李飞飞博士。李飞飞在许多引发我们正在经历的人工智能革命的重大突破中起到了核心作用。她主导创建了ImageNet,这是因为她意识到人工智能需要大量干净、标记明确的数据来变得更智能。这些数据成为了深层次的突破,促成了当前构建和扩展人工智能模型的方法。

She was chief AI scientist at Google Cloud, which is where some of the biggest early technology breakthroughs emerged from. She was director at SAIL, Stanford's artificial intelligence lab, where many of the biggest AI minds came out of. She's also a co-creator of Stanford's Human-Centered AI Institute, which is playing a vital role in the direction that AI is taking. She's also been on the board of Twitter. She was named one of Time's 100 Most Influential People in AI. She's also on a United Nations advisory board. I could go on.
她曾是谷歌云的首席人工智能科学家,在那里出现了一些早期的重要技术突破。她曾是斯坦福大学人工智能实验室(SAIL)的主任,那里培养了许多顶尖的人工智能人才。她也是斯坦福大学以人为中心的人工智能研究所的策展人,该研究所在人工智能发展方向上扮演着重要角色。她还曾在推特的董事会任职,并被《时代》杂志评选为100位最具影响力的人工智能人物之一。最近,她还成为了联合国顾问委员会的一员。我还可以继续列举她的成就。

In our conversation, Fei-Fei shares a brief history of how we got to today in the world of AI, including this mind-blowing reminder that nine to ten years ago, calling yourself an AI company was basically a death knell for your brand, because no one believed that AI was actually going to work. Today, it's completely different. Every company is an AI company. We also chat about her take on how she sees AI impacting humanity in the future, how far current technologies will take us, why she's so passionate about building a world model, and what exactly world models are.
在我们的谈话中,Fei-Fei简要介绍了我们如何走到今天的AI世界,包括一个惊人的提醒:大约九到十年前,称自己为AI公司的话基本上是宣布品牌的死刑。因为当时没有人相信AI真的会奏效。而今天,情况完全不同。几乎每家公司都是AI公司。我们还讨论了她对AI未来如何影响人类的看法。当前的技术能带我们走多远?为什么她对构建“世界模型”如此充满热情,以及什么是“世界模型”?

And most exciting of all, the launch of the world's first large world model, Marble, which just came out as this podcast comes out. Anyone can go play with this at marble.worldlabs.ai. It's insane. Definitely check it out. Fei-Fei is incredible and way too under the radar for the impact that she's had on the world. So I am really excited to have her on and to share her wisdom with more people. A huge thank you to Ben Horowitz and Condoleezza Rice for suggesting topics for this conversation. If you enjoyed this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube.
最令人兴奋的是,全球首个大型世界模型——Marble的发布。这款模型在这期播客上线时刚刚推出。任何人都可以在marble.worldlabs.ai上体验这个模型。这真是太疯狂了,绝对值得一试。Fei-Fei非常了不起,她对世界的影响力一直被低估了。我很高兴能邀请她来分享智慧,并与更多的人分享。特别感谢Ben Horowitz和Condoleezza Rice为这次对话提供的话题建议。如果您喜欢这期播客,别忘了在您常用的播客应用或YouTube上订阅和关注。

With that, I bring you Dr. Fei-Fei Li after a short word from our sponsors. This episode is brought to you by Figma, makers of Figma Make. When I was a PM at Airbnb, I still remember when Figma came out and how much it improved how we operated as a team. Suddenly, I could involve my whole team in the design process, get feedback on design concepts really quickly, and it just made the whole product development process so much more fun.
在此之后,我们将为您呈现李飞飞博士。不过在此之前,请允许我先介绍一下我们的赞助商。这一集由Figma赞助,Figma Make的创造者。还记得当我在Airbnb担任产品经理时,Figma刚推出时,它大大提升了我们团队的运作效率。突然间,我可以让整个团队参与到设计过程中来,能快速地对设计概念进行反馈,这让整个产品开发过程变得更加有趣。

But Figma never felt like it was for me. It was great for giving feedback on designs, but as a builder, I wanted to make stuff. That's why Figma built Figma Make. With just a few prompts, you can make any idea or design into a fully functional prototype or app that anyone can iterate on and validate with customers. Figma Make is a different kind of vibe coding tool. Because it's all in Figma, you can use your team's existing design building blocks, making it easy to create outputs that look good and feel real and are connected to how your team builds.
但我一直觉得 Figma 并不适合我。虽然它在反馈和设计方面很出色,但作为一个创造者,我希望动手制作。因此,Figma 开发了 Figma Make。通过几个简单的提示,你就可以把任何想法或设计转换为一个全功能的原型或应用程序,供任何人进行迭代和与客户验证。Figma Make 是一种不同风格的代码编写工具。因为它完全集成在 Figma 中,你可以使用团队现有的设计构件,非常容易地创造出美观且真实的结果,并且与团队的构建方式紧密相连。

Stop spending so much time telling people about your product vision and instead show it to them. Make code-backed prototypes and apps fast with Figma Make. Check it out at figma.com/money. Did you know that I have a whole team that helps me with my podcast and with my newsletter? I want everyone on my team to be super happy and thrive in their roles. JustWorks knows that your employees are more than just your employees. They're your people. My team is spread out across Colorado, Australia, Nepal, West Africa, and San Francisco. My life would be so incredibly complicated if I had to hire people internationally, pay people on time and in their local currencies, and answer their HR questions 24/7. But with JustWorks, it's super easy. Whether you're setting up your own automated payroll, offering premium benefits, or hiring internationally, JustWorks offers simple software and 24/7 human support from small business experts for you and your people. They do your human resources right so that you can do right by your people.
停止花费大量时间向别人解释你的产品愿景,不如直接展示给他们看。使用 Figma Make 快速制作代码支持的原型和应用。可以在 figma.com/money 查看。在我的播客和新闻通讯中,有一个团队在帮助我。我希望团队中的每个人都能在他们的岗位上快乐成长。JustWorks 明白,你的员工不仅仅是员工,他们是你的人。我的团队分布在科罗拉多州、澳大利亚、尼泊尔、西非和旧金山。要在国际上招聘,按时用当地货币支付工资,并全天候回答人力资源问题,这对我来说将非常复杂。但有了 JustWorks,一切变得非常简单。无论是设置自动化工资单、提供优质福利,还是进行国际招聘,JustWorks 为您及您的员工提供简单的软件和全天候的小企业专家支持。他们为您做好人力资源的工作,以便您可以更好地对待您的员工。

JustWorks, for your people. Fei-Fei, thank you so much for being here and welcome to the podcast. I'm excited to be here, Lenny. I'm even more excited to have you here. It is such a treat to get to chat with you. There's so much that I want to talk about. You've been at the center of this AI explosion that we're seeing right now for so long. We're going to talk about a bunch of the history that I think a lot of people don't even know about, how this whole thing started. But let me first read a quote from Wired about you, just so people get a sense. In the intro, I'll share all of the other epic things you've done. But I think this is a good way to just set context. Fei-Fei is one of a tiny group of scientists, a group perhaps small enough to fit around a kitchen table, who are responsible for AI's recent remarkable advances. A lot of people call you the godmother of AI. Unlike a lot of AI leaders, you're an AI optimist. You don't think AI is going to replace us. You don't think it's going to take all our jobs. You don't think it's going to kill us. So I thought it'd be fun to start there. Just what's your perspective on how AI is going to impact humanity over time?
为你们的人提供JustWorks服务。Faye,非常感谢你来到这里,欢迎参加我们的播客节目。我很高兴能来到这里,Lenny,我更高兴有你在。这是一次难得的机会能和你交谈。我有太多想要交流的内容。你一直处于当前我们所看到的人工智能爆炸的核心位置已久。我们将谈论许多历史,这些事情很多人可能都不知道是如何开始的。但首先,让我读一下《连线》杂志关于你的一段话,以便让大家了解一下。在引言中,我会分享你所做过的所有其他史诗般的事情。但我认为这个是个很好的背景信息。Faye属于一个极小的科学家群体,可能小到可以围坐在一张厨房桌子旁,他们对AI最近的显著进步负有责任。很多人称你为AI教母。与许多AI领袖不同,你是一个人工智能乐观主义者。你不认为AI会取代我们,也不认为它会夺走我们所有的工作,更不认为它会摧毁我们。所以我觉得从这里开始会很有趣。你如何看待AI将如何随着时间的推移影响人类?

Yeah. Okay. So Lenny, let me be very clear. I'm not a utopian. So it's not like I think AI will have no impact on jobs or people. In fact, I'm a humanist. I believe that whatever AI does, currently or in the future, is up to us. It's up to the people. So I do believe technology is a net positive for humanity. If you look at the long course of civilization, I think fundamentally we're an innovative species. If you look from, you know, the written record thousands of years ago to now, humans just kept innovating ourselves and innovating our tools. And with that, we make lives better. We make work better. We build civilization. And I do believe AI is part of that. So that's where the optimism comes from. But I think every technology is a double-edged sword.
好的,Lenny,我想说清楚一点。我不是一个空想主义者,所以我并不认为人工智能不会对工作或人类产生任何影响。实际上,我是一个人文主义者。我相信无论是目前还是将来人工智能做什么,都取决于我们人类。技术的发展对人类总体来说是积极的。回顾漫长的文明历程,我认为我们从本质上是一个有创新精神的物种。从几千年前的文字记载到现在,人类一直在创新自己和我们的工具。通过这些创新,我们改善了生活,提高了工作质量,并建设了文明。我相信人工智能也是其中的一部分,这也是我对未来感到乐观的原因。但我也认为每一项技术都有两面性。

And if we're not doing the right thing, as a species, as a society, as communities, as individuals, we can screw this up as well. There's this line. I think this was when you were presenting to Congress. There's nothing artificial about AI. It's inspired by people. It's created by people and, most importantly, it impacts people. I don't have a question there, but what a great line. Yeah, I feel that pretty deeply. You know, I started working in AI two and a half decades ago and I've been having students for the past two decades. And almost every student who graduates from my lab, I remind them that your field is called artificial intelligence, but there's nothing artificial about it.
如果我们在物种、社会、社区或个人层面上没有做正确的事情,我们也可能把事情搞砸。这有一句话。我想这是你在向国会汇报时说的。人工智能并不“人工”,它是由人类启发的,由人类创造的,最重要的是,它影响着人类。我没有问题要问,但这是句很了不起的话。是的,我深有感触。你知道,我在两年半前开始研究人工智能,过去的二十年里,我一直在带学生。几乎每一个毕业的学生,我都会提醒他们,当他们从我的实验室毕业时,你们的领域叫做“人工智能”,但实际上它一点也不“人工”。

Coming back to the point you just made about how it's kind of up to us where this all goes, what is it you think we need to get right? How do we set things on a path? I know this is a very difficult question to answer, but what's your advice? What do you think we should keep in mind? Yeah, like how many hours do we have? How do we align AI? There we go, let's solve it. Yeah, so I think people should be responsible individuals, no matter what we do. This is what we teach our children and this is what we need to do as grown-ups as well, no matter which part of AI development or AI deployment or AI application you are participating in. And most likely many of us, especially as technologists, are in multiple points. We should act like responsible individuals and care about us, actually care a lot about us.
回到你刚才提到的那个点,说这个事情的发展方向很大程度上取决于我们,那么你认为我们需要把什么事情做好?我们该如何设定一个好的发展路径?我知道这个问题很难回答,但你有什么建议?我们应该注意些什么?是啊,这个话题可以讲上几个小时呢,怎么才能让AI的发展更符合我们想要的方向呢?好的,来解决这个问题。我认为无论我们做什么,人们都应该做一个负责任的人。这是我们教给孩子的道理,也是我们作为成年人应该做到的。不论你参与的是哪个阶段的AI开发、部署或应用,我们中的很多人,尤其是技术人员,可能会参与多个阶段,我们应该像负责任的人那样行事,关心我们自己,实际上要非常关心我们自己。

I think everybody today should care about AI because it is going to impact your individual life, it is going to impact your community, it's going to impact the society and the future generation, and caring about it as a responsible person is the first but also the most important step. Okay, so let me actually take a step back and kind of go to the beginning of AI. Most people started hearing and caring about AI, as it's called today, just, I don't know, a few years ago when ChatGPT came out. Maybe it was like three years ago. Three years ago, almost. One more month, three years ago. Wow, okay. ChatGPT coming out, is that the milestone? Yeah, yeah. Okay, cool, that's exactly how I saw it. But very few people know there was a long, long history of people working on it. It was called machine learning back then, and there were other terms, and now everything's just AI.
我认为当今每个人都应该关注人工智能,因为它将影响你的个人生活、社区、社会以及未来几代人的生活。作为一个有责任心的人,关注人工智能是第一步,也是最重要的一步。好吧,让我退一步,说说人工智能的起源。大多数人开始听说并关注现在所谓的人工智能,可能是在几年前,比如当ChatGPT问世的时候,大概是三年前。三年前,差不多还有一个月就到三年了。哇,那时候ChatGPT出现,这是不是个重要的里程碑?对,是的,完全是这样。然而,很少有人知道,早在称为“人工智能”之前,有人长期在这方面进行研究,当时更多被称作“机器学习”,还有其他术语,现在这些都被统称为人工智能。

And there was kind of a long period of just a lot of people working on it, and then there's what people refer to as the AI winter, where people almost gave up. Most people did, and just said, okay, this idea isn't going anywhere. And then the work you did was essentially the spark that brought us out of the AI winter and is directly responsible for the world we're in now, where AI is all we talk about. As you just said, it's going to impact everything we do. So it would be really interesting to hear from you a brief history of what the world was like before ImageNet, then the work you did to create ImageNet and why that was so important, and then what happened after.
有一段时间,许多人在这个领域努力工作,然后迎来了人们习惯称之为“人工智能寒冬”的时期。在那个时期,大多数人几乎都放弃了,认为这个想法不会有任何进展。然而,你所做的工作实际上是点燃了走出人工智能寒冬的火花,并且直接促成了我们现在这个经常谈论人工智能的世界。正如你所说的,人工智能将影响我们所做的一切。所以,我们很想听听,你能不能简要介绍一下ImageNet出现之前的世界是什么样的,你创建ImageNet的工作为什么如此重要,以及在那之后发生了什么。

It is hard for me to keep in mind that AI is so new for everybody when I have lived my entire professional life in AI. There's a part of me that finds it so satisfying to see a personal curiosity that I started barely out of teenagehood now become a transformative force of our civilization. It genuinely is a civilization-level technology. So that journey is about 30 years, or 20-plus years, and it's just very satisfying. So where did it all start? Well, I'm not even a first-generation AI researcher. The first generation really dates back to the 50s and 60s, and, you know, Alan Turing was ahead of his time in the 40s by daring humanity with the question: can there be thinking machines?
对我来说,很难记住AI对于大家来说还是一个非常新的事物,因为我的整个职业生涯都与AI息息相关。有一部分的我,对这样一件事情感到极大的满足:我在刚刚成年时开始的个人好奇心,现在已经成为了改变我们文明的力量。总体来说,这是一项文明级别的技术。这段历程大概持续了30年,或20多年,总之二十多年,这让我感到非常满足。那么这一切是从哪里开始的呢?其实,我甚至不是第一代的AI研究者。第一代的AI研究可以追溯到上世纪50年代和60年代,而艾伦·图灵在20世纪40年代就以那个时代超前的思维向人类发问:“我们是否能创造出会思考的机器?”

Right. And of course, he had a specific way of testing this concept of a thinking machine, which is a conversational chatbot, and by his standard we now have a thinking machine. But that was just a more anecdotal inspiration. The field really began in the 50s, when computer scientists came together and looked at how we can use computer programs and algorithms to build programs that can do things that had only been possible through human cognition. And that was the beginning, with the founding fathers at the Dartmouth workshop in 1956. You know, we have Professor John McCarthy, who later came to Stanford, who coined the term artificial intelligence.
好的。当然,他有一种特定的方法来测试这种思维机器的概念,这就是通过对话机器人。他认为按照这个标准,我们已经有了思维机器。但这只是一种轶事性的灵感。这个领域真正起步于20世纪50年代,当时计算机科学家们聚集在一起,研究如何利用计算机程序和算法来创建能够执行原本只有人类认知能力才能完成的任务的程序。这是该领域的起点。1956年的达特茅斯会议被认为是该领域的奠基事件之一。在会上,约翰·麦卡锡教授,之后来到斯坦福工作,他还创造了“人工智能”这个术语。

And between the 50s, 60s, 70s and 80s, it was the early days of AI exploration and we had logic systems, we had expert systems. We also had early exploration of neural network. And then it came to around the late 80s, the 90s and the very beginning of the 21st century. That stretch about 20 years is actually the beginning of machine learning. It's the marriage between computer programming and statistical learning. And that marriage brought a very, very critical concept into AI, which is that purely rule-based program is not going to account for the vast amount of cognitive capabilities that we imagine computers can do. So we have to use machines to learn the patterns.
在20世纪50年代、60年代、70年代和80年代之间,是人工智能探索的早期阶段,那时我们有逻辑系统和专家系统,也开始了对神经网络的早期探索。然后,到20世纪80年代末、90年代和21世纪初的那段时间,大约20年的时间实际上是机器学习的开端。这是计算机编程与统计学习的结合。这种结合为人工智能带来了一个非常重要的概念:单纯依靠基于规则的程序无法涵盖我们想象中计算机可能具备的广泛认知能力。因此,我们必须用机器来学习模式。

Once the machines can learn the patterns, it has the hope to do more things. For example, if you give it three cats, the hope is not just for the machines to recognize these three cats. The hope is the machines can recognize the fourth cat, the fifth cat, the sixth cat, and all the other cats. And that's a learning ability that is fundamental to humans and many animals. And we as a field realize we need machine learning. So that was up till the beginning of the 21st century. I entered the field of AI literally in the year of 2000. That's when my PhD began at Caltech. And so I was one of the first generation machine learning researchers.
一旦机器能够学习到模式,它就有希望去做更多的事情。例如,如果你给它看三只猫,希望不仅是让机器识别这三只猫,而是希望机器能够识别第四只猫、第五只猫、第六只猫及所有其他的猫。这种学习能力是人类和许多动物的基本能力。我们这个领域意识到了需要机器学习。直到21世纪初,我在2000年进入了人工智能领域,那时我在加州理工学院开始攻读博士学位。所以,我是第一代机器学习研究者之一。

And we were already studying this concept of machine learning, especially neural networks. I remember one of my first courses at Caltech was called Neural Networks. But it was very painful. It was still smack in the middle of the so-called AI winter, meaning the public didn't look at this too much. There wasn't that much funding. But there were also a lot of ideas flowing around. And I think two things happened to me that brought my own career so close to the birth of modern AI. One is that I chose to look at artificial intelligence through the lens of visual intelligence, because humans are deeply visual animals. We can talk a little more later, but so much of our intelligence is built upon visual, perceptual, spatial understanding, not just language per se. I think they're complementary.
我们那时已经在研究机器学习的概念,特别是在你的网络中。我记得我在加州理工学院上的第一门课程之一就叫你的网络。但那时候这门课很难,因为正值所谓的“AI寒冬”时期,公众对此并不太关注,也没有太多的资金支持。不过,那时也有很多创意在流传。我认为,推动我职业生涯与现代AI诞生紧密相关的有两件事:其一是我选择通过视觉智能这个视角来研究人工智能。因为人类是高度视觉化的动物,我们的智力很大程度上是基于视觉感知和空间理解,而不仅仅是语言本身。我认为这两者是互为补充的。

So I chose to look at visual intelligence. And in my PhD and my early professor years, my students and I were very committed to a North Star problem, which is solving the problem of object recognition, because it's a building block for the perceptual world. Right? We go around the world interpreting, reasoning about, and interacting with it more or less at the object level. We don't interact with the world at the molecular level. We sometimes do, but rarely. For example, if you want to lift a teapot, you don't say, okay, the teapot is made of a hundred pieces of porcelain, and let me work on these hundred pieces. You look at it as one object and interact with it.
所以,我选择研究视觉智能。 在我攻读博士学位和教授早期的职业生涯中,我和我的学生们全心投入于一个北极星问题,那就是解决物体识别问题。因为这是感知世界的基础构件,对吧?我们在世界中穿行、理解、推理和互动,基本上都是在物体层面上进行的。我们不会在分子层面上与世界互动。有时我们确实以不同的方式互动,但很少这样做。举个例子,如果你想提起一个茶壶,你不会说,好,这个茶壶是由一百块瓷片组成的,让我挨个处理这上百块瓷片。你会把它看作一个整体,然后与之互动。

So objects are really important. So I was among the first researchers to identify this as a North Star problem. But I think what happened is that as a student of AI, and later a researcher of AI, I was working on all kinds of mathematical models, including neural networks, including Bayesian networks, including many, many models. And there was one singular pain point, which is that these models didn't have data to be trained on. And as a field, we were so focused on these models, but it dawned on me that human learning, as well as evolution, is actually a big data learning process. Humans learn with so much experience, you know, constantly. And so does evolution.
对象确实非常重要。我是首批将其识别为“北极星问题”的研究人员之一。但我认为,作为一名人工智能的学生,当我逐渐成为人工智能研究员时,我一直在研究各种数学模型,包括神经网络、贝叶斯网络等许多模型。然而,有一个明显的痛点,那就是这些模型缺乏训练的数据。在这一领域,我们过于专注于这些模型,却没注意到人类学习和进化实际上是一个大数据学习的过程。人类通过不断的经验进行学习,并且经历进化。

If you look across time, animals evolve by experiencing the world. So my students and I conjectured that a very critically overlooked ingredient of bringing AI to life is big data. And then we began this ImageNet project in 2006, 2007. We were very ambitious. We wanted to get the entire internet's image data on objects. Now granted, the internet was a lot smaller than today, so I feel like that ambition was at least not too crazy. Now it's totally delusional to think a couple of grad students and a professor can do this. But that's what we did. We curated very carefully 15 million images on the internet and created a taxonomy of 22,000 concepts, borrowing other researchers' work, like linguists' work on WordNet.
如果你观察时间,会发现动物是通过体验世界而进化的。所以,我和我的学生推测,将AI带入生活的一个非常关键但被忽视的要素是大数据。然后,我们在2006年和2007年开始了这个图像项目。我们非常有野心,希望获得整个互联网关于物体的图像数据。 当然,那时的互联网比现在小得多,所以我觉得这个目标至少看起来不算太疯狂。如今再想一群研究生和一个教授能做到这点,就显得完全不切实际了。不过我们确实做到了这一点:我们非常仔细地收集整理了互联网上的1500万张图像,并借鉴其他研究者的工作,比如语言学家在WordNet上的研究,创建了一个包含22000个概念的分类体系。

And it's a particular way of dictionarying words. And we combined that into ImageNet, and we open-sourced that to the research community. We held an annual ImageNet challenge to encourage everybody to participate in this. We continued to do our own research. But 2012 was the moment that many people think was the beginning of deep learning, or the birth of modern AI, because a group of Toronto researchers led by Professor Geoff Hinton participated in the ImageNet challenge, used the ImageNet big data and two GPUs from Nvidia, and successfully created the first neural network algorithm that, while it didn't totally solve it, made huge progress towards solving the problem of object recognition.
这是一种给单词编字典的独特方法。我们将其整合进ImageNet计划,并向研究社区开放源代码。我们举办了一年一度的ImageNet挑战赛,鼓励大家参与其中。我们也持续进行自己的研究。不过,2012年被许多人认为是深度学习或现代人工智能诞生的开端,因为由Jeff Hinton教授领导的多伦多研究小组参加了ImageNet挑战赛,使用了ImageNet的大数据和来自Nvidia的两个GPU,成功创建了第一个神经网络算法。虽然这个算法没有完全解决问题,但在解决物体识别问题上取得了巨大的进展。

And that combination of the trio of technologies, big data, neural networks, and GPUs, was kind of the golden recipe for modern AI. And then fast forward to the public moment of AI, which is the ChatGPT moment. If you look at the ingredients of what brought ChatGPT to the world, technically it still used these three ingredients. Now it's internet-scale data, mostly text; it's a much more complex neural network architecture than 2012, but it's still a neural network; and a lot more GPUs, but it's still GPUs. So these three ingredients are still at the core of modern AI. Incredible. I have never heard that full story before.
这种TREAL技术、大数据神经网络和GPU的结合可以说是现代人工智能的“黄金配方”。快速推进到人工智能的公共时刻,也就是ChatGPT的出现时刻,如果你仔细看一下是什么因素让ChatGPT问世,技术上它仍然使用这三个要素。如今,互联网级别的数据主要是文本,其神经网络架构比2012年要复杂得多,但它仍然是神经网络,使用的GPU数量多了很多,但仍然是GPU。因此,这三个要素仍然是现代人工智能的核心。难以置信,我以前从未听过这个完整的故事。

I love that it was two GPUs at first, and now it's, I don't know, hundreds of thousands, right, that are orders of magnitude more powerful. And those two GPUs were, like, gaming GPUs. They just went to, like, the game store, right? They were what people used for playing games. As you said, this continues to be, in a large way, the way models get smarter. Some of the fastest-growing companies in the world right now, I've had mostly all of them on the podcast: Mercor, Surge, Scale. They continue to do this for labs, just giving them more and more labeled data for the things they're most excited about.
我喜欢最初只是用两个GPU,如今已经发展到数十万甚至更多,而这些GPU的性能要强大得多。起初的那两个GPU其实是游戏GPU,人们用来玩游戏的。正如你所说的,这仍然是让模型变得更智能的主要方式之一。当下世界上一些增长最快的公司,我大多已经在播客上采访过,比如Merchord和Surgeon Scale。他们继续通过提供越来越多的标记数据来推动实验室的进步,他们对这些数据感到非常兴奋。

Oh yeah, I remember Alex Wang from Scale's very early days. I probably still have his emails from when he was starting Scale. He was very kind. He kept sending me emails about how ImageNet inspired Scale. I was very pleased to see that. One of my other favorite takeaways from what you just shared is that it's such an example of high agency and just doing things. That's kind of a meme on Twitter: you can just do things. You were just like, okay, this is probably necessary to move AI forward. And it was called machine learning back then, right? Was that the term most people used?
哦,对,我记得Alex Wang,他在Scale的早期日子里。我可能现在还留着他当时发给我的邮件,那时他刚开始创办Scale,他对我很友好,常常发邮件给我,讲述我们的相遇如何激励了Scale的发展。看到这些让我很高兴。你刚才分享的内容中,我最喜欢的一点就是展现了高自主性和行动力,这让人联想到推特上的一种趋势:你可以直接去做事情,就像这样,“好吧,这可能是推动AI前进的必要步骤”,那会儿这被称为机器学习,对吗?大多数人用的就是这个术语吧?

I think it was used interchangeably. It's true. I do remember the tech companies, and I am not going to name names, but I was in a conversation in one of the early days, I think in the middle of 2015, middle of 2016, when some tech companies avoided using the word AI because they were not sure if AI was a dirty word. And I remember I was actually really encouraging everybody to use the word AI, because to me that is one of the most audacious questions humanity has ever asked in our quest for science and technology, and I feel very proud of this term. But yes, at the beginning some people were not sure.
我记得在2015年到2016年之间,我和一些科技公司有过交流,当时这些公司不愿意使用“AI”这个词,因为他们不确定“AI”是否会被视为一个负面的词汇。我不会指出具体是哪家公司。当时,我鼓励大家使用“AI”这个词,因为我认为在我们追求科学和技术的过程中,这是人类提出的最大胆的问题之一,我对这个词感到非常自豪。但确实,一开始有些人对此并不确定。

What year was that, roughly, when AI was already working? 2016, I think. It was less than ten years ago. That was the change, like some people started calling it AI. But I think if you look at the Silicon Valley tech companies, if you trace their marketing terms, I think 2017-ish was the beginning of companies calling themselves AI companies. That's incredible, just how the world has changed. Now you can't not call yourself an AI company. I know. Nine-ish years later.
“人工智能大约是什么时候开始运作的呢?我想是2016年,还不到十年前。这是一个转折点,有些人开始称其为人工智能,但如果你看看硅谷的科技公司,追踪他们的营销术语,大概是从2017年开始,这些公司开始自称为人工智能公司。令人难以置信的是,世界发生了如此巨大的变化。现在,你几乎不能不称自己为人工智能公司。我知道,这大约是九年后的事情。”

Yeah. Oh man. Okay, is there anything else around that early history that you think people don't know and that you think is important, before we chat about where things are going and the work that you're doing? I think, as with all histories, you know, I'm keenly aware that I am recognized for being part of the history, but there are so many heroes and so many researchers. We're talking about generations of researchers. You know, in my own world there are so many people who have inspired me, which I talked about in my book. But I do feel our culture, especially Silicon Valley, tends to assign achievements to a single person. While I think that has value, it should be remembered that AI is a field that is at this point 70 years old, and we have gone through many generations. Nobody.
好的。在了解历史的过程中,有没有什么早期历史方面的内容是你觉得人们可能不知道但却很重要的?在我们讨论事态发展和你正在进行的工作之前,我想讲讲这些。和所有历史一样,我深知我在历史中占有一席之地,但是有太多的英雄和研究人员值得被铭记。我们在谈论的是好几代的研究人员。在我自己的领域中,有很多人给予了我灵感,我在书中也提到了这些人。不过,我确实觉得我们文化,尤其是硅谷的文化,往往将成就归结于某一个人,虽然这有它的价值,但需要记住的是,人工智能领域已经有70年的历史,我们经历了很多代的发展。没有人是唯一的功臣。

No one could have gotten here by themselves. Okay. Let me ask you this question. It feels like we're always on this precipice of AGI, this kind of vague term people throw around: AGI's coming, it's gonna take over everything. What's your take on how far you think we might be from AGI? Do you think we're gonna get there on the current trajectory? Or do you think we need more breakthroughs? Do you think the current approach will get us there?
没有人能够独自到达这里。好吧,让我问你这个问题。感觉我们总是处在通用人工智能(AGI)的边缘,这种模糊的术语总是被人们提起,AGI要来了,它要接管一切。你怎么看我们距离AGI还有多远?你认为我们会在目前的轨迹上到达吗?还是说你认为我们需要更多的突破?你觉得当前的方法能带我们到达吗?

Yeah, this is a very interesting term, Lenny. I don't know if anyone has ever defined AGI. You know, there are many different definitions, from, you know, some kind of superpower for machines all the way to whether machines can become economically viable agents in society, in other words, making salaries to live. Is that a definition of AGI? As a scientist, I take science very seriously, and I entered the field because I was inspired by this audacious question of whether machines can think and do things in the way that humans can.
是的,Lenny,这个术语非常有趣。我不知道是否有人真正定义过通用人工智能(AGI)。你知道,对于AGI有很多不同的定义,从某种意义上的机器超级能力到机器是否能在社会中成为经济上可行的代理人,也就是说能够赚取薪水以维持生活。那算是AGI的定义吗?作为一个科学家,我非常认真地对待科学,因为我受到这个大胆问题的激励而进入这个领域,即让机器以人类能做到的方式思考和行动。

For me that's always the North Star of AI, and from that point of view I don't know what the difference is between AI and AGI. I think we've done very well in achieving parts of the goal, including conversational AI, but I don't think we have completely conquered all the goals of AI. And I think of our founding fathers, like Alan Turing. I wonder, if Alan Turing were around today and you asked him to contrast AI versus AGI, he might just shrug and say, well, I asked the same question back in the 1940s. So I don't want to get into a rabbit hole of defining AI versus AGI.
对我来说,这一直是人工智能的北极星。从这个角度来看,我不太清楚人工智能(AI)和通用人工智能(AGI)之间的区别。我觉得我们在实现部分目标上表现得很好,比如对话式人工智能,但我认为我们还没有完全达成所有人工智能的目标。我想知道如果艾伦·图灵还在,他会如何区分AI和AGI。或许他会耸耸肩说:“我在1940年代也问过同样的问题,所以我不想深陷于定义AI和AGI的区别。”

I feel AGI is more a marketing term than a scientific term. As a scientist and technologist, AI is my North Star, it's my field's North Star, and I'm happy for people to call it whatever name they want to call it. Let me ask you maybe this way. Like you described, there are kind of these components that, from ImageNet and AlexNet, took us to where we are today: GPUs, essentially, data, labeled data, and the algorithm of the model. There's also the transformer, which feels like an important step in that trajectory.
作为一名科学家和技术专家,我觉得AGI(通用人工智能)更多是一个营销术语,而不是一个科学术语。人工智能是我的引导方向,也是我所在领域的引导方向,我对人们用任何名字称呼它都没有意见。我想这样问你,你提到过一些组成部分,比如ImageNet和AlexNet,它们在一定程度上带领我们走到了今天。GPU的本质就是数据、标注数据,就像模型的算法一样。而Transformer也在这个过程中感觉像是一个重要的步骤。

Do you feel like those are the same components that'll get us to, I don't know, a 10-times-smarter model, something that's life-changing for the entire world? Or do you think we need more breakthroughs? I know we're going to talk about world models, which I think is a component of this, but is there anything else where you think, oh, this will plateau, or, okay, this will take us there, we just need more data, more compute, more GPUs?
你觉得这些因素是否能够帮助我们开发出一个智力提高十倍的模型,一个对整个世界产生深远影响的模型?还是说我们需要更多的突破?我知道我们会谈到世界模型,我认为这是其中的一个因素,但你是否认为还有其他东西是关键的,比如目前的发展遇到了瓶颈,或者我们只是需要更多的数据、更多的计算能力和更多的GPU?

Oh no, I definitely think we need more innovations. I think with the scaling laws of more data, more GPUs, and bigger current model architectures, there's still a lot to be done there. But I absolutely think we need to innovate more. There's not a single deeply scientific discipline in human history that has arrived at a place that says we're done, we're done innovating. And AI is one of the youngest, if not the youngest, disciplines in human civilization in terms of science and technology. We're still scratching the surface.
哦,我完全觉得我们需要更多的创新。我认为在更多的数据、更强大的GPU和更大的现有模型架构上,还有很多事情要做。但我绝对认为我们需要更多的创新。在人类历史上,还没有任何一个深层次的科学学科达到过一个可以说“我们已经完成,不再需要创新”的地步。人工智能是在人类文明中的科学技术领域中最年轻的学科之一,我们仍然只是在探索表面。

For example, like I said, we're going to segue into world models. Today, you take a model and run it through a video of a couple of office rooms and ask the model to count the number of chairs. This is something a toddler could do, or maybe an elementary school kid could do, and AI could not do that, right? So there's just so much AI today could not do, let alone thinking about how, you know, someone like Isaac Newton looked at the movements of the celestial bodies and derived an equation, or a set of equations, that governs the movement of all bodies. That level of creativity, extrapolation, abstraction, we have no way of enabling AI to do that today.
例如,正如我所说,我们今天会转到世界模型的主题。你可以拿一个模型去分析几间办公室房间的视频,然后让模型数一下房间里的椅子数量。这是一个小孩甚至小学生都能做到的事情,而人工智能却做不到。如此看来,有许多事情是今天的人工智能无法完成的,更不用说像以撒·牛顿那样,通过观察天体的运动,推导出一个或一组方程来描述所有天体运动的这种高水平的创造力与抽象推演了。我们今天还没有办法让人工智能做到这一点。

And then let's look at emotional intelligence. If you look at a student coming to a teacher's office and having a conversation about motivation, passion, what to learn, what's the problem that's really bothering you: as powerful as today's conversational bots are, you don't get that level of emotional, cognitive intelligence from today's AI. So there's a lot we can do better, and I do not believe we're done innovating.
然后,让我们来看一下情感智力。如果你观察一个学生走进老师的办公室,进行一场关于动机、热情、学习内容以及真正困扰他们的问题的对话。即使当今的对话机器人已经非常强大,你仍然无法从当今的人工智能中获得那种情感和认知智力的水平。因此,我们还有很多改进的空间,我相信我们的创新之路还没有结束。

Demis had this really interesting interview recently, from DeepMind/Google, where someone asked him: what do you think, how far are we from AGI, and what does it look like when we're there? He had a really interesting way of approaching it: if we were to give the most cutting-edge model all the information until the end of the 20th century, see if it could come up with all the breakthroughs Einstein had. And so far we're nowhere near that. No, we're not. In fact, it's even worse. Let's give AI all the data, including modern instruments' data of celestial bodies, which Newton did not have, and just ask AI to create the 17th-century set of equations on the laws of bodily movements. Today's AI cannot do that. All right, we're a ways away.
最近,DeepMind的Demis接受了一次非常有趣的采访,采访中有人问他,我们距离达到通用人工智能(AGI)还有多远,以及这会是什么样子。他提出了一个有趣的视角:如果我们把当代最先进的模型以及截至20世纪末的所有信息交给AI,看看它能否得出所有爱因斯坦的突破。到目前为止,我们还远没有达到这种水平。更糟的是,即使我们把包括现代仪器观察到的天体数据在内的所有数据交给AI(这些数据是牛顿所没有的),并且要求AI去创建一套17世纪的物体运动定律的方程,当前的AI也做不到。总之,我们离目标还有很长的路要走。

So what I mean... Yeah, okay, so let's talk about world models. To me, this is just another really amazing example of you being ahead of where people end up. You were way ahead on, okay, we just need a lot of clean data for AI and neural networks to learn. You've been talking about this idea of world models for a long time. You started a company to build, essentially... there are language models; this is a different thing, this is a world model, and we'll talk about what that is. And now, as I was preparing for this, Elon's talking about world models, Jensen's talking about world models, and I know Google's working on this stuff. You've been at this for a long time, and you actually just launched something that we're going to talk about, right before this podcast airs.
好的,我的意思是,我们来谈谈世界模型。对我来说,这只是另一个你在别人之先的绝佳例子。你早就领先一步,认为我们只需要大量干净的数据来让人工智能和神经网络学习。你已经谈论世界模型这个概念很长时间了,而且还创立了一家公司来构建,实际上是语言模型,但这是另一种东西,这就是世界模型。我们会讨论它是什么。当我准备这次谈话时,发现埃隆和Jensen都在谈论世界模型,我也知道谷歌正在研究这方面的东西。你在这方面已经深耕很长时间了,而且你刚刚推出了一些即将在此播客播出前谈到的东西。

Talk about what a world model is and why it's so important. I'm very excited to see that more and more people are talking about world models, like Elon, like Jensen. I have been thinking about how to push AI forward all my life, right? And the large language models that came out of the research world, and then OpenAI and all this, for the past few years were extremely inspiring, even for a researcher like me. I remember when GPT-2 came out, and that was in, I think, late 2020. I was co-director (I still am, but I was at that time full-time co-director) of Stanford's Human-Centered AI Institute, and I remember the public was not aware of the power of large language models yet, but as researchers we were seeing it.
嗯,谈谈什么是世界模型,以及为什么它如此重要。我很高兴看到越来越多的人在讨论世界模型,比如Elon(马斯克)和Jensen(黄仁勋)。我一生都在思考如何真正推动人工智能的发展。而最近这些年来,来自研究界,然后是OpenAI,所推出的大型语言模型就算是对我这样的研究人员也是极大的鼓舞。我记得GPT-2大概是在2020年末推出的时候,我是斯坦福人类中心人工智能研究所的共同主任,当时是全职。而我记得那个时候,公众对于大型语言模型的强大能力并不清楚,但作为研究人员,我们已经看到了它的潜力。

And I had pretty long conversations with my natural language processing colleagues, like Percy Liang and Chris Manning. We were talking about how critical this technology is going to be, and the Stanford AI institute, the Human-Centered AI Institute, HAI, was the first one to establish a full research center on foundation models. Percy Liang and many researchers led the first academic paper on foundation models. So it was just very inspiring for me.
我和我的自然语言处理同事,比如Percy Liang和Chris Manning进行过很长的对话,我们讨论了这项技术将会多么关键。斯坦福大学的以人为本人工智能研究院(HAI)是第一个建立基础模型完整研究中心的机构。Percy Liang和许多研究人员主导了第一篇关于基础模型的学术论文,这对我来说非常鼓舞人心。

So of course I come from the world of visual intelligence, and I was just thinking there's so much we can push forward beyond language, because humans have used our sense of spatial intelligence and world understanding to do so many things, and they are beyond language. Think about a very chaotic first-responder scene, whether it's a fire or some traffic accident or some natural disaster. If you immerse yourself in that scene and think about how people organize themselves to rescue people, to stop further disasters, to put out fires, a lot of that is movement, is spontaneous understanding of objects, worlds, humans, and situational awareness. Language is part of that, but in a lot of those situations language cannot get you to put out a fire. So that is what I was thinking about a lot, and in the meantime I was doing a lot of robotics research, and it dawned on me that the linchpin of connecting this additional intelligence beyond language, and connecting embodied AI, which is robotics, and connecting visual intelligence, is the sense of spatial intelligence, of understanding the world. I think it was 2024 when I gave a TED Talk about spatial intelligence and world models. I started formulating this idea back in 2022, based on my robotics and computer vision research. And one thing that was really clear to me is that I really wanted to work with the brightest technologists and move as fast as possible to bring this technology to life, and that's when we founded this company called World Labs. You can see the word "world" is in the title of our company, because we believe so much in world modeling and spatial intelligence.
当然,我来自视觉智能的领域,我一直在思考我们可以在语言之外推进很多东西。因为人类使用空间智能和对世界的理解来做了很多事情,而这些是超越语言的。想想在一个非常混乱的紧急救援现场,不管是火灾还是交通事故,或是自然灾害等情景。如果你将自己置身于这样的场景中,观察人们如何组织自己来救援人员、阻止更多灾难、扑灭火灾,这些大多是通过动作和对物体、世界、人类及环境的自发理解实现的。语言是其中的一部分,但很多情况下,语言无法帮助你灭火。这是我一直在思考的内容。同时,我也在进行大量的机器人研究,我意识到将除语言之外的附加智能与具体现实结合起来的关键在于一种对世界的空间智能的理解。2024年,我做了一次关于空间智能和世界模型的TED演讲,而我早在2022年就开始基于我的机器人和计算机视觉研究形成这个理念。有一件事情对我非常明确,我真的想和最优秀的技术专家合作,尽快将这项技术变为现实。因此,我们创办了这家名为World Labs的公司,“世界”这个词被放入公司名称中是因为我们非常相信世界建模和空间智能。

People are so used to just chatbots, and that's a large language model. Is a simple way to understand a world model that you basically describe a scene and it generates an infinitely explorable world? We'll link to the thing you launched, which we'll talk about, but is that a simple way to understand it? That's part of it, Lenny. I think a simple way to understand a world model is that this model can allow anyone to create any world in their mind's eye by prompting, whether it's an image or a sentence, and also be able to interact in this world, whether you're browsing and walking, or picking objects up, or changing things, as well as to reason within this world. For example, if the agent consuming this output of the world model is a robot, it should be able to plan its path and help to, you know, tidy the kitchen. So a world model is a foundation that you can use to reason, to interact, and to create worlds. Great. Yeah, so robots feel like that's potentially the next big focus for AI researchers, and just the impact on the world. And what you're saying here is this is a key missing piece of making robots actually work in the real world: understanding how the world works.
人们已经习惯了聊天机器人,那是大型语言模型。理解世界模型的一个简单方式是不是这样:你描述一个场景,它就会生成一个可以无限探索的世界?我们会附上你们刚发布的产品的链接,稍后也会谈到它。这只是其中一部分,Lenny。我认为理解世界模型的简单方法是:这个模型可以让任何人通过提示(无论是图像还是句子)创建自己脑海中的任何世界,并且能够在这个世界中互动,不论是浏览、行走、拾取物体还是改变东西,还能在这个世界中进行推理。例如,如果使用这个世界模型输出的智能体是机器人,它应该能够规划路径,比如帮助整理厨房。所以,世界模型是一种基础,你可以用它来进行推理、互动和创造世界。 太好了。机器人似乎是AI研究人员潜在的下一个重要焦点,也将对世界产生重大影响。你所说的就是让机器人能够在现实世界中真正运作所缺失的关键一环:理解世界如何运作。

Yeah. Well, first of all, I do think there's more than robots that's exciting, but I agree with everything you just said. I think world modeling and spatial intelligence is a key missing piece of embodied AI. I also think, let's not underestimate that humans are embodied agents, and humans can be augmented by AI's intelligence, just like today humans are language animals but we're very much augmented by AI helping us to do language tasks, including software engineering. I think we shouldn't underestimate, or maybe it's that we tend not to talk about, how humans as embodied agents can actually benefit so much from world models and spatial intelligence models, as well as robots can. So the big unlocks here: robots, which is a huge deal if this works out. I imagine each of us has robots doing a bunch of stuff for us, and they help us with disasters. Things like games, obviously, are a really cool example, just infinitely playable games that you invent in your head. And then creativity feels like just having fun, being creative, thinking of magical, wild new worlds and environments. And also design: humans design everything from machines to buildings to homes. And also scientific discovery.
好的,首先我确实认为除了机器人之外,还有很多令人兴奋的东西。不过,我同意你刚才所说的一切。我认为世界建模和空间智能是具身智能(embodied AI)中缺失的一块关键拼图。我也认为,我们不要低估这样一个事实:人类是具身智能体,而人类可以通过人工智能的智慧得到增强。就像今天,人类是语言的动物,但我们在处理语言任务(包括软件工程)时已经被人工智能极大地增强了。我认为我们不应该低估,或者说我们往往没有讨论到,人类作为具身智能体,实际上可以像机器人一样,从世界模型和空间智能模型中获益良多。 所以这里的重大突破包括:机器人,如果这能实现,那意义重大,我想象我们每个人都会有机器人为我们做很多事情,比如在灾难中帮助我们;游戏显然是一个非常酷的例子,你可以在脑海中构想出可以无限畅玩的游戏;然后是创造力,就是享受乐趣、发挥创意、想象神奇而狂野的新世界和环境;还有设计,人类设计从机器到建筑再到住宅的一切;以及科学发现。

Right, there is so much. I like to use the example of the discovery of the structure of DNA. One of the most important pieces in the history of DNA's discovery is the X-ray diffraction photo that was captured by Rosalind Franklin. It was a flat 2D photo of a structure that looks like a cross with diffractions; you can Google those photos. But with that flat 2D photo, humans, especially two important humans, James Watson and Francis Crick, in addition to their other information, were able to reason in 3D space and deduce a highly three-dimensional double-helix structure of the DNA. And that structure cannot possibly be 2D. You cannot think in 2D and deduce that structure; you have to think in 3D space and use human spatial intelligence. So I think even in scientific discovery, spatial intelligence, or AI-assisted spatial intelligence, is critical.
这里有一个例子我非常喜欢,就是DNA结构的发现。如果你回顾DNA发现历史中的一个关键环节,那就是罗莎琳·富兰克林(Rosalind Franklin)拍摄的X射线衍射照片。这张照片是一个平面的二维图片,展示了一个像十字架一样的衍射结构。你可以在网上找到这些照片。正是通过这张二维的平面照片,人类,尤其是两个重要的人物——詹姆斯·沃森(James Watson)和弗朗西斯·克里克(Francis Crick),结合其他信息,能够在三维空间中推理,并推导出DNA的高度三维双螺旋结构。而这个结构是不可能用二维思维来推导的,必须运用三维空间的思维能力,利用人类的空间智能。因此,我认为在科学发现中,空间智能或AI辅助的空间智能是至关重要的。

This is such an example of, I think it was Chris Dixon who has this line, that the next big thing is going to start off feeling like a toy. When ChatGPT just came out, I remember Sam Altman just tweeted, like, here's a cool thing we're playing with, check it out. Now it's the fastest-growing product in all of history, and it changed the world. Yeah. And it's oftentimes the things that just look like, okay, this is cool, this is fun to play with, that end up changing the world most.
这是一个非常典型的例子。我记得是 Chris Dixon 有一句话:下一个重大事物一开始往往会像个玩具。当 ChatGPT 刚推出的时候,我记得 Sam Altman(OpenAI 的 CEO)只是发了一条推文,说这是一个我们正在玩的很酷的东西,大家可以来试试。现在,它成为了历史上增长最快的产品,改变了世界。通常正是那些看起来只是“哦,这很酷,很好玩”的东西,最终对世界的改变最大。

Yeah. This episode is brought to you by Sinch, the customer communications cloud. Here's the thing about digital customer communications: whether you're sending marketing campaigns, verification codes, or account alerts, you need them to reach users reliably. That's where Sinch comes in. Over 150,000 businesses, including eight of the top 10 largest tech companies globally, use Sinch's API to build messaging, email, and calling into their products. And there's something big happening in messaging that product teams need to know about: Rich Communication Services, or RCS. Think of RCS as SMS 2.0. Instead of getting texts from a random number, your users will see your verified company name and logo, without needing to download anything new. It's a more secure and branded experience, plus you get features like interactive carousels and suggested replies. And here's why this matters: US carriers are starting to adopt RCS. Sinch is already helping major brands send RCS messages around the world, and they're helping Lenny's Podcast listeners get registered first, before the rush hits the US market. Learn more and get started at sinch.com/lenny. That's s-i-n-c-h dot com slash lenny.
这一集由 Sinch 客户通讯云赞助。关于数字化客户通讯,有一点很重要:无论是发送营销活动、验证码还是账户提醒,你都需要确保这些信息能够可靠地到达用户。这就是 Sinch 的用武之地。超过 150,000 家企业,包括全球十大科技公司中的八家,使用 Sinch 的 API 将消息、电子邮件和通话功能集成到他们的产品中。 消息领域正在发生一件产品团队需要了解的大事,那就是富通信服务(RCS)。可以把 RCS 看作是短信的2.0版。用户将不再从一个随机号码收到短信,而是看到经过验证的公司名称和标志,无需下载任何新应用。这提供了一个更加安全和品牌化的体验,而且你还可以获得互动轮播和建议回复等功能。 为什么这很重要呢?美国的通讯运营商正在开始采用 RCS,Sinch 已经在全球范围内帮助大品牌发送 RCS 信息,而且他们正在帮助 Lenny 播客的听众在这股热潮涌入美国市场之前率先注册。了解更多信息并开始使用,请访问 sinch.com/lenny,也就是 s-i-n-c-h.com/lenny。

I reached out to Ben Horowitz, who loves what you're doing and is a big fan of yours. They're investors, I believe. Yeah, we've known each other for many years, but yes, right now they are investors in World Labs. Amazing. Okay, so I asked him what I should ask you about, and he suggested I ask you: why is the bitter lesson alone not likely to work for robots? So first of all, just explain what the bitter lesson was in the history of AI, and then why that won't get us to where we want to be with robots.
我联系了Ben Horowitz,他非常喜欢你正在做的事情,也是你的粉丝。我想他们是你们的投资者。是的,我们已经认识很多年了,现在他们是World Labs的投资者。太棒了。我问他应该问你什么问题,他建议我问你:为什么仅靠“痛苦的教训”不太可能在机器人领域奏效。首先,请解释一下“痛苦的教训”在AI历史中指的是什么,然后说明为什么单靠它无法让我们在机器人领域达到目标。

Well, first of all, there are many bitter lessons, but the bitter lesson everybody refers to is a paper written by Richard Sutton, who won the Turing Award recently, and he does a lot of reinforcement learning. Richard has said, right, if you look at the history, especially the algorithmic development of AI, it turns out a simpler model with a ton of data always wins at the end of the day, instead of the, you know, more complex model with less data. I mean, this paper actually came years after ImageNet. That, to me, was not bitter; it was a sweet lesson. That's why I built ImageNet, because I believe that big data plays that role.
首先,有很多痛苦的教训,但大家提到的“痛苦的教训”通常是指理查德·萨顿(Richard Sutton)撰写的一篇论文。他最近获得了图灵奖,并且在强化学习方面有很多研究。理查德指出,如果你查看AI的历史,特别是算法的发展,会发现简单的模型加上大量的数据总是最终获胜,而不是更复杂的模型搭配较少的数据。尽管这篇论文是在ImageNet项目诞生多年后出现的,但对我来说,这并不是一个痛苦的教训,而是一个甜蜜的教训。这也是为什么我创建了ImageNet,因为我相信大数据在其中发挥了重要作用。

So why can't the bitter lesson work in robotics alone? Well, first of all, I think we need to give credit to where we are today. Robotics is very much in the early days of experimentation. The research is not nearly as mature as, say, language models, so many people are still experimenting with different algorithms, and some of those algorithms are driven by big data. So I do think big data will continue to play a role in robotics. But what is hard for robotics? There are a couple of things. One is that it's harder to get data; it's a lot harder to get data. You can say, well, there's web data. This is where the latest robotics research is using web videos, and I think web videos do play a role. But if you think about what made language models work: as someone who does computer vision and spatial intelligence and robotics, I'm very jealous of my colleagues in language, because they had this perfect setup where their training data are in words, eventually tokens, and then you produce a model that outputs words. So you have this perfect alignment between what you hope to get, which we call the objective function, and what your training data looks like.
那么,为什么仅靠“痛苦的教训”(bitter lesson)在机器人领域行不通呢?首先,我认为我们应该客观看待目前所处的阶段。机器人学仍然处于实验的初期阶段,其研究的成熟度远不及语言模型。因此,许多人仍在尝试不同的算法,其中一些算法由大数据驱动。我确实相信大数据在机器人学中会继续发挥作用。 然而,机器人学面临几个难点,其中之一就是获取数据要困难得多。你可能会说有网络数据可以利用,现在最新的机器人研究正在使用网络视频,我认为网络视频确实有用。但如果你想想是什么让语言模型行之有效:作为一个从事计算机视觉、空间智能和机器人学的人,我非常羡慕做语言研究的同事们。他们有一个完美的设置:训练数据是文字,最终转化为词元(token),而训练出的模型输出的也是文字。这样,你希望得到的结果(我们称之为目标函数)与训练数据的形式之间就形成了完美的对齐。

But robotics is different. Even spatial intelligence is different. You hope to get actions out of robots, but your training data lacks actions in the 3D world, and that's what robots have to do, right? Actions in the 3D world. So you have to find different ways to fit, what do they call it, a square peg in a round hole. What we have is tons of web videos, so then we have to start talking about supplementing data, such as teleoperation data or synthetic data, so that the robots are trained with this hypothesis of the bitter lesson, which is a large amount of data. I think there's still hope, because even what we are doing in world modeling will really unlock a lot of this information for robots. But I think we have to be careful, because we're in the early days of this, and the bitter lesson is still to be tested, because we haven't fully figured out the data. Another part of the bitter lesson for robotics that I think we should be realistic about is that, again, compared to language models or even spatial models, robots are physical systems. So robots are closer to self-driving cars than to a large language model, and that's very important to recognize.
但机器人是不同的,即便是空间智能也是不同的。你希望机器人能够在三维世界中执行动作,但你的训练数据缺乏这方面的动作,而这正是机器人需要做的事情:在三维世界中执行正确的动作。因此,你必须找到不同的方法来解决这个问题——就像把方块塞进圆孔一样。我们有大量的网络视频,因此我们需要开始讨论补充数据,比如远程操作数据或合成数据,以便用大量数据来训练机器人。我认为还有希望,因为即使是我们在世界建模中所做的工作也能为机器人解锁大量信息。但我认为我们必须小心,因为我们还处于早期阶段,“大量数据假设”仍需验证。因为我们还没有完全解决机器人的"大量数据假设"的另一部分的数据问题。我们需要现实一点,因为与语言模型甚至空间模型相比,机器人还是物理系统,所以机器人更接近自动驾驶汽车而不是大型语言模型,这一点非常重要。

That means that in order for robots to work, we not only need brains, we also need the physical body, and we also need application scenarios. If you look at the history of self-driving cars, my colleague Sebastian Thrun took Stanford's car to win the first DARPA challenge in 2006 or 2005. It's 20 years from that prototype of a self-driving car being able to drive 130 miles in the Nevada desert to today's Waymo on the streets of San Francisco, and we're not even done yet; there's still a lot. So that's a 20-year journey, and self-driving cars are much simpler robots. They're just metal boxes running on 2D surfaces, and the goal is not to touch anything. A robot is a 3D thing running in a 3D world, and the goal is to touch things. So the journey is going to be, you know, there are many aspects and elements. And of course one could say, well, the early self-driving car algorithms were from the pre-deep-learning era, so deep learning is accelerating the brains, and I think that's true. That's why I'm in robotics, that's why I'm in spatial intelligence, and I'm excited by it. But in the meantime, the car industry is very mature, and productizing also involves mature use cases, supply chains, and the hardware.
这意味着,为了让机器人正常工作,我们不仅需要“大脑”,还需要“身体”和应用场景。如果你查看自动驾驶汽车的发展历史,我的同事Sebastian Thrun曾经在2005或2006年带领斯坦福大学的车赢得了首届DARPA挑战赛。从能够在内华达沙漠行驶130英里的自动驾驶汽车原型,到今天在旧金山街头行驶的Waymo,已经过去了20年,但我们仍未完成,还有很多工作要做。这是一个20年的旅程,而自动驾驶汽车是相对简单的机器人,它们只是金属箱在二维平面上行驶,目标是不碰到任何东西。而真正的机器人是在三维世界中工作,目标是接触物体。因此,这段旅程会有更多的方面和元素。当然,有人可能会说早期的自动驾驶汽车算法是在深度学习出现之前,而深度学习正在加速“大脑”的开发,我认为这是真的,这也是我从事机器人领域和空间智能研究的原因,我对此感到兴奋。但与此同时,汽车行业已经非常成熟,将产品化还涉及到成熟的用例、供应链和硬件的使用。

So I think it's a very interesting time to work on these problems, but it's true, we might still be subject to a number of bitter lessons. Doing this work, do you ever just feel awe for the way the brain works and is able to do all of this for us? Just the complexity of getting a machine to walk around and not hit things and fall: does it give you more respect for what we've already got? Totally. We operate on about 20 watts. That's dimmer than any light bulb in the room I'm in right now, and yet we can do so much. So I think, actually, the more I work in AI, the more I respect humans. Let's talk about this product you just launched, called Marble, a very cute name. Talk about what this is and why this is important. I've been playing with it; it's incredible. We'll link to it for folks to check it out. What is Marble?
我觉得现在是解决这些问题的非常有趣的时刻,但确实,我们可能在这个过程中会经历一些挫折。你有没有想过,人脑是如何运作的,从而能为我们完成所有这些事情?仅仅是要让机器四处走动而不碰到东西、不摔倒,就已经让我们更敬佩我们自身的能力。没错,人脑大概只在20瓦的能耗下运作,比我现在所在房间里的任何灯泡都要暗,但我们却能做出如此多的事情。实际上,我越是从事人工智能工作,就越加尊重人类。现在来说说你刚推出的产品,名字很可爱,叫做Marble。聊聊这是什么,为什么重要。我已经试玩过了,感觉很棒,赶紧看看Marble是什么吧。

Yeah, I'm very excited. So first of all, Marble is one of the first products that World Labs has rolled out. World Labs is a foundation frontier model company. We were founded by four co-founders who have deep technical histories: my co-founders are Justin Johnson, Christoph Lassner, and Ben Mildenhall, and we all come from the research fields of AI, computer graphics, and computer vision. We believe that spatial intelligence and world modeling is as important as, if not more important than, language models, and complementary to language models. So we wanted to seize this opportunity to create a deep-tech research lab that can connect the dots between frontier models and products. Marble is an app that's built upon our frontier models. We've spent a year-plus building the world's first generative model that can output genuinely 3D worlds. That's a very, very hard problem, and it was a very hard process. We have an incredible founding team of incredible technologists from, you know, incredible teams.
好的,我非常兴奋。首先,Marble是World Labs推出的首批产品之一。World Labs是一家基础前沿模型公司,由四位具有深厚技术背景的联合创始人创立。我和我的联合创始人Justin Johnson、Christoph Lassner和Ben Mildenhall,都来自人工智能、计算机图形学和计算机视觉的研究领域。我们认为空间智能和世界建模与语言模型同样重要,甚至更重要,并且可以与语言模型互为补充。因此,我们想抓住这个机会创建一个深度技术研究实验室,将前沿模型与产品连接起来。Marble是基于我们前沿模型开发的应用程序。我们花了一年多时间构建了世界上第一个能够输出真正3D世界的生成模型。这是一个非常非常困难的问题,过程也非常艰难。我们拥有一个由杰出技术专家组成的卓越创始团队,他们来自一些非常出色的团队。

And then around a month or two ago we saw, for the first time, that we can just prompt with a sentence, or an image, or multiple images, and create worlds that we can navigate in. If you put on goggles, which we have an option to let you do, you can even walk around, right? So even though we've been building this for quite a while, it was still just awe-inspiring, and we wanted to get it into the hands of people who need it. We know that so many creators, designers, people who are thinking about robotic simulation, people who are thinking about different use cases of navigable, interactable, immersive worlds, and game developers will find this useful. So we developed Marble as a first step. Again, it's still very early, but it's the world's first model doing this, and it's the world's first product that allows people to just prompt. We call it prompt to worlds.
大约一两个月前,我们第一次看到,只需用一句话、一张图片或多张图片作为提示,就能创建我们可以在其中导航的世界。如果你戴上头显设备(我们提供了这个选项),你甚至可以在其中走动。尽管我们已经构建这项技术有一段时间了,但那一刻仍然令人震撼,我们希望把它交到需要的人手中。我们知道,很多创作者、设计师、研究机器人模拟的人、考虑可导航、可交互的沉浸式世界各种用途的人,以及游戏开发者都会觉得这很有用。所以我们开发了Marble作为第一步。它仍处于非常早期的阶段,但这是世界上第一个实现这种功能的模型,也是第一个允许人们仅通过提示来创建世界的产品。我们称之为“从提示到世界”(prompt to worlds)。

Well, I've been playing around with it, and it is insane. You could just have a little Shire world where you infinitely walk around Middle-earth, basically. There's no one there yet, but it's insane; you just go anywhere. There's, like, a dystopian world; I'm just looking at all these examples. Yes. And my favorite part, actually, and I don't know if it's a feature or a bug: you can see the dots of the world before it actually renders with all the textures, and I just love that you get a glimpse into what is going on with this model, basically. That is so cool to hear, because this is where, as a researcher, I'm learning. The dots that lead you into the world were an intentional feature, a visualization. It is not part of the model; the model actually just generates the world. But we were trying to find a way to guide people into the world, and a number of engineers worked on different versions, but we converged on the dots.
我一直在玩它,太疯狂了。你可以拥有一个小小的“夏尔”世界,在那里无限地漫游,基本上就是在中土世界里漫步。里面还没有人,但真的很疯狂,你可以去任何地方。还有反乌托邦风格的世界,我正在看这些示例。实际上,我最喜欢的部分是(我不知道这算是个功能还是个漏洞):你可以在世界渲染出所有纹理之前看到构成世界的点阵。我特别喜欢这一点,它让我得以一窥这个模型内部在做什么。听到这个真是太酷了,因为作为研究人员,我也在从中学习。那些引导你进入世界的点阵其实是一个特意设计的可视化功能,并不是模型的一部分。模型其实只是生成这个世界,而我们想找到一种方式来引导人们进入这个世界,许多工程师尝试了不同的版本,最终我们选定了点阵。

And among so many people, you're the only one who told us how delightful that experience is, and it was really satisfying for us to hear that this intentional visualization feature, not just the big hardcore model, has actually delighted our users. Wow, so you added that to help humans understand more of what's going on. Wow, that is hilarious. It makes me think about LLMs, in the way that, it's not the same thing, but they talk about what they're thinking and what they're doing. Yes, it is. It also makes me think about The Matrix; it's exactly the Matrix experience. I don't know if that was your inspiration. Well, like I said, a number of engineers worked on that; it could be their inspiration. It's in there. It's in their subconscious. Yeah.
很多人中只有你告诉我们这个体验是多么愉快,这让我们感到非常满意。听到这个不仅仅是一个大型硬核模型的有意可视化功能让我们的用户感到愉悦,真是太棒了。你添加这个功能,让人类能更多地理解和感受到“哇,这真有意思”。这让我想到了镜头的功能,虽然不完全一样,但他们在交流自己的想法和行为。是的,这确实让我想到《黑客帝国》中的体验,我不知道这是否是你的灵感来源。正如我所说,许多工程师参与了这个项目,这可能是他们的灵感,潜意识里就有这种想法。

Okay, so just for folks that may want to play around with this, what are some applications that folks can start using today? What's your goal with this launch? Yeah, so we do believe that world modeling is very horizontal, but we're already seeing some really exciting use cases. Virtual production for movies, because what they need are 3D worlds that they can align with the camera, so when the actors are acting in it, they can position the camera and shoot the segments really well. And we're already seeing incredible use. In fact, I don't know if you have seen our launch video showing Marble; it was produced by a virtual production company. We collaborated with Sony, and they used Marble scenes to shoot those videos.
好的,对于那些已经在这个领域进行了大量探索的人来说,可以考虑今天可以开始使用的一些应用。那么,您推出这个产品的目标是什么? 嗯,我们确实认为世界建模应用范围很广,但我们已经看到一些非常激动人心的应用案例。例如,用于电影的虚拟制作,因为他们需要可以与摄像机对齐的3D世界,这样当演员在其中表演时,可以很好地定位摄像机和拍摄场景。事实上,我们已经看到了一些令人难以置信的应用。不知道你是否看过我们的发布视频,其中展示了一个由虚拟制作公司制作的大理石场景。我们与索尼合作,他们使用这些大理石场景来拍摄那些视频。

So we were collaborating with those technical artists and directors, and they were saying this has cut their production time by 40x. 40x? Yes. In fact, it had to, because we only had one month to work on this project, and there were so many scenes they were trying to shoot. So using Marble really significantly accelerated virtual production for VFX and movies. That's one use case. We are also already seeing our users taking our Marble scenes, taking the mesh export, and putting them in games, whether it's games in VR or just fun games that they have developed.
所以我们与那些技术艺术家和导演合作,他们说这确实将我们的制作时间缩短了40倍。实际上,这个项目本身就必须缩短到40倍,因为我们只有一个月的时间来完成,并且有很多场景需要拍摄。因此,使用Marble确实大大加速了特效和电影的虚拟制作。这是我们已经看到的一个用例之一,我们的用户已经在使用我们的Marble场景和网格导出,并将其应用于游戏中,无论是VR游戏还是他们开发的休闲游戏。

We were also showing an example of robotic simulation, because, I mean, I'm still a researcher doing robotic training, and one of the biggest pain points is creating synthetic data for training robots. These synthetic data need to be very diverse; they need to come from different environments with different objects to manipulate. One path to it is to ask computers to simulate; otherwise humans have to, you know, build every single asset for robots, and that's just going to take a lot longer.
我们展示了一个机器人模拟的例子,因为当我作为一个机器人训练的研究人员时,遇到的最大困难之一就是创造合成数据来训练机器人。这些合成数据需要非常多样化,来自不同的环境,并包含需要操作的不同物体。一种解决方法是让计算机进行模拟,否则人类就需要为机器人手动构建每一个细节,那将花费更多的时间。

So we already have researchers reaching out and wanting to use Marble to create those synthetic environments. We also have unexpected user outreach in terms of how they want to use Marble. For example, a team of psychologists called us about using Marble to do psychology research. It turned out that for some of the psychiatric patients they study, they need to understand how their brains respond to immersive scenes with different features, for example messy scenes or clean scenes, or whatever you name it. It's very hard for researchers to get their hands on this kind of immersive scene, and it would take them too long and too much budget to create. Marble is an almost instantaneous way of getting so many of these experimental environments into their hands.
我们已经有研究人员联系我们,希望使用Marble来创建这些合成环境。此外,我们还收到了意想不到的用户需求,例如有一个心理学团队联系了我们,希望使用Marble进行心理学研究。原来,对于他们研究的一些精神疾病患者,他们需要了解患者的大脑对具有不同特征的沉浸式场景的反应,例如混乱场景、整洁场景等。研究人员很难获得这样的沉浸式场景,自行制作会花费太多时间和预算。而Marble几乎可以即时为他们提供大量这样的实验环境。

So now we're seeing multiple use cases at this point, but the VFX artists, the game developers, the simulation developers, as well as designers are very excited. This is very much the way things work in AI. I've had other AI leaders on the podcast, and it's always: put things out there early, as soon as you can, to discover where the big use cases are. The head of ChatGPT told me how, when they first put out ChatGPT, he was just scanning TikTok to see how people were using it and all the things they were talking about, and that's what convinced them where to lean in and helped them see how people actually want to use it.
我们现在看到了多个应用案例,尤其是在视觉特效、游戏开发和仿真开发方面,以及设计师群体中,大家都非常兴奋。这正是人工智能领域的发展方式。我在播客中邀请过其他人工智能领域的领导者,他们总是建议尽早推出产品,以便发现重要的应用案例。ChatGPT的负责人告诉我,他们在首次发布ChatGPT时,他就在浏览TikTok,观察用户如何使用,以及他们正在讨论的内容。这让他们确定了应该在哪些方向上加大投入,并帮助他们了解用户实际想要如何使用它。

I love this last use case for therapy. I'm just imagining people dealing with fear of heights, or snakes, or spiders. It's amazing: a friend of mine literally called me last night, talked about his fear of heights, and asked me if Marble should be used. That's amazing, you went straight there. You know, because I'm imagining all the exposure therapy stuff; this could be so good for that. That is so cool. Okay, so I should have asked you this before, but I think there's going to be a question of just how this differs from things like Veo 3 and other video generation models. It's pretty clear to me, but I think it might be helpful to explain how this is different from all the video AI tools people have seen.
我喜欢这个关于治疗的最后一个用例,我想象了那些对高度、蛇或者蜘蛛有恐惧的人。真是太神奇了,昨天晚上我的一个朋友打电话跟我谈到他的恐高问题,还问我Marble是否可以用来应对。这真是太棒了,你一下子就想到了这一点。因为我想到很多像暴露疗法这样的东西,这样的技术可以对此非常有帮助。太酷了。我本来应该早点问你的,但我觉得这里可能会有一个问题,那就是它和Veo 3或其他视频生成模型有什么不同。尽管对我来说已经很明显,但我认为解释它与人们见过的其他视频AI工具的不同之处可能会很有帮助。

World Labs' thesis is that spatial intelligence is fundamentally very important, and spatial intelligence is not just about videos. In fact, the world is not about passively watching videos passing by, right? I love that Plato has the allegory of the cave to describe vision. He said: imagine a prisoner tied to his chair, not very humane, in a cave, watching a full live theater in front of him. But the actual live theater, where actors are acting, is behind his back. It is just lit so that the projection of the action is on a wall of the cave, and the task of this prisoner is to figure out what's going on.
World Labs 的核心观点是,空间智能从根本上非常重要,而空间智能不仅仅关乎视频。事实上,这个世界并不是被动地观看一段段视频流过。我很喜欢柏拉图用洞穴寓言来描述视觉:他让我们想象一个囚犯被绑在椅子上(这不太人道),在洞穴里观看眼前的一整场现场戏剧,但演员真正表演的地方是在他身后。光线把表演投射在洞穴的墙壁上,囚犯的任务就是弄清楚到底发生了什么。

It's a pretty extreme example, but it really describes what vision is about: making sense of the 3D world, or 4D world, out of 2D. So spatial intelligence to me is deeper than only creating that flat 2D world. Spatial intelligence to me is the ability to create, reason, interact, and make sense of a deeply spatial world, whether it's 2D or 3D or 4D, including dynamics and all that. So World Labs is focusing on that. And of course the ability to create videos per se could be part of this, and in fact just a couple of weeks ago we rolled out the world's first demoable real-time video generation on a single H100 GPU.
这是一个极端的例子,但它确实展示了视觉的本质,就是从二维中理解三维或四维世界的能力。因此,对我来说,空间智能不仅仅是创建一个平面的二维世界。空间智能是能够在深度空间世界中进行创造、推理、互动,并理解其中的奥妙,无论是二维、三维或四维,包括动态等。因此,我们的实验室专注于这一领域。当然,能够创造视频也可以是其中的一部分。实际上,就在几周前,我们推出了世界上第一个能够实时演示的视频生成功能,这只需要一个 H100 GPU。

So part of our technology includes that. But I think Marble is very different, because we really want creators, designers, and developers to have in their hands a model that can give them worlds with 3D structure, so they can use it for their work. That's why Marble is so different.

The way to see it is as a platform with a ton of opportunity to do stuff. As you describe, videos are just: here's a one-off video that's very fun and cool, and that's it, and you move on.

That's it. By the way, in Marble we can allow people to export in video form. So, like you said, you go into a world, let's say a hobbit cave. Especially as a creator, you have a very specific way of moving the camera along a trajectory, in the director's mind, right? And then you can export that from Marble into a video.

What does it take to create something like this? How big is the team, how many GPUs are you working with, anything you can share there? I don't know how much of this is private information, but what does it take to create something like what you launched here?

It takes a lot of brain power. We just talked about 20 watts per brain, so from that point of view it's a small number, but it's actually incredible: it took half a billion years of evolution to give us that power. We have a team of 30-ish people now. We are predominantly researchers and research engineers, but we also have designers and product people. We really believe that we want to create a company that's anchored in the deep tech of spatial intelligence, but we are actually building serious products. So we have this integration of R&D and productization. And of course we use a ton of GPUs.

[unclear] happy to hear.

Well, congrats on the launch. I know it's a huge milestone and I know it took a ton of work, so I just want to say congrats to you and your team. Let me talk about your founder journey for a moment. You're a founder of this company; you started how many years ago? A couple of years ago, two, three years ago?

About a year ago. Okay, 18 months, yeah.

Okay. What's something you wish you knew before you started this, that you wish you could whisper to yourself of 18 months ago?

Well, I continue to wish I knew the future of technology. I think that's actually one of our founding advantages: we see the future earlier, in general, than most people. But still, man, this is so exciting and so amazing, what is unknown and what's coming. But I know the reason you're asking me this question isn't really about the future of technology. Look, I did not start a company of this scale at 20 years old. I started a dry cleaner when I was 19, but that's a little smaller scale.

We've got to talk about that.

And then I founded Google Cloud AI, and then I founded an institute at Stanford, but those are different beasts.

I did feel I was a little more prepared, as a founder, for the grinding journey, compared to maybe the 20-year-old founders. But I'm still surprised, and it puts me into paranoia sometimes, at how intensely competitive the AI landscape is, from the models and the technology itself to talent. When I founded the company, we did not have these incredible stories of how much certain talent would cost. So these are things that continue to surprise me, and I have to be very alert about them.

So the competition you're talking about is, yeah, the competition for talent, and the speed at which things are moving.

Yeah, yeah. You mentioned a point that I want to come back to. If you look over the course of your career, you were at all of the major collections of humans that led to so many of the breakthroughs that are happening today. Obviously we talked about ImageNet; also SAIL at Stanford, where a lot of the work happened; and Google Cloud, where a lot of the breakthroughs happened. What brought you to those places? For people looking for how to advance in their career and be at the center of the future, is there a throughline of what pulled you from place to place and pulled you into those groups that might be helpful for people to hear?

Yeah, this is actually a great question, Lenny, because I do think about it. Obviously, we talked about how it's curiosity and passion that brought me to AI. That is more of a scientific North Star, right? I did not care if AI was a thing or not. So that was one part. But how did I end up choosing the particular places I worked in, including starting World Labs? I think I'm very grateful to myself, or maybe to my parents' genes: I'm an intellectually very fearless person. And I have to say, when I hire young people, I look for that, because I think that's a very important quality.

If you want to make a difference, you have to accept that you're creating something new, or diving into something new that people haven't done. And if you have that self-awareness, you almost have to allow yourself to be fearless and courageous. For example, when I came to Stanford: in the world of academia, I was very close to this thing called tenure, which is, you know, having the job forever, at Princeton. But I chose to come to Stanford. I love Princeton, it's my alma mater; it's just that at that moment there were people who were so amazing at Stanford, and the Silicon Valley ecosystem was so amazing, that I was okay taking the risk of restarting my tenure clock. The same goes for becoming the first female director of SAIL.

I was, relatively speaking, a very young faculty member at that time, and I wanted to do that because I care about that community. I didn't spend too much time thinking about all the failure cases. Obviously I was very lucky that the more senior faculty supported me, but I just wanted to make a difference. Then going to Google was similar: I wanted to work with people like Jeff Dean, Geoff Hinton, and all these incredible people. And it's the same with World Labs. I have this passion, and I also believe that people with the same mission can do incredible things. So that's how I guided myself through. I don't overthink all the possible things that can go wrong, because that's too many.

I feel like that's an important element: not focusing on the downside, focusing more on the people, the mission, what gets you excited.

I think I do.

Yeah, I do want to say one thing to all the young talent in AI, the engineers and researchers out there, because some of you apply to World Labs. I feel very privileged that you consider World Labs. I do find that many young people today think about every single aspect of an equation when they decide on jobs. Maybe that's the way they want to do it, but sometimes I do want to encourage young people to focus on what's important. I find myself constantly in mentoring mode when I talk to job candidates, not necessarily recruiting or not recruiting, just in mentoring mode, when I see an incredible young talent who is over-focusing on every minute dimension of considering a job, when maybe the most important thing is: where is your passion? Do you align with the mission? Do you believe in and have faith in this team? Just focus on the impact you can make, and the kind of work and team you can work with.

Yeah, it's tough. It's tough for people in the AI space now; there's so much coming at them, so much new, so much happening, so much FOMO.

That's true.

I could see the stress, and so I think that advice is really important: what will actually make you feel fulfilled in what you're doing, not just where the fastest-growing company is or who's going to win.

I want to make sure I ask you about the work you're doing today at Stanford, at HAI, the Human-Centered AI Institute. What are you doing there? I know this is a thing you do on the side still.

Yes. HAI, the Human-Centered AI Institute, was co-founded by me and a group of faculty, like Professor John Etchemendy, Professor James Landay, and Professor Chris Manning, back in 2018. I was actually finishing my sabbatical at Google, and it was a very, very important decision for me, because I could have stayed in industry. But my time at Google taught me one thing: AI is going to be a civilizational technology. It dawned on me how important this is to humanity, to the point that I wrote a piece in the New York Times that year, 2018, about the need for a guiding framework to develop and apply AI, and that framework has to be anchored in human benevolence and human-centeredness. I felt that Stanford, one of the world's top universities, in the heart of Silicon Valley, which gave birth to important companies from Nvidia to Google, should be a thought leader in creating this human-centered AI framework and in actually embodying it in our research, education, policy, and ecosystem work.

So I founded HAI. Fast forward six or seven years, and it has become the world's largest AI institute doing human-centered research, education, ecosystem outreach, and policy impact. It involves hundreds of faculty across all eight schools at Stanford, from medicine to education to sustainability to business to engineering to humanities and more. We support researchers, especially in interdisciplinary areas, from the digital economy to legal studies to political science to discovery of new drugs to new algorithms that go beyond transformers.

We also put a very strong focus on policy, because when we started HAI, I realized that Silicon Valley did not talk to Washington, DC, or Brussels, or other parts of the world. Given how important this technology is, we need to bring everybody on board. So we created multiple programs, from a congressional boot camp to the AI Index report to policy briefings, and we participated in policymaking, including advocating for a national AI research cloud bill that was passed in the first Trump administration, and participating in state-level regulatory AI discussions.

So there's a lot we did, and I continue to be one of the leaders, even though I'm much less involved operationally, because I care not only that we create this technology but that we use it in the right way.

Wow, I was not aware of all that other work you were doing. As you were talking, I was reminded that Charlie Munger had this quote: take a simple idea and take it very seriously. I feel like you've done that in so many different ways, and stayed with it, and it's unbelievable the impact that you've had in so many ways over the years. I'm going to skip the lightning round and just ask you one last question: is there anything else you wanted to share, anything else you want to leave listeners with?

I'm very excited by AI, Lenny. I want to answer one question that everybody asks me when I travel around the world: if I'm a musician, if I'm a middle school teacher, if I'm a nurse, if I'm an accountant, if I'm a farmer, do I have a role in AI, or is AI just going to take over my life or my work? I think this is the most important question about AI. And I find that in Silicon Valley we tend not to speak heart to heart with people, with people like us and not like us in Silicon Valley, all of us. We tend to just toss around words like infinite productivity, or infinite leisure time, or infinite power, or whatever. But at the end of the day, AI is about people. When people ask me that question, it's a resounding yes: everybody has a role in AI. It depends on what you do and what you want. But no technology should take away human dignity, and human dignity and agency should be at the heart of the development, the deployment, and the governance of every technology.

So if you are a young artist and your passion is storytelling, embrace AI as a tool. In fact, embrace Marble; I hope it becomes a tool for you. Because the way you tell your story is unique, and the world still needs it. But how you tell your story, how you use the most incredible tools to tell your story in the most unique way, is important, and that voice needs to be heard.

If you're a farmer near retirement, AI still matters, because you're a citizen. You can participate in your community, and you should have a voice in how AI is used and how AI is applied. You work with people whom you can encourage, all of you, to use AI to make life easier for you.

If you're a nurse, I hope you know that, at least in my career, I have worked so much in healthcare research because I feel our healthcare workers should be greatly augmented and helped by AI technology, whether it's smart cameras to feed more information or robotic assistance. Our nurses are overworked and over-fatigued, and as our society ages, we need more help for people to be taken care of. So AI can play that role.

So I just want to say that it's so important that even technologists like me are sincere about this: everybody has a role in AI.

What a beautiful way to end it. Such a tie back to where we started, about how it's up to us, and taking individual responsibility for what AI will do in our lives.

Final question: where can folks find Marble? Where do they go to maybe try to join World Labs if they want to? What's the website, where do people go?

Well, the World Labs website is www.worldlabs.ai. You can find our research progress there; we have technical blogs. You can find Marble, the product, there, and you can sign in there. You can find a link to our job posts there. We're in San Francisco, and we love to work with the world's best talent.

Amazing. Fei-Fei, thank you so much for being here.

Thank you, Lenny.

Bye, everyone.

Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.


