A lot of people call you the godmother of AI. The work you did actually was the spark that brought us out of the AI winter. In the middle of 2015, middle of 2016, some tech companies avoided using the word AI because they were not sure if AI was a dirty word. 2017-ish was the beginning of companies calling themselves AI companies. There's a line, I think from when you were presenting to Congress: there's nothing artificial about AI. It's inspired by people, it's created by people, and most importantly, it impacts people. It's not like I think AI will have no impact on jobs or people. In fact, I believe that whatever AI does currently or in the future is up to us. It's up to the people. I do believe technology is a net positive for humanity. But I think every technology is a double-edged sword. If we're not doing the right thing as a society, as individuals, we can screw this up as well.
You had this breakthrough insight of, okay, we can train machines to think like humans, but they're just missing the data that humans have to learn from as a child. I chose to look at artificial intelligence through the lens of visual intelligence because humans are deeply visual animals. We need to train machines with as much information as possible, with images of objects. But objects are very, very difficult to learn. A single object can show up in an image in infinite ways, so in order to train computers with tens of thousands of object concepts, you really need to show them millions of examples.
Today, my guest is Dr. Fei-Fei Li, who's known as the godmother of AI. Fei-Fei has been responsible for, and at the center of, many of the biggest breakthroughs that sparked the AI revolution we are currently living through. She spearheaded the creation of ImageNet, which was basically her realizing that AI needed a ton of clean, labeled data to get smarter. And that dataset became the breakthrough that led to the current approach to building and scaling AI models.
She was chief AI scientist at Google Cloud, which is where some of the biggest early technology breakthroughs emerged from. She was director of SAIL, Stanford's artificial intelligence lab, which many of the biggest AI minds came out of. She's also co-director of Stanford's Human-Centered AI Institute, which is playing a vital role in the direction that AI is taking. She's also been on the board of Twitter, she was named one of TIME's 100 Most Influential People in AI, and she's also on a United Nations advisory board. I could go on.
In our conversation, Fei-Fei shares a brief history of how we got to today in the world of AI, including this mind-blowing reminder that nine to ten years ago, calling yourself an AI company was basically a death knell for your brand, because no one believed that AI was actually going to work. Today, it's completely different: every company is an AI company. We also chat about her take on how she sees AI impacting humanity in the future, how far current technologies will take us, why she's so passionate about building world models, and what exactly world models are.
And most exciting of all, the launch of the world's first large world model, Marble, which launches just as this podcast comes out. Anyone can go play with it at marble.worldlabs.ai. It's insane. Definitely check it out. Fei-Fei is incredible and way too under the radar for the impact that she's had on the world, so I am really excited to have her on and to spread her wisdom with more people. A huge thank you to Ben Horowitz and Condoleezza Rice for suggesting topics for this conversation. If you enjoyed this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube.
With that, I bring you Dr. Fei-Fei Li, after a short word from our sponsors. This episode is brought to you by Figma, makers of Figma Make. When I was a PM at Airbnb, I still remember when Figma came out and how much it improved how we operated as a team. Suddenly, I could involve my whole team in the design process, get feedback on design concepts really quickly, and it just made the whole product development process so much more fun.
But Figma never felt like it was for me. It was great for giving feedback on designs, but as a builder, I wanted to make stuff. That's why Figma built Figma Make. With just a few prompts, you can turn any idea or design into a fully functional prototype or app that anyone can iterate on and validate with customers. Figma Make is a different kind of vibe coding tool. Because it's all in Figma, you can use your team's existing design building blocks, making it easy to create outputs that look good, feel real, and are connected to how your team builds.
Stop spending so much time telling people about your product vision and instead show it to them. Make code-backed prototypes and apps fast with Figma Make. Check it out at figma.com slash money. Did you know that I have a whole team that helps me with my podcast and with my newsletter? I want everyone on my team to be super happy and thrive in their roles. JustWorks knows that your employees are more than just your employees; they're your people. My team is spread out across Colorado, Australia, Nepal, West Africa, and San Francisco. My life would be so incredibly complicated if I had to hire people internationally, pay people on time and in their local currencies, and answer their HR questions 24/7 on my own. But with JustWorks, it's super easy. Whether you're setting up your own automated payroll, offering premium benefits, or hiring internationally, JustWorks offers simple software and 24/7 human support from small business experts for you and your people. They do your human resources right so that you can do right by your people.
JustWorks for your people. Fei-Fei, thank you so much for being here and welcome to the podcast. I'm excited to be here, Lenny. I'm even more excited to have you here. It is such a treat to get to chat with you. There's so much that I want to talk about. You've been at the center of this AI explosion that we're seeing right now for so long. We're going to talk about a bunch of the history that I think a lot of people don't even know about how this whole thing started. But let me first read a quote from Wired about you, just so people get a sense. In the intro, I'll share all of the other epic things you've done, but I think this is a good way to set context. Fei-Fei is one of a tiny group of scientists, a group perhaps small enough to fit around a kitchen table, who are responsible for AI's recent remarkable advances. A lot of people call you the godmother of AI. Unlike a lot of AI leaders, you're an AI optimist. You don't think AI is going to replace us. You don't think it's going to take all our jobs. You don't think it's going to kill us. So I thought it'd be fun to start there. Just, what's your perspective on how AI is going to impact humanity over time?
Yeah. Okay. So Lenny, let me be very clear. I'm not a utopian. It's not like I think AI will have no impact on jobs or people. In fact, I'm a humanist. I believe that whatever AI does currently or in the future is up to us. It's up to the people. So I do believe technology is a net positive for humanity, if you look at the long course of civilization. I think fundamentally we're an innovative species. If you look from the written records of thousands of years ago to now, humans have just kept innovating ourselves and innovating our tools, and with that, we make lives better, we make work better, we build civilization. And I do believe AI is part of that. So that's where the optimism comes from. But I think every technology is a double-edged sword.
And if we're not doing the right thing, as a species, as a society, as communities, as individuals, we can screw this up as well. There's this line, I think this was when you were presenting to Congress: there's nothing artificial about AI. It's inspired by people, it's created by people, and most importantly, it impacts people. I don't have a question there, but what a great line. Yeah, I feel it pretty deeply. You know, I started working in AI two and a half decades ago and I've been having students for the past two decades. And I remind almost every student who graduates from my lab that your field is called artificial intelligence, but there's nothing artificial about it.
Coming back to the point you just made about how it's kind of up to us where this all goes, what is it you think we need to get right? How do we set things on a good path? I know this is a very difficult question to answer, but what's your advice? What do you think we should keep in mind? Yeah, like how many hours do we have? How do we align AI? There we go, let's solve it. Yeah, so I think people should be responsible individuals, no matter what we do. This is what we teach our children and this is what we need to do as grown-ups as well, no matter which part of AI development or AI deployment or AI application you are participating in. And most likely many of us, especially as technologists, are at multiple of those points. We should act like responsible individuals and care, actually care a lot, about this.
I think everybody today should care about AI because it is going to impact your individual life, it is going to impact your community, it's going to impact society and future generations, and caring about it as a responsible person is the first but also the most important step. Okay, so let me actually take a step back and kind of go to the beginning of AI. Most people started hearing about and caring about AI, as it's called today, just, I don't know, a few years ago when ChatGPT came out. Maybe it was like three years ago. Three years ago, almost; one more month and it'll be three years. Wow, okay, so ChatGPT coming out, is that the milestone? Yeah. Okay, cool, that's exactly how I saw it. But very few people know there was a long, long history of people working on this. It was called machine learning back then, and there are other terms, and now everything's just AI.
And there was kind of a long period of just a lot of people working on it, and then there's what people refer to as the AI winter, where people almost gave up, most people did, and just said, okay, this idea isn't going anywhere. And then the work you did was essentially the spark that brought us out of the AI winter and is directly responsible for the world we're in now, where AI is all we talk about and, as you just said, it's going to impact everything we do. So it would be really interesting to hear from you just kind of the brief history: what the world was like before ImageNet, then the work you did to create ImageNet, why that was so important, and then what happened after.
It is hard for me to keep in mind that AI is so new for everybody, when I've lived my entire professional life in AI. There's a part of me that finds it so satisfying to see a personal curiosity that I started barely out of teenage hood become a transformative force of our civilization. It genuinely is a civilization-level technology. So that journey is about 30 years, or 20-something, 20-plus years, and it's just very satisfying. So where did it all start? Well, I'm not even a first-generation AI researcher. The first generation really dates back to the '50s and '60s, and, you know, Alan Turing was ahead of his time in the '40s by daring humanity with the question: can there be thinking machines?
Right. And of course, he had a specific way of testing this concept of a thinking machine, which is a conversational chatbot, and by his standard, we now have a thinking machine. But that was more of an anecdotal inspiration. The field really began in the '50s when computer scientists came together and looked at how we can use computer programs and algorithms to build programs that can do things that only human cognition had been capable of. And that was the beginning, with the founding fathers at the Dartmouth workshop in 1956. You know, we have Professor John McCarthy, who later came to Stanford, who coined the term artificial intelligence.
And between the '50s, '60s, '70s, and '80s, it was the early days of AI exploration: we had logic systems, we had expert systems, and we also had early explorations of neural networks. And then it came to around the late '80s, the '90s, and the very beginning of the 21st century. That stretch of about 20 years is actually the beginning of machine learning. It's the marriage between computer programming and statistical learning. And that marriage brought a very, very critical concept into AI, which is that purely rule-based programs are not going to account for the vast amount of cognitive capability that we imagine computers can have. So we have to use machines to learn the patterns.
Once machines can learn the patterns, they have the hope of doing more things. For example, if you give it three cats, the hope is not just for the machine to recognize these three cats. The hope is the machine can recognize the fourth cat, the fifth cat, the sixth cat, and all the other cats. And that's a learning ability that is fundamental to humans and many animals. And we as a field realized we need machine learning. So that was up until the beginning of the 21st century. I entered the field of AI literally in the year 2000. That's when my PhD began at Caltech. And so I was one of the first generation of machine learning researchers.
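As a small aside, here is a toy, purely hypothetical sketch of that "fourth cat" idea: a classifier learns a pattern from labeled examples and then recognizes examples it never saw during training. The data and 2D "features" below are made up for illustration only; they are not how any real vision system, let alone ImageNet-era work, represents images.

```python
# Toy illustration of learning a pattern from labeled examples and
# generalizing to new, unseen examples. Synthetic 2D "features" only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend "cat" images cluster around one point in feature space, "dog" around another.
cats = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
dogs = rng.normal(loc=3.0, scale=1.0, size=(50, 2))
X = np.vstack([cats, dogs])
y = np.array([0] * 50 + [1] * 50)  # 0 = cat, 1 = dog

model = LogisticRegression().fit(X, y)  # learn the pattern from labeled data

# Brand-new "cats" the model never saw: the learned pattern should still apply.
new_cats = rng.normal(loc=0.0, scale=1.0, size=(5, 2))
print(model.predict(new_cats))  # expected: mostly 0s
```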
And we were already studying this concept of machine learning, especially neural networks. I remember one of my first courses at Caltech was called neural networks. But it was very painful. It was still smack in the middle of the so-called AI winter, meaning the public didn't pay much attention to this and there wasn't that much funding. But there were also a lot of ideas flowing around. And I think two things happened to me that brought my own career so close to the birth of modern AI. One is that I chose to look at artificial intelligence through the lens of visual intelligence, because humans are deeply visual animals. We can talk a little more about that later, but so much of our intelligence is built upon visual, perceptual, spatial understanding, not just language per se. I think they're complementary.
So I chose to look at visual intelligence. And in my PhD and my early professor years, my students and I were very committed to a North Star problem, which is solving the problem of object recognition, because it's a building block for the perceptual world. Right? We go around the world interpreting it, reasoning about it, and interacting with it more or less at the object level. We don't interact with the world at the molecular level; or we sometimes do, but rarely. For example, if you want to lift a teapot, you don't say, okay, the teapot is made of a hundred pieces of porcelain, let me work on these hundred pieces. You look at it as one object and interact with it.
So objects are really important, and I was among the first researchers to identify this as a North Star problem. But I think what happened is that, as a student of AI and later a researcher of AI, I was working on all kinds of mathematical models, including neural networks, including Bayesian networks, including many, many models. And there was one singular pain point: these models didn't have data to be trained on. As a field, we were so focused on these models, but it dawned on me that human learning, as well as evolution, is actually a big-data learning process. Humans learn from so much experience, constantly, and so does evolution.
If you look across time, animals evolve by experiencing the world. So my students and I conjectured that a very critically overlooked ingredient of bringing AI to life is big data. And then we began this ImageNet project in 2006, 2007. We were very ambitious. We wanted to get the entire internet's image data on objects. Now granted, the internet was a lot smaller than today, so I feel like that ambition was at least not too crazy; now it would be totally delusional to think a couple of graduate students and a professor could do this. But that's what we did. We very carefully curated 15 million images from the internet and created a taxonomy of 22,000 concepts, borrowing other researchers' work, like linguists' work on WordNet.
WordNet is a particular way of organizing words into a dictionary. We combined that into ImageNet, we open-sourced it to the research community, and we held an annual ImageNet challenge to encourage everybody to participate. We continued to do our own research. But 2012 was the moment that many people think was the beginning of deep learning, or the birth of modern AI, because a group of Toronto researchers led by Professor Geoff Hinton participated in the ImageNet challenge, used the ImageNet big data and two GPUs from Nvidia, and successfully created the first neural network algorithm that, while it didn't totally solve the problem of object recognition, made huge progress towards solving it.
And that combination of the trio of technologies, big data, neural networks, and GPUs, was kind of the golden recipe for modern AI. And then fast forward to the public moment of AI, which is the ChatGPT moment: if you look at the ingredients of what brought ChatGPT to the world, technically it still used these three ingredients. Now it's internet-scale data, mostly text; it's a much more complex neural network architecture than in 2012, but it's still a neural network; and it's a lot more GPUs, but it's still GPUs. So these three ingredients are still at the core of modern AI. Incredible. I have never heard that full story before.
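As a concrete aside, here is a minimal, hypothetical PyTorch sketch of how the three ingredients she names fit together: labeled image data, a neural network, and GPU compute. This is not ImageNet or AlexNet; the data is random and the model, sizes, and hyperparameters are placeholders purely for illustration.

```python
# Minimal sketch of the "golden recipe": labeled data + neural network + GPU.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"  # ingredient 3: GPU compute

# Ingredient 1: labeled image data (random tensors standing in for curated images).
images = torch.randn(512, 3, 32, 32)
labels = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

# Ingredient 2: a small convolutional neural network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 8 * 8, 10),
).to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)  # supervised learning from labeled examples
        loss.backward()
        optimizer.step()
```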
I love that it was two GPUs at first, and now it's, I don't know, hundreds of thousands, right, that are orders of magnitude more powerful. And those two GPUs were gaming GPUs; they just went to, like, the game store, right, they were what people used for playing games. As you said, this continues to be, in a large way, the way models get smarter. Some of the fastest-growing companies in the world right now, and I've had most of them on the podcast, Mercor, Surge, and Scale, they continue to do this for labs: just give them more and more labeled data on the things they're most excited about.
Oh yeah, I remember Alex Wang from Scale in the very early days. I probably still have his emails from when he was starting Scale. He was very kind; he kept sending me emails about how ImageNet inspired Scale. I was very pleased to see that. One of my other favorite takeaways from what you just shared is that it's such an example of high agency and just doing things. That's kind of a meme on Twitter, "you can just do things." You were just like, okay, this is probably necessary to move AI forward. And it was called machine learning back then, right? Was that the term most people used?
I think the terms were used interchangeably. It's true, I do remember the tech companies, I am not going to name names, but I was in a conversation in one of the early days, I think it was in the middle of 2015, middle of 2016, and some tech companies avoided using the word AI because they were not sure if AI was a dirty word. And I remember I was actually encouraging everybody to use the word AI, because to me that is one of the most audacious questions humanity has ever asked in our quest for science and technology, and I feel very proud of this term. But yes, at the beginning some people were not sure.
What year was that, roughly, when AI was already working? 2016, I think, less than ten years ago. That was the turning point, when some people started calling it AI. But if you look at the Silicon Valley tech companies, if you trace their marketing terms, I think 2017-ish was the beginning of companies calling themselves AI companies. That's incredible, just how the world has changed. Now you can't not call yourself an AI company. I know. Nine-ish years later.
Yeah. Oh man. Okay, is there anything else around that early history that you think people don't know, that you think is important, before we chat about where things are going and the work that you're doing? I think, as with all histories, you know, I'm keenly aware that I am recognized for being part of the history, but there are so many heroes and so many researchers. We're talking about generations of researchers. In my own world there are so many people who have inspired me, which I talked about in my book. But I do feel our culture, especially Silicon Valley, tends to assign achievements to a single person, and while I think that has value, it's just to be remembered that AI is a field that is at this point 70 years old, and we have gone through many generations.
No one could have gotten here by themselves. Okay, let me ask you this question. It feels like we're always on this precipice of AGI, this kind of vague term people throw around: AGI is coming, it's going to take over everything. What's your take on how far we might be from AGI? Do you think we're going to get there on the current trajectory, or do you think we need more breakthroughs? Do you think the current approach will get us there?
Yeah, this is a very interesting term, Lenny. I don't know if anyone has ever defined AGI. You know, there are many different definitions, from some kind of superpower for machines all the way to whether machines can become economically viable agents in society, in other words, earning salaries to live. Is that a definition of AGI? As a scientist, I take science very seriously, and I entered the field because I was inspired by this audacious question: can machines think and do things in the way that humans can do?
For me that's always been the north star of AI, and from that point of view I don't know what the difference is between AI and AGI. I think we've done very well in achieving parts of the goal, including conversational AI, but I don't think we have completely conquered all the goals of AI. And I think of our founding fathers, like Alan Turing. I wonder, if Alan Turing were around today and you asked him to contrast AI versus AGI, he might just shrug and say, well, I asked the same question back in the 1940s. So I don't want to get into a rabbit hole of defining AI versus AGI.
I feel AGI is more a marketing term than a scientific term. As a scientist and technologist, AI is my north star, it's my field's north star, and I'm happy for people to call it whatever name they want to call it. Let me ask you maybe this way. Like you described, there are these components that, from ImageNet and AlexNet, kind of took us to where we are today: GPUs, data, essentially labeled data, and the algorithm of the model itself. There's also the transformer, which feels like an important step in that trajectory.
Do you feel like those are the same components that'll get us to, I don't know, a 10-times-smarter model, something that's life-changing for the entire world? Or do you think we need more breakthroughs? I know we're going to talk about world models, which I think is a component of this, but is there anything else that you think is, oh, this will plateau, or, okay, this will take us there, we just need more data, more compute, more GPUs?
Oh no, I definitely think we need more innovations. I think with scaling laws, more data, more GPUs, and bigger versions of current model architectures, there's still a lot to be done there, but I absolutely think we need to innovate more. There's not a single deeply scientific discipline in human history that has arrived at a place that says, we're done, we're done innovating. And AI is one of the, if not the, youngest disciplines in human civilization in terms of science and technology. We're still scratching the surface.
For example, like I said, we're going to segue into world models: today you take a model and run it through a video of a couple of office rooms and ask the model to count the number of chairs, and this is something a toddler could do, or maybe an elementary school kid could do, and AI cannot do that, right? So there's just so much AI today cannot do, let alone thinking about how someone like Isaac Newton looked at the movements of the celestial bodies and derived an equation, or a set of equations, that governs the movement of all bodies. That level of creativity, extrapolation, abstraction, we have no way of enabling AI to do that today.
And then let's look at emotional intelligence. If you look at a student coming to a teacher's office and having a conversation about motivation, passion, what to learn, what's the problem that's really bothering you, that conversation, as powerful as today's conversational bots are, you don't get that level of emotional, cognitive intelligence from today's AI. So there's a lot we can do better, and I do not believe we're done innovating.
Demis from DeepMind slash Google had this really interesting interview recently where someone asked him, what do you think, how far are we from AGI, what does it look like when we're there? He had a really interesting way of approaching it: if we were to give the most cutting-edge model all the information up until the end of the 20th century, could it come up with all the breakthroughs Einstein had? And so far we're nowhere near that. In fact, it's even worse. Let's give AI all the data, including modern instruments' data on celestial bodies, which Newton did not have, and just ask AI to create the 17th-century set of equations on the laws of the movement of bodies. Today's AI cannot do that. All right, we're ways away.
So, okay, let's talk about world models. To me, this is just another really amazing example of you being ahead of where people end up. You were way ahead on, okay, we just need a lot of clean data for AI and neural networks to learn. You've been talking about this idea of world models for a long time. You started a company to build, essentially, well, there are language models, and this is a different thing, this is a world model, we'll talk about what that is. And now, as I was preparing for this, Elon is talking about world models, Jensen is talking about world models, I know Google's working on this stuff. You've been at this for a long time, and you actually just launched something that you're going to talk about, right before this podcast airs.
Talk about what a world model is and why it's so important. I'm very excited to see that more and more people are talking about world models, like Elon, like Jensen. I have been thinking about how to really push AI forward all my life, right, and the large language models that came out of the research world, and then OpenAI and all this, for the past few years were extremely inspiring, even for a researcher like me. I remember when GPT-2 came out, and that was, I think, late 2020. I was co-director, I still am, but I was at that time full-time co-director of Stanford's Human-Centered AI Institute, and I remember the public was not aware of the power of large language models yet, but as researchers, we were seeing it.
And I had pretty long conversations with my natural language processing colleagues like Percy Liang and Chris Manning. We were talking about how critical this technology was going to be, and Stanford's Human-Centered AI Institute, HAI, was the first one to establish a full research center on foundation models. Percy Liang and many researchers led the first academic paper on foundation models. So it was just very inspiring for me.
So of course I come from the world of visual intelligence, and I was thinking there's so much we can push forward on beyond language, because humans have used our sense of spatial intelligence, our world understanding, to do so many things, and they are beyond language. Think about a very chaotic first-responder scene, whether it's a fire or a traffic accident or a natural disaster. If you immerse yourself in a scene like that and think about how people organize themselves to rescue people, to stop further disaster, to put out fires, a lot of that is movement, is spontaneous understanding of objects, worlds, humans, and situational awareness. Language is part of that, but in a lot of those situations language cannot get you to put out a fire. So that is what I was thinking about a lot. In the meantime, I was doing a lot of robotics research, and it dawned on me that the linchpin connecting this additional intelligence beyond language, connecting embodied AI, which is robotics, connecting visual intelligence, is the sense of spatial intelligence, of understanding the world. And that's when, I think it was 2024, I gave a TED Talk about spatial intelligence and world models. I had started formulating this idea back in 2022, based on my robotics and computer vision research. And then one thing that was really clear to me is that I really wanted to work with the brightest technologists and move as fast as possible to bring this technology to life, and that's when we founded this company called World Labs. You can see the word "world" is in the name of our company, because we believe so much in world modeling and spatial intelligence.
People are so used to chatbots, and that's a large language model. Is the simple way to understand a world model that you basically describe a scene and it generates an infinitely explorable world? We'll link to the thing you launched, which we'll talk about, but is that a simple way to understand it? That's part of it, Lenny. I think a simple way to understand a world model is that this model allows anyone to create any world in their mind's eye by prompting, whether with an image or a sentence, and also to interact in this world, whether you're browsing and walking around, or picking objects up, or changing things, as well as to reason within this world. For example, if the agent consuming the output of the world model is a robot, it should be able to plan its path and help to, you know, tidy the kitchen, for example. So a world model is a foundation that you can use to reason, to interact, and to create worlds. Great. Yeah, so robots. It feels like that's potentially the next big focus for AI researchers, in terms of the impact on the world, and what you're saying here is this is a key missing piece of making robots actually work in the real world: understanding how the world works.
Yeah, well, first of all, I do think there's more than robots that's exciting, but I agree with everything you just said. I think world modeling and spatial intelligence is a key missing piece of embodied AI. I also think, let's not underestimate that humans are embodied agents, and humans can be augmented by AI's intelligence. Just like today, humans are language animals, but we're very much augmented by AI helping us do language tasks, including software engineering. I think we shouldn't underestimate, or maybe we tend not to talk about, how humans as embodied agents can actually benefit so much from world models and spatially intelligent models, just as robots can. So the big unlocks here: robots, which is a huge deal if this works out, I imagine each of us has robots doing a bunch of stuff for us, they help us with disasters; things like games, obviously, is a really cool example, just infinitely playable games that you invent in your head; and then creativity, just having fun, being creative, thinking of magical, wild new worlds and environments; and also design, humans designing everything from machines to buildings to homes; and also scientific discovery.
Right, there is so much. I like to use the example of the discovery of the structure of DNA. One of the most important pieces in DNA's discovery history is the X-ray diffraction photo that was captured by Rosalind Franklin. It was a flat 2D photo of a structure that looks like a cross with diffractions; you can google those photos. But with that 2D flat photo, humans, especially two important humans, James Watson and Francis Crick, in addition to their other information, were able to reason in 3D space and deduce the highly three-dimensional double helix structure of DNA. And that structure cannot possibly be 2D. You cannot think in 2D and deduce that structure; you have to think in 3D space, use human spatial intelligence. So I think even for scientific discovery, spatial intelligence, or AI-assisted spatial intelligence, is critical.
This is such an example of, I think it was Chris Dixon who has this line, that the next big thing is going to start off feeling like a toy. When ChatGPT first came out, I remember Sam Altman just tweeted, like, here's a cool thing we're playing with, check it out. Now it's the fastest-growing product in all of history and it changed the world. Yeah. And it's oftentimes the things that just look like, okay, this is cool, it's fun to play with, that end up changing the world the most.
Yeah. This episode is brought to you by Sinch, the customer communications cloud. Here's the thing about digital customer communications: whether you're sending marketing campaigns, verification codes, or account alerts, you need them to reach users reliably. That's where Sinch comes in. Over 150,000 businesses, including eight of the top 10 largest tech companies globally, use Sinch's API to build messaging, email, and calling into their products. And there's something big happening in messaging that product teams need to know about: rich communication services, or RCS. Think of RCS as SMS 2.0. Instead of getting texts from a random number, your users will see your verified company name and logo without needing to download anything new. It's a more secure and branded experience, plus you get features like interactive carousels and suggested replies. And here's why this matters: US carriers are starting to adopt RCS. Sinch is already helping major brands send RCS messages around the world, and they're helping Lenny's Podcast listeners get registered first before the rush hits the US market. Learn more and get started at sinch.com/lenny. That's s-i-n-c-h dot com slash Lenny.
I reached out to Ben Horowitz, who loves what you're doing, a big fan of yours. They're investors, I believe? Yeah, we've known each other for many years, but yes, right now they are investors in World Labs. Amazing. Okay, so I asked him what I should ask you about, and he suggested asking you: why is the bitter lesson alone not likely to work for robots? So first of all, just explain what the bitter lesson was in the history of AI, and then why that won't get us to where we want to be with robots.
Well, first of all, there are many bitter lessons, but the bitter lesson everybody refers to is a paper written by Richard Sutton, who won the Turing Award recently and does a lot of reinforcement learning. Richard said, right, if you look at the history, especially the algorithmic development of AI, it turns out a simpler model with a ton of data always wins at the end of the day, instead of, you know, a more complex model with less data. That paper actually came years after ImageNet. To me that was not a bitter lesson, it was a sweet lesson; that's why I built ImageNet, because I believed that big data plays that role.
So why can't the bitter lesson alone work in robotics? Well, first of all, I think we need to give credit to where we are today. Robotics is very much in the early days of experimentation; the research is not nearly as mature as, say, language models. So many people are still experimenting with different algorithms, and some of those algorithms are driven by big data, so I do think big data will continue to play a role in robotics. But here is what's hard for robotics. There are a couple of things. One is that it's harder to get data, a lot harder to get data. You can say, well, there's web data, and this is where the latest robotics research is using web videos, and I think web videos do play a role. But if you think about what made language models work: as someone who does computer vision and spatial intelligence and robotics, I'm very jealous of my colleagues in language, because they had this perfect setup where their training data are in words, eventually tokens, and they produce a model that outputs words. So you have this perfect alignment between what you hope to get, which we call the objective function, and what your training data looks like.
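As an aside, here is a toy sketch of that "perfect alignment" point, under the assumption that the language model's job is next-token prediction: the training data is tokens and the output is tokens, so the objective supervises the model directly. The model below is deliberately trivial (an embedding plus a linear layer rather than a real transformer), with made-up sizes, just to show the shape of the setup; it is not anyone's actual training pipeline.

```python
# Toy view of the alignment between language-model training data and outputs.
import torch
import torch.nn as nn

vocab_size, dim = 1000, 64
token_ids = torch.randint(0, vocab_size, (8, 33))  # a batch of token sequences

# Stand-in for a language model: tokens in, a distribution over tokens out.
lm = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))

inputs, targets = token_ids[:, :-1], token_ids[:, 1:]   # inputs and targets are both tokens
logits = lm(inputs)                                      # predictions are also over tokens
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), targets.reshape(-1))
# The objective function (next-token cross-entropy) lives in exactly the same
# space as the training data, which is the alignment described above.
```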
But robotics is different, and even spatial intelligence is different. You hope to get actions out of robots, but your training data lacks actions in the 3D world, and that's what robots have to do, right, actions in the 3D world. So you have to find different ways to fit, what do they call it, a square peg in a round hole. What we have is tons of web videos, so then we have to start talking about adding supplementary data such as teleoperation data or synthetic data, so that robots can be trained under this hypothesis of the bitter lesson, which is a large amount of data. I think there's still hope, because even what we are doing in world modeling will really unlock a lot of this information for robots. But I think we have to be careful, because we're in the early days and the bitter lesson is still to be tested, because we haven't fully figured out the data. Another part of the bitter lesson for robotics that I think we should be very realistic about is that, again, compared to language models or even spatial models, robots are physical systems. So robots are closer to self-driving cars than to a large language model, and that's very important to recognize.
That means that in order for robots to work, we not only need brains, we also need the physical body, and we also need application scenarios. If you look at the history of the self-driving car, my colleague Sebastian Thrun took Stanford's car to win the first DARPA challenge in 2006 or 2005. It's been 20 years from that prototype of a self-driving car being able to drive 130 miles in the Nevada desert to today's Waymo on the streets of San Francisco, and we're not even done yet, there's still a lot. So that's a 20-year journey, and self-driving cars are much simpler robots: they're just metal boxes running on 2D surfaces, and the goal is not to touch anything. Robots are 3D things running in a 3D world, and the goal is to touch things. So the journey is going to have many aspects and elements. Of course, one could say, well, the self-driving car's early algorithms were from the pre-deep-learning era, so deep learning is accelerating the brains, and I think that's true; that's why I'm in robotics, that's why I'm in spatial intelligence, and I'm excited by it. But in the meantime, the car industry is very mature, and productizing also involves the mature use cases, supply chains, the hardware.
So I think it's a very interesting time to work on these problems, but it's right that we might still be subject to a number of bitter lessons doing this work. Do you ever just feel awe for the way the brain works and is able to do all of this for us, just the complexity? Just to get a machine to walk around and not hit things and not fall over, does it give you more respect for what we've already got? Totally. We operate on about 20 watts. That's dimmer than any light bulb in the room I'm in right now, and yet we can do so much. So I think, actually, the more I work in AI, the more I respect humans. Let's talk about this product you just launched, called Marble, a very cute name. Talk about what this is and why it's important. I've been playing with it, it's incredible. We'll link to it for folks to check it out. What is Marble?
Yeah, I'm very excited. So first of all, Marble is one of the first products that World Labs has rolled out. World Labs is a frontier foundation model company. We were founded by four co-founders who have deep technical histories; my co-founders are Justin Johnson, Christoph Lassner, and Ben Mildenhall. We all come from the research fields of AI, computer graphics, and computer vision, and we believe that spatial intelligence and world modeling is as important as, if not more important than, language models, and complementary to language models. So we wanted to seize this opportunity to create a deep-tech research lab that can connect the dots between frontier models and products. Marble is an app that's built upon our frontier models. We've spent a year-plus building the world's first generative model that can output genuinely 3D worlds. That's a very, very hard problem, and it was a very hard process. We have an incredible founding team of incredible technologists who came from incredible teams.
And then, around a month or two ago, we saw for the first time that we can just prompt with a sentence, an image, or multiple images and create worlds that we can navigate in. If you put on goggles, which we have an option to let you do, you can even walk around, right? So even though we've been building this for quite a while, it was still just awe-inspiring, and we wanted to get it into the hands of people who need it. And we know that so many creators, designers, people who are thinking about robotic simulation, people who are thinking about different use cases of navigable, interactable, immersive worlds, game developers, will find this useful. So we developed Marble as a first step. It's, again, still very early, but it's the world's first model doing this, and it's the world's first product that allows people to just prompt, we call it prompt
to worlds. Well, I've been playing around with it, and it is insane. Like, you could just have a little Shire world where you just infinitely walk around Middle-earth, basically, and there's no one there yet, but it's insane, you can just go anywhere. There are, like, dystopian worlds; I'm just looking at all these examples. Yes. And my favorite part, actually, I don't know if it's a feature or a bug, is you can see, like, the dots of the world before it actually renders with all the textures, and I just love that you get a glimpse into what is going on with this model, basically. That is so cool to hear, because this is where, as a researcher, I'm learning, because the dots that lead you into the world were an intentional feature visualization. It is not part of the model; the model actually just generates the world. But we were trying to find a way to guide people into the world, and a number of engineers worked on different versions, and we converged on the dots.
And of so many people, you're the only one who has told us how delightful that experience is, and it was really satisfying for us to hear that this intentional visualization feature, which is not just the big hardcore model, has actually delighted our users. Wow, so you added that to help humans understand more of what's happening? That is hilarious. It makes me think of, it's not the same thing, but the way models talk about what they're thinking and what they're doing. Yes. It also makes me think about The Matrix, like it's exactly the Matrix experience. I don't know if that was your inspiration. Like I said, a number of engineers worked on that; it could be their inspiration, it's in there, it's in their subconscious. Yeah.
Okay, so for folks that want to play around with this, what are some applications today that folks can start using it for, and what's your goal with this launch? Yeah, so we do believe that world modeling is very horizontal, but we're already seeing some really exciting use cases. Virtual production for movies, because what they need are 3D worlds that they can align with the camera, so when the actors are acting in it, they can position the camera and shoot the segments really well. We're already seeing incredible use there. In fact, I don't know if you have seen our launch video showing Marble; it was produced by a virtual production company, we collaborated with Sony, and they used Marble scenes to shoot those videos.
So we were collaborating with those technical artists and directors, and they were saying this has cut their production time by 40x. In fact, it has. 40x? Yes. In fact, it had to, because we only had one month to work on this project and there were so many scenes they were trying to shoot. So using Marble really significantly accelerated virtual production for VFX and movies. That's one use case. We are also already seeing our users taking a Marble scene, taking the mesh export, and putting it into games, whether it's games in VR or just fun games that they have developed.
We also showed an example of robotic simulation, because, I mean, I'm still a researcher doing robotic training, and one of the biggest pain points is creating synthetic data for training robots. This synthetic data needs to be very diverse; it needs to come from different environments with different objects to manipulate. And one path to it is to ask computers to simulate; otherwise, humans have to build every single asset for robots, and that's just going to take a lot longer.
So we already have researchers reaching out and wanting to use Marble to create those synthetic environments. We also have unexpected user outreach in terms of how they want to use Marble. For example, a psychology team called us about using Marble to do psychology research. It turned out that for some of the psychiatric patients they study, they need to understand how their brains respond to immersive scenes with different features, for example, messy scenes or clean scenes or whatever you name it. It's very hard for researchers to get their hands on this kind of immersive scene, and it would take them too long and too much budget to create them, and Marble is an almost instantaneous way of getting so many of these experimental environments into their hands.
So we're seeing multiple use cases at this point, but the VFX folks, the game developers, the simulation developers, as well as designers, are very excited. This is very much the way things work in AI. I've had other AI leaders on the podcast, and it's always: put things out there early, as soon as you can, to discover where the big use cases are. The head of ChatGPT told me how, when they first put out ChatGPT, he was just scanning TikTok to see how people were using it and all the things they were talking about, and that's what convinced them where to lean in and helped them see how people actually want to use it.
I love this last use case, for therapy. I'm just imagining, like, heights, people dealing with a fear of heights or snakes or spiders. Which is amazing, because a friend of mine last night literally called me and talked about his fear of heights and asked me if Marble could be used for that. That's amazing, you went straight there, because I'm imagining all the exposure therapy stuff; this could be so good for that. That is so cool. Okay, so let me ask, I should have asked you this before, but I think there's going to be a question of just how does this differ from things like Veo 3 and other video generation models. It's pretty clear to me, but I think it might be helpful just to explain how this is different from all the video AI tools people have seen.
World Labs' thesis is that spatial intelligence is fundamentally very important, and spatial intelligence is not just about videos. In fact, the world is not about passively watching videos passing by, right? I love that Plato has the allegory of the cave to describe vision. He said, imagine a prisoner tied to his chair, not very humane, in a cave, watching what looks like a live theater in front of him, but the actual live theater where the actors are acting is behind his back. It is just lit so that the projection of the action is on a wall of the cave, and the task of this prisoner is to figure out what's going on.
It's a pretty extreme example, but it really describes what vision is about: making sense of the 3D world, or the 4D world, out of 2D. So spatial intelligence to me is deeper than only creating a flat 2D world. Spatial intelligence to me is the ability to create, reason, interact, and make sense of a deeply spatial world, whether it's 2D or 3D or 4D, including dynamics and all that. So World Labs is focusing on that, and of course, the ability to create videos per se could be part of this. In fact, just a couple of weeks ago we rolled out the world's first demoable real-time video generation on a single H100 GPU.
So part of our technology includes that, but I think Marble is very different, because we really want creators, designers, and developers to have in their hands a model that can give them worlds with 3D structure, so they can use it for their work. That's why Marble is so different. The way I see it, it's a platform for a ton of opportunity to do stuff. As you describe, videos are just, here's a one-off video, that's very fun and cool, and that's it, and you move on. By the way, in Marble we do allow people to export in video form. So you could actually, like you said, go into a world, let's say it's a hobbit cave, and, especially as a creator, you have such a specific way of moving the camera in a trajectory in the director's mind, right, and then you can export that from Marble into a video.
What does it take to create something like this? Like, how big is the team, how many GPUs are you working with, anything you can share there? I don't know how much of this is private information, but just, what does it take to create something like what you launched here? It takes a lot of brain power. We just talked about 20 watts per brain, so from that point of view it's a small number, but it's actually incredible, you know, it took half a billion years of evolution to give us that power. We have a team of 30-ish people now, and we are predominantly researchers and research engineers, but we also have designers and product people. We actually really believe in creating a company that's anchored in the deep tech of spatial intelligence but is also building serious products, so we have this integration of R&D and productization. And of course, we use, you know, a ton of GPUs. I'm sure Nvidia is happy to hear that.
Well, congrats on the launch. I know it's a huge milestone and I know it took a ton of work, so I just want to say congrats to you and your team. Let me talk about your founder journey for a moment. So you're a founder of this company, started how many years ago, a couple of years ago, two, three years ago? A year ago? A year and a half? Okay, 18 months. Yeah. Okay, what's something you wish you knew before you started this, that you wish you could whisper to your past self of 18 months ago? Well, I continue to wish I knew the future of technology. I think actually that's one of our founding advantages, that in general we see the future earlier than most people, but still, man, it is so exciting and so amazing, what was unknown and what's coming. But I know the reason you're asking me this question is not so much about the future of technology. It's probably more, you know, look, I did not start a company of this scale at 20 years old. I started a dry cleaner when I was 19, but that's a little smaller scale, we've got to talk about that. And then I, you know, founded Google Cloud AI and then I founded an institute at Stanford, but those are different beasts.
I did feel I was a little more prepared as a founder for the grinding journey, compared to maybe the 20-year-old founders, but I'm still surprised, and it puts me into paranoia sometimes, at how intensely competitive the AI landscape is, from the models and the technology itself to talent. And, you know, when I founded the company, we did not have these incredible stories of how much certain talent would cost. So these are things that continue to surprise me, and I have to be very alert. The competition you're talking about is, yeah, the competition for talent, the speed at which things are moving.
Yeah. You mentioned this point that I want to come back to. If you just look over the course of your career, you were at all of the major collections of humans that led to so many of the breakthroughs that are happening today. Obviously we talked about ImageNet, but also SAIL at Stanford, which is where a lot of the work happened, and Google Cloud, where a lot of the breakthroughs happened. What brought you to those places? For people looking for how to advance in their career and be at the center of the future, is there a throughline of what pulled you from place to place and into those groups that might be helpful for people to hear?
Yeah, this is actually a great question, Lenny, because I do think about it. Obviously, we talked about the curiosity and passion that brought me to AI; that is more of a scientific north star, right, I did not care if AI was a thing or not. So that was one part. But how did I end up choosing the particular places I've worked, including starting World Labs? I think I'm very grateful to myself, or maybe to my parents' genes, that I'm an intellectually very fearless person. And I have to say, when I hire young people, I look for that, because I think that's a very important quality.
If one wants to make a difference, when you want to make a difference, you have to accept that you're creating something new, or you're diving into something new that people haven't done. And if you have that self-awareness, you almost have to allow yourself to be fearless and to be courageous. So when I, for example, came to Stanford: in the world of academia, I was very close to this thing called tenure, which is, you know, having the job forever, at Princeton. But I chose to come to Stanford. I love Princeton, it's my alma mater; it's just that at that moment there were people who were so amazing at Stanford, and the Silicon Valley ecosystem was so amazing, that I was okay taking the risk of restarting my tenure clock. Then there was becoming the first female director of SAIL.
I was actually, relatively speaking, very young faculty at that time, and I wanted to do that because I care about that community. I didn't spend too much time thinking about all the failure cases. Obviously, I was very lucky that the more senior faculty supported me, but I just wanted to make a difference. And then going to Google was similar: I wanted to work with people like Jeff Dean and Geoff Hinton and all these incredible people. And it's the same with World Labs. I have this passion, and I also believe that people with the same mission can do incredible things. So that's how I've guided myself through; I don't overthink all the possible things that can go wrong, because there are too many. I feel like that's an important element, this idea of not focusing on the downside, focusing more on the people, the mission, what gets you excited, what you want to do.
Yeah, I do want to say one thing to all the young talents in AI, the engineers, the researchers out there, because some of you apply to World Labs, and I feel very privileged that you consider World Labs. I do find that many of the young people today think about every single aspect of the equation when they decide on jobs. Maybe that's the way they want to do it, but sometimes I do want to encourage young people to focus on what's important, because I find myself constantly in mentoring mode when I talk to job candidates, not necessarily recruiting or not recruiting, but just in mentoring mode, when I see an incredible young talent who is over-focusing on every minute dimension and aspect of considering a job, when maybe the most important thing is: where's your passion, do you align with the mission, do you believe in and have faith in this team? Just focus on the impact.
And the kind of work and the kind of team you can work with.

Yeah, it's tough. It's tougher for people in the AI space now; there's so much coming at them, so much new, so much happening.

Oh, that's true. I can see the stress.

So I think that advice is really important: what will actually make you feel fulfilled in what you're doing, not just where's the fastest-growing company or who's going to win.
I do want to make sure I ask you about the work you're doing today at Stanford at HAI, the Human-Centered AI Institute. What are you doing there? I know this is something you still do on the side.

Yes. HAI, the Human-Centered AI Institute, was co-founded by me and a group of faculty, like Professor John Etchemendy, Professor James Landay, and Professor Chris Manning, back in 2018. I was actually finishing my sabbatical at Google, and it was a very important decision for me because I could have stayed in industry. But my time at Google taught me one thing: AI is going to be a civilizational technology, and it dawned on me how important this is to humanity, to the point that I actually wrote a piece in The New York Times that year, 2018, about the need for a guiding framework to develop and apply AI, and that framework has to be anchored in human benevolence and human-centeredness. I felt that Stanford, one of the world's top universities, in the heart of Silicon Valley, which gave birth to important companies from Nvidia to Google, should be a thought leader in creating this human-centered AI framework and actually embody it in our research, education, policy, and ecosystem work.
So I co-founded HAI, and fast forward six or seven years, it has become the world's largest AI institute doing human-centered research, education, ecosystem outreach, and policy work. It involves hundreds of faculty across all eight schools at Stanford, from medicine to education to sustainability to business to engineering to the humanities and more. We support researchers, especially in interdisciplinary areas, from the digital economy to legal studies to political science to the discovery of new drugs to new algorithms beyond transformers. We also put a very strong focus on policy, because when we started HAI, I realized that Silicon Valley did not talk to Washington, DC, or Brussels, or other parts of the world, and given how important this technology is, we need to bring everybody on board. So we created multiple programs, from a congressional boot camp to the AI Index report to policy briefings, and we especially participated in policymaking, including advocating for a national AI research cloud bill that was passed in the first Trump administration, and participating in state-level AI regulatory discussions.
So there's a lot we did, and I continue to be one of the leaders, even though I'm much less involved operationally, because I care not only that we create this technology but that we use it in the right way.

Wow, I was not aware of all that other work you were doing. As you were talking, I was reminded that Charlie Munger has this quote: take a simple idea and take it very seriously. I feel like you've done that in so many different ways and stayed with it, and it's unbelievable the impact you've had in so many ways over the years. I'm going to skip the lightning round and just ask you one last question: is there anything else you wanted to share, anything else you want to leave listeners with?
I'm very excited by AI, Lenny. I want to answer one question that everybody asks me when I travel around the world: if I'm a musician, if I'm a middle school teacher, if I'm a nurse, if I'm a content creator, if I'm a farmer, do I have a role in AI, or is AI just going to take over my life or my work? I think this is the most important question of AI. I find that in Silicon Valley we tend not to speak heart to heart with people, people like us and people not like us. We tend to just toss around words like infinite productivity or infinite leisure time or infinite power or whatever, but at the end of the day, AI is about people. When people ask me that question, it's a resounding yes: everybody has a role in AI. It depends on what you do and what you want, but no technology should take away human dignity, and human dignity and agency should be at the heart of the development, the deployment, and the governance of every technology.
So if you are a young artist and your passion is storytelling, embrace AI as a tool. In fact, embrace Marble, which I hope becomes a tool for you, because the way you tell your story is unique and the world still needs it. How you tell your story, how you use the most incredible tool to tell it in the most unique way, is important, and that voice needs to be heard.
If you're a farmer near retirement, AI still matters, because you're a citizen. You can participate in your community, you should have a voice in how AI is used and applied, and you work with people. I encourage all of you to use AI to make life easier for you.
If you're a nurse, I hope you know that, at least in my career, I have worked so much in healthcare research, because I feel our healthcare workers should be greatly augmented and helped by AI technology, whether it's smart cameras to feed more information or robotic assistance. Our nurses are overworked and over-fatigued, and as our society ages we need more help to take care of people, so AI can play that role.
So I just want to say that it's so important that even technologists like me are sincere about the fact that everybody has a role in AI.

What a beautiful way to end it. Such a tie back to where we started: it's up to us, and to taking individual responsibility for what AI will do in our lives.
Final question: where can folks find Marble, and where should they go if they want to try to join World Labs? What's the website, where do people go?

Well, the World Labs website is www.worldlabs.ai. You can find our research progress there, we have technical blogs, you can find Marble the product there and sign in, and you can find a link to our job posts. We're in San Francisco, and we'd love to work with the world's best talent.
Amazing. Fei-Fei, thank you so much for being here.

Thank you, Lenny.

Hi everyone, thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at Lennyspodcast.com. See you in the next episode.