
Making AI accessible with Andrej Karpathy and Stephanie Zhan

发布时间 2024-03-26 21:18:23

摘要

Andrej Karpathy, founding member of OpenAI and former Sr. Director of AI at Tesla, speaks with Stephanie Zhan at Sequoia ...


中英文字稿  

I'm thrilled to introduce our next and final speaker, Andrej Karpathy. Andrej probably needs no introduction. Most of us have probably watched his YouTube videos at length. He's renowned for his research in deep learning. He designed the first deep learning class at Stanford, was part of the founding team at OpenAI, led the computer vision team at Tesla, and is now a mystery man again now that he has just left OpenAI. So we're very lucky to have you here. And, Andrej, you've been such a dream speaker, and so we're excited to have you and Stephanie close out the day. Thank you.
我很荣幸为大家介绍我们的下一位也是最后一位讲者,Andrej Karpathy。Andrej可能无需介绍。我们大多数人可能都长时间观看过他的YouTube视频。他以深度学习研究而闻名。他设计了斯坦福大学的第一个深度学习课程,是OpenAI创始团队的成员,曾领导特斯拉的计算机视觉团队,而现在他刚刚离开OpenAI,再次成为一个神秘人物。所以我们很幸运能请到你。Andrej,你一直是我们梦寐以求的演讲者,我们很高兴由你和Stephanie为今天收尾。谢谢。

Andrej's first reaction as we walked up here was, oh my God, to his picture. It's like a very intimidating photo. I don't know what era it was taken in, but he's impressed. Okay, amazing. Andrej, thank you so much for joining us today, and welcome back. Yeah, thank you. Fun fact that most people don't actually know: how many folks here know where OpenAI's original office was? That's amazing. Nick. I'm going to guess right here. Right here on the opposite side of our San Francisco office where actually many of you guys were just in huddles. So this is fun for us because it brings us back to our roots, back when I first started at Sequoia and when Andrej first started co-founding OpenAI. Andrej, in addition to living out the Willy Wonka, working atop a chocolate factory dream, what were some of your favorite moments working from here? Yes, OpenAI was right there.
安德烈走上台时的第一反应是,哦,我的上帝,看到了他自己的照片。这张照片太令人生畏了。我不知道这张照片是什么时候拍的,但他印象深刻。好的,太棒了。安德烈,非常感谢你今天加入我们,欢迎回来。是的,谢谢。有趣的事实是,大多数人实际上不知道OpenAI的最初办公室在哪里。在座有多少人知道?太惊人了。尼克。我猜就在这里。就在我们旧金山办公室的对面,实际上你们很多人刚才还在那里进行小组讨论。对我们来说这很有趣,因为它让我们回到了起点:回到我刚加入Sequoia、安德烈刚开始共同创办OpenAI的时候。安德烈,除了实现像威利·旺卡一样在巧克力工厂楼上工作的梦想之外,你在这里工作时最喜欢的一些时刻是什么?是的,OpenAI就在那里。

And this was the first office after, I guess, Greg's apartment, which maybe doesn't count. And so, yeah, we spent maybe two years here, and the chocolate factory was just downstairs, so it always smelled really nice. Yeah, I guess the team was 10, 20 plus. And yeah, we had a few very fun episodes here. One of them was alluded to by Jensen at GTC that happened just yesterday or two days ago. So Jensen was describing how he brought the first DGX and how he delivered it to OpenAI, so that happened right there. So that's where we all signed it. It's in the room over there.
这是在格雷格的公寓之后的第一个办公室,也许那个不算。所以,是的,我们在这里待了大概两年,巧克力工厂就在楼下,所以整天都闻着很香。团队可能有10、20多人。在这里我们经历了一些非常有趣的事情。其中之一是Jensen在GTC提到的,就在昨天或两天前发生的。Jensen描述了他如何带来第一个DGX并将其交付给OpenAI,那就发生在那里。所以我们都在那里签了字。它就在那边的房间里。

So Andrej needs no introduction, but I wanted to give a little bit of backstory on some of his journey to date. As Sonya had introduced, he was trained by Geoff Hinton and then Fei-Fei Li. His first claim to fame was his deep learning course at Stanford. He co-founded OpenAI back in 2015, and in 2017 he was poached by Elon. I remember this very, very clearly. For folks who don't remember the context then, Elon had just transitioned through six different autopilot leaders, each of whom lasted about six months. And I remember when Andrej took this job, I thought, congratulations and good luck.
所以安德烈无需介绍,但我想介绍一些他迄今为止的历程背景。正如Sonya所介绍的,他先后师从Geoff Hinton和李飞飞。他在斯坦福大学开设的深度学习课程让他首次声名鹊起。他于2015年共同创立了OpenAI,2017年被埃隆挖走。我非常清楚地记得这一点。对于那些不记得当时背景的人来说,埃隆刚刚换过六位不同的自动驾驶负责人,每位任职大约六个月。我记得当安德烈接受这份工作时,我想,祝贺,也祝好运。

Not too long after that, he went back to OpenAI and has been there for the last year. Now unlike all the rest of us today, he is basking in the ultimate glory of freedom, in both his time and his responsibilities. And so we're really excited to see what you have to share today. A few things that I appreciate the most from Andrej are that he is an incredible, fascinating futurist thinker, he is a relentless optimist, and he's a very practical builder. And so I think he'll share some of his insights around that today.
不久之后,他回到了OpenAI,并在那里度过了过去的一年。现在,与我们今天的所有人不同的是,他沐浴在永恒自由和责任的终极荣耀中。因此,我们非常期待看到你今天将分享的内容。我最欣赏Andrei的几个方面是,他是一个令人难以置信、引人入胜的未来思想家。他是一个坚定的乐观主义者,也是一个非常实际的建设者。因此,我认为他今天会分享一些关于这个方面的见解。

To kick things off: AGI even seven years ago seemed like an incredibly impossible task to achieve even in the span of our lifetimes. Now it seems within sight. What is your view of the future over the next ten years? Yes, I think you're right. I think a few years ago I sort of felt like AGI was, it wasn't clear how it was going to happen. It was very sort of academic and you would like to think about different approaches. And now I think it's very clear and there's like a lot of space and everyone is trying to fill it. And so there's a lot of optimization. And I think roughly speaking the way things are happening is everyone is trying to build what I refer to as this LLM OS. And basically I like to think of it as an operating system. You basically have to get a bunch of peripherals that you plug into this new CPU or something like that. The peripherals are of course like text, images, audio and all the modalities. And then you have a CPU, which is the LLM transformer itself. And then it's also connected to all the Software 1.0 infrastructure that we've already built up for ourselves. And so I think everyone is kind of trying to build something like that and then make it available as something that's customizable to all the different nooks and crannies of the economy. And so I think that's kind of roughly what everyone is trying to build out and what we sort of also heard about earlier today.
作为开场:即使在七年前,AGI似乎也是一个在我们有生之年都不可能实现的任务。现在,它看起来已经近在眼前。您对未来十年有什么看法?是的,我觉得你说得对。几年前,AGI并不清楚会如何实现。它当时非常学术化,你会去思考各种不同的方法。现在我觉得方向已经很清楚了,有很大的空间,每个人都在努力填满它,所以有大量的优化工作。总的来说,事情发展的方式大致是每个人都在努力构建我所说的这个LLM OS。基本上我喜欢将其看作一个操作系统。你需要把一系列外围设备插到这个新的CPU上。外围设备当然包括文本、图像、音频以及所有的模态。然后你有一个CPU,就是LLM transformer本身。它还连接到我们已经为自己建立起来的所有软件1.0基础设施。所以我觉得每个人都在努力构建类似的东西,然后将其作为可定制的产品,提供给经济的各个角落。所以我认为这大致是大家都在努力构建的东西,也是我们今天早些时候听到的内容。

So I think that's roughly where it's headed: we can bring up and down these relatively self-contained agents that we can give high level tasks to and specialize in various ways. So yeah, I think it's going to be very interesting and exciting. And it's not just one agent. And then there's many agents and what does that look like? And if that view of the future is true, how should we all be living our lives differently? I don't know. I guess we have to try to build it, influence it, make sure it's good and yeah, just try to make sure it turns out well. So now that you're a free independent agent, I want to address the elephant in the room, which is that OpenAI is dominating the ecosystem. And most of our audience here today are founders who are trying to carve out a little niche, praying that OpenAI doesn't take them out overnight. Where do you think opportunities exist for other players to build new independent companies, versus what areas do you think OpenAI will continue to dominate even as its ambition grows? Yeah, so my high level impression is basically OpenAI is trying to build out this LLM OS, and I think as we heard earlier today, it's trying to develop this platform on top of which you can position different companies in different verticals. Now I think the OS analogy is also really interesting because when you look at something like Windows or something like that, these are also operating systems, they come with a few default apps. Like a browser comes with Windows, you can use the Edge browser.
因此,我认为大致的发展方向是我们可以引入和控制这些相对独立的代理,我们可以为它们提供高级任务并在各种方式上专门化。所以,我认为这将是非常有趣和令人兴奋的。这并不只是一个代理。那有很多代理,它们会是什么样子呢?如果对未来的看法是真实的,我们应该怎样改变我们的生活方式呢?我不知道。我猜我们必须努力去构建它,影响它,确保它是良好的,并努力确保它取得成功。现在你是一个自由独立的代理,我想要谈论一下大家心中的大象,即OpenAI正在主导这个生态系统。今天在这里的大部分观众都是创始人,他们正在尝试开创一小块市场,祈祷OpenAI不会一夜之间淘汰他们。你认为其他玩家在哪些领域有机会建立新的独立公司,而OpenAI在哪些领域会继续主导,即使它的雄心不断增长?是的,我总体印象是OpenAI正在尝试建立这个LLM操作系统,正如我们今天早些时候所听到的,它正努力开发这个平台,可以在其之上定位不同垂直领域的各种公司。现在我认为操作系统的类比也非常有趣,因为当你看到类似Windows之类的东西时,这些也是操作系统,它们带有一些默认应用程序。就像浏览器与Windows一起提供,你可以使用Edge浏览器。

And so I think in the same way, OpenAI or any of the other companies might come up with a few default apps quote unquote, but it doesn't mean that you can't have different browsers that are running on it, just like you can have different chat agents running on that infrastructure. And so there will be a few default apps, but there will also be potentially a vibrant ecosystem of all kinds of apps that are fine-tuned to all the different nooks and crannies of the economy. And I really liked the analogy of the early iPhone apps and what they looked like and they were all kind of like jokes. And it took time for that to develop and I think absolutely I agree that we're going through the same thing right now. People are trying to figure out what is this thing good at? What is it not good at? How do I work it? How do I program with it? How do I debug it?
因此,我认为同样的,OpenAI或其他任何公司可能会推出一些所谓的默认应用程序,但这并不意味着你不能在上面运行不同的浏览器,就像你可以在该基础架构上运行不同的聊天代理一样。因此,可能会有一些默认应用程序,但也可能会出现一个活跃的生态系统,包含针对经济各个角落进行微调的各种应用程序。我真的很喜欢早期iPhone应用的类比以及它们的样子,它们都有点像玩笑。这需要时间来发展,我绝对同意我们现在正经历同样的事情。人们正试图弄清楚这个东西擅长什么?它不擅长什么?如何使用它?如何用它编程?如何调试它?

How do I just actually get it to perform real tasks and what kind of oversight? Because it's quite autonomous, but not fully autonomous. So what does the oversight look like? What does the evaluation look like? So there's many things to think through and just to understand sort of like the psychology of it. And I think that's what's going to take some time to figure out exactly how to work with this infrastructure. So I think we'll see that over the next few years. So the race is on right now with LLMs, OpenAI and Anthropic, Mistral, Llama, Gemini. The whole ecosystem of open source models, now a whole long tail of small models. How do you foresee the future of the ecosystem playing out? Yeah, so again, I think the open source now, sorry, the operating system is now interesting because we have say like we have basically an oligopoly of a few proprietary systems, like say Windows, Mac OS, etc. And then we also have Linux. And so and Linux has an infinity of distributions. And so I think maybe it's going to look something like that. I also think we have to be careful with the naming because a lot of the ones that you listed like Llama, Mistral, so I wouldn't actually say they're open source, right? And so it's kind of like tossing over a binary for like an operating system. You know, like you can kind of work with it and it's like useful, but it's not fully useful, right?
我怎么才能让它执行真正的任务,并且需要怎样的监督?因为它相当自治,但并非完全自治。监督是什么样子?评估又是怎样的?所以需要考虑很多事情,要理解它的心理。我认为这需要一些时间来找出如何与这个基础设施配合工作。我认为未来几年内我们会看到这一点。现在LLM的竞赛正在进行,OpenAI、Anthropic、Mistral、Llama、Gemini等都在竞争。整个开源模型生态系统,现在还有许多小模型。你如何预测生态系统的未来发展?是的,我认为现在操作系统的局面很有趣,因为我们有几个专有系统的寡头垄断,比如Windows、Mac OS等。还有Linux。Linux有无数的发行版。所以也许会看起来像那样。我也认为我们在命名上要小心,因为你列举的很多,比如Llama、Mistral,我不会说它们是开源的。所以这有点像给操作系统投放一个二进制文件。你可以和它一起工作,它是有用的,但并非完全有用,对吧?

And there are a number of what I would say are fully open source LLMs. So there's, you know, the Pythia models, LLM360, OLMo, etc. And they're fully releasing the entire infrastructure that's required to compile the operating system, right? To train the model, from gathering the data to training on the data, etc. And when you're just given a binary, it's still useful, of course, because you can fine-tune the model. But also I think it's subtle, but you can't fully fine-tune the model, because the more you fine-tune the model, the more it's going to start regressing on everything else. And so what you actually really want to do, for example, if you want to add a capability without regressing the other capabilities, is you may want to train on some kind of a mixture of the previous data set distribution and the new data set distribution, because you don't want to regress the old distribution, you just want to add knowledge.
有一些我想说是完全开源的LLM。比如Pythia模型、LLM360、OLMo等。它们完全公开了编译这个操作系统所需的全部基础设施:从收集数据到用数据训练模型等等。当你只是拿到一个二进制文件时,它当然仍然有用,因为你可以微调模型。但这里有个微妙之处:你不能完全放开去微调,因为你微调得越多,它在其他方面就会退化得越多。所以,如果你想在不损害其他能力的前提下增加某种能力,你可能需要在之前的数据集分布和新的数据集分布的某种混合上训练,因为你不想让旧的分布退化,你只是想增加知识。
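To make that last point concrete, here is a minimal sketch, assuming a PyTorch-style setup, of fine-tuning on a mixture of the old and new data distributions; the dataset objects and the mixing ratio are illustrative placeholders, not anything specified in the talk.

```python
# Minimal sketch (illustrative names and ratio): fine-tune on a mixture of the
# original pre-training distribution and the new domain data, so the model can
# gain a new capability without regressing on what it already knows.
import random
import torch
from torch.utils.data import Dataset, DataLoader

class MixtureDataset(Dataset):
    """Samples from the old and the new distribution with a fixed ratio."""
    def __init__(self, old_data, new_data, new_fraction=0.3):
        self.old_data, self.new_data = old_data, new_data
        self.new_fraction = new_fraction

    def __len__(self):
        return len(self.old_data) + len(self.new_data)

    def __getitem__(self, idx):
        # Sample stochastically so each batch reflects the target mixture.
        if random.random() < self.new_fraction:
            return random.choice(self.new_data)
        return random.choice(self.old_data)

def finetune(model, old_data, new_data, steps=1000, lr=1e-5):
    loader = DataLoader(MixtureDataset(old_data, new_data), batch_size=8)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (inputs, targets) in zip(range(steps), loader):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```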

And if you're just given the weights, you can't do that, actually. You need the training loop, you need the data set, etc. So you are actually constrained in how you can work with these models. And again, like I think it's definitely helpful, but it's, I think we need like slightly better language for it almost. There's open weights models, open source models, and then proprietary models, I guess, and that might be the ecosystem. And yeah, probably it's going to look very similar to the ones that we have today. And hopefully you'll continue to help build some of that out.
如果你只是拿到了权重,实际上你是做不到的。你需要训练循环,你需要数据集等等。因此,你实际上受到了如何使用这些模型的限制。再次强调,我认为这肯定是有帮助的,但我觉得我们几乎需要更好的语言来表达它。有开放权重模型、开源模型,然后是专有模型,我猜这可能是生态系统。是的,它可能会看起来非常类似于我们今天拥有的那些。希望你继续帮助构建其中的一部分。

So I'd love to address the other elephant in the room, which is scale. Simplistically, it seems like scale is all that matters, scale of data, scale of compute, and therefore the large research labs, large tech giants have an immense advantage today. What is your view of that? And is that all that matters? And if not, what else does? So I would say scale is definitely number one. I do think there are details there to get right. And I think a lot also goes into the data set preparation and so on, making it very good and clean, etc. That matters a lot. These are all sort of like compute efficiency gains that you can get.
因此,我想要谈谈房间里还有的另一个大问题,那就是规模。简单地说,规模似乎是最重要的,数据规模、计算规模,因此今天大型研究实验室和科技巨头拥有巨大的优势。你对此有什么看法?这难道就是全部吗?如果不是,还有什么其他重要的因素?所以我会说规模肯定是第一位的。我认为在这方面有很多细节需要做好。我认为在数据集准备等方面也有很多工作要做,使之变得非常好且干净等等。这些都是你可以获得的计算效率收益。

So there's the data, the algorithms, and then of course the training of the model and making it really large. So I think scale will be the primary determining factor, like the first principal component of things, for sure. But there are many of the other things that you need to get right. So it's almost like the scale sets some kind of a speed limit almost, but you do need some of the other things, but it's like if you don't have the scale, then you fundamentally just can't train some of these massive models if you are going to be training models. If you're just going to be doing fine tuning and so on, then I think maybe less scale is necessary, but we haven't really seen that fully play out just yet.
所以有数据、算法,当然还有模型的训练以及把它做得非常大。所以我认为规模将是主要决定因素,可以说是这件事的第一主成分。但还有许多其他需要做对的事情。规模几乎就像设定了某种速度上限,但你确实还需要其他一些东西;只是如果你没有规模,而你又要训练这些庞大的模型,那从根本上就训练不了。如果你只是做微调之类的工作,那么也许不需要那么大的规模,但我们还没有真正看到这种情况完全展现出来。

And can you share more about some of the ingredients that you think also matter? Maybe lower in priority, behind scale? Yeah, so the first thing I think is like you can't just train these models. If you're just given the money and the scale, it's actually still really hard to build these models. And part of it is that the infrastructure is still so new and it's still being developed and not quite there. But training these models at scale is extremely difficult and is a very complicated distributed optimization problem. And there's actually like the talent for this is fairly scarce right now. And it just basically turns into this insane thing running on tens of thousands of GPUs. All of them are like failing at random at different points in time. And so like instrumenting that and getting that to work is actually an extremely difficult challenge. GPUs were not like intended for like 10,000 GPU workloads until very recently.
你能再分享一些你认为同样重要的要素吗?也许是优先级排在规模之后的那些?是的,我认为首先是,你不能就这么把这些模型训练出来。即使给你钱和规模,构建这些模型仍然非常困难。部分原因是基础设施仍然很新,还在发展中,尚未完全成熟。在这种规模上训练模型是极其困难的,是一个非常复杂的分布式优化问题。目前这方面的人才实际上相当稀缺。它基本上变成了在数万个GPU上运行的疯狂工程,每个GPU都会在不同的时间点随机出故障。因此,对其进行监控并让它正常运行实际上是一个非常困难的挑战。直到最近,GPU都不是为一万个GPU规模的工作负载而设计的。

And so I think a lot of the infrastructure is sort of like creaking under that pressure and we need to work through that. But right now if you're just giving someone a ton of money or a ton of scale or GPUs, it's not obvious to me that they can just produce one of these models, which is why it's not just about scale. You actually need a ton of expertise both on the infrastructure side, the algorithm side and then the data side and being careful with that. So I think those are the major components. The ecosystem is moving so quickly. Even some of the challenges we thought existed a year ago are being solved more and more today, hallucinations, context windows, multimodal capabilities, inference getting better, faster, cheaper.
因此,我认为许多基础设施都在这种压力下发出嘎吱嘎吱的声音,我们需要应对这种情况。但是,如果现在你给某人一大笔钱或一大批规模或GPU,我并不认为他们能够轻易地生产出这种模型,这就是为什么这不仅仅是关于规模的原因。实际上,你需要在基础设施端、算法端和数据端都具备丰富的专业知识,并且要谨慎处理。所以我认为这些是主要的组成部分。这个生态系统发展速度如此之快。即使我们在一年前认为存在一些挑战,如今这些挑战也在不断得到解决,幻觉、上下文窗口、多模态能力、推理变得更好、更快、更便宜。

What are the LLM research challenges today that keep you up at night? What do you think are meaty enough problems but also solvable problems that we can continue to go after? So I'll say on the algorithm side, one thing I'm thinking about quite a bit is this like distinct split between diffusion models and autoregressive models, they're both ways of representing probability distributions and it just turns out that different modalities are apparently a good fit for one of the two. I think that there's probably some space to unify them or to like connect them in some way and also get some best of both worlds or sort of figure out how we can get a hybrid architecture and so on.
今天LLM研究中令你终夜难眠的挑战是什么?你认为有哪些问题既足够复杂但也是可以解决的,我们可以继续努力解决?对于算法这一方面,我现在一直在思考的一个问题是扩散模型和自回归模型之间的明显分歧,它们都是表示概率分布的方式,不同的情况似乎更适合其中的一种。我认为可能有一些方法来统一它们,或者以某种方式将它们连接起来,并同时获得它们的优点,或者找出如何得到一个混合的架构等等。

So it's just odd to me that we have sort of like two separate points in the space of models and they're both extremely good and it just feels wrong to me that there's nothing in between. So I think we'll see that sort of carved out and I think there are interesting problems there. And then the other thing that maybe I would point to is there's still like a massive gap in just the energetic efficiency of running all this stuff. My brain is 20 watts roughly. Jensen was just talking at GTC about the massive supercomputers that they're building now. These numbers are in megawatts, right? And so maybe you don't need all that to run like a brain. I don't know how much you need exactly. But I think it's safe to say we're probably off by a factor of a thousand to like a million somewhere there in terms of like the efficiency of running these models.
所以我就是觉得很奇怪:我们在模型空间中有两个独立的点,它们都非常优秀,而两者之间什么都没有,这让我觉得不对劲。我认为这块空间会被逐渐开拓出来,那里有一些有趣的问题。另一件我想指出的事情是,在运行所有这些东西的能效上仍然存在巨大的差距。我的大脑功耗大约是20瓦。Jensen刚刚在GTC上谈到了他们正在建造的大型超级计算机,那些数字是以兆瓦为单位的。也许你并不需要那么多能耗来运行一个大脑。我不知道确切需要多少,但我认为可以肯定地说,在运行这些模型的效率上,我们大概差了一千到一百万倍。

And I think part of it is just because the computers we've designed of course are just like not a good fit for this workload. And I think NVIDIA GPUs are like a good step in that direction. In terms of what you need: extremely high parallelism. We don't actually care about sequential computation that is sort of like data dependent in some way. We just need to like blast the same algorithm across many different sort of array elements or something. You can think about it that way. So I would say number one is just adapting the computer architecture to the new data workflows.
我认为部分原因是因为我们设计的计算机并不适合这种工作负载。而NVIDIA的GPU在这方面是一个不错的方向。我们需要极高的并行性。实际上我们并不关心某种方式上的数据相关的顺序计算。我们只需要将相同的算法快速地在许多不同的数组元素之间执行。你可以这样理解。所以我认为第一步就是调整计算机架构以适应新的数据工作流程。

Number two is like pushing on a few things that we're currently seeing improvements on. So number one may be precision. We're seeing precision come down from what originally was like 64-bit for doubles. We're now down to, I don't know, is it four, five, six or even 1.58, depending on which papers you read. And so I think precision is one big lever of getting a handle on this. And then the second one of course is sparsity. So that's also like another big delta, I would say, like your brain is not always fully activated. And so sparsity I think is another big lever. But then the last lever I also feel like is just the von Neumann architecture of like computers and how they're built, where you're shuttling data in and out and doing a ton of data movement between memory and, you know, the cores that are doing all the compute.
第二点是推进一些我们目前已经看到改进的方面。首先可能是精度。我们看到精度从最初的64位双精度一路下降,现在降到了四位、五位、六位,甚至有论文提到1.58位,取决于你读的是哪篇论文。我认为精度是掌控这一切的一个重要杠杆。第二个当然是稀疏性。这也是另一个很大的差距,就像你的大脑并不总是全部被激活一样,所以我认为稀疏性是另一个重要杠杆。最后一个杠杆,我觉得是计算机的冯·诺依曼体系结构本身,以及它们的构建方式:你要在内存和真正做计算的核心之间来回搬运数据,进行大量的数据移动。
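As a toy illustration of the precision lever he mentions, here is a minimal sketch, assuming NumPy, of symmetric int8 quantization of a weight matrix; the sizes and the quantization scheme are illustrative, not anything specific he described.

```python
# Toy illustration of the precision lever: store weights in int8 instead of
# float32 (4x less memory), dequantizing before the matmul. Real systems use
# finer-grained scales and even lower bit widths, but the idea is the same.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                         # symmetric, per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
x = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(w)

print("memory bytes:", w.nbytes, "->", q.nbytes)            # 4x smaller
print("max abs error:", np.abs(w @ x - dequantize(q, scale) @ x).max())
```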

This is all broken as well kind of and it's not how your brain works and that's why it's so efficient. And so I think it should be a very exciting time in computer architecture. I'm not a computer architect, but it seems like we're off by a factor of a thousand to a million, something like that. And there should be really exciting sort of innovations there that bring that down. I think there are at least a few builders in the audience working on this problem. Okay, switching gears a little bit. You've worked alongside many of the greats of our generation. Sam, Greg from OpenAI and the rest of the OpenAI team, Elon Musk. Who here knows the joke about the rowing team, the American team versus the Japanese team? Okay, great. So this will be a good one. Elon shared this at our last Base Camp and I think it reflects a lot of his philosophy around how he builds cultures and teams.
这种架构在某种意义上也是有问题的,这不是你大脑的工作方式,而这正是大脑如此高效的原因。所以我认为这应该是计算机体系结构领域非常令人兴奋的时刻。我不是计算机架构师,但看起来我们大概差了一千到一百万倍。那里应该会出现一些非常令人兴奋的创新来缩小这个差距。我想观众中至少有几位正在解决这个问题。好的,稍微换个话题。你曾与我们这一代许多杰出人物共事,比如OpenAI的Sam、Greg以及OpenAI团队的其他成员,还有埃隆·马斯克。在座的有谁知道那个关于划船队的笑话,美国队对阵日本队的那个?好的,太好了。埃隆在我们上一次Base Camp上分享了这个故事,我认为这反映了他在如何建立文化和团队方面的很多哲学。

So you have two teams. The Japanese team has four rowers and one steer and the American team has four steerers and one rower. And can anyone guess when the American team loses? What do they do? Shout it out. Exactly. They fire the rower. And Elon shared this example I think as a reflection of how he thinks about hiring the right people, building the right people, building the right teams at the right ratio. From working so closely with folks like these incredible leaders, what have you learned? Yeah, so I would say definitely Elon runs his company is an extremely unique style. I don't actually think that people appreciate how unique it is. You sort of like even read about it and so much. You don't understand it, I think. It's like even hard to describe, but I don't even know where to start. But it's like a very unique, different thing. Like I like to say that he runs the biggest startups and I think it's just, I don't even know basically like how to describe it. It almost feels like it's a longer sort of thing that I have to think through.
所以你有两个团队。日本团队有四名划手和一名舵手,美国团队有四名舵手和一名划手。有谁能猜到当美国团队输掉比赛时会怎么做?大声说出来。没错,他们开除了划手。埃隆分享了这个例子,我认为这是他思考如何雇佣合适的人员,在合适的比例下建立合适的团队的一种反思。从与这些令人难以置信的领导者密切合作中,你学到了什么?是的,我会说埃隆经营他的公司的方式非常独特。我其实认为人们并不是很欣赏这种独特性。你甚至读了这么多关于他的事情,也不会理解,我认为。很难描述,但我甚至不知道从哪里开始。但这是一种非常独特、不同的东西。我喜欢说他经营着最大的初创企业,我觉得这只是...我甚至不知道基本上如何描述它。几乎感觉像是一种更长的过程,我需要认真思考一下。

Well, number one is like, so he likes very small, strong, highly technical teams. So that's number one. So I would say at companies by default, the teams sort of grow and they get large. Elon was always like a force against growth. I would have to work and expend effort to hire people. I would have to like basically plead to hire people. And then the other thing is that at big companies, it's usually really hard to get rid of low performers. And I think Elon is very friendly to, by default, getting rid of low performers. So I actually had to fight for people to keep them on the team. Because he would by default want to remove people. And so that's one thing. So keep a small, strong, highly technical team. No middle management that is kind of like non-technical for sure. So that's number one.
第一,他喜欢非常小、非常强、高度技术化的团队。这是第一点。我想说,在一般公司里,团队默认会不断膨胀、变得越来越大。而埃隆始终是一股抵制扩张的力量。我必须花力气去争取招人,基本上得恳求才能招到人。另一件事是,大公司通常很难辞退绩效差的人,而埃隆默认就倾向于把低绩效的人请走。所以我实际上得为留住团队成员而争取,因为他默认是想裁掉人的。所以这是一点:保持一个小而强、高度技术化的团队,而且肯定不要那种不懂技术的中层管理。这是第一点。

Number two is kind of like the vibes of how this is, how everything runs and how it feels when he sort of like walks into the office. He wants it to be a vibrant place. People are walking around, they're pacing around. They're working on exciting stuff. They're charting something, they're coding. He doesn't like stagnation. He doesn't like for it to look that way. He doesn't like large meetings. He always encourages people to like leave meetings if they're not being useful. So actually you do see this. It's a large meeting. If you're not contributing and you're not learning, just walk out. And this is like fully encouraged. And I think this is something that you don't normally see. So I think like vibes is like a second big lever that I think he really instills culturally. Part of that also is like I think a lot of big companies, they pamper employees. I think like there's much less of that. The culture of it is you're there to do your best technical work. And there's the intensity and so on. And I think maybe the last one that is very unique and very interesting and very strange is just how connected he is to the team. So usually a CEO of a company is like a remote person, five layers up, who talks to their VPs who talk to their reports and directors and eventually you talk to your manager. That's not how Elon runs his companies, right? Like he will come to the office. He will talk to the engineers. Many of the meetings that we had were like 50 people in the room with Elon and he talks directly to the engineers. He doesn't want to talk just to the VPs and the directors. So normally people would spend like 99% of the time maybe talking to the VPs. He spends maybe 50% of the time and he just wants to talk to the engineers. So if the team is small and strong, then engineers and the code are the source of truth. And so they have the source of truth, not some manager, and he wants to talk to them to understand the actual state of things and what should be done to improve it. So I would say like the degree to which he's connected with the team and not something remote is also unique.
第二点是氛围:一切是如何运转的,以及当他走进办公室时的感觉。他希望这是一个充满活力的地方。人们在走动、在踱步,在做令人兴奋的事情,在画图表、在写代码。他不喜欢停滞,也不喜欢让这里看起来死气沉沉。他不喜欢大型会议,总是鼓励人们:如果会议对你没有用,就离开。你确实能看到这一点:在一个大型会议上,如果你既没有贡献也没有学到东西,就直接走出去,而这是完全被鼓励的。我认为这是你在别处不常见到的,所以我认为氛围是他在文化上真正灌输的第二个大杠杆。其中一部分还在于,我认为很多大公司会娇惯员工,而在这里这种情况要少得多。这里的文化是:你来这里是为了做你最好的技术工作,还有那种强度等等。最后一点可能非常独特、非常有趣也非常奇怪,那就是他与团队的联系有多紧密。通常一家公司的CEO是一个遥远的人物,隔着五个层级,他和副总裁交谈,副总裁再和他们的下属、总监交谈,最后你只和你的经理交谈。而埃隆的公司不是这样运作的,对吧?他会来到办公室,和工程师交谈。我们参加的很多会议是50个人和埃隆在一个房间里,他直接和工程师交谈。他不想只和副总裁、总监交谈。通常人们可能把99%的时间花在和副总裁交谈上,而他可能只花50%的时间,他就是想和工程师交流。所以,如果团队小而强,那么工程师和代码就是事实的来源。事实的来源在他们那里,而不是在某个经理那里,他想直接和他们交谈,以了解事情的真实状态以及应该做什么来改进。所以我想说,他与团队联系之紧密、毫不疏远,这一点也是独一无二的。

And also just like his large hammer and his willingness to exercise it within the organization. So maybe if he talks to the engineers and they bring up, you know, what's blocking you? I just don't have a GPU to run my thing. And he's like, oh, OK. And if he hears that twice, he's going to be like, OK, this is a problem. So like, what is our timeline? And when you don't have satisfying answers, he's like, OK, I want to talk to the person in charge of the GPU cluster. And like someone dials the phone and he's just like, OK, double the cluster right now. Like, let's have a meeting tomorrow. From now on, send me daily updates until the cluster is twice the size. And then they kind of like push back and they're like, OK, well, we have this procurement set up. We have this timeline and NVIDIA says that we don't have enough GPUs and it will take six months or something. And then you get a rise of an eyebrow. And then he's like, OK, I want to talk to Jensen. And then he just kind of like removes bottlenecks. So I think the extent to which he's extremely involved and removes bottlenecks and applies his hammer, I think is also like not appreciated. So I think there's like a lot of these kinds of aspects that are very unique, I would say, and very interesting. And honestly, like going to a normal company outside of that, you definitely miss aspects of that. And so I think, yeah, that's maybe that's a long rant. But that's just kind of like, I don't think I hit all the points, but it is a very unique thing and it's very interesting. And yeah, I guess that's my rant. Hopefully tactics that most people here can employ.
还有就是他手里的那把大锤,以及他愿意在组织内部挥舞它。比如他和工程师交谈时,他们提出:是什么在阻碍你?我就是没有GPU来跑我的东西。他会说,哦,好的。如果他听到两次这样的话,他就会说,好,这是个问题。那我们的时间表是什么?当你给不出令人满意的答案时,他会说,好,我要和负责GPU集群的人谈谈。然后有人拨通电话,他直接说:现在就把集群规模翻倍。明天开个会。从现在起,每天给我发进展,直到集群规模翻倍。然后对方会有些推脱,说,好吧,我们有既定的采购流程,有这个时间表,而且NVIDIA说我们没有足够的GPU,需要六个月之类的。然后他会挑一下眉毛,说,好,我要和Jensen谈谈。然后他就这样消除瓶颈。我认为他投入的程度之深、消除瓶颈和挥舞那把大锤的程度,也是不被人们充分认识的。所以我认为有很多这样的方面是非常独特、非常有趣的。老实说,离开之后去一家普通公司,你肯定会怀念其中的一些方面。所以,是的,这可能是一段很长的感慨。我想我没有讲到所有要点,但这确实是非常独特、非常有趣的经历。这就是我的感慨。希望这里的大多数人都能借鉴其中的一些做法。

Taking a step back, you've helped build some of the most generational companies. You've also been such a key enabler for many people, many of whom are in the audience today of getting into the field of AI. Knowing you, what you care most about is democratizing access to AI. Education, tools, helping create more equality in the whole ecosystem. At large, there are many more winners. As you think about the next chapter in your life, what gives you the most meaning?
退一步看,您已经帮助建立了一些最具代际意义的公司。您也是许多人的关键支持者,其中很多人今天都在听众中,他们进入了人工智能领域。我知道,您最关心的是让更多人能够接触到人工智能。教育、工具、帮助创造更多平等的整个生态系统。总的来说,有许多更多的赢家。当您想到生活的下一个篇章时,什么给您最大的意义?

Yeah, I think like, I think you've described it in the right way, like where my brain goes by default is like, you know, I've worked for a few companies, but I think like ultimately I care not about any one specific company. I care a lot more about the ecosystem. I want the ecosystem to be healthy. I want it to be thriving. I want it to be like a coral reef of a lot of cool, exciting startups in all the nooks and crannies of the economy. And I want the whole thing to be like this boiling soup of cool stuff. And only Andrej dreams about coral reefs. You know, it's going to be like a cool place. And I think, you know, that's why I love startups and I love companies. And I want there to be a vibrant ecosystem of them. And by default, I would say I'm a bit more hesitant about kind of like, you know, like five megacorps kind of like taking over, especially with AGI being such a magnifier of power. I would be kind of worried about what that could look like and so on. So I have to think that through more. But yeah, I love the ecosystem and I want it to be healthy and vibrant.
是的,我觉得你描述得很对,我脑海里默认的想法就是,你懂的,我曾在一些公司工作过,但我觉得最重要的不是任何一个特定的公司。我更关心整个生态系统。我希望生态系统健康繁荣,像是很多酷炫令人兴奋的创业公司构成的珊瑚礁,遍布经济的各个角落。我希望整个生态系统像是一锅煮滚着酷炫东西的汤。只有安德烈梦想着珊瑚礁。你知道,这将是一个很酷的地方。我认为,这就是为什么我热爱创业公司和企业。我希望它们之间有一个充满活力的生态系统。但是默认情况下,我会有些犹豫,特别是当AGI成为权力的放大器时,我会对五大巨头接管的情况感到担忧。我担心这会是什么样子等等。所以我需要更深入地思考这个问题。但是,我喜欢并关心这个生态系统,希望它健康而充满活力。

Amazing. We'd love to have some questions from the audience. Yes, Brian. Hi. Would you recommend founders follow Elon's management methods, or is it kind of unique to him and you shouldn't try to copy him? Yeah, I think that's a good question. I think it's up to the DNA of the founder, like you have to have that same kind of a DNA and that's some kind of a vibe. And I think when you're hiring the team, it's really important that you're making it clear upfront that this is the kind of company that you have. And when people sign up for it, they're very happy to go along with it actually. But if you change it later, I think people are unhappy with that and that's very messy.
太棒了。我们很乐意接受观众的问题。是的,布赖恩。嗨,布赖恩,你会推荐创始人们跟随埃隆的管理方法吗,或者这种方法是独一无二的,不能试图模仿他?是的,我认为这是一个很好的问题。我认为这取决于创始人的DNA,就像你必须拥有相同的DNA和一种氛围。当你雇佣团队时,非常重要的是你在最开始就明确指出这是你拥有的公司的类型。当人们加入后,他们非常乐意跟随。但是如果你以后改变了,我认为人们会对此感到不满,这样会变得非常混乱。

So as long as you do it from the start and you're consistent, I think you can run a company like that. And, you know, as but you know, it has its own like pros and cons as well. And I think so, you know, up to people, but I think it's a consistent model of company building and running. Yes, Alex. Hi, I'm curious if there are any types of model composability that you're really excited about, maybe other than mixture of experts. Not sure what you think about like merge model merges, franken merges or any other like things to make model development more composable.
只要你从一开始就坚持这样做,并且保持一贯性,我认为你可以像那样经营一家公司。但是,你知道,这种方式也有它自己的优点和缺点。我认为,这取决于个人,但是我认为这是一种建立和经营公司的一贯模式。是的,Alex。嗨,我很好奇你是否对某种模型合成方法感到兴奋,也许除了专家混合之外还有其他类型。不确定你对像模型合并、弗兰肯合并或其他能够使模型开发更可合成的方法有什么看法。

Yeah, that's a good question. I see like papers in this area, but I don't know that anything has like really stuck. Maybe the composability. I don't exactly know what you mean, but you know, there's a ton of work on like parameter-efficient training and things like that. I don't know if you would put that in the category of composability in the way I understand it, but it is somewhat the case that traditional code is very composable. And I would say neural nets are a lot more fully connected and less composable by default. But they do compose and can fine-tune as a part of a whole. So as an example, if you're doing like a system that you want to have, say, also ingest images or something like that, it's very common that you pre-train components and then you plug them in and fine-tune maybe through the whole thing, as an example. So there's some possibility in those aspects where you can pre-train small pieces of the cortex outside and compose them later. So through initialization and fine-tuning. So I think to some extent, maybe those are my scattered thoughts on it, but I don't know if I have anything very coherent otherwise.
是的,这是一个很好的问题。我看到过这个领域的一些论文,但我不觉得有什么真正站住脚的东西。也许是可组合性。我不完全明白你的意思,但是关于参数高效训练之类的工作有很多。我不确定你是否会把那些归入我所理解的可组合性范畴,但确实可以说传统代码是非常可组合的,而神经网络默认情况下更多是全连接的、可组合性较差。不过它们确实可以组合,并且可以作为整体的一部分进行微调。举个例子,如果你在构建一个系统,希望它也能处理图像之类的输入,常见的做法是先预训练各个组件,然后把它们插进去,再对整个系统进行微调。所以在这些方面是有可能性的:你可以在外部预训练"大脑皮层"的小块,然后通过初始化和微调把它们组合起来。在某种程度上,这些是我零散的想法,除此之外我不确定有没有更连贯的观点。
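A minimal sketch of the pattern he describes: pre-train pieces separately, plug them together through a small new layer, and fine-tune through the whole thing. This assumes PyTorch, and the encoder and language-model interfaces (embed, decode) are hypothetical placeholders, not any particular library's API.

```python
# Sketch of composing pre-trained components (interfaces are hypothetical):
# a pre-trained image encoder is plugged into a pre-trained language model via
# a small projection layer, then the composition is fine-tuned end to end.
import torch
import torch.nn as nn

class ComposedModel(nn.Module):
    def __init__(self, image_encoder, language_model, enc_dim, lm_dim):
        super().__init__()
        self.image_encoder = image_encoder          # pre-trained separately
        self.project = nn.Linear(enc_dim, lm_dim)   # new "glue" layer, trained from scratch
        self.language_model = language_model        # pre-trained separately

    def forward(self, image, tokens):
        img_emb = self.project(self.image_encoder(image))          # (B, lm_dim)
        tok_emb = self.language_model.embed(tokens)                # (B, T, lm_dim), hypothetical API
        seq = torch.cat([img_emb.unsqueeze(1), tok_emb], dim=1)    # prepend the image "token"
        return self.language_model.decode(seq)                     # hypothetical API

# A common recipe: freeze the pre-trained encoder at first, train only the glue
# layer and the LM, then optionally unfreeze everything for a final joint fine-tune.
# for p in model.image_encoder.parameters():
#     p.requires_grad = False
```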

Yes, Nick. So we've got these next word prediction things. Do you think there's a path towards building a physicist or a von Neumann type model that has a mental model of physics that's self-consistent and can generate new ideas for how you actually do fusion? How do you get faster than light travel if it's even possible? Is there any path towards that or is it like a fundamentally different vector in terms of these AI model developments?
是的,尼克。所以我们有这些下一个词预测的东西。你认为有没有一条路径可以建立一个类似物理学家或冯·诺依曼类型的模型,其中有一个与自身一致的物理心智模型,并能产生关于如何实际进行聚变的新想法?如果可能的话,如何实现超光速旅行?有没有通往这个目标的路径,或者说在这些人工智能模型发展方面,它是一条基本不同的矢量?

I think it's fundamentally different in some, in one aspect. I guess like what you're talking about maybe is just like a capability question, because the current models are just like not good enough. And I think there are big rocks to be turned here. And I think people still haven't like really seen what's possible in the space like at all. And roughly speaking, I think we've done step one of AlphaGo. This is the imitation learning part, and we've done that. There's step two of AlphaGo, which is the RL, and people haven't done that yet. And I think fundamentally this is the part that actually made it work and made something superhuman. And so I think there are like big rocks of capability still to be turned over here. And the details of that are kind of tricky potentially. But long story short, we just haven't done step two of AlphaGo. We've just done imitation.
我认为在某个方面它是根本不同的。我猜你说的可能主要是能力问题,因为当前的模型还不够好。我认为这里还有很多大石头有待翻开,人们还远远没有看到这个领域真正的可能性。粗略地说,我觉得我们只完成了AlphaGo的第一步,也就是模仿学习的部分,这一步我们已经做了。AlphaGo还有第二步,就是强化学习,人们还没有做到那一步。而从根本上说,正是那一部分让它真正奏效,并造就了超越人类的东西。所以我认为这里还有很多大块的能力有待发掘,其中的细节可能比较棘手。但长话短说,我们还没有做AlphaGo的第二步,我们只是完成了模仿。

And I don't think that people appreciate, for example, number one, like how terrible the data collection is for things like ChatGPT. Like say you have a problem, like some prompt is some kind of mathematical problem. A human comes in and gives the ideal solution, right, to that problem. The problem is that the human psychology is different from the model psychology. What's easy or hard for the human is different from what's easy or hard for the model. And so the human kind of fills out some kind of a trace that like comes to the solution, but like some parts of that are trivial to the model and some parts of that are a massive leap that the model doesn't understand. And so you're kind of just like losing it. And then everything else is polluted by that later. And so like fundamentally what you need is the model needs to practice itself how to solve these problems.
我觉得人们并没有意识到,比如,第一点,像ChatGPT这类模型的数据收集方式有多糟糕。比如说你有一个问题,某个提示是某种数学题。一个人过来,给出了这个问题的理想解答,对吧。问题在于,人类的心理和模型的心理是不同的。对人来说容易或困难的事情,和对模型来说容易或困难的事情并不一样。所以人类写下的解题过程中,有些部分对模型来说微不足道,而有些部分对模型来说却是它完全无法理解的巨大跳跃。于是这部分信息就丢掉了,而之后的一切又都被它污染。所以从根本上说,你需要的是让模型自己练习如何解决这些问题。

It needs to figure out what works for it or does not work for it. Maybe it's not very good at four-digit addition, so it's going to fall back and use a calculator. But it needs to learn that for itself based on its own capability and its own knowledge. So that's number one: that's totally broken, I think. It's a good initializer though for something agent-like. And then the other thing is like we're doing reinforcement learning from human feedback, but that's like a super weak form of reinforcement learning. It doesn't even count as reinforcement learning, I think.
它需要自己弄清楚什么方法对它有效、什么无效。也许它不擅长四位数加法,那它就应该退而求其次去用计算器。但它需要基于自己的能力和知识自己学会这一点。所以第一点是:这一块目前完全是坏的,我认为。不过它作为类似智能体的东西是一个不错的初始化。另一件事是,我们在做基于人类反馈的强化学习(RLHF),但那只是一种非常弱的强化学习形式。我甚至觉得它都算不上强化学习。

Like what is the equivalent in AlphaGo for RLHF? It's like, what is the reward model? What I call it is a vibe check. Like imagine if you wanted to train an AlphaGo with RLHF, it would be giving two people two boards and asking which one do you prefer. And then you would take those labels and you would train the reward model and then you would RL against that. Well, what are the issues with that? It's like, number one, that's just vibes of the board, that's what you're training against. Number two, if it's a reward model that's a neural net, then it's very easy for the model you're optimizing over to overfit to that reward model. And it's going to find all these spurious ways of hacking that massive model, and that is the problem. So AlphaGo gets around these problems because they have a very clear objective function you can RL against. So RLHF is nowhere near RL, I would say; it's like silly.
在AlphaGo里,RLHF的等价物是什么呢?也就是说,奖励模型是什么?我把它叫作"感觉检查"(vibe check)。想象一下,如果你想用RLHF来训练AlphaGo,那就是给两个人看两盘棋局,然后问他们更喜欢哪一盘。接着你拿这些标签去训练一个奖励模型,再针对它做强化学习。这有什么问题呢?第一,你训练所针对的只是棋局给人的"感觉"。第二,如果奖励模型是一个神经网络,那么你正在优化的模型很容易对这个奖励模型过拟合,它会找到各种似是而非的方式去钻这个庞大模型的空子,这就是问题所在。AlphaGo绕开了这些问题,因为它有一个非常明确的目标函数可以用来做强化学习。所以我会说,RLHF离真正的强化学习还差得很远,这有点可笑。
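To make the "vibe check" concrete, here is a minimal sketch, assuming PyTorch, of the kind of reward model RLHF trains from pairwise human preferences; the backbone and dimensions are illustrative placeholders, not anything from the talk.

```python
# Sketch of an RLHF-style reward model: it is trained on pairwise human
# preferences (a Bradley-Terry style loss), and the policy is later optimized
# against its scalar score rather than a ground-truth objective like winning a game.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, backbone, hidden_dim):
        super().__init__()
        self.backbone = backbone                 # e.g. a pre-trained transformer (placeholder)
        self.score = nn.Linear(hidden_dim, 1)    # scalar "how much would a human like this"

    def forward(self, response_features):
        return self.score(self.backbone(response_features)).squeeze(-1)

def preference_loss(reward_model, preferred, rejected):
    # Maximize P(preferred beats rejected) = sigmoid(r_preferred - r_rejected).
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    return -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()

# Because the reward is itself a neural net fit to human "vibes", a policy
# optimized hard against it can find spurious inputs that score highly without
# actually being better -- the overfitting/hacking problem described above.
```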

And the other thing is imitation learning super silly. RLHF is nice improvement but it's still silly. And I think people need to look for better ways of training these models so that it's in the loop with itself and in some psychology. And I think there will probably be unlocks in the direction. So it's sort of like graduate school for AI models. It needs to sit in a room with a book and quietly question itself for a decade.
另一件事是模仿学习是非常愚蠢的。强化学习有所改进但仍然很愚蠢。我认为人们需要寻找更好的训练这些模型的方法,以使其与自身和一些心理学相互联系。我认为可能会在这个方向上有所突破。所以这有点像人工智能模型的研究生学院。它需要坐在房间里读书,安静地反思十年。

Yeah. I think that would be part of it, yes. And I think like when you are learning stuff and you're going through textbooks, there are exercises in the textbook. What are those? Those are prompts to you to exercise the material. And when you're learning material, you're not just like reading left to right. Like number one, you're exercising, but maybe you're taking notes, you're rephrasing, reframing. Like you're doing a lot of manipulation of this knowledge in a way of you learning that knowledge. And we haven't seen equivalents of that at all in LLMs. So it's like super early days, I think. So I think that's a good thing. Yeah. It's cool to be optimal and practical at the same time.
是的。我认为那会是其中的一部分,是的。当你在学习、在读教科书的时候,教科书里有习题。那些是什么?那些是促使你练习所学内容的提示。而且当你学习材料时,你不只是从左到右地阅读:第一,你在做练习,你可能还在做笔记、在换种说法、在重新组织。你在以各种方式摆弄这些知识,这本身就是你学习知识的过程。而我们在LLM中完全没有看到类似的东西。所以我认为现在还处于非常早期的阶段。我认为这是件好事。是的。能同时做到最优和务实,这很酷。

So I would be asking like, how would you align the priority of, like, A, doing cost reduction and revenue generation, or B, like finding the better quality models with like better reasoning capabilities? How would you be aligning that? So maybe I understand the question. I think what I see a lot of people do is they start out with the most capable model, no matter what the cost is. So you use GPT-4, you super prompt it, etc. You do RAG, etc.
所以我想问的是,你会如何权衡这两者的优先级:A,做降低成本和创造收入的工作;还是B,去寻找质量更好、推理能力更强的模型?你会怎么权衡?我想我大概理解了这个问题。我看到很多人的做法是,一开始先用能力最强的模型,不管成本多高。比如用GPT-4,精心设计提示词等等,再加上RAG(检索增强生成)等等。

So you're just trying to get your thing to work. So you're going after sort of accuracy first. And then you make concessions later. You check if you can fall back to 3.5 or so in terms of queries, you check if you sort of make it cheaper later. So I would say go after performance first, and then you make it cheaper later. It's kind of like the paradigm that I've seen a few people that I talked to about this kind of say works for them. And maybe it's not even just a single problem for like think about what are the ways in which you can even just make it work at all.
所以你只是想让你的东西能够运行。所以你首先追求准确性。然后再做出让步。你可以检查一下是否能够回到3.5左右的查询,你可以检查一下是否能够降低成本。所以我会说先追求性能,然后再降低成本。我遇到的一些人说这种范例对他们有效。也许这甚至不仅仅是一个问题,你可以考虑通过哪些方式让它能够正常运行。

Because if you can just make it work at all, like say you make 10 prompts or 20 prompts and you pick the best one, and you have some debate, or I don't know what kind of a crazy flow you can come up with, right? Like just get your thing to work really well. Because if you have a thing that works really well, then one other thing you can do is you can distill that, right? So you can get a large distribution of possible problem types. You run your super expensive thing on it to get your labels, and then you get a smaller, cheaper thing that you fine-tune on it. And so I would say I would always sort of go after getting it to work as well as possible, no matter what, first, and then make it cheaper. That's the thing I would suggest.
因为只要你能让它跑通就行,比如说你写10个或20个提示词,从中挑出最好的一个,再加上某种辩论机制,或者随便你能想出什么样的疯狂流程,对吧?就是先让你的东西运行得非常好。因为如果你有一个运行得非常好的东西,那么你还可以做的一件事就是蒸馏它。你可以取一个覆盖面很广的可能问题类型的分布,在上面运行你那个超级昂贵的流程来获得标签,然后用这些数据去微调一个更小、更便宜的模型。所以我的建议是:无论如何,先让它尽可能地工作好,然后再让它变便宜。
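A minimal sketch of that "get it to work, then distill" loop; the model objects and their generate/finetune methods are hypothetical placeholders standing in for whatever expensive pipeline and smaller model are actually used.

```python
# Sketch of the "performance first, then make it cheaper" recipe: run the
# expensive, high-accuracy pipeline over a broad set of prompts to produce
# labels, then fine-tune a smaller, cheaper model on those labels.
# (expensive_model, cheap_model and their methods are hypothetical placeholders.)
def distill(expensive_model, cheap_model, prompts):
    # 1. Label a large distribution of problem types with the expensive pipeline.
    labeled = [(p, expensive_model.generate(p)) for p in prompts]
    # 2. Fine-tune the smaller model on the (prompt, answer) pairs.
    cheap_model.finetune(labeled)
    return cheap_model

# cheap = distill(gpt4_pipeline, small_open_model, broad_prompt_set)
```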

Hi, Sam. Hi. One question. So this past year we saw a lot of kind of impressive results from the open source ecosystem. I'm curious what your opinion is of how that will continue to keep pace or not keep pace with closed source development as the models continue to improve in scale.
嗨,山姆。嗨。有个问题。所以去年我们看到了开源生态系统取得了许多令人印象深刻的成果。我很好奇你对于开源发展将如何与专有软件发展保持步调或者落后的看法,特别是当这些模型不断提升规模的时候。

Yeah, I think that's a very good question. I don't really know. Fundamentally, like these models are so capital intensive, right? Like one thing that is really interesting is, for example, you have Facebook and Meta and so on who can afford to train these models at scale, but then it's also not the core thing that they do, like their money printer is unrelated to that. And so they have an actual incentive to potentially release some of these models so that they empower the ecosystem as a whole and they can actually borrow all the best ideas.
是的,我认为这是一个非常好的问题。我实在不太清楚。从根本上讲,这些模型的资金投入是如此巨大,对吧?一个很有趣的事情是,比如像Facebook和Meta这样的公司可以负担得起大规模训练这些模型,但这并不是他们的主业,他们的财源与此无关。因此,他们有动机释放一些这些模型,以便为整个生态系统提供支持,这样他们就可以借鉴所有最好的想法。

So that to me makes sense. But so far I would say they've only just done the open weights model. And so I think they should actually go further. And that's what I would hope to see. And I think it would be better for everyone. And I think potentially maybe they're squeamish about some of the aspects of it eventually with respect to data and so on. I don't know how to overcome that. Maybe they should like try to just find data sources that they think are very easy to use or something like that and try to constrain themselves to those. So I would say like those are kind of our champions potentially. And I would like to see more transparency also coming from them, and I think Meta and Facebook are doing pretty well. Like they released papers, they published a log book, and so on. So I think they're doing well, but they could do much better in terms of fostering the ecosystem. And I think maybe that's coming, we'll see. Peter.
这对我来说是有道理的。但到目前为止,我会说他们只是做了开放权重模型。我认为他们应该进一步推进。这就是我希望看到的。我认为这对每个人都会更好。也许他们对数据等方面可能有些顾虑。我不知道如何克服这一点。也许他们应该尝试只选择他们认为非常容易使用的数据来源,然后尝试限制自己使用这些数据。我会说,这些可能是我们的潜在支持者。我希望看到更多来自你们的透明度,我认为Meta和Facebook做得相当不错。他们发布论文,公开一本日志,等等。所以他们做得很好,但在培育生态系统方面可以做得更好。也许这将会到来,我们拭目以待。Peter。

Yeah maybe this is like an obvious answer given the previous question but what do you think would make the AI ecosystem cooler and more vibrant or what's holding it back? Is it you know openness or do you think those other stuff that is also like a big thing that you'd want to work on? Yeah I certainly think like one big aspect is just like the stuff that's available. I had a tweet recently about like number one build the thing number two build the ramp. I would say there's a lot of people building a thing. I would say there's a lot less happening of like building the ramps so that people can actually understand all this stuff. And you know I think we're all new to all of this. We're all trying to understand how it works. We all need to like ramp up and collaborate to some extent to even figure out how to use this effectively. So I would love for people to be a lot more open with respect to you know what they've learned how they trained all this how what works what doesn't work for them etc. And yes just from us to like learn a lot more from each other that's number one. And then number two I also think like there is quite a bit of momentum in the open ecosystems as well. So I think that's already good to see. And maybe there's some opportunities for improvement I talked about already. So yeah.
或许这个问题的答案显而易见,但你认为如何使人工智能生态系统更加先进和充满活力,或者是什么在阻碍它的发展呢?是开放性,或者你认为还有其他重要的因素需要着手解决吗?我认为一个重要的方面就是所提供的资源。最近我发了一个推文,内容是首先建造物品,然后再建造斜坡。我觉得很多人都在建造物品,但相对较少的人在着手建造斜坡,让人们能够理解这些内容。我们都在尝试弄清楚如何使用人工智能,需要加快学习进度,一定程度上合作,以便更有效地使用这项技术。我希望人们能够更加开放,分享他们所学到的知识、如何训练模型、什么有效、什么无效等等。我们需要彼此学习,这是第一步。其次,我也认为在开放生态系统中已经产生了相当大的动力,这是一个好的迹象。也许我们可以进一步改善一些方面,这也是我之前谈到的。嗯。

Last question from the audience Michael. To get to like the the next big performance leap from models do you think that it's sufficient to modify the transformer architecture with say thought tokens or activation beacons or do we need to throw that out entirely and come up with a new fundamental building block to take us to the next big step forward or AGI. Yeah I think I think that's a good question. Well the first thing I would say is like transformer is amazing. It's just like so incredible. I don't think I would have seen that coming for sure. Like for a while before the transformer arrived I thought there would be an insane diversification of neural networks. And that was not the case. It's like complete opposite actually. It's complete like it's like all the same model actually. So it's incredible to me that we have that.
最后一个来自观众Michael的问题。为了实现模型性能的下一次大飞跃,您认为仅仅修改Transformer架构,比如加入思维token或激活信标(activation beacons),就足够了吗?还是我们需要把它完全抛开,提出一个新的基础构建模块,带领我们迈向下一个重要台阶乃至AGI?是的,我认为这是一个很好的问题。首先我想说的是,Transformer是令人惊叹的,简直不可思议。我当初肯定没有料到会是这样。在Transformer出现之前,我一直以为神经网络会走向疯狂的多样化。但事实并非如此,实际上完全相反:大家实际上都在用同一个模型。所以对我来说,我们能拥有它是令人难以置信的。

I don't know that it's like the final neural network. I think there will definitely be something more. I would say it's really hard to say that, given the history of the field, and I've been in it for a while; it's really hard to say that this is like the end of it. Absolutely it's not. And I feel very optimistic that someone will be able to find a pretty big change to how we do things today. I would say on the front of autoregression versus diffusion, which is kind of like the modeling and the loss setup, there's definitely some fruit there probably. But also on the transformer, like I mentioned, there are these levers of precision and sparsity, and as we drive that, together with the co-design of the hardware and how that might evolve, and just making network architectures that are a lot more sort of well tuned to those constraints and how all that works. To some extent also I would say the transformer is kind of designed for the GPU, by the way. That was the big leap, I would say, in the transformer paper, and that's where they were coming from: we want an architecture that is fundamentally extremely parallelizable, because the recurrent neural network has sequential dependencies, which is terrible for the GPU. The transformer basically broke that through the attention, and this was like the major sort of insight there. And it has some predecessors of insights, like the neural GPU and other papers at Google that are sort of thinking about this, but that is a way of targeting the algorithm to the hardware that you have available. So I would say that's kind of in that same spirit. But long story short, I think it's very likely we'll see changes to it still, but it's been proven remarkably resilient, I have to say. It came out, you know, many years ago now, something like six, seven years, and the original transformer and what we're using today are like not super different. Yeah.
我不知道它是否就是最终的神经网络。我认为肯定还会有新东西。考虑到这个领域的历史,而我也在其中待了很久,很难说这就是终点,绝对不是。我很乐观地相信,有人能找到对我们今天做法的相当大的改变。在自回归与扩散模型这一侧,也就是建模方式和损失函数的设置上,可能确实还有一些果实可摘;而在Transformer这一侧,就像我提到的精度和稀疏性这些杠杆,随着我们不断推进,再加上与硬件的协同设计及其演进,以及让网络架构更好地适配这些约束。在某种程度上,我还要说,Transformer其实就是为GPU设计的。这是Transformer论文中的重大飞跃,他们的出发点是:我们想要一个从根本上极易并行化的架构,因为循环神经网络有顺序依赖,对GPU非常不友好。Transformer通过注意力机制打破了这一点,这是其中主要的洞见。在此之前也有一些先行的想法,比如neural GPU以及Google的其他一些论文,都在思考这个问题,但这本质上是一种把算法对准你手头硬件的做法。所以我想说它是同一种精神的延续。长话短说,我认为我们很可能还会看到对它的改变,但它已经被证明非常有韧性。它问世至今已经很多年了,大概六七年,而最初的Transformer和我们今天使用的并没有太大的不同。
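To illustrate the point about parallelism, here is a minimal sketch of scaled dot-product self-attention in PyTorch: all positions are processed at once through dense matrix multiplies, with no step-by-step recurrence, which is what maps so well onto GPUs. The shapes are illustrative.

```python
# Minimal scaled dot-product self-attention: every position attends to every
# other position via dense matmuls, with no sequential recurrence -- the
# property that makes the transformer such a good fit for GPUs.
import torch

def attention(q, k, v, causal=True):
    # q, k, v: (batch, seq_len, dim)
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (B, T, T), all pairs at once
    if causal:
        T = q.shape[1]
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))   # block attention to future positions
    return torch.softmax(scores, dim=-1) @ v               # (B, T, dim)

x = torch.randn(2, 16, 64)
out = attention(x, x, x)   # the whole sequence is processed in parallel
```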

As a parting message to all the founders and builders in the audience what advice would you give them as they dedicate the rest of their lives to helping shape the future of AI? So yeah I don't have super I don't usually have crazy generic advice I think like maybe the thing that's top of my mind is I think founders of course care a lot about like their startup I would I also want like how do we have a vibrant ecosystem of startups how do startups continue to win especially with respect to like big tech and how do we how does the ecosystem become healthier and what can you do? Sounds like you should become an investor. Amazing thank you so much for joining us Andre for this and also for the whole day today.
作为一名告别讯息,给在场所有创始人和建设者,你会给他们什么建议,因为他们将把余生献给塑造人工智能的未来?所以是的,我没有什么超级通用的建议,我认为可能我脑海中首要的是,创始人当然非常关心他们的创业公司,我也希望有一个充满活力的创业公司生态系统,创业公司如何继续成功,特别是在与大科技公司相比,生态系统如何变得更加健康,你能做些什么?听起来你应该成为一名投资者。非常感谢您参加我们的节目,安德烈,也感谢您今天整天的陪伴。