
Mark Zuckerberg - Llama 3, $10B Models, Caesar Augustus, & 1 GW Datacenters

Published: 2024-04-19 00:13:14


Zuck on:
- Llama 3
- open sourcing towards AGI
- custom silicon, synthetic data, & energy constraints on scaling
- Caesar Augustus, intelligence explosion, bioweapons, $10b models, & much more

Enjoy!

Timestamps
00:00:00 Llama 3
00:09:15 Coding on path to AGI
00:26:07 Energy bottlenecks
00:34:03 Is AI the most important technology ever?
00:38:04 Dangers of open source
00:54:40 Caesar Augustus and metaverse
01:05:36 Open sourcing the $10b model & custom silicon
01:16:02 Zuck as CEO of Google+

Links
Apple Podcasts: https://podcasts.apple.com/us/podcast/mark-zuckerberg-llama-3-open-sourcing-%2410b-models-caeser/id1516093381?i=1000652877239
Spotify: https://open.spotify.com/episode/6Lbsk4HtQZfkJ4dZjh7E7k?si=GOqj7hUdSaWSgi7ULWXjMA
Transcript: https://www.dwarkeshpatel.com/p/mark-zuckerberg
Me on Twitter: https://twitter.com/dwarkesh_sp

Sponsors
If you’re interested in advertising on the podcast, fill out this form: https://airtable.com/appxGOvFLDLP5dlzv/pagFVrbHRohW6F2bZ/form
- This episode is brought to you by Stripe, financial infrastructure for the internet. Millions of companies from Anthropic to Amazon use Stripe to accept payments, automate financial processes and grow their revenue. Learn more at https://stripe.com/
- V7 Go is a tool to automate multimodal tasks using GenAI, reliably and at scale. Use code DWARKESH20 for 20% off on the pro plan. Learn more at https://www.v7labs.com/go?utm_campaign=Dwarkesh%20Podcast%20Newsletter&utm_source=Dwarkesh-Podcast&utm_medium=Newsletter&utm_term=Paid-Email
- CommandBar is an AI user assistant that any software product can embed to non-annoyingly assist, support, and unleash their users. Used by forward-thinking CX, product, growth, and marketing teams. Learn more at https://www.commandbar.com/



That's not even a question for me, whether we're going to go take a swing at building the next thing. I'm just incapable of not doing that. There's a bunch of times when we wanted to launch features, and then Apple's just like, nope, you're not launching that. It's just like, that sucks. Are we set up for that with AI, where you're going to get a handful of companies that run these closed models that are going to be in control of the APIs, and therefore going to be able to tell you what you can build? Then when you start getting into building a data center that's like 300 megawatts, or 500 megawatts, or a gigawatt, just no one has built a single gigawatt data center yet. From wherever you sit, there's going to be some actor who you don't trust if they're the ones who have like the super strong AI. I think that that's potentially a much bigger risk.

Mark, welcome to the podcast. Hey, thanks for having me. Big fan of your podcast. Oh, thank you. That's very nice of you to say. Okay, so let's start by talking about the releases that will go out when this interview goes out. Tell me about the models. Tell me about Meta AI. What's new? What's exciting about them? Yeah, sure. I think the main thing that most people in the world are going to see is the new version of Meta AI. Right? The most important thing about what we're doing is the upgrade to the model. We're rolling out Llama 3. We're doing it both as open source for the dev community, and it is now going to be powering Meta AI. So there's a lot that I'm sure we'll go into around Llama 3, but I think the bottom line on this is that with Llama 3, we now think that Meta AI is the most intelligent AI assistant that people can use that's freely available. We're also integrating Google and Bing for real-time knowledge. We're going to make it a lot more prominent across our apps. So basically, at the top of WhatsApp and Instagram and Facebook and Messenger, you'll just be able to use the search box right there to ask any questions.
There's a bunch of new creation features that we added that I think are pretty cool that I think people enjoy.

I think animations is a good one. You can basically just take any image and animate it. But I think one that people are going to find pretty wild is it now generates high-quality images so quickly. I don't know if you've gotten a chance to play with this, but it actually generates it as you're typing and updates it in real time. So you're typing your query and it's honing in. It's like, okay, show me a picture of a cow in a field with mountains in the background, eating macadamia nuts, drinking beer. And it's updating the image in real time. It's pretty wild. I think people are going to enjoy that. So that, I think, is what most people are going to see in the world. We're rolling that out. Not everywhere, but we're starting in a handful of countries and we'll do more over the coming weeks and months. So that, I think, is going to be a pretty big deal. And I'm really excited to get that in people's hands. It's a big step forward for Meta AI. But I think, if you want to get under the hood a bit, the Llama 3 stuff is obviously the most technically interesting. So basically, for the first version, we're training three versions. You know, an 8 billion and a 70 billion, which we're releasing today. And a 405 billion dense model, which is still training. So we're not releasing that today. But the 8 and 70, I mean, I'm pretty excited about how they turned out. I mean, they're leading for their scale. We'll release a blog post with all the benchmarks so that people can check it out themselves. And obviously, it's open source, so people get a chance to play with it.

We have a roadmap of new releases coming that are going to bring multimodality, more multilinguality, and bigger context windows to those as well. And then, you know, hopefully, sometime later in the year, we'll get to roll out the 405, which is still training. But for where it is right now in training, it is already at around 85 MMLU. And we expect that it's going to have leading scores on a bunch of the benchmarks. So I'm pretty excited about all that. I mean, the 70 billion is great too. I mean, we're releasing that today. It's around 82 MMLU and has leading scores on math and reasoning. So I think just getting this in people's hands is going to be pretty wild. Oh, interesting. Yeah, that's the first time I'm hearing that. That's super impressive. Yeah. And the 8 billion is nearly as powerful as the biggest version of Llama 2 that we released. So it's like the smallest Llama 3 is basically as powerful as the biggest Llama 2. Okay.

So before we dig into these models, I actually want to go back in time. 2022 is, I'm assuming, when you started acquiring these H100s, or you can tell me when. The stock price was getting hammered. People were like, what's happening with all this capex? People aren't buying the metaverse. And presumably you're spending that capex to get these H100s. Back then, how did you know to get the H100s? How did you know you'd need the GPUs? Um, I think it was because we were working on Reels. So, you know, we always want to have enough capacity to build something that we can't quite see on the horizon yet. Um, and we got into this position with Reels where we needed more GPUs to train the models, right? It was this big evolution for our services where instead of just ranking content from people who you follow or your friends and whatever pages you follow, um, we made this big push to basically start recommending what we call unconnected content, basically content from people or pages that you're not following.

So now the corpus of content candidates that we could potentially show you expanded from, you know, on the order of thousands to on the order of hundreds of millions. So completely different infrastructure. And we, um, started working on doing that. And we were constrained on basically the infrastructure that we had, so we couldn't catch up to what TikTok was doing as quickly as we would have wanted to. Um, so I basically looked at that and I was like, hey, we have to make sure that we're never in this situation again. So let's order enough GPUs to do what we need to do on Reels and ranking content and feed. But let's also double that. Right. Because again, our normal principle is there's going to be something on the horizon that we can't see. Did you know it would be AI? Well, we thought it was going to be something that had to do with training large models. Right. I mean, but at the time I thought it was probably going to be more something that had to do with content, but I don't know. I mean, it's almost just the pattern matching, and in running the company there's always another thing.

Right. So I'm not even sure I had, at that time I was so deep in just, you know, trying to get the recommendations working for Reels and other content. Because, I mean, that's just such a big unlock for Instagram and Facebook, now being able to show people content that's interesting to them from people that they're not even following. But um, yeah, that ended up being a very good decision in retrospect. Yeah. Yeah. Okay. And it came from being behind. So it wasn't like, oh, I was so far ahead. Actually, most of the times I think where we kind of make some decision that ends up seeming good, it's because we messed something up before and just didn't want to repeat the mistake. Uh, this is a total detour, but I actually want to ask about this while we're on this. We'll get back to AI in a second. So you didn't sell for one billion, but presumably there's some amount you would have sold for, right? Did you write down in your head, like, I think the actual valuation of Facebook at the time is this and they're not actually getting the valuation right? I mean, if they'd offered $5 trillion, of course you would have sold. So how did you think about that choice?

Yeah. I don't know. I think some of these things are just personal. Um, I don't know that at the time I was sophisticated enough to do that analysis, but I had all these people around me who were making all these arguments for a billion dollars. You know, it's like, here's the revenue that we need to make and here's how big we need to be. And it's clearly so many years in the future. It was very far ahead of where we were at the time. And I don't know. I didn't really have the financial sophistication to really even engage with that kind of debate. I just, I think I sort of deep down believed in what we were doing. And I did some analysis. Um, I was like, okay, well, what would I go do if I wasn't doing this? It's like, well, I really like building things and I like helping people communicate and I like understanding what's going on with people and the dynamics between people. So I think if I sold this company, I'd just go build another company like this. And I kind of like the one I have. So, I mean, why, right? But I don't know. I think a lot of the biggest bets that people make are often just based on conviction and values. Um, it's actually usually very hard to do the analyses trying to connect the dots forward. So you've had, um, Facebook AI Research for a long time. Uh, now it's become seemingly central to your company. At what point did making AGI, or however you consider that mission, become a key priority of what Meta is doing?

Yeah. I mean, it's been a big deal for a while. So we started FAIR, um, about 10 years ago. And the idea was that along the way to general intelligence or AI, like full AI, whatever you want to call it, there are going to be all these different innovations. And that's going to just improve everything that we do. So we didn't kind of conceive it as a product. It was more kind of a research group. And over the last 10 years, it has created a lot of different things that have basically improved all of our products, um, and advanced the field. And allowed other people in the field to create things that have improved our products too. So I think that that's been great. But there's obviously a big change.

Um, yeah. In the last few years, you know, ChatGPT comes out, um, the diffusion models around image creation come out, and, I mean, this is some pretty wild stuff. Right. That, I think, is pretty clearly going to affect how people interact with every app that's out there. So at that point, we started a second group, um, the Gen AI group, um, with the goal of basically bringing that stuff into our products. So building leading foundation models that would sort of power all these different products.

And initially when we started doing that, um, the theory at first was, Hey, a lot of the stuff that we're doing is, is pretty social. Right. So, you know, it's helping people interact with creators, helping, um, people interact with businesses to, you know, so the businesses can sell things or do customer support or, um, you know, basic assistant functionality for, um, you know, whether it's for apps or the smart glasses or VR, like all these different things. So initially it wasn't completely clear that you were going to need kind of full AGI, um, to be able to support those use cases. But then through working on them, I think it's actually become clear that you do.

Right. In all these subtle ways. So for example, you know, for Llama 2, when we were working on it, we didn't prioritize coding. And the reason why we didn't prioritize coding is because people aren't going to ask Meta AI a lot of coding questions in WhatsApp. Now they will, right? Well, I don't know. I'm not sure that WhatsApp is the UI where people are going to be asking a lot of coding questions. Or Facebook or Instagram or, you know, those different services. Maybe the website, meta.ai, that we're launching, I think. But the thing that has been a somewhat surprising result over the last, um, you know, 18 months is that it turns out that coding is important for a lot of domains, not just coding. Right. So even if people aren't asking coding questions, training the models on coding helps them, um, just be more rigorous in answering the question and kind of, um, helps them reason across a lot of different types of domains.

Okay. So that's one example where it's like, all right. So for Llama 3, we really focused on training it with a lot of coding because it's like, all right, that's going to make it better on all these things, even if people aren't primarily asking coding questions. Reasoning, I think, is another example. It's like, okay. Yeah. Maybe you want to chat with a creator or, you know, you're a business and you're trying to interact with a customer. You know, that interaction is not just like, okay, the person sends you a message and you just reply, right? It's like a multi-step interaction where you're trying to think through, how do I accomplish the person's goals? And, um, you know, a lot of times when a customer comes, they don't necessarily know exactly what they're looking for or how to ask their questions. So it's not really the job of the AI to just respond to the question. It's like, you need to kind of think about it more holistically. It really becomes a reasoning problem.

Right. So if someone else solves reasoning or makes good advances on reasoning and we're sitting here with a basic chat bot, then our product is lame compared to what other people are building. So, okay. So at the end of the day, we basically realized we've got to solve general intelligence. Um, and we just kind of upped the ante and the investment to make sure that we could do that. So the version of Llama that's going to solve all these use cases for users, is that the version that will be powerful enough to replace a programmer you might have in this building? I mean, I just think that all this stuff is going to be progressive over time. But in the end case, Llama 10? Um, I mean, I think that there's a lot baked into that question.

I'm not sure that we're replacing people as much as giving people tools to do more stuff. Is a programmer in this building 10x more productive after that? I would hope more, but, um, but no, I mean, look, I don't believe that there's a single threshold of intelligence for humanity because, I mean, people have different skills. And at some point I think that AI is probably going to surpass people at most of those things, depending on how powerful the models are, but, um, I think it's progressive. And I don't think AGI is one thing. I think you're basically adding different capabilities.

So multimodality is, is kind of a key one that we're focused on now initially with photos and images and text, but eventually with videos. And then because we're so focused on the metaverse kind of 3D type stuff is important. Um, one modality that I'm pretty focused on that I haven't seen as many other people in the industry, um, focus on this is sort of like emotional understanding. Like, I mean, so much of, of the human brain is just dedicated to understanding people and kind of like understanding your expressions and emotions and that that's like its own whole modality, right? That, um, I mean, you could say, okay, maybe it's just video or image, but it's like clearly a very specialized version of those two.

So there's all these different capabilities that I think you want to basically train the models to focus on, as well as, um, getting a lot better at reasoning, getting a lot better at memory, which I think is kind of its own whole thing. I mean, I don't think we're going to be, you know, primarily shoving context or things into a query context window, um, in the future to ask more complicated questions. I think that there will be kind of different stores of memory or different custom models that, um, are maybe more personalized to people. But I don't know, I think that these are all just different capabilities.

And then obviously making them big and small, we care about both because, you know, if you're running something like Meta AI, then that's pretty server based. Um, but we also want it running on smart glasses and, you know, there's not a lot of space in smart glasses. So, um, you want to have something that's very efficient for that. What is the use case if you're doing tens of billions of dollars' worth of inference, or even eventually hundreds of billions of dollars' worth of inference, using intelligence at an industrial scale? What is the use case? Is it simulations? Is it the AIs that will be in the metaverse? What will we be using the data centers for?

Um, I mean, our bet is that this is basically going to change all of the products, right? So I think that there's going to be a kind of Meta AI general assistant product. And I think that that will shift from something that feels more like a chat bot, where you just ask a question and it formulates an answer, to things where you're increasingly giving it more complicated tasks and it goes away and does them. Mm. So that's going to take a lot of inference. It's going to take a lot of compute in other ways too.

Then I think that there's a big part of what we're going to do that is, um, like interacting with other agents for other people, so whether it's businesses or creators. Um, I guess a big part of my theory on this is that there's not just going to be one singular AI that you interact with because I think, um, you know, every business is going to want an AI that represents their interests. They're not going to want to primarily interact with you through an AI that is going to sell their competitors' customers. So, uh, sorry, their competitors' products.

Um, so yeah, I think creators is going to be a big one. I mean, there are about 200 million creators on our platforms. They all basically have the pattern where, um, they want to engage their community, but they're limited by hours in the day, and their community generally wants to engage them, but, I don't know, they're limited by hours in the day. Um, so if you could create something where that creator can basically own the AI and train it in the way that they want, um, and it can engage their community, I think that that's going to be super powerful too.

So, um, so I think that there's going to be a ton of engagement across all these things. Um, but these are just the consumer use cases. I mean, I think when you think about stuff like, I mean, you know, I run our foundation, right? The Chan Zuckerberg Initiative, with my wife, and, you know, we're doing a bunch of stuff on science and, um, there's obviously a lot of AI work that I think is going to advance science and healthcare and all these things too. So this is, I think, going to end up affecting basically every area of the products and the economy.

The thing you mentioned about an AI that can just go out and do something for you that's multi-step. Is that a bigger model? Or with Llama 4, will there be a version that's still 70B, but you'll just train it on the right data and that will be super powerful? What does the progression look like? Is it scaling? Is it just same size, but different banks like you were talking about? Um, I don't know that we know the answer to that.

So I think one thing that seems to be a pattern is that you have the Llama model, and then you build some kind of other application-specific code around it. Right. So some of it is the fine-tuning for the use case, but some of it is just logic for, okay, how Meta AI should work with tools like Google or Bing to bring in real-time knowledge. I mean, that's not part of the base Llama model. Okay. So for Llama 2, we had some of that. And it was a little more kind of hand engineered. And then part of our goal for Llama 3 was to bring more of that into the model itself.

But for Llama 3, as we start getting into more of these agent-like behaviors, I think some of that is going to be more hand engineered. And then I think our goal for Llama 4 will be to bring more of that into the model. So I think at each step along the way, you kind of have a sense of what's going to be possible on the horizon. You start messing with it and hacking around it. Um, and then I think that that helps you hone your intuition for what you want to try to train into the next version of the model itself. Interesting. Which makes it more general, because obviously anything that you're hand coding, um, you know, you can unlock some use cases, but it's just inherently brittle and non-general.
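To make the "hand engineered" wrapper idea concrete, here is a minimal sketch of application code that decides, with hard-coded rules, when to call a search tool before asking a model. Everything in it is hypothetical: `stub_model`, `stub_search`, and the keyword list are illustrative stand-ins, not Meta AI's actual logic or any Llama API.

```python
import string

# Hypothetical stand-ins: neither function is a real Meta or Llama API.
def stub_model(prompt: str) -> str:
    """Pretend language model that just echoes its prompt."""
    return f"[model answer to: {prompt}]"


def stub_search(query: str) -> str:
    """Pretend web search tool that returns a canned snippet."""
    return f"[search results for: {query}]"


# The hand-written rule: only these words trigger a search (assumed list).
REALTIME_KEYWORDS = {"today", "latest", "weather", "news", "score"}


def needs_search(query: str) -> bool:
    """Return True if any keyword appears as a whole word in the query."""
    words = {w.strip(string.punctuation) for w in query.lower().split()}
    return bool(words & REALTIME_KEYWORDS)


def answer(query: str) -> str:
    """Route through search when the rule fires, otherwise ask the model directly."""
    if needs_search(query):
        context = stub_search(query)
        return stub_model(f"{context} | question: {query}")
    return stub_model(query)


if __name__ == "__main__":
    print(needs_search("What's the weather in Austin today?"))  # True
    print(needs_search("Who won the game last night?"))         # False
```

The second query clearly needs fresh information, yet the rule misses it because none of the listed words appear. That is the brittleness described above, and the motivation for training the decision to use a tool into the model itself.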

Hey, everybody. Real quick, I want to tell you about a tool that I wish more applications used. So obviously you've noticed every single company is trying to add an AI chat bot to their website. But as a user, I usually find them really annoying because they have these long, generic, often useless answers. CommandBar is a user assistant that you can just embed into your website or application. And it feels like you're talking to a friendly human support agent who is browsing with you and for you. And it's much more personalized than a regular chat bot. It can actually look up a user's history and respond differently based on that. It can use APIs to perform actions. It can even proactively nudge users to explore new features. One thing that I think is really cool is that instead of just outputting text, CommandBar can kind of just say, here, let me show you, and start browsing alongside the user.

Anyways, there are a bunch of great products already. You can learn more about them at commandbar.com. Thanks to them for sponsoring this episode. And now back to Mark. When you say into the model itself, you train it on the thing that you want in the model itself. But what do you mean by into the model itself? Well, I mean, I think like the example that I gave for Llama 2, where, I mean, for Llama 2 the tool use was very, very specific. Whereas Llama 3 has much better tool use, right? So we don't have to hand code all the stuff to have it use Google to go do a search. It just kind of can do that. And similarly for coding and kind of running code and just a bunch of stuff like that.

But I think once you kind of get that capability, then you get a peek of, OK, well, what can we start doing next? OK, well, I don't necessarily want to wait until Llama 4 is around to start building those capabilities. So let's start hacking around it. And so you do a bunch of hand coding and that makes the products better for the interim, but then that also helps show the way of what we want to try to build into the next version of the model. What is the community fine-tune of Llama 3 you're most excited by? Maybe not the one that will be most useful to you, but the one you'll just enjoy playing with the most. Like, they fine-tune it on antiquity and you'll just be talking to Virgil or something. What are you excited about? I don't know. I mean, I think the nature of the stuff is that you get surprised, right? So I think any specific thing that I thought would be valuable, we'd probably be building, right? But I think you'll get distilled versions. I think you'll get kind of smaller versions. I mean, one thing is that eight billion, I don't think, is quite small enough for a bunch of use cases, right?

I think over time, I'd love to get, you know, a billion parameter model or a two billion parameter model or even, I don't know, maybe a 500 million parameter model and see what you can do with that. Because, I mean, if with eight billion parameters we're basically nearly as powerful as the largest Llama 2 model, then with a billion parameters you should be able to do something that's interesting, right? And faster, good for classification or a lot of kind of basic things that people do before understanding the intent of a user query and feeding it to the most powerful model, to kind of hone what the prompt should be. So I don't know, I think that's one thing that maybe the community can help fill in, but I mean, we're also thinking about getting around to distilling some of these ourselves, but right now the GPUs are training the 405. So okay, you have all these GPUs. I think it's 350,000 by the end of the year. That's the whole fleet. I mean, we built two, I think it's like 22,000 or 24,000 GPU clusters that are kind of the single clusters that we have for training the big models. I mean, obviously across a lot of the stuff that we do, a lot of our stuff goes towards training, like, Reels models, and, sure, Facebook News Feed and Instagram feed, and then inference is a huge thing for us because we serve a ton of people, right? So our ratio of inference compute required to training is probably much higher than most other companies that are doing this stuff, just because of the sheer volume of the community that we're serving. Yeah. Yeah. Yeah.

One thing that was really interesting in the material they shared with me before was that you trained it on more data than is compute optimal just for training, because the inference is such a big deal for you guys and also for the community that it makes sense to just have this thing and have trillions of tokens in there. Yeah. Yeah. Although one of the interesting things about it that we saw, even with the 70 billion, is we thought it would get more saturated. You know, we trained on around 15 trillion tokens. Yeah. I guess our prediction going in was that it was going to asymptote more, but even by the end, it was still learning, right? It's like we probably could have fed it more tokens and it would have gotten somewhat better. But I mean, at some point, you know, you're running a company.
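As a rough worked example of what "more data than is compute optimal" means here, the sketch below divides the roughly 15 trillion training tokens mentioned above by the 70 billion parameters. The baseline of about 20 tokens per parameter is an assumption brought in from the commonly cited "Chinchilla" rule of thumb; it is not a figure from this conversation, and the function names are illustrative.

```python
# Figures from the conversation: ~15 trillion tokens, 70 billion parameters.
TRAINING_TOKENS = 15 * 10**12
PARAMS_70B = 70 * 10**9

# Assumption (not from the interview): a commonly cited rule of thumb puts
# compute-optimal training at roughly 20 tokens per parameter.
RULE_OF_THUMB_TOKENS_PER_PARAM = 20


def tokens_per_param(tokens: int, params: int) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def overtraining_factor(
    tokens: int,
    params: int,
    baseline: float = RULE_OF_THUMB_TOKENS_PER_PARAM,
) -> float:
    """How many times past the assumed baseline ratio the run went."""
    return tokens_per_param(tokens, params) / baseline


if __name__ == "__main__":
    ratio = tokens_per_param(TRAINING_TOKENS, PARAMS_70B)
    factor = overtraining_factor(TRAINING_TOKENS, PARAMS_70B)
    # Prints: 214 tokens per parameter, about 10.7x the rule of thumb
    print(f"{ratio:.0f} tokens per parameter, about {factor:.1f}x the rule of thumb")
```

Under that assumed baseline, the 70 billion model saw roughly ten times more data than a training-only budget would suggest, which fits the stated reasoning that inference cost matters more than training cost at this scale.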

You need to do these meta reasoning questions of like, all right, do I want to spend our GPUs on training this 70 billion model further? Do we want to kind of get on with it so we can start testing hypotheses for Llama 4? So we kind of needed to make that call. And I think we got to a reasonable balance for this version of the 70 billion. Um, there will be others in the future, you know, the 70 billion multimodal one that'll come over the next period. But, um, yeah, I mean, that was fascinating. That the architectures at this point can just take so much data. Yeah, that's really interesting. So what does this imply about future models? You mentioned that the Llama 3 8B is better than the Llama 2 70B. No, no, it's nearly as good. Okay. I don't want to overstate it. It's in a similar order of magnitude. But does that mean the Llama 4 70B will be as good as the Llama 3 405B? Like, what does this look like? This is one of the great questions, right? That I think no one knows. Um, it's one of the trickiest things in the world to plan around: when you have an exponential curve, how long does it keep going for? Yeah. And, um, I think it's likely enough that it will keep going, that it is worth investing the, um, tens or, you know, 100 billion plus in building the infrastructure to, um, assume that if that kind of keeps going, you're going to get some really amazing things that are just going to make amazing products. Mm. But I don't think anyone in the industry can really tell you that it will continue scaling at that rate for sure. Right. And in general, you know, in history, you hit bottlenecks at certain points. And now there's so much energy on this that maybe those bottlenecks get knocked over pretty quickly, but, um, I don't know. I think that's an interesting question.
What does the world look like where there aren't these bottlenecks?

You know, suppose progress just continues at this pace, which seems plausible. Like, zooming out. Well, there are going to be different bottlenecks. Right. So if not training, then, like... oh, yeah, go ahead. Well, I think over the last few years there was this issue of GPU production. Yeah. Right. So even companies that had the money to pay for the GPUs couldn't necessarily get as many as they wanted because there were all these supply constraints. Yeah. Now I think that's sort of getting less. So now I think you're seeing a bunch of companies think about, wow, we should just really invest a lot of money in building out these things. And I think that will go on for some period of time. There is a capital question of, okay, at what point does it stop being worth it to put the capital in? But I actually think before we hit that, you're going to run into energy constraints. Right. Because I don't think anyone's built a gigawatt single training cluster yet. Right. And then you run into these things that just end up being slower in the world. Getting energy permitted is a very heavily regulated government function. Right. So you're going from, on the one hand, software, which is somewhat regulated. I'd argue that it is more regulated than a lot of people in the tech community feel, although it's obviously different. If you're starting a small company, maybe you feel that less. If you're a big company, you know, we just interact with different governments and regulators, and we have lots of rules that we need to follow and make sure we do a good job with around the world.
But I think there's no doubt about energy. If you're talking about building large new power plants or large build-outs, and then building transmission lines that cross other private or public land, that is just a heavily regulated thing. So you're talking about many years of lead time.

So if we wanted to stand up some massive facility, to power that, I think that's a very long-term project. Right. So I don't know. I think people will do it, but I don't think this is something that can be quite as magical as just, okay, you get a level of AI and you get a bunch of capital and you put it in, and then all of a sudden the models are just going to... like, I think you do hit different bottlenecks along the way. Yeah. Is there something, a project, maybe AI-related, maybe not, that even a company like Meta doesn't have the resources for? Like, if your R&D budget or capex budget was 10x what it is now, then you could pursue it. It's in the back of your mind, but with Meta today, you can't even issue stock or bonds for it. It's just 10x bigger than your budget.

Well, I think energy is one piece. Yeah. Right. I think we would probably build out bigger clusters than we currently can if we could get the energy to do it. So I think that's... That's fundamentally money-bottlenecked in the limit? Like if you had a trillion dollars... I think it's time. Yeah. Right. Well, if you look at it in terms of... but it depends on how far the exponential curves go. Right. Like, I think a number of companies are working on... you know, right now, I think a lot of data centers are on the order of 50 megawatts or 100 megawatts, or a big one might be 150 megawatts. Okay. So you take a whole data center and you fill it up with all the stuff that you need to do for training and you build the biggest cluster you can. I think a bunch of companies are running at stuff like that. But then when you start getting into building a data center that's like 300 megawatts or 500 megawatts or a gigawatt, I mean, just no one has built a single gigawatt data center yet.
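To put the megawatt figures above in perspective, here is a back-of-the-envelope sketch of how many accelerators a facility of a given size might power. Both the per-accelerator draw (1,000 W, including its share of the server) and the facility overhead factor (1.25, covering cooling and power delivery) are hypothetical round numbers chosen for illustration; neither comes from the interview or from any vendor.

```python
def accelerators_supported(facility_watts: float,
                           watts_per_accelerator: float = 1_000.0,
                           overhead_factor: float = 1.25) -> int:
    """Rough count of accelerators a facility's power budget could run.

    watts_per_accelerator and overhead_factor are ASSUMED illustrative values.
    overhead_factor is total facility power divided by IT power (>= 1).
    """
    if watts_per_accelerator <= 0 or overhead_factor < 1:
        raise ValueError("need positive per-accelerator power and overhead >= 1")
    return int(facility_watts // (watts_per_accelerator * overhead_factor))


# Facility sizes mentioned in the conversation, in megawatts.
SITES_MW = [50, 100, 150, 300, 500, 1_000]

if __name__ == "__main__":
    for mw in SITES_MW:
        count = accelerators_supported(mw * 1e6)
        print(f"{mw:>5} MW -> about {count:,} accelerators")
```

Whatever the true per-device numbers are, the scaling is linear: a one gigawatt site would host about ten times the hardware of a 100 megawatt one, which is why the jump described here is a power and permitting problem more than a purchasing one.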

So I think it will happen. Right. I mean, this is only a matter of time, but it's not going to be next year. Right. I think some of these things will take, I don't know, some number of years to build out. And then the question is, okay, well, just to put this in perspective, I think a gigawatt is around the size of a meaningful nuclear power plant only going towards training a model. Didn't Amazon do this? They have a 950 megawatt... Yeah, I'm not exactly sure what they did. You'd have to ask them. But it doesn't have to be in the same place, right? If distributed training works, it can be distributed. That, I think, is a big question. Yeah. Right. It's basically how that's going to work. And I do think in the future, it seems quite possible that more of what we call training for these big models is actually more along the lines of inference, generating synthetic data to then go feed into the model.

So I don't know what that ratio is going to be, but I consider the generation of synthetic data to be more inference than training today. But obviously if you're doing it in order to train a model, it's part of the broader training process. So I don't know. That's an open question, kind of where the balance of that is and how that plays out. If that's the case, would that potentially also be the case with Llama 3, and maybe Llama 4 onwards, where you put this out and if somebody has a ton of compute, then using the models that you've put out, they can just keep making these things arbitrarily smarter? Like, Kuwait or the UAE or some random country has a ton of compute, and they can actually just use Llama 4 to make something much smarter. I do think that there are going to be dynamics like that, but I also think that there is a fundamental limitation on the network architecture, right? Or the kind of model architecture, right?

So I think a 70 billion model that we trained with the Llama 3 architecture can get better, right? It can keep going. Like I was saying, we felt like if we kept on feeding it more data or rotated the high-value tokens through again, then it would continue getting better. And we've seen a bunch of other people around the world, different companies, basically take the Llama 2 70 billion base, take that model architecture, and then build a new model. But it's still the case that when you make a generational improvement, like the Llama 3 70 billion or the Llama 3 405B, there's nothing open source anything like that today. Right. I think that's a big step function, and what people are going to be able to build on top of that, I don't think can go infinitely from there. I think there can be some optimization in that until you get to the next step function. Yeah.

Okay. So let's zoom out a little bit from specific models and even the many-year lead times you would need to get energy approvals and so on. Big picture, these next couple of decades. Sure. What's happening with AI? Does it feel like another technology, like metaverse or social, or does it feel like a fundamentally different thing in the course of human history? I think it's going to be pretty fundamental. I think it's going to be more like the creation of computing in the first place. Right. So you'll get all these new apps, in the same way that when you got the web or you got mobile phones, people basically rethought all these experiences and a lot of things that weren't possible before became possible. So I think that will happen, but I think it's a much lower-level innovation. It's going to be more like going from people not having computers to people having computers, is my sense.

But it's also... I don't know. It's very hard to reason about exactly how this goes. I tend to think that, you know, on the cosmic scale, obviously it'll happen quickly, over a couple of decades or something. But I do think there is some set of people who are afraid that it really just kind of spins and goes from being somewhat intelligent to extremely intelligent overnight. And I just think that there are all these physical constraints that make that unlikely to happen. I just don't really see that playing out. So I think we'll have time to acclimate a bit, but it will really change the way that we work and give people all these creative tools to do different things. I think it's going to really enable people to do the things that they want a lot more, is my view.

Okay. So maybe not overnight, but is it your view that on a cosmic scale, if you think, like, humans evolved, and then AI happened, and then they went out through the galaxy, and maybe it takes many decades, maybe it takes a century, but is that the grand scheme of what's happening right now in history? Sorry, in what sense? I mean, in the sense that there were other technologies, like computers and even fire, but AI happening is as significant as humans evolving in the first place. I think that's tricky. I think the history of humanity has been people basically thinking that certain aspects of humanity are really unique in different ways, and then coming to grips with the fact that that's not true, but humanity is actually still super special. Right. So it's like we thought that the earth was the center of the universe, and it's not, but humans are still pretty awesome, right? And pretty unique.

Um, I think that another bias that people tend to have is thinking that intelligence is somehow kind of fundamentally connected to life. And it's not actually clear that it is. Right. I think like, like people think that, um, I mean, I don't know that we have a clear enough definition of consciousness or, um, or, or life to kind of fully, um, interrogate this, but I know there was all the science fiction about, okay, you create intelligence and now it like starts taking on all these human like behaviors and, and things like that. But I actually think that the current incarnation of all this stuff at least kind of feels like it's going in a direction where intelligence can be pretty separated from consciousness and agency and things like that. That, um, I think just makes it a super valuable tool.

So I don't know. I mean, obviously it's very difficult to predict what direction this stuff goes in over time, which is why I don't think anyone should be dogmatic about how they plan to develop it or what they plan to do. I think you want to look at each release. You know, we're obviously very pro open source. Yeah. But I haven't committed that we're going to release every single thing that we do. I'm just generally very inclined to think that open sourcing it is going to be good for the community and also good for us. Right. Because we'll benefit from the innovations. But if at some point there's some qualitative change in what the thing is capable of, and we feel like it's just not responsible to open source it, then we won't. So I don't know. It's all very difficult to predict. Yeah. What is a kind of qualitative change, like a specific thing, where you're training Llama 5 or Llama 4, and you've seen this and you're like, you know what, I'm not sure about open sourcing it? I think it's a little hard to answer that in the abstract, because there are negative behaviors that any product can exhibit where, as long as you can mitigate it, it's okay. Right. So, I mean, there are bad things about social media that we work to mitigate. Right. There are bad things about Llama 2 that we spend a lot of time trying to make sure that it's not, you know, helping people commit violent acts or things like that. Right. I mean, that doesn't mean that it's a kind of autonomous or intelligent agent. It just means that it's learned a lot about the world and it can answer a set of questions that we think it would be unhelpful for it to answer.

Um, so I, um, I don't know. I think the question isn't really what behaviors would it show? It's what things would we not be able to mitigate after it shows that and, um, and I don't know. I, I think that there's so many ways in which something can be good or bad that it's hard to actually enumerate them all upfront. If you even look at like what we've had to deal with in, um, you know, social media and like the different types of harms we've basically gotten to it's like, there's like 18 or 19 categories of, of harmful things that, that people do. And we've basically built AI systems to try to go identify what those things are that people are doing and try to make sure that that, you know, it doesn't happen on our network as much as possible. So, um, yeah, I think you can, over time, I think you'll be able to break down, um, this into more of a taxonomy too. And I think this is a thing that we spend time researching too. Cause we want to make sure that we understand that.

So one of the things I asked Mark is what industrial-scale use of LLMs would look like. You see this in previous technological revolutions, where at first people think in a very small-scale way about what's enabled. And I think that's what chatbots might be for LLMs. And I think the large-scale use case might look something like what V7 Go is. And by the way, it's made by V7 Labs, who are sponsoring this episode. So it's like a spreadsheet. You put in raw information like documents, images, whatever, and they become rows, and the columns are populated by an LLM of your choice. And in fact, I used it to prepare for Mark. So I fed in a bunch of blog posts and papers from Meta's AI research. And as you can see, if you're on YouTube, it summarizes and extracts exactly the information I want as columns. And obviously mine is a small use case. But you can imagine, for example, a company like FedEx has to process half a million documents a day. Obviously a chatbot can't do that; a spreadsheet can, because this is just like a fire hose of intelligence in there, right? Anyways, you can learn more about them at v7labs.com/go or the link in the description. Back to Mark.

Yeah. Like it seems to me it would be a good idea. I would be disappointed in a future where AI systems aren't broadly deployed and everybody doesn't have access to them. Yeah. At the same time, I want to better understand the mitigations. Yeah. Because if the mitigation is the fine tuning, well, the whole thing about open weights is that you can then remove the fine tuning, which is often superficial on top of these capabilities.

Like if it's like talking on Slack with a biology researcher, and I think models are very far from this, right now they're like Google search. But it's like I can show them my petri dish and they can explain, like, here's why your smallpox sample didn't grow, here's what to change. How do you mitigate that? Because somebody can just fine-tune that in there, right? Yeah. I mean, that's true.

I think a lot of people will basically use the off-the-shelf model, and some people who have basically bad faith are going to try to strip out all the bad stuff. So I do think that's an issue. The flip side of this, and this is one of the reasons why I'm philosophically so pro open source, is I do think that a concentration of AI in the future has the potential to be as dangerous as it being widespread. So I think a lot of people think about the questions of, okay, well, if we can do this stuff, is it bad for it to be out in the wild, just kind of widely available?

I think another version of this is, okay, well, it's probably also pretty bad for one institution to have an AI that is way more powerful than everyone else's AI. Right. So one security analogy that I think of is, you know, there are security holes in so many different things. And if you could travel back in time a year or two years, right? That's not AI. Let's say you just have one or two years more knowledge of the security holes. You can pretty much hack into any system.

Right. So it's not that far fetched to believe that a very intelligent AI would probably be able to identify some holes and basically be like a human who could potentially go back in time a year or two and compromise all these systems. Okay. So how have we dealt with that as a society? Well, one big part is open source software that makes it so that when improvements are made to the software, it doesn't just kind of get stuck in one company's products, but it can kind of be broadly deployed to a lot of different systems, whether it's banks or hospitals or government stuff and like, just everyone can kind of.

Like as the software gets hardened, which happens because more people can see it and more people can bang on it. Um, and there are, and there are standards on how this stuff works. Um, the world can kind of get upgraded together pretty quickly. And I kind of think that a world where AI is very widely deployed in a, in a way where it's gotten hardened, um, progressively over time and is one where all the different systems will be in check.

Yeah. In a way that seems like it is fundamentally more healthy to me than one where this is more concentrated. So there are risks on all sides, but I know that's one risk that I think. People, I don't hear them talking about quite as much. I think like there's sort of the risk of like, okay, well, what if the AI system does something bad? I, I am more like, you know, I stay up at night more worrying.

Well, what if some actor that... whatever, from wherever you sit, there's going to be some actor who you don't trust. If they're the ones who have the super strong AI, whether it's some other government that is sort of an opponent of our country, or some company that you don't trust, or whatever it is, I think that's potentially a much bigger risk. As in, they could overthrow our government because they have a weapon that nobody else has? Cause a lot of mayhem.

Right. It's, I think it's like, I, I, I think the intuition is that this stuff ends up being pretty kind of important and, and, um, and valuable for both kind of economic and, and kind of security and other things. And, um, I don't know. I just think, yeah, if, if like if someone who you don't trust or is an adversary of you, get something that is more powerful than, um, then I think that that could be an issue. I think the probably the best way to mitigate that is to have good open source, um, AI that, that basically becomes the standard.

And in a lot of ways can become the leader. And in that way, it just ensures that it's a much more even and balanced playing field. Yeah. That seems plausible to me. And if that works out, that would be the future I prefer. I guess I want to understand, mechanistically, if somebody was going to cause mayhem with AI systems, how the fact that there are other open source systems in the world prevents that. Like the specific example of somebody coming with a bioweapon. Is it just that we'll do a bunch of R&D in the rest of the world to figure out the vaccines really fast? Like, what's happening? Like the computer security one that I was talking about: I think someone with a weaker AI trying to hack into a system that is protected by a stronger AI will succeed less. Mm. Right.

So I think that that's... I mean, in terms of... How do you know everything in the world is like that? Like, what if bioweapons aren't like that? No, I mean, I don't know that everything in the world is like that. I think bioweapons are one of the areas where the people who are most worried about this stuff are focused. And I think it makes a lot of sense to think about that. I think there are certain mitigations. You can try to not train certain knowledge into the model, right? There are different things. But yeah, at some level, if you get a sufficiently bad actor and you don't have other AI that can balance them and understand what's going on and what the threats are, then that could be a risk.

So I think that's one of the things that we need to watch out for. Mm. Is there something you could see in the deployment of these systems where you observe, like, you're training Llama 4 and it lied to you because it thought you weren't noticing or something, and you're like, whoa, what's going on here? This is probably not likely with a Llama 4 type system, but is there something you can imagine like that where you'd be really concerned about deceptiveness, and if billions of copies of these things are out in the wild? Yeah. I mean, right now you see a lot of hallucinations. Yeah. Right. It's more that. I think it's an interesting question how you would tell the difference between a hallucination and deception.

But yeah, look, I think there are a lot of risks and things to think about. The flip side of all of this is that I try, in running our company at least, to balance what I think of as these longer-term theoretical risks with what I actually think are quite real risks that exist today. So when you talk about deception, the form of that that I worry about most is people using this to generate misinformation and then pump that through, whether it's our networks or others. So the way that we've basically combated a lot of this type of harmful content is by building AI systems that are smarter than the adversarial ones. And this kind of informs part of my theory on this, right? If you look at the different types of harm that people do or try to do through social networks, there are ones that are not very adversarial. So for example, hate speech, I would say, is not super adversarial in the sense that people aren't getting better at being racist, right? That's one where I think the AIs are generally just getting way more sophisticated, faster than people are at those issues.

And we have issues both ways. People do bad things, whether they're trying to incite violence or something. But we also have a lot of false positives, right? Where we basically censor stuff that we shouldn't, and I think that understandably makes a lot of people annoyed. So I think having an AI that just gets increasingly precise on that is going to be good over time. But let me give you another example, which is nation states trying to interfere in elections. That's an example where they absolutely have cutting-edge technology and absolutely get better each year. So we block some technique. They learn what we did. They come at us with a different technique, right? It's not like a person trying to, you know, say mean things. Right. They have a goal. They're sophisticated. They have a lot of technology. In those cases, I still think about the ability to have our AI systems grow in sophistication at a faster rate than theirs do. It's an arms race, but I think we're at least currently winning that arms race. So I don't know. This is a lot of the stuff that I spend time thinking about. It's like, okay, yes, it is possible that, whether it's Llama 4 or Llama 5 or Llama 6, we need to think about what behaviors we're observing. And it's not just us. I think part of the reason why you make this open source is that there are a lot of other people who study this too. So yeah, we want to see what other people are observing, what we're observing, what we can mitigate, and then we'll make our assessment on whether we can make it open source. But I think for the foreseeable future, I'm optimistic we will be able to.
And in the near term, I don't want to take our eye off the ball of what are actual bad things that people are trying to use the models for today. Even if they're not existential, they're pretty bad day-to-day harms that we are familiar with in running our services. That's actually a lot of what we have to spend our time on as well. Yeah. Actually, I found this synthetic data thing really curious. With current models, it makes sense why there might be an asymptote with just doing the synthetic data again and again. But if it gets smarter and uses the kind of techniques you talk about in the paper or the blog post that's coming out on the day this will be released, where it goes to the thought chain that is the most correct, why wouldn't this lead to a loop? Of course, it wouldn't be overnight, but over many months or years of training, potentially with a smarter model, it gets smarter, makes better output, gets smarter, and so forth.

Well, I think it could, within the parameters of whatever the model architecture is. It's just that at some level, I don't know, I think that with today's 8 billion parameter models, I just don't think you're going to be able to get them to be as good as the state-of-the-art, multi-hundred-billion parameter models that are incorporating new research into the architecture itself. But those will be open source as well, right? Well, yeah, subject to all the questions that we just talked about. Sure. Yes. I mean, we would hope that that'll be the case. But I think that at each point, I don't know, it's like when you're building software, there's a ton of stuff that you can do with software, but then at some level, you're constrained by the chips that it's running on, right? So there are always going to be different physical constraints. And how big the models are is going to be constrained by how much energy you can get and use for inference. So I guess I'm simultaneously very optimistic that this stuff will continue to improve quickly, and also a little more measured than I think some people are about it. I just don't think the runaway case is a particularly likely one. I think it makes sense to keep your options open. There's so much we don't know. There's a case in which it's really important to keep the balance of power, so that nobody becomes a totalitarian dictator. There's a case in which you don't want to open source the architecture, because China can use it to catch up to America's AIs, and there is an intelligence explosion, and they win that. Yeah. A lot of things can be possible. Just keeping your options open and considering all of them seems reasonable. Yeah. Let's talk about some other things. Okay.

Metaverse. What time period in human history would you be most interested in going into? 100,000 BCE to now. You just want to see what it was like. It has to be the past? Oh, yeah. It has to be the past. I don't know. I mean, I have the periods of time that I'm interested in. I'm really interested in American history and classical history. And I'm really interested in the history of science too. So I actually think seeing and trying to understand more about how some of the big advances came about... I mean, all we have are somewhat limited writings about some of that stuff. I'm not sure the metaverse is going to let you do that, because it's going to be hard to go back in time for things that we don't have records of. But I'm actually not sure that going back in time is going to be that important a thing. I mean, I think it's going to be cool for history classes and stuff, but that's probably not the use case that I'm most excited about for the metaverse overall.

I mean, it's, um, I mean, the main thing is just the ability to feel present with people, no matter where you are. I think that's going to be killer. I mean, there's, um, I mean, in the AI conversation that we, that we're having, I mean, it's, uh, you know, so much of it is about physical constraints that kind of underlie all of this, right? And you want to move, I mean, one lesson of technology is you want to move things from the physical constraint realm into software as much as possible because software is so much easier to build and, and evolve. And like you can democratize it more because like not everyone is going to have a data center, but like a lot of people can, can kind of write code and take open source code and modify it. Um, the metaverse version of this is, I think, enabling realistic digital presence is going to be just an absolutely huge difference for, um, for making it so that, um, people don't feel like they have to physically be together for as many things. Um, now, I mean, I think that there are going to be things that are better about being physically together. Um, so it's not, I mean, these things aren't binary. It's not going to be like, okay, now it's, you don't need to do that anymore. But, um, but overall, I mean, I think that this, it's just going to be really powerful for, for socializing, for feeling connected with people, for working, um, for, I don't know, parts of industry, for medicine, for like, a lot of, like so many things.

I want to go back to something you said at the beginning of the conversation where, you didn't sell the company for a billion dollars and like the metaverse, you knew we were going to do this even though the, the, the market was hammering you for it.

And then I'm actually curious, like what is the source of that edge? And you said like, Oh, values, I have this intuition, but like everybody says that, right? Like what, if you had to say something that's specific to you, what is, how would you express what that is? Like, why are you so convinced about the metaverse?

Um, well, I think that those are different questions. So what are the things that kind of power me? Um, I think we've talked about a bunch of the themes. So it's, I mean, I just really like building things. Um, I specifically like building things around how people communicate and sort of understanding how people express themselves and how people work.

Right. When I was in college, I was studying computer science and psychology. I think a lot of other people in the industry studied computer science. Right.

So, um, it's always been sort of the intersection of those two things for me. But I think it's also sort of this, like really deep drive. I don't know how to explain it, but I just feel like, constitutionally, I'm doing something wrong if I'm not building something new. Right.

And, um, so I think that there's like, you know, even when we're putting together the business case for, you know, investing like a hundred billion dollars in AI or some huge amount in the metaverse.

It's like, yeah, I mean, we have plans that I think make it pretty clear that if our stuff works, it'll be a good investment, but like you can't know for certain from the outset. And, um, so there's all these arguments that people have, you know, whether it's like, you know, with advisors or, or different folks.

It's like, well, how, how could you like it's a, how, how are you confident enough to do this? And it's like, well, the day I stop trying to build new things, I'm just done. I'm going to go build new things somewhere else. Right.

It's like, um, it's like it is, I'm fundamentally incapable of running something or in my own life and like not trying to build new things that I think are interesting. It's like, that's not even a question for me. Right.

It's like whether, like, whether we're going to go take a swing at like building the next thing. It's like, it's like, it's like, I'm, I'm just incapable of not doing that. Um, and I don't know. I, I'm kind of like this in like all the different aspects of my life. Right.

It's like, you know, our family built this ranch in Kauai, and I just like worked to design all these buildings, and we started raising cattle and I'm like, all right, well, I want to make like the best cattle in the world.

Right. So it's like, how do we architect this so that we can figure this out and build all the stuff up that we need to try to do that.

Um, so I don't know. That's me. Um, what was the other part of the question?

Look, Meta is just a really amazing tech company. Right. They have all these great software engineers and even they work with Stripe to handle payments.

And I think that's just a really notable fact that Stripe's ability to engineer these checkout experiences is so good that big companies like Ford, Zoom, Meta, even OpenAI, they work with Stripe to handle payments.

Because just think about how many different possibilities you have to handle. If you're in a different country, you'll pay a different way. And if you're buying a certain kind of item, that might affect how you decide to pay.

And Stripe is able to test these fine-grained optimizations across tens of billions of transactions a day to figure out what will convert people, and obviously conversion means more revenue for you.

And look, I'm not a big company like Meta or anything, but I've been using Stripe since long before they were advertisers. Stripe Atlas was just the easiest way for me to set up an LLC and they have these payments and invoicing features that make it super convenient for me to get money from advertisers.

And obviously without that, it would be much harder for me to earn money from the podcast. And so it's been great for me. Go to stripe.com to learn more. Thanks to them for sponsoring the episode. Now back to Mark.

I'm not sure, but I'm actually curious about something else, which is, so 19-year-old Mark reads a bunch of like antiquity and classics in high school and college. What important lessons did you learn from it? Not just interesting things you found, but like, there aren't that many tokens you consume by the time you're 19, and a bunch of them were about the classics.

Clearly that was important in some way. Tokens. I don't know. That's a good question. I mean, one of the things that I thought was really fascinating is, so when Augustus first became emperor, he was trying to establish peace. And there was no real conception of peace at the time. Like people's understanding of peace was that it is the temporary time between when your enemies will inevitably attack you again. So you get like a short rest. And he had this view, which is like, look, we want to change the economy from being so mercenary and kind of militaristic to like actually this positive-sum thing. It was like a very novel idea at the time. I don't know. I think that there's like something that's just really fundamental about that. It's like in terms of the bounds on like what people can conceive at the time of like, what are rational ways to work.

And going back to like, I mean, this applies to both the metaverse and the AI stuff, but like a lot of investors and just different people just can't wrap their head around why we would open source this. And it's like, I don't understand, it's open source, that must just be like the temporary time between which you're making things proprietary. But I actually think it's like this very profound thing in tech that actually creates a lot of winners, right? And so I don't want to strain the analogy too much. But I do think that there are a lot of times models for building things that people just often can't wrap their head around, how that would be a valuable thing for people to go do, or like a reasonable state of the world. I mean, I think there's more reasonable things than people think. That's super fascinating.

Can I give you my answer for what I was thinking you might have gotten from it? This is probably totally off. But just how young some of these people are who have very important roles in the empire, like Caesar Augustus, like by the time he's 19, he's actually one of the most prominent people in Roman politics. And he's like leading battles and forming the Second Triumvirate. I wonder if you were like, the 19-year-old is like, I can actually do this because, like, Caesar Augustus did. I think that's an interesting example, both from a lot of history and American history.

Yeah, I mean, one of my favorite quotes is this Picasso quote that all children are artists and the challenge is, how do you remain an artist when you grow up? And it's like basically, I think because when you're younger, I think it's just easier to have kind of wild ideas and, you know, there are all these analogies to the innovator's dilemma that exist in your life as well as your company or whatever you've built, right? So when you're kind of earlier on your trajectory, it's easier to pivot and take in new ideas without it disrupting other commitments that you've made to different things. And so I don't know, I think that's an interesting part of running a company is like, how do you kind of stay dynamic?

Going back to the investors and open source, the $10 billion model, suppose it's totally safe, you've done these evaluations. And unlike in this case, the evaluators can also fine tune the model, which hopefully will be the case in future models. Would you open source that, the $10 billion model? Well, I mean, as long as it's helping us, then yeah. But would it? Like, $10 billion of R&D, and then now it's like open source, right? Well, I think here's a question which we'll have to evaluate as time goes on too. But we have a long history of open sourcing software, right? We don't tend to open source our product.

Right. So it's not like we take the code for Instagram and make it open source, but we take like a lot of the low level infrastructure and we make that open source, right? Probably the biggest one in our history was the Open Compute Project, where we took the designs for kind of all of our servers, network switches and data centers and made it open source, and it ended up being super helpful because, you know, I mean, a lot of people can design servers, but now like the industry standardized on our design, which meant that the supply chains basically all got built out around our design, the volumes went up. So it got cheaper for everyone and saved us billions of dollars. So awesome, right?

Okay, so there's multiple ways where open source, I think, could be helpful for us. One is if people figure out how to run the models more cheaply, well, we're going to be spending tens or like a hundred billion dollars or more over time on all this stuff. So if we can do that 10% more effectively, we're saving billions or tens of billions of dollars. Okay, that's probably worth a lot by itself. Especially if there's other competitive models out there, it's not like our thing is giving away some kind of crazy advantage.

So is your view that the training will be commodified? I think there's a bunch of ways that this could play out. That's one. The other is, so commodity kind of implies that it's going to get very cheap because there's lots of options. The other direction that this could go in is qualitative improvements. So you mentioned fine tuning, right? It's like right now, you know, it's pretty limited what you can do with fine tuning the major other models out there, and there are some options, but generally not for the biggest models.

So I think being able to do that and be able to kind of do different app specific things or use case specific things or build them into specific toolchains, I think will not only enable kind of more efficient development, it could enable qualitatively different things. Here's one analogy on this: one thing that I think generally sucks about the mobile ecosystem is that like you have these two gatekeeper companies, Apple and Google, that can tell you what you're allowed to build.

And there are lots of times in our history. So there's the economic version of that, which is like, all right, we build something and they're just like, I'm going to take a bunch of your money. But then there's the qualitative version, which is actually what kind of upsets me more, which is there's a bunch of times when we've launched or wanted to launch features, and then Apple's just like, nope, you're not launching that. It's like, that sucks. Right?

And so the question is what is like, are we kind of set up for a world like that with AI where like, you're going to get a handful of companies that run these closed models that are going to be in control of the APIs and therefore going to be able to tell you what you can build. Well, for one, I can say for us, it is worth it to go build a model ourselves to make sure that we're not in that position. Right? Like, I don't want any of those other companies telling us what we can build. But from an open source perspective, I think a lot of developers don't want those companies telling them what they can build either.

So the question is, what is the ecosystem that gets built out around that? What are interesting new things? How much does that improve our products? I think that there's a lot of cases where if this ends up being like, you know, like our databases or caching systems or architecture, we'll get valuable contributions from the community that will make our stuff better.

And then our app specific work that we do will still be so differentiated that it won't really matter. Right? It's like, we'll be able to do what we do. We'll benefit, and all the systems, ours and the community's, will be better because it's open source. There is one world where maybe it's not that.

I mean, maybe the model just ends up being more of the product itself. In that case, then I think it's a trickier economic calculation about whether you open source that because then you are kind of commoditizing yourself a lot. But I don't, from what I can see so far, it doesn't seem like we're in that zone. Do you expect to earn significant revenue from licensing your model to the cloud providers? So they have to pay you a fee to actually serve the model? We want to have an arrangement like that, but I don't know how significant it will be.

And we have this, this is basically our license for Llama. Yeah. You know, in a lot of ways, it's like a very permissive open source license, except that we have a limit for the largest companies using it. And this is why we put that limit in: we're not trying to prevent them from using it. We just want them to come talk to us because if they're going to just basically take what we built and resell it and make money off of it, then it's like, okay, well, if you're like, you know, Microsoft Azure or Amazon, then yeah, if you're going to be reselling the model, then we should have some revenue share on that.

So just come talk to us before you go do that. And that's how that's played out. So for Llama 2, I mean, we basically just have deals with all these major cloud companies and Llama 2 is available as a hosted service on all those clouds. And I assume that as we release bigger and bigger models, that'll become a bigger thing. It's not the main thing that we're doing, but I just think if those companies are going to be selling our models, it makes sense that we should, you know, share the upside of that somehow. Yeah.

With regard to the other open source dangers, I think you have genuinely legitimate points about the balance of power stuff. And potentially, like the harms you can get rid of because we have better alignment techniques or something. I wish there was some sort of framework that Meta had, like other labs have this where they say, like, if we see this concrete thing, then that's a no go on the open source or like, even potentially on deployment, just like writing it down. So like, the company is ready for it. People have expectations around it and so forth. Yeah.

No, I think that that's a fair point on the existential risks side. Right now, we focus more on the types of risks that we see today, which are more of these content risks. So, you know, we have lines on, we don't want the model to be basically doing things that are helping people commit violence or fraud or, you know, just harming people in different ways. So in practice for today's models, and I would guess the next generation, and maybe even the generation after that, I think while it is somewhat more maybe intellectually interesting to talk about the existential risks, I actually think the real harms that need more energy in being mitigated are things where someone is going to take a model and do something to hurt a person with today's parameters, and kind of the types of more mundane harms that we see today, like people kind of committing fraud against each other, things like that.

So I just don't want to shortchange that. I think we have a responsibility to make sure we do a good job on that. Yeah, Meta is a big company; you can handle both. Yeah. Okay, so as far as the open source goes, I'm actually curious if you think the impact of the open source work, PyTorch, Open Compute, these things, has been bigger for the world than even the social media aspects of Meta, because I like talk to people who use these services who think like it's plausible, because a big part of the internet runs on these things. It's an interesting question. I mean, I think almost half the world uses our products. Yeah, that's true. So I think it's hard to beat that, but no, I think open source is really powerful as a new way of building things. And yeah, I mean, it's possible. I mean, it's maybe one of these things where, I don't know, like Bell Labs, right, where they were working on the transistor because they wanted to enable long distance calling. And they did.

And it ended up being really profitable for them that they were able to enable long distance calling. And if you ask them five to 10 years out from that, what was the most useful thing that they invented, it's like, okay, well, we enabled long distance calling and now all these people are long distance calling. But if you ask 100 years later, maybe it's a different answer. So I think that that's true of a lot of the things that we're building, right, Reality Labs, some of the AI stuff, some of the open source stuff, I think it's like the specific products evolve and to some degree come and go. But I think like the advances for humanity persist. And that's like a, I don't know, cool part of what we all get to do.

By when will the Llama models be trained on your own custom silicon? Soon, not Llama 4. The approach that we took is first we basically built custom silicon that could handle inference for our ranking and recommendation type stuff. So Reels, News Feed, ads. And that was consuming a lot of GPUs. But when we were able to move that to our own silicon, we were then able to use the more expensive NVIDIA GPUs only for training. So at some point, we will hopefully have silicon ourselves that we can be using for probably first training some of the simpler things, then eventually training these like really large models. But in the meantime, I'd say the program is going quite well. And we're just rolling it out methodically and have a long term roadmap for it.

Final question, this is totally out of left field. But if you were made CEO of Google+, could you have made it work? Google+, oof. Well, I don't know. I don't know. That's a very difficult counterfactual. Okay, then the real final question will be: when Gemini was launched, was there any chance that somebody in the office uttered Carthago delenda est? No, I think we're tamer now. Google, as a market. Yeah, I don't know. It's a good question.

I don't know. The problem is there was no CEO of Google+, it was just like a division within a company. I think it's like, and you asked before about what are the kind of scarcest commodities, but you asked about it in terms of dollars. And I actually think for most companies of this scale, at least, it's focus, right? It's like when you're a startup, maybe you're more constrained on capital. You know, you just are working on one idea and you might not have all the resources.

I think you cross some threshold at some point where, given the nature of what you're doing, you're building multiple things and you're creating more value across them, but you become more constrained on what you can direct to go well. And like, there's always the cases where something random and awesome happens in the organization and I don't even know about it. And that's great. But I think in general, the organization's capacity is largely limited by what like the CEO and the management team are able to kind of oversee and kind of manage.

I think that that's just been a big focus for us, like, all right, as I guess Ben Horowitz says, keep the main thing the main thing, right? And try to kind of stay focused on your key priorities. Yeah. All right. Awesome. That was excellent, Mark. Thanks so much. That was a lot of fun. Yeah. Really fun. Thanks for having me. Yeah. Absolutely. Hey, everybody. I hope you enjoyed that episode with Mark. As you can see, I'm now doing ads. So if you're interested in advertising on the podcast, go to the link in the description.

Otherwise, as you know, the most helpful thing you can do is just share the podcast with people who you think might enjoy it, you know, your friends, group chats, Twitter, I guess, Threads. Yeah. Hope you enjoyed. And I'll see you on the next one.