I know you're on this worldwide tour trying to help control the fire that GPT-4 and ChatGPT have started. This particular room is maybe a little different from a lot of that. Most of the people here in this room are either building companies or working on plans to build companies that are in the ecosystem really triggered by ChatGPT.
My kind of people out there.
Yeah, I wish you were here too. They are exactly your kind of people. I know that part of the mission here is to make the world a better place, but also to build on top of the platform that you've created. Obviously you navigated to the position you're in in life very deliberately, and you're the perfect person to help advise them. We're going to keep this focused in a way that helps this room as much as possible, that helps these 900 people create successful companies.
The first thing I'm going to ask you about is this: if AGI is in the near-term future, then we're right now at this inflection point where human history has a period of time up till AGI and then obviously has a completely different history from here forward. So it seems to me that at this stage you're going to be a centerpiece of the history books no matter how this evolves. Do you think it's the same?
The same in terms of what?
In terms of the way history will describe this moment, this moment being this year of innovation in this field.
I hope this will be like a page or a chapter in history books, but I think that over the next several billion years such unbelievable things are going to happen that this will be just one small part, and there will be new and bigger and more exciting opportunities and challenges in front of us.
I think one of the things that a lot of people are asking: with prior iterations of GPT, open source iterations, you had a whole variety of ways of taking that source code and making a vertical company out of it, or an adjacent company, something like federated learning. In the future iteration of these companies you've got this highly tunable closed API to start from. Any quick advice on, okay, I'm starting a company now, I have to make some decisions right out of the gate. What do I start with? How do I make it work in any given vertical use case?
I think there's always more that stays the same about how to make a company than what changes. A lot of people, whenever there's a new platform shift like this, think that just because they're using the platform, that's what's going to guide business strategy. It doesn't. Nothing lets you off the hook for building a product that people love, for being very close to your users, for fulfilling their needs, for thinking about a long-term, durable business strategy. That's actually probably only more important during a platform shift, not less.
If we think back to the launch of the App Store, which is probably the most recent similar example, there were a ton of companies that built very lightweight things with, I don't want to call them exploitative mechanics, but it was just not something durable. Those companies had incredible meteoric rises and falls, and then the companies that really did all the normal things you're supposed to do to build a great business endured for the last 15 years. You definitely want to be in that latter category. The technology is just a new enabler, but what you have to do as a company is build a great company that has a long-term strategic advantage.
Then what about foundation models just as a starting point? If I look back two years, one of the best ways to start was to take an existing foundation model, maybe add some layers, and retrain it in a vertical use case. Now the foundation model, or the base model, is maybe a trillion parameters, so it's much, much bigger, but your ability to manipulate it without having to retrain it is also far, far more flexible. I think you have 50,000 tokens to play with right now in the basic model. Is that right?
About a minute. Thirty-two thousand in the biggest model.
Thirty-two thousand in the base model.
Okay, and actually, so how's that going to evolve? There are new iterations that are going to come out pretty quickly.
We're still trying to figure out exactly what developers want in terms of model customization. We're open to doing a lot of things here, and, you know, developers are our users. Our goal is to make developers super happy and figure out what they need. We thought it was going to be much more of a fine-tuning story, and we have been thinking about how to offer that in different ways, but people are doing pretty amazing things with the base model and, for a bunch of reasons, often seem to prefer that. So we're actively reconsidering what customization to prioritize, given what users seem to want and seem to be making work.
As the models get better and better, it does seem like there is a trend towards less and less of a need to fine-tune, and you can do more and more in the context.
And when you say fine-tune, you mean changing parameter weights.
Yeah.
I mean, is there going to be an ability at all to change the parameter weights in the GPT world?
Yeah, we'll definitely offer something there. But right now it looks like maybe that will be less used than the ability to offer super cheap context, like 1 million, if we can ever figure that out on the base model.
Yeah, let's drill in on that just a little bit, because it seems like, regardless of the specifics, the trend is that as the models get bigger and bigger, say going from one trillion to 10 trillion parameters, the amount you can achieve with just prompt engineering, or changing the tokens that are fed into it, is growing disproportionately to the model size. Does that sound right?
Disproportionately to the model size, yes, but I think we're at the end of the era where it's going to be these giant, giant models, and we'll make them better in other ways. But I would say it grows proportionate to the model capability.
Yep.
And then the investment in the creation of the foundation models is on the order of 50 million, 100 million, just in the training process. So it seems like... is it? What's the magnitude there?
We don't share, but it's much more than that.
Okay. And rising, I assume, over time.
Yeah.
So then somebody trying to start from scratch, you know, is trying to catch up to something that's...
Maybe. Or maybe we're all being incredibly dumb and we're missing one big idea, and all of this is not as hard or as expensive as we think, and there will be a totally new paradigm that obsoletes us, which will be great. Not great for us, but great for the world.
Yeah.
So let me get your take on something. Paul Graham calls you the greatest business strategist that he's ever encountered, and of course all these people are wrestling with their business strategy and what exactly to build and where. I've been asking you questions that are more or less about vertical use cases that sit on top of GPT-4 and ChatGPT and soon GPT-5 and so on. But there are also all these business models that are adjacent, things like federated learning or data conditioning or just deployment, and those are interesting business models too.
If you were just investing in a class of company that's in the ecosystem, any thoughts on where the greater returns are, where the faster-growing, more interesting business models are?
I don't think PG quite said that. I know he said something in that direction, but in any case I don't think it'd be true. I think there are people who are unbelievable business strategists, and I am not one of them, so I hesitate to give advice here.
The only thing I know how to do, I think, is this one strategy again and again, which is very long time horizon, capital-intensive, difficult technology bets, and I don't even think I'm particularly good at those. I just think not many people try them, so there's very little competition, which is nice. I mean, these strategies, I think that I have a lot of competition. But the strategy that it takes to now take a platform like OpenAI and build a new, fast-growing, defensible consumer or enterprise company, I know almost nothing about. I know all of the theory but none of the practice, and I would go find people who have done it and get the practice, get the advice from them.
A couple of questions about the underlying tech platform here. I've been building neural networks myself since the parameter count was sub 1 million, and they're actually very useful for a bunch of commercial applications. Then I kind of watched them tip into the billions, with GPT-2 at, I think, about one and a half billion or so, and then GPT-3, and now GPT-4. We don't know the current parameter count, but I think it was 175 billion in GPT-3, and it was just mind-blowingly different from GPT-2, and then GPT-4 is even more mind-blowingly different.
So the raw underlying parameter count seems like it's on a trend just listening to NVIDIA's forecast where you can go from a trillion to 10 trillion and then they're saying up to 10 quadrillion in a decade. So you've got four factors of 10 or 10,000 X in a decade. Does that even sound like it's in the right ballpark?
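As a rough check on the arithmetic in this question, here is a minimal sketch in Python. The start and end figures are the ones quoted in the question (one trillion to ten quadrillion parameters over a decade); the assumption of smooth compound growth and the helper name `annual_growth_factor` are illustrative only and are not taken from NVIDIA or from either speaker.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


# Figures quoted in the question above.
start_params = 1e12  # one trillion parameters
end_params = 1e16    # ten quadrillion parameters
years = 10

# Overall multiple across the decade: four factors of ten.
total = end_params / start_params

# Assumption: growth is smooth and compounds yearly (a simplification).
per_year = annual_growth_factor(total, years)

# Implied doubling time in months under that same assumption.
doubling_months = 12 * math.log(2) / math.log(per_year)

print(f"total multiple: {total:,.0f}x")
print(f"implied yearly multiple: {per_year:.2f}x")
print(f"implied doubling time: {doubling_months:.1f} months")
```

Under that simplification, 10,000x in a decade works out to roughly 2.5x per year, or a doubling about every nine months.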
I think there's way too much focus on parameter count. I mean, parameter count will trend up for sure, but this reminds me a lot of the gigahertz race in chips in the 90s and 2000s, where everybody was trying to point to a big number. Most of you probably don't even know how many gigahertz are in your iPhone, but it's fast. What we actually care about is capability, and I think it's important that what we keep the focus on is rapidly increasing capability. If there's some reason that parameter count should decrease over time, or we should have multiple models working together, each of which is smaller, we would do that. What we want to deliver to the world is the most capable, useful, and safe models. We are not here to jerk ourselves off about parameter count.
Yeah. Can we quote you on that?
Okay, it's going to get quoted no matter what. So yeah.
Well, thank you for taking that away from me. But one thing that's absolutely unique about this class of algorithm, versus anything I've ever seen before, is that it surprises you with raw horsepower, regardless of whether you measure it in parameter count or some other way. It does things that you didn't anticipate purely by putting more horsepower behind it, and so it takes advantage of the scale.
The analogy I was making this morning is: if you have a spreadsheet, you coded it up, and you run it on a computer that's 10,000 times faster, it doesn't really surprise you. It's nice and responsive, but it's still a spreadsheet. Whereas this class of algorithm does things that it just couldn't do before. One of our partners in our venture fund actually wrote an entire book with GPT-2, and you can buy it on Amazon. It's called "Start Here" or "Start Here: Romance." I think about 10 copies have sold. I bought one of them, so maybe nine copies have sold. But if you read the book, it's just not a good book. And here we are; that was only four years ago, and now the quality of the book has gone, with GPT-2, 3, 4, from not a good book, to a somewhat reasonable book, to now it's possible to write a truly excellent book. You have to give it the framework, you're still effectively writing the concept, but it's filling in the words just beautifully. So for an author that could be a force multiplier of something like 10 or 100, and it just enables an author to be that much more powerful.
So with this class of algorithm, if the underlying substrate is getting faster and faster and faster, it's going to do surprising things on a relatively short time scale. I think one of the things that people in the room need to predict is, okay, what is the next real-world, society-benefiting use case that hits that tipping point on this curve? So, any insights you can give us into what's going to be possible that wasn't possible a year prior, two years prior?
I said I don't have business strategy advice. I just thought of something: I do. I think in new areas like this, one of the right approaches is to let tactics become strategy instead of the other way around. You know, I have my ideas, I'm sure you all have your ideas. Maybe we'll be mostly right, we'll be wrong in some ways, and even the details of how we're right, we'll be wrong about. I think you never want to lose sight of vision and focus on the long term, but a very tight feedback loop of paying attention to what is working and what is not working, doing more of the stuff that's working and less of the stuff that's not working, and just very, very careful user observation can go super far. So I can speculate on ideas, you'll speculate on ideas, and none of that will be as valuable as putting something out there, really deeply understanding what's happening, and being responsive to it.
As Dave is getting ready for the next question: Sam, when did you know your baby, ChatGPT, was something really special, and what was the special sauce that allowed you to pull off something that others haven't? And Dave will come back. But yeah, oh, who likes Sam so far? Okay, all right. If Sam was hiring, would you consider being part of his team? Okay, all right, we got a lot of hands. Great.
Yeah, please, please come. We really need help, and it's going to be a pretty exciting next few years. I mean, we've been working on it for so long that you kind of know, with gradually increasing confidence, that it's really going to work. But, you know, we've been doing the company for seven years. These things take a long time. I would say, by and large, in terms of why it worked when others haven't, it's just because we've been on the grind, sweating every detail for a long time, and most people aren't willing to do that. In terms of when we knew that ChatGPT in particular was going to catch fire as a consumer product, probably like 48 hours after launch.
Yeah, all right.
So before Dave comes back, I asked Lex to ask a sexy question. Hey, Lex. Hey. You want to use the communicator? You're good. What is it? It's a Star Trek... You're good. I'm good. Okay. I grew up in the Soviet Union, we didn't have... oh, Chekov. Second season, yeah. Let me ask some sexy, controversial questions. So you've got legends in artificial intelligence, Ilya Sutskever and Andrej Karpathy, over there. Who's smarter? Just kidding. Oh, just kidding, you don't have to answer that. That's a joke. Everybody was about to... he was thinking about it. All right, I like it. No, it's just...
So we're at MIT, and from here Max Tegmark and others put together this open letter to halt AI development for six months. What are your thoughts about this open letter?
There are parts of the thrust that I really agree with. We spent more than six months after we finished training GPT-4 before we released it. So taking the time to really study the safety of the model, to get external audits, external red teamers, to really try to understand what's going on and mitigate as much as you can, that's important. It's been really nice, since we launched GPT-4, how many people have said, wow, this is not only the most capable model OpenAI has put out, but by far the safest and most aligned, and unless I'm trying to get it to do something bad, it won't. So that, I totally agree with. I also agree that as capabilities get more and more serious, the safety bar has got to increase. But unfortunately I think the letter is missing most technical nuance about where we need the pause. An earlier version of the letter claimed OpenAI is training GPT-5 right now. We are not, and won't for some time, so in that sense it was sort of silly. But we are doing other things on top of GPT-4 that I think have all sorts of safety issues that are important to address and were totally left out of the letter. So I think moving with caution and an increasing rigor for safety issues is really important. The letter, I don't think, is the optimal way to address it.
Just a quick question, if I may, one more. You have been extremely open, having a lot of conversations, being honest, and others at OpenAI as well. What's the philosophy behind that? Because other companies are much more closed in that regard. And do you plan to continue doing that?
We certainly plan to continue doing that. The tradeoff is that we say dumb stuff sometimes, you know, stuff that turns out to be totally wrong, and I think a lot of other companies don't want to say something until they're sure it's right. But I think this technology is going to so impact all of us that we believe engaging everyone in the discussion, putting these systems out into the world, deeply imperfect though they are in their current state, so that people get to experience them, think about them, and understand the upsides and the downsides, is worth the tradeoff, even though we do tend to embarrass ourselves in public and have to change our minds with new data frequently. So we're going to keep doing that, because we think it's better than any alternative. A big part of our goal at OpenAI is to get the world to engage with this and think about it, and gradually update and build new institutions, or adapt our existing institutions, to be able to figure out what the future we all want is. So that's kind of why we're here.
So we only have a few minutes left, and I have to ask you a question that has been on my mind since I was 13 years old. If you read Ray Kurzweil or any of the luminaries in the sector, the day when the algorithms start writing the code that improves the algorithms is a pivotal day. It accelerates the process toward infinity, or, in the singularity view of the world, to absolute infinity. Now a lot of the companies that I'm an investor in, or have been a co-founder of, are starting to use LLMs for code generation, and interestingly there's a very wide range of lift, or improvement in the performance of an engineer, ranging from about 5% to about 20x. It depends on what you're trying to do, what type of code, how much context it needs. A lot of it is related to tuning in the system.
So there are two questions in there. First, within OpenAI, how much of a force multiplier do you already see in the creation of the next iteration of the code? And then the follow-on question is, okay, what does it look like a few months from now, a year from now, two years from now? Are we getting close to that day where the thing is so rapidly self-improving that it hits some...
Great question. I think that it is going to be a much fuzzier boundary for getting to self-improvement or not. I think what will happen is that more and more of the improvement loop will be aided by AIs, but humans will still be driving it, and it's going to go like that for a long time. I have never believed in the one-day or one-month takeoff, for a bunch of reasons, one of which is how incredibly long it takes to build new data centers, bigger data centers. Even if we knew how to do it right now, just waiting for the concrete to dry, getting the power into the building, that stuff takes a while. But I think what will happen is humans will be more and more augmented and be able to do things in the world faster and faster. Most of these things don't end up working out quite like the sci-fi books, and neither will this one, but the rate of change in the world will increase forevermore from here as humans get better and better tools.