Hi everyone, welcome to Greymatter, the podcast from Greylock, where we share stories from company builders and business leaders. I'm Heather Mack, head of editorial at Greylock.
Today, we're re-broadcasting our episode featuring Greylock general partner Saam Motamedi's conversation with David Luan and Percy Liang.
David is the co-founder and CEO of AI startup Adept, and Percy is a computer science and statistics professor at Stanford. While text- and image-generating AI tools like ChatGPT and DALL-E are all the rage right now, Adept is developing tools that take things a step further by actually executing actions based on text commands.
The company just raised $350 million in Series B funding to further its development of a tool that can be thought of as an AI teammate, trained to use every software tool and API, for every knowledge worker. Greylock contributed to the latest funding round, and the firm has been partnering with Adept since co-leading the company's Series A in 2022.
In this interview, David, Percy, and Saam discuss how advancements in large language models are paving the way for the next wave of AI. This interview took place during Greylock's Intelligent Future event in August 2022.
The summit featured experts and entrepreneurs from some of today's leading artificial intelligence organizations. You can read a transcript of this interview on our website, and you can also watch the video of this interview on our YouTube channel. Both are linked in the show notes, and if you aren't already a subscriber to Greymatter, you can sign up wherever you get your podcasts.
Okay, David, Percy, I'm excited about this. There's no doubt that large-scale models are topical for all of us here, and I'm really excited to have the two of you here to discuss them.
For those of you in the audience who aren't familiar with these two gentlemen: Percy is an associate professor of computer science and statistics at Stanford, where, among other things, he serves as the director of the Center for Research on Foundation Models. And David is one of the co-founders and CEO of Adept, an ML research and product lab building general intelligence by enabling humans and computers to work together.
And before Adept, David was at Google leading a lot of the large-model efforts, and before that he was at OpenAI. And we're fortunate to get to partner with David and the team at Adept here at Greylock.
Percy, David, thank you guys for being here and for doing this.
So I want to start high level, with the state of play. There's a lot of talk about large models, and it's easy to forget that a lot of the recent breakthroughs and models that we're all familiar with, like DALL-E and GPT-3, are actually fairly recent.
And so we're still in the early innings of these models running in production and delivering real, concrete customer and user value. Maybe just give us the state of play, David, starting with you. Where are we with large-scale models, and what's the state of deployment of these models today?
Yeah, I think this stuff is incredibly powerful, but I think we're still underestimating how much runway is left on it. It's still so incredibly early.
Just take a look at a couple of different axes. When we were training these models at Google, it became incredibly clear up front that you could take a lot of these hand-engineered machine learning models that people had spent so much time building, rip them out and replace them with one giant model, give it some fine-tuning data, distill it into a smaller model again and serve it, and that would just end up outperforming all of the things people had done in the past.
So there's the fact that these models can improve existing things that companies are already using machine learning for, but also just how great they have been as a way to create brand-new AI products that couldn't exist before.
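The rip-out-and-distill workflow David describes can be sketched in miniature: train a small "student" model to match a big "teacher" model's soft outputs. Everything below is a deliberately tiny stand-in (linear softmax models on random data) for a fine-tuned large model and its distilled replacement, not anyone's production pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy "teacher": its soft probabilities become the student's targets.
X = rng.normal(size=(200, 4))
W_teacher = rng.normal(size=(4, 3))
teacher_probs = softmax(X @ W_teacher)  # soft labels

# Student: trained by gradient descent on cross-entropy
# against the teacher's soft labels (distillation).
W_student = np.zeros((4, 3))
for _ in range(500):
    probs = softmax(X @ W_student)
    grad = X.T @ (probs - teacher_probs) / len(X)
    W_student -= 0.5 * grad

# How often the student's top prediction matches the teacher's.
agreement = (softmax(X @ W_student).argmax(1) == teacher_probs.argmax(1)).mean()
print(agreement)
```

In the real setting the teacher is a fine-tuned large model and the student is small enough to serve cheaply; the mechanics of matching soft targets are the same.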
It's fascinating to me to watch things like GitHub Copilot and Jasper hit a nerve so fast and go from zero to hero in terms of adoption. I think we're just in the very early innings of seeing a lot more of that.
So I think that's axis one. Axis two is that primarily what we're talking about so far has been language models, right? But there are so many other modalities, sources of human knowledge, all of this stuff.
What happens when it's not just about predicting the next token of text, but about predicting all of those other things? We're going to end up in a world where a lot of humanity's knowledge gets encoded in various foundation models for many different things, and that's going to be really powerful as well.
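The next-token-prediction objective David refers to can be made concrete with a toy sketch. This is an illustrative count-based bigram predictor, not how a transformer works internally; the corpus and function names are invented for the example, and real language models predict over subword tokens learned from web-scale text.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which token tends to follow it."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model, token):
    """Return the most frequent continuation seen in training."""
    candidates = model[token]
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Tiny illustrative corpus.
corpus = "the model predicts the next token and the next token".split()
model = train_bigram(corpus)
print(predict_next(model, "next"))  # "token"
print(predict_next(model, "the"))   # "next" (its most common follower here)
```

The point of the multimodal framing in the conversation is that the same "predict what comes next" recipe can, in principle, be applied to sequences of pixels, actions, or API calls rather than words.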
Yeah, I agree with everything David said, and I want to emphasize one distinction he made, which is that already, with all the applications out there, these foundation models can lift all boats and just make all the numbers go up.
I think another thing, which is even more exciting, is that there's a huge sea of applications that we're not even dreaming of, because we're stuck in this paradigm of what ML is: you get some data, you train on it. But with prompting and all these other zero-shot capabilities, I think you're going to see a lot more new types of applications. So we should be looking not just for how to make faster horses or faster cars, but for new types of applications.
Percy, maybe to follow up on that: I totally agree, and I think it connects to David's point about something like Copilot. What's amazing to me about something like Copilot is both how new of an experience it is and how quickly it's taken off and gotten to end-user adoption. What are some of the other areas that you're looking forward to and are excited about in terms of net-new applications that become possible because of these large models?
Yeah, so maybe one general category to think about is creation. This includes code, text, proteins, videos, PowerPoint slides: anything you can imagine humans doing right now, whether it's a creative or a more task-oriented activity. You can imagine these systems helping you in the loop, taking you much farther and giving you many more ideas.
So I think the space is quite broad, and underscoring the multimodal aspect of this, which David touched on, is really important. Right now we have language models, and we have code models, and we have image models. But think about the things you could do when you mix these together: creating illustrated books, or films, or things like that.
One thing you have to deal with is long-context dependence. Right now you're generating single images, or text up to maybe 2,000 or 8,000 tokens, depending on the model. But imagine generating full films; that's going to require pushing the technology farther. But we have the data, and if we can harness that and scale up, then I think there are a lot of possibilities out there.
David, what would you add? At Adept you guys spend a lot of time thinking about how to use these models to unlock new ways of collaborating with computers and software. I'm curious what some of the use cases you think about are.
So I think all the creativity use cases Percy just highlighted are going to be extremely powerful. What's fascinating about these models is that if you ask these generative models to go do something for you in the real world, they kind of just pretend they're doing something, because they don't have a first-class sense of what actions are and what affordances exist on your computer.
So the thing I'm really excited about in particular is: how do we bridge that gap? How do we train a foundation model of all the actions people take on a computer? Once you have that, you have an incredibly powerful base for turning natural language into anything of arbitrary complexity that you would then do on your machine.
So if we take something like actuation as a key net-new capability, or longer contexts as an important net-new capability, the form of the question is: where do we still need to see key research unlocks, and what are the key areas of focus to actually make these products a reality?
I think there are maybe two sides to this. One is pushing up capabilities, and one is making sure things are pushed up in a way that's robust, reliable, and safe. The first one is about scaling: if you think about video and the ability to scale to hundreds of thousands of sequence elements, I think you're going to have to do something different. The transformer architecture has gotten surprisingly far, but you need to do something different there. And then, David mentioned this briefly, but these models are still in some ways chatterbots; they give you the illusion that there's something going on. In certain applications that's actually okay, if there's another external validity check on things and a human in the loop.
But I think there's a deep, fundamental research question of how to make these models actually reliable. There are many strategies people have tried, using reinforcement learning, or more explanation-based or retrieval-augmented methods. But I feel like there's still something deeper missing, and this is one thing I hope the academic community and researchers will work on, to ensure that these foundation models have good, stable foundations as opposed to shaky ones.
Yeah, I agree with a lot of what Percy just said. I would just add that the default path we're on is increasing scale and increasing data, and I think that will continue to lead to a lot of gains, but the question becomes how we pull the future forward faster. And I think there are a lot of different things we should be thinking about.
One is specifically on the data side. I'm curious, later on at dinner, to hear from the audience how many people would agree, but I actually think we're much more constrained on data than we realize. To take language as an example, within the next couple of years everyone is going to have, plus or minus 20 percent quality, a similar number of tokens: web crawls, and so on. So the question becomes: where next? I think that's a really important question. Another important question is what true creativity means. To me, true creativity means being able to discover new knowledge, and with the new-knowledge-discovery process for foundation models as we train them today, as we get better at training these models, they actually just become better models of the training distribution. So giving these models the ability to go gather new information and try things out is also going to be really key. And finally, on the safety side, we have a lot more to invest and a lot more questions to answer.
So let's get to safety in a moment, continuing on data, because I think that's a really important topic here. David, at Adept you all are thinking about how to build products that humans collaborate with, and one of the nice consequences of that is a data flywheel. Can you say a bit about how you're thinking about that, and how you're approaching designing products that end users will work with?
Yeah, I think it starts with having a pretty crisp definition of what we want the end game to look like. For us, we want to be building teammates and collaborators for people: a series of increasingly powerful software tools that help humans increase the level of abstraction at which they can interact with their machine. To use a different analogy, it doesn't replace the musician, but it gives musicians synthesizers, except for doing things on your computer. Because that's where we want to go, what's really important to us is how we solve these HCI problems where it really feels like you're working together with the machine, while at the same time using that as an opportunity for us to learn from how humans break down really complicated problems and how humans actually get things done, which may be much more complicated than the trajectories you might see on the internet.
Just to add something to that: I think the interaction piece is really interesting here, because these models are in some ways the most interactive ML models we have. You have a playground, you type in a prompt, and you immediately get to play with the model, as opposed to the previous cycle where someone gathers some data, trains a model, and then you experience it as a user. So the line between developer and user is getting interestingly blurred, which I think is a good thing, because if you can connect those two up, you get better user experiences.
Is there anything interesting on the research side, from both the Stanford HCI perspective and the foundation models perspective, that you all are working on around interaction?
Yeah, so one thing we've been doing at Stanford, as part of a larger benchmarking effort, is trying to understand what it means for humans to interact with these models. The classic way people think about these models is: you train them, then there are a hundred benchmarks and you evaluate, which is the automation approach. But as we know, a lot of the potential here is in Copilot- or autocomplete-style experiences where there's a human in the loop, and I think Adept is also a good example of this. What does that mean? Should we be building our models differently if we know humans are going to be in the picture, as opposed to doing full automation? That's interesting, because in some cases you don't just want a model to be accurate; you want it to be more interpretable, or more reliable, or understandable. And for creative applications you may want the model to have a broader distribution of outputs. We're seeing some of this: what's good for human interaction is not necessarily what's good for the standard benchmarks. So that's really interesting.
How is that going to get resolved? In a lot of classical machine learning applications, even there it's still hazy, but there's some point of view on benchmark standards, and there are products out there that can actually measure things like bias and auditing. As we massively blow up the scope, around creativity and all of that, things shift. So how do you think this gets resolved?

Yeah, so first order, scale definitely is helping, so we're safe on that: if you scale up the models, it lifts all boats. Then, given a particular scale, you have a question of where you're investing your resources. What we want to do is develop effective surrogate metrics that you can actually evaluate and that correlate well with human interaction. We don't really have a good handle on this quite yet, but having humans in the loop for evaluation is also potentially problematic, hard, and not reproducible. So you want something that's easy to evaluate but at the same time actually tracks what you care about.
So I want to shift to building products and companies around large-scale models, and David, maybe I'll start with you. There are people in the audience who are in the early stages of building these companies, and one fundamental question is: do you build on top of an OpenAI API, do you build on something in the open source, or do you build your own large model? How do you think a founder should navigate that decision?

I think it's probably the biggest question for people to ask right now. The root thing worth answering first is: what is the loop you're going to run for your company to compound? Is it going to be oriented towards really deeply understanding a particular customer's use case? Is it going to be oriented towards some sort of data flywheel you're trying to build?
I think the general thing here is that thinking about how that interfaces with the differentiation you want to have as a business is going to be really key, because the world I don't think we want to live in is one where these companies effectively become outsourced customer-discovery engines, and new Amazon Basics versions of these things come out over time. That would not be a particularly good world to live in. So figuring out what that compounding looks like is the most important first step. The other thing to think about is just how many nines you need. If you need a lot of nines of reliability, one thing that's really difficult is that you lack all the affordances you could possibly want if you're consuming this through an intermediary to get to where you want to be with your customers. So for those different reasons, you could end up choosing a very different point in the space of how you ultimately want to consume these services.

Maybe I'll just add one thing: one nice thing about having these APIs is that it's extremely easy to get started and try something. You can sit down for an afternoon, punch in some data, and get a sense of the possibilities. In some cases that's a lower bound on how well you can do, because you only spent an afternoon, and if you invest more, fine-tune, and build custom things, it can only get better in some sense.
So that, I think, has opened things up a lot. One of the challenges used to be even formulating the right problem to go after, and typically you didn't know, because you had to collect data and train a model, and that loop was very expensive. But now you can just sit down for an afternoon, try a few things, and maybe few-shot your way to something that's actually reasonable.
That gets you into a different part of the space, and you can iterate much faster.

Yeah, that makes a lot of sense in terms of prototyping quickly and trying to take out product-market-fit risk. One question, and Percy, I'm curious for your take on this: if you start that way, how do you over time build durability into your product? Because one could make the argument, hey, maybe you're just a thin layer on top of someone else's API. You can quickly de-risk product-market fit, but is there real durability in your layer of the stack?
Right. Yeah, I think the transition off of the API is a fairly discrete one in some sense. People also do Wizard-of-Oz experiments: you put a human there, have the human do it, work out all the interface issues and whether this makes sense at all, and then you try to swap the human out for something else. Now you can put an API there and get a sense of what things are like. And then in some cases, few-shot learning is actually, for some things, not that strong; if you have data, for example, a fine-tuned T5 model or something much smaller can actually be effective. I think the last thing you should say is, "let's go pre-train a 500-billion-parameter model," when you don't know what application you're building.
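The few-shot approach Percy contrasts with fine-tuning is, mechanically, just assembling a handful of worked examples into a prompt string. A minimal sketch; the instruction text, labels, and formatting are invented for illustration, and a real completion API would receive the resulting string as its prompt.

```python
def build_few_shot_prompt(examples, query, instruction):
    """Assemble an instruction, worked examples, and the new query
    into a single prompt string for a text-completion model."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")  # the model continues from here
    return "\n".join(parts)

examples = [
    ("The food was amazing", "positive"),
    ("Terrible service, never again", "negative"),
]
prompt = build_few_shot_prompt(
    examples,
    query="Pretty good overall",
    instruction="Classify the sentiment of each input.",
)
print(prompt)
```

This is why the afternoon-of-prototyping loop is so cheap: changing the task means editing strings, not collecting a dataset and retraining.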
Maybe continuing on the theme of building on top of these models: despite the magical qualities of these things, there are still limitations, right? One of them is falsehoods, and there are others that developers need to navigate as they think about building these applications. David, maybe starting with you: what do you think some of the key limitations are, and how do you guide people around navigating them?
That's a really good question. Falsehoods are definitely a very interesting thing to talk about. These models love to be massive hallucination engines, and getting them to stick to the script can be quite difficult. In the research community we're all aware of a bunch of different techniques for improving that, from learning from human feedback to augmenting these models with retrieval and such.
On the topic of falsehoods in particular, this idea of packing all of the world's facts into the parameters of a model is pretty inefficient and somewhat wasteful, especially when some of those facts change over time, like who may be running a particular country at a particular moment. So I think it's pretty unlikely that that's going to be the terminal state for a lot of these things, and I'm really excited for the research that will happen to improve it. But the other part goes back to a question of practicality and HCI. Every year we're pushing fundamental advancements on these models; they get somewhat better on a wide variety of tasks that have already shown receptivity to scale and to more training examples. But how do you surf this wave, where the particular capabilities you're looking for from the model are good enough to be deployed, and you can learn how to get from there to the finish line? And how do you work around some of these limitations in the actual interface to these models, so that they don't become a problem for your users? I think that's actually a really fascinating problem.
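The retrieval-augmented approach David mentions, keeping changeable facts outside the model's weights, can be sketched as: look up relevant documents, then place them in the prompt so the model conditions on them instead of on memorized parameters. The toy retriever below ranks by naive word overlap; production systems typically use embedding-based similarity search, and every name and document here is illustrative.

```python
def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query
    (a crude stand-in for embedding-based search)."""
    q = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def augmented_prompt(query, documents):
    """Prepend retrieved context so the model can answer from
    up-to-date facts rather than what's baked into its weights."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The capital of France is Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
]
prompt = augmented_prompt("What is the capital of France?", docs)
print(prompt)
```

Updating a fact then means editing the document store, not retraining the model, which is exactly the appeal for facts that change over time.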
Yeah, these models are exciting, and the flip side is that they have a ton of weaknesses: falsehoods, generating things that are not true, biases, stereotypes. Basically the good, the bad, and the ugly of the internet gets put into them.
And it's actually much more nuanced than just "let's remove all the incorrect facts and de-bias these models." There are efforts on filtering; you can filter out offensive language, but then you might end up marginalizing certain populations. And what is truth? We like to think there's a truth, but a lot of text is just opinions, and there are different viewpoints. A lot of it isn't even falsifiable, and you can't talk about truth without falsifiability.
And there are applications, creative ones for example, where you do want things that are a bit more edgy. How do you create fiction if everything has to be true? So there's no easy way, even if you could throw out all of the quote-unquote bad stuff.
So one framework to think about is that there's no one way to make these models uniformly better. What you want is control and documentation. You want to be able to understand, given a model, what it's capable of, what it should be used for, and what it should not be used for. And this is tricky, because these models have a huge capability surface: given a prompt, you can put in any string and get any other string back.
So what am I supposed to do? What are the guarantees? As a community, we need to develop a better language for thinking about what the possible inputs and outputs are, what the contracts are. The things you have in traditional APIs and good old-fashioned software engineering, we need to import some of that, so that downstream application developers can look at it and say: okay, I'll use this model, not that one, for my particular use case.
There's another side to this, which for lack of a better word I'll call risk: if I'm a product builder building different products on these models, there are different levels of risk I might be willing to take in terms of what I'm willing to ship to end users, and the level of guardrails I need in place before I'm willing to ship.
How do you guys think about that, and what frameworks do you have around it?
So I think there are some interesting perspectives here, specifically around how you sequence the different applications you want to go after. One really nice property of these models is that it's just so easy to go from zero to a hand-wavy 80 percent quality on a wide variety of tasks. For some of those tasks, that's all you need, and sometimes the iteration loop of having that 80 percent thing with humans is all you need to run it over the finish line.
I feel like right now the strongest argument is to start out by addressing things like that first. But over time, and this is one of the things Kevin said that I really liked, there will be more of a standardized toolkit for erasing some of the lower-hanging-fruit risks: generations that are inappropriate, or models going off the rails in various ways.
I think there's another set of risks that are slightly longer-term, that are also really important to think about, and those are definitely much harder.

To build on top of that, I think another category of risk is adversaries in the world, right?
Whenever you have a product that's getting enough traction, there are probably people who want to mess with you. One example is data poisoning. It's something that I think hasn't really been borne out yet, though there are some papers on it.
If you think about it, these models are trained on the entire web crawl. Anyone could put up a web page or put something on GitHub, and that can enter the training data, and these things are actually pretty hard to detect. From a security point of view, this is a huge gaping hole in your system, and a determined attacker could probably figure out a way to screw over your system.
The other thing to think about is misuse. Powerful models like these are dual-use technologies. There's immense good you can do with them, but they can also be used for fraud, disinformation, spam, and all the things we already know exist, but now amplified. That's a scary thought. There are definitely a lot of asymmetric capabilities here: just hooking up one of these giant code models to some RL agent to get into systems, there are so many things that become way easier for malicious actors to do as a result. Attack is so much easier than defense.
I have a few more questions I want to get through, but watching the time, I want to first open it up and see if there are questions in the audience. There's a question for Percy: from an academic standpoint, what is the ideal wish list you might have for ways that the corporations and big companies building a lot of the ecosystem could help?
Yeah, I think one big thing is openness and transparency, which is something that's really sorely missing today. If you look at the deep learning revolution, what's been great about it is that it benefited from having an open ecosystem, with toolkits like TensorFlow and PyTorch, datasets online, tutorials, and people being able to just download things, tinker, and play. It's much more accessible. Now we have models that are behind APIs, charging certain fees, and you can't really tinker as much. At the same time, a lot of organizations are wrestling with the same issues we talked about around safety and misuse, but there's no agreement on what the best practices are.
I think what would be useful is to develop community norms around what is safe and what the best practices are for mitigating some of these risks. In order to do that, there has to be some level of openness, so that when a model is deployed, you get a sense of what it's capable of, and you benchmark and document it in a way where the community knows how to respond, as opposed to: here's a thing you can play with; it might shoot you in the foot or it might not.
Good luck.

We have time for one more question, so maybe I'll ask a final question that goes back to creativity. I think one of the important things is to inspire in all of us what's going to be possible, the magic of these models. DALL-E was an important moment for people to see the type of creative generation that's possible. For each of you: what are some of the things you think are going to be possible in the way we interact with these models in a few years that you're most excited about?
Maybe starting with you, David.

I think language, as we were talking about earlier, is just the tip of the iceberg. We're already seeing amazing things just with language, but when we start seeing foundation models, or even bundles of foundation models, that are multimodal, for every domain of human knowledge, every type of input to a system or to ourselves that we care about, I just think we're going to end up with some truly incredible outcomes.
If I were to choose a personal thing that I'm not working on that I think would be really cool, and actually I think Percy and I talked about this once, it's what happens when you start having foundation models for robots. When you can take all of these demonstrations, all the different trajectories of robots interfacing with the real world, put them all into one system, and have it show the same type of generality that we've seen with language. I think that'd be incredible.
Yeah, I agree with that, and I definitely have thoughts, but maybe I'll end with another example. We get excited about generating an image, but it's just an image, and it's also something you might imagine artists being able to do.
Humans aren't changing that much, but computers are, and a year or two ago we weren't able to do this; now we can. So extrapolate: an image is such a small thing. Think about videos, or 3D scenes, or immersive experiences with personas. You could imagine generating worlds, in a sense, and that's kind of scary but also exciting.
There are probably many possibilities there, but I think it's the bigness of the things you can create. If you think of these models as excellent creators of large objects now, and you ask what big objects are, well, they're environments in some sense. What if we could do that? What would it look like, and what sorts of applications could it unlock? That would be interesting to think about.
Yeah, there's a commonality there, which again connects back to multiple modalities and continuing to push the scope. It's a really exciting glimpse of the future.
Percy, David, thank you guys so much for doing this. Thank you.

That concludes this episode of Greymatter. If you like what you hear, we encourage you to rate and review Greymatter on your favorite podcast platform. We sincerely appreciate your feedback.