I'd like to introduce Harrison Chase. One of the reasons I was really excited to come back today is that I think it was a year ago at this event that I met Harrison, and I thought, boy, if I get to meet super cool people like Harrison, I'm definitely going to come back this year. Quick question: how many of you use LangChain? Yeah, wow, okay, it's almost everyone. For those of you who don't, before the talk, run pip install langchain. And if you aren't using LangSmith yet, I'm a huge fan. Harrison works with a massive developer community. If you look at the PyPI download stats, I think LangChain is by far the leading generative AI orchestration platform. That gives him a huge view into a lot of what's happening in generative AI, so I'm excited to have him share what he's seen with AI agents.
Thanks for the intro, and thanks for having me. Excited to be here. So today I want to talk about agents. LangChain is a developer framework for building all types of LLM applications, but one of the most common things we see being built is agents. We've heard a lot about agents from a variety of speakers already, so I'm not going to go into too deep an overview, but at a high level, it's using a language model to interact with the external world in a variety of forms. Tool usage, memory, planning, and taking actions is the high-level gist. The simple form of this you can think of as just running an LLM in a for loop: you ask the LLM what to do, you go execute that, then you ask it what to do again, and you keep doing that until it decides it's done.
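To make that concrete, here's a minimal sketch of the for-loop idea. The `llm` callable, its dict-shaped output, and the `tools` registry are all hypothetical stand-ins for illustration, not any particular framework's API:

```python
# Minimal sketch of "an LLM in a for loop" — `llm` and `tools` are
# hypothetical stand-ins, not a specific LangChain API.

def run_agent(llm, tools, task: str, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # Ask the model what to do next, given everything so far.
        # Assume it returns e.g. {"action": "search", "input": "...", "done": False}.
        decision = llm("\n".join(history))
        if decision["done"]:
            return decision["answer"]
        # Execute the chosen tool and feed the observation back in.
        observation = tools[decision["action"]](decision["input"])
        history.append(f"Action: {decision['action']}({decision['input']})")
        history.append(f"Observation: {observation}")
    return "Gave up after max_steps."
```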
So today I want to talk about some of the areas that I'm really excited about, where we see developers spending a lot of time taking this idea of an agent and making it something that's production-ready and real-world: really the future of agents, as the title suggests. There are three main things I want to talk about, and we've actually touched on all of these in some capacity already, so I think it's a great roundup: planning, the user experience, and memory. For planning, Andrew covered this really nicely in his talk. The basic idea here is that if you think about running the LLM in a for loop, oftentimes there are multiple steps it needs to take. And so when you're running it in a for loop, you're implicitly asking it to reason and plan about what the best next step is, see the observation, and then resume from there and think about what the next best step is after that.
Right now, language models aren't really good enough to do that reliably. And so we see a lot of external papers and prompting strategies enforcing planning in some way, whether that's planning steps explicitly up front or reflection steps at the end to check whether it's done everything correctly. The interesting thing here, thinking about the future, is whether these prompting strategies and these cognitive architectures continue to be things that developers build, or whether they get built into the model APIs, as we heard Sam talk a little bit about. To be clear, for all three of these areas I don't have answers, just questions. My question here is: are these planning prompting techniques short-term hacks or long-term necessary components?
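As a rough sketch of what "plan explicitly up front, reflect at the end" can look like, assuming a hypothetical `llm` callable and `execute_step` function (real systems vary widely):

```python
# Hedged sketch of explicit planning plus a reflection pass.
# `llm` and `execute_step` are hypothetical stand-ins.

def plan_and_execute(llm, execute_step, task: str) -> str:
    # Explicit planning step: ask for the full list of steps before acting.
    plan = llm(f"Break this task into numbered steps:\n{task}").splitlines()
    results = [execute_step(step) for step in plan]
    draft = llm(f"Task: {task}\nStep results: {results}\nWrite the final answer.")
    # Reflection step: check the work before returning it.
    critique = llm(f"Task: {task}\nAnswer: {draft}\nList any mistakes, or say OK.")
    if critique.strip() != "OK":
        draft = llm(f"Revise the answer to fix these issues:\n{critique}\n\n{draft}")
    return draft
```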
Another aspect of this is the importance of flow engineering. I heard this term come out of the AlphaCodium paper, which achieves state-of-the-art coding performance not necessarily through better models or better prompting strategies, but through better flow engineering: explicitly designing a graph or state-machine type thing. One way to think about this is that you're offloading the planning of what to do to the human engineers, who do it at the beginning. And so you're relying on that as a little bit of a crutch.
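A toy version of that idea, with the human-designed transitions written as an explicit state machine; all node names and the `state` dict here are made up for illustration, not taken from AlphaCodium or LangChain:

```python
# Illustrative sketch of flow engineering: the human engineer decides
# the graph of steps and transitions; the model only fills in each node.

def generate(state):  # draft a candidate solution
    state["code"] = f"# candidate solution for: {state['task']}"
    return "test"

def test(state):      # run checks and route based on the result
    state["passed"] = "candidate" in state["code"]  # stand-in for real tests
    return "done" if state["passed"] else "generate"

NODES = {"generate": generate, "test": test}

def run_flow(task: str, max_hops: int = 8) -> dict:
    state, node = {"task": task}, "generate"
    for _ in range(max_hops):
        if node == "done":
            break
        node = NODES[node](state)  # each node returns the next node's name
    return state
```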
The next thing I want to talk about is the UX of a lot of agent applications. This is actually one area I'm really excited about. I don't think we've nailed the right way to interact with these agent applications yet. I think human-in-the-loop is still necessary because they're not super reliable. But if the human is in the loop too much, the agent isn't actually doing that much useful work, so there's a weird balance there. One UX thing that I really like from Devin, which came out a week or two ago, and which Jordan B put nicely on Twitter, is the presence of a rewind-and-edit ability. You can go back to a point in time where the agent was, and then edit what it did or edit the state that it's in, so that it can make a more informed decision. I think this is a really, really powerful UX that we're really excited about exploring more at LangChain. It brings a little more reliability, but at the same time steerability, to agents.
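One way to sketch that rewind-and-edit mechanic is to checkpoint the agent's state at every step. This is just an illustration of the idea, not how Devin or LangChain implements it; all names here are made up:

```python
# Sketch of rewind-and-edit: checkpoint agent state at every step so a
# human can jump back, correct the state, and let the agent resume.
import copy

class CheckpointedAgent:
    def __init__(self, step_fn, initial_state: dict):
        self.step_fn = step_fn          # one iteration of the agent loop
        self.checkpoints = [copy.deepcopy(initial_state)]

    def step(self) -> None:
        state = copy.deepcopy(self.checkpoints[-1])
        self.step_fn(state)             # mutates state with the next action
        self.checkpoints.append(state)

    def rewind(self, to_step: int, edits: dict) -> None:
        # Throw away everything after `to_step`, apply the human's edits,
        # and continue from the corrected state.
        self.checkpoints = self.checkpoints[: to_step + 1]
        self.checkpoints[-1].update(edits)
```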
And speaking of steerability, the last thing I want to talk about is the memory of agents. Mike showed this off a little bit earlier, where he was basically interacting with the bot, teaching it what to do and correcting it. This is an example where, in a chat setting, I'm teaching an AI to write a tweet in a specific style. You can see that I'm just correcting it in natural language to get to a style that I want. I then hit thumbs up. The next time I come back to this application, it remembers the style that I want. But I can keep editing it, I can keep making it a little more differentiated, and when I come back a third time, it remembers all of that. I would classify this as procedural memory: it's remembering the correct way to do something.
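As a rough illustration of that thumbs-up loop, corrections can be distilled into stored guidelines that seed the next session. The `llm` callable is hypothetical, and a JSON file stands in for a real store:

```python
# Sketch of procedural memory: user corrections get folded into a
# persistent "how to do it" prompt that survives across sessions.
import json
import pathlib

STORE = pathlib.Path("style_memory.json")

def load_style() -> str:
    return json.loads(STORE.read_text())["style"] if STORE.exists() else ""

def write_tweet(llm, topic: str) -> str:
    # Every session starts from whatever style has been learned so far.
    return llm(f"Write a tweet about {topic}.\nStyle guidelines: {load_style()}")

def thumbs_up(llm, correction_history: list[str]) -> None:
    # Distill the user's natural-language corrections into reusable rules.
    style = llm("Summarize these corrections as style rules:\n"
                + "\n".join(correction_history))
    STORE.write_text(json.dumps({"style": style}))
```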
I think another really important aspect is personalized memory: remembering facts about a human that you might not necessarily use to do something more correctly, but that you might use to make the experience more personalized. This is an example journaling app that we're building and playing around with to explore memory. You can see that I mention I went to a cooking class, and it remembers that I like Italian food. And so I think bringing in these aspects of memory, whether procedural or these kinds of personalized facts, will be really important for the next generation of agents.
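For completeness, a minimal sketch of that journaling-style personalized memory, again with a hypothetical `llm` callable and an in-memory list standing in for a real database:

```python
# Sketch of personalized memory: extract durable facts about the user
# from each entry and fold them into later responses. `llm` is a
# hypothetical callable; a real app would persist facts to a database.

user_facts: list[str] = []

def journal(llm, entry: str) -> str:
    # Pull out lasting facts ("likes Italian food"), not one-off details.
    new_facts = llm(f"List lasting personal facts in this entry:\n{entry}")
    user_facts.extend(f for f in new_facts.splitlines() if f.strip())
    # Use everything known about the user to personalize the reply.
    return llm(f"Known facts: {user_facts}\nRespond warmly to:\n{entry}")
```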