The video features Alex from Claude Relations at Anthropic along with Eric and Barry from the Research and Applied AI teams, respectively, diving into the intricacies of building effective AI agents. They elaborate on a recent blog post on the topic, distinguishing between workflows and true agents, and offer practical advice for developers venturing into agent development.
Eric begins by clarifying the definition of an agent, differentiating it from a simple workflow. While many loosely apply the term "agent" to any system involving multiple LLM (Large Language Model) calls, the team argues that an agent is characterized by its autonomy. It operates through iterative loops, guided by the LLM's decision-making, until a task is resolved. The number of steps is not predetermined, contrasting sharply with workflows, which follow a fixed, pre-defined path. An agent adapts its approach based on the situation, making it suitable for tasks like customer support or code iteration, where the resolution process is variable.
Barry explains that the distinction between workflows and agents emerged as models became more sophisticated. Initially, single LLMs were used, evolving into systems utilizing multiple LLMs orchestrated in code. This progression revealed two distinct patterns: workflows, which are heavily coded and orchestrated, and agents, which are simpler in structure yet possess a different kind of complexity. The rising capabilities of models prompted the team to formally define the term "agent" and differentiate it from workflows.
In practical terms, Eric explains that the difference manifests at the prompt level. A workflow prompt is structured sequentially, with the output of one prompt feeding into the next in a linear fashion. Each prompt has a specific purpose, such as categorizing a user question. In contrast, an agent prompt is open-ended, providing the model with a variety of tools and resources, such as web search or code editing, to achieve a goal.
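The prompt-level contrast Eric describes can be sketched in code. This is a minimal illustration, not an implementation from the video: `call_llm` is a hypothetical stand-in for a real model API, stubbed here so the example runs on its own.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call (stubbed for the sketch)."""
    if "Categorize" in prompt:
        return "billing"
    return "DONE: refund issued"

# Workflow: a fixed, pre-defined chain. Each prompt has a specific purpose,
# and the output of one prompt feeds linearly into the next.
def workflow(question: str) -> str:
    category = call_llm(f"Categorize this user question: {question}")
    return call_llm(f"Answer this {category} question: {question}")

# Agent: an open-ended loop. The model decides when the task is resolved,
# so the number of steps is not predetermined.
def agent(goal: str, max_steps: int = 10) -> str:
    context = goal
    for _ in range(max_steps):
        response = call_llm(context)
        if response.startswith("DONE"):
            return response
        context += "\n" + response  # feed the model's output back into the loop
    return context
```

The workflow always takes exactly two model calls; the agent takes as many as the model needs, up to a safety cap.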
Barry shares a humorous anecdote from his onboarding experience, when he was tasked with running OSWorld, a computer-use benchmark. Faced with counterintuitive agent behavior, he and a colleague simulated the model's limited perspective by closing their eyes and briefly glimpsing the screen, mimicking the model's input. This exercise highlighted the importance of empathy and providing ample context to the model.
Eric emphasizes the need to consider the model's perspective when designing tools. Developers often craft detailed, polished prompts but neglect to adequately document the tools provided to the model, leaving it to guess at what each tool does and how to use it. He stresses that a tool needs the same quality of documentation for the model as it would for a human engineer.
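To make the point concrete, here is a sketch contrasting an under-documented tool definition with a well-documented one. The schema shape follows the JSON-Schema style common to LLM tool-use APIs; the tool names, fields, and descriptions are illustrative, not taken from the video.

```python
# The description is documentation the model reads, so it deserves the same
# care as docs written for a human engineer.

# Under-documented: the model must guess what it searches and what comes back.
poorly_documented = {
    "name": "search",
    "description": "searches",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

# Well-documented (hypothetical example): states what the tool does, what it
# returns, and when the model should reach for it.
well_documented = {
    "name": "search_orders",
    "description": (
        "Search the order database by customer email or order ID. "
        "Returns up to 10 matching orders as JSON, newest first. "
        "Use this before answering any question about order status."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Customer email or order ID, e.g. 'ord_1234'.",
            }
        },
        "required": ["query"],
    },
}
```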
The conversation then transitions to the current state of agent technology, addressing both its overhyped and underhyped aspects. Eric suggests that the underhyped aspect is the automation of tasks, even small ones, that save people time; automating them changes the dynamics of how often they get done at all. Barry points out the difficulty in calibrating where agents are truly needed, identifying a "sweet spot" of valuable, complex tasks where the cost of error is relatively low. He cites coding and search as examples.
Barry explains the potential of coding agents, highlighting their verifiability through unit tests. The success of coding agents depends on the quality of the unit tests used to provide feedback to the model. Eric agrees and suggests that improving agent performance will come back to verification: add tests for the things you really care about, so that the model can check its own work and know whether it is right or wrong before handing the result back to the human.
Looking ahead to 2025, Barry envisions multi-agent environments, where multiple AI agents interact and coordinate. He mentions an experiment where multiple Claude models play a text-based social deduction game, "Werewolf," to explore agent interaction. While single agents still need to demonstrate more successful applications in production, he sees multi-agent systems as a natural extension over the next couple of generations of models. Eric predicts increased business adoption of agents to automate repetitive tasks. He expresses skepticism about consumer-facing agents for complex tasks like vacation planning, due to the difficulty of specifying preferences and the high risk of errors.
Finally, the speakers offer advice for developers interested in agent development. Eric advises focusing on measurable results, so you get clear feedback on whether what you are building actually works. Barry recommends starting as simple as possible and adding complexity gradually. Both emphasize the importance of building something that can improve as models get smarter.