
Y Combinator - Why Vertical LLM Agents Are The New $1 Billion SaaS Opportunities

Published: 2024-10-04 14:00:41

Jake Heller, founder of CaseText, shares his insights on building successful vertical AI agents, particularly in the legal field. After the release of GPT-4, CaseText was acquired by Thomson Reuters for $650 million, a rapid ascent from a $100 million valuation. Jake, a former lawyer with computer science training, noticed a gap between the legal industry's use of technology and consumer tech. This sparked his initial idea for CaseText: providing better access to legal information.

Initially, CaseText tried a user-generated-content (UGC) model, getting lawyers to annotate cases, but this failed because lawyers had little spare time. The company pivoted to AI and natural language processing, offering incremental improvements to legal workflows. These improvements were often met with resistance, as lawyers were hesitant to change their established, profitable practices.

The game changed with GPT-4. Jake and his team gained early access, and within 48 hours decided to shift the entire 120-person company to a new product called Co-Counsel, an AI legal assistant. Before GPT-4's public launch, they quietly let a select group of customers use it, who were astonished by its capabilities. Co-Counsel could read and summarize millions of documents, or conduct legal research and draft memos, in a fraction of the time a human lawyer would take.

This radical shift wasn't without challenges. Jake faced resistance from employees accustomed to the company's existing trajectory. To address this, he led by example, building the first version of Co-Counsel himself. Customer demonstrations also changed minds, as employees witnessed firsthand the potential of the new technology.

Jake emphasizes the importance of understanding the core problem being solved for the user. The team focused on creating "skills" for Co-Counsel, mimicking how the best attorneys would approach a task and breaking it down into actionable steps.
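The "skills" approach described above, an expert workflow decomposed into discrete steps with each step backed by its own prompt, can be sketched roughly as follows. This is a minimal illustration, not CaseText's actual implementation; `call_llm` is a hypothetical stub standing in for a real model API.

```python
def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call; echoes the prompt so the
    # chaining logic can be exercised without network access.
    return f"[model answer to: {prompt}]"

def legal_research_skill(question: str) -> str:
    """A 'skill': mimic how an attorney would break research into steps,
    each step becoming its own narrowly scoped prompt."""
    steps = [
        "Identify the legal issues raised by this question: {q}",
        "List the search queries a lawyer would run to research: {q}",
        "Given the question '{q}' and the research so far, outline a memo.",
    ]
    results = []
    for template in steps:
        # Each step is a separate, testable prompt rather than one
        # monolithic request to the model.
        results.append(call_llm(template.format(q=question)))
    return "\n\n".join(results)
```

In a real pipeline each step's output would also feed into the next prompt and be validated independently, which is what makes the decomposition testable.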
Each step was then translated into a series of prompts, tested rigorously, and refined. This test-driven development approach helped the team achieve high accuracy.

Jake rejects the notion that companies like his are simply building "GPT wrappers." To him, the real value lies in layers of proprietary data sets, integrations with customer databases, specialized OCR technology, and, most importantly, a nuanced understanding of the specific domain. This ensures the AI isn't just generating plausible-sounding text but delivering accurate, reliable, and actionable information.

Addressing concerns about AI "hallucinations," Jake highlights thorough testing and iterative improvement. The team created a large battery of tests to identify patterns of errors, then refined prompts to address those patterns. He found that once a prompt passed hundreds of tests, it was likely to perform accurately across a wide range of user inputs.

Regarding OpenAI's new o1 model, Jake is impressed, particularly by its ability to reason carefully over precise details. He shares an example of o1 catching errors in a legal brief that previous models had missed. He envisions a future where AI can be prompted not just on how to answer questions, but on how to think, injecting domain expertise into the reasoning process.

Jake encourages entrepreneurs not to give up on AI because of common tropes about its limitations. He emphasizes solving real problems for users and creating value by building robust, reliable AI applications. The key is meticulous attention to detail, test-driven development, and integration of domain-specific knowledge, which together create more interesting and strategic roles within the industry.
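The test-driven workflow Jake describes can be sketched as a small eval harness: run a prompt against many labeled cases, collect the failures, and only promote the prompt once it passes every test. All names here are illustrative, and the trivial keyword classifier stands in for a real model call.

```python
def classify(prompt_template: str, document: str) -> str:
    # Stand-in for a model call; a trivial keyword rule keeps the
    # harness runnable without an API key.
    return "deposition" if "deposition" in document.lower() else "other"

def run_eval(prompt_template: str, cases: list[tuple[str, str]]) -> list[str]:
    """Return the inputs the prompt got wrong; an empty list means pass."""
    failures = []
    for document, expected in cases:
        if classify(prompt_template, document) != expected:
            failures.append(document)
    return failures

cases = [
    ("Transcript of the deposition of J. Smith", "deposition"),
    ("Purchase agreement dated 2021-03-01", "other"),
]
failures = run_eval("Label this document: {doc}", cases)
```

When the model changes or a new error pattern surfaces, fresh cases go into the battery first, and the prompt is refined until `failures` is empty again.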