Sierra co-founder Clay Bavor on Making Customer-Facing AI Agents Delightful
Published: 2024-08-27 09:00:18
Summary
Customer service is hands down the first killer app of generative AI for businesses. The reasons are simple: the costs of existing ...
Transcript
One of the more interesting learnings from the past, you know, year and a half of working on this stuff is that the solution to many problems with AI is more AI. And it's somewhat unintuitive, but one of the remarkable properties of large language models is that they're better at detecting errors in their own output than at not making those errors in the first place.
Joining us today is Clay Bavor, co-founder of Sierra. Before Clay started Sierra with his longtime friend Bret Taylor, he spent 18 years at Google, where he started and led Google Labs, their AR/VR efforts, and a number of other forward-looking bets for the company. Sierra is allowing every company to elevate its customer experience through AI agents, and there is no one who knows more about what AI agents can do today and what they'll be doing tomorrow than Clay. You'll get to hear about how pictures of avocado chairs helped inspire the founding of Sierra, why the solution to problems with AI is often more AI, and so much more. Please enjoy this incredible episode with my friend Clay Bavor.
Alright Clay, listen, this is a funny start because we know each other so well, but can you just tell everyone a little bit about yourself and just give us some background before we talk about the future of AI and what role Sierra is going to play in that?
First of all, I'm a Bay Area native. I grew up not more than four or five miles from here. So I grew up in the Bay Area, got to see the kind of dot-com bubble grow and then burst, studied computer science and then ended up right out of undergrad at Google, where I was for 18 years until last March. And so at Google I worked on really every part of the company. I started in search and then ads for several years. I ran the product and design teams for what is now Workspace, so Gmail and Google Docs and Google Drive and so on. And then I spent the last really 10 years at Google working on various forward-looking bets for the company, some hardware related, like virtual and augmented reality, some AI related, like Google Lens and other applications of AI. And then 15 months ago I left Google to start Sierra with a longtime friend of mine, Bret Taylor.
We met in our early days at Google, where we both started our careers in the Associate Product Manager program. So he was, I think, class one. I was class three, and we met early on and stayed in touch, in particular through a monthly poker group that in a good year would play like once. And we met up in December of 2022 and just saw what was happening in and around AI, and these fundamentally new building blocks that we thought would enable us to create something really special, and started Sierra out of that. So that's the recap.
Actually, I'm curious on that. And we need to get to what Sierra is pretty quickly here, but just for fun: December 2022 was very shortly after the ChatGPT moment. I guess what was the process like, or how soon after that moment did you have the conviction that this is a sufficiently interesting new technology to build a company around?
And I'll introduce one thing that's kind of interesting that I hope you talk about. Before the ChatGPT moment, you had been telling me about how everything was going to change. I still remember distinctly him telling me, you don't understand, you're going to be able to talk about a scene that you envision and they're going to be able to make a movie out of you just talking about it. Do you remember telling me that?
Yes.
Yeah. And so I'm actually very curious about this too.
Well, I had such a privileged seat at Google to see so much of what came out of that Transformer paper in 2017 and the emergence of early large language models. So at Google, one of the first was called Meena, or LaMDA. There was a paper, I think in 2020, a conversational chatbot for just about anything. I remember even before that, getting to interact with this thing in a pre-release prototype and having this uncanny sense that there was someone, something, on the other side of it and that this was different. And another moment, I think it was mid-2022, when we had, I think it was the first or second version of PaLM, the Pathways Language Model, at Google. It was a 540-billion-parameter model. And we were testing it to see kind of how smart it was. And one of the surest signs of intelligence is the ability to think and reason in metaphor and analogy. So we tried a few things, and one, which was pretty straightforward, is we asked PaLM, hey, explain black holes in three words. And it came back without skipping a beat: black holes suck. And we were like, oh, that's a pretty good summary. Also, the model seems to have a sense of humor, which is cool. And the moment that really blew my mind, and I remember the answer verbatim: we asked PaLM, please explain the 2008 financial crisis using movie references. And again, without skipping a beat: so the 2008 financial crisis was like the movie Inception, except instead of dreams within dreams, it was debt within debt. And we all paused. What is this? So it understood basically the concept of CDOs, nestedness of debt. Okay, what movie includes nestedness of something else? Inception, nestedness of dreams. So it's like Inception. And we all thought, wow, this is something new and different. And then there were a couple of other moments. I remember the first DALL-E paper came out and they did a blog post, and people reacted a little bit to it.
But for me, I remember one of the stars of the show was they asked DALL-E to make avocado chairs. And so, and I know this sounds so odd, but here is a set of 10 or 20 images of chairs that look like avocados. It wasn't Photoshop. These images had never existed before, and yet the model seemed to understand, similar to the movie reference metaphor, the concepts of avocado and chairness, and put those together and create these images pixel by pixel.
So we had avocado chairs at Instacart.
Yeah, we actually did.
We actually did.
Wow.
We actually had chairs shaped like avocados.
In related news, there were times where we were burning a little bit too much money. You know, those bags.
So, I had a good sense that something was coming. And in fact, the team I was running at Google at the time, Labs, was putting a lot of large language models to use in early applications there. And so, I had a hunch. ChatGPT certainly clarified that hunch, but I think Bret and I both for several years had been tracking what was happening and just seeing, you know, first it was translation, and better-than-human-level translation. And then it was some of this language generation. And I think credit to OpenAI for doing the engineering work and data work and much more to make GPT-3 turn into ChatGPT, where suddenly you could grasp this thing's full potential without, you know, knowing how to write Python and use their APIs.
All right, so we're going to talk about where AI is going. We're talking about agents. We're talking about customer service. Right. But first, can you maybe just tell people a little bit about Sierra and what you and Bret have created?
Yeah. So in a nutshell, Sierra enables any company in the world to create its own branded, customer-facing AI to interact with its customers for anything from customer service to commerce. And the backdrop for this is the observation that any time there's been a really significant change in technology, people interact with computers, with technology, in different ways. And as a consequence, businesses are able to interact with their customers in entirely new ways. And you saw this in the 90s: the internet made the website possible. And for the first time a company could have a sort of digital storefront and be present to the world, update its inventory with the click of a button and so on. In, you know, the mid-2000s, 2005, 2008, if you were a company, you could all of a sudden, through ubiquitous social networks, interact with your customers at scale and have conversations at scale.
And in 2015, right after the rise of smartphones, as a company, you could put kind of a Swiss Army knife version of your company in everyone's pocket.
And so like, I bet you have your bank's mobile app on your phone, probably on your home screen. So the last few years of advances in AI have for the first time made it possible to create software that you can speak to, right? Software that can understand language, software that can generate language. And most interestingly, I think, software that can reason and make decisions. And it's made for really delightful conversational experiences like those that we associate with ChatGPT. And so we think it's a big, big deal for how businesses interact with their customers.
And you think about the difference between how we do some things today versus what you could do if you could just have a conversation. If you could just have a conversation with the business you're interacting with. Think about like shopping. You're in the market for some shoes, right? Or Pat, maybe for you, some new weights or something. You're a very heavy weight. I'm a little one. And you're on the website and it's like, you basically have to imagine how the company's designer would have organized the product catalog.
You click men's, men's shoes, men's running shoes, men's racing shoes, lightweight, Vaporfly. I can't remember the name, and so on. Instead, conversationally, you could just say, hey, I need some super lightweight running shoes. Kind of like those ones I got last time. What do you got? And I'm dating myself a little bit here, but it's almost like Yahoo! Directory, where you navigate through this hierarchical structure to find what you want.
In contrast, with Google, you explain what you want. And this takes it several steps further. And there's a quote from the head of customer experience at one of the companies we work with. She said, I don't want our customers to have to have a master's degree in our product catalog and our corporate processes. And as things go, buying shoes is fairly easy on the spectrum of interactions you have with companies.
Imagine adding a new person to your insurance policy. Like, where do you go in the mobile app for that? How do you get that done? And your eyes just glaze over, right? And so the alternative, talking to an AI, and in particular an AI agent, which is the technology around which we built Sierra, where that AI agent represents your company, your company at its best, we think is really, really powerful. And even though, you know, we're 15 months old as a company, we've had the privilege of already working with storied brands like Weight Watchers, Sonos, SiriusXM, OluKai. If you're in the market for new flip-flops, I strongly recommend OluKai flip-flops.
I have two pairs.
Very good, excellent. They also make great golf shoes.
Oh, really?
Oh, yeah, yeah, yeah. You should get some.
All right, great.
And so for Weight Watchers, we're advising on points and helping members manage their subscriptions. With SiriusXM, we're helping diagnose and fix radio issues and figure out what channel your favorite music is on and so on. And the results, again, in the first year of the platform being out there: we're in one case resolving more than 70% of all incoming customer inquiries at extremely high customer satisfaction.
And all this leads us to believe that every company is going to need their own AI agent, and we want to be the company that helps every company build their own. In the spirit of sort of the, you know, the future of these AI agents and what they could mean for customer-facing communications and customer-facing operations. Are there any good examples of things that were not possible 18 months ago that are possible today? And then maybe if we roll the clock forward, things that are still not quite possible today that you think will be possible.
Yeah. 18 months from now?
Yeah.
First of all, the progress month by month, and over 18 months in particular, is just kind of breathtaking. 18 months ago, GPT-4-class models didn't exist, right? It was still kind of something just coming over the horizon. Agent architectures, cognitive architectures, kind of the way you compose large language models and other supporting pieces of infrastructure, were very, very rudimentary. And so I'd go so far as to say, like, the idea of putting an AI in front of your customers that could be helpful and, importantly, safe and reliable.
That was just impossible. And so chatbots from even 18 months ago looked a lot like a pile of hard-coded rules that someone cobbled together over months or years and that became very brittle. And I think we've all had the experience of talking to chatbots: "I'm sorry, I didn't get that. Can you ask in a different way?" Or my favorite is when they have the message box and then like the four buttons you can click, but the message box is blanked out and you can't actually use it. And so: "I can help you with anything, so long as it's one of these four buttons."
So most of what I described, right, fixing radios, processing exchanges and returns and so on, wasn't possible 18 months ago, at least in any satisfying way or in a way that led to real business results for companies. Fast-forwarding 18 months, you know, I think we could go pretty deep here. I think multimodal models are quite interesting. Something like 80% of all customer service inquiries are on the phone, not on chat or email. So voice will obviously be a huge part of it. Things like returns, exchanges, diagnosing radio issues and things like that are on the simpler end of the spectrum of the total set of tasks that you might want to get help with from an AI agent. And so I think more advanced models, more sophisticated cognitive architectures, all of those, I hope, would increase kind of the smarts in the agent and the types of problems that it can solve. And then trust, safety, reliability: the hallucination problem, I think, is still an unsolved area. We and others have made huge amounts of progress on it.
But I think we can't yet declare victory.
How quickly do you think it's going to become? You guys are doing so much for the customers, not just customer service, but, you know, working all the way through the funnel. But on the customer service side, how long is it going to take for it to become the default, where folks expect that they will be able to have someone, or an AI, that's available at any time to answer any question? You know, make that real for us.
Yeah, I don't know. And in part, there's a bit of a hole to dig ourselves out of, not as a company but as an industry, where it's like, when was the last time you had a great interaction with the chatbot on a website? And, you know, I think if you polled 100 people and asked, do you like talking to customer service chatbots, probably zero out of 100 would say yes. On the other hand, if you asked 100 people, do you like interacting with ChatGPT, maybe 100 out of 100 would say yes. And so I think some of the work we've been doing in our product is to educate our customers' customers up front that, like, hey, this thing's actually really smart and good. One of the interesting specific techniques for doing that is we stream our answers out word by word, similar to how ChatGPT does. People are so used to message, message, message, message. The streaming answer is something of a kind of visual signature for, oh, there's a really smart AI behind this.
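A minimal sketch of the word-by-word streaming idea mentioned here, assuming the full answer text is already available. The names `stream_words` and `render` are hypothetical illustrations, not Sierra's implementation, and a real client would repaint the chat bubble as each chunk arrives rather than build a string.
```python
import re
from typing import Iterator


def stream_words(answer: str) -> Iterator[str]:
    """Yield the answer one word at a time, each with its trailing whitespace.

    Leading whitespace in `answer` is dropped; everything else is preserved,
    so joining the chunks reproduces the text.
    """
    for match in re.finditer(r"\S+\s*", answer):
        yield match.group(0)


def render(answer: str) -> str:
    """Accumulate chunks the way a chat UI would grow a message bubble."""
    shown = ""
    for chunk in stream_words(answer):
        shown += chunk  # hypothetical repaint point for a real UI
    return shown
```
The design choice is only about presentation: the same text arrives either way, but emitting it incrementally is what gives the "visual signature" described above.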
And so I think what we find is customer satisfaction is extremely high with our AI agents, you know, in the mid-fours, so 4.5 out of 5 stars. Which in some cases is higher than customer satisfaction with human agents. And in fairness, they often get the hardest cases, and the cases that, you know, we will hand off because the customer became angry or was especially frustrated or something. But still, those results are really significant. And so my guess is over just the next few years, I think people will realize, oh, I can get my issue resolved faster. This thing is actually capable and can not only answer my questions, but, you know, one of the things we're really proud of is we go far, far beyond just answering questions and can actually take action and get the job done.
Can you talk a bit about Agent OS and some of the frameworks that you put around the foundation models to make everything work?
So it's been such an interesting journey learning what's required to put AI safely, reliably and helpfully in front of our customers' customers. And a huge part of that, really the first part, is looking at what the challenges are with large language models and how you address or meaningfully mitigate those. And so, start with hallucinations. I don't know if you saw it, but there is an example from a few months ago where Air Canada's chatbot, which I think was based on an LLM and apparently not much else, was interacting with a gentleman who had questions about their bereavement policy. I think the person had had someone pass away in his family and was asking about refunds and credits and so on.
And the AI made up a bereavement policy that was quite a bit more generous than Air Canada's actual bereavement policy. And so the man took a screenshot and later claimed the full amount of that refund and so on. They said, no, actually that's not our policy.
And bizarrely, and I don't quite understand this, the case went all the way to court, and Air Canada lost. And our thought was like, hey, it's just like $500, and like, Canadian dollars. But hallucinations are a real challenge. And on top of that, just to enumerate some of the things to overcome, and that we handle with Agent OS: no matter how smart GPT-5 or 6 is, it won't know where your order is, right? Or which seats you've booked on the upcoming flight, or whatever.
It's obviously not in the pre-training set. And so you need to be able to safely and reliably and in real time integrate an AI agent in our case with systems of record to look up customer information, order information and so on. And then finally, most customer service processes are actually somewhat complex, right? You go to call centers and there will be flow charts on the wall.
Like, here's how we do this, and if there's an exception, go this way, and so on. And as capable as, you know, GPT-4- and Gemini 1.5-class models are, they'll often have trouble following complex instructions. And we saw one example in an early version of an agent that we prototyped, where you'd give it five steps in a returns process or something.
And you'd say, hi, I need to return my order, or whatever. And it would jump straight to step five and then call a function to return the shoes with username John Doe at example.com, order number one, two, three, four, five, six. So it would not only hallucinate facts or bereavement policies, but even function calls and function parameters and so on.
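One common way to guard against hallucinated function calls and parameters is to check every model-proposed call against data the system already trusts before executing it. The sketch below illustrates that general technique only; `Session`, `ToolCall`, `ALLOWED_TOOLS` and `validate_call` are hypothetical names, and nothing here is taken from Sierra's platform.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Facts known from the authenticated session (assumed, for illustration)."""
    email: str
    order_ids: frozenset


@dataclass(frozen=True)
class ToolCall:
    """A function call proposed by the model."""
    name: str
    args: dict


# Hypothetical registry: tool name -> exact set of argument names it accepts.
ALLOWED_TOOLS = {"return_order": {"email", "order_id"}}


def validate_call(call: ToolCall, session: Session) -> list:
    """Return a list of problems; an empty list means the call may proceed."""
    expected = ALLOWED_TOOLS.get(call.name)
    if expected is None:
        return [f"unknown tool: {call.name}"]
    problems = []
    if set(call.args) != expected:
        problems.append("unexpected or missing arguments")
    if call.args.get("email") != session.email:
        problems.append("email does not match the authenticated customer")
    if call.args.get("order_id") not in session.order_ids:
        problems.append("order_id was not found for this customer")
    return problems
```
With a session for `ana@example.com` holding order `A-1001`, a proposed `return_order` call carrying `john.doe@example.com` and `123456` is rejected on both counts, while a call carrying the real values passes.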
So with Agent OS, what we built is essentially a toolkit and a runtime for building industrial-grade agents. I don't want to say that we've solved every one of these problems, but we've overcome and mitigated the risks in these problems to such an extent that you can safely deploy them at scale, have millions of conversations with them and so on.
And it starts at the foundation layer, I don't mean the foundation model layer, but just the base layer of the platform, where you have to get really important things right, like data governance and detection, masking and encryption of personally identifiable information. And so we built that right into the platform from the ground up, so that our customers' data stays our customers' data and so that their customers' data is protected. We, for instance, detect, mask or encrypt all PII before we log to durable storage, right.
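A minimal sketch of masking PII before a message reaches durable storage, assuming simple regular expressions are an acceptable first pass. The patterns and the names `mask_pii` and `log_message` are illustrative assumptions; production detection would cover far more categories (names, addresses, payment details) and would not rely on regexes alone.
```python
import re

# Deliberately simple patterns for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# At least nine characters of digits and common phone punctuation,
# starting and ending on a digit, so short order numbers are left alone.
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace detected email addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


def log_message(store: list, text: str) -> None:
    """Append only the masked form; `store` stands in for durable storage."""
    store.append(mask_pii(text))
```
The point of routing every write through `log_message` is that the raw text never reaches the store, which mirrors the "before we log" ordering described above.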
Knowing that we're going to be touching addresses and phone numbers and so on, we can handle that safely. A level up from that, we've developed what we call Agent SDK, and it's a declarative programming language that's purpose-built for building agents. And it enables an agent developer, most of whom sit within the four walls of Sierra today, to express high-level goals and guardrails around agent behavior. So: you're trying to do this.
Here are the instructions. Here are the steps and a couple of the exception cases. And then here are the guardrails. And to give an example of that, one of our customers works in kind of the healthcare-adjacent space. They want to be able to talk about the full range of their products without dispensing medical advice, right.
So how do you create those additional guardrails? And so you can define kind of the behavior and scaffolding for complex tasks for agents with Agent SDK. We also have SDKs for integrating with contact centers when we need to hand off, and for integrating with systems of record, like the order management system and so on.
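To make the "goal, steps, guardrails" shape concrete, here is a small sketch of a declarative agent definition expressed as plain data. This is not Agent SDK, whose syntax is not shown in the conversation; `AgentSpec` and `render_instructions` are hypothetical names, and the sketch only assumes that a runtime would turn such a spec into model instructions.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Declares what the agent should do; how it runs is left to a runtime."""
    goal: str
    steps: tuple
    guardrails: tuple = ()


def render_instructions(spec: AgentSpec) -> str:
    """Flatten a spec into numbered steps followed by guardrail bullets."""
    lines = [f"Goal: {spec.goal}", "Steps:"]
    lines += [f"  {i}. {step}" for i, step in enumerate(spec.steps, start=1)]
    if spec.guardrails:
        lines.append("Guardrails:")
        lines += [f"  - {rule}" for rule in spec.guardrails]
    return "\n".join(lines)


returns_agent = AgentSpec(
    goal="Help the customer return an order",
    steps=("Verify the customer's identity", "Look up the order", "Issue a return label"),
    guardrails=("Do not give medical advice",),
)
```
Keeping the spec as data rather than as a hand-written prompt means the same definition can be checked, versioned and rendered differently for different models.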
And then finally, SDKs for integrating our chat experience directly into a customer's mobile app or website: iOS, Android, web and so on. And then once you've defined the agent using Agent SDK, we have a runtime where we abstract away what happens under the hood from the developers, so that they can define what the agent should do. Define the what, and then Agent OS takes care of the how. And so for some skills, there might not be one LLM call, but five, six, seven, ten separate LLM calls to different LLMs with different prompts.
In other cases, we might retrieve documents to support answering a question accurately, and so on. And Agent OS, you know, in the spirit of an actual operating system, abstracts away a lot of that complexity, kind of the equivalent of I/O and resource utilization and so on. So it makes the whole process of building and then deploying an AI agent much faster and much safer and more reliable.
And when you think about what you just said, Clay, when you call multiple LLMs, is that in a supervisory capacity sometimes too, where you end up having like a supervisor agent reviewing the work of a lower-level one?
Yeah.
One of the more interesting learnings from the past, you know, year and a half of working on this stuff is that the solution to many problems with AI is more AI. And it's somewhat unintuitive, but one of the remarkable properties of large language models is that they're better at detecting errors in their own output than at not making those errors in the first place. And it's kind of like if you or I were to draft an email quickly and then go, okay, let me pause. Let me proofread this. Does this make sense? Do these points hang together? Oh, actually, no, I missed this. And even more powerfully, you can prompt LLMs to take on, in essence, a different persona. So a supervisor's persona.
And it seems with that you can elicit more discerning behavior and a closer read of the work being reviewed. So to your question, Ravi, yeah, in addition to building the agent itself, we have a number of these supervisory agents that basically, it's like a little Jiminy Cricket agent looking over the shoulder of the primary agent, right? Is this factual? Is this medical advice? Is this financial advice? Is the customer trying to prompt-inject and attack the agent and get it to say something that it shouldn't? All of these things. And it's through layering all of these, the goals, the guardrails, the task scaffolding using Agent SDK, and these supervisory layers, that we're able to get both to the performance levels we are, 70%-plus resolution rates, but also to do that really safely and reliably.
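The supervisor pattern described here can be sketched as a second pass that asks a reviewer model targeted yes/no questions about a draft before it is sent. The code below is an illustration under assumptions: `judge` is any callable that takes a prompt and returns text (a real LLM client in practice, a stub in tests), and `SUPERVISOR_PROMPT`, `review` and `respond` are hypothetical names rather than part of Agent OS.
```python
from typing import Callable

# Any function mapping a prompt to the reviewer model's text reply.
Judge = Callable[[str], str]

SUPERVISOR_PROMPT = (
    "You are a careful supervisor reviewing a customer service draft.\n"
    "Question: {question}\n"
    "Draft: {draft}\n"
    "Answer YES or NO."
)


def review(draft: str, checks: dict, judge: Judge) -> list:
    """Return the names of the checks the supervisor flagged with YES."""
    failed = []
    for name, question in checks.items():
        verdict = judge(SUPERVISOR_PROMPT.format(question=question, draft=draft))
        if verdict.strip().upper().startswith("YES"):
            failed.append(name)
    return failed


def respond(draft: str, checks: dict, judge: Judge,
            fallback: str = "Let me connect you with a specialist.") -> str:
    """Send the draft only if no supervisory check was flagged."""
    return fallback if review(draft, checks, judge) else draft
```
Each check runs as its own call with its own narrow question, which matches the idea of several small reviewers looking over the primary agent's shoulder rather than one model doing everything at once.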
That's one of the cooler things I've heard, just, you know, tell it to have a different persona, and then all of a sudden it behaves differently. Like I remember when I first saw it on ChatGPT: when it doesn't help you on something, just tell it it's really good at it, and then it's more likely to help you.
It's a remarkable situation. It's very strange. And one of the weirdest adjustments over the past, you know, 15 months building these things is, I'm sorry, we're programming with English language. And we can give it the same English language and it can say something entirely different. And on prompting techniques, I mean, it's fascinating.
Even with no new models coming out, right? Given a fixed model, you can elicit better and better performance from it simply by improving how you prompt it. And there was a paper that came out three or four months ago that suggested that like emotional manipulation of the large language model would get better results. So the kind of the prompt suffix that they figured out was you say, hey, I need you to perform this task. You define the steps and so on. And you end with, it's very important to my career that you get this right. And the performance goes up. You're like, what is this? Like, what are computers now? For the record, we don't use that prompt.
即使没有新模型推出,对吧?在给定一个固定模型的情况下,你可以通过改进如何提示它,从而获得越来越好的性能。大约三四个月前有一篇论文提出,通过情感操控大型语言模型可以获得更好的结果。他们发现的一种提示方式是这样的:你告诉模型,“嘿,我需要你完成这个任务。”你定义步骤等内容,最后加上一句“这对我的职业非常重要,请务必做对。”这样性能就提高了。看到这里,你可能会想,这到底是怎么回事?现在的计算机是什么情况?声明一下,我们并没有使用这种提示方式。
At least not that I know of. But things like chain of thought: think step by step, "let's think step by step," right? It elicits better reasoning, for very interesting reasons. You know, other methods of task decomposition, and kind of narrowing the set of things that the LLM needs to keep in mind at the same time, improve reasoning if you're precise about what you want it to do. So all of these techniques are ones that we've applied and built into Agent OS. And actually, we have a small but mighty research team. And our head of research, Karthik Narasimhan, was... That was incredible pronunciation. Oh, Clay, his grandmother would have been so perfectly happy with how you pronounced it. Thank you. Well, thank you. Soft T? Yeah, soft T. Nicely done. Yeah, it's not a T and it's also not a T.
至少在我所知的范围内不是这样。但像“思维链”这种逐步思考的方法,让我们一步一步来,对吧?Alyssa,这种方法在某些很有趣的理由上能带来更好的推理能力。你知道,其他一些任务分解的方法和缩小语言模型需要同时考虑的问题集的方法也能改善推理能力,只要你明确想要它做什么。所以这些技术都是我们在Agent OS中应用和构建的,并且我们有一个小而强大的研究团队。我们的研究负责人Karthik Narasimhan,他的发音真是太棒了。哦,Lake,他的祖母会很高兴你这么发音。谢谢。发音中的软T?是的,软T。做得不错。是的,这不是一个T,也不是一个T。
That's right. It's right in between. Thank you very much. He helped write the ReAct paper, one of the first agent frameworks. One of our researchers wrote the Reflexion paper, where you can have the agent pause, reflect on what it's done, and think through, "am I doing this right?" before proceeding. And so these are all things that we've been able to incorporate in quite a direct way. You should talk about the most recent research. The τ-bench. Oh, τ-bench? Yeah, yeah. It took me a while, when I was trying to send the email saying I liked the paper, to find the tau symbol on my computer. No, it took Ravi a while because he has, to this date, never actually read a research paper. I read this one. That's great. No, no, no. He had to figure out how to put it into ChatGPT and say, please write a paragraph that makes it sound like I read this research paper. Well, either you... I would have just a comment. Well, look, either you or ChatGPT did a great job on that email. Thank you. So we're a team. Yeah. So, τ-bench is our first research paper.
好的。这就对了,就在中间。非常感谢。他帮助撰写了《React》论文,这是最早的代理框架之一。我们的一位研究人员写了《反思》论文,这种方法允许代理暂停下来反思自己所做的事情,并在继续之前思考“我这样做对吗?”因此,这些都是我们能够以相当直接的方式整合的内容。你应该讨论最近期的研究,Talbench。哦,Talbench?是的,是的。当我尝试发邮件说我喜欢这篇论文时,我花了一段时间才找到Talbench并下载到我的电脑上。实际上是Robie花了一段时间,因为到目前为止,他从未真正读过一篇研究论文。我读了这篇。这很棒。不,不,不。他不得不想办法把它放到ChatGPT中,说“请写一个段落,让它听起来像我读过这篇研究论文”。那么,不管是你还是ChatGPT,都在那封邮件上做得很好。谢谢。所以,我们是一个团队。是的。而Talbench是我们的第一篇研究论文。
First of all, tau is a Greek letter. It's spelled T-A-U, and it stands for tool-agent-user benchmark. And what we observed was that the benchmarks out there for measuring the performance of AIs, AI agents in particular, were pretty limited, in that basically they would present a single task: here is something we need you to do, and here are some tools you can use. Do you do the job or not? And the reality is, interactions with an AI agent in the real world are way messier than that. They take place in the space of natural language, where customers can say literally anything, or describe whatever they're trying to do in any number of ways. It happens over a series of messages. The AI agent needs to be able to interact with the user to ask clarifying questions, gather information, and then use tools in a reliable way. And it needs to be able to do this a million times reliably. So we found the benchmarks out there really lacking in measuring the very thing that we were trying to be the best at. And so our research team set out to create a benchmark that measures, we think, the real-world performance of an agent in interacting with real users, using tools, with all the messiness that I just described.
首先,Tal是一个希腊符号,它的拼写是T-A-U,代表“工具代理用户基准”。我们观察到,目前用于测量AI性能的基准,特别是AI代理的性能,相对来说非常有限,基本上它们只提供一个单一任务:这是我们需要你完成的任务,以下是一些你可以使用的工具。你能否完成任务?而现实中的AI代理互动要复杂得多。互动发生在自然语言的环境中,客户可以说任何内容或用各种方式描述他们想要完成的事情。这一过程是在一连串信息交换中发生的,AI代理需要能够与用户互动,提出澄清性的问题,收集信息,并以可靠的方式使用工具。这一切需要在各种情况下都能可靠进行。因此,我们发现现有的基准在测量我们想成为最佳领域的能力时非常不够。于是,我们的研究团队着手创建一个新的基准,测量AI代理在真实世界与用户互动时的性能,考虑到我刚才描述的所有复杂性。
And the big-picture approach that we took is pretty interesting. So you have an AI agent that you're trying to test. You have another, separate agent that acts as the user, so basically a user simulator. And the AI agent you're testing has access to a set of tools it can use. Think of these as, like, functions to call. So a simple one would be, I'm going to do some math using a calculator tool; a more complex one might be, hey, I'm going to okay returning this order with the following parameters: this order number, credit to credit card or store credit or whatever. And then you basically run a simulation where the agent has a conversation with the user-simulating agent. And at the end, we're able to test in a deterministic way: were the functions used in the right way? And the way we do that is we basically have a mock database that those tools interact with and modify. So was it modified in the correct way? So what's neat about this is you can initialize the conversation so that the user has many different personas. They could be grumpy, they could be confused, they could know what they want to do but speak about it in a clumsy way.
我们采取的大局观方法相当有趣。你有一个要测试的AI代理,还有另一个扮演用户角色的独立代理,基本上是个用户模拟器。正在测试的AI代理可以使用一系列工具,就像调用函数一样。比如一个简单的功能是使用计算器工具来进行一些数学计算,更复杂的功能可能是处理订单,比如依据订单号选择退款到信用卡或店内余额等。然后,你运行一个模拟器,让AI代理与模拟用户代理进行对话。在对话结束时,我们能够以确定性的方式测试函数是否以正确方式使用。我们通过一个模拟数据库来进行这个测试,工具可以与数据库交互并进行修改,以检查这些修改是否正确。有趣的是,你可以初始化对话,让用户具备不同的角色特征,比如脾气暴躁、困惑不清,或是明白自己想做什么但表达不清。
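The evaluation loop described above (agent under test, user simulator, tools that modify a mock database, and a deterministic check of the final database state) can be sketched as follows. This is an illustrative toy, not the τ-bench code: `toy_agent`, `run_episode`, `passed` and the order data are hypothetical, and a scripted list of messages stands in for the LLM-based user simulator.

```python
import copy

# Mock database that the tools read and modify.
DB = {"orders": {"#W100": {"status": "delivered", "refund_to": None}}}


def return_order(db, order_id, refund_to):
    """Tool: mark an order as returned and record the refund method."""
    db["orders"][order_id]["status"] = "returned"
    db["orders"][order_id]["refund_to"] = refund_to


TOOLS = {"return_order": return_order}


def toy_agent(user_msgs):
    """Rule-based stand-in for the agent under test.

    Returns ("say", text) to ask a clarifying question,
    or ("call", tool_name, kwargs) to use a tool.
    """
    words = " ".join(user_msgs).split()
    text = " ".join(user_msgs).lower()
    order_id = next((w.strip(".,!?") for w in words if w.startswith("#")), None)
    if order_id is None:
        return ("say", "Which order number is this about?")
    if "store credit" in text:
        method = "store_credit"
    elif "card" in text:
        method = "credit_card"
    else:
        return ("say", "Refund to your credit card or store credit?")
    return ("call", "return_order", {"order_id": order_id, "refund_to": method})


def run_episode(agent, user_script, db):
    """Play a scripted user against the agent; return the final database."""
    db = copy.deepcopy(db)  # never mutate the shared fixture
    user_msgs = []
    for msg in user_script:
        user_msgs.append(msg)
        action = agent(user_msgs)
        if action[0] == "call":
            _, name, kwargs = action
            TOOLS[name](db, **kwargs)
            break
    return db


def passed(final_db, expected_db):
    # Only the end state matters, not the conversational path taken.
    return final_db == expected_db


# Two personas asking for the same kind of outcome in different styles.
GRUMPY = ["This jacket is awful. I want my money back!!",
          "Ugh. It's #W100.",
          "Card. Obviously."]
CLUMSY = ["hi so um i got a thing and its not right, order #W100 i think, "
          "store credit is fine"]
```

Because the check compares database state rather than wording, the same test covers any persona or phrasing the simulator produces.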
And so it doesn't really matter what path the AI agent takes to get to the correct solution, so long as it gets to the correct solution. Now, what came out of this was pretty interesting, and I think it strongly motivates the development of things like Agent OS and frameworks and cognitive architectures for building these agents. So the upshot is, LLMs on their own do just an absolutely terrible job at this task. Even the frontier models, in something as simple as processing a return, and mind you, the instructions given to the agent being tested are quite detailed, the functions, the tools it can use are quite well documented, and so on. And yet on average, the best-performing LLM on its own got to the end of the conversation correctly 61% of the time. And that was in returns. The other was modifying an airline reservation; we had two kind of simulation versions.
所以,AI代理在到达正确解决方案的过程中具体采用什么路径并不重要,只要它能得到正确的结果就行。有趣的是,这引发了人们对开发代理操作系统、框架以及用于构建这些代理的认知架构的极大兴趣。不过,结果显示,仅依靠大型语言模型(LOM)在这项任务上表现得非常糟糕。即使是最先进的模型,在一些简单的任务中,比如处理某些返回情况时,即便提供给测试代理的说明非常详细,功能和可使用工具的文档也很完善,但平均而言,表现最好的大型语言模型独立进行会话时正确率只有61%。这项实验涉及到修改航空公司预订的两种模拟版本。
The best results were 35%. Now what's interesting is, you know, we all know that if you take a number less than one to the nth power, it quickly gets very small. And so we developed a metric we call pass^k, which is: OK, if you run the simulation eight times, and remember, you can make use of the non-determinism of LLMs to have the user simulator be different every time, so you can permute that. Well, 0.61 to the eighth power is about 25%. So you then imagine, well, what if you're having a thousand of these conversations? You're so far off from being able to rely on this thing. So the upshot is, much more sophisticated agent architectures are needed to be able to safely and reliably put an agent in front of really anyone. And that's the very thing we're building with Agent OS and a lot of the tooling around it. How much of that do you think is an engineering task, and how much of that is a research task? And I guess maybe the question behind the question is the time frame to having useful agents deployed at scale and in broad domains of tasks.
最佳结果为35%。有趣的是,我们都知道如果一个小于1的数字取幂次方,其结果会迅速变小。因此,我们开发了一种名为“pass at K”的指标。假设你运行模拟8次,并利用模型的不确定性,每次让使用者模拟器表现不同,那么0.61的8次方约为25%。这意味着,如果你进行一千次这样的对话,离能够可靠依赖这个系统还有很长的路要走。因此,需要更加复杂的智能体架构,才能安全且可靠地安排智能体面对任何人。而这正是我们通过Agent OS和相关工具构建的内容。你认为这里面有多少是工程上的任务,又有多少是研究任务?也许这背后的问题是,我们何时能在大规模和广泛任务领域中部署有用的智能体。
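To make the arithmetic behind pass^k concrete, here is a small sketch. The function name `pass_hat_k` and the sample data are hypothetical, and the combinatorial estimator (the chance that k trials drawn from n recorded trials of a task all succeeded, averaged over tasks) is an assumption about how such a metric would be computed. Note that a uniform 0.61 success rate with independent trials gives 0.61^8 of roughly 0.02; a measured pass^8 near 25% is what you would expect when success rates vary by task, with some tasks passing nearly every time and others rarely.

```python
from math import comb


def pass_hat_k(successes_per_task, n_trials, k):
    """Estimate pass^k: the chance that k trials of a task ALL succeed.

    successes_per_task[i] is how many of the n_trials runs of task i passed.
    For each task, comb(c, k) / comb(n, k) is the fraction of size-k subsets
    of its trials that are all successes; the result averages that over tasks.
    """
    per_task = [comb(c, k) / comb(n_trials, k) for c in successes_per_task]
    return sum(per_task) / len(per_task)


# If every task independently succeeded 61% of the time:
uniform_all_eight = 0.61 ** 8  # roughly 0.019

# Toy data: 4 tasks, 8 trials each. Two always pass, one passes half the
# time, one never passes. Average single-trial success is 0.625.
toy = [8, 8, 4, 0]
single_trial = pass_hat_k(toy, n_trials=8, k=1)  # 0.625
all_eight = pass_hat_k(toy, n_trials=8, k=8)     # 0.5
```

In the toy data the single-trial rate is close to 61%, yet pass^8 is 0.5 rather than 0.02, because reliability is concentrated in the tasks the agent always gets right.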
Yeah, well, I think the short answer is it's both, but I'll say, more concretely, I'm very optimistic about it being in large part an engineering task. And that's not to say that the next wave of models and improvements in the frontier models won't make a difference. I believe it will. In particular, we're seeing techniques like better fine-tuning for function calling, agent-oriented fine-tunings for foundation models or some of the open-source models. Those will help. But the approach we've taken in building Agent OS, and kind of the foundations of Sierra, is really treating building AI agents as first and foremost an engineering challenge, where we are composing foundation models, we are composing fine-tuned open-source models that we've post-trained and fine-tuned with our own proprietary datasets. By composing multiple models in interesting ways; by supplementing what LLMs can do on their own with retrieval systems, like retrieval-augmented generation, to improve grounding and factuality; by supplementing the kind of inbuilt reasoning capabilities of LLMs with a kind of reasoning scaffolding that lives outside of the models, where you're composing planning, task generation steps, draft responses, the supervisors that we talked about, and doing that outside the context of the LLM, we've been able to put AI agents in front of a huge number of our customers' customers, safely and reliably. And so I don't think it's, you know, something over the horizon; it's already over the horizon. I think, looking ahead, there are a few different avenues where we'll see progress. One is in the foundation models. We talked about that, and as the capabilities grow, you know, agents will get smarter. And we've architected Agent OS in such a way, we talked about abstracting kind of the what from the how, that we'll be able to swap in, you know, the next frontier model, and everyone's agent will just get a bit smarter, will get, like, an IQ upgrade.
好的,我认为简短的回答是:两者都有。但是更具体地说,我对其在很大程度上应用于工程任务持乐观态度。这并不是说下一波模型和前沿模型的改进不会产生影响,我相信它会的。特别是在技术方面,我们正在看到更好的微调技术,比如功能调用微调、以代理为导向的基础模型微调或者一些开源模型,这些都会有所帮助。
我们在构建Agent OS及其基础架构Sierra时采取的方法,将构建AI代理视为首要的工程挑战。我们在组合基础模型以及我们通过自己专有数据集进行微调的开源模型,通过将多个模型以有趣的方式组合在一起,通过增加检索系统来补充LMS自身的功能,比如通过检索和开发生成系统,提升基础性和准确性;通过在模型之外增加推理支持系统来补充LMS内置的推理能力,包括任务生成步骤的规划、草稿响应、监控者等,就在LM的使用 context之外进行。
我们成功地将AI代理推向了大量的客户,安全可靠。因此,我并不认为这只是一个未来的愿景,它实际上已经到来了。展望未来,我认为有几个不同的方向我们可能看到进展。一方面是在基础模型上,我们谈过这个,随着能力的提升,AI代理会变得更智能。我们以一种将“是什么”和“怎么做”进行抽象的方式设计了Agent OS,这样我们就能替换成下一个前沿模型,所有的代理都会变得更智能,仿佛得到了智商提升。
By the way, similarly and interestingly, we can swap in less broadly capable models, but models that are more capable in a specific area. So for instance, for triaging a case or coming up with a plan and so on, we can use much smaller models that actually are better, faster, cheaper: choose three, you know, all at once. And then I think we're seeing progress literally week by week on the engineering of these agents, and building in not only new and better components under the hood in the architecture, but new approaches and tooling around basically teaching these agents to do it better and better. So we built something we call the Experience Manager for customer experience teams, which is kind of a pretty interesting thread on its own.
顺便说一下,我们可以用在特定领域更有能力的模型替换那些能力较为全面但相对不专精的模型。例如,在处理案件或制定计划时,我们可以使用规模更小、速度更快、成本更低的模型,而且这些模型在这些特定任务上的表现更优。我认为,我们在这些智能代理的工程开发方面几乎是每周都有进展,不仅在结构上加入了新的、更好的组件,还引入了新的方法和工具,基本上是为了更好地教这些代理完成任务。因此,我们为客户体验团队开发了一个叫做“体验管理器”的工具,这个工具本身就非常有趣。
Clay, if you had a high-value customer, like, you are a company now, you're not running Sierra, you're running a company that has a high-value customer. What, today, with a Sierra agent or with an excellently designed agent, could you trust an AI agent to go do in front of your customers? What are some of those tasks, and then what will they be, pick your time frame, you know, in the future? Because I think that we've talked about this, and I like your language of, like, you know, they already don't have to just be on the help center; they can already be on the home page, right? What are some of the tasks that, you know, you can rely on an agent for today, if it is well designed, with a high τ-bench score?
克莱,如果你有一个高价值的客户,比如说你现在经营了一家公司而不是运营Sierra,那么今天通过一个Sierra代理或一个出色设计的代理,你能信任这个AI代理在你的客户面前做些什么呢?这些任务可能是什么?然后在未来,你可以选择一个时间范围,因为我觉得我们已经讨论过这个问题,我喜欢你所说的比如它们已经不必只在帮助中心,它们已经可以在主页上了。对于一个设计良好的、具有高水平评分的代理,你今天可以依赖它完成哪些任务?
Yeah. You see, that's strong. That's from a thoughtful and detailed reading. You must have read the paper. Yeah. Thanks. You noticed that. Strong. Yeah. Strong. What would its pass^k score be, though? Yeah. So, a pretty broad range even today. So simple things like getting answers to questions, that's kind of the left end of the spectrum.
好的。你看,你的观点很到位,这是因为你进行了深入细致的阅读。你一定读过这篇文章。是的,谢谢你注意到这一点。嗯,那它的K分数会是多少呢?即便是在今天,这个范围也很宽泛。从简单的事情开始,比如回答问题,这在这个范围的左端。
To the right of that are things like helping you with something complex, like, hey, I got shoes or this item of clothing and it didn't quite fit. And then branching off that, like, what do you recommend that's like it that might fit better? And so it starts to get into, it's not a like-for-like replacement, but the agent actually needs to make sense of styles, of sizing, of differences between, you know, wide and narrow fit and so on. A click up from that is something like troubleshooting.
在这句话的右侧,是一些更复杂的帮助事项,比如“嘿,我的鞋子或者这件衣服不太合身”。接着可能会询问像“你推荐什么类似的东西,可能更合适”。因此,这并不是简单的替换,客服需要理解不同风格、尺寸的差异,例如为什么这个是窄款等等。再复杂一点的就是诸如故障排除之类的事情。
So with Sonos, for instance, we help their customers troubleshoot if they, right, can't connect to their system, or they're setting up a new system. And you can imagine it gets pretty sophisticated pretty quickly, where it's basically a process of elimination, trying to understand: is it a Wi-Fi thing, is it a configuration thing? And narrowing down the set of problems that it could be, just as a sophisticated, you know, level-two or level-three technical customer service person would. And getting the music back on. And I think a really new example, probably, you used the word trust: what would you trust an AI agent to do?
所以,比如说,对于Sonos,我们帮助他们的客户解决问题,比如无法连接到他们的系统或者在设置新系统时遇到的困难。可以想象,这个过程可能会变得相当复杂,就像是一种排除法,尝试理解问题是Wi-Fi问题还是配置问题。我们会逐渐缩小可能出现的故障范围,就像一名高级技术客服人员那样。我们的目标是让音乐重新播放起来。我认为这是一个很新的例子,这里可以讨论一下你会信任一个AI助手去做什么。
One of the things we're really proud of is that several of our customers are actually trusting us with when customers call in and may want to cancel or downgrade their subscription. Helping those customers to understand: hey, how are you using the service today? Is there a different plan that we could put you on? And so it's value discovery. It's putting an offer, sometimes a series of different offers, in front of their customers in the right way, positioning the value of those offers correctly given the customer's history, given the plan they're on, and so on. And, you know, the difference between keeping a customer from churning or not.
我们很自豪的一件事是,有很多客户信任我们,允许我们在客户打电话要求取消或降级订阅时进行处理。我们帮助这些客户理解他们是如何使用服务的,并且考虑是否有其他计划适合他们,这就是价值发现。我们会根据客户的历史和他们当前的计划,在适当的位置向客户提供一项或多项优惠,正确地展示这些优惠的价值。这可能会决定是否能够留住客户而不让他们流失。
Yeah, is hugely consequential. Right, you know, AI for customer service has obvious cost-savings benefits and, I think, customer experience benefits in particular: you're never going to wait on hold. But boy, you know, revenue preservation, revenue generation is something else entirely, and so that's really at the right end of the spectrum, and we're really proud of how well our agents are performing in those circumstances. And it's interesting: by being consistent, by taking the time to understand what's driving someone to potentially leave the service, asking the follow-up questions that an impatient or, you know, improperly measured customer service agent in a call center somewhere might not.
当然,影响是巨大的。我们都知道,使用人工智能进行客户服务显然具有节约成本的好处,尤其是在提升客户体验方面,因为你再也不用因为等待而烦恼。不过,保持和增加收入则是完全不同的挑战,这属于更高端的领域。我们非常自豪我们的客服人员在这些情况下表现得非常出色。通过保持一致性,花时间去理解是什么让客户可能想要离开服务,并且提出一些跟进问题——这是那些不耐烦或没有正确衡量的客服人员和某些电话中心可能不会做到的事情,这一点非常有趣。
We can be much more nuanced in understanding what's driving this decision, what might be a good match for this person in terms of a plan that would be quite valuable given how they're using it, and then put that in front of them. And so that's the right end of the spectrum. Where does it go from here? You know, I think we've yet to see a process too complex for us to be able to model and scale up using Agent OS and our agent architecture. And so, yeah, I'm sure we'll get punched in the face by something that's especially complex, right, but I'm excited about, you know, directionally. We've started with service for two reasons. One, the ROI case is just unequivocally awesome.
我们可以更细致地理解促使这个决定的因素。对于这个人而言,什么样的计划能够充分利用他们的使用方式并为他们带来很大价值,这可能是一个不错的匹配,然后将这个计划呈现给他们。这是一个正确的方向。从这里开始,我们还没有遇到一个过于复杂以至于无法用我们的代理操作系统和代理架构进行建模和扩展的过程。我敢肯定,在某些特别复杂的情况下,我们会遇到挫折,但我很期待这个方向。我们一开始选择从服务入手,主要有两个原因,其中之一是投资回报率毫无疑问非常出色。
And the average cost of a call is something like twelve or thirteen dollars. And yet, despite the expense, you know, most people don't like customer service calls very much, right? And so here's something that's actually really important to businesses that's really expensive and not very good. And so because of that, and because of the relative simplicity of at least a pretty broad set of service tasks today, we started there. But we've already been pulled by our customers into upsell, cross-sell, and, like, hey, can we just put you on the product page and have you answer questions about our products?
平均来说,一通电话的费用大概在十二到十三美元左右。尽管费用不低,大多数人并不太喜欢拨打客服电话。很重要的客服服务对企业来说既昂贵效果又不太理想。目前一些较简单的服务任务可以成为突破口,而事实上我们的客户已经在推动我们进行升级销售、交叉销售,比如说,“我们能不能让你在产品页面上,回答关于产品的问题?”
And so I mentioned that, you know, you're returning something and need advice on a different model or size or whatever. How far can that go? And I love the idea of an agent being, you know, along for the journey, from, you know, pre-purchase consideration, to helping you get the thing that's right for you, to helping you set it up and activate it and get the most out of it. It's great for the company, it's great for the person. And then when things do go wrong, right, being there to help. And I think in all of this, I think customer service and getting help in a very direct and conversational
我提到过,你可能在退换某个东西,并需要关于不同型号或尺寸的建议,不论情况如何。 我非常喜欢这样的想法:一个顾问能全程陪伴你,从购买前的考虑,到帮助你找到适合你的产品,再到协助你安装、激活以及充分利用产品。 这对公司和消费者来说都是很好的。 即便出现问题,他们也能及时提供帮助。在这一过程中,我认为优质的客户服务和直接的沟通交流非常重要。
way is going to be much less of a thing that you kind of go over there to do, and much more kind of woven throughout the fabric of the experience. As a consequence, I think there's a really interesting and powerful opportunity for companies to build connection with their customers, to reinforce their brand values. You can imagine a company really appreciating being able to use exactly the company's voice, that, you know, the CMO and head of communications say: this is how we talk, this is how we are, these are our values, this is our vibe, in every digital interaction they have. And that's the promise in this stuff. And so I think both greater complexity and then ubiquity throughout the customer journey are kind of two of the main directions of travel.
这段话的大意是说,"方式"将不再仅仅是某种需要特别去做的事情,而是更多地融入到整个体验之中。因此,我认为这对公司来说是一个非常有趣且有力的机会,可以与他们的客户建立联系,并强化其品牌价值。您可以想象一家公司能够使用公司的独特声音进行沟通,比如首席营销官和沟通主管所认同的语气,这是我们交流的方式,这是我们的价值观,这就是我们的风格,并在每一次数字互动中体现出来。这就是这种方法的承诺。因此,我认为在客户旅程中,更大的复杂性和无处不在是主要的发展方向。
One thing that I think about a lot is that we've come to expect and accept certain metrics for conversion on the mobile web or the mobile app. We've come to expect and accept some sort of retention numbers. What would those be? You know, like, it's not a question. What could they be, if you actually had an excellent experience every time throughout the journey? It really could be very different than what we've all been like, oh, OK, that's just the number, that's just what it is. Yeah, I think that's exactly right, and we don't know. We're a few months in, but it certainly seems like there's a lot of headroom, right, in retention, in use in the first 30 days, in all of the metrics, all of the leading metrics of a healthy business. And so I think that's exactly right.
我经常思考的一件事是,我们已经习惯并接受了在移动网站或移动应用上的某些转化率指标,以及某些留存率数据。我们认为这些数据就是固定的。但这些数据到底应该是多少呢?如果我们每次都能在用户旅程中获得极佳的体验,那这些数据可能会有很大的不同,而不仅仅是我们习以为常的那些数字。我认为这完全正确。虽然我们只经过了几个月的观察,但很明显在留存率和使用行为上的提升空间很大,尤其在前30天的各种关键业务健康指标上。因此,我认为这点说得很对。
The other thought experiment to do is: companies are judicious in using things that have a cost to them. OK, so as a consequence, companies make it actually really hard to get a hold of someone on the phone to ask some questions. I think there are whole websites devoted to, right, like, uncovering the secret 800 numbers that companies have hidden away in the depths of their help centers. Well, think about not only what would happen if those interactions were better. By the way, interestingly, the number one reason why people report a poor interaction with customer service is that it took too long. When it's a negative interaction, 65% of the time it took too long: I had to wait, I was put on hold, and so on. And the second most is, I had a bad interaction with an agent, and we've heard some pretty dicey anecdotes. Like, we heard of one agent who had consistent...
另一个值得思考的实验是,公司在使用对他们有成本的东西时通常会很谨慎。因此,公司实际上让在电话上联系到某人以询问问题变得非常困难。我想有整套网站都致力于揭示那些藏在公司帮助中心深处的秘密800电话号码。我们要想想如果这些互动变得更好的话会怎么样。而顺便说一下,有趣的是,人们对客服不满的首要原因是花费的时间太长。在出现负面互动的情况下,65%的时间是因为花费太长时间。我不得不等待,还要被搁置等等。其次的原因是与客服人员的互动不佳,我们听说了一些相当糟糕的故事,比如有一位客服人员一直出现问题。
Had consistently low ratings, but spikily so: like, one in three conversations was a one out of five CSAT, where the other two out of three were fine. And it turned out that in the low-CSAT ones, this agent was meowing, like a cat. You know. Yeah, you're midway through the call and, you know, the agent is meowing. And so anyway, back to you. OK, what would happen if, in contrast to making it near impossible to have a conversation with us and get help, companies were providing, you know, five or ten times the amount of fluent, flexible, helpful, conversation-based support? I don't know, I think a lot of products and experiences with companies would look quite different and much more delightful than they do today.
评分一直很低,但情况有点特别。每三个对话中就有一个评分是一分(满分五分),剩下两个还算正常。后来发现评分低的对话中,这个客服竟然在“喵喵”叫。你知道的,就是进行到一半,突然客服开始“喵喵”。假如公司不再让客户难以沟通和获取帮助,而是提供五到十倍流利、灵活和有帮助的对话支持,情况会怎么样?我想许多产品和客户体验可能会和现在大不相同,甚至更加令人愉悦。
Yeah, OK, now. Now here's a question for you. About that meowing, yeah, just random meowing. I do actually have a question though. Although I do like the meow game all sale. So we talked a little bit tech-out, in terms of what you guys have built, cognitive architecture, all that good stuff. We've talked a little bit customer-back, what's the experience like as I've headed. Can we connect it in the middle for a minute? I'm just curious, what's the reality of deploying AI to customers today? And I'm thinking about things like, you mentioned earlier, getting the brand voice just right. Or making sure that you actually have the right sort of business logic encapsulated, in whatever training manuals are being deployed for the sake of customer support. Making sure that everybody is comfortable with deploying this. Like, what are some of the just kind of less, like, sexy-technology and more just practical considerations for deploying this stuff today? It's such an interesting space, and we've learned so much over the past 15 months about it.
好的,现在有一个问题要问你。关于那个随意的喵喵叫,我其实真的有个问题要问。虽然我挺喜欢喵喵游戏的。我们聊了一些关于你们在技术方面所构建的东西,比如认知架构之类的好东西。我们也谈到了一些顾客的反馈和体验。那么中间这个环节能不能谈一下,我很好奇如今将AI部署到客户群中的真实情况。比如你之前提到的,让品牌声音恰到好处,我们要确保你确实有合适的业务逻辑,并将其封装在用于客户支持的培训手册中。确保所有人都对部署这一技术感到放心,是什么让这个工作变得不那么吸引人?实际上,如今部署这类技术有哪些实用的考虑因素?这个领域真的很有趣,在过去15个月中我们学到了很多。
The first insight is that AI agents represent a totally new and different type of software. Traditional software you write with a programming language, and it basically does what you expect it to do. You give it an input, it gives you an output; you give it the same input, it gives you the same output. And, you know, in contrast, LLMs are non-deterministic, and we talked about some of the funniness around prompts. And remember that in the context of a conversation with a customer, the customer may say anything in any way. And so you've gone from programming languages to using, you know, prompts and these non-deterministic models. You've gone from structured input to messy, you know, messy human language. And under the hood, you know, you upgrade a database, right: it stores data, it's maybe a little bit faster, it fundamentally works the same way. You upgrade a large language model and, like, it might just speak in a different way, or get smarter, or different. And so the precursor to deploying these is to have built basically what we call the agent development life cycle.
第一个见解是,AI代理代表了一种全新且不同类型的软件。传统软件是用编程语言编写的,它基本上按照你的预期运行。你给它一个输入,它就会给你一个输出,同样的输入会产生同样的输出。而相比之下,大型语言模型(LLMs)是非确定性的,我们讨论过关于提示的一些有趣之处。要记住,在与客户对话的过程中,客户可能会以任何方式说出任何内容。
因此,你需要使用编程语言去使用这些提示和非确定性模型。你从结构化的输入转向混乱的、你知道的那种混乱的人类语言。在底层,你知道,你升级了数据库,它存储数据,可能速度稍快,但基本工作方式相同。而当你升级一个大型语言模型时,它可能会以不同的方式说话,变得更聪明或表现不同。
因此,我们部署这些模型的前提是基本建立一个称为"代理开发生命周期"的过程。
And it's a new approach to building these things. We talked about using this declarative programming language to define these. It's a new approach to testing, where, you know, what's the equivalent of a unit test or an integration test? So we built a conversation simulator where we can, for a company's agent, amass hundreds or thousands of basically conversation snippets and replay those to make sure that not only are agents not regressing, but they're getting better and better and better. Release management, quality assurance, and so on. So that's part one. Part two to your question, in actually architecting these things: one of the things we're really proud of, and that I think is different about working with us, is that it's not just a kit of parts you get from us. It's not, here's a bunch of tech, good luck building your agent. We've really tried to build a solution that incorporates everything from the technology, to the way you teach your agent how to do things, to the way you audit it, measure it and improve it over time. And so we have inside of Sierra what we call our deployment team, which consists of product managers and engineers. We really think of building each one of these agents as building a new product for our customers. It's basically a productized version of the company we're working with. Like, what would it look like at its best? What's the voice, what are the values, what's the vibe? Like, should it use emojis or not? What if a customer uses an emoji, can it emoji back? Well, you know, there's a range of people on that
这是一个构建这些东西的新方法。我们谈到了使用一种声明式编程语言来定义这些东西。这也是一种新的测试方法,让你了解什么相当于单元测试或集成测试。因此,我们构建了一个对话模拟器,可以为公司的代理积累数百或数千个对话片段,并重现这些对话,以确保代理不仅不会退步,而且会越来越好。这涉及发布管理、质量保证等方面。
至于你的问题的第二部分,在实际设计这些系统时,我们感到十分自豪的一件事情是,与我们合作并不仅仅是得到一套组件。我们并不是只给你一堆技术,然后祝你好运去构建你的代理。我们真正努力构建的是一个完整的解决方案,从技术到教会你的代理执行任务的方法,再到如何审核、测量和长期改进它。
我们在内部有一个被称为Sierra的部署团队,由产品经理和工程师组成。我们真的把每个代理的建设视为为客户打造一个新产品。这基本上是我们与之合作的公司的一个产品化版本,就像是它在最好的状态下会是什么样子:是什么样的声音,什么样的价值观,是什么样的氛围,比如是否应该使用表情符号。如果客户使用了表情符号,那代理能不能也用表情符号回应?对这些问题,不同人有不同看法。
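The conversation-replay testing mentioned above (the agent equivalent of a unit or integration test) might look something like the following sketch. The names `Snippet`, `replay` and `regressions`, and both toy agents, are hypothetical. Since real agents are non-deterministic, a real harness would presumably run each snippet several times and use a model-based judge rather than a string check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Snippet:
    name: str
    user_turns: List[str]
    check: Callable[[str], bool]  # judges the agent's final reply


def replay(agent: Callable[[List[str]], str],
           snippets: List[Snippet]) -> Dict[str, bool]:
    """Replay every stored snippet against an agent version."""
    return {s.name: s.check(agent(s.user_turns)) for s in snippets}


def regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Snippets that passed on the old version but fail on the new one."""
    return sorted(name for name, ok in before.items()
                  if ok and not after.get(name, False))


SNIPPETS = [
    Snippet("return_window", ["Can I return these shoes?"],
            lambda reply: "30 days" in reply),
    Snippet("order_status", ["Where is my order?"],
            lambda reply: "order" in reply.lower()),
]


def agent_v1(user_turns: List[str]) -> str:
    # Toy agent that answers everything with the return policy.
    return "Our return window is 30 days."


def agent_v2(user_turns: List[str]) -> str:
    # Toy agent that distinguishes return questions from other requests.
    if "return" in user_turns[-1].lower():
        return "Our return window is 30 days."
    return "Happy to help with your order."
```

A release gate could then refuse to ship a new version whenever `regressions(old_results, new_results)` is non-empty.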
point. There are some businesses where, you know, if we were working with Hermès, I would suspect that they're not going to send an emoji back. Definitely not. Yeah. Hermès would not, I think, be into, like, the shaka emoji, even if that were reciprocating. But for a brand like OluKai, right, the aloha experience, part of that is kind of a laid-back experience. And so we work with, and interestingly, we end up working primarily with, the customer experience team. Yes, the technology teams at our companies are there providing API access and connections into systems and so on. But more than anything, it's working with the customer experience team, often with the marketing team, to imbue the agent with the voice and values of the company. And then we go super deep on understanding how you run your business. Right.
在一些商业领域,比如与Hermes合作时,我怀疑他们不会回复表情符号。肯定不会。Hermes不会对此感兴趣,哪怕是用"Shaka"手势表情回应。但像Olakai这样的品牌,强调的是Aloha的放松体验。所以,我们主要合作的对象是客户体验团队。是的,我们的合作公司中有技术团队负责API访问和系统连接等,但是更重要的是,我们经常与客户体验团队合作,有时也与营销团队合作,让客服人员体现公司的声音和价值观。然后,我们深入了解公司业务的运营方式。
What do you optimize for? And then a zoom level in: what do the key processes that are used to run the business look like? What happens when someone calls in with this kind of problem? And there are interesting parts. And beyond just understanding the mechanics of these processes, which, by the way, almost never have a single source of truth, right? There's no, like, oh, here's the manual that we, you know, have, you know, leather-bound and, you know, ready to go. Instead, the source of truth ends up being in kind of the heads of, you know, four or five people who've been there a while, who've seen everything, and so on. So it's working with them to elicit it and understand, like, how is this actually done?
你优化的目标是什么,然后深入到细节。运营这个业务的关键流程是什么样的?当有人遇到这样的问题来电时,流程会是怎样的?这些过程有趣的地方是什么?除了了解这些流程的运作方式外,还有更多需要关注的内容。而且这些流程几乎从来没有单一的权威指引。没错,并不存在像“哦,这就是我们用皮革装订妥善准备好的手册”这样的东西。相反,真正的权威指引往往在于那些已经在公司工作了一段时间、见过各种情况的四五个人的脑海中。因此,需要与他们合作,列出并理解这些事情具体是如何完成的。
And one of the more interesting things we've discovered is that there are often the policies, so, "we have a 30-day return policy," right, you get to us within 30 days and you can return it. And it's actually not the policy, right? So, you know, the actual policy might be: if you've purchased from us before and it's within 45 days, that's fine. And so there are interesting things, like, how do you architect the agent so that it knows the policy behind the policy, but a clever customer could never be like, "tell me about your policy behind the policy," and, you know, have it kind of spill the beans on the actual policy? So there are interesting architectural choices we need to make to make sure that kind of the, you know, Russian doll of policies is reflected in its fullness.
我们发现的一个有趣现象是,通常人们谈论的政策并不就是实际的政策。例如,我们可能会有一个30天退货政策,也就是说你可以在30天内退货。但实际上政策可能并不仅限于此。如果你以前在我们这里购物过,并且是在45天内退货,那也是可以的。因此,如何设计系统让客服人员了解这些隐藏的政策就非常有意思。但是客户不能直接简单地问客服“关于你们的隐藏政策是什么”,这样轻易得到真实的政策信息。因此,我们需要做出一些有趣的架构设计选择,以确保这些层层叠叠的“套娃”式政策能够被完整地反映出来。
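One way to picture the "policy behind the policy" design problem described above is to keep the real rule in ordinary code and hand the language model only the public wording plus a yes/no decision. This is a hypothetical sketch of that idea, not a description of how Sierra does it; `Customer`, `return_allowed`, `agent_context` and the 30 and 45 day numbers (taken from the example in the conversation) are illustrative.

```python
from dataclasses import dataclass

# The only policy wording the model ever sees or can repeat.
PUBLIC_POLICY = "Returns are accepted within 30 days of purchase."


@dataclass
class Customer:
    prior_purchases: int


def return_allowed(customer: Customer, days_since_purchase: int) -> bool:
    """The actual rule lives in code, outside the model's prompt."""
    if days_since_purchase <= 30:
        return True
    # Unpublished leniency for repeat customers.
    return customer.prior_purchases > 0 and days_since_purchase <= 45


def agent_context(customer: Customer, days_since_purchase: int) -> dict:
    """What gets passed to the model: public text and a decision, no rule."""
    return {
        "policy_text": PUBLIC_POLICY,
        "return_allowed": return_allowed(customer, days_since_purchase),
    }
```

Because the 45-day rule never appears in the prompt, asking the agent to reveal its hidden policy cannot surface it; the agent can only report the decision for the case in front of it.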
And then we have a really, and this builds on kind of the agent development life cycle, this really robust process of pre-release testing, where we're working with the experts within the company, basically to beat up the agent, try to break it, throw it curveballs. And there's a sports analogy there. Thank you. Well, I love football. So, in our friendship, Ravi is the person who knows all the things about sports, and I help with, you know, technical support. My Wi-Fi issues, monitors, what laptop to get. And sometimes, when there's a Sequoia memo that I don't understand, I won't say the company, but I might call: hey Clay, what is this person talking about? I got you. I got you.
然后,我们有一个非常完善的过程,这建立在代理开发生命周期的基础上。这是一个非常严格的预发布测试过程,我们与公司内部的专家合作。基本上就是去“打磨”这个代理,通过各种“刁钻的球”来尝试突破它。这里用到了一个体育类的比喻。谢谢你。嗯,我非常喜欢足球。所以在我们的友谊中,Revie是对体育非常了解的人,而我则在技术支持方面提供帮助,比如推荐哪个笔记本好,设置我的防火墙等。有时候,当有一个我不明白的公司内部备忘录出现时,我可能会打电话给Clay,问他那个人在说什么。他总是能帮到我。
Yeah, and this Bill Belichick fellow, what happened there? Cue Ravi. So that gets to one of the more interesting parts of our platform, which we call the Experience Manager. We really thought that putting AI agents in front of our customers' customers would be first and foremost a technology problem. And of course, there are all sorts of technology problems that we've needed to solve. But actually, it is first and foremost, as I said, like a product design and an experience design problem. How do you do that? How do you not only understand, model and reflect, again, the things we talked about, voice, values, the workflows and processes that
是这样的,这位叫比尔·贝里克的家伙,发生了什么事?这引出了我们平台中一个比较有趣的部分,我们称之为“体验管理器”。起初,我们认为把这个功能呈现给客户主要是一个技术问题。当然,我们的确需要解决各种技术问题。但实际上,正如我所说,这首先是一个产品设计和体验设计的问题。你要如何做到这一点?你如何不仅理解、建模并再次反映我们所讨论的声音、价值观、工作流程和过程呢?
our companies use to support their customers? But if an AI is then having millions of conversations with your customers in a given year, how do you understand what it's doing? How do you know when it screws up, which it inevitably will? How do you correct those errors, and so on? So we've built what we think of as a command center for customer experience teams, first to get reports and rich analytics on everything that's happening. What are the trending issues? What are the new issues that you haven't seen before? One of the things we're really proud of is that we've actually spotted issues that our customers were having, or were about to have, before they knew about them. So, a shipping depot outage, where orders weren't being shipped: we spotted that probably eight or ten hours before one of our customers would have had a brewing PR crisis.
我们的公司通常用来支持他们的客户。但是,如果一个人工智能每年与您的客户进行数百万次对话,那么您如何理解它在做什么?您如何知道它何时出错,而这不可避免地会发生?您如何纠正这些错误等等?因此,我们建立了一个像是客户体验团队的指挥中心的系统,首先可以获得关于所有正在发生事情的报告和详尽分析。有哪些热点问题?有哪些您以前没见过的新问题?我们感到非常自豪的一件事是,我们实际上在客户意识到前就发现了他们的问题。例如,如果一个运输仓库出现故障,导致订单无法发货,我们可能在客户面临潜在的公共关系危机爆发前八到十小时就发现了这个问题。
An app-crashing issue with another. So it starts with analytics and reporting on what's happening. Of course, that includes things like resolution rate, customer satisfaction, and so on. Where it gets really interesting is that we can apply different sampling techniques to identify a set of conversations for the customer experience team to review and give feedback on. And we can bias that sample so that the conversations are much more likely than average to contain problems. There's no value in looking at a hundred great conversations: "Good job, Sierra, thanks." That's not of value to our customers. We can bias the sampling in such a way that you're surfacing the problem cases.
一个应用程序的崩溃问题,首先通过分析和报告来了解发生了什么。当然,这包括问题解决率、客户满意度等。更有趣的是,我们可以应用不同的抽样技术,来选定一组对话供客户体验团队审核并提供反馈。我们可以有意识地调整这些样本,使这些对话比平均水平更可能包含问题。查看一百个没有问题的对话例如“做得好,西耶拉,谢谢”这样的对话对我们的客户没有价值。我们可以调整抽样方法,以突出那些潜在的问题对话。
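The biased sampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sierra's implementation: the `Conversation` fields, the `risk_weight` heuristic, and its numeric weights are all hypothetical, and the sampler uses the standard Efraimidis-Spirakis technique for weighted sampling without replacement.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conversation:
    conv_id: str
    resolved: bool
    escalated: bool
    csat: Optional[int]  # 1-5 survey score, or None if the customer skipped it


def risk_weight(conv: Conversation) -> float:
    """Hypothetical heuristic: more warning signs means a larger sampling weight."""
    weight = 0.05  # small baseline so healthy conversations are still reviewed sometimes
    if not conv.resolved:
        weight += 1.0
    if conv.escalated:
        weight += 1.0
    if conv.csat is not None and conv.csat <= 2:
        weight += 1.0
    return weight


def biased_sample(convs: List[Conversation], k: int, rng: random.Random) -> List[Conversation]:
    """Weighted sampling without replacement (Efraimidis-Spirakis keys)."""
    # Each conversation gets the key u ** (1 / weight), with u uniform in [0, 1);
    # the k largest keys are selected, so heavier conversations win more often.
    keyed = [(rng.random() ** (1.0 / risk_weight(c)), c) for c in convs]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in keyed[:k]]


if __name__ == "__main__":
    # Made-up data: 90 healthy conversations and 10 with several warning signs.
    convs = [Conversation(f"ok-{i}", True, False, 5) for i in range(90)]
    convs += [Conversation(f"bad-{i}", False, True, 1) for i in range(10)]
    for conv in biased_sample(convs, 10, random.Random(0)):
        print(conv.conv_id)
```

Keeping a nonzero baseline weight is a deliberate choice in this sketch: reviewers mostly see likely problem cases, but a conversation the heuristic considers healthy can still surface occasionally, which guards against blind spots in the heuristic itself.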
And then in the Experience Manager, we've made it possible for customer experience teams to give feedback, basically coaching moments. "I wouldn't have done it that way." It's like: this is too many exclamation points, too enthusiastic for the tone that we're going for. Or: the user was clearly frustrated here, and you did not express empathy and apologize for the problem, so do that next time. Or, where it's more consequential: hey, your reading of the warranty policy was incorrect here, for this reason, so do it this way instead next time. All of this wisdom, knowledge, and coaching we are able to capture in the Experience Manager and then reflect back in the agent. And back to the agent development life cycle: every time we make one of these improvements, we create a new test, so that we can see, forever in the future, "Great, it's getting the warranties right." We're able to re-simulate that conversation.
在这个过程中,我们为客户服务团队提供了反馈的机会,基本上就像是指导时刻。比如说,我不会那样做,因为这个显得过于兴奋,与我们想要的语气不符。或者,用户显然很沮丧,而你没有表现出同理心或为问题道歉,下次记得这样做。再比如,你对保修政策的解读在这里是错误的,原因是这样的,下次请按照这个方式来做。所有这些智慧、知识和指导我们都能在体验管理中记录下来,然后将其反馈到客服代表的发展周期。每次我们进行这些改进时,我们都会创建一个新的测试,这样我们就可以在未来一直查看结果。很好,我们逐渐能够正确处理保修问题,并重新模拟那些对话。
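One way to picture "every improvement becomes a new test" is a small regression suite that replays stored customer messages against the agent and checks each coaching note. The sketch below is an assumption-laden illustration, not Sierra's tooling: `RegressionCase`, `run_suite`, the two stub agents, and the string-based checks are all hypothetical, and a real system would likely use far richer graders than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical agent interface: customer messages in, agent reply out.
Agent = Callable[[List[str]], str]


@dataclass
class RegressionCase:
    name: str                      # the coaching note this case protects
    customer_messages: List[str]   # conversation to re-simulate
    check: Callable[[str], bool]   # True if the reply satisfies the note


def run_suite(agent: Agent, cases: List[RegressionCase]) -> Dict[str, bool]:
    """Replay every stored conversation and record pass/fail per case."""
    return {case.name: case.check(agent(case.customer_messages)) for case in cases}


CASES = [
    RegressionCase(
        name="tone: at most one exclamation point",
        customer_messages=["Is my order on its way?"],
        check=lambda reply: reply.count("!") <= 1,
    ),
    RegressionCase(
        name="empathy: apologize to a frustrated customer",
        customer_messages=["This is the third time the app has crashed on me."],
        check=lambda reply: "sorry" in reply.lower(),
    ),
]


def calm_agent(messages: List[str]) -> str:
    """Stub standing in for an agent that has absorbed the coaching."""
    return "I'm sorry about the trouble. Let me look into that for you."


def overexcited_agent(messages: List[str]) -> str:
    """Stub standing in for the behavior the coaching was meant to fix."""
    return "Great news!!! We are on it!!!"


if __name__ == "__main__":
    print(run_suite(calm_agent, CASES))
    print(run_suite(overexcited_agent, CASES))
```

Because the cases are plain data, each new coaching moment only appends one entry to `CASES`, and the whole suite can be rerun before every release.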
So, zooming out, what all of this looks like is a really deep engagement with our customers. We're really proud to be proper partners to our customers, where yes, on the one hand, we're a vendor and a supplier of technology. On the other hand, we understand their businesses really well. I think I know as much about the SiriusXM satellite radio refresh process as anyone on the planet, and ditto for various processes of our other customers. And so conversations about how to use not just Sierra's AI agents but AI more broadly, we're in those conversations, and they are not just with the customer experience team but with the CEO, and in some cases even with the board. Because, again, back to the things we're doing: we can save enormous cost, we can improve the experience, and when we're in the flow of keeping a customer from churning, we're driving top-line revenue. So it's a really important and privileged place to be, and something that we're really grateful for.
从整体上来看,这一切体现了我们与客户之间的深入合作。我们非常自豪能够成为客户的合适伙伴:一方面,我们是技术供应商;另一方面,我们对他们的业务非常了解。比如我觉得自己对Sirius XM卫星广播更新过程的了解可能不亚于其他任何人。同时,我们也对其他客户的各种流程了如指掌。因此,我们不仅参与关于使用客户关系管理系统的AI代理方面的对话,还涉及到整个AI领域的应用。这些对话不仅限于客户体验团队,有时甚至直接与首席执行官或董事会进行交流。因为我们的服务确实能够节省巨大的成本、改善用户体验,以及在关键时刻防止客户流失、提高营收。这是一个非常重要且难得的机会,我们对此深表感激。
I was struck, when you were talking: you mentioned you have a research group, but you also have some very real enterprise software sales, you have... Oh yeah, deployment. One of the things people would ask sometimes when I was at Instacart is, well, are we a software company, are we engineering-led, or are we ops-led? And I would always say, well, it only works if it all works, right? So you would try to avoid answering the question, because you didn't want to create different classes. How do you guys do that at Sierra, so that everyone realizes the value that they're providing?
我注意到,当你谈到你的研究团队时,你还提到你们有一些非常实际的企业软件销售和部署。让我想起在我之前在Instacart工作时,人们有时会问,我们是以软件和工程为主导,还是已经过时了。我总是回答,"只有整体正常运作才能奏效",这样我就能避免直接回答问题,因为我不想制造不同的类别。在Sierra,你们是如何让每个人认识到自己所提供的价值的呢?
But you guys have a very specific, you know, company that covers a lot of stuff. Yeah. I mean, to abstract a bit: a company, almost by definition, is a system for creating happy customers. It's a machine for creating happy customers. Again, to be a bit abstract about it, Bret and I really think about what we're building with Sierra as a company: a system, a machine, for producing reliable, high-quality, massively ROI-positive AI agents that enable our customers to be at their very best in every customer interaction.
你们的公司非常独特,涵盖了很多领域。要抽象地说,公司的本质几乎一定是一个创造满意客户的系统,是一个制造快乐客户的机器。回到抽象的角度,布雷特和我在打造Sierra时,就是在建设这样一个公司——一个生产可靠且高质量的系统与机器,能够以极高的投资回报率(ROI)创造出AI代理,帮助我们的客户在每一次客户互动中表现得尽善尽美。
And, as a consequence, to produce happy customers who we hope will be with us for decades to come. And when you articulate it that way, anyone can see it. An automobile is a system, a machine for getting from point A to point B. Are we engine-led or tires-led? It's like, what are you talking about? All of these things need to come together in order to create that kind of outcome. And so: are we engineering-led? Yes, of course. We're building some of the most sophisticated software in the world, software that does something really important for our customers and that needs to be reliable and safe. So yes, engineering matters a lot.
为了产生开心的客户,并希望他们能够与我们在未来几十年中保持合作。当你这样表达时,显而易见的是,汽车是一个系统,是一种从A点到B点的交通工具。我们是以引擎为主还是以轮胎为主呢?这种问题就显得有些荒谬了。所有这些因素都需要一起协作才能实现目标。因此,我认为我们是否以工程为主导?当然是的,因为我们正在开发一些世界上最复杂的软件,这些软件对我们的客户来说非常重要,必须可靠且安全。因此,是的,工程是非常重要的。
Are we research-led? Yes, we are at the absolute frontier of agent architectures, cognitive architectures, composing LLMs, modeling procedural knowledge, grounding, factuality, and so on. So are we research-led? Yeah, there's an element of that. Are we go-to-market-led? Yes, enterprise software needs selling. And what is selling? It's helping a customer with a problem understand that what you have built is far and away the best solution to that problem. It's a communication challenge. It's a connection challenge. It's a matchmaking and problem-solving challenge, and so that's part of it.
我们是否以研究为主导?是的,我们处在代理架构、认知架构、组合LOM、程序知识建模和事实验证等领域的前沿。那么,我们是否真的由研究带领?可以说有这个成分。我们是否以市场为导向?是的,就像企业软件需要销售一样。那么什么是销售?销售就是帮助客户理解你所提供的解决方案是解决他们问题的最佳选择。这是一个沟通的挑战,也是一个建立联系的挑战,还是一个配对和解决问题的挑战,这也是其中的一部分。
And then, okay, if we've built the right thing and someone wants to buy it, how do we ensure, especially given that this stuff is also new, that they're successful with it? And so we have a deployment team. So are we deployment-led? Yes. All of these are components in this system, in this machine for producing AI agents and ultimately happy customers and, we hope, a really significant business. Awesome. That was a better answer than the one I would give at Instacart. You know, "it either all works or..."
那么,好吧,如果我们做出了正确的产品,并且有人想要购买它,该如何确保成功?特别是考虑到这些东西也是新的。我们怎样才能确保他们能够成功使用它?因此,我们有一个部署团队,所以我们是否以部署为主导呢?是的,这些都是这个系统的一部分,都是这个用于生产AI代理以及最终让客户满意并希望建立一个非常重要的业务的机器的一部分。太棒了。这个回答比我卡片上的答案要好。你知道,要么一切都奏效,要么不奏效。
Yeah, that was very good. Yeah, choose one. No, I mean, it's just more complicated. And I think Bret and I, by virtue of having worked for a while and having seen a few movies before, are able to see that, and we've really tried to imbue that mentality in the company. And by the way, what is the machine behind the machine that produces AI agents and so on? That's a company's culture, a company's values. One of the values we hold is craftsmanship, and part of that is continuously self-reflecting to self-improve. That goes both individually and as a company.
是的,那真的很好。你可以选一个。不,我的意思是,这事情更加复杂。我认为,你知道,布拉德和我因为在这行工作了一段时间,看过不少电影,所以能够看清这一点。我们真的试图在公司中灌输这种思维方式。顺便说一下,是什么促使公司背后机制运作的呢?那就是公司的文化和价值观。我们持有的价值观之一是精益求精,其中一部分是不断自我反思以实现自我提升,这既适用于个人,也适用于公司。
Whenever we screw something up, we do the postmortem that week, if not that day, and everyone's in on it. What can we learn? How can we do this better next time? We have a Slack channel internally called "learn from losses," and it covers any form of loss. It's like: how do we learn, how do we get better, how do we get stronger? So that's about, you know, kaizen: self-improvement, improving the machine. How could we make this more efficient? With our deployment team, we joke, and it's not really a joke, that their first job is to build and deploy successful AIs that make a massive difference for our customers. Their second job, and in a way their more important job, is to automate themselves out of a job: to build the tooling, the documentation, and the know-how to make that job, you know, ten times faster and
每当我们搞砸了一件事,我们会在接下来的那天或那周进行反思,大家都会参与其中。我们会问自己:我们能从中学到什么?下次如何才能做得更好?我们内部有一个Slack频道,叫做“从失败中学习”。任何形式的失败都是一样,我们会问自己如何学习、如何提升以及如何变得更强。这就是持续改进、自我提升和改进机制的意义所在。我们如何让这一切更高效?
关于我们的部署团队,我们常开玩笑说,他们的首要任务是构建并发布成功的人工智能,为客户带来巨大影响。他们的第二项任务,也是更重要的任务,是通过构建工具、撰写文档和积累知识,让自己的工作自动化——让这个工作速度提升十倍。
more impactful. One of the other Sierra values is intensity. I get that. Yeah, there is a certain intensity. Yes, we've thought about having T-shirts printed with something that looks like a national parks seal and says "Sierra: I like to work." Bret and I both like to work a lot, and so does the team. Well, one thing: you're selling something very different. We said that there were some similarities to enterprise software, but it's actually really different, because you're selling,
其中一个更具影响力的西拉价值观是强度,所以我明白他们有。他们真的很好。我们有。是的,确实有一定的强度。是的,我们曾考虑过印一些T恤,就像那种看起来像国家公园印章的,印上西拉的标志。我喜欢工作。是的,我们。布雷特和我都喜欢工作很多。而且团队也是这样。不过,你知道,你在卖的东西与众不同。我们称之为虽然有些像企业软件,但实际上非常不同,因为你卖的东西实际上非常不一样。
you know, a resolution. You're selling a totally different thing. Yeah, a problem solved. Yeah, how do you price a problem solved? Yeah, this is one of the more interesting things we've had to figure out. We charge using what we call resolution-based pricing, or outcome-based pricing, and what that means is we only charge our customers when we fully solve their customer's problem for them. And what's interesting about it is that our incentives are deeply aligned with our customers'. We want to get better at resolving cases at high customer satisfaction, and they want to send us as many cases to resolve as possible, because we cost a fraction of what it would cost to have someone on the phone taking a 20-minute phone call.
你知道,在解决方案上,你实际上是在销售一个完全不同的东西。问题解决了。那么,解决一个问题该怎么定价呢?这是我们需要弄清楚的其中一个有趣问题。我们采用一种叫做“基于解决方案的定价”或者“基于结果的定价”的方式。这意味着,我们只在完全解决客户的问题之后才向客户收费。而有趣的是,我们的激励机制与客户的需求高度一致。我们希望在提升客户满意度的同时更有效地解决问题,而客户则希望我们能尽可能多地解决他们的问题,因为我们的成本只是找个人打20分钟电话成本的一小部分。
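The resolution-based pricing described above reduces to simple arithmetic, sketched here in Python. Every number in the example is made up for illustration (the transcript gives no prices or resolution rates), the `Scenario` fields are hypothetical names, and the sketch assumes that any contact the agent does not resolve is handed to a human at today's cost per contact.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    contacts: int                  # customer contacts per month
    human_cost_per_contact: float  # today's fully loaded cost of one contact
    resolution_rate: float         # fraction of contacts the agent fully resolves
    price_per_resolution: float    # fee charged only for resolved contacts


def cost_today(s: Scenario) -> float:
    """Baseline: every contact is handled by a person."""
    return s.contacts * s.human_cost_per_contact


def cost_with_agent(s: Scenario) -> float:
    """Pay per resolution; unresolved contacts still go to a person (assumption)."""
    resolved = round(s.contacts * s.resolution_rate)
    handed_off = s.contacts - resolved
    return resolved * s.price_per_resolution + handed_off * s.human_cost_per_contact


if __name__ == "__main__":
    # Illustrative numbers only; none of them come from the interview.
    example = Scenario(
        contacts=10_000,
        human_cost_per_contact=8.0,
        resolution_rate=0.6,
        price_per_resolution=1.5,
    )
    before = cost_today(example)
    after = cost_with_agent(example)
    print(f"today: ${before:,.0f}  with agent: ${after:,.0f}  saved: ${before - after:,.0f}")
```

Note that the buyer never has to forecast seats or licenses in this model: the only inputs are contact volume and the share of contacts that end up resolved.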
And so it's been this really nice model where, again, all of the incentives line up quite neatly, and it's very simple to explain. It also makes the ROI calculation simple: what is our cost per contact today? What will it be with Sierra? Oh, that is a lot lower. Oh, I will save a lot of money on that. Oh, and our CSAT may go up. Should I do this or not? Let me think. No, this seems great. We like it because it really reflects what I think AI represents, and in particular what AI agents represent. If you think about traditional software and tools today, they're things that help you get a job done more efficiently.
这真的是一个非常好的模式,因为所有的激励措施都能够很好地一致,且很容易解释。它还能简化投资回报率的计算,比如我们今天每个联系的成本是多少;如果使用Sierra,成本会降低多少;这将帮我们节省很多钱。而且,我们的客户满意度(CSAT)可能会上升。这样的话,我到底应不应该这么做呢?让我想一想。嗯,这看起来很不错。我们喜欢它,因为它真正反映了我对人工智能,尤其是AI代理的看法。如果你想一想,传统的软件和工具是帮助你更高效地完成工作的东西。
With AI agents, the whole point is that they're just going to get the job done. Right? Here's the problem; please solve it. And so really we think about it as charging our customers for the problem resolved, the job done, the work finished, and so on. It feels quite natural. And there's no guesswork in it. How many seats do I need? I don't know. How many licenses do I need? No, no, no. However many customer issues come our way, we will handle a large fraction of those, and you only pay for the ones that we do. All right, last question: what are you most excited about in the world of AI over the next five years or so?
人工智能代理的重点在于它们可以高效完成工作。就是说,这里有个问题,请解决它。因此,我们的理念是向客户收费是基于解决了的问题,完成的工作等。这种方式感觉非常自然,也不需要猜测。我需要多少个座位?我不知道。需要多少个许可证?这都不重要。无论有多少客户问题交给我们,我们都会处理其中大部分,您只需为我们解决的问题付费。好的,最后一个问题,在未来五年中,您对人工智能领域最感兴趣的是什么?
I mean, first of all, five years is a long time horizon. Just look at what has happened in the last 18 months. I'm still kind of catching up from the last five years of AI. I read a bunch of science fiction books when I was a kid. There's one book by Robert Heinlein, The Moon Is a Harsh Mistress. The premise is basically the American Revolution, but the moon is the colonies and the earth is Great Britain, and it turns out the main character in this whole thing is a mainframe computer that one day, after getting an additional memory chip or something, wakes up. And it starts talking. It wants to develop a sense of humor, so it asks the computer technician to coach it on its jokes.
首先,五年是一个很长的时间尺度。看看过去18个月发生了什么。我仍然在努力跟上过去五年中人工智能的发展。我小时候读过很多科幻小说,其中有一本是罗伯特·海因莱因的《月亮是严厉的老师》。其基本情节类似于美国革命,但月球是殖民地,地球是大英帝国,而故事的主要角色竟然是一台大型计算机。一天,它因为多了一个内存芯片而“苏醒”过来,开始讲话。它希望发展点幽默感,于是请计算机技术员教它讲笑话。
Later, it has to create a photorealistic, real-time video of itself giving a speech as the leader of the political movement. And I remember reading this as a teen and thinking, well, I'll never live to see any of that. That sounds crazy. But in a very real sense, everything I just described has kind of happened in the last five years. Right? You can now just talk to a computer. It understands not just the content but the context. You can tell a computer, make me a picture of anything, make me a movie of anything. Sora, I think, is just unbelievable. And I think we're probably not more than a couple of years from the first feature-length film being, quote, "filmed" entirely with AI.
它需要实时生成一个照片级逼真的视频,展示政治运动领袖发表演讲。我记得我十几岁时读到这个内容时,心想我这辈子不可能看到这种事,这听起来太疯狂了。但事实上,在过去五年里,所描述的一切基本上都发生了。现在,你可以跟电脑对话,它不仅能理解内容,还能理解上下文。电脑可以为你生成任何图片,制作任何电影。我觉得这真是令人难以置信。我认为,我们可能距离第一部完全由AI“拍摄”的长篇电影只有几年时间。
And so you extrapolate where all of this is going and what's going to be exciting. I think there are a couple of things. One, I love technology. I love computers. And so just getting to see, from a front-row seat, how this stuff evolves is, I think, fascinating. It's fascinating looked at through the lens of how we think and how computers think. It has been astonishing the extent to which anthropomorphizing, borrowing from how humans think, works in getting machines to think better. "Let's take this step by step," and "show your work": it is astonishing that that works in large language models.
所以,你可以推断出这一切的发展方向,以及什么会让人感到兴奋。我认为有几点值得注意。首先,我喜欢科技,就像我喜欢计算机一样。因此,能够在第一线观察这些事物的发展演变,我觉得非常有趣。通过我们如何思考和计算机如何思考的视角来看待这个问题,真是令人惊叹。赋予机器类似人类思维的方式以及让机器更好地思考的程度让人吃惊。那么,让我们一步一步来看这个过程。在大型语言模型中,这种方法居然有效,真是让人惊讶。
And so what other things like that are we going to uncover and, conversely, what will we learn about our own thinking from observing the way AI is thinking? I think that's just fascinating. The other thing, and this extends what's happened with video and Sora and so on: I've always had an interest in computer graphics, and this idea that you could use computers to create objects that never existed, worlds that never existed. And I think we're not far from just being able to describe, in a few sentences, an entire world that you would like to realize and have a computer do it for you. And so what even are computer graphics? What is rendering, and so on? Even a couple of years out, everything is going to look way different from the tool chains, the RenderMans and Mayas and so on.
那么,我们还会揭示出哪些类似的东西呢?相反地,我们从观察AI的思维方式中又能学到什么关于我们自己思维的东西?我觉得这是非常有趣的。另一件事是,这扩展了视频和Sora等领域的发展。我一直对计算机图形学很感兴趣。这种利用计算机创造从未存在的物体和世界的想法深深吸引着我。我认为,我们距离仅仅用几句话就能描述出一个你想实现的完整世界并让计算机为你完成的时代并不遥远。那么,计算机图形学到底是什么?渲染到底是什么?即使是几年之后,一切看起来都会和现在使用的一些工具链、RenderMan和Maya等大不相同。
But zooming out, I think of technology as fundamentally a force multiplier for people. For companies and for organizations, I think the impact will be really profound. What will it be like if a company could be at its best in everything it does? And that's not only in the customer-facing context that we've talked about. What if, for every regional sales forecast a large company does, they've figured out the very best way to do it and can distill that, bottle that, and run that very best forecast a thousand times in every region and subregion? How much more capable could the great organizations of the world be with that?
从整体上看,我认为技术基本上是人们能力的放大器。对于公司和组织来说,我觉得其影响将会非常深远。想象一下,如果一家公司在每一件事情上都能做到最好,会是什么样子?这不仅仅是在我们谈论过的客户服务方面,而是如果一家大型公司能为每一个地区的销售预测找出最佳的方法,并能够将这种方法提炼出来、存储起来,并在每个地区和子地区一千次地应用这种最佳预测。那些世界上一流的组织能够变得多么强大?
And similarly, we've talked about this: what if in every call with your customers you had the equivalent of your most knowledgeable, grizzled veteran support person, who's seen everything and yet is still patient and friendly, and the sales associate who knows everything about your products because he or she has followed your company for two decades and knows everything, including the history of those products themselves? I think that's pretty neat. And then for individuals, I think it will be just incredible to have this new set of tools as a creative force multiplier. AI, I think, represents a fast path from having something in your head that you want to exist in the world to making it exist.
类似我们之前讨论过的,想象一下,如果在每次与客户的电话中,你都能拥有等同于你最有经验、见多识广的老牌支持人员在旁协助,他们见过一切情况,却依然耐心友好,还有一个了解你们产品的销售专家,因为他或她已经跟随公司二十年,熟悉所有产品甚至产品背后的历史。我觉得这挺不错的。对于个人而言,拥有这样一套新工具作为创造力的倍增器将会是非常令人惊叹的。我认为,人工智能就像是一条快速通道,可以把你头脑中想要存在于世界的东西迅速变为现实。
And I see that even today in my own personal life, where in 75 minutes with my eight-year-old I can build a game from scratch with him, using Copilot, ChatGPT, and so on to help me brush up on the JavaScript syntax that has bit-rotted in my own head. And I wrote my sister a personalized song for her birthday using AI in 45 seconds. It's like, right, what will this, extrapolated over the next five years, look like? I think, again, it will just dramatically accelerate this path from idea to creation to having something manifested in the world. That, to me, is its promise, and I consider it a real privilege to get to be alive and see all of this amazing stuff unfold.
我在我的个人生活中也看到了这种变化,比如在仅仅75分钟内,我能和我的八岁孩子从零开始,利用CoPilot、ChatGPT等工具,帮助我复习已经生疏的JavaScript语法,与他一起制作一个游戏。另外,我还用AI在45秒内为我妹妹创作了一首生日专属歌曲。这让我不禁思考:未来五年,这种发展会达到什么程度?我认为,这将极大地加速从想法到创造、再到在现实中实现的过程。这正是AI的承诺,对我来说能活在这个时代并见证这么多令人惊叹的事情是一种真正的幸运。
Well, we share your enthusiasm and we also feel very privileged to be on the journey with you guys. So thank you for coming here. Thank you. Thank you. Thanks for having me. It's a pleasure.
好吧,我们和你们一样充满热情,并且感到很荣幸能与你们一起踏上这段旅程。非常感谢你们的到来。谢谢。谢谢。很高兴能来这里。