An unfiltered conversation with Alex Atallah, CEO of OpenRouter

Published 2025-02-17 18:44:57    Source

Transcript

This was the first local model I ever used where I thought, wow, clearly we can actually get within shooting distance of the closed models. And there is a way for indie developers to look like a company spending $100 million training a model, but with way less money and with just one computer. You know, you're not really showing what the best model is, but more so what model has the most traction. And how representative do you think some of those numbers are? We've experimented with a couple of different complicated ways of ranking models. So we figured to start with, we would just do raw tokens in and tokens out, and that has a big con.

Alex, do you want to take us back? I'm curious what the inspiration was to start OpenRouter, what the conviction was, what was happening at the moment when you decided to create the project? Yeah, so it started back in February or March, when I was thinking about... LLaMA had just come out, and this was the first time we were getting exposure to local models. This was LLaMA 1. And it was surprising how useful it was. It wasn't really a great chatbot, and it wasn't reliable for knowledge retrieval, but there was something there that was worth exploring, and I didn't quite know what it was.

So the first project I started was a Chrome extension, very much following the pattern of taking my current paradigm and seeing what about AI could benefit from it. My current paradigm was crypto, because I was wrapping up my time at OpenSea, where I was the co-founder and CTO. I'm still on the board, but I wanted to look for something zero-to-one again, try to explore the AI space, and figure out what sort of interesting intersections would appear. So I worked on a project similar to a Web3 wallet, but for bringing language models to the browser.

And it was a standard for front-end developers to just build API- and model-agnostic apps. That was the experiment. And what became clear is that there weren't that many closed models at the time. It was really just GPT-3 (GPT-4 had just come out) and Cohere. And there were a few open-source models that were really tough to use. Then LLaMA came out, and after LLaMA, a research group took LLaMA, generated a bunch of synthetic data, fine-tuned LLaMA on that data, and spent about $600 in the whole process. This was a group at Stanford, and they created Alpaca.

And this was the first local model I ever used that made me think, wow, this is huge. Clearly we can actually get within shooting distance of the closed models. And there is a way for indie developers to look like a company spending $100 million training a model, but with way less money and with just one computer. I thought that was huge, and it meant that maybe we'd have a world with thousands or tens of thousands or hundreds of thousands of models. Another takeaway from this is that the data becomes more of the moat. Data becomes a more critical part of what makes a new model unique and useful, in ways where a closed or centralized model wouldn't be as good.

Even if a new model is just not as smart: take a person who's not as smart as somebody else, but that one person just knows a whole bunch of stuff. They have all of these secrets locked in them that they've been exposed to; they know how something works. The smart person isn't going to be able to just figure it out, because it's knowledge they're missing. And that allows people to take data that there's no way of selling directly and find a way to monetize it. So it creates a new economy. And I thought, well, then we might need a marketplace for models.

The closest thing at the time was Hugging Face, but you couldn't really use the models. Inference was tricky and often left up to the developer. And it was very hard to find the models, to figure out which ones would be good at different things. So that's why I started OpenRouter. Actually, one of the guys building one of the top browser extension frameworks, called Plasmo, which I used to build Window AI, the Chrome extension for local models, ended up joining as co-founder, and we started OpenRouter together. I love that story. And for folks who haven't used OpenRouter, what's the basic pitch? Why, as a developer, do I want to use OpenRouter? And who uses it: does it tend to skew more toward the indie-developer persona today, or do you also have Fortune 500 companies, or larger companies generally, building on OpenRouter?

So today it skews indie developer, for sure. And there are a couple of companies that use OpenRouter to do benchmarking, where they run a whole bunch of tasks against new models as they emerge; OpenRouter is the easiest API for doing that. The primary indie-developer use cases that we see are people building B2C apps. The top ones are either games, or role-playing apps, or novel-writing assistants. Probably the very top one involves mostly generating code, with the code rendered into polished experiences. And then we have some that just help users generate code, more like programming assistants. So it's really a big mix of lots of stuff. Developers can opt in to sharing their data with us, and we give them a discount if they do. We use that data to classify incoming prompts and figure out which models are good at what. And we're working on a router, and on some ideas for creating a router to help people find the best model. We're shipping new experiments all the time.

But the primary use people have for OpenRouter is that they know what model they want to use; they just don't know where to get it from. Or they want to explore models and figure out which model is really good at finance, or really good at roleplay, or really good at programming, or really good at machine translation. And we just help them find the models that people are really using a lot for those cases. It's not really an eval. It's more of an App Annie-style engagement metric that we try to show for every model. I have so many random follow-up questions to this, but I'll just ask one of them right now, which is based on that metric: you're not really showing what the best model is, but more so what model has the most traction across a specific domain.

Similar to that, I was looking through some of the leaderboards before we jumped on this call to see where some of the models were. How representative do you think some of those numbers are of actual production usage, or maybe more broadly of developer mindshare? For folks who haven't seen the leaderboards, they should go look, because it's interesting to see what's showing up at the top, and right now showing up at the top are some of the Claude models. Obviously we've seen more people using Claude models because of the recent releases, but do you think that's actually representative of mindshare, or of some other metric or proxy?

Yeah, it's a good question. We've experimented with a couple of different complicated ways of ranking models, and we're zeroing in on one that'll come in the future. But we figured to start with, we would just do raw tokens in and tokens out, and that has a big con. The big con, the disadvantage, is that one developer who's just going bananas can skew the data for some model. They do have to pay a lot to skew that data, and I'm not aware of anyone trying to game a model or game the rankings, but it's not ideal. I think a closer ideal would be something like retention: of all the people who test a model, how often are they coming back with that same kind of prompt? That's what we look for in analyzing websites. You don't just look at raw traffic; you do sometimes, it's the simple metric, but what really matters is whether people are sticking and coming back. So we're going to experiment with some of those other ways of ranking models soon.
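The two ranking approaches can be sketched roughly like this; the log schema, field names, and the "more than one active day" retention threshold are hypothetical illustrations, not OpenRouter's actual pipeline:

```python
from collections import defaultdict

def rank_by_tokens(logs):
    """Raw ranking: total tokens per model. Easy to compute, easy to skew."""
    totals = defaultdict(int)
    for _user, model, _day, tokens in logs:
        totals[model] += tokens
    return sorted(totals, key=totals.get, reverse=True)

def rank_by_retention(logs):
    """Retention-style ranking: fraction of a model's users active on more than one day."""
    active_days = defaultdict(set)   # (user, model) -> days seen
    users = defaultdict(set)         # model -> users who tried it
    for user, model, day, _tokens in logs:
        active_days[(user, model)].add(day)
        users[model].add(user)
    retention = {
        m: sum(len(active_days[(u, m)]) > 1 for u in us) / len(us)
        for m, us in users.items()
    }
    return sorted(retention, key=retention.get, reverse=True)

logs = [
    ("alice", "model-a", 1, 500), ("alice", "model-a", 2, 400),
    ("bob",   "model-a", 1, 300),
    ("carol", "model-b", 1, 900_000),   # one developer "going bananas"
]
print(rank_by_tokens(logs))     # raw volume favors model-b
print(rank_by_retention(logs))  # repeat usage favors model-a
```

The toy data shows the con from the transcript: one heavy spender dominates the token ranking, while the retention metric still surfaces the model people actually come back to.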

Alex, I guess kind of hitting on the creation of OpenRouter: curious to hear, from a technical perspective, what issues or roadblocks you faced when creating the platform itself, and how you were able to combat some of those. There have been a couple. One was routing speed. We wanted to build something that ideally would have almost zero latency impact on the process of routing through OpenRouter to the language model that you want to use. To do that, we ended up moving more and more of our logic and infrastructure to the edge. Cloudflare is just huge here. I don't think people realize how many things Cloudflare offers. I'm not really familiar with Cloudflare's competitors, but we ended up leveraging a lot of Cloudflare's newer features to reduce the latency of our routing process as far as we could, along with some really advanced types of caching that we put in place. I'll give an example. Cloudflare has a cool feature called Hyperdrive that lets you execute SQL that is cached in the edge region of the user executing the SQL. You can connect Hyperdrive to your database, and ideally you locate your database as close to the edge data center as possible. We found Hyperdrive to be really, really effective in reducing routing latency. Just about any database work we have to do is almost zero latency because of things like Hyperdrive in combination with a couple of other components. So that was one. And it's kind of a continuous project; it's never going to be fully over. There are all kinds of caches that we're moving closer and closer to users. But I haven't heard anybody complain about routing latency in many, many months.

Another technical challenge has been making analytics that scale. The first thing we did wasn't terrible, but it definitely is not going to scale: building our own sort of in-Postgres analytics tooling, doing all of our analytics in Postgres and moving as much into Postgres as possible. The Postgres ecosystem, similar to Cloudflare, seems to be a bit underrated. There are a lot of amazing Postgres extensions, and having a lot of our very stateful logic in the Postgres ecosystem has been pretty interesting; it makes us feel like there's a lot of integrity in our data. It also allows us to leverage triggers, which have actually scaled very well for us...

So when things happen in OpenRouter, Postgres triggers update a lot of the analytics tables that we have, or update a lot of our analytics logic. We currently still have cron jobs doing analytics that are going to go away eventually. We're looking at TimescaleDB as the next generation of all of our analytics, which will unlock tons of other cool ways of ranking models, and is also just a lot more scalable for managing a really massive dataset like the one we have now. So those have been exciting. And on the more AI side of the technical challenges, we integrate a lot of different APIs, and we host a few models ourselves...
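The trigger-driven rollup pattern described above looks roughly like this. This is a minimal sketch using SQLite (so it runs self-contained); Postgres trigger syntax differs in places, and the table names here are made up, not OpenRouter's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (model TEXT, tokens_in INTEGER, tokens_out INTEGER);
    CREATE TABLE model_stats (model TEXT PRIMARY KEY, total_tokens INTEGER DEFAULT 0);

    -- Rollup trigger: every new request immediately updates the analytics table,
    -- so usage stats stay current without waiting for a cron job.
    CREATE TRIGGER rollup AFTER INSERT ON requests
    BEGIN
        INSERT OR IGNORE INTO model_stats (model, total_tokens) VALUES (NEW.model, 0);
        UPDATE model_stats
        SET total_tokens = total_tokens + NEW.tokens_in + NEW.tokens_out
        WHERE model = NEW.model;
    END;
""")
conn.execute("INSERT INTO requests VALUES ('model-a', 100, 50)")
conn.execute("INSERT INTO requests VALUES ('model-a', 200, 100)")
print(conn.execute(
    "SELECT total_tokens FROM model_stats WHERE model='model-a'"
).fetchone())
```

The design choice in the transcript is that the aggregate is maintained transactionally at write time, which is why the cron jobs can eventually go away.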

Every time a new LLM provider comes up, its API is never the same as all the others. There's always some kind of weird edge case that pops up at some point. One thing that's worked to our advantage a bit is that we have this community of power users. They're not developers, they're not companies, they're just normal users who love LLMs. They have an OpenRouter account that they connect to apps that let you bring your own key. There's also a way of signing in with OpenRouter, doing an OAuth flow into some apps; I'm not familiar with another way of doing OAuth with LLMs. So this community of power users has been really good for teaching us things. When new models pop up, or when we host a new model or start integrating a new API, we will discover these very niche finish reasons, or strange errors that emerge, that we get immediate alerts for and can then fix. And then one last answer that this reminded me of: since we're aggregating so many models and so many APIs, we decided early on to get really serious about type safety. We have extremely strict type checking across the whole codebase, and we believe in that as a foundational engineering principle, which is new...

I haven't done that at another company before, and I haven't seen it really thought through as an engineering principle. But by taking it really seriously, we just catch a lot of errors before they ever happen. For us it's almost an imperative, because there are just so many different schemas that we're working with all the time, so many APIs, and so many different formats for models. Getting really, really good alerts and really good error reporting necessitates knowing exactly the shape of the data going in and out of every pipe in the machine. Oh, I love that. A couple of quick reactions. One, the finish-reason PTSD thing: I was just having flashbacks of all the horrible finish reasons that are possible and are oftentimes not fully documented. So a lot of empathy for that. I also think your comment points at a very underappreciated problem space, this analytics problem for LLMs, because of the volume of queries you're often working with and the size of the data. It's another order of magnitude. We were trying to deal with these problems at OpenAI, and it's just really hard. Most off-the-shelf analytics tools don't scale up to the order of magnitude that you might actually see with some of these analytics workloads.
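The boundary-level schema checking described here can be sketched like this. The field names mirror the familiar OpenAI-style response shape, but the set of finish reasons and the validator itself are illustrative, not OpenRouter's code:

```python
from dataclasses import dataclass

# Finish reasons vary by provider; normalizing to one closed set is the point.
KNOWN_FINISH_REASONS = {"stop", "length", "tool_calls", "content_filter", "error"}

@dataclass(frozen=True)
class Choice:
    text: str
    finish_reason: str

def parse_choice(raw: dict) -> Choice:
    """Reject malformed provider responses at the boundary, not deep in the app."""
    if not isinstance(raw.get("text"), str):
        raise TypeError(f"bad text field: {raw.get('text')!r}")
    reason = raw.get("finish_reason")
    if reason not in KNOWN_FINISH_REASONS:
        raise ValueError(f"unknown finish_reason: {reason!r}")
    return Choice(text=raw["text"], finish_reason=reason)

ok = parse_choice({"text": "hello", "finish_reason": "stop"})
print(ok)
```

Validating at the edge like this is what makes the "immediate alerts" possible: an undocumented finish reason from a new provider fails loudly at parse time instead of silently corrupting downstream state.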

I'm curious, though, about your comment on standardization, or the lack of it. It does seem that there's some momentum around people standardizing on the way OpenAI has built their API, and you actually see this even more with the SDKs, where Together AI and a bunch of other providers don't even make their own SDKs; they just plug directly into OpenAI's SDK. I'm curious to get your reaction to this and where you think it's going. Shout out to the folks at Stainless, who make the OpenAI SDK, obviously in conjunction with the team at OpenAI, and have done an incredible job; they also do Anthropic's SDK and actually the Cloudflare SDK as well. So they're helping some of this along, but it's been interesting to see how it plays out. And maybe, as one other prompt for your perspective: conversely, look at how things happened in the NFT/blockchain ecosystem, where it seems like all the different chains have completely independent toolchains, it's hard to deploy projects to multiple chains, and none of that really works super well. I'm curious why people have been so willing to standardize around OpenAI, which seems like an interesting decision.

At a high level, I think in both spaces there's a healthy duality between standards and standard breakers, or you might call it innovation, generously, and standard breaking, less generously. The standards in either space can be good for consumers, because they make it easier for new entrants to get adoption and they lower the barrier to entry. That makes it easier for developers to switch their code over to a new language model or new API, which increases competition, which increases quality and reduces costs for consumers. Standards can also be bad for the consumer, though. If nobody tries to improve upon the standard and everyone just locks in to the way things are, it becomes really hard for a company to get any kind of traction from deviating or making a breaking change. OpenAI had the first attempt that I'm aware of at standardizing something about the language-model communication process, with ChatML. For anyone listening, ChatML is a way of structuring a multi-turn conversation with an AI: a user says something, the AI responds back, the user responds with something else, and maybe there's an overall context for the whole conversation, the system prompt. It's a really simple standard, it's very extensible, and it was the start of something easy to set up. And, correct me if I'm wrong, I'm not aware of an earlier attempt at standardizing something about the prompt.
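For reference, the ChatML framing described here renders a conversation roughly like this. This is a minimal sketch of the publicly documented `<|im_start|>`/`<|im_end|>` markup; real chat templates vary per model:

```python
def to_chatml(messages):
    """Render a multi-turn conversation into ChatML markup."""
    parts = []
    for msg in messages:  # each msg: {"role": ..., "content": ...}
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # Open the assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]
print(to_chatml(conversation))
```

The extensibility Alex mentions comes from the role field being an open string: new roles or renamed assistants slot into the same frame without breaking parsers.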

So ChatML was really interesting to us, and it was part of the reason we decided that our API was going to look very similar to OpenAI's and just be a superset. I think some of that mentality probably meant a lot of other companies thought the same thing and started implementing things that look like the OpenAI standard. And this is, I think, a really healthy standard, because if you want to deviate, you can still just send a raw prompt. There's basically an easy opt-out for people who are trying to innovate on their prompt format, and there are tons of people who do. There's a really clear pathway for a developer to make a model with a really unique, interesting prompt format. An example is OpenChat, which I believe was a research model trained on synthetic data, and they found that the model performed better if the assistant's name was something like "GPT4": if the assistant thought it was GPT-4, this researcher found the performance improved. So doing these little tweaks to the prompt format was one way the model-developer community has chosen to innovate and deviate from the standard, and they're not hosed; they can still plug in to a lot of the existing tools. And Hugging Face now has a really flexible standard in the tokenizer config for the chat format that lets you use Jinja, which is a templating language, for really, really complex chat formats.

So back to the overall picture, I think crypto is just very different, in part because it's fundamentally financial. Within the Ethereum ecosystem and all the layer-twos on top of it, conforming to the same RPC standard as all the other miners just allows you to participate and earn mining rewards. So there's a direct financial incentive to adhere to the standard and a direct financial disincentive to deviate, which is really tough. If you want to deviate from the standard in crypto at that level, you kind of have to create a new blockchain. And that's why every new blockchain, not every one, but a lot of them, has a very different API for interacting with it as a client or as a miner. I think that's just why things work differently in that space. Yeah, just a quick reaction, Nolan, and then I'll kick it to you for a question. When ChatML was coming out, Greg Brockman was incredibly gung-ho about ChatML and was like, you know, we're going to build this standard. I think the original iteration of the blog post announcing the chat completions API was more like, we're building this open standard, yada yada yada. And I think it got tamped down to, you know, we're putting out this product, and we're also releasing the spec of ChatML so that people can understand it. And then almost immediately, even we diverged from what the standard was; we had this reference implementation of ChatML that was available to the world, and the path we were on was not actually keeping that, quote-unquote, standard up to date. So I think this is a good reminder. I'm sure Greg's not listening to this.
Someone needs to ping Greg. Greg's coding somewhere, putting out tweets about fixing machine-learning optimization code issues. Somebody's got to ping Greg and remind him that they should do some active work to actually keep that standard somewhat up to date, because it hasn't been maintained, and it was actually removed from a bunch of the stuff that they had put out.

So it's a good reminder. I'm curious: are there other parts of the API standard that drive more adoption of it, in your eyes? Part of my mental model is that it was less that OpenAI figured out the right level of abstraction, which they might have; I don't know if I actually have the intuition on whether that's true or not. I think it just became so clear that OpenAI was so far ahead that, essentially, if you wanted any of those users on your platform, you were basically required, especially in a world where everyone is always experimenting, to use that spec, and specifically the spec of the SDKs, so that developers didn't have to switch over to something else. From my perspective, that's obviously good for OpenAI, but it's interesting how it really was the first iteration, and it seems like, to a certain extent, the first iteration without any incremental innovation is the thing people are sticking with. It's like: don't make a different thing; let's stick with the one OpenAI came up with literally on the first shot. It'll be interesting to see how much that plays out over time. Yeah, I mean, our view at OpenRouter has been: make every model work with the OpenAI API format, but provide extensibility and advanced features that deviate from the standard without being breaking changes. For example, normally the model field in your request is just one single string with the model that you want. For OpenRouter, it can be an array of models. And we have different kinds of prompt transforms that you can configure. One of them we call middle-out, which is a little bit of a play on the Silicon Valley TV show joke.
It takes your prompt and, since there's research showing that LLMs mostly pay attention to the beginning and end of very, very large texts and pay less attention to the text in the middle, if your prompt is too large for a model, OpenRouter can squeeze it and remove parts from the middle in a strategic way. So we add different things like that to our API, but our overall goal is to reduce switching costs between models. We try to make it really, really easy for developers to experiment with new ones.
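The middle-out transform described above can be sketched as a simple truncation heuristic. The budget here is counted in characters for simplicity, whereas a real implementation would work on tokens; the function and marker are illustrative, not OpenRouter's code:

```python
def middle_out(prompt: str, max_chars: int, marker: str = " [...] ") -> str:
    """Keep the head and tail of an oversized prompt and drop the middle,
    since models attend most to the beginning and end of long contexts."""
    if len(prompt) <= max_chars:
        return prompt
    keep = max_chars - len(marker)
    head = keep // 2
    tail = keep - head
    return prompt[:head] + marker + prompt[-tail:]

long_prompt = "A" * 50 + "B" * 50 + "C" * 50
compressed = middle_out(long_prompt, max_chars=40)
print(len(compressed), compressed)
```

The strategic part in practice is deciding *which* middle spans to drop (whole messages, whole paragraphs) rather than cutting mid-sentence, but the head-and-tail shape is the core idea.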

Alex, my question ties back to your statement about the power users you have currently. I'm curious about the use cases: what are those power users using the tool for? And then a higher-level question too: when you were originally rolling out OpenRouter, what was your thesis as far as who the end users would be?

Did that stay true, or was there a shift in personas as you gradually rolled it out and got more and more feedback? I'm curious to see where that stands, and how it changed throughout the evolution of OpenRouter itself. We have seen more of our usage, in terms of tokens and dollar volume, move towards developers over time. It's a little bit hard to know this for sure, but we still have a good chunk, probably more than half of our users, who are just prosumers or power users, people using OpenRouter directly.

And, yeah, we also have a playground where people can experiment with models and test new ones out. The playground is used by some people as just a consumer experience for directly chatting with models, and for saving your history with all of them in one place, but locally and privately, on your device. So the ratio of users versus developers has roughly not changed over time, we think, but developers have been getting more serious, more active, and spending more over time.

Yeah, I would have a follow up on this. And I'm even more grateful that we're having this conversation, because I think this is, to a large extent, almost the exact situation that the product that I work on, you know, many hours a day is in: sort of the position of Google AI Studio. Like, there's a large consumer footprint for Google AI Studio because it's an easy way for folks to get access to the models for free and try out what's latest, but the core market that we're going after is developers.
好的,我想就这个问题再跟进一下。我非常感激我们能进行这次对话,因为这在很大程度上与我每天花很多时间工作的产品的处境非常相似。就像Google AI Studio一样,它有很大的用户群,因为它为人们提供了一种免费获取最新模型并进行尝试的简单方式。但我们真正瞄准的核心市场是开发者。

I'm curious like how you've sort of navigated the world of like you can't do everything and you ultimately have to make those trade-offs and like how you've made, how you've thought about the trade-offs between like, we want this thing to be for developers potentially or but we also still like don't want to actively make a bad product for the sort of consumer oriented user persona. We struggle with this question all the time, honestly.
我很好奇你是如何应对“你不可能做所有事情,最终必须做出权衡”这个问题的。你是如何思考这些权衡的,比如我们希望产品适合开发者,但同时也不希望为普通消费者用户群体打造一个糟糕的产品。说实话,我们一直在努力解决这个问题。

It is like, or I at least like think about it all the time. I think that what the biggest source of the tension between building for users and building for developers, at least to me is differentiation, just being different than everything else out there or different in a way that really like works for us and feels like the open router thing to do. The answer to this question for us right now is building a platform that works for developers and building a marketplace that allows people to discover and explore new models with ease.
这句话的意思是,我几乎一直在思考这个问题。我认为,对于我来说,用户体验和开发者需求之间最大矛盾的根源在于“差异化”,也就是要与市场上的其他东西不同,或者以一种对我们有效并符合开放路由器精神的方式与众不同。我们目前的解决方案就是创建一个开发者友好的平台,并构建一个市场,让人们能够轻松地发现和探索新模型。

And the people doing that are not just developers. We just launched a new marketplace UI last week that lets you see all the language models that we have, and figure out which ones are free, which ones are really popular on finance or legal topics, which ones support JSON output, which ones support tool calling, and then sort all of them by price, or look at which ones are new, or sort them all by token usage in the last week. Giving people really powerful tools for looking up models that they want to use, and for exploring how models rank and compare with each other, is sort of like the big, you know, not purely developer product that we spend a lot of time on.
从事这项工作的不仅仅是开发者。我们最近推出了一个新的市场界面,它允许你查看我们拥有的所有语言模型,并找出哪些是免费的,哪些在金融或法律主题上非常受欢迎,哪些支持JSON输出,哪些支持工具调用。你可以根据价格排序,查看哪些是新的,根据上周的代币使用量进行排序。这为人们提供了非常强大的工具,方便他们查找想要使用的模型,探索模型的排名和比较情况。这是一个我们花了很多时间投入开发的产品,不仅仅是面向开发者。

The playground, which we've been upgrading a lot recently, is like a really easy to use developer tool for models, and it looks a little consumer-y. But I think the developers for language models are bringing a lot of value that just, you know, the chat room playground is not gonna bring, and we want them to feel like they're welcome. So we've been building a lot of developer platform and performance improvements recently.
我们最近对游乐场进行了很多升级,这个平台其实是一个非常易于使用的模型开发工具,看起来有点像面向消费者的产品。但我认为,语言模型的开发人员能够带来很多价值,这不仅仅是我们随便玩过的聊天室可以带来的。所以,我们希望他们感到受欢迎。为此,我们最近一直在构建许多开发者平台和性能上的改进。

This is kind of a high-level question too, but I'm curious, again, kind of over the past, I mean, close to a year and a half, not two years now. Again, we've kind of got to go back to all of this AI hype, and Alex, I'd love to get your feedback or perspective, you know, being at OpenSea when NFTs and crypto were kind of on that bull run as well. You know, what similarities do you see between the AI hype and kind of that NFT and crypto hype, and, you know, kind of where do you see that space going as we continue to move further and further up that kind of chart?
这是一个比较宏观的问题,但我很好奇,又要回顾过去,差不多一年半的时间,还不到两年。我们又回到了所有这些人工智能的热潮上,Alex,我很想听听你的反馈或看法,你在OpenSea工作时,NFT和加密货币也正处于牛市。你觉得人工智能的热潮和NFT及加密货币的热潮有什么相似之处?随着时间推移,这个领域会如何发展?

And maybe as an additional add-on to this: anything that you have thought about doing differently with Open Router this time, compared to OpenSea, to make it more resilient? Like, you know, if we are in some crazy hype bull run with AI right now and it sort of flatlines in the future, is there still a feasible way to make the business or the project work, however you wanna frame it? With Open Router, we haven't really been building that many hedges against the AI market into the product. Like, you know, if the AI market flatlines or declines, I expect we will too.
也许可以对此做一个补充,比如,您是否考虑过在使用OpenSea时,有什么不同的做法;这次使用Open Router时,是不是有办法让它更有韧性,以应对可能正在经历的疯狂AI牛市,并在未来趋于平稳的情况下依然保持运作呢?针对Open Router项目,我们并没有在产品中建立很多对冲措施来应对AI市场的变化。如果AI市场停滞或下滑,我预计我们的项目也会受到类似的影响。

And I think that's just part of the bet. It is a bet that when some part of the language model space is growing, a rising tide lifts all boats, and I expect other parts of the language model space or the model space to grow too. We just focus on language models today, but others will come in the future. So an example of this, and I'll draw a similarity to crypto in a second: if you look at the rankings page on Open Router today, there are a couple moments where it just jumps up significantly. You know, a recent moment like that was the launch of Claude, or actually a better example is the launch of Claude 3, which happened in early March of this year.
我认为这只是赌注的一部分。当语言模型领域的某个部分正在增长时,就像潮水上涨会抬高所有的船一样,我也期待语言模型空间或模型空间的其他部分也会有所增长。虽然我们今天专注于语言模型,但未来会有其他模型出现。举个例子,这种情况类似于加密货币。如果你今天查看Open Router的排名页面,就会发现有几个时刻数据显著上升。比如最近一个这样的时刻是Claude的发布,更好的例子是今年三月初Claude 3的发布。

And you see a lot of people trying out Claude 3, just tons of tokens processed. And then you start to see open source models grow a lot as well, like they follow it up. Yeah, it's kind of one of the mysteries of life. Why that happens, I think one theory is that the interest that Claude brought to the ecosystem, from developers who just weren't getting something out of it before, or from developers who wanted to achieve something and now could achieve it in a low-cost way, just made people look at the other things available.
你会看到很多人在尝试使用Claude 3,处理了大量的文本数据。接着,你还会发现开源模型也在大幅增长,就像在跟风一样。为什么会发生这种情况,这有点像生活中的一个谜团。我觉得其中一个理论是,Claude吸引了那些以前从中得不到什么或者想要以低成本方式实现某些目标的开发者,增加了他们对这个生态系统的兴趣,这也让人们开始关注其他可用的东西。

It just, it made people look around and say, wow, I thought I knew AI, but then this new thing came out. What if there's something else new that I don't know about that is also underrated or undiscovered? And there's just some kind of, there's something about like developers everywhere, not just in AI, where I think this just happens all the time.
它让人们环顾四周,感叹道,哇,我以为我了解人工智能,但现在出现了这个新东西。那么是否还有其他我不知道且被低估或尚未被发现的新事物呢?我觉得,这种情况在各地的开发人员中都时常发生,不仅仅是在人工智能领域。

Like if a new thing emerges, you start to wonder if you were overlooking the whole space. And if there are other new things that are just hiding in corners and just need a light shone on them. And this happens in crypto too. We made a lot of inside jokes at OpenSea about rising tide lifts all boats, because everything is sea-themed. And we would look at the analytics of the space, and every time a new project emerged, maybe it was a new blockchain like Polygon, you could just see a ton of other projects kind of similar to that project, or other creators or apps, start to get more interest and more traction and sort of follow the lead of the breakout project.
当有新事物出现时,你可能会开始怀疑自己是否忽略了整个领域。并且是否还有其他隐藏在角落里的新事物,只需要有人来发现它们。这种情况在加密货币领域也同样存在。就像我们在OpenSea开玩笑说"涨潮会托起所有船只",因为一切都是互相关联的。我们常常会查看该领域的分析数据,每当有新项目出现,比如一个新的区块链像Polygon,你就能看到许多类似的项目,或者其他创作者和应用程序开始获得更多的关注和发展,跟随那个脱颖而出的项目的步伐。

Additionally, when crypto, like when prices are going up, just fungible tokens like Bitcoin and ETH, when the prices are going up, NFTs values go up too. And it like will be lagged a bit and they're not super correlated. But I think it's partly because people like have more crypto liquidity and are looking for things to do with them. And NFTs are one of the ways that you can use your crypto that's like easiest and most like visually and digitally exciting.
此外,当加密货币市场上涨时,比如比特币和以太坊等可替代代币价格上升时,NFT的价值也会上涨。虽然这种上涨会有一些滞后,而且它们之间的关联度并不是特别高,但我认为这部分是因为人们在市场上升时会有更多的加密货币流动性,并且他们寻找使用这些加密货币的方法。NFT 是一种使用加密货币的简便方式,同时它在视觉和数字呈现上非常吸引人。

So just like the whole rising tide lifts all boats phenomenon, I love seeing it happen. And I just started to follow it around everywhere and think about what it means. And I guess that's one of the biggest commonalities between the two spaces too. Yeah, I love that. I know we're almost coming up on the end of time, so being respectful of your time, Alex, the two questions that we always close with: I'll ask the first one and then I'll kick it to Nolan. Just curious what your personal AI tech stack looks like. And this could be from a tool perspective, like what are you using, what's getting you excited from an actual tool, consumer tool perspective?
就像那种水涨船高的现象,我非常喜欢看到这种情况的发生。我开始到处跟随它,并思考它意味着什么。我觉得这也是这两个领域之间最大的一些共同点之一。我非常喜欢这一点。我知道我们差不多快到时间的尽头了。所以,Alex,考虑到你的时间,我们总是用这两个问题来结束。我会先问第一个,然后交给Nolan。首先,我很好奇你的个人AI技术栈是什么样的。这可以是从工具的角度来看,你在使用哪些工具?从实际工具或消费者工具的角度来看,哪些让你感到兴奋?

Yeah, I've recently started to get more into Supermaven as my tool for code generation. I've just found it to be so much faster than competitors. I've been kind of blown away. It's really hard to know, but it feels like it has more of my code and context when it generates code. And the creators seem really smart, and it's also a New York based team. I'm in New York. And then they just launched the ability to chat with models, with different kinds of models, with your code.
是的,我最近开始更加依赖于使用Supermaven作为我的代码生成工具。我发现它比其他竞争产品要快得多,令我非常震惊。虽然很难下确切的结论,但它在生成代码时似乎能更好地理解我的代码和其上下文。创造者看起来非常聪明,而且他们的团队就在纽约,我也在纽约。他们刚刚推出了可以与模型进行对话的功能,使用不同类型的模型与你的代码互动。

And I've started sort of like a new workflow where I'll write the code that I want. And then I just ask OpenAI or Claude or Gemini to generate tests, to just show me which tests it thinks I should write, which unit tests would be good for the code I just wrote. And I accept most of them. Like, they've been getting better over time. And if enough code is in context, they improve exponentially. So I write more tests now than I ever have in my life, cause I'm not actually writing them; the tests, I read and check over. And I mean, the end result is so much better. So that's been exciting. Another one that I've been using is, I mean, I'm an angel investor in Devin by Cognition. And we've hooked Devin up to our code base and Slack, and it's improving. It's particularly good at sort of research driven tasks that really, really need to, you know, explore things on the internet. Devin is like the most powerful scraper I've ever seen. And consumer wise, for just knowledge retrieval, I sort of bounce around to everything, partly cause I enjoy bouncing around.
我已经开始了一种新的工作流程,我会先写好我想要的代码,然后请像OpenAI或Claude或Gemini这样的工具来生成测试。这些工具会告诉我,它们认为我应该编写哪些测试,哪些单元测试对我刚写的代码有帮助。我大多数都会接受它们的建议,因为这些工具随着时间的推移越来越出色。如果给它们提供足够的代码背景,它们的输出会呈指数级地提升。所以我现在比以前写了更多的测试,因为实际上这些测试不是我在写。我会阅读和检查它们,最终结果也非常不错,令人兴奋。 另外,我还在使用的一个工具是Devin,我是它的早期投资者。我们把Devin连接到了我们的代码库和Slack上,它在不断完善。Devin特别擅长于需要进行互联网探索的研究驱动任务。Devin是我见过的最强大的网页数据抓取工具。 在知识获取方面,我经常切换使用不同的工具,因为我喜欢尝试不同的东西。
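The test-generation workflow described above (write the code first, then ask a model which unit tests to write) can be sketched roughly as below. The prompt wording and the `send_to_model` hook are illustrative, not any specific vendor's API; any function that maps a prompt string to a completion would fit:

```python
def make_test_prompt(source_code: str) -> str:
    """Wrap freshly written code in a test-generation request."""
    return (
        "Here is some code I just wrote:\n\n"
        f"{source_code}\n\n"
        "Suggest the unit tests you think I should write for it. "
        "Include edge cases, and return runnable test code."
    )

def generate_tests(source_code: str, send_to_model) -> str:
    # send_to_model: any callable taking a prompt string and returning
    # the model's text response (OpenAI, Claude, Gemini, ...).
    return send_to_model(make_test_prompt(source_code))

# Example with a stand-in model; the real workflow still reviews
# every suggested test by hand before accepting it.
fake_model = lambda prompt: "def test_add():\n    assert add(1, 2) == 3\n"
print(generate_tests("def add(a, b):\n    return a + b", fake_model))
```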

So I bounce between OpenRouter's playground and Claude and Perplexity and ChatGPT directly as my primary knowledge retrieval places. I think there's definitely, in the future, like it feels like there's gonna be some more UI differentiation that every platform does. The UIs all started the same and have gradually diverged and tried to keep users locked in with some unique thing. I'm guessing that's gonna continue. And Alex, our final question we like to end it on: we're already six and a half months into 2024. We like to get the perspective of the folks we speak with as far as, you know, something that you're looking forward to or hope that happens within the next six to seven months, maybe early into 2025.
所以我在OpenRouter的playground、Claude、Perplexity以及直接使用ChatGPT之间来回切换,把它们作为我主要的信息检索平台。我觉得未来肯定会有更多的用户界面差异化。所有平台的用户界面最初都相同,但逐渐开始有差异,并尝试通过某些独特性将用户留住。我猜测这种趋势会继续下去。Alex,我们最后一个问题通常是问:2024年已经过去六个半月了,我们希望了解你的看法,即在接下来的六到七个月中,你期待或希望发生什么事情,也许是在2025年初。

And then on the flip side, you know, kind of given everything that's going on, something that you hope does not happen, whether it has an adverse effect on you personally, on society, on politics as a whole, et cetera. Kind of the two-sided question of, again, what you're looking forward to and something that maybe you're not. Starting with something that I'm not, or something that I'm worried about: I'm really worried about Taiwan. It doesn't feel like the AI industry or even the computer industry has a great plan for removing the dependency on all kinds of industry going on there, but primarily TSMC.
然后从另一方面来看,根据当前的局势,有些事情是你希望不会发生的,无论它对个人、社会、政治等方面有没有积极的影响。可以说是双面的问题,一方面是你期待的事情,另一方面是你担心的事情。先说我担心的事情,我真的很担心台湾。感觉人工智能产业甚至整个计算机行业都没有一个很好的计划来减少对台湾各类产业的依赖,尤其是台积电。

Like, that kind of critical component basically not having an alternative, or not having a great plan for what happens if we can no longer depend on our supply chain in Taiwan, would be so disastrous. And it's literally being verbally threatened all the time. I just haven't seen something that's calmed me down about it. So that's one thing I think about and worry about. And in terms of things I'm looking forward to, there's a lot of stuff. The first thing that pops into my mind is a new architecture for the AI industry, or for language models, and particularly one that can do search. Maybe reasoning would be a subset of that, or maybe I'm misunderstanding how some language models are trying to approach reasoning.
像这样的重要组成部分,如果没有备选方案或者没有一个好的计划来应对台湾这边的供应链中断,那将是非常灾难性的。而且这已经常常被口头威胁。我只是还没有看到能让我对此感到安心的东西。所以这是我一直在考虑和担心的事情。 至于我期待的东西,有很多。首先想到的是人工智能行业或语言模型的新架构。尤其是能够进行搜索的那种。也许推理是其中的一部分,或者可能我对一些语言模型是如何尝试进行推理的理解有误。

But it seems kind of odd that the search process, just like doing inference and then revising your answer over and over again is being handled by apps and agents today. You get like a huge performance boost from generating code with any of the language models and then saying, do it better. Or like, you know, prompting the model to like revise the code they wrote and improve it in some way. And you don't even have to explain how you want it to improve, just think twice, not just once. Give me your best answer, not just the first answer that you thought of. And moving this sort of search process through multiple inference passes inside the model itself, I expect would be a wild, wild improvement. So I'm excited for that to happen somehow or some kind of new architecture that allows for the model to like actually think a bit.
但这似乎有点奇怪,因为现在的应用程序和智能助手已经在处理这样的搜索过程,就像不断地进行推断然后一次又一次地修正答案一样。你可以通过使用任何语言模型生成代码然后要求“做得更好”而获得巨大的性能提升。或者,你可以提示模型对其写的代码进行修改并以某种方式改进它。你甚至不需要解释你希望它如何改进,只需要多思考,不止一次。给我你最好的答案,而不仅仅是你最初想到的答案。我期待这种通过多次推断来进行搜索的过程在模型内部实现时,会带来巨大提升。所以我很期待能够以某种方式实现这一点,或是出现允许模型真正思考的新架构。
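The "think twice, not just once" loop described above, run a first inference pass and then feed the draft back with a generic revision instruction, is what apps and agents implement today. A minimal sketch, with `call_model` standing in for any prompt-to-completion function:

```python
def refine(call_model, task: str, passes: int = 2) -> str:
    """Multi-pass search over answers: draft once, then revise."""
    # First pass: the model's initial answer.
    answer = call_model(task)
    # Each extra pass asks the model to improve its own previous answer,
    # without specifying how; "think twice" rather than once.
    for _ in range(passes - 1):
        answer = call_model(
            f"Task: {task}\n\n"
            f"Your previous answer:\n{answer}\n\n"
            "Revise and improve this answer. Give me your best answer, "
            "not just the first one you thought of."
        )
    return answer
```

The speaker's point is that this search currently has to live outside the model in application code like the above; an architecture that internalized multiple inference passes could get the same boost natively.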

I love that. I completely agree on both of those fronts. Alex, this was a ton of fun. I wish that we had another hour to chat about stuff. Super excited about Open Router. It was awesome to hear your perspective. And yeah, looking forward to hopefully all of the cool new improvements that you all have in store.
我很喜欢这个。我在这两个方面完全同意。Alex,这次聊天非常有趣。真希望我们还有一个小时可以继续聊。对 Open Router 非常期待。很高兴听到你的看法。期待你们所有很酷的新改进。