This was the first local model I ever used where I was like, wow, clearly we can actually get within shooting distance of the closed models. And there is a way for indie developers to look like a company spending $100 million training a model, but with way less money and with just one computer. You know, you're not really showing what the best model is, but more so what model has the most traction. And how representative do you think some of those numbers are? We've experimented with a couple different complicated ways of ranking models. So we figured to start with, we would just do raw tokens in and tokens out, and that has a big con.
Alex, do you wanna take us back? And I'm curious what the inspiration was to start OpenRouter, sort of what the conviction was, what was happening at the moment when you decided to create the project? Yeah, so it started back in February or March, when Llama had just come out and this was the first time we were kind of getting exposure to local models. This was Llama 1. And it was surprising how useful it was. It wasn't really a great chatbot and it wasn't reliable for knowledge retrieval, but there was something there that was worth exploring, and I didn't quite know what it was.
So the first project I started was a Chrome extension, very much following the pattern of my current paradigm and seeing what about AI could benefit from it. My current paradigm was crypto, because I was wrapping up my time at OpenSea. I was the co-founder and CTO. Still on the board, but I figured I wanted to look for something zero-to-one again and try to explore the AI space and figure out what sort of interesting intersections would appear. So I worked on a project similar to a Web3 wallet, but for bringing language models to the browser.
And it was a standard for front-end developers to just build API- and model-agnostic apps. And that was the experiment. And what became clear is that there weren't that many closed models at the time. It was really just GPT-3, GPT-4 had just come out, and Cohere. And there were a few open source models that were really tough to use. And then Llama came out, and after Llama, a research group took Llama, generated a bunch of synthetic data, fine-tuned Llama on that data, and spent about $600 in the whole process. This was a group at Stanford, and they created Alpaca.
And this was the first local model I ever used where I was like, wow, this is huge. Clearly we can actually get within shooting distance of the closed models. And there is a way for indie developers to look like a company spending $100 million training a model, but with way less money and with just one computer. And I thought that was huge, and meant that maybe we'd have a world with thousands or tens of thousands or hundreds of thousands of models. And another takeaway from this is that the data becomes more of the moat. Data becomes a more critical part of what makes a new model unique and useful, where a closed or centralized model wouldn't be as good.
Even if a new model is just not as smart. Take a person who's not as smart as somebody else, but that one person just knows a whole bunch of stuff. They just have all of these secrets locked in them that they've been exposed to; they know how something works. The smart person's not gonna be able to just figure it out; there's just knowledge they're missing. And that allows people to take data that there's otherwise no way of selling and find a way to monetize it. So it creates a new economy. And I thought, well, we might need a marketplace then for models.
The closest thing at the time was Hugging Face, but you couldn't really use the models. Inference was tricky and often left up to the developer. And it was very hard to find the models, to figure out which ones would be good at different things. So that's why I started OpenRouter. Actually, one of the guys who was building one of the top browser extension frameworks, called Plasmo, which I used to build Window AI, the Chrome extension for local models, ended up joining as co-founder, and we started OpenRouter together. I love that story. And for folks who haven't used OpenRouter, what's the basic pitch of why, as a developer, I'd want to use OpenRouter? And also, who is the user? Does it tend to skew more toward the indie developer persona today? Or do you also have large Fortune 500 companies, larger companies, who are building on OpenRouter?
So today it skews indie developer for sure. And there are a couple of companies that use OpenRouter to do benchmarking, where they run a whole bunch of tasks against new models that just emerge. And OpenRouter is the easiest API for doing that. The primary indie developer use cases that we see are people building B2C apps. The top ones are either games or roleplaying apps or novel-writing assistants. Probably the very top one involves mostly generating code, and the code is, you know, rendered into the app experience. And then we have some that just help users generate code, more programming assistants. So it's really a big mix of lots of stuff. And developers can opt in to sharing their data with us, and we give them a discount if they do. That data we just use to classify prompts coming in and figure out which models are good at what. And we're working on a router, and some ideas for creating a router to help people find the best model. And we're sort of shipping new experiments all the time.
But the primary use that people have for OpenRouter is they know what model they want to use, they just don't know where to get it from. Or they want to explore models and figure out which model is really good at finance, or really good at roleplay, or really good at programming, or really good at machine translation. And we just help them find the models that people are really using a lot for those cases. It's not really an eval. It's more of a, you know, App Annie-style engagement metric that we try to show for every model. I have so many random follow-up questions to this, but I'll just ask one of them right now, which is based on that metric: you're not really showing what the best model is, but more so what model has the most traction across a specific domain.
Similar to that, how representative do you think some of those numbers are of actual production usage, or maybe more broadly, developer mindshare? I was looking through some of the leaderboards before we jumped on this call to see where some of the models were. And for folks who haven't seen the leaderboards, they should go look, because it's interesting to see what's showing up at the top, and right now showing up at the top are some of the Claude models. Obviously we've seen more people using Claude models because of the recent releases, but do you think it's actually representative of mindshare, or of some other metrics or proxies?
Yeah, it's a good question. We've experimented with a couple different complicated ways of ranking models, and we're zeroing in on one that'll come in the future, but we figured to start with, we would just do raw tokens in and tokens out, and that has a big con. The big con, the disadvantage, is that one developer who's just going bananas can skew the data for some model. And they do have to pay a lot to skew that data, so I'm not aware of anyone trying to game a model or trying to game the rankings, but it's not ideal. I think a closer ideal would be something like retention: of all the people who test a model, how often are they coming back with that same kind of prompt? That's kind of what we look for in analyzing websites. You don't look at raw traffic. You do sometimes, it's the simple metric, but what really matters is whether people are sticking and coming back. So we're gonna experiment with some of those other ways of ranking models soon.
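To make those two ranking ideas concrete, here is an illustrative TypeScript sketch. The log-record shape and field names are assumptions for illustration, not OpenRouter's actual schema: raw tokens in plus tokens out summed per model, and a crude retention-style signal counting users who come back on a later day.

```typescript
// Hypothetical per-request log record; field names are illustrative only.
type RequestLog = {
  model: string;
  userId: string;
  day: string; // e.g. "2024-07-01"
  tokensIn: number;
  tokensOut: number;
};

// Ranking idea #1: raw tokens in + tokens out, summed per model.
function rankByTokens(logs: RequestLog[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of logs) {
    totals.set(r.model, (totals.get(r.model) ?? 0) + r.tokensIn + r.tokensOut);
  }
  return totals;
}

// Ranking idea #2 (retention-flavored): users who used a model on more than one day.
function returningUsers(logs: RequestLog[], model: string): number {
  const daysByUser = new Map<string, Set<string>>();
  for (const r of logs) {
    if (r.model !== model) continue;
    if (!daysByUser.has(r.userId)) daysByUser.set(r.userId, new Set());
    daysByUser.get(r.userId)!.add(r.day);
  }
  return [...daysByUser.values()].filter((days) => days.size > 1).length;
}
```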
Alex, I guess kind of hitting on the creation of OpenRouter, I'm curious to hear, from a technical perspective, what issues or roadblocks you faced when creating the platform itself, and then how you were able to combat some of those. There've been a couple. One was routing speed. We wanted to build something that ideally would have almost zero latency impact on the process of routing through OpenRouter to the language model that you wanna use. And to do that, we ended up moving more and more of our logic and infrastructure to the edge. And Cloudflare is just huge here. I don't think people realize how many things Cloudflare offers. I'm not really familiar with Cloudflare's competitors, but we ended up leveraging a lot of Cloudflare's newer features to reduce the latency of our routing process down as small as we could, with really advanced types of caching that we put in place. I'll give an example. Cloudflare has a cool feature called Hyperdrive that lets you execute SQL that is cached in the edge region of the user executing the SQL. You can connect Hyperdrive to your database, and ideally you locate your database as close to the edge center as possible. We found Hyperdrive to be really, really effective in reducing routing latency. Just any sort of database work that we have to do is almost zero latency because of things like Hyperdrive in combination with a couple other components. So that was one. And it's kind of a continuous project; it's never gonna be fully over. There's all kinds of caching that we're moving closer and closer to users. But I haven't heard anybody complain about routing latency in many, many months.
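As a rough sketch of the pattern described here (not OpenRouter's actual code), this is roughly what a Cloudflare Worker using a Hyperdrive binding looks like; the binding name, table, and columns are assumptions for illustration.

```typescript
// Cloudflare Worker sketch: Hyperdrive pools and caches database access near
// the edge, so a routing lookup avoids a cold round trip to the origin Postgres.
// The HYPERDRIVE binding is configured in wrangler.toml; the table is hypothetical.
import postgres from "postgres";

interface Env {
  HYPERDRIVE: { connectionString: string };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const sql = postgres(env.HYPERDRIVE.connectionString);
    const model = new URL(request.url).searchParams.get("model") ?? "";
    // Parameterized query; with Hyperdrive this stays low-latency at the edge.
    const endpoints = await sql`
      SELECT provider, endpoint FROM model_endpoints WHERE model = ${model}
    `;
    return Response.json(endpoints);
  },
};
```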
Another technical challenge has been making analytics that scale. The first thing we did wasn't terrible, but it definitely is not gonna scale, and that was building our own sort of in-Postgres analytics tooling, doing all of our analytics in Postgres and moving as much into Postgres as possible. The Postgres ecosystem, similar to Cloudflare, seems to be a bit underrated. There are a lot of amazing Postgres extensions, and having a lot of our very stateful logic in the Postgres ecosystem has been pretty interesting and allows us to feel like there's just a lot of integrity in our data. It also allows us to leverage triggers, which have scaled very well for us actually...
So when things happen in OpenRouter, Postgres triggers update a lot of the analytics tables that we have, or update a lot of our analytics logic. And we currently have cron jobs doing analytics that are gonna go away eventually. We're looking at TimescaleDB as the next gen of all of our analytics, which will unlock tons of other cool ways of ranking models, and is also just a lot more scalable for managing a really massive data set like we have now. So those have been exciting. And in the more AI sector of technical challenges, we integrate a lot of different APIs and we host a few models ourselves...
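Going back to the trigger point above: a rough sketch, with hypothetical table and column names, of how an insert into a raw requests table could keep a per-model daily rollup fresh via a Postgres trigger, run here as a one-off migration through the postgres.js client.

```typescript
// Hypothetical migration: maintain a per-model daily usage rollup with a trigger,
// so analytics reads never have to scan the raw requests table.
// Assumes model_usage_daily has a unique constraint on (model, day).
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

await sql.unsafe(`
  CREATE OR REPLACE FUNCTION bump_model_usage() RETURNS trigger AS $$
  BEGIN
    INSERT INTO model_usage_daily (model, day, tokens)
    VALUES (NEW.model, date_trunc('day', NEW.created_at), NEW.total_tokens)
    ON CONFLICT (model, day)
    DO UPDATE SET tokens = model_usage_daily.tokens + EXCLUDED.tokens;
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER requests_bump_usage
  AFTER INSERT ON requests
  FOR EACH ROW EXECUTE FUNCTION bump_model_usage();
`);

await sql.end();
```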
Every time a new LLM provider comes up, its API is never the same as all the others. There's always some kind of weird edge case that we see pop up at some point. I think one thing that's worked to our advantage a bit is that we have this community of power users: they're not developers, they're not companies, they're just normal users who love LLMs. And they have an OpenRouter account that they connect to apps that let you bring your own key. There's also a way of signing in with OpenRouter, like doing an OAuth with OpenRouter into some apps. I'm not familiar with another way of doing OAuth with LLMs. So this community of power users has been really good for just teaching us things. When new models pop up, or when we host a new model, or when we start integrating a new API, we will discover these very niche finish reasons that come out, or strange errors that emerge, that we get immediate alerts for and can then fix. And then a last answer, which this just reminded me of: since we're aggregating so many models and so many APIs, we decided early on to get really crazy about type safety. So we have extremely strict type checking across the whole code base, and we sort of believe in that as a foundational engineering principle, which is new...
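To illustrate the strict-typing idea, a hedged sketch (not OpenRouter's actual schema): one common approach is validating every provider response against a strict schema, for example with zod, so odd finish reasons or missing fields surface as alerts instead of silent bugs.

```typescript
// Hypothetical response schema for an OpenAI-style completion payload.
import { z } from "zod";

const CompletionResponse = z.object({
  id: z.string(),
  choices: z.array(
    z.object({
      message: z.object({ role: z.string(), content: z.string().nullable() }),
      finish_reason: z.enum(["stop", "length", "tool_calls", "content_filter"]),
    })
  ),
  usage: z
    .object({ prompt_tokens: z.number(), completion_tokens: z.number() })
    .optional(),
});

export function parseProviderResponse(raw: unknown) {
  const result = CompletionResponse.safeParse(raw);
  if (!result.success) {
    // In practice this is where an alert would fire with the exact mismatch.
    throw new Error(`Unexpected provider response shape: ${result.error.message}`);
  }
  return result.data;
}
```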
I haven't done that at another company before, and I haven't seen it really thought through as an engineering principle. But by taking it really seriously, we just catch a lot of errors before they ever happen. And for us, it's almost an imperative, because there are just so many different schemas that we're working with all the time, and there are so many APIs and so many different formats for models. Getting really, really good alerts and really good error reporting kind of necessitates us knowing exactly the shape of the data going in and out of every pipe in the machine. Oh, I love that. A couple of quick reactions. One, the finish-reason PTSD thing: I was just having flashbacks of all the horrible finish reasons that are possible, which are oftentimes not fully documented. So a lot of empathy for that. I also think it's a very underappreciated problem space, this analytics problem for LLMs, because of the volume of queries that you're often working with and the size of the data. It's like another order of magnitude; we were trying to deal with these problems at OpenAI and it's just really hard. And most of the off-the-shelf analytics tools don't scale up to the order of magnitude that you might actually see with some of these analytics workloads.
I'm curious, though, about your comment on standardization, or the lack of standardization. It does seem that there's some amount of momentum around people standardizing on the way that OpenAI has built their API, and you actually see this even more so with the SDKs, where Together AI and a bunch of other providers don't even make their own SDKs; they just wrap OpenAI's SDK directly. I'm curious to get your reaction to this and where you think that's going. Shout out to the folks at Stainless who make the OpenAI SDK, obviously in conjunction with the team at OpenAI, and have done an incredible job; they also do Anthropic's SDK and actually the Cloudflare SDK as well. So they're sort of helping some of this, but it's been interesting to see how it's played out. And maybe just as one other comment to get your perspective on: how, potentially conversely, things happened in the NFT and blockchain ecosystem, where it seems like all the different chains have completely independent toolchains, everything is hard, deploying projects to multiple chains doesn't really work super well. I'm curious why people have been so willing to standardize around OpenAI, which seems like an interesting decision.
At a high level, I think in both spaces there's a healthy duality between standards and standard breakers, or you might call it innovation, generously, and standard breaking, less generously. The standards in either space can be good for consumers because they make it easier for new entrants to get adoption and they lower the barrier to entry for them. So it makes it easier for developers to switch their code over to a new language model or new API, and that increases competition, which increases quality and reduces costs for consumers. Standards can also be bad for the consumer, too. If nobody tries to improve upon the standard and everyone just locks in to the way things are, it becomes really hard for a company to get any kind of traction from deviating or making a breaking change. OpenAI had the first attempt that I'm aware of at standardizing something about the language model communication process with ChatML, which, for anyone listening, is a way of structuring a multi-turn conversation with an AI: a user says something, the AI responds back, the user responds with something else, and maybe there's an overall context for the whole conversation called the system prompt. It's really a simple standard, it's very extensible, and it was the start of something easy to set up. And, I mean, correct me if I'm wrong, I'm not aware of an earlier attempt at standardizing something about the prompt.
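For listeners who haven't seen it, this is roughly the multi-turn structure ChatML popularized, sketched as a TypeScript type; the example content is made up.

```typescript
// The core ChatML-style shape: a system prompt plus alternating user/assistant turns.
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

const conversation: ChatMessage[] = [
  { role: "system", content: "You are a concise coding assistant." },
  { role: "user", content: "What does a Postgres trigger do?" },
  { role: "assistant", content: "It runs a function automatically when rows change." },
  { role: "user", content: "Show me one that keeps an audit log." },
];
```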
So ChatML was really interesting to us, and it was part of the reason we decided that our API was going to look very similar to OpenAI's and just be a superset. And I think some of that mentality probably meant a lot of other companies thought the same thing and started implementing things that look like the OpenAI standard. And this is, I think, a really healthy standard, because if you want to deviate, you can still just send a raw prompt. There's basically an easy opt-out for people who are trying to innovate on their prompt format, and there are tons of people who do. There's basically a really clear pathway for a developer to make a model with a really unique, interesting prompt format. An example is OpenChat, which, I believe, at some point, this was a research model, they were training on synthetic data and they found that the model performed better if the assistant's name was something like "GPT4" in the prompt. If the assistant thought it was GPT-4, this researcher found the performance improved. So doing these little tweaks to the prompt format was one way the model developer community has chosen to innovate and deviate from the standard, and they're not hosed. They can still plug in to a lot of the existing tools, and Hugging Face now has a really flexible standard in the tokenizer config for the chat format that lets you use Jinja, which is a templating language, for really, really complex chat formats.
So back to the overall picture, I think crypto is just very different, in part because it's fundamentally financial. Within the Ethereum ecosystem and all the layer 2s on top of it, conforming to the same RPC standard as all the other miners just allows you to participate and earn mining rewards. So there's a direct financial incentive to adhere to the standard, and a direct financial disincentive to deviate, which is really tough. And if you want to deviate from the standard in crypto at that level, you kind of have to create a new blockchain. And that's why every new blockchain, not every one, but a lot of them, has very different APIs for interacting with it as a client or as a miner. And I think that's just why things work differently in that space. Yeah, just a quick reaction, Nolan, and then I'll kick it to you for a question. When ChatML was coming out, Greg Brockman was incredibly gung ho about ChatML and was like, you know, we're gonna build this standard. I think the original blog post where we released the chat completions API was accompanied by us saying, we're building this open standard, yada yada yada. And I think it got sort of tamped down to, you know, we're putting out this product and we're also releasing the spec of ChatML so that people can understand it. And then almost immediately, even we diverged from what the standard was. We sort of had this reference implementation that was available to the world with ChatML, and the path that we were on was not actually keeping that, quote unquote, standard up to date. So I think this is a good reminder. I'm sure Greg's not listening to this podcast. Someone needs to ping Greg; Greg's coding somewhere and putting out tweets about fixing machine learning optimization code issues. Somebody's got to ping Greg and remind him that OpenAI should do some active work to actually keep that standard somewhat up to date, because it hasn't been maintained and it was actually removed from a bunch of the stuff that they had put out.
So it's a good reminder. I'm curious, are there other parts of the API standard that drive more adoption of it, in your eyes? Part of my mental model is that it was less that OpenAI figured out the right level of abstraction, which they might have, and I don't know if I actually have the intuition to know if that's true or not, because I think it just became so clear that OpenAI was so far ahead that essentially, if you wanted any of those users to be on your platform, you were basically required, especially in a world where everyone is experimenting always, to use that spec, and the spec specifically of the SDKs, so that developers didn't have to switch over to something else. From my perspective, it's obviously good for OpenAI, but it's interesting how it really was the first iteration. And it seems like, to a certain extent, the first iteration without any level of incremental innovation is the thing that people are sticking with. It's like, I don't know, you know, don't make a different thing; let's stick with the one that OpenAI came up with literally on the first shot. And it'll be interesting to see how much that plays out over time. Yeah, I mean, our view at OpenRouter has been: make every model work with the OpenAI API format, but provide extensibility and advanced features that deviate from the standard but aren't breaking changes. For example, normally the model field in your request is just one single string with the model that you want to request. For OpenRouter, it can be an array of models. And we have different kinds of prompt transforms that you can configure. One of them we call middle-out, which is a little bit of a play on the Silicon Valley TV show joke. There's research that LLMs mostly pay attention to the beginning and end of very, very large texts and pay less attention to the text in the middle. So if your prompt is too large for a model, OpenRouter can squeeze it and remove parts from the middle in a strategic way. So we add different things like that to our API, but our overall goal is to reduce switching costs between models. So we try to make it really, really easy for developers to experiment with new ones.
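A hedged sketch of how the extensions described above might look in a request. The field names follow the description here, while the model IDs and the exact endpoint path are illustrative assumptions: "model" becomes a list of fallbacks, and a middle-out transform trims oversized prompts.

```typescript
// Illustrative request to an OpenAI-style chat completions endpoint with the
// OpenRouter extensions described above: a list of models and a prompt transform.
const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    models: ["anthropic/claude-3-opus", "openai/gpt-4o"], // tried in order
    transforms: ["middle-out"], // squeeze the middle if the prompt is too large
    messages: [{ role: "user", content: "Summarize this very long document ..." }],
  }),
});
const completion = await response.json();
console.log(completion.choices?.[0]?.message?.content);
```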
Alex, my question kind of ties back to your statement around the power users that you have currently. I'm kind of curious, as far as use cases, what those power users are using the tool for. And then a higher-level question too: when you were originally rolling out OpenRouter, what was your thesis as far as who would be the end users?
Did that stay true, or was there kind of a shift in personas as you gradually rolled it out and got more and more feedback? I'm kind of curious to see where that stands, and again, how it changed throughout the evolution of OpenRouter itself. We have seen more of our usage, in terms of tokens and dollar volume, move towards developers over time. And it's a little bit hard to know this for sure, but we still have a good chunk, probably more than half of our users, who are just prosumers or power users or people using OpenRouter directly.
And, yeah, we also have a playground where people can experiment with models and test new ones out. The playground is often used by some people as just a consumer experience for directly chatting with models, and saving your history with all of them in one place, but locally and privately on your device. So we think the ratio has roughly not changed, you know, in terms of users versus developers over time, but developers have been getting more serious and more active and spending more over time.
Yeah, I have a follow-up on this. And I'm even more grateful that we're having this conversation, because I think this is, to a large extent, almost the exact situation that the product I work on, you know, many hours a day, is in: Google AI Studio. There's a large consumer footprint for Google AI Studio, because it's an easy way for folks to get access to the models for free and try out what's latest, but the core market that we're going after is developers.
I'm curious how you've navigated the world of, you can't do everything and you ultimately have to make those trade-offs, and how you've thought about the trade-offs between, we want this thing to be for developers, but we also still don't want to actively make a bad product for the more consumer-oriented user persona. We struggle with this question all the time, honestly.
Or I at least think about it all the time. I think the biggest source of the tension between building for users and building for developers, at least to me, is differentiation: just being different than everything else out there, or different in a way that really works for us and feels like the OpenRouter thing to do. The answer to this question for us right now is building a platform that works for developers and building a marketplace that allows people to discover and explore new models with ease.
And the people doing that are not just developers. We just launched a new marketplace UI last week that lets you see all the language models we have and figure out which ones are free, which ones are really popular on finance or legal topics, which ones support JSON output, which ones support tool calling, and then sort all of them by price, or look at which ones are new, or sort them all by token usage in the last week. Giving people really powerful tools for looking up models that they want to use, and for exploring how models rank and compare with each other, is sort of the big, you know, not purely developer product that we spend a lot of time on.
The playground, which we've been upgrading a lot recently, is a really easy-to-use developer tool for models, and it looks a little consumer-y. But I think the developers building on language models are bringing a lot of value that, you know, just people chatting in the playground are not gonna bring, and we want them to feel like they're welcome. So we've been building a lot of developer platform and performance improvements recently.
This is kind of a high-level question too, but I'm curious: over the past, I mean, close to a year and a half, if not two years now, we've had all of this AI hype. And Alex, I'd love to get your perspective, having been at OpenSea when NFTs and crypto were on that bull run as well: what similarities do you see between the AI hype and that NFT and crypto hype, and where do you see the space going as we continue to move further and further up that curve?
And maybe as an additional add-on to this: anything that you have thought about doing differently with OpenRouter this time, compared to OpenSea, to make it more resilient? Like, if we are in some crazy hype bull run with AI right now and it flatlines in the future, is there still a feasible way to make the business, or the project, work, however you wanna frame it? With OpenRouter, we haven't really been building that many hedges against the AI market into the product. You know, if the AI market flatlines or declines, I expect we will too.
And I think that's just part of the bet. It is a bet that when some part of the language model space is growing, a rising tide lifts all boats, and I expect other parts of the language model space, or the model space more broadly, to grow too. We just focus on language models today, but others will come in the future. So an example of this, and I'll draw a similarity to crypto in a second: if you look at the rankings page on OpenRouter today, there are a couple moments where it just jumps up significantly. A recent moment like that was the launch of Claude, or actually a better example is the launch of Claude 3, which happened in early March of this year.
And you see a lot of people trying out Claude 3, just tons of tokens processed. And then you start to see open source models grow a lot as well; they follow it up. Yeah, it's kind of one of the mysteries of life, why that happens. I think one theory is that the interest Claude brought to the ecosystem, from developers who just weren't getting something out of it before, or from developers who wanted to achieve something and now could achieve it in a low-cost way, just made people look at the other things available.
It just made people look around and say, wow, I thought I knew AI, but then this new thing came out. What if there's something else new that I don't know about that is also underrated or undiscovered? And there's something about developers everywhere, not just in AI, where I think this just happens all the time.
Like if a new thing emerges, you start to wonder if you were overlooking the whole space, and if there are other new things that are just hiding in corners and just need a light shone on them. And this happens in crypto too. We made a lot of inside jokes at OpenSea about a rising tide lifting all boats, because everything is sea-themed. We would look at the analytics of the space, and every time a new project emerged, maybe it was a new blockchain like Polygon, you could just see a ton of other projects kind of similar to that project, or other creators or apps, start to get more interest and more traction, and sort of follow the lead of the breakout project.
Additionally, when crypto prices are going up, just the fungible tokens like Bitcoin and ETH, NFT values go up too. It will lag a bit, and they're not super correlated. But I think it's partly because people have more crypto liquidity and are looking for things to do with it. And NFTs are one of the ways you can use your crypto that's easiest and most visually and digitally exciting.
So just the whole rising-tide-lifts-all-boats phenomenon, I love seeing it happen. And I just started to follow it around everywhere and think about what it means. And I guess that's one of the biggest commonalities between the two spaces too. Yeah, I love that. I know we're almost coming up on the end of time, so to be respectful of your time, Alex, the two questions that we always close with, and I'll ask the first one and then kick it to Nolan: just curious what your personal AI tech stack looks like. And this could be from a tool perspective: what are you using, what's getting you excited from an actual consumer tool perspective?
Yeah, I've recently started to get more into Supermaven as my tool for code generation. I've just found it to be so much faster than competitors. I've been kind of blown away. It's really hard to know, but it feels like it has more of my code in context when it generates code. And the creator seems really smart, and it's also a New York-based team; I'm in New York. And then they just launched the ability to chat with models, with different kinds of models, with your code.
And I've started a new workflow where I'll write the code that I want, and then I just ask OpenAI or Claude or Gemini to generate tests, to show me which tests it thinks I should write, which unit tests would be good for the code I just wrote. And, you know, I accept most of them. They've been getting better over time, and if enough code is in context, they improve exponentially. So I write more tests now than I ever have in my life, because I'm not actually writing them; I read the tests and check them over. And the end result is so much better. So that's been exciting. Another one that I've been using: I'm an angel investor in Devin, by Cognition. We've hooked Devin up to our code base and Slack, and it's improving. It's particularly good at research-driven tasks that really, really need to explore things on the internet. Devin is like the most powerful scraper I've ever seen. And consumer-wise, I switch back and forth; for just knowledge retrieval, I sort of bounce around to everything, partly because I enjoy bouncing around.
So I bounce between OpenRouter's playground and Claude and Perplexity and ChatGPT directly as my primary knowledge retrieval places. I think in the future it feels like there's gonna be some more UI differentiation on every platform. The UIs all started the same and have gradually diverged and tried to keep users locked in with some unique thing. I'm guessing that's gonna continue. And Alex, our final question we like to end on: we're already six and a half months into 2024. We like to get the perspective of the folks we speak with on something that you're looking forward to, or hope happens, within the next six to seven months, maybe early into 2025.
And then on the flip side, given everything that's going on, something that you hope does not happen, whether it has an effect on you personally, on society, on politics as a whole, et cetera. Kind of a two-sided question: what you're looking forward to, and something that maybe you're not. Starting with something that I'm not, or something that I'm worried about: I'm really worried about Taiwan. It doesn't feel like the AI industry, or even the computer industry, has a great plan for removing the dependency on all kinds of industry going on there, but primarily TSMC.
That kind of critical component basically not having an alternative, not having a great plan for what happens if we can no longer depend on our supply chain in Taiwan, would be so disastrous. And it's literally being verbally threatened all the time. I just haven't seen something that's calmed me down about it. So that's one thing I think about and worry about. And in terms of things I'm looking forward to, there's a lot of stuff. The first thing that pops into my mind is a new architecture for the AI industry, or for language models, and particularly one that can do search. Maybe reasoning would be a subset of that, or maybe I'm misunderstanding how some language models are trying to approach reasoning.
But it seems kind of odd that the search process, just doing inference and then revising your answer over and over again, is being handled by apps and agents today. You get a huge performance boost from generating code with any of the language models and then saying, do it better. Or, you know, prompting the model to revise the code it wrote and improve it in some way. And you don't even have to explain how you want it to improve; just think twice, not just once. Give me your best answer, not just the first answer that you thought of. Moving this sort of search process, through multiple inference passes, inside the model itself, I expect would be a wild, wild improvement. So I'm excited for that to happen somehow, or for some kind of new architecture that allows the model to actually think a bit.
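A rough sketch of the app-level version of that search loop; the endpoint, model name, and response shape are assumptions based on an OpenAI-compatible API. The idea is just: generate an answer, then feed it back and ask the model to do it better.

```typescript
// Minimal "generate, then revise" loop against an OpenAI-compatible chat API.
async function chat(messages: { role: string; content: string }[]): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "openai/gpt-4o", messages }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

async function generateWithRevision(task: string): Promise<string> {
  const draft = await chat([{ role: "user", content: task }]);
  // Second pass ("do it better"): no new instructions, just a request to revise.
  return chat([
    { role: "user", content: task },
    { role: "assistant", content: draft },
    { role: "user", content: "Revise and improve your answer. Think it through twice." },
  ]);
}
```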
I love that. I completely agree on both of those fronts. Alex, this was a ton of fun. I wish we had another hour to chat about stuff. Super excited about OpenRouter. It was awesome to hear your perspective. And yeah, looking forward to hopefully all of the cool new improvements that you all have coming.