I'm excited to introduce our first speaker, Arthur from Mistral. Arthur is the founder and CEO of Mistral AI. Despite being just nine months old as a company, with many fewer resources than some of the large foundation model companies so far, I think they've really shocked everybody by putting incredibly high-quality models, approaching GPT-4 caliber, out into the open. So we're thrilled to have Arthur with us today, all the way from France, to share more about the opportunity behind building in open source. Interviewing Arthur will be my partner Matt Miller, who is dressed in his best French wear to honor Arthur today and helps lead our efforts in Europe. So please welcome Matt and Arthur. With all the efficiency of a French train, right? Right on time. Right on time. We were sweating a little bit back there. Just walked in the door. But good to see you. Thanks for coming all this way. Thanks for being with us here at AI Ascent today. Thank you for hosting us. Absolutely. I'd love to maybe start with the background story of why you chose to start Mistral, and maybe just take us to the beginning. We all know about your successful career at DeepMind, your work on the Chinchilla paper. Share with us, we always love to hear at Sequoia, and I know founders also love to hear, that spark that gave you the idea to break out and start your own company.
Yeah, sure. So we started the company in April, but I guess the idea was out there for a couple of months before. Timothée and I were in master's together; Guillaume and I were in school together. So we knew each other from before, and we had been in the field for like 10 years doing research. And we loved the way AI progressed because of the open exchanges that occurred between academic labs and industrial labs, and how everybody was able to build on top of one another.
And it was still the case, I guess, even in the beginning of the LLM era: OpenAI and DeepMind were actually contributing to one another's roadmaps. And this kind of stopped in 2022. Basically, one of the last papers making important changes to the way we train models was Chinchilla, and that was the last important model in the field that Google published. And so for us, it was a bit of a shame that the field stopped doing open contributions that early in the AI journey, because we're very far away from finishing it.
And so when we saw Chinchilla at the end of the year, we reflected on the fact that there was some opportunity for doing things differently, for doing things from France, because, as it turned out, there were a lot of talented people in France who were a bit bored in big tech companies. And so that's how we figured out that there was an opportunity for building very strong open source models, going very fast with a lean team of experienced people, and trying to correct the direction that the field was taking.
So we wanted to push the open source model much more, and I think we did a good job at that, because we've been followed by various companies in our trajectory. Wonderful. And so the open source movement was really a lot of the drive behind starting the company. Yeah, that's one of the drivers. Our intention, and the mission that we gave ourselves, is really to bring AI into the hands of every developer.
And the way it was done, and the way it is still done by our competitors, is very closed. So we want to push a much more open platform, and we want to spread and accelerate adoption through that strategy. That's very much at the core of the reason why we started the company. Wonderful.
And just recently, fast forward to today, you released Mistral Large, and you've been on this tear of amazing partnership announcements with Microsoft, Snowflake, Databricks. So how do you balance what you're going to do open source with what you're going to do commercially, and how do you think about the trade-off?
Because that's something that many open source companies contend with: how do they keep their community thriving, but then how do they also build a successful business to contribute to their community? Yeah, it's a hard question. The way we've addressed it is currently through two families of models, but this might evolve with time. We intend to stay the leader in open source, so that puts pressure on the open source family, because there are obviously some contenders out there.
I think compared to how various software providers that played this strategy developed, we need to go faster, because AI develops faster than software did. Databricks and MongoDB played a very good game at that, and they're good examples of what we could do, but we need to adapt faster. So yeah, there's obviously this tension, and we're constantly thinking about how we should contribute to the community, but also how we should start getting some commercial adoption, enterprise deals, etc.
And this is obviously a tension. For now, I think we've done a good job at it, but it's a very dynamic thing to think through. So basically every week we think about what we should release next in both families. And you have been the fastest in developing models, the fastest to reach different benchmark levels, and one of the leanest in expenditure to reach those benchmarks out of any of the foundation model companies.
What do you think has given you that advantage, to move quicker than your predecessors and more efficiently? I think we like to get our hands dirty. Machine learning has always been about crunching numbers, looking at your data, doing a lot of extract, transform and load, and things that are oftentimes not fascinating. And so we hired people who were willing to do that stuff. I think that has been critical to our speed, and that's something that we want to keep.
Awesome. And in addition to the large model, you also have several small models that are extremely popular. When would you tell people they should spend their time working with the small models, and when with the large models? Where do you think the economic opportunity for Mistral lies? Is it doing more at the big end, or more at the small end?
I think this is an observation that every LLM provider has made: one size does not fit all. When you make an application, you typically have different large language model calls. Some should be low latency because they don't require a lot of intelligence, but some can be higher latency and require more intelligence. An efficient application should leverage both of them, potentially using the large models as an orchestrator for the small ones.
And I think the challenge here is, how do you make sure that everything works? You end up with a system that is not only a model; it's really two models plus an outer loop of calling your model, calling systems, calling functions. And some of the developer challenges that we also want to address are: how do you make sure that this works, that you can evaluate it properly, that you can do continuous integration? When you move from one version of a model to another, how do you make sure that your application has actually improved and not deteriorated?
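The routing pattern described here, a large model orchestrating or backstopping a small one, can be sketched in a few lines. Everything below is illustrative: the two model functions are stand-ins for real LLM API calls, and the routing heuristic is a toy assumption, not Mistral's actual method.

```python
# Minimal sketch of routing between a small, low-latency model and a
# larger, slower model. Both functions are stubs standing in for real
# LLM API calls.

def small_model(prompt: str) -> str:
    # Fast and cheap: fine for simple extraction/classification calls.
    return f"[small] {prompt}"

def large_model(prompt: str) -> str:
    # Slower but more capable: used for planning and hard reasoning.
    return f"[large] {prompt}"

def needs_reasoning(prompt: str) -> bool:
    # Toy heuristic: route long or planning-style prompts to the large
    # model. Real systems might use a classifier, or the large model
    # itself, as the router.
    return len(prompt.split()) > 20 or any(
        kw in prompt.lower() for kw in ("plan", "why", "analyze")
    )

def route(prompt: str) -> str:
    model = large_model if needs_reasoning(prompt) else small_model
    return model(prompt)

print(route("Extract the date from: invoice due 2024-03-01"))
print(route("Plan a three-step migration of our billing service"))
```

The evaluation concern raised above applies directly: swapping either model for a new version means re-running a fixed evaluation set through `route` and checking that end-to-end quality did not deteriorate.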
So all of these things are addressed by various companies, but these are also things that we think should be core to our value proposition. And what are some of the most exciting things you see being built on Mistral? What are the things that you get really excited about that you see the community or customers doing? I think pretty much every young startup in the Bay Area has been using it for fine-tuning purposes, for fast application making. So really, one part of the value of Mistral is that it's very fast.
And so you can make applications that are more involved. We've seen web search companies using us. We've seen all of the standard enterprise stuff as well, like knowledge management and marketing. The fact that you have access to the weights means that you can pour in your editorial tone much more. So yeah, we see the typical use cases. I think the value of the open source part is that developers have control, so they can deploy it everywhere.
They can have very high quality of service because they can use dedicated instances, for instance. And they can modify the weights to suit their needs and to bump the performance to a level close to the largest models, while being much cheaper. And what's the next big thing that we're going to see from you guys? Can you give us a sneak peek of what might be coming soon, or what we should be expecting from Mistral? Yeah, for sure.
So Mistral Large was good but not good enough, so we are working on improving it quite heavily. We have interesting open source models in various vertical domains that we'll be announcing very soon. The platform is currently just serverless APIs, so we are working on making customization part of it, the fine-tuning part. And obviously, like many other companies, we are heavily betting on multilingual data and multilingual models, because as a European company we're also well positioned for that.
And the demand from our customers there is, I think, higher than here. And then, yeah, eventually in the months to come, we will also release some multilingual models. Okay, exciting. We'll look forward to that. As you mentioned, many of the people in this room are using Mistral models. Many of the companies we work with every day here in the Silicon Valley ecosystem are already working with Mistral. How should they work with you and the company? What's the best way for them to work with you? Well, they can reach out.
We have developer relations who are really pushing the community forward, making guides, and also gathering use cases to showcase what you can build with Mistral models. So we're investing a lot in the community. Something that we're trying to set up, which basically makes the models better, is ways for us to get evaluations, benchmarks, actual use cases that we can evaluate our models on.
Having a mapping of what people are building with our models is also a way for us to make a better generation of new open source models. So please engage with us to discuss how we can help, to discuss your use cases. We can showcase them, and we can also gather insight into new evaluations that we should add to our evaluation suite, to verify that our models are getting better all the time.
And on the commercial side, our models are available on our platform. The commercial models actually work better than the open source ones. They're also available on various cloud providers, which facilitates adoption for enterprises. And customization capabilities like fine-tuning, which really made the value of the open source models, are coming very soon. Wonderful. And you talked a little bit about the benefits of being in Europe; you touched on it briefly. You're already a global example of the great innovations that can come from Europe. But talk a little bit more about the advantages of building this company from France, from Europe. The advantages and drawbacks, I guess. Yeah, both. I guess one advantage is that you have a very strong junior pool of talent. There are a lot of people coming out of master's programs in France, in Poland, in the UK that we can train in three months and get up to speed, basically producing as much as a million-dollar engineer in the Bay Area at a tenth of the cost. So that's kind of efficient. Don't tell them all that; they're going to hire people in France. The workforce is very good: engineers and machine learning engineers. Generally speaking, we have a lot of support from the state, which is actually more important in Europe than in the US. They tend to over-regulate a bit too fast. We've been telling them not to, but they don't always listen. And then generally, European companies like to work with us because we're European, and we are better in European languages, as it turns out. Mistral Large is actually probably the strongest model in French out there. So that's an advantage. On the drawbacks, the ecosystem is really still emerging, so some things are going to be a little bit more difficult.
Wonderful. And paint the picture for us five years from now. I know this world is moving so fast, and you just think about all the things you've gone through; it's not even two years old as a company. Almost two years old. But five years from now, where does Mistral sit? What do you think you'll have achieved? What does this landscape look like? So our bet is that basically the platform and the infrastructure of artificial intelligence will be open, and based on that, we'll be able to create assistants and then potentially autonomous agents. And we believe that we can become this platform by being the most open platform out there, by being independent from cloud providers, etc. So five years from now, I have literally no idea what this is going to look like. If you'd looked at the field in 2019, I don't think you could have bet on where we would be today. But we are evolving toward more and more autonomous agents that can do more and more tasks. I think the way we work is going to be changed profoundly, and making such agents and assistants is going to be easier and easier. So right now we're focusing on the developer world, but AI technology is in itself so easily controllable through human language that potentially, at some point, the developer becomes a user. And so we are evolving toward any user being able to create their own assistant or their own autonomous agent. I'm pretty sure that five years from now, this will be something that you learn to do at school.
Awesome. Well, we have about five minutes left. I just want to open it up in case there are any questions from the audience. Don't be shy. Sonya's got a question. How do you see the future of open source versus commercial models playing out for your company? I think you made a huge splash with open source at first; as you mentioned, some of the commercial models are even better now. How do you imagine that plays out over the next handful of years? Well, I guess the one thing we optimize for is to be able to continuously produce open models, with a sustainable business model to actually fuel the development of the next generation. As I've said, this is going to evolve with time, but in order to stay relevant, we need to stay the best at producing open source models, at least on some part of the spectrum. That can be the small models; that can be the very big models.
And so that's very much something that sets a constraint on whatever we can do. Staying relevant in the open source world, staying the best solution for developers, is really our mission, and we'll keep doing it. David. There have got to be questions from more than just the Sequoia partners, guys. Can you talk a little bit about Llama 3 and Facebook, and how you think about competition with them? Well, Llama 3: they're working on, I guess, making models. I'm not sure it will be open source. I have no idea what's going on there. So far, I think we've been delivering faster, and smaller models. So we expect to continue doing that.
But generally, the good thing about open source is that there's never too much competition, because once you have several actors, that should actually benefit everybody. So if they turn out to be very strong, there will be some cross-pollination, and we will welcome it. One thing that's made you guys different from other proprietary model providers is the partnerships with Snowflake and Databricks, for example, and running natively in their clouds, as opposed to just having API connectivity.
I'm curious if you can talk about why you did those deals, and also what you see as the future of, say, Databricks or Snowflake in the brave new LLM world. I guess you should ask them. But generally speaking, AI models become very strong when they are connected to data and grounding information. As it turns out, enterprise data oftentimes sits either on Snowflake or on Databricks, or sometimes on AWS. And so for customers, being able to deploy the technology exactly where their data is is, I think, quite important.
I expect that this will continue being the case, especially as I believe we'll move on to more stateful AI deployments. Today we deploy stateless APIs without much state; it's really like lambda functions. But as we go forward, as we make models more and more specialized, more tuned to use cases, and self-improving, you will have to manage state, and that state could actually be part of the data cloud. So there's an open question of where you put the AI state, and my understanding is that Snowflake and Databricks would like it to be on their data clouds.
I think there's a question right behind him, in the gray sweatshirt. I'm curious where you draw the line between openness and proprietary. You're releasing the weights; would you also be comfortable sharing more about how you train the models, the recipe for how you collect the data, how you do mixture-of-experts training, or do you draw the line at: we release the weights and the rest is proprietary? So that's where we draw the line. And I think the reason for that is that it's a very competitive landscape. It's similar to the tension between having some form of revenue and funding the next generation.
And there's also a tension between what you actually disclose and what you keep, in order to stay ahead of the curve and not give your recipe to your competitors. So again, this is a moving line. There's also some game theory at stake: if everybody started doing it, then we could do it. But for now, we are not taking that risk. I'm curious: when another company releases weights for a model, like Grok, for example, and you only see the weights, what kinds of practices do you have internally to see what you can learn from it? You can't tell a lot from weights. We don't even look at it.
It's actually too big for us to deploy; Grok is quite big. Were there any architecture learnings? I guess they are using a pretty standard mixture-of-experts setting, with a couple of tricks that I knew about. Yeah, there's not a lot to learn about the recipes themselves by looking at the weights. You can try to infer things, but reverse engineering is not that easy. It's basically compressed information, and it compresses information sufficiently that you can't really find out what's going on. The cube is coming. It's okay. Yeah, I'm just curious what you're going to focus on in terms of model sizes. Are you going to keep going small, or are you going to go to the larger ones? So model sizes are kind of set by scaling laws. Depending on the compute you have and the infrastructure you want to run on, you make some choices. You optimize for training cost and for inference cost, and there's obviously a trade-off in between: depending on the weight that you put on the training cost, the more you amortize it, the more you can compress models. But basically our goal is to be low latency and to be relevant on the reasoning front. So that means having a family of models that goes from the small ones to the very large ones.
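The training-versus-inference trade-off Arthur sketches here can be made concrete with the usual back-of-the-envelope approximations from the scaling-law literature: training cost is roughly 6·N·D FLOPs (N parameters, D training tokens), inference is roughly 2·N FLOPs per token, and the Chinchilla rule of thumb is D ≈ 20·N for compute-optimal training. All model sizes and token counts below are illustrative assumptions, not Mistral's actual numbers.

```python
# Back-of-the-envelope sketch of the training/inference trade-off,
# using C ≈ 6·N·D for training FLOPs and ~2·N FLOPs per served token.
# All figures are illustrative assumptions.

def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

def infer_flops(n_params: float, n_tokens: float) -> float:
    return 2 * n_params * n_tokens

# Chinchilla-optimal 70B model: ~20 tokens per parameter.
compute = train_flops(70e9, 20 * 70e9)

# Spend the same training compute on a 13B model instead: it sees far
# more tokens ("overtrained" past the Chinchilla-optimal point), but
# every served token is then much cheaper.
tokens_13b = compute / (6 * 13e9)

served = 1e12  # tokens served over the model's lifetime (assumption)
total_70b = compute + infer_flops(70e9, served)
total_13b = compute + infer_flops(13e9, served)

print(f"13B trained on {tokens_13b / 1e12:.1f}T tokens")
print(f"lifetime FLOPs ratio, 70B vs 13B: {total_70b / total_13b:.2f}")
```

This is the "weight you put on the training cost" point: the more serving traffic amortizes the one-time training bill, the more it pays to compress capability into a smaller, lower-latency model.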
Hi, are there any plans for Mistral to expand into the application stack? For example, when OpenAI released custom GPTs and the Assistants API, is that a direction you think Mistral will take in the future? Yeah, as I've said, we are really focusing on the developer first. But the frontier is pretty thin between developers and users for this technology. And so that's the reason why we released a chat assistant called Le Chat, "the cat" in French. The point is to expose it to enterprises as well, and make them able to connect their data, connect their context. I think that answers a need from our customers: many of the people we've been talking to are willing to adopt the technology, but they need an entry point. If you just give them APIs, they're going to say, okay, but I need an integrator. And if you don't have an integrator at hand, and oftentimes this is the case, it's good to have an official solution; at least you get them into the technology and show them what they could build for their core business. So that's the reason why we now have two product offerings: the first one is the platform, and then we have Le Chat, which should evolve into an enterprise off-the-shelf solution. Over there.
I'm just wondering where you would draw the line between stopping prompt engineering and starting fine-tuning, because a lot of my friends and our customers struggle with when to stop doing more prompt engineering. I think that's the number one pain point, and it's hard to solve from a product standpoint. Normally your workflow should be: what should you evaluate on? And based on that, have your model kind of find a way of solving your task. Right now this is still a bit manual: you go and try several versions of prompting. But this is something that AI itself can actually help solve, and I expect it's going to become more and more automated over time. And this is something that, yeah, we'd love to try and enable.
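The evaluation-first workflow described in this answer can be sketched simply: fix a small labeled evaluation set, score each prompt variant against it, and only move to fine-tuning when no variant clears the quality bar. The model call below is a trivial stub (an assumption) so the sketch runs without an API; in practice it would be an LLM call.

```python
# Sketch of evaluation-driven prompt selection. The "model" is a
# stand-in rule so the example is self-contained; a real version would
# call an LLM with the prompt prepended to the text.

def model(prompt: str, text: str) -> str:
    # Stub: trivial sentiment rule in place of an LLM call.
    return "positive" if "great" in text.lower() else "negative"

EVAL_SET = [
    ("I had a great time", "positive"),
    ("Terrible service", "negative"),
]

PROMPTS = [
    "Classify the sentiment:",
    "Is this review positive or negative? Answer with one word:",
]

def accuracy(prompt: str) -> float:
    hits = sum(model(prompt, text) == label for text, label in EVAL_SET)
    return hits / len(EVAL_SET)

# Pick the best-scoring prompt variant; if none clears your quality
# bar on the eval set, that is the signal to move to fine-tuning.
best = max(PROMPTS, key=accuracy)
print(best, accuracy(best))
```

The same harness doubles as the continuous-integration check mentioned earlier: re-run it whenever the prompt or the underlying model version changes.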
I wanted to ask a bit more of a personal question. As a founder at the cutting edge of AI, how do you balance your time between explore and exploit? How do you yourself stay on top of a field that's rapidly evolving and becoming larger and deeper every day? So we explore on the science part, the product part, and the business part. And the way you balance it is effectively hard for a startup: you do have to exploit a lot, because you need to ship fast. But on the science part, for instance, we have two or three people working on the next generation of models, and sometimes they lose time. But if you don't do that, you are at risk of becoming irrelevant. And this is very true for the product side as well. Right now we have a very simple product, but being able to try out new features and see how they pick up is something that we need to do. And on the business part, you never know who is actually mature enough to use your technology. So yeah, the balance between exploitation and exploration is something that we master well at the science level, because we've been doing it for years. And somehow it carries over to the product and the business, but I guess we are currently still learning to do it properly.
So one more question for me, and then I think we will be about out of time. In the scope of two years: models big and small that have taken the world by storm, killer go-to-market partnerships, tremendous momentum at the center of the AI ecosystem. What you have achieved, and the pace at which you have achieved it, is truly extraordinary. What advice would you give to the founders here, who are at different stages of starting, running, and building their own businesses around the AI opportunity?
I would say it's always day one. I guess we got some mindshare, but there are still many proof points that we need to establish. Being a founder is basically waking up every day and figuring out that you need to build everything from scratch, all the time. So it's a bit exhausting, but it's also exhilarating. And I would recommend being quite ambitious; usually you can be more ambitious than you think. Ambition can get you very far. So you should dream big. That would be my advice.