Tesla's Artificial Intelligence Division has created a very special and very powerful new type of supercomputer, and they are calling it Dojo. This is a project that Tesla has been talking about for the past couple of years, but as of right now, Dojo is online and growing in power at an incredible rate. It is expected to rank among the top 5 most powerful computers in the world by early next year.
Project Dojo is a much bigger deal than most people realize, and it is about to change everything for Tesla. In June 2023, Elon Musk announced that Dojo had been online and running useful tasks at Tesla data centers for a few months. This was the first confirmation we've had that Dojo is operational and ready for action. It puts to rest the speculation that has been going around about just how mature this new system really is. And Elon's words were backed up by information from the new official Tesla AI Twitter account showing that Dojo's hardware is production-ready as of July 2023.
This means that Tesla has already begun accumulating fully functional Dojo chips and building them out into larger systems of racks and cabinets as the supply continues to grow exponentially. Tesla AI is forecasting that Dojo will expand from its current testing phase to become one of the top 5 most powerful supercomputers in the world by February 2024, reaching over 30 exaflops of computing power. And from there, the production rate for Dojo hardware will continue to ramp up in volume, pushing Tesla's total compute power to 100 exaflops in October 2024.
Now you might be wondering, what the hell is an exaflop? Well, it's a way to measure the amount of work a computer can handle in one second. One exaflop represents one quintillion computer operations per second. That is a one with 18 zeros behind it. It is an unfathomable number on the human scale. So let's put this into a metric that we all understand: money. 100 exaflops of compute is equivalent to 300,000 NVIDIA A100 GPUs. These have been the industry standard for a few years as the top-tier processing unit at the most powerful data centers in the world. These are not the kind of chips that play video games. These A100s are used to create the most advanced software on Earth. Large AI models like ChatGPT and the Midjourney image generator would not exist without the A100 chip. And that in turn makes these GPUs incredibly valuable, with an average cost of around $10,000 per unit. So now we can take 100 exaflops, which is 300,000 A100s, multiply by $10,000, and you get $3 billion. That's what Tesla is creating with Dojo, a $3 billion supercomputer. And that's just where Dojo will be next October. According to the initial curve on Tesla's graph, it looks like they are projecting at least a 100% increase in capacity over a 1 year period.
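To make that arithmetic easy to check, here is a minimal Python sketch. It only uses the figures quoted above (100 exaflops, 300,000 A100 equivalents, roughly $10,000 per unit); the variable and function names are made up for illustration, and the per-GPU throughput is simply what those quoted numbers imply, not an official NVIDIA spec.

```python
# Figures quoted in the video; treated here as assumptions, not verified specs.
EXAFLOP = 10**18                 # one quintillion operations per second

target_exaflops = 100            # Tesla's projected total compute
a100_equivalents = 300_000       # quoted A100 equivalence for 100 exaflops
a100_unit_cost_usd = 10_000      # quoted average price per A100


def equivalent_hardware_value(units: int, unit_cost_usd: int) -> int:
    """Dollar value of buying the same compute as off-the-shelf GPUs."""
    return units * unit_cost_usd


total_ops_per_second = target_exaflops * EXAFLOP
# Throughput per GPU implied by the quoted equivalence (integer division).
implied_ops_per_a100 = total_ops_per_second // a100_equivalents
value_usd = equivalent_hardware_value(a100_equivalents, a100_unit_cost_usd)

print(f"Total operations per second: {total_ops_per_second:,}")
print(f"Implied operations per second per A100: {implied_ops_per_a100:,}")
print(f"Equivalent GPU value: ${value_usd:,}")
```

The implied per-GPU figure works out to roughly a third of a petaflop, which only makes sense for low-precision AI math, so the comparison is best read as a rough order-of-magnitude estimate.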
So that all sounds very exciting and cool, but just what is this Dojo thing actually going to do? Why does a car company need such a ridiculously powerful computer? There's a lot to unpack here. Just like a real-world dojo, this computer system is designed to function as a training ground, except instead of karate, Tesla's Dojo will train artificial intelligence. More specifically, this is the new home of Tesla's Full Self-Driving neural network.
So this is the point where we can start separating Dojo from the standard definition of a supercomputer. You see, Dojo is more correctly described as an artificial intelligence training cluster, but supercomputer is a much more familiar association for the average person. So we lead off with that to establish the fundamentals.
These training clusters are traditionally made up of giant cabinets packed with GPUs, all buzzing away. The graphics processor that we've all been using for decades just happened to be particularly well suited for the type of calculation that is demanded by neural net training. So companies like NVIDIA and AMD just started making these much bulkier and more powerful versions of their existing designs. But Dojo is coming to the game with an entirely new approach.
Dojo is a bespoke hardware platform that was designed from the ground up by Tesla's AI division exclusively for use in training their latest video-based computer vision networks for Full Self-Driving. The goal is to create a digital duplicate of the human visual cortex and brain function, then use that to drive a car autonomously. This involves processing vast amounts of visual data, which in this case is video captured by the vehicle's cameras.
All of that information from billions of frames of digital videos needs to be translated into a language that the AI model can understand. This is called labeling and it's exactly what it sounds like. They're just assigning a designation to a cluster of pixels so that the AI knows what it's looking at. The more labels the network has to draw from, the better it's going to get at recognizing patterns and making associations.
In the past, Tesla has had human beings doing this labeling work, but that's obviously not sustainable for growing the capabilities of FSD by orders of magnitude. Eventually, you'd have to have every human being on Earth working at Tesla. So in order to succeed, they need to automate, and that automation comes in the form of computing power, which has now taken the form of Dojo.
Tesla's work with Dojo is strikingly similar to what Apple has been doing with their own computer systems, both in philosophy and technology. Apple figured out a long time ago that it was a really effective strategy to build software and hardware that are specifically designed to work together. It results in a more efficient and higher performing device. This is something that Apple has fully realized with their new M1 and M2 powered computers. They've replaced Intel processors with their own bespoke chip that is designed specifically to run Apple software. And they're doing that in an entirely different way than any other computer on the market.
What really sets Dojo apart from the rest of the AI training industry is the move away from GPU hardware. Dojo exists on its base level as something called a system on a chip, which is an entire computer assembled on one single piece of silicon. And this is the exact same architecture that Apple used to create the M1. This method allows for a spectacular level of efficiency, because instead of having all these PCI ports and wires and motherboards and stuff all connected together, now every necessary component lives on the same little square of semiconductor material. And the more power you need, the larger you make that piece of silicon and the more processing cores you put on it. You can see that with Apple's M1, M1 Pro and M1 Max chips. The M1 can fit inside an iPad or MacBook Air, while the M1 Max is sized up for a MacBook Pro.
The Dojo chip is about the size of the palm of your hand, which is a lot smaller than an A100 GPU, but Dojo isn't supposed to exist as just a single unit. Dojo really becomes functional on the tile level, which is the point where multiple chips are fused together to function as one system. And again, this is something that Apple is also doing with their new Mac Studio units. Their top-tier M1 Ultra chip is just two M1 Max chips fused together to create one massively powerful computer.
With the Dojo tile, Tesla has fused 25 Dojo chips to create one unified computer system, and each tile contains all of the necessary hardware for power, cooling, and data transfer. It's a self-sufficient computer in itself, made up from 25 smaller computers. Then going one level up, they integrate six tiles together into one single rack unit, and then to make one cabinet, they integrate two of the racks into one case.
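The hierarchy described above (25 chips per tile, six tiles per rack unit, two racks per cabinet) is easy to lose track of, so here is a small Python sketch that multiplies it out. The counts come straight from the description above; the constant and function names are hypothetical, chosen only for this illustration.

```python
# Counts as described in the video; names are illustrative only.
CHIPS_PER_TILE = 25       # Dojo chips fused into one tile
TILES_PER_RACK = 6        # tiles integrated into one rack unit
RACKS_PER_CABINET = 2     # rack units housed in one cabinet


def chips_per_rack() -> int:
    """Number of Dojo chips in a single rack unit."""
    return CHIPS_PER_TILE * TILES_PER_RACK


def chips_per_cabinet() -> int:
    """Number of Dojo chips in a single cabinet."""
    return chips_per_rack() * RACKS_PER_CABINET


def chips_in(cabinets: int) -> int:
    """Total Dojo chips across a given number of cabinets."""
    if cabinets < 0:
        raise ValueError("cabinets must be non-negative")
    return cabinets * chips_per_cabinet()


print(f"Chips per rack: {chips_per_rack()}")
print(f"Chips per cabinet: {chips_per_cabinet()}")
print(f"Chips in 10 cabinets: {chips_in(10)}")
```

So by these numbers a single cabinet holds 300 Dojo chips, which is why adding cabinets scales the total compute so quickly.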
The amazing thing about this system on a chip architecture is the level of efficiency that can be reached with such minimal need for power or cooling. So looking again to Apple, their Mac Studio with the M1 Ultra, or now the M2 Ultra, is among the most powerful computers that you can buy. Yet it is housed in a minuscule square box that can easily sit on top of your desk. It takes up very little space and only needs a basic fan system for cooling. It's incredible, really. Just the power supply alone on a traditional desktop PC would be half the size of one Ultra-powered Mac Studio.
Hopefully that gives you a more tangible idea of how Dojo is operating. The same thing that is going on inside your new MacBook is going on inside Dojo, just on a massive scale.
So now what does this all mean? What does having this Dojo thing change for Tesla?
Well, thing number one is obviously that they will be able to quickly add a large amount of computing power to their AI training program at a relatively low cost to the company. Obviously any new product is going to have a very high overhead at the beginning of production, but the more Dojo chips and tiles that are produced, the more affordable they become.
This also means that Tesla isn't going to the same marketplace and competing to buy the same NVIDIA chips as everyone else in their industry. As the power of AI continues to grow, so does the demand for A100 and H100 level processors. They are going to be difficult to acquire in large quantities and the price is going to reflect the demand.
The new H100 GPU from NVIDIA is going at $40,000 per unit right now. And this inflated value placed on AI training power is something that Tesla could leverage in the future to create a whole new business model with their existing AI division.
Elon Musk has said that this first version of Dojo is specifically tailored for Tesla's computer vision video labeling, which is exactly what they need for FSD and later the humanoid Tesla bot. So Dojo is not really going to be particularly useful for anything beyond that.
But Elon says that future versions of the Dojo system will be more tailored to general purpose AI training, so it could be adapted for language models or social media algorithms or whatever else people can come up with.
Basically, once Tesla gets their own Dojo system up to a level where it is delivering all of the compute power that they need, then every additional Dojo system that they build becomes an asset that can be monetized. Elon sees this working exactly the same as something like Amazon Web Services or Microsoft Azure. Tesla will simply rent out their excess computing power to anyone who needs it, and there will be a staggering amount of need for this service in the years to come.
This kind of business model is about as lucrative as it gets. Amazon Web Services is a spectacularly profitable division. This is the reason Jeff Bezos became the richest man alive. It's the reason that Amazon can sell all of this stuff so cheap and deliver so fast. It's funded by just renting out their spare server capacity.
Web Services started out because Amazon only ever really needed their maximum server capacity for peak periods like Black Friday. The whole rest of the year, it was just sitting around doing nothing until they got the idea to rent it out. Dojo can do the exact same thing for Tesla, and this is what we call a game changer.
Don't forget to give this video a thumbs up today if you liked it. That is so important for getting our content out to more people. If you enjoy the content, then you'd probably also enjoy our weekly newsletter. So sign up at the link down below at theteslaspace.com.
A huge thank you to all of our Patreon supporters who are listed on the screen now. You help us make the best content we can and we really appreciate it. Thanks for watching and we'll see you in the next one.