From the podcast: The Tesla Space

Tesla Reveals The New DOJO Supercomputer!

Published: 2023-07-30 20:00:11
Tesla's Artificial Intelligence Division has created a very special and very powerful new type of supercomputer, and they are calling it Dojo. This is a project that Tesla has been talking about for the past couple of years, but as of right now, Dojo is online and it is growing in power at an incredible rate. It is expected to rank among the top 5 most powerful computers in the world by early next year.

Project Dojo is a much bigger deal than most people realize, and it is about to change everything for Tesla. In June 2023, Elon Musk announced that Dojo has been online and running useful tasks at Tesla data centers for a few months. This was the first confirmation we've had that Dojo is operational and ready for action. It puts to rest speculation that has been going around about just how mature this new system really is. And Elon's words were backed up by information from the new official Tesla AI Twitter account showing that Dojo's hardware is production ready in July 2023.

This means that Tesla has already begun accumulating fully functional Dojo chips and building them out into larger systems of racks and cabinets as the supply continues to grow exponentially. Tesla AI is forecasting that Dojo will expand from its current testing phase to become one of the top 5 most powerful supercomputers in the world by February 2024, reaching over 30 exaflops of computing power. And from there, the production rate for Dojo hardware will continue to ramp up in volume, pushing Tesla's total compute power to 100 exaflops in October 2024.
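As a rough illustration of what that forecast implies, the sketch below works out the average monthly growth needed to go from 30 exaflops in February 2024 to 100 exaflops in October 2024. It assumes smooth compound growth between those two points, which is an assumption made here for illustration and not something Tesla has stated. The function name is hypothetical.

```python
def implied_monthly_growth(start, end, months):
    """Average compound growth per month needed to go from start to end."""
    return (end / start) ** (1 / months) - 1


# Figures quoted in the transcript: 30 exaflops (Feb 2024) to 100 exaflops (Oct 2024).
rate = implied_monthly_growth(start=30, end=100, months=8)
print(f"Implied average growth: {rate:.1%} per month")
```

Under that assumption the forecast works out to roughly 16% more compute every month.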

Now you might be wondering, what the hell is an exaflop? Well, it's a way to measure the amount of work a computer can handle in one second. One exaflop represents one quintillion computer operations per second. That is a one with 18 zeros behind it. It is an unfathomable number on the human scale. So let's put this into a metric that we all understand: money. 100 exaflops of compute is equivalent to 300,000 NVIDIA A100 GPUs. These have been the industry standard for a few years as the top tier processing unit at the most powerful data centers in the world. These are not the kind of chips that play video games. These A100s are used to create the most advanced software on Earth. Large AI models like ChatGPT and the Midjourney image generator would not exist without the A100 chip. And that in turn makes these GPUs incredibly valuable, with an average cost of around $10,000 per unit. So now we can take 100 exaflops, which is 300,000 A100s, multiply by $10,000, and you get $3 billion. That's what Tesla is creating with Dojo, a $3 billion supercomputer. And that's just where Dojo will be next October. According to the initial curve on Tesla's graph, it looks like they are projecting at least a 100% increase in capacity over a 1 year period.
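The arithmetic above can be sketched in a few lines. The inputs are the figures quoted in the transcript (100 exaflops, 300,000 A100s, roughly $10,000 each). The variable names are illustrative, and the per-GPU throughput is simply what those figures imply, not a measured specification.

```python
EXA = 10**18  # one quintillion: a one with 18 zeros behind it

total_flops = 100 * EXA          # projected compute for October 2024
a100_count = 300_000             # A100-equivalents quoted in the transcript
a100_unit_cost_usd = 10_000      # approximate average cost per A100

# Cost of buying the same compute as A100 GPUs.
equivalent_cost_usd = a100_count * a100_unit_cost_usd

# Throughput per GPU implied by the two quoted figures.
implied_flops_per_a100 = total_flops / a100_count

print(f"Equivalent hardware cost: ${equivalent_cost_usd:,}")
print(f"Implied throughput per A100: {implied_flops_per_a100 / 1e12:.0f} teraflops")
```

This reproduces the $3 billion figure and shows that the comparison assumes roughly 333 teraflops per A100.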

So that all sounds very exciting and cool, but just what is this Dojo thing actually going to do? Why does a car company need such a ridiculously powerful computer? There's a lot to unpack here. Just like a real world dojo, this computer system is designed to function as a training ground, except instead of karate, Tesla's Dojo will train artificial intelligence. More specifically, this is the new home of Tesla's Full Self-Driving neural network.

So this is the point where we can start separating Dojo from the standard definition of a supercomputer. You see, Dojo is more correctly described as an artificial intelligence training cluster, but supercomputer is a much more familiar association for the average person. So we lead off with that to establish the fundamentals.

These training clusters are traditionally made up of giant cabinets packed with GPUs, all buzzing away. The graphics processor that we've all been using for decades just happened to be particularly well suited for the type of calculation that is demanded by neural net training. So companies like NVIDIA and AMD just started making these much bulkier and more powerful versions of their existing designs. But Dojo is coming to the game with an entirely new approach.

Dojo is a bespoke hardware platform that was designed from the ground up by Tesla's AI division exclusively for use in training their latest computer vision, video-based Full Self-Driving networks. The goal is to create a digital duplicate of the human visual cortex and brain function, then use that to drive a car autonomously. This involves processing vast amounts of visual data, which in this case is video captured by the vehicle's cameras.

All of that information from billions of frames of digital videos needs to be translated into a language that the AI model can understand. This is called labeling and it's exactly what it sounds like. They're just assigning a designation to a cluster of pixels so that the AI knows what it's looking at. The more labels the network has to draw from, the better it's going to get at recognizing patterns and making associations.

In the past, Tesla has had human beings doing this labeling work, but that's obviously not sustainable for growing the capabilities of FSD by orders of magnitude. Eventually, you'd have to have every human being on earth working at Tesla. So in order to succeed, they need to automate and that automation comes in the form of computer power, which has now taken the form of Dojo.

Tesla's work with Dojo is strikingly similar to what Apple has been doing with their own computer systems, both in philosophy and technology. Apple figured out a long time ago that it was a really effective strategy to build software and hardware that are specifically designed to work together. It results in a more efficient and higher performing device. This is something that Apple has fully realized with their new M1 and M2 powered computers. They've replaced Intel processors with their own bespoke chip that is designed specifically to run Apple software. And they're doing that in an entirely different way than any other computer on the market.

What really sets Dojo apart from the rest of the AI training industry is the move away from GPU hardware. Dojo exists on its base level as something called a system on a chip, which is an entire computer assembled on one single piece of silicon. And this is the exact same architecture that Apple used to create the M1. This method allows for a spectacular level of efficiency, because instead of having all these PCI ports and wires and motherboards and stuff all connected together, now every necessary component lives on the same little square of semiconductor material. And the more power you need, the larger you make that piece of silicon and the more processing cores you attach onto it. You can see that with Apple's M1, M1 Pro and M1 Max chips. The M1 can fit inside an iPad or MacBook Air, while the M1 Max is sized up for a MacBook Pro.

The Dojo chip is about the size of the palm of your hand, which is a lot smaller than an A100 GPU, but Dojo isn't supposed to exist as just a single unit. Dojo really becomes functional on the tile level, which is the point where multiple chips are fused together to function as one system. And again, this is something that Apple is also doing with their new Mac Studio units. Their top tier M1 Ultra chip is just two M1 Max chips fused together to create one massively powerful computer.

With the Dojo tile, Tesla has fused 25 Dojo chips to create one unified computer system, and each tile contains all of the necessary hardware for power, cooling, and data transfer. It's a self-sufficient computer in itself, made up from 25 smaller computers. Then going one level up, they integrate six tiles together into one single rack unit, and then to make one cabinet, they integrate two of the racks into one case.
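The hierarchy described above (25 chips per tile, six tiles per rack, two racks per cabinet) can be tallied with a short sketch. The constant and function names are illustrative only, and the counts come straight from the transcript.

```python
CHIPS_PER_TILE = 25      # chips fused into one tile
TILES_PER_RACK = 6       # tiles integrated into one rack unit
RACKS_PER_CABINET = 2    # racks housed in one cabinet


def tiles_in(cabinets):
    """Number of tiles in the given number of cabinets."""
    return cabinets * RACKS_PER_CABINET * TILES_PER_RACK


def chips_in(cabinets):
    """Number of Dojo chips in the given number of cabinets."""
    return tiles_in(cabinets) * CHIPS_PER_TILE


print(f"One cabinet: {tiles_in(1)} tiles, {chips_in(1)} chips")
print(f"Ten cabinets: {tiles_in(10)} tiles, {chips_in(10)} chips")
```

By these counts a single cabinet holds 12 tiles and 300 chips.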

The amazing thing about this system on a chip architecture is the level of efficiency that can be reached with such minimal need for power or cooling. So looking again to Apple, their Mac Studio with the M1 Ultra or now M2 Ultra, these are the most powerful computers that you can buy. Yet they are housed in these minuscule square boxes that can easily sit on top of your desk. They take up very little space and they only need a basic fan system for cooling. It's incredible, really. Just the power supply alone on a traditional PC desktop would be half the size of one Ultra-powered Mac Studio.

Hopefully that gives you a more tangible idea of how Dojo is operating. The same thing that is going on inside your new MacBook is going on inside Dojo, just on a massive scale.

So now what does this all mean? What does having this Dojo thing change for Tesla?

Well, thing number one is obviously that they will be able to quickly add a large amount of computing power to their AI training program at a relatively low cost to the company. Obviously any new product is going to have a very high overhead at the beginning of production, but the more Dojo chips and tiles that are produced, the more affordable they become.

This also means that Tesla isn't going to the same marketplace and competing to buy the same NVIDIA chips as everyone else in their industry. As the power of AI continues to grow, so does the demand for A100 and H100 level processors. They are going to be difficult to acquire in large quantities and the price is going to reflect the demand.

The new H100 GPU from NVIDIA is going at $40,000 per unit right now. And this inflated value placed on AI training power is something that Tesla could leverage in the future to create a whole new business model with their existing AI division.

Elon Musk has said that this first version of Dojo is specifically tailored for Tesla's computer vision video labeling, which is exactly what they need for FSD and later the humanoid Tesla bot. So Dojo is not really going to be particularly useful for anything beyond that.

But Elon says that future versions of the Dojo system will be more tailored to general purpose AI training, so it could be adapted for language models or social media algorithms or whatever else people can come up with.

Basically, once Tesla gets their own Dojo system up to a level where it is delivering all of the compute power that they need, then every additional Dojo system that they build becomes an asset that can be monetized. Elon sees this working exactly the same as something like Amazon Web Services or Microsoft Azure. Tesla will simply rent out their excess computing power to anyone who needs it, and there will be a staggering amount of need for this service in the years to come.

This kind of business model is about as lucrative as it gets. Amazon Web Services is a spectacularly profitable division. This is the reason Jeff Bezos became the richest man alive. It's the reason that Amazon can sell all of this stuff so cheap and deliver so fast. It's funded by just renting out their spare server capacity.

Web Services started out because Amazon only ever really needed their maximum server capacity for peak periods like Black Friday. The whole rest of the year, it was just sitting around doing nothing until they got the idea to rent it out. Dojo can do the exact same thing for Tesla, and this is what we call a game changer.

Don't forget to give this video a thumbs up today if you liked it. That is so important for getting our content out to more people. If you enjoy the content, then you'd probably also enjoy our weekly newsletter. So sign up at the link down below at theteslaspace.com.

A huge thank you to all of our Patreon supporters who are listed on the screen now. You help us make the best content we can and we really appreciate it. Thanks for watching and we'll see you in the next one.


