
#94 – Ilya Sutskever: Deep Learning

Published 2020-05-09 04:25:12

Summary

Ilya Sutskever is a co-founder of OpenAI, one of the most cited computer scientists in history with over 165,000 citations, and to me, one of the most brilliant and insightful minds ever in the field of deep learning. There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life than Ilya, on and off the mic.

Support this podcast by signing up with these sponsors:
– Cash App – use code "LexPodcast" and download:
– Cash App (App Store): https://apple.co/2sPrUHe
– Cash App (Google Play): https://bit.ly/2MlvP5w

EPISODE LINKS:
Ilya's Twitter: https://twitter.com/ilyasut
Ilya's Website: https://www.cs.toronto.edu/~ilya/

This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast, go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube, where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts, follow on Spotify, or support it on Patreon.

Here's the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.

OUTLINE:
00:00 - Introduction
02:23 - AlexNet paper and the ImageNet moment
08:33 - Cost functions
13:39 - Recurrent neural networks
16:19 - Key ideas that led to success of deep learning
19:57 - What's harder to solve: language or vision?
29:35 - We're massively underestimating deep learning
36:04 - Deep double descent
41:20 - Backpropagation
42:42 - Can neural networks be made to reason?
50:35 - Long-term memory
56:37 - Language models
1:00:35 - GPT-2
1:07:14 - Active learning
1:08:52 - Staged release of AI systems
1:13:41 - How to build AGI?
1:25:00 - Question to AGI
1:32:07 - Meaning of life


Transcript (Chinese and English)

The following is a conversation with Ilya Sutskever, co-founder and chief scientist of OpenAI, one of the most cited computer scientists in history with over 165,000 citations, and to me, one of the most brilliant and insightful minds ever in the field of deep learning.
下面是与Ilya Sutskever的对话。他是OpenAI的联合创始人和首席科学家,也是历史上被引用最多的计算机科学家之一,引用量超过165,000。对我来说,他是深度学习领域中最杰出、最有洞察力的思想家之一。

There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life in general than Ilya, on and off the mic. This was an honor and a pleasure.
在这个世界上,我愿意和Ilya一起深入交流和头脑风暴关于深度学习、智能和生活的话题,在麦克风上和麦克风下都是如此。这是一种荣幸和愉悦。

This conversation was recorded before the outbreak of the pandemic. For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way. Stay strong, we're in this together. We'll beat this thing.
这段对话是在疫情爆发之前录制的。对于所有感受到这场危机的医疗、心理和财务负担的人,我向你们传递爱和支持。保持坚强,我们一起面对这个挑战。我们一定能战胜它。

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with 5 stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter, @lexfridman, spelled F-R-I-D-M-A-N.
这是《人工智能播客》。如果你喜欢,可以在YouTube上订阅,在Apple Podcasts上给它五星评价,在Patreon上支持它,或者直接在Twitter上联系我(@lexfridman,拼写为F-R-I-D-M-A-N)。

As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by CashApp. The number one finance app in the App Store.
通常我会在这里播放几分钟的广告,从不在中间插播打破谈话流程的广告。希望这样对你有帮助,不会损害听觉体验。本节目由CashApp呈现,它是应用商店中排名第一的金融应用程序。

When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency in the context of the history of money is fascinating.
当你拿到它时,使用 LexPodcast 代码。CashApp 可以让你向朋友转账、购买比特币,最低只需 $1 就能投资股票市场。由于 CashApp 允许你购买比特币,让我提一下,放在货币历史的背景下来看,加密货币是非常迷人的话题。

I recommend Ascent of Money as a great book on this history. Both the book and the audiobook are great. Debits and credits on ledgers started around 30,000 years ago. The US dollar was created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, was released just over 10 years ago.
我推荐《货币崛起》(The Ascent of Money)这本讲述货币历史的好书,纸质书和有声书都非常出色。账本上的借贷记录始于大约3万年前,美元诞生于200多年前,而比特币,第一种去中心化加密货币,发布至今刚过10年。

So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to, and just might, redefine the nature of money. So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping advance robotics and STEM education for young people around the world.
基于这段历史,加密货币仍处于发展的早期阶段,但它仍然志在(而且很可能会)重新定义货币的本质。所以,如果你从应用商店或Google Play获取CashApp并使用代码LexPodcast,你将获得$10,CashApp还会向FIRST捐赠$10,该组织致力于在全球范围内推动青少年的机器人技术和STEM教育。

And now, here's my conversation with Ilya Sutskever.
现在,这是我与Ilya Sutskever的对话。

You were one of the three authors, with Alex Krizhevsky and Geoff Hinton, of the famed AlexNet paper that is arguably the paper that marked the big catalytic moment that launched the deep learning revolution.
你是三位作者之一,与Alex Krizhevsky和Geoff Hinton一同撰写了著名的AlexNet论文。可以说,正是这篇论文标志着引发深度学习革命的重大催化时刻。

At that time, take us back to that time. What was your intuition about neural networks, about the representational power of neural networks? And maybe you could mention how that evolved over the next few years, up to today, over the 10 years.
那时候,请带我们回到那时。你对神经网络、对神经网络的表示能力有什么直觉?也许你可以提一下,在接下来的几年里,直到今天的这十年间,这种认识是如何演变的?

Yeah, I can answer that question. At some point in about 2010 or 2011, I connected two facts in my mind. Basically, the realization was this. At some point, we realized that we can train very large, I shouldn't say very large, they're tiny by today's standards, but large and deep neural networks end to end with back propagation.
是的,我可以回答这个问题。大约在2010年或2011年的某个时候,我在脑海中把两个事实联系了起来。基本上,我领悟到的是这一点:在某个时候,我们意识到我们可以用反向传播端到端地训练非常大的(不应该说非常大,按今天的标准看很小,但在当时算是大而深的)神经网络。

At some point, different people obtained this result. I obtained this result. The first moment in which I realized that deep neural networks are powerful was when James Martens invented the Hessian-free optimizer in 2010, and he trained a 10-layer neural network end to end, from scratch, without pre-training.
在某个时候,不同的人得出了这个结果,我也得到了这个结果。我第一次意识到深度神经网络强大,是在2010年James Martens发明无Hessian优化器(Hessian-free optimizer)的时候,他端到端地从头训练了一个10层的神经网络,无需预训练。

And when that happened, I thought, this is it. Because if you can train a big neural network, a big neural network can represent very complicated functions. Because if you have a neural network with 10 layers, it's as though you allow the human brain to run for some number of milliseconds. Neuron firings are slow, and so in maybe 100 milliseconds, your neurons only fire 10 times. So it's also kind of like 10 layers.
当发生这种情况时,我就想,这就是了。因为如果你能训练一个大型神经网络,这个神经网络就能表示非常复杂的函数。因为如果你有一个10层的神经网络,就好像让人类大脑运行若干毫秒一样。神经元的放电是比较缓慢的,因此在大约100毫秒内,你的神经元只会放电10次。所以这也有点像10层。

And in 100 milliseconds, you can perfectly recognize any object. So I thought, so I already had the idea then that we need to train a very big neural network on lots of supervised data. And then it must succeed, because we can find the best neural network. And then there's also theory that if you have more data than parameters, you won't overfit. Today, we know that actually this theory is very incomplete, and you won't overfit even if you have less data than parameters. But definitely, if you have more data than parameters, you won't overfit.
在100毫秒内,你可以完美地认出任何物体。所以我想,我当时已经有了这个想法:我们需要在大量的有监督数据上训练一个非常大的神经网络。然后它必定会成功,因为我们可以找到最好的神经网络。而且还有一种理论:如果你的数据比参数多,你就不会过拟合。今天,我们知道这个理论实际上非常不完整,即使数据比参数少,你也不会过拟合。但毫无疑问,如果数据比参数多,你就不会过拟合。

So the fact that neural networks were heavily overparameterized wasn't discouraging to you. So you were thinking, in theory, the fact that a network has a huge number of parameters is okay. It's gonna be okay.
所以,神经网络被严重过度参数化的事实并没有让你气馁。你当时的想法是,从理论上看,网络拥有海量参数是没有问题的。一切都会没事的。

I mean, there was some evidence before that it was okay. But the theory was, most of the theory was, that if you had a big data set and a big neural network, it was going to work. The overparameterization just didn't really figure much as a problem. I thought, well, with images, you're just going to add some data augmentation. It's going to be okay.
我的意思是,之前已经有一些证据表明这是可行的。但理论上,大多数想法是:如果你有一个大数据集和一个大神经网络,它就会奏效。过度参数化并没有被当作什么大问题。我想,对于图像,你只需要加一些数据增强就可以了。这样就没问题了。

So where was any doubt coming from? The main doubt was, can we train a bigger, if you really have enough compute, train a big enough neural network with back propagation? Back propagation I thought would work. What wasn't clear was whether there would be enough compute to get a very convincing result. Then at some point, Alex Krizhevsky wrote these insanely fast CUDA kernels for training convolutional neural nets. And that was, bam, let's do this. Let's get ImageNet, and it's going to be the greatest thing.
那么,疑虑是从哪里来的呢?主要的疑虑是:如果你真的有足够的算力,我们能否用反向传播训练一个足够大的神经网络?我认为反向传播会行得通,但不清楚的是算力是否足以得到非常令人信服的结果。然后,在某个时候,Alex Krizhevsky为训练卷积神经网络编写了极其快速的CUDA内核。那一刻就是:砰,我们干吧。让我们拿下ImageNet,这将是最伟大的事情。

Was your intuition, most of your intuition, from empirical results by you and by others? So, like, just actually demonstrating that a piece of program can train a 10-layer neural network? Or was there some pen and paper, or marker and whiteboard, thinking intuition? Because you just connected a 10-layer large neural network to the brain. So you just mentioned the brain. So in your intuition about neural networks, does the human brain come into play as an intuition builder?
你的直觉,大部分是来自你自己和他人的实验结果吗?比如,实际演示一段程序可以训练一个10层神经网络?还是说有些直觉是通过纸笔或白板推演得来的?因为你刚刚把一个10层的大型神经网络与大脑联系在了一起。你刚才提到了大脑。那么,在你关于神经网络的直觉中,人脑是否起到了直觉构建者的作用?

Definitely. I mean, you know, you've got to be precise with these analogies between artificial neural networks and the brain. But there is no question that the brain is a huge source of intuition and inspiration for deep learning researchers, all the way from Rosenblatt in the 60s. Like, if you look at the whole idea of a neural network, it is directly inspired by the brain. You had people like McCulloch and Pitts who were saying, hey, you got these neurons in the brain. And hey, we recently learned about the computer and automata. Can we use some ideas from the computer and automata to design some kind of computational object that's going to be simple, computational, and kind of like the brain? And they invented the neuron.
当然。我的意思是,你知道,在人工神经网络和大脑之间作类比时必须非常谨慎。但毫无疑问,从60年代的Rosenblatt一路走来,大脑一直是深度学习研究者直觉和灵感的巨大来源。比如,如果你看神经网络这个概念本身,它就是直接受大脑启发的。有像McCulloch和Pitts这样的人说:嘿,大脑里有这些神经元;而我们最近又了解了计算机和自动机。我们能否借用计算机和自动机的一些思想,设计出一种简单的、计算性的、又有点像大脑的计算对象?于是他们发明了神经元这个模型。

So they were inspired by it back then. Then you had the convolutional neural network from Fukushima, and then later Yann LeCun, who said, hey, if you limit the receptive fields of a neural network, it's going to be especially suitable for images, as it turned out to be true. So there was a very small number of examples where analogies to the brain were successful. And I thought, well, probably an artificial neuron is not that different from the brain if you squint hard enough. So let's just assume it is and roll with it.
所以当时他们是受此启发的。接着有了福岛邦彦(Fukushima)的卷积神经网络,后来Yann LeCun又提出:嘿,如果限制神经网络的感受野,它将特别适用于图像,事实证明确实如此。因此,有少数几个例子表明与大脑的类比是成功的。我想,好吧,如果你眯起眼睛看,人工神经元可能与大脑神经元并没有太大区别。那就假设它是这样,然后继续前进吧。

So now we're at a time when deep learning is very successful. So let us squint less, and say, let's open our eyes and ask: what is an interesting difference between the human brain and artificial neural networks? Now, I know you're probably not an expert, neither a neuroscientist nor a biologist, but loosely speaking, what's the difference between the human brain and artificial neural networks that's interesting to you for the next decade or two? That's a good question to ask.
所以我们现在正处在深度学习非常成功的时代。那么,让我们少眯一点眼睛,睁大眼睛问一问:人类大脑和人工神经网络之间有什么有趣的差异?我知道你可能不是这方面的专家,既不是神经科学家也不是生物学家,但笼统地说,在你看来,未来一二十年里,人脑和人工神经网络之间有什么值得关注的差异?这是一个值得问的好问题。

What is an interesting difference between the brain and our artificial neural networks? So I feel like today's artificial neural networks, so we all agree that there are certain dimensions in which the human brain vastly outperforms our models. But I also think that there are some ways in which artificial neural networks have a number of very important advantages over the brain. Looking at the advantages versus disadvantages is a good way to figure out what is the important difference.
人脑和我们的人工神经网络之间有什么有趣的差异?我觉得,对于今天的人工神经网络,我们都同意在某些维度上人脑远胜于我们的模型;但我也认为,人工神经网络在某些方面比大脑具有一些非常重要的优势。对比优势与劣势,是找出重要差异的好方法。

So the brain uses spikes, which may or may not be important. Yes, that's a really interesting question. Do you think it's important or not? That's one big architectural difference between artificial neural networks and the brain. It's hard to tell, but my prior is not very high, and I can say why. You know, there are people who are interested in spiking neural networks. And basically, what they figured out is that they need to simulate the non-spiking neural networks in spikes, and that's how they're going to make them work.
大脑使用脉冲(spikes),这可能重要,也可能不重要。是的,这是一个非常有趣的问题。你认为它重要吗?这是人工神经网络与大脑之间的一个重大架构差异。很难说,但我的先验判断是它不太重要,我可以解释为什么。你知道,有些人对脉冲神经网络很感兴趣。基本上,他们发现需要用脉冲去模拟非脉冲神经网络,他们打算以此让脉冲网络奏效。

If you don't simulate the non-spiking neural network in spikes, it's not going to work, because the question is, why should it work? And that connects to questions around back propagation and questions around deep learning. You got this giant neural network. Why should it work at all? Why should the learning rule work at all? It's not a self-evident question, especially if you, let's say, if you were just starting in the field and you read the very early papers, you can say, hey, people are saying, let's build neural networks. That's a great idea because the brain is a neural network, so it would be useful to build neural networks. Now, let's figure out how to train them. It should be possible to train them probably, but how?
如果你不用脉冲去模拟非脉冲神经网络,它就不会奏效,因为问题在于:它凭什么能奏效?这与围绕反向传播和深度学习的问题相关。你有一个巨大的神经网络,它凭什么能工作?学习规则凭什么能工作?这不是不证自明的,特别是假如你刚进入这个领域,阅读早期论文时会看到人们说:让我们建立神经网络吧,这是个好主意,因为大脑就是一个神经网络,所以建立神经网络会很有用。现在,让我们想想如何训练它们。训练它们应该是可能的,但怎么做呢?

And so the big idea is the cost function. That's the big idea. The cost function is a way of measuring the performance of the system according to some measure. By the way, that is a big... actually, let me think. Is that a difficult idea to arrive at, and how big of an idea is that? That there's a single cost function.
所以,重要的想法就是成本函数。这就是那个大想法。成本函数是按照某种度量来衡量系统性能的一种方式。顺便说一句,这是个重大的……实际上,让我想想:这个想法难不难想到?它有多重要?也就是说,存在一个单一的成本函数。
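
To make the idea concrete, here is a minimal sketch (not from the conversation; the toy data, model, and learning rate are all illustrative assumptions) of a single scalar cost function being minimized by gradient descent:

```python
# Minimal sketch: one scalar cost function measures performance,
# and gradient descent is the algorithm that optimizes it.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # toy inputs (illustrative)
y = X @ np.array([1.0, -2.0, 0.5])   # toy targets from a known linear map

def cost(w):
    """Mean squared error: the single measure of the system's performance."""
    return np.mean((X @ w - y) ** 2)

w, lr = np.zeros(3), 0.1             # parameters and learning rate
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the cost
    w -= lr * grad                          # descend along it
print(round(cost(w), 6))             # near 0: the cost drove all the learning
```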

Sorry, let me take a pause.
对不起,让我暂停一下。

Is supervised learning a difficult concept to come to? I don't know. All concepts are very easy in retrospect. Yeah, it seems trivial now. Because the reason I asked that, and we'll talk about it, is: are there other things? Are there things that don't necessarily have a cost function, maybe have many cost functions, or maybe have dynamic cost functions, or maybe a totally different kind of architecture? Because we have to think like that in order to arrive at something new.
监督学习是不是一个难以想到的概念?我不知道。所有概念事后看来都很容易。是的,现在看起来似乎很显然。我之所以问这个问题(我们稍后会讨论),是因为:还有其他的东西吗?有没有不一定有成本函数的东西,或者有许多成本函数的,或者有动态成本函数的,又或者是完全不同的架构?因为我们必须这样思考,才能得到新的东西。

So a good example of something that doesn't have a clear cost function is a GAN. With a GAN, instead of thinking of a cost function where you want to optimize, where you know that you have an algorithm, gradient descent, which will optimize the cost function, and then you can reason about the behavior of your system in terms of what it optimizes; with a GAN, you say, I have a game, and I'll reason about the behavior of the system in terms of the equilibrium of the game. But it's all about coming up with these mathematical objects that help us reason about the behavior of a system.
一个没有明确成本函数的好例子是GAN。在GAN中,你不是去考虑一个想要优化的成本函数(你知道有梯度下降这个算法可以优化成本函数,然后就可以根据系统优化的目标来推理它的行为),而是说:我有一个博弈,我将根据博弈的均衡来推理系统的行为。但归根结底,这都是在构造能帮助我们推理系统行为的数学对象。

Right, that's really interesting. Yeah, so with a GAN, it's kind of... the cost function is emergent from the comparison. I don't know if it has a cost function. I don't know if it's meaningful to talk about the cost function of a GAN. It's kind of like the cost function of biological evolution, or the cost function of the economy. You can talk about regions to which it will go towards, but I don't think the cost function analogy is the most useful. So evolution doesn't...
嗯,那真的很有趣。是的,对GAN来说……可以说,成本函数是从这种对抗比较中涌现出来的。我不知道它是否有成本函数,我不知道谈论GAN的成本函数是否有意义。这有点像生物进化的成本函数,或者经济的成本函数。你可以谈论它会走向哪些区域,但我不认为成本函数的类比是最有用的。所以进化并不……

That's really interesting. So if evolution doesn't really have a cost function, like a cost function akin to our mathematical conception of one, then do you think cost functions in deep learning are holding us back? You just kind of mentioned that the cost function is a nice, profound first idea. Do you think that's a good idea? Do you think it's an idea we'll go past?
这真的很有趣。所以,如果进化并没有真正的成本函数,也就是类似于我们数学概念中的那种成本函数,那么你认为深度学习中的成本函数是否在拖我们的后腿?你刚刚提到成本函数是一个深刻的初始想法。你认为这是个好主意吗?你认为这是我们将来会超越的想法吗?

So self-play starts to touch on that a little bit in reinforcement learning systems. That's right. Self-play, and also ideas around exploration, where you're trying to take actions that surprise a predictor. I'm a big fan of cost functions. I think cost functions are great and they serve us really well, and I think that whenever we can do things with cost functions, we should. And, you know, maybe there is a chance that we will come up with some yet another profound way of looking at things that will involve cost functions in a less central way. But I don't know. I mean, I would not bet against cost functions.
所以,自我博弈(self-play)在强化学习系统中开始稍微触及这一点。没错。自我博弈,还有围绕探索的想法,即你试图采取让预测器感到意外的行动。我非常喜欢成本函数。我认为成本函数很棒,对我们帮助很大,我认为只要能用成本函数做事,我们就应该这么做。你知道,也许有机会出现另一种深刻的看待事物的方式,让成本函数处于不那么核心的位置。但我不知道。我的意思是,我不会押注成本函数会被淘汰。

Are there other things about the brain that pop into your mind that might be different and interesting for us to consider in designing artificial neural networks? So we talked about spiking a little bit. I mean, one thing which may potentially be useful: I think neuroscientists have figured out something about the learning rule of the brain. I'm talking about spike-timing-dependent plasticity, and it would be nice if some people would just study that in simulation.
你脑海中是否还有关于大脑的其他方面,可能与人工神经网络不同,值得我们在设计时考虑?我们稍微谈到了脉冲。我的意思是,有一件可能有用的事情:我认为神经科学家已经弄清了一些关于大脑学习规则的东西,我说的是脉冲时间依赖可塑性(spike-timing-dependent plasticity),如果有人能在模拟中研究它就太好了。

Wait, sorry, spike-timing-dependent plasticity? Yeah. STDP. It's a particular learning rule that uses spike timing to determine how to update the synapse. So it's kind of like, if a synapse fires into the neuron before the neuron fires, then it strengthens the synapse, and if the synapse fires into the neuron shortly after the neuron fires, then it weakens the synapse. Something along this line. I'm 90% sure it's right, so if I said something wrong here, don't get too angry. But you sounded brilliant saying it. The timing, though, that's one thing that's missing; the temporal dynamics is not captured. I think that's like a fundamental property of the brain, the timing of the signals. Well, what about recurrent neural networks?
等等,抱歉,脉冲时间依赖可塑性?对,STDP。这是一种特定的学习规则,它利用脉冲的时间来决定如何更新突触。大致是这样:如果一个突触在神经元放电之前向神经元放电,那么该突触会被增强;如果突触在神经元放电之后不久才放电,那么它会被削弱。大概就是这样,我有90%的把握是对的,所以如果我说错了,请别太生气。但你说得很精彩。不过时间这一点确实是缺失的,时间动态没有被捕捉到。我认为信号的时序是大脑的一个基本属性。那么,循环神经网络呢?
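
As a rough illustration of the rule just described, here is a minimal simulation sketch; the amplitudes and time constant below are illustrative assumptions, not values from the conversation:

```python
# Sketch of the STDP rule described above: pre-before-post strengthens
# the synapse, post-before-pre weakens it, with exponential decay in
# the time difference. All constants are illustrative.
import numpy as np

def stdp_update(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # presynaptic spike arrived before the neuron fired
        return a_plus * np.exp(-dt / tau)    # strengthen
    else:        # presynaptic spike arrived after the neuron fired
        return -a_minus * np.exp(dt / tau)   # weaken

w = 0.5
for t_pre, t_post in [(10.0, 15.0), (40.0, 42.0), (70.0, 65.0)]:
    w += stdp_update(t_pre, t_post)          # toy spike pairs
print(round(w, 4))  # two causal pairings outweigh one acausal one
```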

But you think of that as... I mean, that's a very crude, simplified... what's that called? There's a clock, I guess, in recurrent neural networks. It seems like the brain is the general, continuous version of that, the generalization where all possible timings are possible, and then within those timings is contained some information. Do you think the recurrence in recurrent neural networks can capture the same kind of phenomena as the timing that seems to be important for the brain, in the firing of neurons in the brain?
但你会这样看待它吗?我的意思是,那是一个非常粗糙的简化版本,那叫什么来着?循环神经网络里大概有一个时钟。而大脑似乎是它的一般化、连续化版本:所有可能的时序都是允许的,而这些时序之中又包含着信息。你认为循环神经网络中的循环,能否捕捉到大脑中神经元放电时序所承载的那类现象?

I mean, I think recurrent neural networks are amazing, and I think they can do anything we'd want a system to do. Right now recurrent neural networks have been superseded by transformers, but maybe one day they'll make a comeback. Maybe they'll be back. We'll see. Let me, on a small tangent, ask: do you think they'll be back?
我觉得循环神经网络很了不起,我认为它们能做到我们想让一个系统做的任何事情。目前循环神经网络已经被Transformer取代,但也许有一天它们会卷土重来,也许它们会回归。我们拭目以待。让我借这个小话题问一句:你认为它们会回来吗?

So much of the breakthroughs recently that we'll talk about, in natural language processing and language modeling, have been with transformers that don't emphasize recurrence. Do you think recurrence will make a comeback? Well, some kind of recurrence, I think, very likely.
我们将要谈到的最近在自然语言处理和语言建模方面的突破,主要是由不强调循环的Transformer取得的。你认为循环会卷土重来吗?我认为某种形式的循环很有可能会回来。

Recurrent neural networks for processing sequences, as they're typically thought of, I think that's also possible. What is, to you, a recurrent neural network, generally speaking? What is a recurrent neural network? You have a neural network which maintains a high-dimensional hidden state, and then when an observation arrives, it updates its high-dimensional hidden state through its connections in some way.
循环神经网络通常被认为是用来处理序列的,我认为那也是可能的。一般来说,对你而言,什么是循环神经网络?什么是循环神经网络?你有一个神经网络,它维护一个高维的隐藏状态;当一个观测到来时,它通过自身的连接以某种方式更新这个高维隐藏状态。
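
That definition can be written down almost verbatim; here is a minimal sketch, where the tanh update and the sizes are illustrative assumptions:

```python
# Sketch of the definition above: a high-dimensional hidden state h,
# updated through fixed connections each time an observation x arrives.
import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_obs = 16, 4
W_hh = rng.normal(scale=0.1, size=(d_hidden, d_hidden))  # state-to-state
W_xh = rng.normal(scale=0.1, size=(d_hidden, d_obs))     # observation-to-state

h = np.zeros(d_hidden)                  # the hidden state, carried over time
for x in rng.normal(size=(5, d_obs)):   # a short sequence of observations
    h = np.tanh(W_hh @ h + W_xh @ x)    # update the state via the connections
print(h.shape)                          # (16,): one state summarizes the sequence
```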

So do you think, you know, that's what, like, expert systems did, right? Symbolic AI, the knowledge-based systems: growing a knowledge base is maintaining a hidden state, which is its knowledge base, and it's growing it by some kind of processing. Do you think of it more generally in that way, or is it simply the more constrained form of a hidden state, with certain kinds of gating units, that we think of today with LSTMs and such?
你觉得,这就像专家系统所做的那样,对吧?符号AI、基于知识的系统:扩充知识库就是在维护一个隐藏状态,知识库就是它的隐藏状态,并通过某种处理方式来扩充它。你是以这种更宽泛的方式来理解它,还是说它仅仅是我们今天所说的、带有某类门控单元(如LSTM)的那种更受限形式的隐藏状态?

I mean, the hidden state is technically what you described there: the hidden state that goes inside the LSTM or the RNN or something like this. But then, what should be contained... you know, if you want to make the expert system analogy, I mean, you could say that the knowledge is stored in the connections, and then the short-term processing is done in the hidden state. Yes. Could you say that? Yes. So, sort of, do you think there's a future of building large-scale knowledge bases within the neural networks? Definitely.
我的意思是,从技术上讲,隐藏状态就是你刚才描述的那样:进入LSTM或RNN等模型内部的隐藏状态。但至于应该包含什么……你知道,如果要打专家系统的比方,可以说知识存储在连接(权重)中,而短期处理则在隐藏状态中完成。是的。可以这样说吗?可以。那么,你认为在神经网络内部构建大规模知识库有前途吗?当然有。

So we're going to pause on that confidence, because I want to explore that. Well, let me zoom back out and ask about the history of ImageNet. Neural networks have been around for many decades, as you mentioned. What do you think were the key ideas that led to their success, that ImageNet moment, and beyond, the success in the past 10 years?
我们先记下这个自信的回答,因为我想之后深入探讨。好,让我先把视角拉回到ImageNet的历史。正如你提到的,神经网络已经存在了几十年。你认为是哪些关键想法促成了它们的成功,促成了那个ImageNet时刻,以及之后过去10年的成功?

Okay. So the question is to make sure I didn't miss anything. The key ideas that led to the success of deep learning over the past 10 years. Exactly. Even though the fundamental thing behind deep learning has been around for much longer.
好的。那么问题是要确保我没有漏掉什么。过去10年中导致深度学习成功的关键思想。确切的说,尽管深度学习背后的基本原理已经存在了很久。

So the key idea about deep learning, or rather the key fact about deep learning before deep learning started to be successful, is that it was underestimated. People who worked in machine learning simply didn't think that neural networks could do much. People didn't believe that large neural networks could be trained. There was lots of debate going on in machine learning about what are the right methods and so on. And people were arguing because there was no way to get hard facts.
所以,关于深度学习的关键思想,或者更准确地说,深度学习成功之前的关键事实,是它被低估了。从事机器学习的人根本不认为神经网络能有多大作为。人们不相信大型神经网络是可以训练的。机器学习领域当时存在大量关于哪些方法才正确的争论。而人们之所以争论不休,是因为没有办法获得硬性的事实依据。

And by that I mean, there were no benchmarks which were truly hard, such that if you do really well on them, then you can say, look, here is my system. That's when this field switches, when it becomes a little bit more of an engineering field.
我的意思是,当时没有真正困难的基准测试,让你在上面表现出色之后就可以说:看,这就是我的系统。正是在那个时候,这个领域才开始转变,变得更像一个工程学科。

In terms of deep learning, to answer the question directly: the ideas were all there. The thing that was missing was a lot of supervised data and a lot of compute. Once you have a lot of supervised data and a lot of compute, then there is a third thing that's needed as well, and that is conviction. Conviction that if you take the right stuff, which already exists, and apply it and mix it with a lot of data and a lot of compute, it will in fact work. And so that was the missing piece. You had the ideas, you had the data, you needed the compute, which showed up in the form of GPUs, and you needed the conviction to realize that you need to mix them together.
就深度学习而言,直接回答这个问题:那些想法早就都在了。缺少的是大量的监督数据和大量的算力。一旦你有了大量监督数据和大量算力,还需要第三样东西,那就是信念。相信如果你拿起那些早已存在的正确要素,把它们与大量数据和大量算力结合起来,它确实会奏效。这就是缺失的那块拼图。你有了想法,有了数据,你需要算力(它以GPU的形式出现),还需要信念,意识到必须把它们结合在一起。

So that's really interesting. So I guess the presence of compute and the presence of supervised data allowed the empirical evidence to do the convincing of the majority of the computer science community. So I guess there's a key moment with Jitendra Malik and Alyosha Efros, who were very skeptical. And then there's Geoffrey Hinton, who was the opposite of skeptical. And there was a convincing moment, and I think ImageNet served as that moment.
那真的很有趣。我想,算力和监督数据的存在,让经验证据得以说服计算机科学界的大多数人。我猜有一个关键时刻:Jitendra Malik和Alyosha Efros当时非常怀疑,而Geoffrey Hinton则恰恰相反。需要一个令人信服的时刻,我认为ImageNet就充当了那个时刻。

That's right. And they represented the big pillars of the computer vision community; kind of, the wizards got together. And then all of a sudden there was a shift. And it's not enough for the ideas to all be there and the compute to be there; it had to convince the cynicism that existed.
对的。他们代表了计算机视觉界的几大支柱,可以说是各路高手聚到了一起。然后突然之间,风向转变了。仅有想法和算力是不够的,还必须说服当时存在的怀疑态度。

That's interesting. People just didn't believe for a couple of decades.
那很有趣。人们几十年来一直不相信。

Yeah, well, but it was more than that. When you put it this way, it sounds like, well, you know, those silly people who didn't believe, what were they missing? But in reality, things were confusing, because neural networks really did not work on anything, and they were not the best method on pretty much anything as well. And it was pretty rational to say, yeah, this stuff doesn't have any traction. And that's why you need to have these very hard tasks, which produce undeniable evidence. And that's how we make progress.
嗯,但不仅仅是那样。这样说起来好像是:你看,那些不相信的傻瓜们到底错过了什么?但实际上,当时的情况很混乱,因为神经网络真的在任何事情上都不好使,而且几乎在任何任务上都不是最好的方法。所以说"这东西没什么前景"是相当理性的。这就是为什么你需要那些非常困难的任务,它们能产生无可辩驳的证据。这就是我们取得进步的方式。

And that's why the field is making progress today because we have these hard benchmarks, which represent true progress. And so, and this is why we are able to avoid endless debate.
这就是为什么领域正在取得进展的原因,因为我们有这些艰难的基准,它们代表真正的进步。这也是我们能够避免无休止辩论的原因。

So, incredibly, you've contributed some of the biggest recent ideas in AI: in computer vision, language, natural language processing, reinforcement learning, sort of everything in between. Maybe not GANs.
令人难以置信的是,你贡献了近期AI领域一些最重大的想法:计算机视觉、语言、自然语言处理、强化学习,以及介于其间的几乎一切。或许GAN除外。

There may not be a topic you haven't touched. And of course, the fundamental science of deep learning.
也许没有你未曾涉足的话题。当然,还有深度学习的基础科学。

What is the difference to you between vision, language, and action, as in reinforcement learning, as learning problems? And what are the commonalities? Do you see them as all interconnected, or are they fundamentally different domains that require different approaches?
对你来说,视觉、语言和行动(如强化学习中的行动)作为学习问题有什么区别?它们之间又有哪些共同点?你认为它们是相互关联的,还是需要不同方法的根本不同的领域?

Okay, that's a good question. Machine learning is a field with a lot of unity, a huge amount of unity. What do you mean by unity? Like, overlap of ideas? Overlap of ideas, overlap of principles. In fact, there are only one or two or three principles, which are very, very simple, and then they apply in almost the same way to the different modalities, to the different problems.
好的,这是个好问题。机器学习是一个具有高度统一性的领域,统一性非常强。你说的统一性是什么意思?是思想上的重叠吗?思想的重叠,原理的重叠。事实上,这里只有一两条或三条非常非常简单的原理,它们以几乎相同的方式应用于不同的模态、不同的问题。

And that's why today, when someone writes a paper on improving optimization of deep learning and vision, it improves the different NLP applications, and it improves the different reinforcement learning applications.
所以今天,当有人写一篇关于提高深度学习和视觉的优化的论文时,它能够提高不同的自然语言处理应用和不同的强化学习应用。

Reinforcement learning... so I would say that computer vision and NLP are very similar to each other. Today, they differ in that they have slightly different architectures. We use transformers in NLP, and we use convolutional neural networks in vision. But it's also possible that one day this will change and everything will be unified with a single architecture.
至于强化学习……我会说计算机视觉和自然语言处理彼此非常相似。如今,它们的差异在于架构略有不同:我们在自然语言处理中使用Transformer,在视觉中使用卷积神经网络。但也有可能,有一天这种情况会改变,一切都会统一到单一架构之下。

Because if you go back a few years ago in natural language processing, there was a huge number of architectures; every different tiny problem had its own architecture. Today, there's just one transformer for all those different tasks. And if you go back in time even more, you had even more fragmentation, and every little problem in AI had its own little sub-specialization and its own little collection of skills, people who would know how to engineer the features. Now it's all been subsumed by deep learning. We have this unification.
因为如果回到几年前的自然语言处理,架构多得不得了,每个不同的小问题都有自己的架构。今天,所有这些不同的任务只用一个Transformer。如果再往前回溯,碎片化程度更高:AI中的每个小问题都有自己的小专门领域和小技能集合,有专门懂得如何设计特征的人。现在这一切都被深度学习收纳了,我们实现了这种统一。

And so I expect vision to become unified with natural language as well. Or rather, I shouldn't say expect; I think it's possible. I don't want to be too sure, because I think the convolutional neural network is very computationally efficient. RL is different. RL does require slightly different techniques, because you really do need to take actions, you really need to do something about exploration, and your variance is much higher. But I think there is a lot of unity even there, and I would expect, for example, that at some point there will be some broader unification between RL and supervised learning, where somehow the RL will be making decisions to make the supervised learning go better. And it will be, I imagine, one big black box where you just shovel things into it and it figures out what to do with whatever you shovel at it.
所以我预期视觉也会与自然语言统一起来。或者说,我不该说"预期",我认为这是有可能的。我不想太肯定,因为卷积神经网络的计算效率非常高。强化学习(RL)则不同。RL确实需要略有不同的技术,因为你真的需要采取行动,真的需要在探索上做点什么,而且方差要大得多。但我认为即使在那里也存在许多统一性。例如,我预期在某个时候,RL与监督学习之间会出现更广泛的统一,RL将通过做决策来让监督学习变得更好。我想象那会是一个大黑箱,你只管把东西铲进去,它自己会弄清楚如何处理你铲进去的任何东西。

I mean, reinforcement learning has some aspects of language and vision combined, almost. There are elements of long-term memory that you should be utilizing, and there are elements of a really rich sensory space. So it seems like it's the union of the two, or something like that.
我的意思是,强化学习几乎结合了语言和视觉的某些方面:其中既有需要利用的长期记忆的成分,也有极其丰富的感官空间的成分。所以它好像是两者的并集,或者类似的东西。

I'd say something slightly differently. I'd say that reinforcement learning is neither, but it naturally interfaces and integrates with the two of them. You think action is fundamentally different?
我觉得应该稍微换个说法。我觉得强化学习不是两者中的任何一个,而是自然而然地与它们进行接口和整合。你觉得行动是基本不同的吗?

So, yeah, what is interesting, what is unique about learning a policy, learning to act? Well, one example, for instance, is that when you learn to act, you are fundamentally in a non-stationary world, because as your actions change, the things you see start changing. You experience the world in a different way. And this is not the case for the more traditional static problem, where you have a distribution and you just apply a model to that distribution. Do you think it's a fundamentally different problem, or is it just a more difficult generalization of the problem of understanding? I mean, it's a question of definitions, almost.
那么,学习策略、学习行动,有什么有趣之处,有什么独特之处?举个例子,当你学习行动时,你本质上处于一个非平稳的世界中:因为随着你的行动改变,你看到的东西也开始改变,你会以不同的方式体验世界。而更传统的静态问题并非如此,在那种情形下你有一个分布,只需把模型应用到该分布上。你认为这是一个根本不同的问题,还是只是"理解"问题的一个更困难的推广?我的意思是,这几乎是一个定义问题。

There is a huge amount of commonality, for sure. You take gradients, we try to approximate gradients in both cases. In the case of reinforcement learning, you have some tools to reduce the variance of the gradients; you do that. There's lots of commonality. You use the same neural net in both cases. You compute the gradient, you apply Adam in both cases. So, I mean, there's lots in common for sure, but there are some small differences which are not completely insignificant.
当然,共同点非常多。你求梯度,在两种情况下我们都在近似梯度。在强化学习中,你有一些降低梯度方差的工具,你会去用它们。共同点很多:两种情况下使用相同的神经网络,都是计算梯度,都使用Adam优化器。所以,共同之处确实很多,但也有一些并非完全无关紧要的小差异。

It's really just a matter of your point of view, what frame of reference, how much you want to zoom in or out as you look at these problems. Which problem do you think is harder? People like Noam Chomsky believe that language is fundamental to everything, that it underlies everything. Do you think language understanding is harder than visual scene understanding, or vice versa?
这其实只取决于你的视角、参考框架,以及你看这些问题时想要放大还是缩小。你认为哪个问题更难?像Noam Chomsky这样的人认为语言是一切的基础,它潜藏在一切之下。你认为语言理解比视觉场景理解更难,还是相反?

I think that asking if a problem is hard is slightly the wrong question, and I want to explain why. So what does it mean for a problem to be hard? Okay, the uninteresting, dumb answer to that is: there's a benchmark, there's human-level performance on that benchmark, and hardness is the effort required to reach that human level. So, from the perspective of how much effort it takes until we get to human level on a very good benchmark.
我认为问"一个问题难不难"这个问法本身就有点不对,我想解释一下为什么。一个问题"难"是什么意思呢?好吧,无趣而笨拙的回答是:有一个基准,有人类在该基准上的水平,难度就是达到人类水平所需的努力。也就是说,从"需要多少努力才能在一个很好的基准上达到人类水平"的角度来看。

Yeah, I understand what you mean by that. So what I was going to say is that a lot of it depends on... once you solve a problem, it stops being hard. That's always true. But whether something is hard or not depends on what our tools can do today. You know, today, true human-level language understanding and visual perception are hard, in the sense that there is no way of solving the problem completely in the next three months. So I agree with that statement. Beyond that, my guess would be as good as yours. I don't know.
嗯,我明白你的意思。我想说的是,很多事情取决于……一旦你解决了一个问题,它就不再难了,这永远成立。而某件事难不难,取决于我们的工具今天能做什么。你知道,在今天,真正人类水平的语言理解和视觉感知都很难,难在未来三个月内不可能完全解决这个问题。所以我同意那种说法。除此之外,我的猜测不会比你的更高明,我不知道。

Okay, so you don't have a fundamental intuition about how hard language understanding is. You know what, I changed my mind. I'd say language is probably going to be harder. I mean, it depends on how you define it. If you mean absolute top-notch, 100% language understanding, I'll go with language. But then, if I show you a piece of paper with letters on it, do you see what I mean? You have a vision system. You say it's the best human-level vision system. I open a book and I show you letters. If it can understand how these letters form into words and sentences and meaning, is this part of the vision problem? Where does vision end and language begin?
好的,所以你对语言理解有多难并没有基本的直觉。你知道吗,我改主意了。我会说语言可能会更难。当然,这取决于你如何定义。如果你指的是绝对顶尖的、百分之百的语言理解,我会选语言。但是,如果我给你看一张写有字母的纸,你明白我的意思吗?你有一个视觉系统,你说它是最好的人类水平视觉系统。我翻开一本书,给你看上面的字母。如果它能理解这些字母如何组成单词、句子和含义,这算不算视觉问题的一部分?视觉到哪里结束,语言从哪里开始?

Yeah, so Chomsky would say it starts at language. So vision is just a little example of the kind of structure and fundamental hierarchy of ideas that's already represented in our brains somehow, that's represented through language. But where does vision stop and language begin? That's a really interesting question. So, one possibility is that it's impossible to achieve really deep understanding in either images or language without basically using the same kind of system. So you're going to get the other for free. I think it's pretty likely that yes, if we can get one, our machine learning is probably that good that we can probably get the other. But I'm not 100%... I'm not 100% sure.
嗯,乔姆斯基会说它始于语言。视觉只是一个小例子,它体现了以某种方式已经在我们大脑中表示出来的那种结构和基本的思想层级,而那是通过语言表示的。但视觉到哪里停止,语言从哪里开始?这真是一个非常有趣的问题。一种可能性是:无论是图像还是语言,如果不使用基本相同的系统,就不可能实现真正深入的理解。因此,你将免费获得另一个。我认为很有可能是这样:如果我们能搞定其中一个,我们的机器学习大概已经好到也能搞定另一个。但我不是百分之百……我不是百分之百确定。

And also, I think a lot of it really does depend on your definitions. Definitions of, like, perfect vision. Because reading, you know, reading is vision, but should it count? Yeah, to me... so my definition is, a system looked at an image, and then a system looked at a piece of text, and then told me something about that, and I was really impressed. That's relative. You'll be impressed for half an hour, and then you're going to say, well, I mean, all the systems do that. But here's the thing they don't do. Yeah, but I don't have that with humans. Humans continue to impress me.
另外,我认为很多问题确实取决于你的定义。比如"完美视觉"的定义。因为阅读,你知道,阅读也是视觉,但它应该算吗?对我来说,我的定义是:一个系统看了一张图片,又看了一段文本,然后告诉我一些相关的信息,而我真的被打动了。不过这是相对的。你会被打动半个小时,然后你就会说,所有的系统都能做到这一点,但它们做不到的是这件事。是啊,但人类不是这样,人类一直让我印象深刻。

Is that true? Well, okay, so I'm a fan of monogamy. I like the idea of marrying somebody, being with them for several decades. So I believe it's possible to have somebody continuously giving you pleasurable, interesting, witty, new ideas. Friends. Yeah, I think so. They continue to surprise you. The surprise, that injection of randomness, seems to be a nice source of continued inspiration, like the wit, the humor. I think, yeah, it's a very subjective test, but I think if you have enough humans in the room... Yeah, I understand what you mean.
这是真的吗?好吧,我是一夫一妻制的拥护者。我喜欢与某人结婚、共度数十年的想法。所以我相信,确实可能有一个人持续地给你带来愉悦、有趣、机智、新颖的想法。朋友也是。是的,我认为如此,他们会不断给你带来惊喜。这种惊喜,这种随机性的注入,似乎是持续灵感的不错来源,比如机智、幽默。我认为,是的,这是一个非常主观的测试,但如果房间里有足够多的人……是的,我明白你的意思。

Yeah, I feel like I misunderstood what you meant by impressing you. I thought you meant to impress you with its intelligence, with how well it understands an image. I thought you meant something like, I'm going to show it a really complicated image and it's going to get it right, and you're going to say, wow, that's really cool. Systems of, you know, January 2020 have not been doing that.
是的,我感觉我误解了你所说的"令你印象深刻"。我以为你的意思是用智能程度来打动你,比如它对一张图片理解得有多好。我以为你的意思是:我给它看一张非常复杂的图片,它能答对,然后你会说,哇,这太酷了。而你知道,2020年1月的系统还做不到这一点。

Yeah, now I think it all boils down to like the reason people click like on stuff on the internet, which is like it makes them laugh. So it's like humor or wit or insight. I'm sure we'll get it as get that as well.
嗯,我现在觉得最终问题集中在人们在互联网上点赞的原因,这是因为那些东西让他们笑了。所以就是幽默或机智或洞见。我相信我们也可以理解并掌握这些。

So forgive the romanticized question, but looking back, to you, what is the most beautiful or surprising idea in deep learning, or AI in general, that you've come across?
请原谅这个浪漫化的问题,但回顾过去,对你来说,你遇到的深度学习乃至整个AI领域中最美妙或最令人惊讶的想法是什么?

So I think the most beautiful thing about deep learning is that it actually works. And I mean it, because you got these ideas, you got the little neural network, you got the back propagation algorithm, and then you got some theories as to, you know, this is kind of like the brain, so maybe if you make it large, if you make the neural network large, and you train it on a lot of data, then it will do the same function the brain does. And it turns out to be true. That's crazy. And now we just train these neural networks, and you make them larger, and they keep getting better. And I find it unbelievable. I find it unbelievable that this whole AI stuff with neural networks works.
我认为深度学习最美妙之处在于它真的奏效。我是认真的:你有了这些想法,有了小小的神经网络,有了反向传播算法,还有一些理论,说这东西有点像大脑,所以也许如果把神经网络做大,并用大量数据训练它,它就会执行与大脑相同的功能。而这竟然被证明是真的。这太疯狂了。现在我们只管训练这些神经网络,把它们做得更大,它们就不断变得更好。我觉得这难以置信。我难以置信的是,这整套基于神经网络的AI竟然真的有效。

Have you built up an intuition of why? Are there a lot of bits and pieces of intuitions, of insights, of why this whole thing works? I mean, some, definitely. While we know that optimization... we now have huge amounts of empirical reasons to believe that optimization should work on most problems we care about.
你有没有建立起关于"为什么"的直觉?有没有许多零散的直觉和洞见,能解释这整个体系为什么有效?我的意思是,有一些,当然有。就优化而言,我们现在已经有大量经验上的理由相信,优化在我们关心的大多数问题上应该是有效的。

Do you have insights of what?
你有什么见解?

So you just said empirical evidence... is most of your evidence empirical, the kind that convinces you? It's like evolution is empirical: it shows you that, look, this evolutionary process seems to be a good way to design organisms that survive in their environment. But it doesn't really get you to the insights of how the whole thing works.
你刚才说到经验证据……你的证据大多是经验性的、靠经验来说服你的那种吗?就像进化是经验性的:它向你表明,这种进化过程似乎是设计能在环境中存活的生物的好方法。但它并不能真正让你洞察整个过程是如何运作的。

Well, I think a good analogy is physics. You know how you say, hey, let's do some physics calculation and come up with some new physics theory and make some predictions? But then you've got to run the experiment. You know, you've got to run the experiment. It's important. So it's a bit the same here, except that maybe sometimes the experiment came before the theory. But it still is the case: you have some data and you come up with some prediction. So yeah, let's make a big neural network, let's train it, and it's going to work much better than anything before it, and it will in fact continue to get better as you make it larger. And it turns out to be true. That's amazing, when a theory is validated like this. You know, it's not a mathematical theory. It's more of a biological theory, almost. So I think there are not terrible analogies between deep learning and biology. I would say it's like the geometric mean of biology and physics. That's deep learning.
我认为一个很好的类比是物理学。你知道,人们会说:嘿,我们来做些物理计算,提出一个新的物理理论,做出一些预测。但随后你必须去做实验。你知道,你必须去做实验,这很重要。这里的情况也差不多,只是有时实验出现在理论之前。但本质上仍然是:你有一些数据,然后做出一些预测。所以,好,我们建一个大型神经网络,训练它,它会比之前的任何东西表现都好,而且随着规模增大它确实会继续变好。这被证明是真的。当一个理论像这样得到验证时,是非常令人惊叹的。你知道,这不是一个数学理论,更像是一个生物学理论。所以我认为深度学习与生物学之间的类比并不糟糕。我会说深度学习就像是生物学和物理学的几何平均。

The geometric mean of biology and physics. I think I'm going to need a few hours to wrap my head around that. Just to find the geometric mean, just to find the set of what biology represents.
生物学和物理学的几何平均。我想我需要几个小时才能琢磨明白。光是求这个几何平均,光是弄清生物学代表的那个集合。

In biology, things are really complicated. Theories are really, really... it's really hard to have a good predictive theory. And in physics, the theories are too good. In physics, people make these super-precise theories which make these amazing predictions. And in machine learning, we're kind of in between. Kind of in between. But it'd be nice if machine learning somehow helped us discover the unification of the two, as opposed to serving as the in-between. But you're right. You're kind of trying to juggle both.
在生物学中,事情真的非常复杂。理论真的……很难有一个好的预测性理论。而在物理学中,理论又太好了:人们建立了超级精确的理论,能做出惊人的预测。而在机器学习中,我们处于两者之间。差不多在中间。但如果机器学习能帮助我们发现两者的统一,而不只是充当中间地带,那就太好了。不过你说得对,你正是在努力兼顾这两者。

So do you think there are still beautiful and mysterious properties in neural networks that are yet to be discovered?
那么,你认为神经网络中是否仍有尚未被发现的美妙而神秘的性质?

Definitely. I think that we are still massively underestimating deep learning. What do you think it will look like? Like, what? If I knew, I would have done it. But if you look at all the progress from the past 10 years, I would say most of it... I would say there've been a few cases where things that felt like really new ideas showed up. But by and large, it was, every year we thought, okay, deep learning goes this far. Nope, it actually goes further. And then the next year, okay, now this is it, deep learning is really done. Nope, it goes further. It just keeps going further each year. So that means that we keep underestimating it. We keep not understanding it. It has surprising properties all the time.
当然。我认为我们仍然严重低估了深度学习。你觉得它会是什么样子?什么样?如果我知道,我早就做出来了。但如果你看过去10年的所有进展,我会说,其中大部分……确实出现过几次让人觉得是真正新想法的情况,但总的来说,每年我们都认为:好,深度学习就到这一步了。结果不是,它实际上走得更远。然后第二年:好,这次到头了,深度学习真的到顶了。结果不是,它又更进一步。它每年都在不断向前。所以这意味着我们一直在低估它,一直没有真正理解它。它总是不断展现出令人惊讶的性质。

Do you think it's getting harder and harder to make progress?
你觉得取得进步越来越难了吗?

To make progress? It depends on what we mean.
取得进步吗?这取决于我们指的是什么。

I think the field will continue to make very robust progress for quite a while.
我认为这个领域在相当长一段时间内将继续取得非常强劲的进展。

I think for individual researchers, especially people who are doing research, it can be harder because there is a very large number of researchers right now.
我认为对于个别研究人员,特别是正在进行研究的人,可能更难,因为目前有非常多的研究人员。

I think that if you have a lot of compute, then you can make a lot of very interesting discoveries, but then you have to deal with the challenge of managing a huge compute cluster through your experiments.
我认为,如果你拥有大量算力,你就能做出很多非常有趣的发现,但随之你也要应对在实验中管理一个巨大计算集群的挑战。

It's a little bit harder.
这有点难一点。

So I'm asking all these questions that nobody knows the answer to, but you're one of the smartest people I know, so I'm going to keep asking.
我在问这些没有人知道答案的问题,但你是我认识的最聪明的人之一,所以我要继续问下去。

So let's imagine all the breakthroughs that happen in the next 30 years in deep learning. Do you think most of those breakthroughs can be done by one person with one computer? Sort of, in the space of breakthroughs, do you think compute and large efforts will be necessary?
那么,设想未来30年深度学习中发生的所有突破。你认为这些突破大多能由一个人用一台电脑完成吗?也就是说,在突破这件事上,你认为是否必须依赖大量算力和大规模投入?

I mean, I can't be sure.
我的意思是,我不能确定。

When you say one computer, you mean how large?
当你说一个电脑时,你指的是多大的电脑?

You're clever. I mean, one GPU.
你很聪明,我的意思是,只需一张GPU。

I see. I think it's pretty unlikely.
我知道了,我认为这是不太可能的。

I think that there are many... the stack of deep learning is starting to be quite deep. If you look at it, you've got everything from the ideas, the systems to build the data sets, the distributed programming, the building of the actual cluster, the GPU programming, putting it all together.
我认为有很多……深度学习的技术栈正在变得相当深。你看,它包含了从想法、构建数据集的系统、分布式编程、实际集群的搭建、GPU编程,到把所有这些组合在一起的全部环节。

So now the stack is getting really deep and I think it can be quite hard for a single person to become, to be world class in every single layer of the stack.
现在栈越来越深了,我认为一个单独的人想要在栈的每个层面都达到世界级水平可能会很困难。

What about what, like, Vladimir Vapnik really insists on, which is taking MNIST and trying to learn from very few examples, so being able to learn more efficiently? Do you think there will be breakthroughs in that space that may not need the huge compute?
那像Vladimir Vapnik一贯坚持的路线呢,也就是拿MNIST数据集,尝试从极少的样本中学习,从而更高效地学习?你认为这个方向会出现不需要大量算力的突破吗?

I think there will be a large number of breakthroughs in general that will not need a huge amount of compute.
我认为普遍情况下会有很多突破,而这些突破并不需要大量的计算。

So maybe I should clarify that. I think that some breakthroughs will require a lot of compute, and I think building systems which actually do things will require a huge amount of compute. That one is pretty obvious. If you want to do X, and X requires a huge neural net, you've got to get a huge neural net.
所以,或许我该澄清一下。我认为某些突破需要大量算力,而且我认为构建真正能做事的系统需要巨大的算力。这一点相当明显:如果你想做X,而X需要一个巨大的神经网络,那你就必须弄到一个巨大的神经网络。

But I think there will be lots of, I think there is lots of room for very important work being done by small groups and individuals.
但我认为小组和个人有很多机会去做非常重要的工作,我认为这个领域还有很大的空间。

Can you maybe sort of on the topic of the science that's deep learning? Talk about one of the recent papers that you've released, the deep double descent, where bigger models and more data hurt. I think it's a really interesting paper. Can you describe the main idea?
你能不能在谈论深度学习这个话题上,简要地介绍一下你们最近发布的一篇文章——深度双重下降,即更大的模型和更多的数据对模型的性能产生负面影响。我觉得这篇文章非常有趣。你能描述一下主要思想吗?

Yeah, definitely. So what happened is that over the years, some small number of researchers noticed that it is kind of weird that when you make the neural network larger, it works better, and it seems to go in contradiction with statistical ideas.
是的,当然可以。事情是这样的:多年来,少数研究者注意到一个有点奇怪的现象:当你把神经网络做大时,它的表现更好,而这似乎与统计学的观念相矛盾。

And then some people made an analysis showing that actually you get this double descent bump. And what we've done was to show that double descent occurs for pretty much all practical deep learning systems.
然后有人做了分析,表明实际上会出现这种双下降的隆起。而我们所做的,是证明几乎所有实用的深度学习系统都会出现双下降。

And that it will be also... So, can you step back? What's the x-axis and the y-axis of a double descent plot?
而且它还会……那你能退一步讲讲吗?双下降图的横轴和纵轴分别是什么?

Okay, great.
好的,太棒了。

So you can look, you can do things like you can take a neural network and you can start increasing its size slowly while keeping your data set fixed.
所以你可以看一下,你可以做一些事情,比如可以拿一个神经网络,然后慢慢扩大它的规模,同时保持你的数据集不变。

So if you increase the size of the neural network slowly and if you don't do early stopping, that's a pretty important detail.
如果您慢慢增加神经网络的大小,而且不采取早期停止的方法,那就是一个非常重要的细节。

Then when the neural network is really small, you make it larger. You get a very rapid increase in performance.
当神经网络很小的时候,你把它变大。这样可以快速提高性能。

Then you continue to make it larger. At some point, performance will get worse.
然后你继续把它扩大。在某一点上,性能会变差。

And it gets the worst exactly at the point at which it achieves zero training error, precisely zero training loss.
当它达到零训练误差,即完全零的训练损失时,情况就会变得最糟。

And then as you make it larger, it starts to get better again.
当你把它变得更大,它又开始变得更好了。

And it's kind of counterintuitive because you'd expect deep learning phenomena to be monotonic.
这有点违反直觉,因为你会期望深度学习现象是单调的。

And it's hard to be sure what it means, but it also occurs in the case of linear classifiers.
很难确定这意味着什么,但它在线性分类器的情形中也会出现。

And the intuition basically boils down to the following: when you have a large data set and a small model, then the small, tiny random... so, basically, what is overfitting?
直觉基本上可以归结为:当你有一个大数据集和一个小模型时,那些微小的随机……那么,先说什么是过拟合?

Overfitting is when your model is somehow very sensitive to the small random unimportant stuff in your data set, in the training data set, precisely.
当你的模型对于数据集中的小随机无关紧要的内容非常敏感时,就会出现过拟合现象,尤其是在训练数据集中。

So if you have a small model and you have a big data set, there may be some randomness in which training cases happen to end up in the data set,
所以,如果你有一个小模型和一个大数据集,哪些训练样本恰好进入数据集可能带有一定的随机性,

and others may not be there. But the small model is kind of insensitive to this randomness, because it comes out the same.
而另一些样本则可能不在其中。但小模型对这种随机性不太敏感,因为得到的结果基本相同。

There is pretty much no uncertainty about the model when the data set is large.
当数据集很大时,模型基本上是没有任何不确定性的。

So, okay, so at the very basic level, to me, the most surprising thing is that neural networks, with their huge number of parameters, don't overfit every time, very quickly, before ever being able to learn anything.
所以,在最基本的层面上,对我来说最令人惊讶的是:拥有海量参数的神经网络,并没有每次都在学到任何东西之前就迅速过拟合。

So here is, so there is one way.
这么说吧,有一种方法。

Okay, so maybe, let me try to give the explanation, maybe that will work. So you got a huge neural network. Let's suppose you got a you have a huge neural network, you have a huge number of parameters.
好的,那么,让我试着解释一下,也许那会起作用。你有一个巨大的神经网络。假设你有一个巨大的神经网络,有很多参数。

And now let's pretend everything is linear, which it's not, but let's just pretend. Then there is this big subspace where the neural network achieves zero error. And SGD is going to find approximately... That's right. Approximately the point with the smallest norm in that subspace.
现在假设一切都是线性的,实际并非如此,但我们就这样假设。那么存在一个大的子空间,神经网络在其中能达到零误差。而SGD会近似找到……没错,近似找到该子空间中范数最小的那个点。

Okay, and that can also be proven to be insensitive to the small randomness in the data when the dimensionality is high. But when the dimensionality of the data is equal to the dimensionality of the model, then there is a one-to-one correspondence between all the data sets and the models.
嗯,当数据的维度很高时,这还可以证明对小的随机性变得不敏感。但是当数据的维度等于模型的维度时,所有数据集和模型之间存在一一对应的关系。

So small changes in the data set actually lead to large changes in the model, and that's why performance gets worse. So this is the best explanation, more or less.
所以,数据集中的微小变化实际上会导致模型的大幅变化,这就是性能变差的原因。这大概是最好的解释了。

So then it would be good for the model to have more parameters, so to be bigger than the data. That's right. But only if you don't early stop. If you introduce early stopping as regularization, you can make the double descent bump almost completely disappear.
这么说,模型拥有更多参数、比数据规模更大反而是好事。没错。但前提是你不使用早停(early stopping)。如果把早停作为正则化手段引入,你几乎可以让双下降的隆起完全消失。
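
As an illustration of the curve being described, here is a minimal sketch (the data-generating setup is an illustrative assumption, not the paper's experiment) using minimum-norm linear regression, the solution SGD tends toward in the linear picture above:

```python
# Sketch of double descent: sweep the number of parameters past the
# size of the training set, fitting with the minimum-norm solution.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 500, 80
w_true = rng.normal(size=d)

def make(n):                       # toy data with label noise
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

X_tr, y_tr = make(n_train)
X_te, y_te = make(n_test)

for p in [5, 10, 20, 40, 60, 80]:  # model size: number of features used
    # lstsq returns the minimum-norm fit when the system is underdetermined
    w = np.linalg.lstsq(X_tr[:, :p], y_tr, rcond=None)[0]
    test_err = np.mean((X_te[:, :p] @ w - y_te) ** 2)
    print(p, round(test_err, 2))
# Test error typically peaks near p == n_train, where zero training
# error is first reached, and improves again as p grows past it.
```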

What is early stopping? Early stopping is when you train your model and you monitor your validation performance, and then if at some point validation performance starts to get worse, you say, okay, let's stop training. You're good, you're good enough. So the magic happens after that moment, so you don't want to do the early stopping. Well, if you don't do the early stopping, you get this very pronounced double descent.
什么是早停?早停就是在训练模型的同时监控验证集上的表现,一旦验证表现开始变差,你就说:好,停止训练,已经足够好了。而魔法恰恰发生在那个时刻之后,所以你不想做早停。是的,如果不做早停,你就会得到非常明显的双下降。
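
For concreteness, here is a minimal runnable sketch of that procedure on a toy overfitting-prone fit; all quantities are illustrative assumptions:

```python
# Sketch of early stopping: train, watch validation error, and stop
# (keeping the best weights) once validation starts getting worse.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 25))
w_true = rng.normal(size=25)
y = X @ w_true + rng.normal(size=30)        # noisy training targets
Xv = rng.normal(size=(200, 25))
yv = Xv @ w_true                            # clean validation targets

w, lr = np.zeros(25), 0.01
best_val, best_w = float("inf"), w.copy()
for step in range(2000):
    w -= lr * 2 * X.T @ (X @ w - y) / 30    # one gradient step on train loss
    val = np.mean((Xv @ w - yv) ** 2)       # monitor validation performance
    if val < best_val:
        best_val, best_w = val, w.copy()    # remember the best model so far
    elif val > 1.05 * best_val:             # it started getting worse: stop
        break
print(round(best_val, 3))                   # error of the early-stopped model
```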

Do you have any intuition why this happens? The double descent? Or, sorry, the early stopping? No, the double descent. Oh yeah, so let's see. The intuition is basically this: when the data set has as many degrees of freedom as the model, then there is a one-to-one correspondence between them.
你有没有直觉解释为什么会这样?双下降吗?哦抱歉,你是说早停?不,是双下降。哦对,那我想想。直觉基本上是这样:当数据集的自由度与模型一样多时,它们之间存在一一对应关系。

And so small changes to the data set lead to noticeable changes in the model. So your model is very sensitive to all the randomness. It is unable to discard it. Whereas it turns out that when you have a lot more data than parameters or a lot more parameters than data, the resulting solution will be insensitive to small changes in the data set.
因此,数据集的微小变化会导致模型明显变化。因此,您的模型对所有随机因素都非常敏感,无法忽略它们。而事实证明,当您拥有比参数更多的数据或比数据更多的参数时,得到的解决方案将不会对数据集的微小变化敏感。

So it's able to, let's say, nicely discard the small changes, the randomness. Exactly. The spurious correlations, if you will.
所以它能够,可以这么说,很好地丢弃那些微小变化、那些随机性。正是如此。也就是那些伪相关,如果你愿意这么说的话。

Geoff Hinton suggested we need to throw away back propagation. We already kind of talked about this a little bit, but he suggested we need to throw away back propagation and start over. I mean, of course, some of that is a little bit of wit and humor. But what do you think? What could be an alternative method of training neural networks?
Geoff Hinton曾建议我们需要抛弃反向传播。我们之前已经稍微谈到过这一点,但他建议我们需要抛弃反向传播,从头再来。当然,其中带有一点机智和幽默的成分。但你怎么看?训练神经网络还可能有什么替代方法?

Well, the thing that he said precisely is that, to the extent that you can't find back propagation in the brain, it's worth seeing if we can learn something from how the brain learns. But back propagation is very useful, and we should keep using it. Oh, you're saying that once we discover the mechanism of learning in the brain, or any aspects of that mechanism, we should also try to implement that in neural networks?
嗯,他的原话是:既然在大脑中找不到反向传播,那就值得看看我们能否从大脑的学习方式中学到点什么。但反向传播非常有用,我们应该继续使用它。哦,你的意思是,一旦我们发现了大脑的学习机制,或该机制的某些方面,我们也应该尝试在神经网络中实现它?

If it turns out that you can't find back propagation in the brain, if we can't find back propagation in the brain. Well, so I guess your answer to that is back propagation is pretty damn useful. So why are we complaining? I mean, I personally am a big fan of back propagation.
如果我们最终发现大脑中没有反向传播,如果我们无法在大脑中找到反向传播。嗯,那么我猜你对此的答案就是反向传播非常有用。那么我们为什么要抱怨呢?我的意思是,我个人非常喜欢反向传播。

I think it's a great algorithm because it solves an extremely fundamental problem which is finding a neural circuit subject to some constraints. And I don't see that problem going away. So that's why I, I really, I think it's pretty unlikely that we'll have anything which is going to be dramatically different. It could happen, but I wouldn't bet on it right now.
我认为这是一个很棒的算法,因为它解决了一个非常基本的问题,即寻找符合一些限制的神经回路。我认为这个问题不会轻易消失。因此,我真的觉得很不可能有什么东西会戏剧性地不同。虽然有可能,但现在我不会打赌。

So let me ask a sort of big-picture question. Do you think neural networks can be made to reason? Why not? Well, if you look, for example, at AlphaGo or AlphaZero, the neural network of AlphaZero plays Go, which we all agree is a game that requires reasoning, better than 99.9% of all humans.
那么让我问一个比较宏观的问题。你认为神经网络能被做得会推理吗?为什么不能呢?比如,看看AlphaGo或AlphaZero:AlphaZero的神经网络下围棋(我们都同意这是一种需要推理的游戏)比99.9%的人类都强。

Just the neural network, without the search — just the neural network itself. Doesn't that give us an existence proof that neural networks can reason? To push back and disagree a little bit: do we all agree that Go requires reasoning? I think I agree, but I don't think it's trivial. Obviously, reasoning, like intelligence, is a loose, gray-area term a little bit.
仅仅是神经网络本身,不带搜索——这难道不就给了我们一个神经网络可以推理的存在性证明吗?稍微反驳一下:我们真的都同意围棋需要推理吗?我想我同意,但我不认为这是显而易见的。显然,推理和智能一样,是一个比较宽泛、模糊的词。

Maybe you disagree with that. But yes, I think it has some of the same elements of reasoning. Reasoning is almost akin to search, right? There's a sequential element of stepwise consideration of possibilities, and sort of building on top of those possibilities in a sequential manner until you arrive at some insight.
也许你不同意这个观点。但是,我认为它具有一些相同的推理元素。推理几乎就像搜索一样,对吧?有一个逐步考虑可能性的序列元素,并在这些可能性的基础上按顺序构建,直到你得出一些见解。

Sort of, yeah, I guess playing Go is kind of like that. And when you have a single neural network doing that without search, that's kind of like that. So there's an existence proof, in a particular constrained environment, that a process akin to what many people call reasoning exists. But what about more general kinds of reasoning?
差不多,是的,我想下围棋就是这样。当一个单独的神经网络不借助搜索就能做到这一点时,也算是这样。所以在一个特定的受限环境中,存在一个证据,证明一种类似于许多人所说的推理的过程是存在的。但更一般的推理呢?

So off the board, there is one other existence proof, probably. Which one? Us humans? Yes. Okay. All right. So do you think the architecture that will allow neural networks to reason will look similar to the neural network architectures we have today? I think it will. I think, well, I don't want to make too overly definitive a statement.
那么在棋盘之外,大概还有另一个存在性证明。哪一个?我们人类吗?是的。好的。那么,你认为让神经网络能够推理的架构会与我们今天拥有的神经网络架构相似吗?我认为会。我想,嗯,我不想做出过于绝对的断言。

I think it's definitely possible that the neural networks that will produce the reasoning breakthroughs of the future will be very similar to the architectures that exist today. Maybe a little bit more recurrent, maybe a little bit deeper. But these neural networks are so insanely powerful. Why wouldn't they be able to learn to reason? Humans can reason, so why can't neural networks?
我认为,未来产生推理突破的神经网络,很可能与现有架构非常相似。也许更具循环性一点,也许更深一点。但这些神经网络强大得惊人。它们为什么学不会推理呢?人类可以推理,神经网络为什么不行?

Do you think the kind of stuff we've seen neural networks do is just a kind of weak reasoning? So it's not a fundamentally different process? Again, this is stuff nobody knows the answer to. When it comes to our neural networks, what I would say is that neural networks are capable of reasoning. But if you train a neural network on a task which doesn't require reasoning, it's not going to reason.
你认为我们看到神经网络所做的事情,只是一种弱推理吗?也就是说,它并不是一种根本不同的过程?再说一次,这是没有人知道答案的问题。就我们的神经网络而言,我想说的是,神经网络是有推理能力的。但是,如果你用一个不需要推理的任务去训练神经网络,它就不会去推理。

This is a well-known effect: the neural network will solve the problem that you pose in front of it in the easiest way possible. Right. That takes us to one of the brilliant ways you describe neural networks, which is that you've referred to neural networks as the search for small circuits, and maybe general intelligence as the search for small programs, which I found a very compelling metaphor.
这是一个众所周知的效应:神经网络会用尽可能最简单的方式去解决你摆在它面前的问题。没错。这引出了你描述神经网络的一种精彩方式:你把神经网络称为对小电路的搜索,而通用智能也许是对小程序的搜索,我觉得这个比喻非常有说服力。

Can you elaborate on that difference? Yeah. So the thing which I said precisely was that if you can find the shortest program that outputs the data at your disposal, then you will be able to use it to make the best prediction possible. And that's a theoretical statement which can be proven mathematically.
你能详细说明一下这个区别吗?可以。我确切说过的是:如果你能找到能输出你手头数据的最短程序,那么你就能用它做出可能的最佳预测。这是一个可以在数学上证明的理论命题。

Now, you can also prove mathematically that finding the shortest program which generates some data is not a computable operation. No finite amount of compute can do this. So then, with neural networks, neural networks are the next best thing that actually works in practice. We are not able to find the shortest program which generates our data, but we are able to find, you know, a small — well, now that statement should be amended — even a large circuit which fits our data in some way. Well, I think what you meant by the small circuit is the smallest needed circuit.
现在,你同样可以在数学上证明,找到能生成某些数据的最短程序不是一个可计算的操作,任何有限的计算量都做不到。因此,神经网络是实践中真正可行的次优选择。我们无法找到生成数据的最短程序,但我们能找到一个小的——不过这个说法现在应该修正——甚至是一个大的、以某种方式拟合我们数据的电路。我想你说的小电路指的是所需的最小电路。
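For readers who want the formal statement behind this exchange, the standard formalization is Kolmogorov complexity; the gloss below is a common textbook summary, not a quote from the conversation.

```latex
% Kolmogorov complexity of a string $x$ with respect to a universal machine $U$:
% the length of the shortest program that outputs $x$.
\[
  K_U(x) = \min \{\, |p| : U(p) = x \,\}
\]
% Two standard facts echo the points above:
% 1. Prediction: Solomonoff's predictor, which weights each program $p$ by
%    $2^{-|p|}$, is essentially optimal for data from any computable source.
% 2. Uncomputability: no algorithm computes $K_U(x)$ for all $x$; such an
%    algorithm could be used to decide the halting problem.
```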

Well, the thing which I would change now: back then, I hadn't fully internalized the overparameterized results, the things we know about overparameterized neural nets. Now I would phrase it as a large circuit whose weights contain a small amount of information, which I think is what's going on. If you imagine the training process of a neural network as slowly transmitting entropy from the data set to the parameters, then somehow the amount of information in the weights ends up being not very large, which would explain why they generalize so well.
嗯,我现在想改的一点是:当时我还没有完全消化过参数化的结果,也就是我们现在对过参数化神经网络的了解。现在,我会把它表述为一个权重中只包含少量信息的大电路,我认为这就是实际发生的事情。如果你把神经网络的训练过程想象成把熵从数据集慢慢传递到参数中,那么权重中的信息量最终并不大,这就能解释它为什么泛化得那么好。

So the large circuit might be one that's helpful for the regularization, for the generalization. Yeah, something like this. But do you see it as important to be able to try to learn something like programs? I mean, I think the answer is kind of yes: if we can do it, we should do it.
所以,这个大电路可能恰恰有助于正则化、有助于泛化。是的,差不多是这样。但你认为尝试学习像程序这样的东西重要吗?我的意思是,我想答案基本上是肯定的:如果我们能做到,就应该去做。

The reason we are pushing on deep learning — the fundamental reason, the root cause — is that we are able to train them. So in other words, training comes first. We've got our pillar, which is the training pillar, and now we are trying to contour our neural networks around the training pillar. It's got to stay trainable. This is an invariant we cannot violate. And being trainable means that, starting from scratch, knowing nothing, you can actually pretty quickly converge towards knowing a lot, or even slowly — but it means that, given the resources at your disposal, you can train the neural net and get it to achieve useful performance. Yeah, that's a pillar we can't move away from. That's right.
我们推动深度学习的原因——根本原因、源头——在于我们能够训练它们。换句话说,训练是第一位的。我们有一个支柱,就是训练这个支柱,现在我们是在围绕这个训练支柱来塑造我们的神经网络。它必须保持可训练,这是一个我们不能违反的不变量。可训练意味着:从零开始、一无所知,你确实可以相当快地(或者哪怕慢一点)收敛到掌握大量知识;这意味着在你手头的资源下,你可以训练这个神经网络,让它达到有用的性能。是的,这是我们无法离开的支柱。没错。

Because if you say, hey, let's find the shortest program — well, we can't do that. So it doesn't matter how useful that would be; we can't do it. So do you think — you kind of mentioned that neural networks are good at finding small circuits, or large circuits — do you think the matter of finding small programs is then just the data? No.
因为如果你说,嘿,让我们去找最短的程序——我们做不到。所以无论那有多有用,我们都做不到。那么,你刚才提到神经网络擅长寻找小电路,或者说大电路——你认为寻找小程序的问题是否就只取决于数据?不是。

Sorry — not the size or the quality, the type of data. Sort of, asking it to find programs? Well, I think the thing is that right now there are no good precedents of people successfully finding programs really well. So the way you'd find programs is you'd train a deep neural network to do it, basically, right? Which is the right way to go about it. But there are no good illustrations yet — it hasn't been done — but in principle it should be possible.
抱歉——不是数据的规模或质量,而是数据的类型,比如让它去寻找程序?嗯,我认为问题在于,目前还没有人真正成功地把程序找得很好的先例。要找程序,基本上你会训练一个深度神经网络去做,对吧?这是正确的路子。但目前还没有好的例证,这件事还没有做成,不过原则上应该是可行的。

Can you elaborate a little bit? What's your intuition in principle? Put another way, you don't see why it's not possible? Well, it's more a statement of: I think it's unwise to bet against deep learning. And if it's a cognitive function that humans seem to be able to do, then it doesn't take too long for some deep neural net to pop up that can do it too. Yeah, I'm there with you. I've stopped betting against neural networks at this point, because they continue to surprise us.
你能再详细说一点吗?你原则上的直觉是什么?换句话说,你看不出它为什么不可能?嗯,这更像是一种表态:我认为押注深度学习会输是不明智的。如果这是人类似乎能做到的一种认知功能,那么用不了多久,就会出现某个也能做到的深度神经网络。是的,我同意你。在这一点上,我已经不再押注神经网络会输了,因为它们一直让我惊喜。

What about long-term memory? Can neural networks have long-term memory, something like knowledge bases — being able to aggregate important information over long periods of time that will then serve as useful representations of state that you can make decisions by, so you have a long-term context based on which you make the decision? In some sense, the parameters already do that. The parameters are an aggregation of the entirety of the neural net's experience, and so they count as long-term knowledge. And people have trained various neural nets to act as knowledge bases, and, you know, people have investigated language models as knowledge bases. So there is work there.
长期记忆呢?神经网络能拥有长期记忆吗?比如像知识库那样——能够在很长的时间跨度上汇聚重要信息,作为有用的状态表示来辅助你决策,这样你就有一个长期的上下文来做决定。某种意义上,参数已经在做这件事了。参数是整个神经网络经验的汇聚,所以它们算是长期知识。人们已经训练各种神经网络来充当知识库,也研究过把语言模型当作知识库。所以这方面已经有工作了。

Yeah, but in some sense — do you think, in every sense — do you think it's all just a matter of coming up with a better mechanism for forgetting the useless stuff and remembering the useful stuff? Because right now, I mean, there haven't been mechanisms that remember really long-term information. What do you mean by that, precisely? Precisely — I like the word precisely. So I'm thinking of the kind of compression of information that knowledge bases represent. Now, I apologize for my sort of human-centric thinking about what knowledge is, because neural networks aren't necessarily interpretable in the kind of knowledge they have discovered. But a good example for me is a knowledge base being able to build up over time something like the knowledge that Wikipedia represents. It's a really compressed, structured knowledge base — obviously not the actual Wikipedia, or the language, but like the semantic web, the dream that the semantic web represented. So, a really nicely compressed knowledge base, or something akin to that, in the non-interpretable sense that neural networks would have.
是的,但在某种意义上——或者说在任何意义上——你是否认为这只是一个问题:想出一个更好的机制,忘掉无用的东西、记住有用的东西?因为现在,还没有能真正记住长期信息的机制。你这话具体指什么?具体——我喜欢"具体"这个词。我在想的是知识库所代表的那种信息压缩。我先为我这种以人为中心的知识观道歉,因为神经网络发现的那种知识不一定是可解释的。但对我来说,一个好例子是知识库能够随时间积累起类似维基百科所代表的那种知识。那是一个高度压缩、有结构的知识库——显然不是指维基百科本身或其语言,而是像语义网所代表的那个梦想。所以,是一个压缩得很好的知识库,或者在神经网络那种不可解释的意义上与之类似的东西。

Well, the neural networks would not be interpretable if you look at their weights, but their outputs should be very interpretable. Okay, so how do you make very smart neural networks, like language models, interpretable? Well, you ask them to generate some text, and the text would generally be interpretable. Do you find that the epitome of interpretability? Like, can you do better? Because, okay, I'd like to know what it knows and what it doesn't know. I would like the neural network to come up with examples where it's completely dumb and examples where it's completely brilliant. And the only way I know how to do that now is to generate a lot of examples and use my human judgment. But it would be nice if a neural network had some sort of self-awareness about this. Yeah, 100%.
嗯,如果你去看神经网络的权重,它们是不可解释的,但它们的输出应该是非常可解释的。好,那你怎么让非常聪明的神经网络——比如语言模型——变得可解释呢?嗯,你让它们生成一些文本,这些文本一般来说是可解释的。你觉得这就是可解释性的极致了吗?还能做得更好吗?因为,我想知道它知道什么、不知道什么。我希望神经网络能自己给出一些例子:哪些地方它完全是笨的,哪些地方它极其出色。而我现在知道的唯一办法,是生成大量例子,再用我的人类判断去评估。但如果神经网络对此有某种自我意识,那就太好了。是的,百分之百。

100%.
百分之百。

I'm a big believer in self-awareness and I think that neural net self-awareness will allow for things like the capabilities like the ones you describe like for them to know what they know and what they don't know and for them to know where to invest to increase their skills most optimally.
我强烈相信自我意识,我认为神经网络自我意识将会允许像你所描述的那样的能力,让它们知道自己知道什么,不知道什么,并且让它们知道在哪里进行投资以最优化地增强技能。

And to your question about interpretability, there are actually two answers to that question.
关于你提出的可解释性问题,实际上有两个答案。

One answer is, you know, we have the neural net, so we can analyze the neurons and we can try to understand what the different neurons and different layers mean. And you can actually do that, and OpenAI has done some work on that.
一个答案是,你知道,我们拥有这个神经网络,所以我们可以分析神经元,试着理解不同的神经元和不同的层代表什么含义。这实际上是可以做的,OpenAI也在这方面做过一些工作。

But there is a different answer, which is — I would say that's the human-centric answer — where you say, you know, you look at a human being: you can't read what's inside. How do you know what a human being is thinking? You ask them. You say, hey, what do you think about this? What do you think about that? And you get some answers. The answers you get are sticky.
但还有一个不同的答案,我会说那是以人为中心的答案:你看一个人,你没法直接读出他脑子里的东西。你怎么知道一个人在想什么?你去问他。你说,嘿,你对这个怎么看?你对那个怎么看?然后你得到一些答案。你得到的答案是有粘性的。

In the sense that you already have a mental model. You already have a mental model of that human being. You already have a big conception of that human being — how they think, what they know, how they see the world — and everything you ask, you're adding onto that. And that stickiness seems to be one of the really interesting qualities of the human being: that information is sticky.
意思是,你已经有一个心智模型了。你对那个人已经有一个心智模型,已经有一个大致的认识——他怎么思考、知道些什么、怎么看世界——而你问的每一个问题,都是在往这个模型上叠加。这种粘性似乎正是人类非常有趣的特质之一:信息是有粘性的。

You seem to remember the useful stuff, aggregate it well, and forget most of the information that's not useful. That process — but that's also pretty similar to the process that neural networks do; it's just that neural networks are so much crappier at this for the time being. It doesn't seem to be fundamentally that different. But let's stick on reasoning for a little longer.
你似乎会记住有用的东西,把它很好地汇聚起来,并忘掉大部分没用的信息。这个过程——其实和神经网络做的过程也很相似,只不过目前神经网络在这方面差得多。这在本质上似乎没有那么不同。不过,让我们在推理这个话题上再停留一会儿。

You said, why not? Why can't it reason? What's a good, impressive feat — a benchmark — of reasoning to you, that you'd be impressed by if neural networks were able to do it? Is that something you already have in mind?
你说,为什么不行?它为什么不能推理?对你来说,什么是推理方面令人印象深刻的成就——一个基准——如果神经网络能做到,你会为之折服?你心里已经有这样的东西了吗?

Well, I think writing really good code. I think proving really hard theorems. Solving open-ended problems with out-of-the-box solutions. And sort of theorem-type mathematical problems? Yeah, I think those are a very natural example as well.
嗯,我认为是写出真正优秀的代码,证明真正困难的定理,用跳出常规的方案解决开放性问题。比如定理类的数学问题?是的,我认为那些也是非常自然的例子。

If you can prove an unproven theorem, then it's hard to argue that it doesn't reason. And so, by the way, this comes back to the point about hard results. Machine learning — deep learning as a field — is very fortunate because we have the ability to sometimes produce these unambiguous results. And when they happen, the debate changes; the conversation changes.
如果你能证明一个未被证明的定理,就很难说它没有在推理。顺便说一句,这又回到了关于"硬结果"的那个观点。机器学习——深度学习作为一个领域——非常幸运,因为我们有时有能力拿出这些毫不含糊的结果。当它们出现时,辩论就变了,对话就变了。

We have the ability to produce conversation-changing results. And then, of course, just like you said, people kind of take that for granted and say that wasn't actually a hard problem.
我们有能力做出改变对话走向的结果。当然,就像你说的,人们往往会把这当作理所当然,说那其实算不上一个难题。

Well, I mean, at some point, you probably run out of hard problems. Yeah, that whole mortality thing is kind of a sticky problem that we haven't quite figured out. Maybe we'll solve that one.
我是说,到某个时候,你可能会遇到没有太难的难题了。是啊,那个不朽的问题有点棘手,我们还没有完全解决。也许我们会解决这个问题。

I think one of the fascinating things in your entire body of work, but also the work at OpenAI recently, one of the conversation changes has been in the world of language models.
我认为你的整个工作和最近在OpenAI工作的一件迷人的事情是,在语言模型的世界中发生的一种对话变化。

Can you briefly try to describe the recent history of using neural networks in the domain of language and text?
你能简要描述一下最近在语言和文本领域使用神经网络的历史吗?

Well, there's been lots of history. I think the Elman network was a small tiny recurrent neural network applied to language back in the 80s.
好的,有很多历史。我认为Elman网络是80年代应用于语言的小巧循环神经网络。

So the history is really fairly long, at least. And the thing that changed the trajectory of neural networks and language is the thing that changed the trajectory of all of deep learning: data and compute.
所以这段历史至少是相当长的。而改变神经网络与语言这条轨迹的东西,正是改变了整个深度学习轨迹的东西:数据和算力。

So suddenly you move from small language models, which learn a little bit, to much larger ones. And with language models in particular, there's a very clear explanation for why they need to be large to be good: because they're trying to predict the next word.
于是,你突然从只能学到一点点的小语言模型,转向大得多的模型。而对语言模型来说,它们为什么必须大才能好,有一个非常清晰的解释:因为它们在试图预测下一个词。

So at first, you don't know anything. You'll notice very broad-strokes, surface-level patterns, like: sometimes there are characters, and there is a space between those characters. You'll notice this pattern. And you'll notice that sometimes there is a comma, and then the next character is a capital letter. You'll notice that pattern.
一开始,你什么都不知道。你会注意到非常粗线条的、表层的模式,比如:有时会出现一些字符,字符之间有空格。你会注意到这个模式。你还会注意到,有时出现一个逗号,接下来的字符是大写字母。你会注意到那个模式。

Eventually you may start to notice that there are certain words that occur often. You may notice that spellings are a thing. You may notice syntax. And when you get really good at all of these, you start to notice the semantics. You start to notice the facts. But for that to happen, the language model needs to be larger.
最终,你可能开始注意到某些词经常出现。你可能注意到拼写是一回事,注意到句法。当你在所有这些方面都变得非常擅长时,你开始注意到语义,开始注意到事实。但要做到这一点,语言模型就得更大。
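For concreteness, the objective being described is next-token prediction with a cross-entropy loss; the tiny model and random tokens below are illustrative assumptions, not the actual setup discussed.

```python
# Next-token prediction: the training objective behind language models.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
lstm = nn.LSTM(d_model, d_model, batch_first=True)
head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (8, 33))   # a batch of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict next
hidden, _ = lstm(embed(inputs))
logits = head(hidden)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   targets.reshape(-1))
loss.backward()  # in real training this repeats over huge amounts of text
```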

So let's linger on that. That's where you and Noam Chomsky disagree. So you think we're actually taking incremental steps — that a larger network, larger compute, will be able to get to the semantics, be able to understand language, without what Noam likes to think of as a fundamental understanding of the structure of language, like imposing your theory of language onto the learning mechanism.
那我们就在这一点上多停留一下。这正是你和诺姆·乔姆斯基的分歧所在。你认为我们实际上是在渐进地前进——更大的网络、更大的算力就能触及语义、理解语言,而不需要诺姆所认为的那种对语言结构的根本性理解,比如把你的语言理论强加到学习机制上。

So you're saying you can learn, from raw data, the mechanism that underlies language. Well, I think it's pretty likely. But I also want to say that I don't really know precisely what Chomsky means when he talks about this. You said something about imposing your structure on language — I'm not 100% sure what he means. But empirically, it seems that when you inspect those larger language models, they exhibit signs of understanding the semantics, whereas the smaller language models do not.
所以你是说,可以从原始数据中学到语言背后的机制。嗯,我认为这很有可能。但我也想说,我并不确切知道乔姆斯基在谈论这件事时到底指什么。你说了关于把结构强加于语言的话,我不能百分之百确定他的意思。但从经验上看,当你检视那些更大的语言模型时,它们会表现出理解语义的迹象,而较小的语言模型则不会。

We saw that a few years ago when we did the work on the sentiment neuron. We trained a smallish LSTM to predict the next character in Amazon reviews, and we noticed that when you increase the size of the LSTM from 500 LSTM cells to 4,000 LSTM cells, then one of the neurons starts to represent the sentiment of the review. Now, why is that? Sentiment is a pretty semantic attribute; it's not a syntactic attribute.
几年前我们做情感神经元的工作时就看到了这一点。我们训练了一个不大的LSTM来预测亚马逊评论中的下一个字符,我们发现,当LSTM的规模从500个LSTM单元增加到4000个LSTM单元时,其中一个神经元开始表示这条评论的情感倾向。那么,为什么呢?情感是一个相当语义层面的属性,而不是句法属性。
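As a sketch of what "one neuron represents sentiment" means in practice, one can read off a single unit of the final hidden state and threshold it. Everything below — the untrained toy model and the unit index — is a hypothetical stand-in for the trained model from that work.

```python
# Probing a single hidden unit for sentiment (illustrative, untrained model).
import torch
import torch.nn as nn

embed = nn.Embedding(256, 32)               # byte-level inputs
lstm = nn.LSTM(32, 4000, batch_first=True)  # 4000 cells, as in the anecdote

def final_hidden(text: str) -> torch.Tensor:
    ids = torch.tensor([[min(ord(c), 255) for c in text]])
    _, (h, _) = lstm(embed(ids))
    return h[0, 0]

UNIT = 2388  # hypothetical index of the best sentiment-correlated unit
for review in ["great product, loved it", "terrible, it broke in a day"]:
    print(review, "->", final_hidden(review)[UNIT].item())
# With the trained model, thresholding this single value classifies sentiment.
```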

And for people who might not know — I don't know if it's a standard term — sentiment is whether it's a positive or negative review. That's right. Like, is the person happy with something, is the person unhappy with something. And so here we had very clear evidence that a small neural net does not capture sentiment while a large neural net does. And why is that? Well, our theory is that at some point you run out of syntax to model, and you start to focus on something else.
对可能不了解的人说明一下——我不确定这是不是标准术语——情感指的是一条评论是正面的还是负面的。没错。就是这个人对某件事满意还是不满意。所以在这里,我们有非常清晰的证据:小的神经网络捕捉不到情感,而大的神经网络可以。为什么呢?我们的理论是,到了某个时候,可建模的句法被你用尽了,你就开始关注别的东西。

And with size, you quickly run out of syntax to model, and then you really start to focus on the semantics — that would be the idea. That's right. And so I don't want to imply that our models have complete semantic understanding, because that's not true. But they definitely are showing signs of semantic understanding — partial semantic understanding — while the smaller models do not show those signs.
而且随着规模增大,可建模的句法很快被用尽,然后你就真正开始关注语义——大概是这个意思。没错。所以我并不想暗示我们的模型具备完整的语义理解,那不是事实。但它们确实表现出了语义理解的迹象——部分的语义理解——而较小的模型则没有表现出这些迹象。

Can you take a step back and say, what is GPT-2, which is one of the big language models that was the conversation changer in the past couple of years?
你能够稍微退后一步,然后说一下GPT-2是什么吗?它是目前为止最受瞩目的语言模型之一,过去几年中引起了很大的谈论。

Yes, so GPT-2 is a transformer with one and a half billion parameters that was trained on about 40 billion tokens of text, which were obtained from webpages that were linked to from Reddit articles with more than three upvotes.
是的,GPT-2是一个拥有15亿参数的Transformer,它在约四百亿个文本词元上训练,这些文本来自被Reddit上获得三个以上赞的帖子所链接到的网页。
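As a quick way to poke at the released model yourself, here is a minimal sampling sketch; note that the Hugging Face `transformers` library and the `"gpt2"` checkpoint name are assumptions of convenience, not something referenced in the conversation.

```python
# Sample a continuation from a released GPT-2 checkpoint.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # smallest released variant
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tokenizer("Deep learning is", return_tensors="pt").input_ids
out = model.generate(ids, max_length=40, do_sample=True, top_k=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```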

And what's the transformer?
那么,什么是Transformer?

The transformer is the most important advance in neural network architectures in recent history. What is attention, maybe, too? Because I think that's an interesting idea — not necessarily technically speaking, but the idea of attention versus maybe what recurrent neural networks represent.
Transformer是近期历史上神经网络架构方面最重要的进展。那注意力又是什么?因为我觉得这是一个有意思的想法——不一定是从技术角度,而是注意力这个想法与循环神经网络所代表的东西之间的对比。

Yeah, so the thing is, the transformer is a combination of multiple ideas simultaneously, of which attention is one. Do you think attention is the key? No, it's a key, but it's not the key. The transformer is successful because it is the simultaneous combination of multiple ideas, and if you were to remove either idea, it would be much less successful.
嗯,问题在于,Transformer是多个想法的同时组合,注意力只是其中之一。你认为注意力是那个关键吗?不,它是一个关键,但不是唯一的关键。Transformer之所以成功,是因为它同时组合了多个想法,去掉其中任何一个,它都会逊色得多。

So the transformer uses a lot of attention, but attention had existed for a few years, so that can't be the main innovation. The transformer is designed in such a way that it runs really fast on the GPU, and that makes a huge amount of difference. This is one thing. The second thing is that the transformer is not recurrent, and that is really important too, because it is more shallow and therefore much easier to optimize.
Transformer用了大量注意力,但注意力机制已经存在好几年了,所以那不可能是主要创新。Transformer的设计使它在GPU上跑得非常快,这带来了巨大的差别。这是第一点。第二点是Transformer不是循环的,这同样非常重要,因为它更浅,因此更容易优化。

So, in other words, it uses attention, it is a really great fit for the GPU, and it is not recurrent — therefore less deep and easier to optimize. And the combination of those factors makes it successful. So now it makes great use of your GPU; it allows you to achieve better results for the same amount of compute, and that's why it's successful.
换句话说,它使用注意力机制,非常契合GPU,而且不是循环的,因此没那么深、更容易优化。这些因素的结合使它获得成功。它能充分利用你的GPU,让你在同样的计算量下取得更好的结果,这就是它成功的原因。
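For readers who want the one equation at the core of this, here is a minimal scaled dot-product attention sketch; the shapes are illustrative assumptions.

```python
# Scaled dot-product attention: every position attends to every other in parallel.
import math
import torch

def attention(q, k, v):
    # q, k, v: (batch, seq, d). Softmax over keys mixes values for each query.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 10, 64)
print(attention(q, k, v).shape)  # torch.Size([2, 10, 64])
# No recurrence: all positions are computed at once, which is the GPU-friendly,
# easier-to-optimize property described above.
```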

Were you surprised by how well the transformer worked and how well GPT-2 worked?
你对Transformer和GPT-2的出色表现感到惊讶吗?

So if you worked on language, you'd had a lot of great ideas before transformers came about in language. So you got to see the whole set of revolutions, before and after. Were you surprised?
如果你从事语言方面的研究,在Transformer出现在语言领域之前,你已经有过很多很棒的想法。所以你见证了前后这一整套变革。你感到惊讶吗?

Yeah, a little. A little? Yeah. I mean, it's hard to remember because you adapt really quickly, but it definitely was surprising. It definitely was — in fact, I'll retract my statement: it was pretty amazing. It was just amazing to see it generate this text.
是的,有一点。一点?是的。我的意思是,很难回忆起来,因为人适应得非常快,但它确实令人惊讶。确实如此——事实上,我要收回刚才的说法:它相当令人惊叹。看着它生成那样的文本,实在太神奇了。

And, you know, you've got to keep in mind that at that time we'd seen all this progress in GANs, and the improvement in the samples produced by GANs was just amazing — you have these realistic faces — but text hadn't really moved that much. And suddenly we moved from, you know, whatever GANs were in 2015 to the best, most amazing GANs, in one step. And it was really stunning.
你要记住,那个时候我们已经看到GAN取得的种种进步,GAN生成样本的改进令人惊叹——你能得到那些逼真的人脸——但文本并没有多大动静。然后突然之间,我们在一步之内,就好比从2015年的GAN水平跨到了最好、最惊人的GAN水平。这真的令人震撼。

Even though theory predicted, yeah, you train a big language model, of course, you should get this, but then to see it with your own eyes. It's something else. And yet, we adapt really quickly.
"即使理论预测,当然,你训练了一个大规模语言模型,你应该得到这样的结果,但是亲眼看到它还是另一回事。然而,我们很快就能适应。"

And now there are some cognitive scientists, right, writing articles saying that GPT-2 models don't truly understand language. So we adapt quickly to how amazing it is that they're able to model the language so well.
现在有一些认知科学家写文章说,GPT-2这类模型并不真正理解语言。也就是说,对于它们能把语言建模得这么好这件多么惊人的事,我们很快就习以为常了。

So what do you think is the bar for impressing us? Do you think that bar will continuously be moved? Definitely.
那你认为打动我们的门槛是什么?你认为这个门槛会不断被抬高吗?肯定会。

I think when you start to see really dramatic economic impact — that's, in some sense, the next barrier. Because right now, if you think about the work in AI, it's really confusing. It's really hard to know what to make of all these advances.
我认为,当你开始看到真正显著的经济影响时——在某种意义上,那就是下一道门槛。因为现在,如果你想一想AI领域的工作,它真的很令人困惑,很难知道该如何看待所有这些进展。

It's kind of like, okay, you got an advance, and now you can do more things; and you got another improvement, and you got another cool demo. At some point, I think people who are outside of AI can no longer distinguish this progress anymore.
有点像是:好,你取得了一项进展,现在能做更多事情了;然后又有一个改进,又有一个很酷的演示。到某个时候,我认为AI圈外的人已经无法再分辨这些进展了。

So we were talking offline about translating Russian to English, and how there's a lot of brilliant work in Russian that the rest of the world doesn't know about. That's true for Chinese; that's true for a lot of scientific and artistic work in general.
我们之前私下聊过俄译英的问题,聊到有很多杰出的俄语作品是世界其他地方不知道的。中文也是如此;很多科学和艺术作品总体上都是如此。

Do you think translation is the place where we're going to see sort of economic big impact? I don't know. I think there is a huge number of applications.
你认为翻译会对经济产生重大影响吗?我不确定。我认为它有很多应用领域。

I mean, first of all, I want to point out that translation already today is huge. I think billions of people interact with big chunks of the internet, primarily through translation. Translation is already huge, and it's hugely positive too.
首先,我想指出翻译已经是非常重要的事情了。我认为数十亿人主要通过翻译使用互联网的大部分内容。翻译已经变得非常重要,并且它也非常有益。

I think self-driving is going to be hugely impactful. It's unknown exactly when it happens, but again, I would not bet against deep learning. So that's deep learning in general, but also deep learning for self-driving? Yes, deep learning for self-driving.
我认为自动驾驶将会产生巨大影响。具体什么时候实现还不清楚,但同样,我不会押注深度学习会输。所以这既是指深度学习总体,也是指用于自动驾驶的深度学习?是的,用于自动驾驶的深度学习。

But I was talking about sort of language models. I see. Just to check — I wandered off a little bit. Just to check: you're not seeing a connection between self-driving and language? No, no.
不过我刚才说的是语言模型。我明白了。只是确认一下——我扯远了一点。只是确认一下:你并不是在说自动驾驶和语言之间有联系?不,不是。

Okay. All right — both using neural nets. That would be a poetic connection. I think there might be, like you said, some kind of unification towards a kind of multitask transformer that can take on both language and vision tasks. It'd be an interesting unification.
好的。好吧——两者都用神经网络,那会是一种诗意的联系。我觉得可能会有,就像你说的,朝着某种能同时处理语言和视觉任务的多任务Transformer的统一。那会是一种有趣的统一。

Let's see. What more can I ask about GPT-2? It's simple — not much to ask. You take a transformer, make it bigger, give it more data, and suddenly it does all those amazing things.
我想想。关于GPT-2我还能问些什么呢?它很简单——没有太多可问的。你拿一个Transformer,把它做大,给它更多数据,它突然就能做出所有那些惊人的事情。

One of the beautiful things is that GPT-2 — the transformers — are fundamentally simple to explain, to train. Do you think bigger will continue to show better results in language? Probably.
美妙之处之一在于,GPT-2——这些Transformer——从根本上说易于解释、易于训练。你认为在语言领域,更大的模型会继续带来更好的结果吗?很可能。

Sort of, what are the next steps with GPT-2, do you think? I mean, I think for sure seeing what a larger version can do is one direction.
那你认为GPT-2接下来的步骤是什么?我是说,看看更大的版本能做什么,肯定是一个方向。

Also, I mean, there are many questions. There's one question which I'm curious about, and it's the following. So right now, GPT-2 — we feed it all this data from the internet, which means that it needs to memorize all those random facts about everything on the internet.
另外,我是说,还有很多问题。有一个问题我很好奇,是这样的:现在的GPT-2,我们把来自互联网的所有这些数据都喂给它,这意味着它需要记住互联网上关于一切事物的所有那些零散事实。

And it would be nice if the model could somehow use its own intelligence to decide what data it wants to accept and what data it wants to reject — just like people. People don't learn all data indiscriminately; we are super selective about what we learn.
如果这个模型能以某种方式运用自己的智能,决定它想接受哪些数据、想拒绝哪些数据,那就太好了——就像人一样。人不会不加区分地学习所有数据;我们对学什么是极其挑剔的。

And I think this kind of active learning would be very nice to have. Yeah — listen, I love active learning. Let me ask about the selection of data — can you just elaborate on that a little bit more?
我认为拥有这种主动学习会非常好。是的——听着,我很喜欢主动学习。让我问问数据的选择——你能再详细说说吗?

I have this kind of sense that the optimization of how you select data — the active learning process — is going to be a place for a lot of breakthroughs, even in the near future, because there haven't been many breakthroughs there that are public.
我有一种感觉:优化数据选择的方式——也就是主动学习的过程——将成为产生大量突破的地方,甚至就在不远的将来,因为这方面公开的突破还不多。

I feel like there might be private breakthroughs that companies keep to themselves because the fundamental problem has to be solved if you want to solve self-driving, if you want to solve a particular task.
我觉得可能有一些公司私下里取得了突破,因为如果想要解决自动驾驶问题,或者解决特定任务的问题,必须先解决根本问题。

What do you think about the space in general?
你对这个领域总体上怎么看?

Yeah, so I think that for something like active learning — in fact, for any kind of capability — the thing that it really needs is a problem. It needs a problem that requires it.
嗯,我认为像主动学习这样的能力——事实上,任何一种能力——真正需要的是一个问题。它需要一个用得上它的问题。

It's very hard to do research about the capability if you don't have a task because then what's going to happen is you will come up with an artificial task, get good results, but not really convince anyone.
如果你没有任务,研究能力将会非常困难,因为你会想出一个人为的任务,取得好的结果,但实际上并不能说服任何人。

We're now past the stage where getting results on MNIST — some clever formulation of MNIST — will convince people. That's right.
我们现在已经过了那个阶段:在MNIST上取得结果——某种巧妙构造的MNIST任务——就能说服人们。没错。

In fact, you could quite easily come up with a simple active learning scheme on MNIST and get a 10x speedup — but then, so what?
实际上,你可以很容易地在MNIST上设计一个简单的主动学习方案,获得10倍的加速——但那又怎样呢?

I think that, with active learning, active learning will naturally arise as problems that require it pop up. That's my take on it.
我认为,随着需要主动学习的问题不断出现,主动学习会自然而然地兴起。这是我的看法。
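For reference, here is roughly what such a "simple active learning scheme" looks like — uncertainty sampling, shown on synthetic data rather than MNIST; everything in it is an illustrative assumption.

```python
# Uncertainty-based active learning: label the points the model is least sure of.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic stand-in for MNIST

labeled = list(range(20))                       # start with a handful of labels
pool = [i for i in range(len(X)) if i not in labeled]

for rnd in range(10):
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])[:, 1]
    nearest = np.argsort(np.abs(probs - 0.5))[:20]  # most uncertain pool points
    labeled += [pool[i] for i in nearest]
    pool = [i for i in pool if i not in set(labeled)]
    print(f"round {rnd}: accuracy={clf.score(X, y):.3f}, labels={len(labeled)}")
```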

There's another interesting thing that OpenAI brought up with GPT-2, which is: when you create a powerful artificial intelligence system, it was unclear, once you released GPT-2, what kind of detrimental effect it would have.
OpenAI在GPT-2上还提出了另一件有意思的事:当你创造出一个强大的人工智能系统时,在发布GPT-2之后,它会产生什么样的负面影响,当时并不清楚。

Because if you have a model that can generate pretty realistic text, you can start to imagine that it would be used by bots in some way that we can't even imagine.
因为如果你有一个能生成相当逼真文本的模型,你就可以开始想象,它会被机器人以某种我们甚至无法想象的方式利用。

There's this nervousness about what it's possible to do. You did a really brave and I think profound thing, which just started a conversation about this.
有一个关于可以做什么的紧张感。你做了一件非常勇敢而且我认为很深刻的事情,这仅仅开始了对此事的对话。

How do we release powerful artificial intelligence models to the public, if we do at all? How do we privately discuss with others, even competitors, how we manage the use of these systems, and so on?
我们如何向公众发布强大的人工智能模型——如果真的要发布的话?我们如何与其他人,甚至是竞争对手,私下讨论如何管理这些系统的使用等等?

So, from that whole experience — you released the report on it — but in general, are there any insights that you've gathered from just thinking about this, about how you release models like this?
那么,从这整段经历来看——你们就此发布了报告——总的来说,仅仅是思考这个问题,你在如何发布这类模型上有没有得到什么洞见?

I mean, I think that my take on this is that the field of AI has been in a state of childhood and now it's exiting that state and it's entering a state of maturity.
我觉得,我想说的是,人工智能领域一直处于婴儿状态,现在正要走出这个阶段,进入成熟状态。

What that means is that AI is very successful and also very impactful and its impact is not only large, but it's also growing.
这句话的意思是,人工智能非常成功,影响力也非常大,其影响不仅仅很大,而且还在增长。

And so, for that reason, it seems wise to start thinking about the impact of our systems before releasing them — maybe a little bit too soon, rather than a little bit too late.
因此,出于这个原因,在发布之前就开始思考我们的系统会带来的影响,似乎是明智的——宁可稍微早一点,也不要稍微晚一点。

And with the case of GPT-2, like I mentioned earlier, the results really were stunning and it seemed plausible.
并且,就像我之前提到的那样,GPT-2的情况确实非常惊人,看起来很有可信性。

It didn't seem certain — it seemed plausible — that something like GPT-2 could easily be used to reduce the cost of disinformation.
这并不确定,但看起来是有可能的:像GPT-2这样的东西可以轻易地被用来降低散布虚假信息的成本。

And so there was a question of what's the best way to release it, and a staged release seemed logical.
于是就有了一个问题:发布它的最佳方式是什么?而分阶段发布看起来是合乎逻辑的。

A small model was released, and there was time to see how people would use it. Many people used these models in lots of cool ways; there have been lots of really cool applications. There haven't been any negative applications we know of. And so eventually the full model was released — but also, other people replicated similar models.
先发布了一个小模型,这样就有时间观察人们会怎么使用它。许多人用这些模型做了很多很酷的事情,出现了很多非常酷的应用,而且据我们所知没有出现负面的应用。于是最终完整模型被发布了——不过,其他人也复现了类似的模型。

That's an interesting qualifier, though: "that we know of."
不过,"据我们所知"是一个有意思的限定语。

So, in your view, staged release is at least part of the answer to the question of what we do once we create a system like this?
那么在你看来,分阶段发布至少是这个问题答案的一部分:当我们创造出这样的系统之后,我们该怎么做?

It's part of the answer, yes. Are there any other insights?
是的,这是答案的一部分。还有别的洞见吗?

Like say you don't want to release the model at all because it's useful to you for whatever the business is.
比如说你不想发布这个模型,因为它对你的业务很有用。

Well, there are plenty of people who don't release models already. Right, of course — but is there some moral, ethical responsibility, when you have a very powerful model, to sort of communicate?
嗯,本来就有很多人不发布模型。对,当然——但当你拥有一个非常强大的模型时,是否存在某种道德和伦理上的责任去进行沟通?

Like, just as you said: when you had GPT-2, it was unclear how much it could be used for misinformation. It's an open question.
就像你说的:当你们有了GPT-2时,尚不清楚它能在多大程度上被用于散布虚假信息。这是一个悬而未决的问题。

And getting an answer to that might require that you talk to other really smart people that are outside of your particular group. Have you?
而要回答这个问题,可能需要你去和你们这个圈子之外的其他非常聪明的人交流。你们有这样做吗?

Please tell me there's some optimistic pathway for people across the world to collaborate on these kinds of cases?
请问有没有一些积极的途径,让全世界的人们在这类情况下进行合作?

Or is it still really difficult from one company to talk to another company?
从一家公司到另一家公司交流仍然非常困难吗?

So it's definitely possible. It's definitely possible to discuss these kinds of models with colleagues elsewhere and to get their take on what to do.
所以这绝对是可能的。完全可以与其他地方的同行讨论这类模型,听听他们对该怎么做的看法。

How hard is it, though? I mean, do you see that happening?
不过这有多难呢?我是说,你看到这种事在发生吗?

I think that's a place where it's important to gradually build trust between companies because ultimately all the AI developers are building technology which is becoming increasingly more powerful.
我觉得在这个领域,逐渐建立公司间的信任非常重要,因为最终所有的人工智能开发者都在建设越来越强大的技术。

And so the way to think about it is that, ultimately, we're all in it together. Yeah.
所以,看待这个问题的方式是:归根结底,我们都在同一条船上。是的。

I tend to believe in the better angels of our nature, but I do hope that when you build a really powerful AI system in a particular domain, you also think about the potential negative consequences of it.
我倾向于相信人性中更善良的一面,但我确实希望,当你在某个特定领域构建一个非常强大的AI系统时,你也会去思考它潜在的负面后果。

Yeah. It's an interesting and scary possibility that there will be a race for AI development that pushes people to close off that development and not share ideas with others. I don't love this.
是的。这是一种既有意思又可怕的可能性:AI开发会出现一场竞赛,迫使人们封闭各自的研发,不与他人分享想法。我不喜欢这样。

I've been, like, a pure academic for 10 years. I really like sharing ideas, and it's fun, and it's exciting.
我做了10年纯粹的学术研究。我真的很喜欢分享想法,这很有趣,也令人兴奋。

What do you think it takes to — let's talk about AGI a little bit. What do you think it takes to build a system of human-level intelligence? We talked about reasoning, we talked about long-term memory, but in general, what does it take, do you think?
你认为需要什么才能——我们来聊聊AGI吧。你认为构建一个具有人类水平智能的系统需要什么?我们谈到了推理,谈到了长期记忆,但总体上,你认为需要什么?

Well, I can't be sure, but I think the deep learning plus maybe another small idea.
嗯,我不能确定,但我认为深度学习再加上可能是其他一些小的想法。

Do you think self-play will be involved? You've spoken about the powerful mechanism of self-play, where systems learn by exploring the world in a competitive setting against other entities that are similarly skilled, and so they incrementally improve this way. Do you think self-play will be a component of building an AGI system?
你觉得自我对弈会参与其中吗?你谈到过自我对弈这种强大的机制:系统在与水平相当的其他实体竞争的环境中探索世界来学习,并由此逐步提高。你认为自我对弈会是构建AGI系统的一个组成部分吗?

Yeah. So what I would say is, to build AGI, I think it's going to be deep learning plus some ideas, and I think self-play will be one of those ideas. Self-play has this amazing property that it can surprise us in truly novel ways. For example, pretty much every self-play system — our Dota bot; I don't know if you saw OpenAI's release about multi-agent, where you had the two little agents who were playing hide and seek; and of course also AlphaZero — they all produced surprising behaviors. They all produce behaviors that we didn't expect. They are creative solutions to problems. And that seems like an important part of AGI that our systems don't exhibit routinely right now.
是的。我想说的是,要构建AGI,我认为会是深度学习加上一些想法,而自我对弈会是其中之一。自我对弈有一个了不起的特性:它能以真正新颖的方式让我们惊讶。比如,几乎每个自我对弈系统——我们的Dota机器人;不知道你有没有看过OpenAI关于多智能体的发布,其中两个小智能体在玩捉迷藏;当然还有AlphaZero——它们都产生了令人惊讶的行为,都做出了我们意料之外的行为,给出了富有创造性的解决方案。而这似乎是AGI的一个重要组成部分,是我们现在的系统还无法经常展现出来的。

And so that's why I like this area, I like this direction: because of its ability to surprise us. And an AGI system would surprise us fundamentally. But not just a random surprise — to find a surprising solution to a problem that's also useful.
所以这就是我喜欢这个领域、喜欢这个方向的原因:因为它有让我们惊讶的能力。而一个AGI系统会从根本上让我们惊讶。但不只是随机的惊讶,而是为问题找到既出人意料又有用的解法。

Now, a lot of the self-play mechanisms have been used in the game context, or at least in a simulation context. How far along the path to AGI do you think can be done in simulation? How much faith, how much promise, do you see in simulation, versus having a system that operates in the real world — whether it's the real world of digital, real-world data, or the actual physical world of robotics?
现在,很多自我对弈机制都被用在游戏场景里,或者至少是仿真场景里。你认为通往AGI的路上,有多少可以在仿真中完成?相对于一个在真实世界中运行的系统——无论是数字化的真实世界数据,还是机器人所处的真实物理世界——你对仿真抱有多大的信心和期望?

I don't think it's an either-or. I think simulation is a tool, and it helps. It has certain strengths and certain weaknesses, and we should use it. Yeah — okay, I understand that that's true. But one of the criticisms of self-play, one of the criticisms of reinforcement learning, is that its current results, while amazing, have been demonstrated in simulated environments or very constrained physical environments.
我不认为这是非此即彼的。我认为仿真是一种工具,它有帮助,有它的长处和短处,我们应该使用它。是的——好,我明白这是对的。但对自我对弈、对强化学习的批评之一是:它目前的结果虽然惊人,却都是在仿真环境或高度受限的物理环境中展示的。

Do you think it's possible to escape them — escape the simulated environments and be able to learn in non-simulated environments? Or do you think it's possible to simulate, in a photorealistic and physics-realistic way, the real world, such that we can solve real problems with self-play in simulation?
你认为有可能摆脱它们吗——走出仿真环境,能够在非仿真环境中学习?或者你认为也可以用照片级逼真、物理上逼真的方式去模拟真实世界,从而让我们可以通过仿真中的自我对弈来解决真实问题?

So I think that transfer from simulation to the real world is definitely possible, and it has been exhibited many times by many different groups. It's been especially successful in vision. Also, OpenAI in the summer demonstrated a robot hand which was trained entirely in simulation, in a certain way that allowed for sim-to-real transfer to occur.
我认为,从仿真到真实世界的迁移是完全可能的,许多不同的团队已经多次展示过。这在视觉领域尤其成功。另外,OpenAI在那年夏天展示了一只完全在仿真中训练的机械手,其训练方式使得从仿真到现实的迁移得以发生。

Is this for the Rubik's cube?
这个是为魔方吗?

Yes, that's right. I wasn't aware that it was trained in simulation. It was trained in simulation entirely. Really? So it wasn't trained in the physical world — the hand wasn't trained in the physical world?
是的,没错。我之前不知道它是在仿真中训练的。它完全是在仿真中训练的。真的吗?所以它没有在物理世界中训练过——这只手没有在物理世界里训练过?

No — 100% of the training was done in simulation, and the policy that was learned in simulation was trained to be very adaptive. So adaptive that when you transfer it, it can very quickly adapt to the physical world. So the kind of perturbations with the giraffe, or whatever the heck it was — were those part of the simulation?
没有——100%的训练都是在仿真中完成的,而且在仿真中学到的策略被训练得非常具有适应性。适应性强到当你把它迁移过去时,它可以非常快速地适应物理世界。那么,用长颈鹿(或者不管那是什么)做的那类扰动,是仿真的一部分吗?

Well, the simulation was generic — the policy was trained to be robust to many different things, but not to the kind of perturbations we had in the video. So it had never been trained with a glove, it had never been trained with a stuffed giraffe. So, in theory, these are novel perturbations.
嗯,仿真是通用的——策略被训练得对许多不同情况都很鲁棒,但并不是针对视频里那种扰动。它从来没有在戴手套的情况下训练过,也从来没有和长颈鹿毛绒玩具一起训练过。所以理论上,这些是全新的扰动。

Correct. And it's not just in theory — in practice, those are novel perturbations. Well, that's okay. That's a small-scale but clean example of transfer from the simulated world to the physical world.
没错。而且不只是理论上——在实践中,那些也是全新的扰动。嗯,这很好。这是一个规模虽小但很干净的例子,展示了从仿真世界到物理世界的迁移。

Yeah. And I'll also say that I expect the transfer capabilities of deep learning to increase in general, and the better the transfer capabilities are, the more useful simulation will become. Because then you could experience something in simulation and learn a moral of the story, which you could then carry with you to the real world — as humans do all the time when they play computer games.
是的。我还要说,我预计深度学习的迁移能力总体上会增强,而迁移能力越好,仿真就会变得越有用。因为那样你就可以在仿真中经历某件事,从中领悟出一个"故事的寓意",再把它带到真实世界中——就像人类玩电脑游戏时一直在做的那样。
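The trick usually credited with making such sim-to-real results possible is domain randomization: train across many randomized simulators so the policy cannot overfit any single one. Below is a fully synthetic toy sketch of that idea; the dynamics, parameter ranges, and controller are all illustrative assumptions.

```python
# Domain randomization on a toy 1-D control problem.
import random

def rollout(gain, friction, steps=50):
    # toy dynamics: drive position x toward 0 under unknown friction
    x, cost = 1.0, 0.0
    for _ in range(steps):
        x -= gain * x * friction
        cost += x * x
    return cost

random.seed(0)
train_frictions = [random.uniform(0.5, 1.5) for _ in range(20)]  # many "sims"
best_gain = min((g / 100 for g in range(1, 100)),
                key=lambda g: sum(rollout(g, f) for f in train_frictions))
print("gain trained across randomized sims:", best_gain)
print("cost in an unseen 'real' world:", rollout(best_gain, friction=0.9))
```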

So let me ask sort of an embodied question, staying on AGI for a second: do you think an AGI system would need to have a body? Would it need to have some of those human elements — self-awareness, consciousness, sort of a fear of mortality, self-preservation in the physical space — which come with having a body?
那让我问一个关于具身的问题,先继续停留在AGI上:你认为AGI系统需要有一个身体吗?它需要具备那些人类元素吗——自我意识、意识、对死亡的恐惧、物理空间中的自我保护——这些都是伴随着拥有身体而来的。

I think having a body will be useful. I don't think it's necessary, but I think it's very useful to have a body, for sure, because you can learn things which cannot be learned without a body. But at the same time, I think that if you don't have a body, you could compensate for it and still succeed.
我认为拥有身体会有用。我不认为它是必需的,但拥有身体肯定非常有用,因为你可以学到没有身体就学不到的东西。但与此同时,我认为即使没有身体,你也可以弥补这一点并仍然成功。

You think so? Yes. Well, there is evidence for this. For example, there are many people who were born deaf and blind, and they were able to compensate for the lack of those modalities. I'm thinking about Helen Keller specifically. So, even if you're not able to physically interact with the world — I mean, yeah, I see the point.
你这么认为?是的。嗯,这是有证据的。比如,有很多人天生失聪失明,但他们能够弥补这些感官通道的缺失。我特别想到的是海伦·凯勒。所以,即使你无法与世界进行身体上的互动——我的意思是,好吧,我明白这个观点了。

Maybe let me ask a more particular question — I'm not sure if it's connected to having a body or not — the idea of consciousness. And a more constrained version of that is self-awareness. Do you think an AGI system should have consciousness? We can't define it — whatever the heck you think consciousness is.
或许让我问一个更具体的问题——我不确定它是否与拥有身体有关——就是意识这个概念。它的一个更受限的版本是自我意识。你认为AGI系统应该有意识吗?我们没法给它下定义——不管你认为意识到底是什么。

Yeah — a hard question to answer, given how hard it is to define it. Is it a useful thing to think about? I mean, it's definitely interesting. It's fascinating. I think it's definitely possible that our systems will be conscious. You think that's an emergent thing that just comes from — you think consciousness could emerge from the representations that are stored within the networks? So it naturally just emerges as you're able to represent more and more of the world?
是的——这是一个很难回答的问题,因为它太难定义了。思考它有用吗?我是说,它肯定很有意思,很迷人。我认为我们的系统有可能会有意识。你认为那是一种自然涌现的东西——你认为意识可能从网络内部存储的表征中涌现出来?也就是说,当你能表征世界上越来越多的东西时,它就自然而然地出现了?

Well, I'd make the following argument: humans are conscious, and if you believe that artificial neural nets are sufficiently similar to the brain, then there should at least exist artificial neural nets that are conscious, too. You're leaning on that existence proof pretty heavily. Okay — but that's the best answer that I can give. No, I know, I know.
嗯,我会这样论证:人类是有意识的,如果你相信人工神经网络与大脑足够相似,那么至少应该存在同样有意识的人工神经网络。你相当依赖那个存在性证明。好吧——但这是我能给出的最好回答了。不,我知道,我知道。

There's still an open question of whether there's some magic in the brain that we're not aware of. I mean, I don't mean non-materialistic magic, but the brain might be a lot more complicated and interesting than we give it credit for. If that's the case, then it should show up at some point — we'll find out that we can't continue to make progress. But I think it's unlikely.
仍然有一个悬而未决的问题:大脑里是否有某种我们尚未察觉的"魔法"。我指的不是非唯物主义的魔法,而是说大脑可能远比我们认为的更复杂、更有趣。如果真是那样,它总会在某个时候显现出来——我们会发现自己无法继续取得进展。但我认为这不太可能。

So, we talk about consciousness, but let me talk about another poorly defined concept of intelligence. Again, we've talked about reasoning, we've talked about memory. What do you think is a good test of intelligence for you?
所以,我们谈论意识,但让我谈一下另一个定义不清的智力概念。再次强调,我们已经谈论过推理和记忆。你认为什么是适合测试智力的好方法呢?

Are you impressed by the test that Alan Turing formulated with the imitation game of natural language? Is there something in your mind that you will be deeply impressed by if a system was able to do?
你对艾伦·图灵用自然语言的模拟游戏测验提出的测试印象深刻吗?如果一个系统能够做某些事情,这会让你深受震撼吗?

I mean, lots of things. There is a certain frontier of capabilities today, and there exist things outside of that frontier, and I would be impressed by any such thing. For example, I would be impressed by a deep learning system which solves a very pedestrian task, like machine translation or a computer vision task, in a way that never makes mistakes a human wouldn't make under any circumstances. I think that is something which has not yet been demonstrated, and I would find it very impressive.
我是说,有很多东西。今天的能力有一条边界,而边界之外存在着一些东西,任何这样的东西都会让我印象深刻。比如,如果一个深度学习系统能解决一个非常平常的任务——像机器翻译或某个计算机视觉任务——并且在任何情况下都不会犯人类不会犯的错误,我会印象深刻。我认为这是尚未被展示出来的东西,我会觉得它非常了不起。

Yes. So right now they make mistakes, and they might be more accurate than a human being, but they still make a different kind of mistake. So I would guess that a lot of the skepticism that some people have about deep learning is when they look at its mistakes and they say, well, those mistakes make no sense — like, if you understood the concept, you wouldn't make that mistake. And I think that changing that would inspire me. That would be — yes, this is progress.
是的。现在它们会犯错,它们也许比人更准确,但它们犯的错误类型不一样。所以我猜,一些人对深度学习持怀疑态度,很大程度上是因为他们看到那些错误后会说:这些错误毫无道理——如果你真理解了概念,就不会犯那样的错。我认为改变这一点会让我深受鼓舞。那会是——是的,这才是进步。

Yeah, that's a really nice way to put it. But I also just don't like that human instinct to criticize the model as not intelligent. That's the same instinct we have when we criticize any group of creatures as "the other" — because it's very possible that GPT-2 is much smarter than human beings at many things.
是的,这个说法真好。但我也不喜欢那种批评模型"不智能"的人类本能。这和我们把任何一群生物贬为"异类"时的本能是一样的——因为很有可能,GPT-2在很多方面比人类聪明得多。

That's definitely true: it has a lot more breadth of knowledge. Yes — breadth of knowledge, and even perhaps depth on certain topics. It's kind of hard to judge what depth means, but there's definitely a sense in which humans don't make the kinds of mistakes that these models do.
那肯定是真的:它的知识广度要大得多。是的——知识的广度,甚至在某些主题上也许还有深度。深度意味着什么很难判断,但确实在某种意义上,人类不会犯这些模型会犯的那类错误。

The same is applied to autonomous vehicles. The same is probably going to continue being applied to a lot of artificial social systems.
同样的情况也适用于自主驾驶车辆。这种情况可能会继续适用于许多人工社会系统。

We find this annoying. This is the process, in the 21st century, of analyzing the progress of AI: the search for one case where the system fails in a big way, where humans would not, and then many people write articles about it, and then broadly the public gets convinced that the system is not intelligent — and we pacify ourselves by thinking it's not intelligent because of this one anecdotal case. And this seems to keep happening.
我们觉得这很烦人。这就是21世纪分析AI进展的过程:寻找一个系统以人类不会的方式大败的案例,然后很多人为此写文章,接着公众普遍相信这个系统并不智能——我们靠着这一个轶事案例,安慰自己说它不智能。而这种事似乎在不断重演。

Yes, I mean there is truth to that. Although I'm sure that plenty of people are also extremely impressed by the system that exists today, but I think this connects to the earlier point we discussed that it's just confusing to judge progress in AI. You have a new robot demonstrating something. How impressed should you be?
是的,我的意思是那是有一定真实性的。虽然我确信很多人也对现今的系统印象极佳,但我认为这与我们之前讨论的观点联系在一起,很难对AI的进步进行评估。当你看到一个新的机器人展示某些东西时,你应该有多么印象深刻呢?

I think that people will start to be impressed once AI starts to really move the needle on GDP. You're one of the people that might be able to create an AGI system here — not you alone, but you and OpenAI.
我认为,一旦AI真正开始拉动GDP,人们就会开始感到佩服。你是可能在这里创造出AGI系统的人之一——不是你一个人,而是你和OpenAI。

If you do create an AGI system and you get to spend sort of an evening with it — him, her — what would you talk about, do you think? The very first time, the first time?
如果你真的创造出一个AGI系统,并且能与它(他、她)共度一个晚上,你觉得你们会聊些什么?第一次,最初的那一次?

Well, the first time I would just ask all kinds of questions and try to get it to make a mistake, and I would be amazed that it doesn't make mistakes, and just keep asking broad questions.
嗯,第一次我会问各种各样的问题,想方设法让它犯错;我会惊讶于它居然不犯错,然后继续问各种宽泛的问题。

What kind of questions do you think, would they be factual or would they be personal, emotional, psychological? What do you think?
你觉得他们会问什么样的问题呢?是客观事实还是个人、情感、心理类的吗?你认为呢?

All of the above. Would you ask for advice? Definitely. I mean, why would I limit myself when talking to a system like this?
以上都会。你会向它寻求建议吗?当然。我是说,和这样的系统交谈,我为什么要限制自己呢?

Now, again, let me emphasize the fact that you truly are one of the people that might be in the room where this happens. So let me ask a sort of profound question — I've just been talking about Stalin's story; I've been talking to a lot of people who study power.
现在,让我再次强调一个事实:你确实是可能身处"这件事发生的那个房间"里的人之一。所以让我问一个有些深刻的问题——我最近刚谈过斯大林的故事,我一直在和很多研究权力的人交谈。

Abraham Lincoln said: nearly all men can stand adversity, but if you want to test a man's character, give him power. I would say the power of the 21st century — maybe the 22nd, but hopefully the 21st — would be the creation of an AGI system, and the people who have direct possession and control of that AGI system.
亚伯拉罕·林肯说过:几乎所有人都能经受逆境,但如果你想考验一个人的品格,就给他权力。我想说,21世纪——也许是22世纪,但希望是21世纪——的权力,将是AGI系统的创造,以及那些直接拥有和控制AGI系统的人。

So what do you think after spending that evening, having a discussion with the AGI system, what do you think you would do?
在与AGI系统交流了一个晚上之后,你认为你会做什么?

Well, the ideal world I'd like to imagine is one where humanity are like the board members of a company, where the AGI is the CEO. So the picture which I would imagine is: you have some kind of different entities, different countries or cities, and the people that live there vote for what the AGI that represents them should do, and the AGI that represents them goes and does it.
嗯,我愿意想象的理想世界是:人类就像一家公司的董事会成员,而AGI是CEO。我想象的图景是:存在各种不同的实体——不同的国家或城市——居住在那里的人们投票决定代表他们的AGI应该做什么,然后代表他们的AGI就去执行。

I think a picture like that I find very appealing. You could have multiple AGIs — you would have an AGI for a city, for a country — and it would be trying, in effect, to take the democratic process to the next level. And the board can always fire the CEO — essentially press the reset button. Press the reset button.
我觉得那样的图景非常吸引人。你可以有多个AGI——一个城市一个AGI,一个国家一个AGI——它实际上是在试图把民主进程提升到一个新的层次。而董事会随时可以解雇CEO——本质上就是按下重置按钮。按下重置按钮。

Re-randomize the parameters. Well, let me — that's actually, okay, that's a beautiful vision, I think, as long as it's possible to press the reset button. Do you think it will always be possible to press the reset button?
重新随机化参数。嗯,让我——实际上,好吧,我认为那是一个美好的愿景,只要能按下重置按钮。你认为按下重置按钮会永远可行吗?

So I think that it's definitely possible to build that. So the question that I really understand from you is: will humans, or people, have control over the AI systems that they build?
我认为那绝对是可以构建的。我从你这里真正理解到的问题是:人类会对他们构建的AI系统拥有控制权吗?

Yes. And my answer is: it's definitely possible to build AI systems which will want to be controlled by their humans. Wow — so it's part of their... so it's not that they can't help but be controlled, but that one of the objectives of their existence is to be controlled.
是的。我的回答是:绝对有可能构建那种希望被它们的人类控制的AI系统。哇——所以这是它们的一部分……也就是说,不是它们不得不被控制,而是被控制本身就是它们存在的目标之一。

In the same way that human parents generally want to help their children, they want their children to succeed. It's not a burden for them. They are excited to help the children to feed them and to dress them and to take care of them. And I believe with high conviction that the same will be possible for an AGI, it will be possible to program an AGI to design it in such a way that it will have a similar deep drive that it will be delighted to fulfill and the drive will be to help humans flourish.
就像人类父母通常希望帮助他们的孩子一样,他们希望孩子成功。这对他们来说不是负担。他们很高兴帮助孩子喂养、穿衣和照顾他们。我坚信,同样的事情也可能发生在一个AGI身上,可以将一个AGI程序设计得具有类似的深层动力,它会很高兴去实现这个深层动力,这个动力就是帮助人类茁壮成长。

But let me take a step back to that moment where you create the AGI system. I think this is a really crucial moment. And between that moment and the democratic board members with the AGI at the head, there has to be a relinquishing of power.
让我倒退一步,谈到你们创建AGI系统的那一刻。我认为这是非常关键的时刻。在那一刻和民主董事会成员掌握AGI之间,必须放弃权力。

So, like George Washington: despite all the bad things he did, one of the big things he did is he relinquished power. First of all, he didn't want to be president, and even when he became president, he didn't keep serving indefinitely, as most dictators do.
就像乔治·华盛顿:尽管他做过种种不好的事,他做的一件大事就是放弃了权力。首先,他并不想当总统;即使当上了总统,他也没有像大多数独裁者那样无限期地干下去。

Do you see yourself being able to relinquish control over an AGI system, given how much power you could have over the world — at first financial, just make a lot of money, right, and then control, by having possession of the AGI system? I would find it trivial to do that. I would find it trivial to relinquish that kind of power. I mean, you know, the kind of scenario you are describing sounds terrifying to me. That's all. I would absolutely not want to be in that position.
你觉得自己能放弃对AGI系统的控制权吗?要知道你可能因此对世界拥有巨大的权力——起初是金钱上的,就是赚很多钱,对吧,然后通过掌握AGI系统获得控制权。我会觉得放弃这些轻而易举。放弃那种权力对我来说轻而易举。我是说,你描述的那种情景在我听来很可怕,仅此而已。我绝对不想处在那种位置上。

Do you think you represent the majority or the minority of people in the AGI community? Well, I mean, it's an open question and an important one. "Are most people good?" is another way to ask it. So I don't know if most people are good, but I think that when it really counts, people can be better than we think. That's beautifully put. Yeah.
你认为你代表AGI社区中的大多数人还是少数人?嗯,我是说,这是一个悬而未决且重要的问题。换一种问法就是:"大多数人是善良的吗?"我不知道大多数人是否善良,但我认为在真正关键的时刻,人们可以比我们想象的更好。说得真好。是的。

Are there specific mechanisms you can think of for aligning AGI values with human values? Do you think about these problems of continued alignment as we develop the AGI systems? Yeah, definitely. In some sense, the kind of question which you are asking is — if you translate that question to today's terms — a question about how to get an RL agent that's optimizing a value function which itself is learned.
你能想到哪些具体机制,用来让AGI的价值观与人类价值观对齐?在我们开发AGI系统的过程中,你会思考这些持续对齐的问题吗?当然会。在某种意义上,你问的这类问题——如果把它翻译成今天的术语——就是如何得到一个RL智能体,它优化的价值函数本身是学出来的。

And if you look at humans, humans are like that, because the reward function, the value function of humans is not external. It is internal. That's right. And there are definite ideas of how to train a value function: basically, an objective, as-objective-as-possible perception system that will be trained separately, to recognize, to internalize human judgments on different situations. And then that component would then be integrated as the base value function for some more capable RL system. You could imagine a process like this. I'm not saying this is the process, I'm saying this is an example of the kind of thing you could do.
如果你看看人类,人类就是这样的,因为人类的奖励函数、价值函数不是外在的,而是内在的。没错。而且,关于如何训练一个价值函数,已经有一些明确的想法:基本上是一个客观的、尽可能客观的感知系统,它将被单独训练,去识别、内化人类对不同情况的判断。然后,这个组件会被集成为某个能力更强的强化学习系统的基础价值函数。你可以想象一个这样的过程。我不是说就是这个过程,我是说这是你可以做的那类事情的一个例子。
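
To ground the two-stage process Ilya sketches, here is a minimal, hypothetical code sketch in PyTorch: first a small model is trained to internalize (here, simulated) human judgments, then it is frozen and used as the base reward for a simple policy-gradient agent. This is not Ilya's or OpenAI's actual method; every name (`RewardModel`, `Policy`, `human_judgment`) and the toy environment are assumptions for illustration.

```python
# A minimal sketch (assumptions, not an actual OpenAI method) of:
#   Stage 1: train a value/reward model to internalize human judgments.
#   Stage 2: freeze it and use it as the base reward for an RL agent.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, BATCH = 8, 4, 64

class RewardModel(nn.Module):
    """Predicts a scalar 'human judgment' score for a state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state):
        return self.net(state).squeeze(-1)

class Policy(nn.Module):
    """A simple categorical policy over discrete actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

def human_judgment(states):
    # Stand-in for collected human labels; in reality these come from people.
    return states.sum(dim=-1)

# Stage 1: supervised training on (state, human judgment) pairs.
reward_model = RewardModel()
opt_r = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
for _ in range(500):
    states = torch.randn(BATCH, STATE_DIM)
    loss = nn.functional.mse_loss(reward_model(states), human_judgment(states))
    opt_r.zero_grad(); loss.backward(); opt_r.step()

# Stage 2: the frozen model becomes the base value function for RL
# (one-step REINFORCE here; a real system would use a full RL algorithm).
for p in reward_model.parameters():
    p.requires_grad_(False)
policy = Policy()
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(500):
    states = torch.randn(BATCH, STATE_DIM)
    dist = policy(states)
    actions = dist.sample()
    # Toy transition: each action nudges one coordinate of the state.
    next_states = states.clone()
    next_states[torch.arange(BATCH), actions % STATE_DIM] += 1.0
    rewards = reward_model(next_states)  # learned reward, not hand-coded
    pg_loss = -(dist.log_prob(actions) * rewards).mean()
    opt_p.zero_grad(); pg_loss.backward(); opt_p.step()
```

In a real system, `human_judgment` would be replaced by actual human labels or preference comparisons, and the one-step REINFORCE loop by a full RL algorithm; the sketch only illustrates the separation between learning the value function and optimizing a policy against it.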

So on that topic of the objective functions of human existence, what do you think is the objective function that's implicit in human existence? What's the meaning of life? Oh... I think the question is wrong in some way. I think that the question implies that there is an objective answer, which is an external answer: you know, the meaning of your life is X. I think what's going on is that we exist, and that's amazing. And we should try to make the most of it, and try to maximize our own value and enjoyment of our very short time while we do exist. It's funny, because action does require an objective function.
说到人类存在的目标函数这个话题,你认为人类存在中隐含的目标函数是什么?生命的意义是什么?哦……我觉得这个问题本身在某种意义上就问错了。这个问题暗示存在一个客观的、外在的答案,比如:你的生命的意义就是X。但我认为实际情况是:我们存在着,这本身就很了不起。我们应该尽可能利用它,在我们存在的这段很短的时间里,尽量最大化我们自己的价值和享受。有趣的是,行动确实需要一个目标函数。

It's definitely there in some form, but it's difficult to make it explicit, and maybe impossible to make it explicit, I guess, is what you're getting at. And that's an interesting fact of an RL environment. Well, the point I was making is slightly different: humans want things, and their wants create the drives that cause them to act. You know, our wants are our objective functions, our individual objective functions.
它肯定以某种形式存在,但很难把它明确表述出来,甚至可能无法明确表述。我猜这就是你想表达的意思。而这是强化学习环境中一个有趣的事实。嗯,我想说的是稍微不同的一点:人类有欲望,这些欲望产生驱动力,促使他们去行动。你知道,我们的欲望就是我们的目标函数,我们各自的目标函数。

We can later decide that we want to change, that what we wanted before is no longer good, and we want something else. Yes, but they're so dynamic. There's got to be some underlying, sort of Freudian... there's things like sexual stuff, there's what people think is the fear of death, and there's also the desire for knowledge, and, you know, procreation, all these kinds of things, sort of all the evolutionary arguments. It seems like there might be some kind of fundamental objective function from which everything else emerges. But it seems like it's very difficult to make it explicit. I think there probably is an evolutionary objective function, which is to survive and procreate and make sure your children succeed. That would be my guess. But it doesn't give an answer to the question of what's the meaning of life.
我们之后可以决定要改变,之前想要的东西不再好了,我们想要别的东西。是的,但它们是如此动态。一定有某种潜在的、类似弗洛伊德式的东西:有性方面的东西,有人认为是对死亡的恐惧,还有对知识的渴望,以及繁衍后代,诸如此类,都是进化层面的论证。似乎可能存在某种根本的目标函数,其他一切都由它衍生出来。但要把它明确表述出来似乎非常困难。我想大概存在一个进化上的目标函数,那就是生存、繁衍,并确保你的孩子成功。这是我的猜测。但它并没有回答生命的意义是什么这个问题。

I think you can see how humans are part of this big process, this ancient process. We exist on a small planet, and that's it. So, given that we exist, try to make the most of it, and try to enjoy more and suffer less as much as we can. Let me ask two silly questions about life. One, do you have regrets?
我觉得你可以看到,人类是这个宏大进程、这个古老进程的一部分。我们存在于一颗小小的行星上,仅此而已。所以,既然我们存在,就尽可能地充分利用它,尽可能多享受、少受苦。让我问两个关于生命的傻问题。第一,你有后悔的事吗?

Moments that, if you went back, you would do differently. And two, are there moments that you're especially proud of, that made you truly happy? So I can answer that. I can answer both questions. Of course, there's a huge number of choices and decisions that I have made that, with the benefit of hindsight, I wouldn't have made. And I do experience some regret. But, you know, I try to take solace in the knowledge that at the time, I did the best I could.
有没有那种时刻,如果能回到过去,你会以不同的方式处理?另外,有没有让你特别自豪、让你真正快乐的时刻?这我可以回答。两个问题我都可以回答。当然,我做过大量的选择和决定,事后看来我是不会那样做的,我确实有一些后悔。但你知道,我试着安慰自己:在当时,我已经尽力做到了最好。

And in terms of things that I'm proud of, I'm very fortunate to have done things I'm proud of, and they made me happy for some time. But I don't think that that is the source of happiness.
至于我引以为豪的事情,我很幸运做过一些让我自豪的事,它们让我快乐了一段时间。但我不认为那是幸福的源泉。

So your academic accomplishments, all the papers, you're one of the most cited people in the world, all the breakthroughs I mentioned in computer vision and language and so on. What is the source of happiness and pride for you?
那么,你的学术成就、那些论文,你是世界上被引用最多的人之一,还有我提到的计算机视觉和语言等领域的所有突破。对你来说,幸福和自豪的源泉是什么?

I mean, all those things are a source of pride, for sure. I'm very grateful for having done all those things, and it was very fun to do them. But happiness comes... well, you know...
我的意思是,所有这些事情当然是自豪的来源。我非常感激自己做了所有这些事,做这些事也非常有趣。但幸福的来源嘛……你知道的……

Well, my current view is that happiness comes, to a very large degree, from the way we look at things. You know, you can have a simple meal and be quite happy as a result, or you can talk to someone and be happy as a result as well. Or, conversely, you can have a meal and be disappointed that the meal wasn't a better meal. So I think a lot of happiness comes from that, but I'm not sure. I don't want to be too confident. Being humble in the face of the answer seems to be also part of this whole happiness thing.
嗯,我目前的看法是,幸福在很大程度上来自于我们看待事物的方式。你知道,你可以吃一顿简单的饭,并因此感到很开心;也可以和某人聊聊天,同样因此感到快乐。相反,你也可能吃了一顿饭,却因为这顿饭不够好而失望。所以我认为很多幸福来自于这种心态,但我不确定。我不想太过自信。在这个问题的答案面前保持谦卑,似乎也是整个幸福之道的一部分。

Well, I don't think there's a better way to end it than on the meaning of life and discussions of happiness. So, Ilya, thank you so much. You've given me a few incredible ideas. You've given the world many incredible ideas. I really appreciate it, and thanks for talking today. Yeah, thanks for stopping by. I really enjoyed it.
嗯,我觉得没有比以生命的意义和关于幸福的讨论来收尾更好的方式了。所以,Ilya,非常感谢你。你给了我一些不可思议的想法,也给了世界许多不可思议的想法。我真的很感激,也谢谢你今天的交谈。是的,谢谢你的到访,我非常享受这次谈话。

Thanks for listening to this conversation with Ilya Sutskever, and thank you to our presenting sponsor, CashApp. Please consider supporting the podcast by downloading CashApp and using the code LexPodcast. If you enjoyed this podcast, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman.
谢谢你收听本次与Ilya Sutskever的对话。也感谢我们的冠名赞助商CashApp。请考虑下载CashApp并使用LexPodcast代码来支持本播客。如果你喜欢这个播客,请在YouTube上订阅,在Apple Podcasts上给出五星评价,在Patreon上支持,或者直接在Twitter上联系我:Lex Fridman。

And now let me leave you with some words from Alan Turing on Machine Learning. Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child? If this were then subjected to an appropriate course of education, one would obtain the adult brain.
现在,我要引用艾伦·图灵关于机器学习的一句话,留给你们:不要试图生产一个程序来模拟成年人的思维,为什么不试图生产一个能够模拟孩子思维的程序呢?如果对它进行适当的教育,我们就能得到成年人的大脑。

Thank you for listening and hope to see you next time.
谢谢您的倾听,希望下次再见。