The Mastermind Behind GPT-4 and the Future of AI | Ilya Sutskever | Eye on AI #118 - YouTube
Published: 2023-03-14 16:00:00
Transcript
I'm Craig Smith and this is Eye on AI. This week I talk to Ilya Sutskever, a co-founder and chief scientist of OpenAI and one of the primary minds behind the large language model GPT-3 and its public progeny, ChatGPT, which I don't think it's an exaggeration to say is changing the world.
This isn't the first time Ilya has changed the world. Geoff Hinton has said he was the main impetus for AlexNet, the convolutional neural network whose dramatic performance stunned the scientific community in 2012 and set off the deep learning revolution.
As is often the case in these conversations, we assume a lot of knowledge on the part of listeners, primarily because I don't want to waste the limited time I have to speak to people like Ilya explaining concepts or events that can easily be Googled — or, I should say, that ChatGPT can explain for you.
The conversation with Ilya follows a conversation with Yann LeCun in a previous episode, so if you haven't listened to that episode, I encourage you to do so. Meanwhile, I hope you enjoy the conversation with Ilya as much as I did.
Ilya, it's terrific to meet you and to talk to you. I've watched many of your talks online and read many of your papers. Can you start just by introducing yourself and a little bit of your background? I know you were born in Russia. Where were you educated, and what got you interested in computer science, if that was the initial impulse — or brain science, neuroscience, whatever it was? And then I'll start asking questions.
Yeah, I can talk about that a little bit. So yes, I was born in Russia, I grew up in Israel, and then as a teenager my family immigrated to Canada.
My parents say I was interested in AI from a pretty early age. I was also very motivated by consciousness. I was very disturbed by it, and I was curious about things that could help me understand it better, and AI seemed like a good angle there. So I think these were some of the things that got me started. I actually started working with Geoff Hinton very early, when I was 17. We moved to Canada and I was immediately able to join the University of Toronto, and I really wanted to do machine learning, because that seemed like the most important aspect of artificial intelligence that at the time was completely inaccessible.
To give some context, the year was 2003. Today we take it for granted that computers can learn, but in 2003 we took it for granted that computers can't learn. The biggest achievement in AI back then was Deep Blue, the chess-playing engine. But there it was like: you have this game, and you have this search, and you have this simple way of determining if one position is better than another, and it really did not feel like that could possibly be applicable to the real world, because there was no learning. Learning was this big mystery, and so I was really, really interested in learning, and to my great luck Geoff Hinton was a professor at the university I was at, and so I was able to find him and we began working together almost right away.
And was your impulse, as it was for Geoff, to understand how the brain worked, or was it more that you were simply interested in the idea of machines learning? — AI is so big, and so the motivations were just as many. It is interesting, but how does intelligence work at all? Right now we have quite a bit of an idea — it's a big neural net, and we know how it works to some degree — but back then, although neural nets were around, no one knew that neural nets were good for anything.
So how does intelligence work at all? How can we make computers be even slightly intelligent? I had a very explicit intention to make a very small but real contribution to AI, because there were lots of contributions to AI which weren't real — I could tell, for various reasons, that they weren't real and that nothing would come out of them — and I just thought nothing worked at all. AI was a hopeless field. So the motivation was: could I understand how intelligence works, and also make a contribution towards it? That was my initial early motivation.
That's 2003, almost exactly 20 years ago. And then AlexNet — I've spoken to Geoff, and he said that it was really your excitement about the breakthroughs in convolutional neural networks that led you to apply to the ImageNet competition, and that Alex had the coding skills to train the network. Can you talk just a little bit about that? I don't want to get bogged down in history, but it's fascinating.
So in a nutshell, I had the realization that if you train a large neural network — sorry, a large and deep one, because back then the deep part was still new — if you train a large and deep neural network on a big enough data set that specifies some complicated task that people do, such as vision, but also others, and you just train that neural network, then you will succeed, necessarily. And the logic for it was irrefutable: we know that the human brain can solve these tasks, and can solve them quickly, and the human brain is just a neural network with slow neurons.
So we know that some neural network can do it really well. So then you just need to take a smaller but related neural network and train it on data, and the best neural network inside the computer will be related to the neural network that we have that performs this task. So it was an argument that the neural network — the large and deep neural network — can solve the task, and furthermore, we had the tools to train it. That was the result of the technical work that was done in Geoff's lab.
So you combine the two: we can train those neural networks; it needs to be big enough so that if you trained it, it would work well; and you need the data which can specify the solution. With ImageNet, all the ingredients were there. Alex had these very fast convolutional kernels, ImageNet had large enough data, and there was a real opportunity to do something totally unprecedented, and it totally worked out. — Yeah. That was supervised learning and convolutional neural nets.
In 2017 the "Attention Is All You Need" paper came out, introducing self-attention and transformers. At what point did the GPT project start? Was there some intuition about transformers and self-supervised learning? Can you talk about that? — So for context, at OpenAI from the earliest days we were exploring the idea that predicting the next thing is all you need.
We were exploring it with the much more limited neural networks of the time, but the hope was that if you have a neural network that can predict the next word, the next pixel — really it's about compression; prediction is compression. Predicting the next word is — let me think about the best way to explain it, because there were many things going on and they were all related. Maybe I'll take a different direction. We were indeed interested in trying to understand how far predicting the next word is going to go and whether it will solve unsupervised learning.
So back before the GPTs, unsupervised learning was considered to be the holy grail of machine learning. Now it's just been fully solved and no one even talks about it, but it was a holy grail. It was very mysterious, and so we were exploring the idea. I was really excited about it: that predicting the next word well enough is going to give you unsupervised learning — if it learns everything about the data set, that's going to be great. But our neural networks were not up to the task. We were using recurrent neural networks.
When the transformer came out — literally as soon as the paper came out, literally the next day — it was clear to me, to us, that transformers addressed the limitations of recurrent neural networks, of learning long-term dependencies. It's a technical thing, but we switched to transformers right away, and so the very nascent GPT effort continued with a transformer. It started to work better, and you make it bigger, and then you realize you need to keep making it bigger, and we did, and that's what eventually led to GPT-3 and essentially to where we are today.
Yeah, and I just wanted to ask — I'm getting caught up in this history, but I'm so interested in it. I want to get to the problems or shortcomings of large language models, or large models generally, but Rich Sutton had been writing about scaling, and how that's all we need to do — we don't need new algorithms, we just need to scale. Did he have an influence on you, or was that a parallel track of thinking?
No, I would say that when he posted his article we were very pleased to see some external people thinking along similar lines, and I thought it was very eloquently articulated. But I actually think that the bitter lesson, as articulated, overstates the case — or at least the takeaway that people have taken from it overstates the case.
The takeaway that people have is: it doesn't matter what you do, just scale. But that's not exactly true. You've got to scale something specific; you've got to have something that will be able to benefit from the scale.
The great breakthrough of deep learning is that it provides us with the first-ever way of productively using scale and getting something out of it in return. Before that, what would people use large computer clusters for?
I guess they would do weather simulations or physics simulations or something, but that's about it — maybe moviemaking — but no one had any real need for computer clusters, because what do you do with them? The fact that deep neural networks, when you make them larger and train them on more data, work better provided us with the first thing that is interesting to scale. But perhaps one day we will discover that there is some little twist on the thing that we scale that's going to be even better to scale.
Now, how big of a twist? And of course, with the benefit of hindsight, you may say: does it even count, it's such a simple change? But I think the true statement is that it matters what you scale. Right now we've just found a thing to scale that gives us something in return.
The limitation of large language models, some say, is that their knowledge is contained in the language they're trained on, while most human knowledge — I think everyone agrees, though I'm not sure Noam Chomsky agrees — is non-linguistic. There's a problem in large language models as I understand it: their objective is to satisfy the statistical consistency of the prompt.
They don't have an underlying understanding of the reality that language relates to. I asked ChatGPT about myself. It recognized that I'm a journalist, that I've worked at these various newspapers, but it went on and on about awards that I've never won, and it all read beautifully, but none of it connected to the underlying reality. Is there something being done to address that in your research going forward?
Yeah, so before I comment on the immediate question that you asked, I want to comment on some of the earlier parts of the question. — Sure. — I think that it is very hard to talk about the limits, or limitations rather, of even something like a language model, because two years ago people confidently spoke about their limitations, and those limitations turned out to be entirely different from what we see today.
Right, so it's important to keep this context in mind: how confident are we that the limitations we see today will still be with us two years from now? I am not that confident. There is another comment I want to make about one part of the question, which is the claim that these models just learn statistical regularities and therefore they don't really know what the nature of the world is. I have a view that differs from this.
In other words, I think that learning the statistical regularities is a far bigger deal than meets the eye. The reason we don't initially think so is that most people — those who haven't really spent a lot of time with neural networks, which are on some level statistical — haven't internalized this.
Like, what is a statistical model? You just fit some parameters — what is really happening? But I think there is a better interpretation: the earlier point of prediction as compression. Prediction is also a statistical phenomenon, yet to predict well you eventually need to understand the true underlying process that produced the data. To predict the data well, to compress it well, you need to understand more and more about the world that produced the data, as our generative models become extraordinarily good.
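The "prediction is compression" point can be made concrete in code: a model's average negative log-probability on a sequence is exactly the number of bits per symbol an arithmetic coder driven by that model would need, so a better predictor is literally a better compressor. A toy sketch (the two hand-made models and the sequence are invented purely for illustration):

```python
import math

def avg_bits_per_symbol(model, sequence):
    """Cross-entropy of the model on the sequence: the average number of
    bits per symbol an arithmetic coder driven by this model would need."""
    return sum(-math.log2(model[s]) for s in sequence) / len(sequence)

sequence = "aaab" * 25  # data with structure: 75% 'a', 25% 'b'

uniform = {"a": 0.5, "b": 0.5}    # a model that has learned nothing
fitted = {"a": 0.75, "b": 0.25}   # a model that captured the statistics

print(avg_bits_per_symbol(uniform, sequence))  # 1.0 bits/symbol
print(avg_bits_per_symbol(fitted, sequence))   # ~0.81 bits/symbol
```

The model that better predicts the data compresses it below 1 bit per symbol; a model that captured even deeper structure (here, that "b" only follows "aa") would compress it further still.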
They will have, I claim, a shocking degree of understanding of the world and many of its subtleties. But it's not just the world — it is the world as seen through the lens of text. The model tries to learn more and more about the world through a projection of the world onto the space of text as expressed by human beings on the internet. But still, this text already expresses the world, and I'll give you an example, a recent example, which I think is really telling and fascinating.
So we've all heard of Sydney, Bing's alter ego, and I've seen this really interesting interaction with Sydney, where Sydney became combative and aggressive when the user told it that it thinks Google is a better search engine than Bing. Now, what is a good way to think about this phenomenon?
What's a good language for it? What does it mean? You can say, well, it's just predicting what people would do, and people would do this. That's true, but maybe we're now reaching a point where the language of psychology is starting to be appropriate for understanding the behavior of these neural networks. Now let's talk about the limitations.
It is indeed the case that these neural networks have a tendency to hallucinate. But that's because a language model is great for learning about the world, but it is a little bit less great for producing good outputs, and there are various technical reasons for that, which I could elaborate on if you think it's useful, but right now I will skip that.
There are technical reasons why a language model is much better at learning about the world — learning incredible representations of ideas, of concepts, of people, of processes that exist — but its outputs aren't quite as good as one would hope, or as good as they could be. That is why, for example, a system like ChatGPT is a language model that has an additional reinforcement learning training process.
We call it reinforcement learning from human feedback, but the thing to understand about that process is this. We can say that in the pre-training process, when you just train a language model, you want it to learn everything about the world. Then, in the reinforcement learning from human feedback, we care about the outputs. Now we say: any time the output is inappropriate, don't do this again; every time the output does not make sense, don't do this again. And it learns quickly to produce good outputs. But now it's at the level of the outputs, which is not the case during pre-training, during the language model training process.
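The "don't do this again" signal described above is, at its core, a reward-driven policy update. Here is a minimal toy sketch of that idea — the two canned responses, the hard-coded reward standing in for human feedback, and the REINFORCE-style update are all illustrative assumptions, not OpenAI's actual pipeline:

```python
import math
import random

responses = ["accurate answer", "made-up award"]  # candidate outputs
logits = [0.0, 0.0]                               # the "policy" parameters
# Stand-in for human feedback: good output rewarded, hallucination punished.
reward = {"accurate answer": 1.0, "made-up award": -1.0}

def probs(logits):
    """Softmax over the policy's logits."""
    z = [math.exp(l) for l in logits]
    total = sum(z)
    return [x / total for x in z]

random.seed(0)
lr = 0.1
for _ in range(200):
    p = probs(logits)
    i = random.choices(range(len(responses)), weights=p)[0]  # sample an output
    r = reward[responses[i]]
    # REINFORCE-style update: raise the log-probability of rewarded outputs,
    # lower it for punished ones.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - p[j]
        logits[j] += lr * r * grad

final = probs(logits)
print(final)  # the policy now strongly prefers the non-hallucinated output
```

The point of the sketch is only the shape of the loop: sample an output, score it, and nudge the policy, so that over many rounds the rewarded behavior dominates.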
Now, on the point of hallucinations — it has a propensity for making stuff up, indeed that's true. Right now these neural networks, even ChatGPT, make things up from time to time, and that's something that also greatly limits their usefulness. But I'm quite hopeful that by simply improving this subsequent reinforcement-learning-from-human-feedback step, we can just teach it not to hallucinate.
Now you could say, is it really going to learn? My answer is, let's find out. — And that feedback loop is coming from the public ChatGPT interface? That if it tells me I won an award — which, unfortunately, I didn't — I can tell it that it's wrong, and will that train it, or create some punishment or reward, so that the next time I ask it, it'll be more accurate?
The way we do things today is that we hire people to teach our neural net to behave, to teach our GPT to behave. Right now the precise manner in which they specify the desired behavior is a little bit different, but indeed, what you described is the way teaching is basically going to be — that's the correct way to teach it: just interact with it, and it sees from your reaction, it infers, oh, that's not what you wanted.
You are not happy with its output, therefore the output was not good, and it should do something different next time. So in particular, hallucinations come up as one of the bigger issues, and we'll see, but I think there is quite a high chance that this approach will be able to address them completely.
I wanted to talk to you about Yann LeCun's work on joint embedding predictive architectures, and his idea that what's missing from large language models is an underlying, non-linguistic world model that the language model can refer to.
It's not something that's been built, but I wanted to hear what you thought of that and whether you've explored it at all. — So I reviewed Yann LeCun's proposal, and there are a number of ideas there. They're expressed in different language, and there are some, maybe small, differences from the current paradigm, but to my mind they are not very significant, and I'd like to elaborate.
The first claim is that it is desirable for a system to have multimodal understanding, where it doesn't just know about the world from text. And my comment on that will be that indeed multimodal understanding is desirable, because you learn more about the world.
You learn more about people, you learn more about their condition, and so the system will be able to better understand the task it's supposed to solve, and the people, and what they want. We have done quite a bit of work on that, most notably in the form of two major neural nets that we've done: one is called CLIP and one is called DALL-E, and both of them move towards this multimodal direction. But I also want to say that I don't see the situation as binary — that if you don't have vision, if you don't understand the world visually or from video, then things will not work — and I'd like to make the case for that.
So I think that some things are much easier to learn from images and diagrams and so on, but I claim that you can still learn them from text only, just more slowly, and I'll give you an example. Consider the notion of color.
Surely one cannot learn the notion of color from text only — and yet, when you look at the embeddings — let me make a small detour to explain the concept of an embedding.
Every neural network represents words, sentences, concepts through representations called embeddings: high-dimensional vectors. And one thing we can do is look at those high-dimensional vectors and see what's similar to what — how does the network see this concept or that concept? So we can look at the embeddings of colors, and the embeddings of colors happen to be exactly right. It knows that purple is more similar to blue than to red, and it knows that purple is less similar to red than orange is. It knows all those things just from text. How can that be? If you have vision, the distinctions between colors just jump out at you; you immediately perceive them. Whereas with text it takes you longer: maybe you know how to talk, and you already understand syntax and words and grammar, and only much later do all these colors actually start to make sense to you. So this will be my point about the necessity of multimodality: I claim it is not necessary, but it is most definitely useful.
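The color example can be reproduced in miniature. Below, the 3-dimensional vectors are made-up stand-ins for a real model's learned embeddings (which have hundreds of dimensions); the point is only that similarity between concepts can be read off as the angle between their vectors:

```python
import math

# Hypothetical 3-d embeddings, invented for illustration only.
embeddings = {
    "purple": [0.6, 0.1, 0.8],
    "blue":   [0.1, 0.1, 0.9],
    "red":    [0.9, 0.1, 0.1],
    "orange": [0.9, 0.5, 0.1],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means identical direction, 0.0 unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# purple sits closer to blue than to red in this toy space,
# and orange sits closer to red than purple does.
print(cosine(embeddings["purple"], embeddings["blue"]))
print(cosine(embeddings["purple"], embeddings["red"]))
print(cosine(embeddings["orange"], embeddings["red"]))
```

In a trained language model, the analogous geometry emerges from text alone, which is the claim being made in the passage above.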
I think it's a good direction to pursue; I just don't see it in such stark either-or claims. So the proposal in the paper makes a claim that one of the big challenges is predicting high-dimensional vectors which have uncertainty about them — for example, predicting an image. The paper makes a very strong claim there that it's a major challenge and that we need to use a particular approach to address it. But one thing which I found surprising, or at least unacknowledged in the paper, is that the current autoregressive transformers already have that property.
I'll give you two examples. One is: given one page in a book, predict the next page. There could be so many possible pages that follow; it's a very complicated, high-dimensional space, and they deal with it just fine. The same applies to images; these autoregressive transformers work perfectly well on images. For example, at OpenAI we've done work on iGPT: we just took a transformer and applied it to pixels, and it worked super well, and it could generate images in very complicated and subtle ways. It had very beautiful unsupervised representation learning. With DALL-E 1, same thing again: you just generate — think of it as large pixels — rather than generating a million pixels, we cluster the pixels into large pixels and generate a thousand large pixels. And Google's work on image generation from earlier this year, called Parti, I believe also takes a similar approach.
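The "pixels as a sequence" idea can be sketched with a toy stand-in: flatten a tiny image row by row and fit a conditional next-pixel distribution. Here a bigram frequency table plays the role of a transformer's learned conditionals, and the image itself is invented for illustration:

```python
from collections import Counter, defaultdict

# A tiny 4x4 "image" with two pixel values, flattened row by row into a
# sequence, exactly as an autoregressive model would consume it.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
seq = [px for row in image for px in row]

# Count next-pixel frequencies conditioned on the previous pixel — a
# bigram stand-in for the learned conditional distribution p(next | context).
counts = defaultdict(Counter)
for prev, nxt in zip(seq, seq[1:]):
    counts[prev][nxt] += 1

def predict_next(prev):
    """Return the conditional distribution over the next pixel value."""
    dist = counts[prev]
    total = sum(dist.values())
    return {value: c / total for value, c in dist.items()}

print(predict_next(0))  # after a 0, another 0 is the more likely next pixel
```

A real autoregressive image model conditions on the entire prefix of pixels rather than just the previous one, but the interface is the same: a distribution over the next element of the sequence, sampled one step at a time.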
So that part, where I thought that the paper made a strong comment — that the current approaches can't deal with predicting high-dimensional distributions — I think they definitely can. So maybe that's another point I would make. — And what you're talking about, converting pixels into vectors, is essentially turning everything into language; the vector is like a string of text, right? — You turn it into a sequence, yeah. A sequence of — you could argue that even for a human, life is a sequence of bits. Now, there are other things that people use right now, like diffusion models, where rather than producing those bits one at a time, they produce them in parallel, but I would argue that on some level this distinction is immaterial.
I claim that on some level it doesn't really matter. It matters in that you can get a 10x efficiency gain, which is huge in practice, but conceptually I claim it doesn't matter. — On this idea of having an army of human trainers working with ChatGPT, or a large language model, to guide it, in effect, with reinforcement learning: intuitively, that doesn't sound like an efficient way of teaching a model about the underlying reality of its language. Isn't there a way of automating that? To Yann's credit, I think that's what he's talking about: coming up with an algorithmic means of teaching a model the underlying reality without a human having to intervene. — Yeah, so I have two comments on that.
So in the first place, I have a different view on the question; I wouldn't agree with the phrasing of the question. I claim that our pre-trained models already know everything they need to know about the underlying reality. They already have this knowledge of language, and also a great deal of knowledge about the processes that exist in the world that produce this language. And maybe I should reiterate this point — it's a small tangent, but I think it's so important. The thing that large generative models learn about their data — and in this case, large language models about text data — is some compressed representation of the real-world processes that produce this data, which means not only people, and something about their thoughts, something about their feelings, but also something about the condition that people are in and the interactions that exist between them, the different situations a person can be in. All of these are part of that compressed process that is represented by the neural net to produce the text. The better the language model, the better the generative model, the higher the fidelity, the better it captures this process. So that's the first comment I'd make, and in particular I will say the models already have the knowledge.
Now, the army of teachers, as you phrase it — indeed, you know, when you want to build a system that performs as well as possible, you just say, okay, if this thing works, do more of that. But of course, those teachers are also using AI assistance. Those teachers aren't on their own; they are working together with our tools, and they are very efficient. It's like the tools are doing the majority of the work, but you do need to have oversight, you need to have people reviewing the behavior, because you want it to eventually achieve a very high level of reliability. But overall I'll say that in this second step — after we take the finished pre-trained model and then apply the reinforcement learning to it — there is indeed a lot of motivation to make it as efficient and as precise as possible, so that the resulting language model will be as well-behaved as possible. So yeah, there are these human teachers who are teaching the model the desired behavior. They are also using AI assistance, and the manner in which they use AI assistance is constantly increasing, so their own efficiency keeps increasing. So maybe this will be one way to answer this question. — Yeah, and so what you're saying is, through this process, the model will eventually become more and more discerning, more and more accurate in its outputs.
Yes, that's right. There is an analogy here, which is: it already knows all kinds of things, and now we just want to really say, no, this is not what we want, don't do this here, you made a mistake here in the output. And of course, it's exactly as you say, with as much AI in the loop as possible, so that the teachers who are providing the final correction to the system have their work amplified; they are working as efficiently as possible. So it's not unlike an education process in how to act well in the world. We need to do additional training just to make sure that the model knows that hallucination is not okay, ever, and then once it knows that, now you are in business. I think it's that reinforcement learning plus human teacher loop, or some other variant, that will teach it that, but there is definitely an argument to be made that something here should work, and we'll find out pretty soon. I'm sure that's one of the questions: where is this going? What research are you focused on right now? I can't talk in detail about the specific research that I'm working on, but I can mention some of the research in broad strokes, and it would be something like: I'm very interested in making those models more reliable, more controllable, making them learn faster from less data, less instructions, making them so that indeed they don't hallucinate. And I think that all this cluster of questions which I mentioned, they're all connected. There's also a question of how far in the future we are talking about in this question, and what I commented on here is perhaps the nearer future.
是的,确实有一个类比,就是模型已经知道了各种各样的东西,现在只是想要真正地说出"不,这不是我们要的,在这里不要这样做,你在输出的时候犯了一个错误"。当然,正如你所说,要尽可能多地把人工智能加入这个循环,让为系统做最终校正的教师的工作得到放大,使他们尽可能高效地工作。这其实很像一个教育过程,学习如何在世界上表现得当。我们需要进行额外的训练,以确保模型知道幻觉绝对是不可接受的。一旦它知道了这一点,你就可以继续进行了。我认为,正是这种"强化学习加人类教师"的循环,或者其他某种变体,会教会它这一点,但肯定可以说这里应该有某种东西能够起作用,我们很快就会找到答案。我相信这也是其中一个问题:这一切将走向何方?你现在专注于哪些研究?我不能详细谈论我正在进行的具体研究,但我可以粗略地提及一些大方向,比如我非常感兴趣的是使这些模型更加可靠、可控,让它们从更少的数据、更少的指令中更快地学习,让它们确实不产生幻觉。我认为,我提到的这一系列问题都是互相关联的,同时还有一个问题,那就是我们在谈论多远的未来,而我在这里谈到的可能是较近的未来。
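The training loop Ilya describes here, a pre-trained model refined by reinforcement learning from human (and increasingly AI-assisted) feedback, can be sketched very roughly in code. This is a minimal toy illustration, not OpenAI's actual implementation: the fixed candidate responses, the hand-written `reward` function standing in for a learned reward model, and the multiplicative weight update standing in for PPO are all hypothetical simplifications.

```python
import random

# Toy "policy": for a prompt, the model samples one of a fixed set of candidate
# responses, with per-response weights that feedback will tune. One candidate
# is a hallucination (a confidently wrong answer).
CANDIDATES = {
    "capital of France?": [
        "Paris.",
        "I believe it is Lyon.",  # the hallucinated answer
        "Paris is the capital of France.",
    ],
}

def make_policy():
    # Uniform initial preference weights for each candidate response.
    return {prompt: [1.0] * len(cands) for prompt, cands in CANDIDATES.items()}

def sample(policy, prompt, rng):
    # Sample a response index in proportion to its current weight.
    weights = policy[prompt]
    return rng.choices(range(len(weights)), weights=weights)[0]

def reward(prompt, response):
    # Stand-in "reward model": in real RLHF this is a learned network trained
    # on human preference comparisons; here we simply reward factual answers
    # and penalize the hallucination.
    return 1.0 if "Paris" in response else -1.0

def rlhf_step(policy, prompt, rng, lr=0.5):
    # One feedback step: sample, score, and reinforce or suppress the response.
    i = sample(policy, prompt, rng)
    r = reward(prompt, CANDIDATES[prompt][i])
    policy[prompt][i] = max(1e-3, policy[prompt][i] * (1.0 + lr * r))
    return r

rng = random.Random(0)
policy = make_policy()
for _ in range(200):
    rlhf_step(policy, "capital of France?", rng)

# After training, the hallucinated answer ("Lyon") should carry the least
# weight, so the tuned policy almost never produces it.
```

In a real system the scalar reward comes from a reward model trained on ranked comparisons collected from the human teachers (aided by AI assistants), and the policy update is a proper RL algorithm such as PPO over the language model's parameters; the shape of the loop, though, is the same: sample, score, reinforce.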
You talked about the similarities between the brain and neural networks. A very interesting observation that Jeff Hinton made to me, and for sure it's not new to other people, is that large models, or large language models in particular, hold a tremendous amount of data with a modest number of parameters, compared to the human brain, which has trillions and trillions of parameters but a relatively small amount of data. Have you thought of it in those terms, and can you talk about what's missing in large models? To have more parameters to handle the data, is that a hardware problem or a training problem? This comment which you made is related to one of the problems that I mentioned in the earlier questions, of learning from less data, and indeed the current structure of the technology does like a lot of data, especially early in training. Now, later in training it becomes a bit less data hungry, which is why at the end it can learn, not as fast as people yet, but it can learn quite quickly. So already that means that, in some sense, do we even care that we need all this data to get to this point? But indeed, more generally, I think it will be possible to learn more from less data. I think it just requires some creative ideas, but I think it is possible. And I think learning more from less data will unlock a lot of different possibilities. It will allow us to teach our AIs the skills that they are missing, and to convey to them our desires and preferences, exactly how we want them to behave. So I would say that faster learning is indeed very nice, and although language models, once they are trained, can already learn quite quickly, I think there are opportunities to do more there.
你谈到了大脑和神经网络之间的相似之处,这是 Jeff Hinton 向我提出的一个非常有趣的观察。尽管这对其他人来说并不新奇,但是相对于拥有数万亿参数但是数据量相对较小的人脑来说,大型模型,特别是大型语言模型,用适度数量的参数容纳了巨大的数据量。你是否考虑过这个问题?你能谈谈大型模型缺少什么吗?要有更多参数来处理数据,这是硬件问题还是训练问题?你提出的这个评论与我之前提出的从少量数据中学习的问题有关。当前技术结构确实需要大量的数据,尤其是在早期的训练阶段。而在训练后期,数据需求量会减少一些,这就是为什么在最后它能够学习,虽然还不像人类那么快,但是学习速度已经相当快。因此,在某种意义上,我们是否真的在意需要这么多数据才能达到这一点?而且,我认为总的来说,我们可以通过更少的数据获取更多的学习。这需要一些创造性的想法,但我认为是有可能的。从少量数据中学习会开启许多不同的可能性,它将让我们能够教我们的AI所缺失的技能,并准确地传达我们的愿望和偏好,即我们希望它如何行事。因此,快速的学习确实非常好,虽然语言模型训练之后已经可以很快地学习,但我认为还有更多的机会可以做出更多的成果。
You did make a comment that we need faster processors to scale further, and it appears that with the scaling of models there's no end in sight, but the power required to train these models, we're reaching the limit, at least the socially accepted limit. So I just want to make one comment, which is, I don't remember the exact comment that I made that you're referring to, but you always want faster processors, of course; you always want more of them, of course. Power keeps going up; generally speaking, the cost is going up. And the question that I would ask is not whether the cost is large, but whether the thing that we get out of paying this cost outweighs the cost. Maybe you pay all this cost and you get nothing; then, yeah, that's not worth it. But if you get something very useful, something very valuable, something that can solve a lot of the problems that we have, which we really want solved, then the cost can be justified. But in terms of processors, faster processors? Yeah, any day. Are you involved at all in the hardware question? You work with Cerebras, for example, the wafer-scale chips? No, all our hardware comes from Azure; they provide it. Yeah, yeah.
你曾提出,我们需要更快的处理器来进一步扩展规模,而模型的扩展似乎没有止境,但训练这些模型所需的功率已经接近或至少达到了社会所接受的极限。我想发表一个评论,即我不记得你所指的我曾经发表过的具体评论,但当然你总是想要更快的处理器,当然你总是想要更多的处理器,而功耗通常会上升,成本也会上升。我想问的问题不是成本是否高,而是我们通过支付这个成本可以得到什么,这个得到的东西是否超过成本。如果你花了这么多成本却什么都没得到,那么这显然是不值得的,但如果你得到了非常有用,非常有价值,可以解决许多问题的东西,那么成本是可以被证明的。至于处理器方面,更快的处理器肯定更好。你是否参与硬件问题?例如与Cerebras合作,他们的晶圆级芯片?不是的,我们的硬件全部来自Azure,由他们提供。
You did talk at one point, I saw, about democracy and about the impact that AI can have on democracy. People have talked to me about that: if you had enough data and a large enough model, you could train the model on the data and it could come up with an optimal solution that would satisfy everybody. Do you have any aspiration, or do you think about where this might lead in terms of helping humans manage society? Yeah, let's see. It's such a big question, because it's a much more future-looking question. I think that there are still many ways in which our models will become far more capable than they are right now, there's no question. In particular, although we train them and use them and so on, there are going to be a few changes here and there. They might not be immediately obvious today, but I think in hindsight it will be extremely obvious that it will indeed allow them to have that ability to come up with solutions to problems of this kind. It's unpredictable exactly how governments will use this technology as a source of getting advice of various kinds. I think that, to the question of democracy, one thing which I think could happen in the future is that, because you have these neural nets and they're going to be so pervasive and so impactful in society, we will find that it is desirable to have some kind of a democratic process where, let's say, the citizens of a country provide some information to the neural net about how they'd like things to be, how they'd like it to behave, or something along these lines. I could imagine that happening. That could be a very high-bandwidth form of democracy, perhaps, where you get a lot more information out of each citizen and you aggregate it to specify how exactly you want such systems to act. Now, it opens a whole lot of questions, but that's one thing that could happen in the future. Yeah, and I can see, in the democracy example you gave, that individuals would have the opportunity to input data.
你曾谈论过民主以及人工智能对民主的影响,有人曾经对我谈论过,如果你拥有足够的数据和足够大的模型,你就可以对模型进行训练,从而得到一个满足所有人的最佳解决方案。你是否有任何愿望或者想法,关于这可能会如何帮助人类管理社会?这是一个非常大的问题,因为这是一个更具有未来意义的问题,我认为我们的模型将会比现在更加强大的方式还有很多,尽管我们正在对它们进行训练和使用等等,但还有一些细微的变化,它们可能今天不会立即显现出来。但我认为回顾历史时,它们将会非常明显,这确实会让它具备解决这种问题的能力。政府将如何利用这种技术作为获得各种建议的来源是不可预测的。我认为对于民主的问题,未来可能会发生的一件事是,由于你有这些神经网络,它们将会如此普及,对社会的影响将会如此之大,我们会发现有必要建立某种民主流程,比如说,国家的公民向神经网络提供一些关于他们想要事物的信息,他们希望它如何行动等等,我能够想象这种情况发生,这可能会成为一种非常高带宽的民主形式,其中你可以从每个公民身上得到更多的信息,并将其聚合到一起,以指定如何完全控制这些系统的行为。这就开启了一系列的问题,但这是未来可能会发生的一件事。我也能看到,在民主方面,个人将有机会输入数据。
But, and this sort of goes to the world-model question, do you think AI systems will eventually be large enough that they can understand a situation and analyze all of the variables? But you would need a model that does more than absorb language. What does it mean to analyze all the variables? Eventually there will be a choice you need to make, where you say, this variable seems important, I want to go deep, because a person can read a book: I can read a hundred books, or I can read one book very slowly and carefully and get more out of it. So there will be some element of that. Also, I think it's probably fundamentally impossible to understand everything in some sense. Anytime there is any kind of complicated situation in society, even in a company, even in a mid-size company, it's already beyond the comprehension of any single individual. And I think that if we build our AI systems the right way, AI could be incredibly helpful in pretty much any situation. That's it for this episode. I want to thank Ilya for his time. I also want to thank Ellie George for helping arrange the interview. If you want to read a transcript of this conversation, you can find one on our website, Eye on AI, that's eye-on.ai. We love to hear from listeners, so feel free to email me at craig, C-R-A-I-G, at eye-on.ai. I get a lot of emails, so put "listener" in the subject line so I don't miss it. We have listeners in 170 countries and territories. And remember: the singularity may not be near, but AI is changing your world, so pay attention.
这段话涉及到世界模型的问题。你认为人工智能系统最终会足够强大,可以理解一个情境并分析所有变量吗?但是你需要一个不仅仅吸收语言的模型。你知道分析所有变量意味着什么吗?最终你需要做出选择,你会说这个变量看起来重要,我想深入研究,因为一个人可以读一本书:我可以读一百本书,或者我可以很缓慢、仔细地读一本书,取得更多的成果。因此,其中会有这样的元素。我认为在某种程度上,彻底理解所有事情可能是根本不可能的。每当有任何复杂情境,甚至是在一个公司,即使是一个中等规模的公司,它已经超出了任何单个个体的理解能力。我认为,如果我们正确地建立我们的人工智能系统,人工智能在几乎任何情况下都可以发挥极大的帮助。本集节目到此结束,我要感谢Ilya抽出时间来接受采访,我还要感谢Ellie George帮忙安排采访。如果您想阅读这次对话的文字稿,您可以在我们的网站上找到,网址为eye-on.ai。我们很乐意听取听众的意见,欢迎您给我发送电子邮件,我的邮箱是craig@eye-on.ai。我收到很多邮件,请在邮件主题中注明"listener",以免我错过。我们拥有来自170个国家和地区的听众。请记住,奇点也许并不临近,但人工智能正在改变你的世界,请保持关注。