Why AI Is Incredibly Smart — and Shockingly Stupid | Yejin Choi | TED

Published: 2023-04-28 15:20:11

Summary

Computer scientist Yejin Choi is here to demystify the current state of massive artificial intelligence systems like ChatGPT, highlighting three key problems with cutting-edge large language models (including some funny instances of them failing at basic commonsense reasoning). She welcomes us into a new era in which AI is becoming almost like a new intellectual species, and identifies the benefits of building smaller AI systems trained on human norms and values. (Followed by a Q&A with head of TED Chris Anderson.) Watch more: https://go.ted.com/yejinchoi https://youtu.be/SvBR0OGT5VI TED's videos may be used for non-commercial purposes under a Creative Commons License, Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND 4.0 International), and in accordance with the TED Talks Usage Policy: https://www.ted.com/about/our-organization/our-policies-terms/ted-talks-usage-policy.

Transcript

So I'm excited to share a few spicy thoughts on artificial intelligence. But first, let's get philosophical by starting with this quote by Voltaire, an 18th-century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn't be more relevant to artificial intelligence today.

Despite that, AI is an undeniably powerful tool, beating the world-class Go champion, acing college admission tests, and even passing the bar exam. I'm a computer scientist of 20 years, and I work on artificial intelligence. I am here to demystify AI.

So AI today is like a Goliath. It is literally very, very large. It is speculated that the recent ones are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as large language models, appear to demonstrate sparks of AGI, artificial general intelligence, except when they make small, silly mistakes, which they often do.

Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale, and more resources. What could possibly go wrong?

So there are three immediate challenges we face already at the societal level. First, extreme-scale AI models are so expensive to train that only a few tech companies can afford to do so. So we already see the concentration of power.

But what's worse for AI safety, we are now at the mercy of those few tech companies, because researchers in the larger community do not have the means to truly inspect and dissect these models. And let's not forget their massive carbon footprint and the environmental impact.

And then there are these additional intellectual questions. Can AI without robust common sense be truly safe for humanity? And is brute-force scale really the only way, and even the correct way, to teach AI?

So I'm often asked these days whether it's even feasible to do any meaningful research without extreme-scale compute. I work at a university and a nonprofit research institute, so I cannot afford a massive GPU farm to create enormous language models. Nevertheless, I believe that there's so much we need to do and can do to make AI sustainable and humanistic. We need to make AI smaller to democratize it, and we need to make AI safer by teaching it human norms and values.

Perhaps we can draw an analogy from David and Goliath, Goliath being the extreme-scale language models, and seek inspiration from an all-time classic, The Art of War, which tells us, in my interpretation: know your enemy, choose your battles, and innovate your weapons.

Let's start with the first: know your enemy, which means we need to evaluate AI with scrutiny. AI is passing the bar exam. Does that mean that AI is robust at common sense? You might assume so, but you never know.

So suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system, says 30 hours. Not good. A different one: I have a 12-liter jug and a six-liter jug, and I want to measure six liters. How do I do it? Just use the six-liter jug, right? GPT-4 spits out some very elaborate nonsense.

Step one, fill the six-liter jug. Step two, pour the water from the six-liter to the 12-liter jug. Step three, fill the six-liter jug again. Step four, very carefully, pour the water from the six-liter to the 12-liter jug. And finally, you have six liters of water in the six-liter jug, which should be empty by now. OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass? "Yes, highly likely," GPT-4 says, presumably because it cannot correctly reason that if a bridge is suspended over the broken nails and broken glass, then the surface of the bridge doesn't touch these sharp objects directly.
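
For readers who want to check the jug question mechanically, here is a minimal sketch of a breadth-first search over jug states. It is not something shown in the talk; the function name `shortest_measure` and the move labels are made up for illustration. It only confirms what common sense already says: the shortest plan is a single step.

```python
from collections import deque


def shortest_measure(capacities, target):
    """Breadth-first search over jug states.

    Returns the shortest list of moves that leaves exactly `target`
    liters in some jug, or None if that amount cannot be reached.
    """
    start = tuple(0 for _ in capacities)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, moves = queue.popleft()
        if target in state:
            return moves
        for i, cap in enumerate(capacities):
            # Filling or emptying jug i.
            candidates = [
                (state[:i] + (cap,) + state[i + 1:], f"fill the {cap}-liter jug"),
                (state[:i] + (0,) + state[i + 1:], f"empty the {cap}-liter jug"),
            ]
            # Pouring jug i into every other jug j until i is empty or j is full.
            for j, other_cap in enumerate(capacities):
                if i == j:
                    continue
                amount = min(state[i], other_cap - state[j])
                poured = list(state)
                poured[i] -= amount
                poured[j] += amount
                candidates.append(
                    (tuple(poured),
                     f"pour the {cap}-liter jug into the {other_cap}-liter jug")
                )
            for next_state, label in candidates:
                if next_state not in seen:
                    seen.add(next_state)
                    queue.append((next_state, moves + [label]))
    return None


print(shortest_measure((12, 6), 6))  # ['fill the 6-liter jug']
```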

OK, so how would you feel about an AI lawyer that aced the bar exam, yet randomly fails at such basic common sense?

AI today is unbelievably intelligent and then shockingly stupid. It is an unavoidable side effect of teaching AI through brute-force scale. Some scale optimists might say, "Don't worry about this; all of this can be easily fixed by adding similar examples as yet more training data for AI."

But the real question is this: why should we even do that? You are able to get the correct answers right away without having to train yourself on similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.

So this observation leads us to the next wisdom: choose your battles. So what fundamental questions should we ask right now and tackle today in order to overcome this status quo with extreme-scale AI?

I'll say common sense is among the top priorities. Common sense has been a long-standing challenge in AI. To explain why, let me draw an analogy to dark matter. Only 5% of the universe is normal matter that you can see and interact with.

And the remaining 95% is dark matter and dark energy. Dark matter is completely invisible, but scientists speculate that it's there because it influences the visible world, even including the trajectory of light. So for language, the normal matter is the visible text.

And the dark matter is the unspoken rules about how the world works, including naive physics and folk psychology, which influence the way people use and interpret language.

So why is this common sense even important? Well, in a famous thought experiment proposed by Nick Bostrom, AI was asked to produce and maximize paper clips. And that AI decided to kill humans to utilize them as additional resources, to turn you into paper clips, because the AI didn't have the basic human understanding about human values.

Now, writing a better objective and equation that explicitly states "Do not kill humans" will not work either, because AI might go ahead and kill all the trees, thinking that's a perfectly OK thing to do. And in fact, there are endless other things that AI obviously shouldn't do while maximizing paper clips, including "Don't spread fake news," "Don't steal," and "Don't lie," which are all part of our commonsense understanding about how the world works.

However, the AI field for decades has considered common sense a nearly impossible challenge. So much so that when my students, colleagues, and I started working on this several years ago, we were very much discouraged. We were told that it's a research topic of the '70s and '80s.

We were told we shouldn't work on it because it will never work; in fact, not even to say the word if we wanted to be taken seriously. Now, fast-forward to this year, I'm hearing, "Don't work on it because ChatGPT has almost solved it," and, "Just scale things up and magic will arise, and nothing else matters."

So my position is that giving true common sense, human-like, robust common sense, to AI is still a moonshot. And you don't reach the moon by making the tallest building in the world one inch taller at a time. Extreme-scale AI models do acquire an ever-increasing amount of commonsense knowledge.

I'll give you that. But remember, they still stumble on such trivial problems that even children can do. So AI today is awfully inefficient. And what if there's an alternative path, a path yet to be found? A path that can build on the advancements of deep neural networks, but without going so extreme with the scale.

So this leads us to our final wisdom: innovate your weapons. In the modern-day AI context, that means innovate your data and algorithms. OK, so there are, roughly speaking, three types of data that modern AI is trained on: raw web data; crafted examples, custom-developed for AI training; and then human judgments, also known as human feedback on AI performance.

If the AI is only trained on the first type, raw web data, which is freely available, it's not good because this data is loaded with racism and sexism and misinformation. So no matter how much of it you use, garbage in and garbage out. So the newest greatest AI systems are now powered with the second and third types of data that are crafted and judged by human workers.

It's analogous to writing specialized textbooks for AI to study from, and then hiring human tutors to give constant feedback to AI. These are proprietary data, by and large, speculated to cost tens of millions of dollars. We don't know what's in them, but they should be open and publicly available so that we can inspect them and ensure they support diverse norms and values.

So for this reason, my teams at UW and AI2 have been working on commonsense knowledge graphs as well as moral norm repositories to teach AI basic commonsense norms and morals. Our data is fully open so that anybody can inspect the content and make corrections as needed, because transparency is the key for such an important research topic.

Now let's think about learning algorithms. No matter how amazing large language models are, by design they may not be the best suited to serve as reliable knowledge models. These language models do acquire a vast amount of knowledge, but they do so as a byproduct as opposed to a direct learning objective, resulting in unwanted side effects such as hallucinated effects and a lack of common sense.

Now, in contrast, human learning is never about predicting which word comes next; it's really about making sense of the world and learning how the world works. Maybe AI should be taught that way as well.
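
To make "predicting which word comes next" concrete, here is a toy sketch of that objective using simple bigram counts. It is an illustration only, not how any large language model is actually built; the corpus and the function names `train_bigram` and `predict_next` are invented for this example. The model learns which word tends to follow which, and nothing about jugs or bridges themselves.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, for illustration only.
corpus = "the jug holds six liters . the jug is full . the bridge is high"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # jug
```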

So as a quest toward more direct commonsense knowledge acquisition, my team has been investigating potential new algorithms, including symbolic knowledge distillation, that can take a very large language model, as shown here (I couldn't fit it onto the screen because it's too large), and crunch that down to much smaller commonsense models using deep neural networks. And in doing so, we also generate, algorithmically, a human-inspectable symbolic commonsense knowledge representation, so that people can inspect it, make corrections, and even use it to train other neural commonsense models.
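
One way to picture the inspectable part of this idea is a generate-then-filter loop: a large "teacher" proposes commonsense statements, a "critic" discards implausible ones, and what survives is a readable knowledge base that could later be used to train a smaller model. The sketch below is a toy under heavy assumptions, not the team's actual method: `toy_teacher` and `toy_critic` are hypothetical stand-ins with canned outputs, and the step of training a small neural model on the result is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One human-inspectable commonsense statement: head --relation--> tail."""
    head: str
    relation: str
    tail: str


def toy_teacher(event):
    """Hypothetical stand-in for a very large language model.

    A real teacher would be prompted to generate inferences; this one
    returns canned candidates, including a deliberately bad one.
    """
    canned = {
        "X hangs wet clothes in the sun": [
            Triple("X hangs wet clothes in the sun", "effect", "the clothes dry"),
            Triple("X hangs wet clothes in the sun", "effect", "the sun gets wet"),
        ],
        "X cycles over a bridge": [
            Triple("X cycles over a bridge", "effect", "X reaches the other side"),
        ],
    }
    return canned.get(event, [])


def toy_critic(triple):
    """Hypothetical plausibility score in [0, 1]; a real critic would be a trained model."""
    implausible_tails = {"the sun gets wet"}
    return 0.1 if triple.tail in implausible_tails else 0.9


def build_knowledge_base(events, threshold=0.5):
    """Keep only the teacher's candidates that the critic scores at or above the threshold."""
    return [
        triple
        for event in events
        for triple in toy_teacher(event)
        if toy_critic(triple) >= threshold
    ]


knowledge = build_knowledge_base(
    ["X hangs wet clothes in the sun", "X cycles over a bridge"]
)
for triple in knowledge:
    # Each surviving statement is plain text that a person can read and correct.
    print(f"{triple.head} --{triple.relation}--> {triple.tail}")
```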

More broadly, we have been tackling this seemingly impossible, giant puzzle of common sense, ranging from physical, social, and visual common sense to theory of mind, norms, and morals. Each individual piece may seem quirky and incomplete, but when you step back, it's almost as if these pieces weave together into a tapestry that we call human experience and common sense.

We're now entering a new era in which AI is almost like a new intellectual species, with unique strengths and weaknesses compared to humans. In order to make this powerful AI sustainable and humanistic, we need to teach AI common sense, norms, and values. Thank you.

Chris Anderson: Look at that. We obviously all really want this from whatever's coming. But how do we understand it? We've had this model of a child learning. How does a child gain common sense, apart from the accumulation of more input and some human feedback? What else is there?

Yejin Choi: Fundamentally, there are several things missing, but one of them is, for example, the ability to make hypotheses and run experiments, interact with the world, and develop these hypotheses.

We abstract away the concepts about how the world works, and that's how we truly learn, as opposed to today's language models. Some of that is really not quite there yet.

Chris Anderson: You use the analogy that we can't get to the moon by extending a building a foot at a time, but the experience that most of us have had of these language models is not a foot at a time.

It's like a sort of breathtaking acceleration. Are you sure, given the pace at which those things are going, when each next level seems to bring with it what feels kind of like wisdom and knowledge?

Yejin Choi: I totally agree that it's remarkable how much scaling things up really enhances the performance across the board.

So there's real learning happening due to the scale of the compute and data. However, there's a quality of learning that's still not quite there and the thing is we don't yet know whether we can fully get there or not just by scaling things up. And if we cannot, then there's this question of what else.

And then even if we could, do we like this idea of having very, very extreme-scale AI models that only a few can create and own?

Chris Anderson: If OpenAI said, "We're interested in your work; we would like you to help improve our model," can you see any way of combining what you're doing with what they have built?

Yejin Choi: Certainly, what I envision will need to build on the advancements of deep neural networks. And it might be that there's some scale Goldilocks zone; I'm not imagining that smaller is better either, by the way. It's likely that there's a right amount of scale, but beyond that, the winning recipe might be something else. So some synthesis of ideas will be critical here.

Chris Anderson: Yejin Choi, thank you so much for your talk. Thank you. Thank you.
