AI Won’t Plateau — if We Give It Time To Think | Noam Brown | TED
Published: 2025-02-15 12:00:33
This speech argues that AI progress, which has been primarily driven by "scale" in the last five years (larger models, more data, and longer training times), is about to enter a new phase of accelerated advancement by leveraging "system two thinking" alongside the existing "system one" approach.
The speaker begins by acknowledging the current scaling paradigm, in which ever-larger AI models are trained on vast datasets for extended periods. This has yielded consistent improvements, but at exponentially increasing cost, raising concerns about a potential plateau. The speaker initially shared these concerns, but their perspective changed because of their PhD work on AI for playing poker.
Early poker AI research focused on scaling the existing paradigm, producing bots that had played trillions of hands. Despite all this experience, a bot that challenged four top human poker players lost by a wide margin. The speaker observed that while the AI made every decision almost instantly, the human players spent varying amounts of time deliberating over each decision. This led them to Daniel Kahneman's concepts of "system one" (fast, intuitive thinking) and "system two" (slow, methodical thinking).
Experiments were conducted to quantify the impact of "system two thinking" on poker AI performance. The results were astounding: allowing the bot to think for just 20 seconds before making a move resulted in the same performance boost as scaling up the model size and training by a factor of 100,000. This realization prompted a fundamental redesign of the poker AI, explicitly incorporating system two thinking.
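The effect of thinking time can be illustrated with a toy Python sketch in which "thinking" simply means averaging more simulated playouts before choosing a move. This is a hypothetical illustration, not the poker AI's actual algorithm: the move values, the Gaussian noise level, and the functions `rollout`, `choose_move`, and `accuracy` are all invented for this example, and the technique shown is plain Monte Carlo averaging.

```python
import random
import statistics

# Hypothetical toy decision: each move has a hidden "true" value (invented numbers).
TRUE_VALUES = [0.0, 0.1, 0.2, 0.3, 0.5]
NOISE_SD = 1.0  # assumed noise of a single simulated playout


def rollout(move: int, rng: random.Random) -> float:
    """One noisy simulation of how good `move` turns out to be."""
    return TRUE_VALUES[move] + rng.gauss(0.0, NOISE_SD)


def choose_move(thinking_budget: int, rng: random.Random) -> int:
    """Pick the move whose average over `thinking_budget` rollouts is highest.

    A budget of 1 stands in for "system one" (act on one quick impression);
    larger budgets stand in for "system two" (deliberate before acting).
    """
    def estimate(move: int) -> float:
        return statistics.fmean(rollout(move, rng) for _ in range(thinking_budget))

    return max(range(len(TRUE_VALUES)), key=estimate)


def accuracy(thinking_budget: int, trials: int = 1000, seed: int = 0) -> float:
    """Fraction of independent decisions that pick the truly best move."""
    rng = random.Random(seed)
    best = max(range(len(TRUE_VALUES)), key=TRUE_VALUES.__getitem__)
    hits = sum(choose_move(thinking_budget, rng) == best for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for budget in (1, 4, 16, 64, 256):
        print(f"thinking budget {budget:>3}: picked best move {accuracy(budget):.1%} of the time")
```

Because a single playout is noisy, the one-rollout bot often picks a worse move, while averaging hundreds of rollouts almost always finds the best one. The exact percentages depend on the invented noise level and move values.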
In 2017, a redesigned poker AI capable of both system one and system two thinking again challenged four top human poker professionals, this time in a 120,000-hand competition with a $200,000 prize. The AI won by a significant margin, surprising both the AI and poker communities. Before the competition, betting odds were heavily against the AI, and they stayed against it even after its early wins, which underlines how unexpected the victory was. By the eighth day, bettors had stopped wagering on who would win and were instead betting on which human would lose the least.
The speaker explains that the benefit of thinking time is not exclusive to poker, citing IBM's Deep Blue (chess) and DeepMind's AlphaGo (Go). Both of these groundbreaking AIs spent significant time deliberating before each move. In fact, AlphaGo's creators found that thinking time greatly boosted its performance, taking it from potentially losing to human players to beating them by a large margin.
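Deep Blue deliberated by searching ahead through possible move sequences (alpha-beta minimax search), and AlphaGo by Monte Carlo tree search. The toy sketch below shows the shared idea with plain depth-limited negamax (no alpha-beta pruning) on a hypothetical stone-taking game in which players alternately remove 1 to 3 stones and whoever takes the last stone wins. Here `depth` plays the role of thinking time; the game, `negamax`, and `best_move` are illustrative assumptions, not code from either system.

```python
from functools import lru_cache

MAX_TAKE = 3  # toy rules: remove 1 to 3 stones; taking the last stone wins


@lru_cache(maxsize=None)
def negamax(stones: int, depth: int) -> int:
    """Value for the player to move: +1 proven win, -1 proven loss, 0 unknown."""
    if stones == 0:
        return -1  # the opponent just took the last stone, so the player to move has lost
    if depth == 0:
        return 0  # out of thinking time: a deliberately naive "no opinion" guess
    return max(-negamax(stones - take, depth - 1)
               for take in range(1, min(MAX_TAKE, stones) + 1))


def best_move(stones: int, depth: int) -> int:
    """How many stones to take after searching `depth` plies ahead."""
    if stones < 1 or depth < 1:
        raise ValueError("need at least one stone and a search depth of at least 1")
    moves = range(1, min(MAX_TAKE, stones) + 1)
    # max() keeps the first of equally scored moves, so with no information it takes 1.
    return max(moves, key=lambda take: -negamax(stones - take, depth - 1))


if __name__ == "__main__":
    # From 10 stones the winning move is to take 2, leaving a multiple of 4.
    for depth in (1, 3, 5, 8):
        print(f"depth {depth}: take {best_move(10, depth)}")
```

With a shallow search every move looks the same, so the bot takes one stone and hands the opponent a winning position; from depth 5 onward the search proves that leaving 8 stones wins and takes 2. The evaluation rule never changes; only the time spent looking ahead does.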
The speaker suggests that the benefits of system two thinking extend beyond these headline systems. A 2021 study of games found that scaling up thinking time by a factor of 10 was roughly equivalent to scaling up model size and training by the same factor. This highlights the potential efficiency gains from investing in deliberation alongside traditional scaling.
The core of the argument is that the current AI paradigm relies heavily on the costly scaling of "system one training." By contrast, the "cost of querying" these models (asking a question and receiving an answer) is currently very low. The speaker proposes shifting the focus to scaling up "system two thinking", accepting a higher cost per query in exchange for significant performance improvements.
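The trade-off between training cost and query cost can be made concrete with back-of-the-envelope arithmetic. The sketch below is a hypothetical cost model, not figures from the talk: it assumes lifetime cost is a one-off training bill plus a per-query bill, and that letting each query think `SCALE` times longer multiplies its cost by `SCALE`. All dollar amounts and query counts are invented for illustration.

```python
# Hypothetical cost model; all figures are invented for illustration.
TRAINING_COST = 100_000_000.0  # dollars, one-off cost of training the base model
COST_PER_QUERY = 0.01          # dollars per answer with little "thinking"
NUM_QUERIES = 1_000_000_000    # answers served over the model's lifetime
SCALE = 10                     # how much more compute we are willing to spend


def total_cost(training_cost: float, cost_per_query: float, num_queries: int) -> float:
    """Lifetime cost: one-off training spend plus per-query inference spend."""
    return training_cost + cost_per_query * num_queries


baseline = total_cost(TRAINING_COST, COST_PER_QUERY, NUM_QUERIES)
# Option A: scale system-one training; each query stays cheap.
more_training = total_cost(SCALE * TRAINING_COST, COST_PER_QUERY, NUM_QUERIES)
# Option B: keep training fixed; assume thinking SCALE times longer costs SCALE times more.
more_thinking = total_cost(TRAINING_COST, SCALE * COST_PER_QUERY, NUM_QUERIES)
# Query volume at which A and B cost the same:
# (SCALE - 1) * TRAINING_COST == (SCALE - 1) * COST_PER_QUERY * Q
break_even_queries = TRAINING_COST / COST_PER_QUERY

if __name__ == "__main__":
    print(f"baseline:           ${baseline:,.0f}")
    print(f"{SCALE}x training:   ${more_training:,.0f}")
    print(f"{SCALE}x thinking:   ${more_thinking:,.0f}")
    print(f"break-even volume:  {break_even_queries:,.0f} queries")
```

With these made-up numbers, 10x thinking costs $200 million in total versus $1.01 billion for 10x training. The extra thinking bill grows with query volume (9 cents more per query) while the extra training bill is a fixed $900 million, so the two break even at 10 billion queries; below that volume, paying for more thinking per query is the cheaper lever.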
The speaker highlights that OpenAI has released o1, a language model that embodies this principle. It takes longer to respond and costs more per query, but it benefits from being able to "think" longer before answering, much like the successful game-playing AIs. This represents a new and largely untapped dimension for scaling AI capabilities.
The speech concludes by challenging the perception that AI is simply about chatbots. AI's potential extends to solving critical problems in domains such as medicine, energy, and mathematics. The speaker asks the audience whether they would be willing to pay for a new cancer treatment or more efficient solar panels even if the cost were substantial and the system took a long time to compute the answer. The central idea is that people will accept higher computational cost and longer delays if the result is a breakthrough in a high-impact field.