This speech argues that AI progress, which has been primarily driven by "scale" in the last five years (larger models, more data, and longer training times), is about to enter a new phase of accelerated advancement by leveraging "system two thinking" alongside the existing "system one" approach.
The speaker begins by acknowledging the current scaling paradigm, where ever-larger AI models are trained on vast datasets for extended periods. This has yielded consistent improvements, but with exponentially increasing costs, leading to concerns about a potential plateau. The speaker initially shared these concerns. However, their perspective shifted based on experiences during their PhD work focused on creating AI for playing poker.
Early poker AI research focused on scaling the existing paradigm, leading to bots that played trillions of hands. Despite this vast experience, a bot challenged to play against four top human poker players lost significantly. The speaker observed that while the AI made instantaneous decisions, the human players spent varying amounts of time thinking, employing strategic deliberation. This led them to consider Daniel Kahneman's concepts of "system one" (fast, intuitive thinking) and "system two" (slow, methodical thinking).
Experiments were conducted to quantify the impact of "system two thinking" on poker AI performance. The results were astounding: allowing the bot to think for just 20 seconds before making a move resulted in the same performance boost as scaling up the model size and training by a factor of 100,000. This realization prompted a fundamental redesign of the poker AI, explicitly incorporating system two thinking.
In 2017, a redesigned poker AI capable of both system one and system two thinking again challenged four top human poker professionals, this time in a 120,000-hand competition with a $200,000 prize. The AI won by a significant margin, surprising both the AI and poker communities. Bettors heavily favored the humans before the competition and continued to doubt the AI even after its early wins, which underscores how unexpected the victory was. By the eighth day, betting had shifted from predicting the winner to predicting which human would lose by the least.
The speaker explains that the benefit of thinking time is not exclusive to poker, citing IBM's Deep Blue (chess) and DeepMind's AlphaGo (Go). Both of these groundbreaking AIs spent significant time deliberating before each move. In fact, AlphaGo's creators found that thinking time greatly boosted its performance, elevating it from potentially losing to a human player to beating them by a large margin.
The value of thinking time has also been measured directly. A 2021 study found that, in games, scaling up thinking time by a factor of 10 was roughly equivalent to scaling up model size and training by the same factor. This highlights the potential efficiency gains from focusing on deliberation alongside traditional scaling.
The core of the argument is that the current AI paradigm relies heavily on the costly scaling of "system one training." By contrast, the "cost of querying" these models (asking a question and receiving an answer) is currently very low. The speaker proposes shifting the focus to scaling up "system two thinking", accepting a higher query cost in exchange for significant performance improvements.
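The cost trade-off in this argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the dollar figures and the names `TRAINING_COST`, `COST_PER_QUERY`, `total_cost`, `compare`, and `break_even_queries` are hypothetical and do not come from the talk. It also assumes, loosely following the 2021 result for games, that multiplying training cost by some factor and multiplying per-query thinking cost by the same factor buy a similar performance gain.

```python
# Hypothetical, illustrative figures only; none of these come from the talk.
TRAINING_COST = 100_000_000.0  # assumed one-off cost to train a model, in dollars
COST_PER_QUERY = 0.01          # assumed cost to answer one query, in dollars


def total_cost(training_cost: float, cost_per_query: float, num_queries: int) -> float:
    """One-off training cost plus the cumulative cost of answering queries."""
    return training_cost + cost_per_query * num_queries


def compare(num_queries: int, factor: float = 10.0) -> tuple[float, float]:
    """Return (cost if training is scaled, cost if thinking is scaled) by `factor`."""
    scale_training = total_cost(TRAINING_COST * factor, COST_PER_QUERY, num_queries)
    scale_thinking = total_cost(TRAINING_COST, COST_PER_QUERY * factor, num_queries)
    return scale_training, scale_thinking


def break_even_queries(training_cost: float, cost_per_query: float) -> float:
    """Query count at which both options cost the same.

    Setting f*T + q*n equal to T + f*q*n and solving for n gives n = T / q,
    which does not depend on the scaling factor f (for f != 1).
    """
    return training_cost / cost_per_query


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 1_000_000_000):
        by_training, by_thinking = compare(n)
        print(f"{n:>13,} queries: scale training ${by_training:,.0f}, "
              f"scale thinking ${by_thinking:,.0f}")
    print(f"break-even at {break_even_queries(TRAINING_COST, COST_PER_QUERY):,.0f} queries")
```

Under these assumed figures, paying more per query is the cheaper option until the number of queries approaches the ratio of training cost to per-query cost. This matches the speaker's point about high-impact problems, where a small number of expensive answers can be worth far more than many cheap ones.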
The speaker highlights that OpenAI has released o1, a language model that embodies this principle. o1 takes longer to respond and costs more per query, but it benefits from being able to "think" longer before answering, much like the successful game-playing AIs. This represents a new and largely untapped dimension for scaling AI capabilities.
The speech concludes by challenging the perception that AI is simply about chatbots. The potential of AI extends to solving critical problems in various domains, such as medicine, energy, and mathematics. The speaker asks the audience if they would be willing to pay for a new cancer treatment or more efficient solar panels, even if the cost were substantial, and if the underlying technology takes time to compute. The central idea is that people are willing to accept increased computational cost and delay if it leads to a breakthrough in a high-impact field.