This video discusses the recent buzz surrounding DeepSeek, a Chinese AI company, and its R1 model. The speaker begins by noting the unusual level of attention DeepSeek has drawn, becoming a global news story and moving market capitalizations, and attributes it to two key factors: DeepSeek is a Chinese company competing with the US, and its open-source approach contrasts with the closed-source model of companies like OpenAI. These factors resonated with diverse groups, including supporters of international competition and of open-source initiatives.
The conversation delves into which aspects of this story hold truth and which need debunking. It acknowledges the surprising fact that the second company to release a reasoning model comparable to OpenAI's o1 is Chinese. The speaker explains the difference between base language models (like GPT-4o or DeepSeek's V3) and the newer reasoning models, which use reinforcement learning and chain-of-thought to work through complex problems step by step. While OpenAI was the first to release a reasoning model, DeepSeek was the next, and notably it open-sourced R1, making it accessible at a significantly lower cost. This development has shifted perceptions of China's progress in AI, with estimates of its lag behind the US moving from 6-12 months to 3-6 months.
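The key training idea behind these reasoning models is that reinforcement learning scores the *outcome* of a long chain of thought rather than grading each intermediate step. A toy sketch of such an outcome-based reward (the `Answer:` convention and function name are illustrative, not DeepSeek's actual format):

```python
import re

def outcome_reward(completion: str, ground_truth: str) -> float:
    """Toy outcome-based reward: 1.0 if the final stated answer matches
    the ground truth, else 0.0. The chain of thought before the answer
    is free-form and is not scored directly."""
    # Assume the model is prompted to end with "Answer: <value>"
    # (an illustrative convention for this sketch).
    match = re.search(r"Answer:\s*(\S+)", completion)
    if match and match.group(1) == ground_truth:
        return 1.0
    return 0.0

# A chain-of-thought completion for "What is 17 * 3?"
cot = "First 10 * 3 = 30, then 7 * 3 = 21, so 30 + 21 = 51. Answer: 51"
print(outcome_reward(cot, "51"))            # 1.0
print(outcome_reward("Answer: 50", "51"))   # 0.0
```

Because only the final answer is rewarded, the model is free to discover whatever intermediate reasoning steps help it get there.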
The discussion then addresses the widely circulated claim that DeepSeek developed the R1 model for only $6 million. The speaker, aligned with views from Palmer Luckey and Brad Gerstner, argues that this figure is misleading and should be debunked. Even if the $6 million represents the cost of the final training run, it doesn't account for the broader research and development investment behind it. Comparing it to the "soup to nuts," fully loaded costs of US AI companies is unfair; while validating the exact training cost is difficult, it's crucial to compare like with like.
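For context on where the headline number comes from: DeepSeek's V3 technical report cites roughly 2.788 million H800 GPU-hours for the training run, priced at an assumed $2 rental rate per GPU-hour. A back-of-the-envelope check (figures are DeepSeek's own published numbers, not independently verified):

```python
# Back-of-the-envelope check of the widely quoted "final training run" cost.
gpu_hours = 2_788_000      # H800 GPU-hours reported in DeepSeek's V3 paper
rate_per_gpu_hour = 2.00   # DeepSeek's assumed rental price in USD

cost = gpu_hours * rate_per_gpu_hour
print(f"${cost:,.0f}")     # $5,576,000 — the basis of the "~$6M" headline
```

This is precisely the speaker's point: the figure is a compute-rental estimate for one run, not the fully loaded cost of salaries, experiments, failed runs, and owned infrastructure.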
Dylan Patel, a semiconductor analyst, estimates that DeepSeek possesses a substantial compute cluster of approximately 50,000 GPUs, including H100, H800, and H20 chips, potentially acquired through their founder's hedge fund activities. The cost of such a cluster would exceed a billion dollars, contradicting the "scrappy company" narrative. While acknowledging the difficulty of ascertaining accurate information given the vested interests involved, the speaker highlights what is genuinely noteworthy in DeepSeek's approach.
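The billion-dollar estimate is easy to sanity-check. Assuming an illustrative ~$25,000 blended average per accelerator for H100/H800/H20-class hardware (an assumption for this sketch, not a quoted price), the GPUs alone cross $1 billion before networking, data centers, and power:

```python
gpus = 50_000          # Patel's estimated fleet size
avg_price = 25_000     # illustrative blended price per accelerator, USD

hardware_cost = gpus * avg_price
print(f"${hardware_cost / 1e9:.2f}B")  # $1.25B in accelerators alone
```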
The conversation emphasizes DeepSeek's innovative algorithms and methodologies. The team devised a new reinforcement learning algorithm, called GRPO, which uses far less GPU memory while remaining highly performant. Rather than relying solely on CUDA, Nvidia's proprietary programming layer, they dropped down to PTX, Nvidia's lower-level instruction set, for closer-to-the-metal hardware control. This inventiveness, likely driven by resource constraints, produced solutions that Western companies, flush with capital, had not pursued. The video ponders whether readily available large funding rounds dull the innovation that arises from necessity.
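GRPO's memory savings come from dropping the separate critic (value) network that PPO requires: instead, each sampled answer's advantage is computed relative to a group of answers drawn for the same prompt. A minimal sketch of that group-relative normalization, following the GRPO formulation in DeepSeek's papers (the reward values here are made up):

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sample's reward against the
    mean and std of its group, replacing a learned value network."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero on ties
    return [(r - mean) / std for r in rewards]

# Rewards for 4 sampled answers to the same prompt (illustrative values):
# two answers got the right final answer, two did not.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline is just the group mean, no second network's weights, activations, or optimizer state need to live in GPU memory, which is where the savings over a PPO-style critic come from.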
Friedberg contributes by suggesting this shift highlights new investment opportunities. He cites Balaji Srinivasan's comment that "the wrapper" (the application layer closest to the user) is the new moat in the value chain. If model performance keeps improving, value creation will move further up or down the stream: the companies building the models may not capture the profits, but the value will be found elsewhere in the chain.