I apologize for adding yet another DeepSeek video to your video queue. During a trip to Tokyo last year, I was told that DeepSeek was the real deal, a cracked team, and perhaps the only one of significance in China. Since then I have badgered the guys on Transistor Radio, our podcast with Dylan Patel and Doug O'Laughlin, into talking about it, though there was not much to be said at the time.
In December 2024, DeepSeek released their V3 model, which was impressively efficient, and a few people in AI took notice. Then on January 22nd, 2025, DeepSeek released their reasoning model R1, which works kind of like OpenAI's o1 and o3 models: it takes extra compute time to think up a better answer. R1's release kicked everything off. The next day, The New York Times published an article on it, but focused mostly on the earlier V3's training costs. Quote: "The Chinese engineers said they needed only about $6 million in raw computing power to build their new system."
That is about one-tenth of what the tech giant Meta spent building its latest AI technology. The article seemed to be saying that you don't need anywhere near as many chips as previously thought for leading-edge AI, a message that other large media outlets and smaller influencers amplified. Then on Saturday the 25th, CNBC published a 40-minute YouTube video that blew up. As of this writing, it has nearly 5 million views. I'm very jealous. All this led to a massive move down for tech and semiconductor markets on Monday, which lost a trillion dollars of market value. Nvidia lost 17%, or $600 billion, the biggest single-day loss in history.
There was a massive online frenzy. CEOs and AI influencers alike posted soothing words. People were suddenly becoming experts on Jevons paradox and all that. I wasn't able to cover any of this as it happened, because it happened during the New Year holiday in Taiwan, so I was spending time with family. Immaculate timing, guys. So I missed that early rush, but I wrote up a few things anyway. DeepSeek's leadership believes they have the way forward for Chinese AI, but does their approach scale, and what does it say about the Chinese AI industry as a whole? For today's video, a few discombobulated thoughts on DeepSeek.
I'm not going to try to explain in detail what is so special about these new DeepSeek models. Oh boy, there are so many of those videos out there. This is what I'll say: the DeepSeek models pull out all the tricks in the book, including several new tricks of their own, to create a model that uses significantly less compute for both training and inference. Some of these innovations involve stealing, quote unquote, data from larger models like GPT-4 and Claude via distillation.
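To make that concrete, here is a minimal sketch of black-box distillation under my own assumptions: a stronger teacher model generates answers, and a smaller student is fine-tuned to imitate them. The names here, teacher_generate and sft_step, are hypothetical placeholders, not DeepSeek's actual pipeline.

```python
# Hedged sketch: black-box (sequence-level) distillation.
# teacher_generate and sft_step are hypothetical placeholders.
def build_distillation_set(teacher_generate, prompts):
    # The teacher's text outputs become the student's training targets.
    # No access to the teacher's weights or logits is needed -- an API
    # that returns completions is enough.
    return [(p, teacher_generate(p)) for p in prompts]

def distill(student, sft_step, dataset, epochs=1):
    # The student is then trained with ordinary supervised fine-tuning
    # on the teacher-written (prompt, answer) pairs.
    for _ in range(epochs):
        for prompt, target in dataset:
            sft_step(student, prompt, target)
```

Note that nothing in this setup requires the teacher's cooperation, which is part of why the practice is so hard to police.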
There are issues of perceived fairness with distillation, but I reckon it is ultimately unavoidable. Another innovation was using 8-bit floating point numbers in place of the higher-precision 16- and 32-bit formats that are the industry standard. There's also an interesting thing done with the mixture-of-experts structure. Ben Thompson points out in his FAQ over at Stratechery that many such innovations can be traced to the limitations of Nvidia's China-specific hardware. So in one way you can say that the feared scenario is playing out: we imposed restrictions on the Chinese, and the Chinese innovated something new to evade those restrictions, like bacteria growing resistant to a dose of antibiotics. It is pretty nifty how DeepSeek assembled them all into one slim model; just look at OpenAI's model selector menu to see how hard that is. But I should point out that most of these tricks were known or implemented in earlier models by other companies.
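For the mixture-of-experts structure in particular, here is a minimal sketch of the generic idea, top-k routing, under my own assumed shapes and value of k. This is the textbook technique, not DeepSeek's specific variant, which layers on refinements like shared experts and load balancing.

```python
# Hedged sketch: generic top-k mixture-of-experts routing in numpy.
# Shapes, k, and the expert functions are illustrative assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_w, experts, k=2):
    """x: (tokens, d_model); gate_w: (d_model, n_experts);
    experts: list of callables mapping (d_model,) -> (d_model,)."""
    scores = softmax(x @ gate_w)                 # router probabilities
    topk = np.argsort(scores, axis=-1)[:, -k:]   # k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        idx = topk[t]
        w = scores[t, idx] / scores[t, idx].sum()  # renormalized gate weights
        for j, g in zip(idx, w):
            out[t] += g * experts[j](x[t])         # only k experts ever run
    return out
```

The point of the structure is in that last comment: each token activates only k experts out of many, so a model can carry a huge parameter count while spending only a fraction of it in compute per token.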
Mistral, for example, has been using mixture-of-experts since at least December 2023. But there is one apparent insight about R1 that caught my eye. Trust me, again, I'm no expert, but from what I can discern, DeepSeek managed to endow R1-Zero with chain-of-thought and reasoning behaviors without having to insert a human into the loop. This approach took balls. It's a simple idea which people have tried before, only to end in failure. So DeepSeek not only noticed that it works now, for a variety of reasons, but also decided to throw big-model resources behind it.
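What makes human-free reinforcement learning feasible here is that the reward can be computed by a program rather than a labeler. Below is a toy sketch of that kind of rule-based reward, assuming a math task with a checkable final answer; the tag convention and point values are my illustrative assumptions, not DeepSeek's published recipe.

```python
# Hedged sketch: a rule-based reward for reasoning RL. The <think> tag
# convention and the scoring values are illustrative assumptions.
import re

def reward(completion: str, gold_answer: str) -> float:
    r = 0.0
    # Format reward: the model must reason inside <think>...</think> tags.
    if re.search(r"<think>.+?</think>", completion, re.DOTALL):
        r += 0.2
    # Accuracy reward: the final boxed answer must match the reference.
    # A program can check this, so no human labeler is in the loop.
    m = re.search(r"\\boxed\{(.+?)\}", completion)
    if m and m.group(1).strip() == gold_answer.strip():
        r += 1.0
    return r
```

With a verifiable reward like this, a policy-gradient method can simply sample many completions per prompt and reinforce the ones that score well, and chain-of-thought behavior can emerge on its own.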
There's another thing. I'm intrigued by DeepSeek founder and head Liang Wenfeng's explanation, in an interview, of how DeepSeek's memory-saving multi-head latent attention (MLA) architecture came about. After summarizing some mainstream evolutionary trends of the attention mechanism, the DeepSeek employee behind MLA simply thought of designing an alternative. However, turning the idea into reality was a lengthy process. Quote: "We formed a team specifically for this and spent months getting it to work."
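From what is public about MLA, the core memory-saving idea is to cache one small latent vector per token and expand it back into per-head keys and values at attention time. Here is a minimal sketch under my own assumed dimensions, ignoring details like rotary position embeddings, which the real design has to handle separately.

```python
# Hedged sketch: the low-rank KV-cache idea behind multi-head latent
# attention. All dimensions are illustrative assumptions.
import numpy as np

d_model, d_latent, n_heads, d_head = 512, 64, 8, 64

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.02           # compress
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand K
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # expand V

def cache_token(h):
    # Only this d_latent-sized vector goes into the KV cache.
    return h @ W_down

def expand(latents):
    # Reconstruct per-head keys and values from the cached latents
    # at attention time.
    K = (latents @ W_up_k).reshape(-1, n_heads, d_head)
    V = (latents @ W_up_v).reshape(-1, n_heads, d_head)
    return K, V
```

In this toy setup the cache stores 64 floats per token instead of the 2 × 8 × 64 = 1,024 a standard multi-head KV cache would, a 16x reduction, which is the kind of saving that makes serving long contexts cheap.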
Imagine what it might take to spend months on such a task based on a researcher's hunch. That takes a special team, in my opinion, and Liang confirms this. DeepSeek's moat, as Liang says, is not its current innovations, but a team capable of generating innovations anew. Liang has an interest in team development. He wrote the preface to the Chinese-language edition of Gregory Zuckerman's biography of Jim Simons and Renaissance Technologies, The Man Who Solved the Market. That preface marks his interest in building lasting, successful teams, and in it he wrote: "What kind of characteristics and opportunities made Simons a historical outlier? How do you build and manage an outstanding team that remains undefeated for more than 30 years? Readers can find answers in this book." And going to the second of Liang's two 36Kr interviews, basically the only such interviews on the web, what most people have glommed onto were his quotes about the superiority of the open-source approach over closed-source approaches.
But most of the interview actually focused on how China can better surface innovation, as well as the latent strengths of the Chinese people. Liang is trying to rally other Chinese to make quote-unquote hard innovations like they do, and he explains how his team does it. After reading the interview, I get the sense that DeepSeek is a very un-Chinese company, and its founder especially so. Such a contrarian structure lets it tap, supposedly, the very best of what Chinese talent can offer in leading-edge software development. To give you a sense of what I am driving at here, I think I need to give some background on how top-level Chinese software companies like ByteDance and Pinduoduo work. China has plenty of quote-unquote normal software companies, no different structurally from those in the United States. I want to make that clear.
But China's league leaders are distinct, with fascinating organizations. Their dev teams are huge, sometimes dozens or even hundreds of people, and only roughly half of those team members are developers. The other half seems to be mostly quote-unquote operations staff, which, as best as I can tell, is a generic term for labor-heavy tasks. Makes sense considering the lower labor costs there. For example, they rely on big quality-assurance teams to test code before it goes out, which is generally not the case in the United States, where developers are expected to QA their own code. The way these big Chinese companies manage such big teams is with top-down management. So you have these meetings with 20 to 30-plus people where marching orders are basically broadcast to them. Staff like AI researchers are guided by OKRs and KPIs, written every quarter by their managers.
OKRs often cover the improvement of a model as well as its implementation through two to three features to be released that quarter. ByteDance has a big leaderboard that refreshes every so often, and when one of their Doubao models cannot keep up with the other top models in China, the anxiety within that particular LLM team is palpable. Moreover, information sharing with the outside world has been closed off. At ByteDance, Alibaba, and the other major Chinese AI startups, nicknamed the Six Little AI Dragons of China, there's a freeze on publishing, same as in the United States. These Chinese software practices are obviously cherry-picked, but I'm trying to make a point about DeepSeek: small teams with little hierarchy or separation, flexibly going wherever their passion takes them; workers allowed to publish what they learn with their names attached to it; and the time, resources, and balls to go forth into the abyss to chase long-term insights.
DeepSeek really does seem to buck convention in China. In both 36Kr interviews, Liang talks about his focus on hiring for ability. In his first interview, he talks about how he first hired his fund's sales team and their struggles during its early development. Quote: "For example, our top two salespeople were outsiders. One came from exporting German machinery, and the other wrote back-end code at a securities firm. When they entered this field, they had no experience, no resources, and no prior connections. In fact, our sales team achieved nothing in their first year, and it was only in the second year that they started to see some results. But our evaluation standards are quite different from those of most companies; we don't have KPIs or so-called quotas." Imagine what patience it might take for a company in China, the US, anywhere, to tolerate a whole year of non-performance.
Interestingly, Liang mentions that the DeepSeek V2 team does not have a sea turtle, as in an overseas-trained Chinese returnee. Notable, since Chinese startups like to brag that their employees are elites with degrees from places like Harvard.
That being said, DeepSeek does seem to like to hire from Peking and Tsinghua universities, not exactly China's DeVry University. They tend to shy away from hiring famous people.
An online interview with an unnamed DeepSeek headhunter mentioned the company's preference not to hire people who have already been successful, saying, quote: "Those who have been successful in the past have already achieved success. They carry the burden of not being allowed to fail. It's better to leave innovation to newcomers."
It does seem like Chinese tech companies are somewhat buying the pitch. ByteDance, for example, has this new Seed Edge team, a team stocked with highly paid, cracked researchers empowered with over half of the company's marginal GPU compute.
The goal is to do research for the quote-unquote long term. Others are doing similarly. But I doubt these Chinese tech giants can entirely pivot to operating the way DeepSeek does. It's not like what they're doing isn't working for them. Moreover, I question whether DeepSeek itself can maintain its old ways going forward.
Why? Well, first, DeepSeek has now gone mainstream. All the worldwide publicity has pushed its chat app to the number-one spot in many countries around the world, something Anthropic was never able to do for Claude. It would behoove DeepSeek's management to take advantage of this hype.
In his appearance on the Lex Fridman podcast, Dylan Patel mentioned offhandedly that DeepSeek is now trying to raise a round. I think they're going to go head-to-head with OpenAI in the prosumer chatbot market. Such a pathway means scaling up, adding product managers, marketing, sales, and doing all the little things OpenAI did to become a quote-unquote real company, and that will inevitably dilute their unique research culture.
Second, closing a gap always feels fast; as DeepSeek enters uncharted territory, gains will become harder to grasp. There's precedent for this. Just look at Anthropic, a small team that rapidly caught up to OpenAI but has since struggled to move past them. Here's another precedent: DeepSeek's predecessor hedge fund, High-Flyer, boomed thanks to an AI system first developed in 2015.
It worked well until 2021, when returns started to lag due to higher assets under management and more intense competition. This eventually caused Liang and management to apologize to customers through WeChat at the end of 2021, an unusual event that was covered in the Wall Street Journal. A little over a year later, in 2023, they founded DeepSeek.
Third, there are talent issues. If I were running an AI team in China, I'd be giving each of those DeepSeek employees offers that even their passions cannot refuse, and it is already happening.
In late December 2024, before all the hysteria, Chinese media reported that Xiaomi's founder Lei Jun personally recruited Luo Fuli, one of the "genius girls" behind DeepSeek V2, with a $1.3 million salary to run their LLM team. Replacing such talents will be challenging.
But I suppose that's fine in the end. There will be more. China graduates an estimated Croatia's or Panama's worth of STEM graduates each year, 4 to 5 million. One thing DeepSeek will do, I feel, is lift the stigma of a purely domestic hire. One key limiter for DeepSeek's team, and all the others, will be chips.
Liang says so in his second 36Kr interview: "Money has never been the problem for us. Bans on shipments of advanced chips are the problem." Did anyone see that DeepSeek's hosted models are running inference on Huawei Ascend 910C hardware?
It makes perfect sense that China's leading AI lab works with China's best hardware engineering company. The DeepSeek kerfuffle will motivate the American government to further tighten exports of Nvidia chips. Even if all of DeepSeek's Nvidia chips turn out to be kosher, it looks to me like they will get cut off, and Nvidia's China revenue will go to zero in the foreseeable future. Thus, China's AI companies will have no choice but to turn to Huawei, whose Ascend accelerators are very good.
This in turn will strengthen the Huawei-SMIC industrial complex, ramping up their semiconductor proficiency. It has been said that Google held a hardware-derived cost advantage because of its TPU ecosystem. Imagine Chinese AI companies possessing similar TPU-like advantages over their American cohorts. It would be beneficial for the Americans to have access to the most flops per dollar they can possibly get.
So yeah, I don't think DeepSeek's rather un-Chinese approach scales. They seem more like a one-of-one. But it doesn't have to. Another sentiment in the West is that the US is dominant in software, maybe unassailably so. This is wrong. China is one of the few countries with a world-competitive software industry. They are far better at it than Americans think. And specifically within AI, the Chinese are world-beaters in face recognition, machine vision, and other AI fields. We bring over bunches of Chinese to the US to write our code. Why should we think that they can't catch up, and fast?

With regards to LLM development, the Chinese see themselves in a war situation. It's like their EV industry during the mid-to-late 2010s, with a multitude of teams irresponsibly spending their way to the top. I think the American AI labs have too much confidence in their massive capital expenditure numbers, their perceived lead in productization, and their coziness with government. I urge them to push harder. DeepSeek was the first Chinese lab to come out of left field, but it won't be the last.

Alright everyone, that's it for tonight. Thanks for watching. Subscribe to the channel, sign up for the newsletter, and I'll see you guys next time.