Hi, welcome to another episode of Cold Fusion. Look at these stock market charts from the 28th of January 2025. What you're looking at is a bloodbath, a bloodbath in the US stock market of over $1 trillion. And the cause? The release of the DeepSeek R1 AI model from China. The Chinese model is as capable as the best US models, but it's free to use, open source, more efficient and, most shocking of all, it reportedly cost less than 3% of OpenAI's o1 to develop. Just two years ago on this channel, we were talking about an AI arms race between companies. Today, that's evolved into an AI race between countries.
In one corner, we have the United States, with a long history of technological dominance. In the other, we have China, a country with a very different ideology and motives. This race to dominance isn't about weapons; it's about developing systems designed to think: artificial intelligence. It's reminiscent of the Cold War, and some have even dubbed these events, quote, "the Sputnik moment of AI". The White House says that they're looking into, quote, "national security implications of China's DeepSeek AI platform".
And to top it all off, OpenAI has accused DeepSeek of stealing its IP to train their own model. It's all heating up. With the United States pouring half a trillion dollars into the Stargate AI project, the global race is on, and this ongoing battle could be one of the biggest stories in tech this year. As artificial intelligence becomes a matter of national security, the technology will be forced to move even faster than it is today. What a crazy time to be alive. But before we get ahead of ourselves, what is really going on here? How did a company from seemingly nowhere do all of this? Is this all just part of the AI hype cycle, or is this the real deal? It seems like the whole world has been playing catch-up since the release, so let's try and make sense of it all.
Historically, when technology meets a national security threat from an ideological opponent, we get inventions like the computer and jet aircraft, born of the competition of World War II. But this time around, the United States was, for the most part, completely unchallenged in the field of AI. That all changed on January 20, 2025, with the release of R1. DeepSeek R1, which is free, has performance reportedly on par with OpenAI's $200-a-month model, across tasks such as language reasoning, mathematics and coding. The free model also beats out Anthropic's Claude Sonnet and Google's Gemini. But what many people may not know is that DeepSeek does things a little differently to the current state-of-the-art models. It's in part why it's so efficient, but we'll cover those details later in the episode.
Because there's no competition for that level of AI performance for free, users have been flocking to it, with DeepSeek becoming number one in Apple's App Store. But here are the stats that are making people's jaws drop. The AI was built in two months and reportedly cost less than $5.6 million to build. The AI company Anthropic says that $100 million to $1 billion is the general amount needed to develop an AI system from scratch, and to that end, Meta plans to spend $65 billion on AI. So creating something that performs this well with just $5.6 million is groundbreaking. But all may not be as it seems. More on that later. Much more.
I think there are two very important things that people need to know about what's happening with DeepSeek AI and the way it's being interpreted on Wall Street. The first is, it doesn't matter if it's a Chinese government psyop or not. The technological innovation of having an LLM train itself through reinforcement learning is impressive. The cost efficiency of doing inference with only 7 billion parameters rather than 700 billion parameters is impressive. The possibility of being able to do more model training and inferencing with less power and fewer chips is impressive. It doesn't mean, though, that chip demand is at risk. What I think it means is you're more likely to see an acceleration of AI everywhere, all over the economy. DeepSeek R1 being open source means that its code is freely available for whoever wants to use it, and for whatever they want to use it for. Users can modify it as they please, all for free. This is the total opposite of OpenAI's approach, which is pretty ironic.
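To make the reinforcement-learning point concrete: R1's training reportedly scored the model's own outputs with simple automatic rules (is the final answer correct, is the reasoning shown in the expected format) rather than with human graders, and reinforced high-scoring outputs. Here is a minimal toy sketch of that idea; the candidate strings, reward weights and helper names are all invented for illustration, not DeepSeek's actual training code.

```python
# Toy rule-based reward, loosely in the spirit of what R1's training
# reportedly used: score outputs automatically, no human grader.
# All values here are illustrative assumptions.

def reward(output: str, expected: str) -> float:
    score = 0.0
    # Accuracy reward: did the model end with the right answer?
    if output.strip().endswith(expected):
        score += 1.0
    # Format reward: did it show its reasoning in <think> tags?
    if "<think>" in output and "</think>" in output:
        score += 0.1
    return score

# Three sampled candidate outputs for the question "what is 2 + 2?"
candidates = [
    "<think>2 + 2 = 4</think>4",  # right answer, reasoning shown
    "<think>2 + 2 = 5</think>5",  # wrong answer, reasoning shown
    "4",                          # right answer, no reasoning shown
]

scores = [reward(c, "4") for c in candidates]
# Training would nudge the model toward the highest-reward behaviour.
best = candidates[scores.index(max(scores))]
```

Because the reward is computed by a rule rather than a person, this kind of loop can run at scale without human labelling, which is part of the cost story discussed above.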
This is all horrific news for US AI companies, because it means that suddenly their costs are all out of balance. DeepSeek, with its 671 billion parameters, can run locally on a stack of M4 Mac Pros. In contrast, investors and companies have poured billions of dollars into American AI servers. After the shock of this release, it now looks like US companies have been spending too much money, using too much energy and charging too much for the services they've been providing. Maybe in the future it won't be so much the models that make the most money, but the applications that run on top of them. Has this all been a massive mistake by US investors? No one knows for sure, and that's why the markets are selling off.
One bright spot for US companies, though, is that users of AI systems may not feel comfortable giving their data directly to China, especially in corporate settings. In order to compete, Sam Altman, CEO of ChatGPT maker OpenAI, has announced that their o3-mini model will now be given away for free. As for Mark Zuckerberg and Meta, they're internally panicking. But it's not just the Americans. Over in China, the effect is the same. Other Chinese tech giants, such as ByteDance, the maker of TikTok, Alibaba and Tencent, have freaked out and had to cut the prices of their AI models to compete. And despite the low prices charged by DeepSeek, it remains profitable while its rivals lose money.
Interestingly, OpenAI told the Financial Times that they have evidence that DeepSeek used output from ChatGPT to train its own model. In fact, last year, OpenAI blocked API accounts that it believed belonged to DeepSeek, suspecting theft. The US government's official stance is that it is possible that IP theft has occurred. It should also be noted that Chinese AI developers still seem to be managing to get their hands on top-of-the-line NVIDIA graphics cards despite US sanctions. But that begs the question: who are DeepSeek, and how did they seemingly build this thing overnight?
For a company responsible for one of the biggest red days in the US stock market, not a lot is known about the founder and the team behind DeepSeek, but the story so far is interesting. DeepSeek founder Liang Wenfeng isn't from the typical tech world. He actually has a background in finance and co-founded a hedge fund called High-Flyer. His company used AI to predict market trends and help make investment decisions. He was very successful at that, and his fund now manages $8 billion. But after his initial success, he wanted more. His next goal was to build, quote, "human-level AI". In 2021, he started buying thousands of NVIDIA GPUs as part of his, quote, "AI side project". This was right before the Biden administration began limiting US exports of AI hardware to China. Liang eventually spun his AI side project off into another company, and that company was DeepSeek. R1 is their latest model.
But honestly, the more I've been reading up on the Liang story, the more interesting it gets. So let me know in the comments section if you want to see a dedicated episode on the DeepSeek founder. So, DeepSeek R1 was trained with reinforcement learning, meaning it largely taught itself through trial and error rather than from human-supplied examples. And the method DeepSeek uses for its model architecture is different to most of the other players': a technique called mixture of experts. Sky News explains it well. Quote, "Where OpenAI's latest model, GPT-4o, attempts to be Einstein, Shakespeare and Picasso rolled into one, DeepSeek's is more like a university broken up into expert departments." This allows the AI to decide what kind of query it's being asked and then send it to a particular part of the digital brain to be dealt with. This lets the other parts remain switched off, saving time, energy and, most importantly, the need for computing power. The YouTube channel Computerphile explains further.
So maybe you ask a very specific maths question. What mixture of experts will do is have trained a specific part of this network, a much smaller part, to solve that problem for you. So basically, the early stages will route the question to different parts of the network and then only activate a small part of it, let's say 30 billion parameters, which is a huge, huge saving. So this sort of shaded area here will activate, and that will produce your answer. You can develop systems using agents like this, where you have one that's trained to do this and one that's trained to do that, and you just ask the right one. Suppose I want to train a network to write my emails for me; maybe it's very good at that. I train a different network to solve a different problem, and I just ask the right one, as opposed to hoping that one model can do it all. So that's much more efficient.
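The routing idea described above can be sketched in a few lines: a small gating layer scores each expert for the incoming query, and only the top-k experts are activated; the rest stay switched off. Everything below, the expert functions, the gate weights and the input vector, is an invented toy, not DeepSeek's actual architecture.

```python
import math

def softmax(xs):
    """Turn raw scores into probabilities."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Three "experts", each a trivial function standing in for a sub-network.
EXPERTS = {
    "maths": lambda x: sum(x),
    "email": lambda x: len(x),
    "code":  lambda x: max(x),
}

# One row of made-up gate weights per expert.
GATE_WEIGHTS = {
    "maths": [1.0, 0.2, 0.1],
    "email": [0.1, 1.0, 0.2],
    "code":  [0.2, 0.1, 1.0],
}

def moe_forward(x, k=2):
    names = list(EXPERTS)
    # Gate: score each expert for this input.
    scores = [sum(w * xi for w, xi in zip(GATE_WEIGHTS[n], x)) for n in names]
    probs = softmax(scores)
    # Activate only the k highest-scoring experts; the rest never run.
    top = sorted(range(len(names)), key=lambda i: probs[i], reverse=True)[:k]
    out = sum(probs[i] * EXPERTS[names[i]](x) for i in top)
    return out, [names[i] for i in top]

# A "maths-flavoured" input activates the maths expert most strongly.
output, active = moe_forward([3.0, 0.5, 0.2], k=2)
```

The saving comes from the experts that are never called: in a real model each expert is billions of parameters, so skipping most of them per query cuts the compute dramatically.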
To add to the efficiency is a process called distillation: basically, using larger models to train smaller models in targeted domains. The result is equivalent performance with significantly less computing power. And this was the big shock for AI developers and financial markets. Making chain-of-thought reasoning completely open and visible was an interesting choice; OpenAI basically does the opposite. The idea is essentially: write down a step-by-step process for solving the problem, slowly solve it, and then write down the answer. You tend to get much better at solving problems that require multiple steps. If you just ask "why is the sky blue?", it will regurgitate that pretty easily from text it has learned on the internet.
But if you're asking for problem-solving skills, it's hard to do in one shot, so you kind of take a little bit of time to just work through it. Now, OpenAI pioneered this chain of thought, but they don't tell you how they do it, because it's all closed. And so it's not "open" AI at all, in some sense. So essentially, you see a kind of précis, summary version of the chain of thought, but it's not their actual internal model, which is essentially a trade secret. What R1 is doing is a chain of thought similar to o1, but it's fully public: they've released all the models, they've released all the code, you can talk to it, you can see the entire model log, and they've also trained it with massively more limited data.
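The "fully public" chain of thought mentioned above is visible in R1's raw output: the model emits its reasoning inside <think>...</think> tags before the final answer, so anyone can inspect or strip the trace. The raw string below is a hand-written stand-in for a model response, not real R1 output.

```python
import re

# A stand-in for an R1-style response: reasoning in <think> tags,
# then the user-facing answer.
raw_output = (
    "<think>The user asks for 17 * 24. "
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>"
    "17 multiplied by 24 is 408."
)

def split_reasoning(text):
    """Separate the visible reasoning trace from the final answer."""
    match = re.match(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", text.strip()  # no tags: treat the whole thing as the answer

thinking, answer = split_reasoning(raw_output)
```

With o1, only a summarised version of this trace is shown; with R1, the full string is there to read, which is exactly what made the training recipe so easy for others to study.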
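The distillation step mentioned a moment ago can also be sketched numerically. The standard recipe: soften the large "teacher" model's output distribution with a temperature, then train the small "student" to match it. The logits below are invented for illustration; real distillation runs this loss over an entire dataset.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Made-up teacher logits for a 3-way choice.
teacher_logits = [4.0, 1.0, 0.2]

# Temperature > 1 exposes how the teacher ranks the *wrong* answers too,
# which is extra signal a one-hot label doesn't carry.
hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)

def kl_divergence(p, q):
    """How far the student's distribution q is from the target p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A hypothetical student's probabilities; training would minimise this loss.
student_probs = softmax([2.5, 1.2, 0.5])
loss = kl_divergence(soft, student_probs)
```

Because the student only has to reproduce the teacher's behaviour in a targeted domain, it can be far smaller, which is where the inference savings in the discussion above come from.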
As mentioned earlier, things may not be as they seem. That cost figure of $5.6 million to create the model may not be complete. In fact, in a paper released by DeepSeek themselves, they mention that the $5.6 million figure includes only the official training of DeepSeek V3 and does not include costs of prior research and experiments on architectures, algorithms or data. That does put a question mark on all the headlines we've been seeing claiming this thing was built for under $6 million. But whatever the real figure is, it's likely to be much less than what US companies have been spending. In the latest news, DeepSeek has also dropped an open image model, and at this rate a video model will probably soon follow; it might even rival OpenAI's Sora or Google's anticipated Veo 2. In terms of search interest right now, DeepSeek outpaces ChatGPT, and it became one of the most downloaded apps on the App Store. Then, towards the end of January, things absolutely blew up and went wild.
China, during Chinese New Year, went crazy. First, Alibaba comes out with Qwen 2.5 Max, a very capable AI that could one-shot this code animation. Just asking a computer to code an animation, and then having it go out and do it, is so intuitive that I think kids of the future will believe this is how coding always worked. Alibaba's Qwen 2.5 Max outperforms DeepSeek and even GPT-4o in some tasks. And then there's Kimi K1.5, released around the same day. It's also a great performer, is multi-modal and can browse the web in real time. Okay, before you all rush out to sign up to DeepSeek, please be aware of something: it collects data such as chat history, any text or audio inputs, uploaded files and keystroke patterns, basically anything you input into the model. Now, OpenAI does similar things, but the difference is that with DeepSeek, your data goes straight to servers in the People's Republic of China. So I guess the question is: do you want to be spied on by the US, or do you want to be spied on by China?
I can't tell you what to do; that's just a heads-up. But in terms of privacy, there is a bright side: being open source means that DeepSeek can run locally on a machine without an internet connection, for complete privacy. Here's the YouTube channel SomeOrdinaryGamers running it locally: "It'll code things for you. So, for instance, I can ask DeepSeek, like, 'write me code for a simple login webpage'. At this moment in time, it'll think. It'll be like, 'all right, the user is asking for code to create a simple login page, so first it's going to structure the HTML, then it's going to style it, then it's going to validate'. And then here it is, it's actually writing me the HTML code. So we're sitting in a world where, like, I feel so scared for junior coders these days, because, goddamn, AI is really coming for some of the jobs that people least expected to lose first. So again, it writes this actual login page, and of course, once it's done, it'll also provide you a preview in this chat box software, so you can see it for yourself before you actually throw it into, you know, production or testing or whatever. So right here, I'm just going to hit that preview button and, boom, there it is." And as I was making this video,
DeepSeek, at the start of the week, had to, quote, "temporarily limit user registrations due to large-scale malicious attacks". This was also a warning to many, as it seems the program may not be as ready as it appears. So, what does Sam Altman think? He's only directly referenced the company once, saying, "DeepSeek's R1 is an impressive model, particularly around what they're able to deliver for the price. We will obviously deliver much better models, and it's also legit invigorating to have a new competitor. We will pull up some releases." We'll see what's around the corner for OpenAI, but the joke is, AI took ChatGPT's job.

In all seriousness, though, I don't think this is over. I believe this is just the beginning of major competition. What we're seeing here is the technological version of the Thucydides trap: basically, it states that when a rising power challenges an existing power, conflict arises. In an interview with Waves, republished by the China Academy back in mid-2024, DeepSeek's founder Liang made his ambitions clear. He said, quote, "For years, Chinese companies have been accustomed to leveraging technological innovations developed somewhere else and monetizing them through applications, but this isn't sustainable. This time, our goal isn't quick profits, but advancing the technological frontier to drive ecosystem growth. Why is Silicon Valley so innovative? Because they dare to try. When ChatGPT debuted, China lacked confidence in frontier research. From investors to major tech firms, many felt the gap was too wide and focused instead on applications. But innovation requires confidence, and young people tend to have more of it." End quote.

With such a mindset, DeepSeek may force AI innovation forward, and China could be at the forefront of the global AI race. Competitors around the world will be forced to reduce their costs and rethink how they're creating AI models; efficiency will be the aim of the game. We don't know how it will play out, but we do know that we'll be seeing some rapid advancements in the coming years. If we remain positive, we could see breakthroughs in medical science, material science, mathematics and even theoretical physics. In the long term, we could make products for cheaper, make them longer-lasting and produce them more efficiently. But on the flip side, what about nefarious uses and bad actors, geopolitically? Also, what happens to all of the humans through this transition as AI rapidly improves? That's for the future to decide, and I have done a video on that topic years ago, before AI blew up, so you can check it out after this one. As usual in all of this, let's just keep a close eye and see where this goes. Anyway, that's about it from me. That's where we are with DeepSeek R1, how it works so efficiently, and the absolute shock it's caused around the world. Although a lot of people may find consumer AI annoying these days, there's no getting around it: it's here to stay, and it's improving with each week.
It's going to be an important part of everyday life soon. But how does AI work, anyway? Well, now there's a fun and easy way to learn about that and many other STEM subjects with today's sponsor, Brilliant. Brilliant's course on artificial neural networks is perfect for that; I've used it to brush up on some background context when I was making AI episodes. Each lesson on Brilliant allows you to play with concepts, a method proven to be six times more effective than watching lecture videos. Plus, all content on Brilliant is crafted by teachers, researchers and professionals from MIT, Caltech, Duke, Microsoft, Google and more. Learn at your own pace to brush up on a project for work, or just for your own self-development and curiosity.

To try everything Brilliant has to offer for a full 30 days, visit brilliant.org/ColdFusion to get started. You'll also get 20% off an annual premium subscription. Thanks for watching. My name is Dagogo, and you've been watching ColdFusion. I'll catch you again soon for the next episode. Cheers, guys. Have a good one.