Why Hardware-Software Co-Design Is AI's Real 100x: Dylan Patel of SemiAnalysis

Summary

Dylan Patel, founder of SemiAnalysis, argues that the biggest gains in AI don't come from faster chips; they come from software-hardware co-design. Optimizing the model, the kernels, and the silicon together turns a 2x here and a 2x there into 100x. He explains why DeepSeek's experts were shaped for Nvidia's Hopper (and why TPUs struggle to run DeepSeek), why OpenAI's sparser models and Anthropic's denser ones pull them toward different hardware, and why the so-called CUDA moat was never really about CUDA.

Dylan breaks down InferenceX, his living benchmark that runs the latest models daily on over $50M of donated hardware, tracking a roughly 60x annual drop in cost per unit of quality. He makes the case that inference will be a bigger market than oil and that the compute crunch persists because models expand the value of useful work faster than compute grows, and he explains why Jensen Huang is bankrolling neoclouds to engineer a multipolar world.

Hosted by Shaun Maguire and Sonya Huang, Sequoia Capital

00:00 Introduction
01:58 Motel Kid Origins
03:11 Xbox Repair Spark
04:23 Internet Forums to Semis
06:42 From Quant to Founder
09:16 Homeless Research Roadtrip
14:04 InferenceX and Benchmarking
34:35 Sparse vs Dense Models
35:08 Interconnect Shapes Architecture
35:48 CUDA Moat Is Shifting
36:46 Ecosystems and Co-Design
38:46 Cerebras Speed and Limits
42:07 ROI Debates and Hot Takes
44:20 Ten Year Tech Bets
50:48 Compute Crunch and NeoClouds

Transcript (Chinese and English)