The Neocloud Boom: State of AI Compute 2026 | Stephen Balaban

Summary

Many people said GPU compute would become a commodity. The opposite happened, and a new category of "neoclouds" is now racing to build the physical backbone of the AI boom. Stephen Balaban, co-founder and CTO of Lambda, explains why the conventional wisdom was exactly wrong, why we're still massively underbuilding compute, and what it actually takes to stand up a gigawatt-scale AI factory: land, power, cooling, networking, and a financing stack most people have never heard of. We go deep on the physics of how energy becomes tokens, NVIDIA's real moat, why a 2023 GPU can lease for more today than the day it shipped, and Stephen's provocative vision of "neural software." Plus the wild Lambda origin story: from a facial recognition startup to a camera in a baseball cap to a near-billion-dollar cloud business. This is the state of AI compute in 2026, from inside one of the companies building it.

Stephen Balaban
LinkedIn - https://www.linkedin.com/in/sbalaban
X/Twitter - https://x.com/stephenbalaban

Lambda
Website - https://lambda.ai
X/Twitter - https://x.com/LambdaAPI

Matt Turck (Managing Director)
Blog - https://mattturck.com
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://x.com/mattturck

FirstMark
Website - https://firstmark.com
X/Twitter - https://x.com/FirstMarkCap

Listen on:
Spotify - https://open.spotify.com/show/7yLATDSaFvgJG80ACcRJtq
Apple - https://podcasts.apple.com/us/podcast/the-mad-podcast-with-matt-turck/id1686238724

00:00 - Cold open
01:21 - Why GPU compute was never a commodity
02:45 - The H100 price index and what it gets wrong
04:02 - The real moat: technology or financing?
05:57 - Winner-take-all, or room for many neoclouds?
06:48 - Are we overbuilding or underbuilding AI compute?
09:26 - What if AI gets 10x more compute-efficient?
10:44 - The real bottleneck: land, power, and shell
11:38 - The backlash against data centers, and the misinformation
15:00 - Opening the hood: from photons to tokens
17:11 - Extracting more value from the same chip
19:26 - Frontier inference and distributed training, explained
23:26 - What actually drives compute cost
25:21 - Lambda's chip stack and the NVIDIA relationship
26:17 - A multi-silicon world? CUDA, cuDNN, and NVIDIA's real moat
28:59 - Networking, storage, and the one-click cluster
34:46 - Renting vs. owning, and full vertical integration
36:24 - How global is Lambda? Does location still matter?
38:44 - The financing stack: off-take agreements, SPVs, and credit
41:16 - Why a 2023 GPU leases for more today
42:36 - A futures market for compute?
43:54 - Origin story: facial recognition, Perceptio, and Apple
47:03 - The Lambda hat and Dream Scope
48:59 - The $60K bet that became a cloud business
52:00 - Holding the team together through the hard times
54:30 - Bringing on a new CEO; Stephen as CTO
57:33 - Matching xAI on high-velocity deployment
59:29 - "AI won't write software; it will become the software"
01:01:30 - Neural software vs. vibe coding
01:04:25 - Do agents change the compute layer?
01:06:14 - Self-assembling software inside Lambda
01:08:18 - Gigawatt-scale AI factories
01:08:57 - One person, one GPU
01:12:04 - Hot takes: overrated and underrated in AI


Chinese-English Transcript