The Neocloud Boom: State of AI Compute 2026 | Stephen Balaban
Summary
Many people said GPU compute would become a commodity. The opposite happened — and a new category of "neoclouds" is now racing to build the physical backbone of the AI boom. Stephen Balaban, co-founder and CTO of Lambda, explains why the conventional wisdom was exactly wrong, why we're still massively underbuilding compute, and what it actually takes to stand up a gigawatt-scale AI factory: land, power, cooling, networking, and a financing stack most people have never heard of. We go deep on the physics of how energy becomes tokens, NVIDIA's real moat, why a 2023 GPU can lease for more today than the day it shipped, and Stephen's provocative vision of "neural software." Plus the wild Lambda origin story — from a facial recognition startup to a camera in a baseball cap to a near-billion-dollar cloud business. This is the state of AI compute in 2026, from inside one of the companies building it.
Stephen Balaban
LinkedIn - https://www.linkedin.com/in/sbalaban
X/Twitter - https://x.com/stephenbalaban
Lambda
Website - https://lambda.ai
X/Twitter - https://x.com/LambdaAPI
Matt Turck (Managing Director)
Blog - https://mattturck.com
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://x.com/mattturck
FirstMark
Website - https://firstmark.com
X/Twitter - https://x.com/FirstMarkCap
Listen on:
Spotify - https://open.spotify.com/show/7yLATDSaFvgJG80ACcRJtq
Apple - https://podcasts.apple.com/us/podcast/the-mad-podcast-with-matt-turck/id1686238724
00:00 — Cold open
01:21 — Why GPU compute was never a commodity
02:45 — The H100 price index and what it gets wrong
04:02 — The real moat: technology or financing?
05:57 — Winner-take-all, or room for many neoclouds?
06:48 — Are we overbuilding or underbuilding AI compute?
09:26 — What if AI gets 10x more compute-efficient?
10:44 — The real bottleneck: land, power, and shell
11:38 — The backlash against data centers — and the misinformation
15:00 — Opening the hood: from photons to tokens
17:11 — Extracting more value from the same chip
19:26 — Frontier inference and distributed training, explained
23:26 — What actually drives compute cost
25:21 — Lambda's chip stack and the NVIDIA relationship
26:17 — A multi-silicon world? CUDA, cuDNN, and NVIDIA's real moat
28:59 — Networking, storage, and the one-click cluster
34:46 — Renting vs. owning, and full vertical integration
36:24 — How global is Lambda? Does location still matter?
38:44 — The financing stack: off-take agreements, SPVs, and credit
41:16 — Why a 2023 GPU leases for more today
42:36 — A futures market for compute?
43:54 — Origin story: facial recognition, Perceptio, and Apple
47:03 — The Lambda hat and Dream Scope
48:59 — The $60K bet that became a cloud business
52:00 — Holding the team together through the hard times
54:30 — Bringing on a new CEO; Stephen as CTO
57:33 — Matching xAI on high-velocity deployment
59:29 — "AI won't write software — it will become the software"
01:01:30 — Neural software vs. vibe coding
01:04:25 — Do agents change the compute layer?
01:06:14 — Self-assembling software inside Lambda
01:08:18 — Gigawatt-scale AI factories
01:08:57 — One person, one GPU
01:12:04 — Hot takes: overrated and underrated in AI
