Sequoia Capital - Andrej Karpathy: From Vibe Coding to Agentic Engineering

Published: 2026-04-29 15:21:18


Andrej Karpathy, a leading figure in AI, shared his recent, startling realization: he's "never felt more behind as a programmer." This sentiment, arising around December 2023, signifies a fundamental shift in AI capabilities. He observed LLM tools evolving from requiring frequent correction to reliably producing fine-tuned code chunks, enabling "vibe coding" – a fluid, intuitive development process that has led him to pursue an "infinity side project" list. Karpathy emphasizes that AI's evolution is not merely incremental; it's a fundamental change demanding a new perspective.
Karpathy posits that LLMs are ushering in "Software 3.0," a radical departure from previous paradigms. Software 1.0 involved explicit rule-based coding, and Software 2.0 leveraged data to train neural networks (programming by creating datasets). Software 3.0, however, redefines programming as "prompting," treating the LLM as an interpreter where the "context window is your lever." He provides striking examples: installing "Open Claw" now involves giving text instructions to an agent, rather than running a complex shell script. More dramatically, his personal "MenuGen" app, designed to overlay menu item pictures, is rendered "spurious" because the same functionality can be achieved by simply prompting Gemini with an image input and getting an image output. This paradigm shift means AI doesn't just make existing programming faster; it enables entirely new forms of information processing, like generating knowledge bases from unstructured documents, which were previously impossible.
Looking towards 2026, Karpathy envisions a future of "neural computers" where raw sensory data (video, audio) directly feeds into neural networks that dynamically render UIs. He suggests a reversal of current computing architecture, with neural nets becoming the "host process" and traditional CPUs serving as "co-processors," leading to an "extremely foreign" computing landscape.
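The three paradigms described above can be made concrete with a toy review classifier. This is only an illustrative sketch: the function names, word list, and dataset are hypothetical, and no real LLM API is called. The Software 3.0 step merely builds the prompt text that would be placed in a model's context window.

```python
# Toy task: decide whether a short review is positive.
# Every name here is hypothetical and chosen only for illustration.

# Software 1.0: the programmer writes the rule explicitly.
POSITIVE_WORDS = {"great", "good", "love"}

def classify_v1(text: str) -> bool:
    return any(word in POSITIVE_WORDS for word in text.lower().split())

# Software 2.0: the programmer curates a dataset; behavior is fit from it.
def train_v2(examples):
    """Score each word by how often it appears in positive vs. negative examples."""
    weights = {}
    for text, label in examples:
        for word in text.lower().split():
            weights[word] = weights.get(word, 0) + (1 if label else -1)
    return weights

def classify_v2(weights, text: str) -> bool:
    return sum(weights.get(word, 0) for word in text.lower().split()) > 0

# Software 3.0: the "program" is the text placed in the context window.
def build_prompt_v3(text: str) -> str:
    return (
        "Decide whether the following review is positive. "
        "Answer with exactly 'yes' or 'no'.\n\n"
        f"Review: {text}"
    )

DATASET = [
    ("the soup was great", True),
    ("cold and bland soup", False),
    ("i love the noodles", True),
    ("bland noodles and slow service", False),
]

weights = train_v2(DATASET)
print(classify_v1("The soup was great"))
print(classify_v2(weights, "bland and slow"))
print(build_prompt_v3("The soup was great"))
```

In the first version the behavior lives in code, in the second it lives in the dataset, and in the third it lives in the prompt, which is the sense in which the context window acts as the lever.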
A core concept Karpathy explores is "verifiability." LLMs excel at automating tasks where outputs can be objectively verified, a trait stemming from their reinforcement learning (RL) training using verification rewards. This explains the "jaggedness" of LLM intelligence: models can flawlessly refactor vast codebases or find zero-day vulnerabilities, yet struggle with simple common-sense questions like whether to walk to a car wash 50 meters away. This jaggedness necessitates human oversight, as LLMs, while powerful, remain fallible tools. Lab decisions on training data (e.g., extensive chess data for GPT-4) significantly influence these capabilities, implying users are "at the mercy" of what data is included. For founders, this means opportunities exist in verifiable domains not yet fully explored by major labs, where custom RL environments and fine-tuning could yield significant results.
Karpathy differentiates "vibe coding," which raises the accessibility floor for all programmers, from "agentic engineering." The latter focuses on maintaining the quality bar of professional software while dramatically accelerating development. He believes the traditional "10x engineer" concept is now understated, as effective agentic engineers achieve far greater speed-ups by coordinating "spiky, fallible, stochastic" agents. Consequently, hiring processes must adapt, moving from puzzle-solving to evaluating candidates based on their ability to implement large-scale projects using agentic tools, with other agents tasked to break their creations. In this agent-driven world, human skills like aesthetics, judgment, taste, and high-level oversight become invaluable. While agents can handle API specifics and rote details (e.g., `keepdims` vs `axis`), humans must retain an understanding of underlying fundamentals (e.g., memory management in tensors) and provide strategic design.
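To illustrate what "verifiable" means in the discussion of verification rewards above, here is a minimal sketch of a binary reward based on test cases. It is an assumption-laden toy, not a description of any lab's actual training setup: `verifiable_reward`, `TESTS`, and the two candidate functions are hypothetical names introduced only for this example.

```python
from typing import Callable, List, Tuple

# (positional arguments, expected result)
TestCase = Tuple[tuple, object]

def verifiable_reward(candidate: Callable, tests: List[TestCase]) -> float:
    """Return 1.0 only if the candidate passes every test case, else 0.0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            return 0.0  # a crash counts as a failure
    return 1.0

# Hypothetical task: implement integer addition.
TESTS = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]

def correct_add(a, b):
    return a + b

def buggy_add(a, b):
    return a * b  # wrong on the first test: 2 * 3 != 5

print(verifiable_reward(correct_add, TESTS))
print(verifiable_reward(buggy_add, TESTS))
```

A task like this can be checked mechanically, so it can supply a reward signal. A question such as whether to walk to a car wash 50 meters away has no comparable checker, which is consistent with the jaggedness described above.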
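As an example of the kind of API detail and underlying fundamental mentioned above, the following sketch uses NumPy (an assumption, since the talk summary names only `keepdims` and `axis`, not a library). It shows how `axis` selects the dimension to reduce, how `keepdims` preserves that dimension for broadcasting, and how a slice can share memory with the original array.

```python
import numpy as np

x = np.arange(6, dtype=np.float64).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# axis=1 reduces across columns, giving one sum per row.
row_sums = x.sum(axis=1)                      # shape (2,)
row_sums_kept = x.sum(axis=1, keepdims=True)  # shape (2, 1)

# With keepdims=True the result broadcasts against x row by row.
# Dividing by row_sums (shape (2,)) would raise a broadcasting error here.
normalized = x / row_sums_kept

# Memory fundamentals: basic slicing returns a view, .copy() allocates new memory.
view = x[:, 0]
copy = x[:, 0].copy()
view[0] = 100.0  # writes through to x[0, 0]

print(row_sums.shape, row_sums_kept.shape)
print(np.shares_memory(x, view), np.shares_memory(x, copy))
```

An agent can look up the argument names, but knowing that `view` aliases `x` while `copy` does not is the kind of understanding the summary argues humans should retain.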
Karpathy notes that current agent-generated code can be "bloaty" or "gross," underscoring the ongoing need for human discretion in maintaining quality and elegance.
Ultimately, Karpathy foresees an "agent-native" world where infrastructure is designed for agents, not just humans. He expresses frustration with current documentation that provides human instructions ("go to this URL") rather than agent-ready "copy-paste" commands. The ideal scenario would be to simply prompt an LLM to "build MenuGen" and have it fully deploy without any manual intervention. This agent-first approach will extend to inter-agent communication, with "my agent talk[ing] to your agent" for tasks like scheduling.
Regarding education, Karpathy highlights the enduring value of understanding, quoting the idea: "You can outsource your thinking but you can't outsource your understanding." He stresses that humans remain the bottleneck for direction, purpose, and true comprehension. While tools like LLM-powered knowledge bases can enhance understanding by re-processing information, the human role in discerning "what to build, why it's worth doing, and how to direct" these powerful agents remains irreplaceable.