**New Research** is an open-source AI accelerator initiative that aims to bring AI to everyone, not just as a product but as a technology that anyone can touch and contribute to. By promoting open-source innovation, they hope to empower individuals to learn about, experiment with, and build upon the transformative technology of AI.
Their focus lies in fundamental research that pushes the boundaries of AI with minimal compute resources. Unlike traditional academic fields, where contributions are often incremental, AI is currently at a stage where groundbreaking research can come from exploring diverse alternatives and bringing together people from varied backgrounds.
Their initial focus was the Hermes series of AI models, designed to be neutrally aligned and allow users to instruct the model to adopt any persona, unlike closed providers that enforce guardrails. This individualistic approach empowers users to express themselves and be creative without moralizing constraints. Their team also developed a method called Yarn, which extends the context window of AI models, enabling them to handle larger amounts of text. This research has been widely adopted by other open source AI tools.
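Yarn itself works by rescaling the rotary position embeddings (RoPE) a model was trained with so that longer sequences fit inside the positional range the model already understands. The sketch below shows only the simplest form of that idea, plain linear position interpolation; the real Yarn method is more refined (frequency-aware scaling), and the names here are illustrative rather than taken from any Yarn implementation.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for rotary position embeddings (RoPE).

    scale > 1 compresses positions (linear interpolation) so that a model
    trained on a short context can accept longer sequences without its
    positional signal going out of range. This is a deliberately simplified
    illustration; Yarn itself applies a more careful, frequency-aware scheme.
    """
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))  # one frequency per dim pair
    scaled_pos = np.asarray(positions, dtype=np.float64) / scale       # interpolate positions
    return np.outer(scaled_pos, inv_freq)                              # (seq_len, head_dim // 2)

# Example: stretch a model trained with a 4k context window to a 16k window.
train_ctx, target_ctx = 4096, 16384
angles = rope_angles(np.arange(target_ctx), head_dim=128, scale=target_ctx / train_ctx)
print(angles.shape)  # (16384, 64)
```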
Distro is a groundbreaking project that enables the training of highly capable AI models using only a standard internet connection, decoupling performance scaling from interconnect scaling. This innovation addresses a fundamental problem in AI, where training models requires all GPUs to be in the same room due to bandwidth limitations. By overcoming this constraint, Distro aims to democratize AI by allowing anyone to contribute to training state-of-the-art models, regardless of access to expensive, co-located data centers.
Traditionally, only a handful of organizations have been able to train large models, because doing so requires both massive compute (clusters of 40,000+ H100 GPUs) and the high-speed interconnects to link them. Distro reduces that communication requirement to what an average computer can handle over its internet connection.
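To make that gap concrete, here is a rough back-of-the-envelope calculation. The model size, gradient precision, and link speed are illustrative assumptions, not figures from the team; the roughly 1,000x reduction is the one they describe below.

```python
# Illustrative bandwidth comparison (assumed numbers, not figures from the Distro team).
params = 70e9                          # assume a 70B-parameter model
bytes_per_param = 2                    # fp16 gradients
full_sync = params * bytes_per_param   # bytes exchanged per step with naive full synchronization
reduced_sync = full_sync / 1000        # roughly the ~1,000x reduction described for Distro

home_link = 100e6 / 8                  # a 100 Mbit/s home connection, in bytes per second
print(f"full sync:    {full_sync / 1e9:.0f} GB/step  (~{full_sync / home_link / 3600:.1f} hours on home broadband)")
print(f"reduced sync: {reduced_sync / 1e6:.0f} MB/step (~{reduced_sync / home_link:.0f} seconds on home broadband)")
```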
The New Research team's criterion for research is that it should be fundamental and mathematically grounded, allowing for smaller experiments and iterative development. They seek "10x power-ups" that remove blockers and enable multipliers for the open-source community. This philosophy guided the development of Hermes, which leveraged synthetic data to overcome the cost and limitations of human data collection.
The core idea behind Distro is to create a system where the entire world can collaborate to create AI that is representative of everyone's contributions. After initially focusing on data collection and Hermes, the team began asking what would happen if open source never got a Llama 4. They realized the real blocker was a technical one: internet bandwidth.
While the initial response to Distro was disbelief, the team has since replicated their results, including with the Olmo framework from Allen AI. This involved re-implementing Distro from scratch and testing it against Olmo's baseline, confirming the initial findings.
Distro works by letting each GPU train independently on its own data, communicating only the most important information it has learned rather than synchronizing the entire model. This creates a cloud of models that move together through parameter space and eventually reach similar performance. The result is almost a 1,000x reduction in bandwidth requirements.
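The exact mechanics of Distro have not been published yet, so the sketch below should not be read as the actual algorithm. It only illustrates the general pattern described above: every worker computes an update on its own data and transmits a small, high-importance summary of it (here, a naive top-k selection) instead of the full gradient, so the communication cost is a tiny fraction of the model size. For brevity the sketch keeps all workers exactly in sync, whereas the description above implies each GPU keeps its own loosely aligned copy.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k(vec, k):
    """Keep the k largest-magnitude entries and zero the rest: a naive stand-in
    for whatever compact summary Distro actually transmits."""
    keep = np.argpartition(np.abs(vec), -k)[-k:]
    out = np.zeros_like(vec)
    out[keep] = vec[keep]
    return out

dim, n_workers, k, lr = 10_000, 4, 100, 0.5   # each worker transmits 1% of its update
weights = rng.normal(size=dim)                # shared starting point for every worker
shards = [rng.normal(loc=1.0, scale=0.05, size=dim) for _ in range(n_workers)]  # private data per worker

for step in range(200):
    # Each worker computes a gradient on its own shard (toy quadratic loss),
    # then shares only a sparse summary instead of the full gradient.
    messages = [top_k(weights - shard, k) for shard in shards]
    aggregated = sum(messages) / n_workers    # communication: ~k values per worker, not dim
    weights -= lr * aggregated                # every worker applies the same aggregated update

print(f"distance from the consensus solution: {np.linalg.norm(weights - 1.0):.1f}")
```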
Their findings suggest that communicating only these key pieces of information can be an effective substitute for full synchronization, and that this in turn unlocks a better understanding of how these models learn.
They plan to release a paper detailing Distro. To productize it, they plan to focus on building full-stack tooling and opening it up to community development.
They said they hope for an ideal system built on high-performance, general-purpose computing, and that they are taking an engineering-first approach that frees them from preconceived notions and current norms.
While centralized actors can still benefit, the ultimate goal is to empower individual contributors and democratize the AI landscape.