a16z Podcast - Human Data is Key to AI: Alex Wang from Scale AI

Published: 2024-09-24 10:00:00

This a16z podcast episode features Alexander Wang, founder and CEO of Scale AI, discussing the evolution of AI, the importance of data, and his leadership philosophy. Wang and a16z General Partner David George delve into the three pillars of AI progress: compute, algorithms, and data. Wang positions Scale AI as a key player in fueling AI advancement through data production.

Wang characterizes language models as nearing the end of "phase two," the scaling phase that followed early pure research. He believes the industry is entering a phase in which research directions will diverge significantly, with different labs achieving breakthroughs at different times. One key element is the shift from raw execution to innovation-driven cycles, particularly in addressing the limitations of existing data.

The conversation stresses data production as the next frontier. With easily accessible data such as Common Crawl exhausted, labs are hitting a "data wall." Wang proposes a focus on data abundance, moving toward "frontier data" that enables more complex AI capabilities. He points out that current AI agents remain inadequate, in part because there is little high-quality agent data for training models to chain tools together the way humans naturally do. The solution, he argues, lies in capturing more of what humans do, investing in synthetic and hybrid data, and building data foundries that generate large volumes of high-quality data.

Wang discusses the advantage big tech companies have in their internal data, but also the regulatory issues they may face in using it, especially in Europe. He highlights their financial advantages as well, which allow them to invest heavily in AI with the potential for significant returns.

The discussion then shifts to the market structure of the model layer. The cost of model inference has fallen dramatically, suggesting that intelligence may be becoming a commodity. Wang doubts that renting out models will be a lucrative long-term business. He sees value in businesses that provide the underlying hardware (NVIDIA) or the necessary cloud infrastructure, and he notes that higher-quality businesses exist above the model layer, at the application layer, with ChatGPT as a good example.

Turning to enterprise adoption of AI, Wang observes both excitement and experimentation, but notes that fewer proofs of concept are making it to production than expected. Rather than a complete transformation, AI has so far mostly delivered efficiency gains and improvements to support functions, that is, gains at the margins. Wang encourages enterprises to focus on AI implementations that meaningfully boost their stock price through cost savings, efficiency gains, and better customer experiences. He highlights the potential value of enterprise data, which AI can use to transform current business operations. That data is valuable, even hyper-valuable, but companies face challenges in organizing and leveraging it.

The conversation then pivots to Wang's leadership philosophy and the lessons he learned from the hiring boom of 2020 and 2021. He acknowledges that it was a mistake to assume that more people means better results. He found that a high-performing team is delicate and difficult to scale dramatically without sacrificing quality and culture, and he suggests that startups prioritize keeping high-performing teams intact. He also advises that newly hired executives be integrated into the existing culture and operations before they make sweeping changes. Wang discusses Scale AI's "MEI" principle (merit, excellence, and intelligence), emphasizing that the company hires the best candidate for each position regardless of demographics.

Finally, Wang defines Artificial General Intelligence (AGI) as the point at which AI can accomplish 80% or more of the jobs that people do purely on computers. He expects that to be at least four years away.