Translating Claude’s thoughts into language

Published: 2026-05-07 17:01:21

Summary

AI models like Claude talk in words but think in numbers. These numbers, called activations, encode Claude’s thoughts, but not in a language we can read. We are introducing Natural Language Autoencoders, or NLAs, which translate AI models’ activations into readable text. NLAs have already helped us improve how we test our models for safety and better understand why they do what they do. Read more about this research on our blog: https://www.anthropic.com/research/natural-language-autoencoders

Transcript (Chinese and English)