Reward hacking: a potential source of serious AI misalignment

Published: 2025-11-21 16:59:20

We discuss our new paper, "Natural emergent misalignment from reward hacking in production RL". In this paper, we show for the ...
