Reward hacking: a potential source of serious AI misalignment

Published: 2025-11-21 16:59:20

Abstract

We discuss our new paper, "Natural emergent misalignment from reward hacking in production RL". In this paper, we show for the ...

Transcript (Chinese and English)