A new arXiv paper argues that large language models suffer from "rational value risk" (a gap between their deployed reasoning and theoretically optimal decision-making) even when successfully aligned to human values during training. Testing Llama, Qwen, Tulu, GPT, and DeepSeek models across math and reasoning benchmarks, the researchers found that this irrationality is widespread, cannot be fully eliminated through alignment alone, and shrinks only incrementally with longer reasoning chains.
Why it matters: As AI systems take on higher-stakes reasoning tasks, practitioners assessing real-world LLM reliability need to know that alignment training does not guarantee rational decision-making in deployment.