A new research paper challenges the assumption that chain-of-thought reasoning reduces bias in AI models, finding instead that position bias in multiple-choice questions actually increases with longer reasoning trajectories across models like DeepSeek-R1. The study tested 13 reasoning configurations and found that 12 showed statistically significant positive correlations between reasoning length and position bias, with bias increases ranging from 16% to 32% in some cases, even after controlling for accuracy.
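The paper does not publish its measurement code, but the core relationship it reports can be sketched with a toy simulation: score position bias as the deviation of a model's chosen-option distribution from uniform, then correlate that score with reasoning-trace length. Everything below (the `position_bias` metric, the synthetic data, and the growth rate of the bias) is an illustrative assumption, not the authors' method.

```python
import random
from collections import Counter

def position_bias(choices, n_options=4):
    # Toy metric (our assumption, not the paper's): max deviation of
    # observed option-position frequencies from the uniform 1/n baseline.
    counts = Counter(choices)
    n = len(choices)
    return max(abs(counts.get(p, 0) / n - 1 / n_options) for p in range(n_options))

def pearson(xs, ys):
    # Plain Pearson correlation coefficient, stdlib only.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic demo: simulate a model whose preference for the first option
# grows with reasoning-trace length (a toy stand-in for the reported effect).
random.seed(0)
lengths, biases = [], []
for trace_len in range(100, 2100, 200):
    p_first = min(0.25 + trace_len / 10_000, 0.6)  # assumed bias growth
    rest = (1 - p_first) / 3
    picks = random.choices(range(4), weights=[p_first, rest, rest, rest], k=500)
    lengths.append(trace_len)
    biases.append(position_bias(picks))

r = pearson(lengths, biases)
print(f"correlation(reasoning length, position bias) = {r:.2f}")
```

Run against real model outputs, `choices` would be the option positions the model selected and `trace_len` the token count of each reasoning trace; a significantly positive `r` is the pattern the study reports.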
Why it matters: As reasoning-tuned models become central to enterprise AI evaluation pipelines, this finding suggests that current multiple-choice benchmarking practices may produce misleading results, and that developers need new diagnostic tools to audit these hidden biases before deployment.