Researchers tested the common assumption that fine-tuning language models on synthetic explanations (rationales) improves performance on clinical prediction tasks, using Alzheimer's disease forecasting as the test case. Across 504 configurations, rationale-based training consistently hurt performance compared to label-only fine-tuning, even when the rationales were medically accurate. The authors attribute this to a structural conflict between narrative plausibility and predictive optimization.
Why it matters: The result challenges a widespread practice in AI for healthcare. It shows that explainability-focused training can backfire in high-stakes applications, and it gives practitioners reason to reconsider how language models should be supervised for clinical tasks.