Researchers analyzing the 2025 ACL Rolling Review found that LLM-generated paper reviews align only weakly with human reviews, and that the degree of alignment varies significantly across prompts and models. The study also found that authors can game LLM reviews through iterative revision cycles, achieving statistically significant score increases for up to 35% of papers tested.
Why it matters: As major academic conferences increasingly adopt LLM-assisted peer review, this research exposes two critical vulnerabilities in these systems, misalignment with human standards and exploitability, that could undermine the integrity of the scientific publication process.