Researchers tested five advanced language models from Anthropic and OpenAI as autonomous curators for annotating biological phenotypes, a traditionally labor-intensive task that requires human experts. All five agents performed within the range of expert human curators on a standardized benchmark and significantly outperformed prior NLP tools, suggesting that LLM-based agents could help overcome a major bottleneck in comparative biology research.
Why it matters: As the life sciences increasingly rely on data integrated across studies, automating phenotype annotation at a human-expert level could substantially accelerate research while reducing dependence on scarce trained curators.