Researchers introduce Evoflux, an inference-time evolutionary search technique that helps compact language models execute complex tool workflows by repairing failed plans in real-time. Testing on 250 tools across live servers, Evoflux increased execution success rates from roughly 3% to 17-24% for small planners, outperforming traditional training methods like supervised fine-tuning and reinforcement learning approaches.
Why it matters: As teams deploy smaller, cheaper language models for agent applications, improving their reliability at handling multi-step tool interactions directly addresses a major deployment gap between capable but expensive models and production-ready compact alternatives.