A benchmark of six document-processing approaches across 30 image-heavy PDFs and 171 questions found that premium OCR-based pipelines (LlamaCloud and Azure premium) achieved 59.6% and 58.5% accuracy, respectively, while direct vision LLM processing of PDFs ranked fifth at 52% accuracy and had the highest cost ($0.2552 per query). Vision models struggled most with charts and tables, the very use cases they are promoted for, and OCR pipelines reached 100% reliability after retries, compared with a 7% permanent failure rate for the vision approach.
Why it matters: As teams evaluate whether to replace OCR infrastructure with vision-capable LLMs, these results challenge the common narrative that vision models have made traditional OCR obsolete, revealing concrete accuracy and reliability tradeoffs that should inform architecture decisions.