You sort of have to use both: run OCR and an LLM, then correlate the two results. They're bad at very different things, but a follow-up call to a second LLM to pair the results together improves quality significantly, and you get both document understanding and context as well as bounding boxes, etc.
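For what it's worth, the pairing step doesn't always need a second LLM call; a cheap first pass is to fuzzy-match the LLM's extracted values back to the OCR tokens to recover bounding boxes. A minimal sketch of that idea (the token format and field names here are made up for illustration, not any particular OCR engine's output):

```python
from difflib import SequenceMatcher

def attach_boxes(llm_fields, ocr_tokens, threshold=0.8):
    """For each LLM-extracted field value, find the best-matching OCR token
    and attach its bounding box. Returns {field: (value, bbox_or_None)}."""
    paired = {}
    for field, value in llm_fields.items():
        best_score, best_box = 0.0, None
        for token in ocr_tokens:
            # Fuzzy similarity between the LLM's value and the OCR token text
            score = SequenceMatcher(None, value.lower(),
                                    token["text"].lower()).ratio()
            if score > best_score:
                best_score, best_box = score, token["bbox"]
        # Only trust the box if the match is strong enough
        paired[field] = (value, best_box if best_score >= threshold else None)
    return paired

# Hypothetical outputs from the two systems:
ocr = [{"text": "Jane Doe", "bbox": (40, 120, 180, 140)},
       {"text": "2024-05-01", "bbox": (40, 160, 150, 180)}]
fields = {"name": "Jane Doe", "date": "2024-05-01"}
print(attach_boxes(fields, ocr))
```

Anything below the threshold is the disagreement set you'd actually hand to the second LLM to adjudicate.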
I'm building a "never fill out paperwork again" app, if anyone is interested, would be happy to chat!
We think VLMs will outperform most OCR+LLM solutions in due time. I get that there's a need for these hybrid solutions today, but we're comparing 20+ years of mature tech against something that's roughly 1.5 years old.
Also, VLMs are end-to-end trainable, unlike OCR+LLM pipelines (whose components are trained separately), so these approaches should scale much better for domain-specific use cases or verticals.