Recognition is a guess

Automatic transcription, OCR, and classification have gotten remarkably good. Building software around them means designing for the moments they're confidently wrong.

We say a system read the document, or recognized the speaker, or classified the file — as if it knew. It didn't. Automatic recognition produces a best guess with a confidence attached, then hands you the guess and quietly drops the confidence. Build on top of that without accounting for it, and you've built on sand that looks like stone.

The good part is real

This isn't skepticism for its own sake. Modern recognition is genuinely remarkable: transcription that would have been science fiction a decade ago, OCR that reads a crumpled receipt, classifiers that sort thousands of documents in seconds. Used well, it turns work that took days into minutes. The mistake is never in using it; it's in trusting it uniformly, as though every output were equally solid.

Two ways it fails

Recognition fails in two shapes, and they call for different defenses:

  • Confidently wrong. The model returns a clean, fluent answer that happens to be false — a misheard name, a transposed digit, a plausible word that was never said. This is the dangerous one, because nothing about the output looks uncertain.
  • Silently incomplete. It simply misses — a quiet passage, a smudged line, a case it was never trained on. The gap doesn't announce itself either.

Classification adds a third: the forced choice. A classifier must pick a bucket even when the honest answer is "none of these" or "I'm not sure." Make it choose and it will — confidently, into the wrong bin.

The output that looks most certain and the output that is most wrong are, too often, the same output.

Design for the confident mistake

If you assume recognition will sometimes be confidently wrong, the design follows:

  • Keep the source. Never discard the original the recognition came from. The audio, the scan, the image stays the source of truth; the recognized text is just an index into it.
  • Carry the confidence. The recognizer attaches a confidence to each guess; surface it instead of throwing it away. The score won't catch every confident mistake, but a shaky line should not look identical to a certain one.
  • Make verification cheap. The pattern that wins is recognize everything, verify what matters: let the machine do the broad pass, then make it one click to jump from any claim back to the source and confirm it.
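
As a rough illustration of the three points above, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than a description of any particular product: the RecognizedSpan type, the 0.85 cutoff, the important flag, and the "#t=start,end" link format are all hypothetical, and the sample lines are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedSpan:
    text: str            # the recognizer's best guess
    confidence: float    # score the recognizer reported, assumed 0.0 to 1.0
    source_uri: str      # where the original audio or scan is kept
    start: float         # offset into the source, in seconds
    end: float
    important: bool = False  # set by the application: a name, an amount, a date


REVIEW_THRESHOLD = 0.85  # hypothetical cutoff; tune it against your own data


def review_queue(spans, threshold=REVIEW_THRESHOLD):
    """Return the spans a person should check, least confident first."""
    flagged = [s for s in spans if s.important or s.confidence < threshold]
    return sorted(flagged, key=lambda s: s.confidence)


def source_link(span):
    """Build a link that jumps to the matching stretch of the source."""
    return f"{span.source_uri}#t={span.start:g},{span.end:g}"


# Invented sample data, for illustration only.
spans = [
    RecognizedSpan("The meeting is at nine.", 0.97, "rec/a.wav", 0.0, 2.5),
    RecognizedSpan("Send it to Mr. Halvorsen.", 0.62, "rec/a.wav", 2.5, 5.0),
    RecognizedSpan("The amount was 4,150 dollars.", 0.99, "rec/a.wav", 5.0, 8.25,
                   important=True),
]
queue = review_queue(spans)
```

The important flag is there because a confidence cutoff alone cannot catch the confident mistake: the amount scores 0.99 and still lands in the queue, since the application decided it matters. The first line scores high, carries no flag, and is left alone.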

Classification needs an escape hatch

Give the classifier permission to abstain. "Unknown" and "needs review" are first-class outcomes, not failures: route those items to a person instead of forcing a label. Never let a guess trigger a consequential action automatically; keep a human in the loop for anything that is costly to get wrong. And keep in mind that categories are lossy: the world rarely fits the buckets cleanly, and the cases that matter most tend to live on the boundaries between them.
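
One way to sketch the escape hatch in Python, assuming the classifier returns an independent score per label. The decide and may_auto_act helpers, the Decision type, and both thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    label: Optional[str]  # None means the classifier abstained
    confidence: float
    reason: str


def decide(scores, min_confidence=0.80, min_margin=0.15):
    """Pick a label only when the top score is both high and clearly ahead."""
    if not scores:
        return Decision(None, 0.0, "no scores")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top < min_confidence:
        return Decision(None, top, "low confidence")
    if top - runner_up < min_margin:
        return Decision(None, top, "ambiguous between top labels")
    return Decision(top_label, top, "accepted")


def may_auto_act(decision, consequential):
    """Abstentions and consequential actions always go to a person."""
    return decision.label is not None and not consequential


clear = decide({"invoice": 0.93, "contract": 0.10})
torn = decide({"invoice": 0.91, "contract": 0.88})
weak = decide({"invoice": 0.55, "contract": 0.30})
```

Here clear is accepted as an invoice, while torn and weak both abstain and would be routed to review. The margin check is what catches the boundary case: two high scores close together are treated as "not sure" rather than as a win for whichever is slightly ahead.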

Provenance ties it together

This is the same discipline as auditability. Every recognized value and every classification should point back to what produced it — and how sure the machine was. Then a wrong guess becomes a traceable, correctable event, rather than a "fact" that silently entered your system and can never be found again.
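
A minimal Python sketch of what that could look like, under stated assumptions: the Provenance and RecognizedValue types, the correct and history helpers, the model name, and the sample values are all hypothetical. The idea it illustrates is that a correction adds a new record pointing at the old one instead of overwriting it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    source_uri: str        # the original the value was read from
    producer: str          # which model and version produced the guess
    confidence: float      # how sure the machine said it was
    recorded_at: datetime


@dataclass(frozen=True)
class RecognizedValue:
    field_name: str
    value: str
    provenance: Provenance
    supersedes: Optional["RecognizedValue"] = None  # the guess this one replaced
    corrected_by: Optional[str] = None              # who made the correction


def correct(old, new_value, reviewer):
    """Record a correction as a new value that keeps the old one reachable."""
    return RecognizedValue(old.field_name, new_value, old.provenance,
                           supersedes=old, corrected_by=reviewer)


def history(value):
    """List every value this field has held, newest first."""
    chain = []
    current = value
    while current is not None:
        chain.append(current.value)
        current = current.supersedes
    return chain


# Invented sample data, for illustration only.
guess = RecognizedValue(
    "invoice_total", "4,150.00",
    Provenance("scans/inv-17.png", "ocr-model-v2", 0.98,
               datetime(2024, 1, 1, tzinfo=timezone.utc)),
)
fixed = correct(guess, "4,750.00", "reviewer@example.com")
```

After the correction, history(fixed) still contains the original guess, and fixed.provenance still points at the scan it came from, so the wrong value remains findable along with the confidence it arrived with.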

Where we land

We build on automatic recognition constantly — it's how Scribe Verbatim turns hours of recordings into searchable, speaker-attributed transcripts, and how it sorts a case's documents. But we treat every recognized line and every classification as a guess with a source attached: verifiable, correctable, never load-bearing on its own. If you're putting recognition or classification somewhere it has to be right, that's exactly the kind of problem we like.