Linux, OCR and PDF: Scan to PDF/A

Friday, March 29th, 2013 | Author:

By far the most visited post on this blog is one from 2010 about OCRing (optical character recognition) a PDF in GNU/Linux. It contains a small shell script that others have since improved several times. After buying a new flatbed scanner, I revisited three things: how to scan and OCR PDFs, how to produce incredibly small DjVu files, and how to get the metadata right. It turns out that what I had wanted all along was to create PDF/A-compliant documents; I just didn't know what PDF/A was before. I'll start with the quick solution and explain the details afterwards. The end result is a shell script that scans directly to PDF/A.

