Convert PDF to TXT while preserving layout

With Poppler's pdftotext:

$ pdftotext -layout input.pdf output.txt

Preinstalled in current versions of Debian, Ubuntu, et al. (package poppler-utils). It is also available as a Homebrew formula (brew install poppler), as raw source, and as a Windows binary. It does a beautiful job of converting QuickBooks invoice PDFs into plain text.
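To convert a whole folder of PDFs (say, a month of invoices) in one go, here is a minimal sketch of a shell function. It assumes pdftotext is on your PATH. The function name pdf_dir_to_txt is hypothetical and not part of Poppler. It only uses the -layout flag shown above.

```shell
# pdf_dir_to_txt: hypothetical helper, not part of Poppler.
# Converts every *.pdf in a directory (default: current dir) to a
# layout-preserving .txt file next to it, e.g. a.pdf -> a.txt.
pdf_dir_to_txt() {
  dir="${1:-.}"

  # Bail out early if Poppler's pdftotext isn't installed.
  if ! command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext not found; install Poppler first" >&2
    return 1
  fi

  for f in "$dir"/*.pdf; do
    # With no matching files the glob stays literal, so skip it.
    [ -e "$f" ] || continue
    # Strip the .pdf suffix and write the text next to the original.
    # Note: the glob is case-sensitive, so *.PDF files are skipped.
    pdftotext -layout "$f" "${f%.pdf}.txt"
  done
}
```

Usage: `pdf_dir_to_txt ~/invoices`. For a quick look without writing a file, pdftotext accepts `-` as the output name and prints to stdout, e.g. `pdftotext -layout input.pdf - | less`.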

H/T: Linux Uprising

UPDATE: "Marker converts PDF, EPUB, and MOBI to markdown. It's 10x faster than nougat, more accurate on most documents, and has low hallucination risk."

/nix | Apr 14, 2023