Offline speech-to-text that is accurate and free (as in speech)

Accurate speech recognition has largely been relegated to the cloud, with unfortunate if predictable results:

Even Apple's "privacy respecting" macOS dropped offline-only speech recognition as of macOS 10.15 Catalina.

Happily, there is an accurate and free (as in speech) option for offline speech-to-text processing: OpenAI's Whisper (GitHub | Hacker News). According to the blurb, it "approaches human level robustness and accuracy on English speech recognition"; daily testing has borne that claim out.

Installation is a breeze:

pip install git+https://github.com/openai/whisper.git

as is basic usage:

whisper audio.mp3
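
The same transcription can also be driven from Python, since the package ships a small library API alongside the command. What follows is a minimal sketch rather than a definitive recipe: `transcribe_file` and its `model` parameter are names made up for illustration, while `whisper.load_model` and `model.transcribe` are the calls shown in the project's README. It assumes `ffmpeg` is installed, as the command-line tool does.

```python
def transcribe_file(audio_path, model_name="small", model=None):
    """Return the transcript of audio_path as plain text.

    `model` is a hypothetical hook: pass an already-loaded model to
    avoid reloading the weights for every file.
    """
    if model is None:
        # Imported lazily so the module loads even without whisper installed.
        import whisper
        # The first call downloads the weights, later calls reuse the cache.
        model = whisper.load_model(model_name)
    # transcribe() returns a dict; the full transcript is under "text".
    result = model.transcribe(audio_path)
    return result["text"].strip()


if __name__ == "__main__":
    print(transcribe_file("audio.mp3"))
```

Loading the model once and passing it back in via `model` is worthwhile when working through a folder of recordings, because loading the weights is the slow part.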

A number of models are available, including the default "small" (which "works well for transcribing English" and weighs 483 MB) and "large" at around 3 GB.
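
Switching models is a matter of passing `--model` to the command, for example `whisper audio.mp3 --model large`. For scripting, a rough sketch of a wrapper might look like the following. The helper names `whisper_command` and `run_whisper` are invented here, and the model list is the set of multilingual sizes from the README at the time of writing, not an exhaustive or future-proof one.

```python
import subprocess

# Sizes listed in the Whisper README; English-only ".en" variants also exist.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def whisper_command(audio_path, model="small"):
    """Build the argument list for the whisper CLI with an explicit model."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return ["whisper", audio_path, "--model", model]


def run_whisper(audio_path, model="small"):
    """Run the CLI; needs the `whisper` executable on PATH."""
    # check=True raises CalledProcessError if whisper exits with an error.
    subprocess.run(whisper_command(audio_path, model), check=True)


if __name__ == "__main__":
    run_whisper("audio.mp3", model="large")
```

Building the command as a list rather than a single string sidesteps shell quoting problems with file names that contain spaces.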

Update

Sindre Sorhus' free Aiko offers a native macOS/iOS GUI for "the Whisper large v2 model on macOS and the medium or small model on iOS depending on available memory". Both Mac App Store (MAS) and non-MAS versions are available.

/nix | Nov 02, 2022

