Offline speech-to-text that is accurate and free (as in speech)

Accurate speech recognition has largely been relegated to the cloud, with unfortunate if predictable results:

Apple contractors 'regularly hear confidential details' on Siri recordings

Workers hear drug deals, medical details and people having sex, says whistleblower

This journalist’s Otter.ai scare is a reminder that cloud transcription isn’t completely private

A report recently published by Politico about the automated transcription service Otter.ai serves as a great reminder of how difficult it can be to keep things truly private in the age of cloud-based services. It starts off with a nerve-wracking story — the journalist interviewed Mustafa Aksu, a Uyghur human rights activist who could be a target of surveillance from the Chinese government. But though they took pains to keep their communication confidential, they used Otter to record the call — and a day later, they received a message from Otter asking about the purpose of the conversation with Aksu.

Even Apple's "privacy respecting" macOS has shipped without offline-only speech recognition since macOS 10.15 Catalina.

Happily, there is an accurate and free (as in speech) option for offline speech-to-text processing: OpenAI's Whisper (GitHub | Hacker News). According to the blurb, it "approaches human level robustness and accuracy on English speech recognition"; daily testing has borne that claim out.

Installation is a breeze:

pip install git+https://github.com/openai/whisper.git

as is basic usage:

whisper audio.mp3
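The same transcription can also be driven from Python instead of the shell. The sketch below assumes the `whisper` package from the install step above and an `ffmpeg` binary on the PATH, which Whisper uses to decode audio. `whisper.load_model` and `model.transcribe` are the calls shown in the Whisper README; `load_whisper_model` and `transcribe_file` are hypothetical helper names made up for this example, and `audio.mp3` stands in for your own recording.

```python
from pathlib import Path


def load_whisper_model(name="small"):
    """Load a Whisper model by name (the weights are downloaded on first use)."""
    # Imported lazily so the rest of this module works without Whisper installed.
    import whisper

    return whisper.load_model(name)


def transcribe_file(model, audio_path):
    """Return the transcript of one audio file as plain text.

    `model` is anything with a Whisper-style `transcribe(path)` method
    that returns a dict containing a "text" key.
    """
    result = model.transcribe(str(Path(audio_path)))
    return result["text"].strip()
```

With Whisper installed, `print(transcribe_file(load_whisper_model(), "audio.mp3"))` should print roughly what `whisper audio.mp3` writes to the terminal, minus the timestamps. Once the model weights have been downloaded, nothing leaves the machine.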

A number of models are available, including the default "small" (which "works well for transcribing English" and weighs in at 483 MB) and "large", at around 3 GB.
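Switching models is a matter of passing `--model` to the command-line tool. As a hedged sketch, the hypothetical helper below builds that command for use with `subprocess`. The model names are the multilingual ones listed in the Whisper README at the time of writing (an assumption that may date), and `KNOWN_MODELS`, `whisper_command` and `run_whisper` are names invented for this example.

```python
import subprocess

# Multilingual model names from the Whisper README (assumed current).
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def whisper_command(audio_path, model="small"):
    """Build the argv list for a `whisper` CLI call with an explicit model."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {KNOWN_MODELS}")
    return ["whisper", str(audio_path), "--model", model]


def run_whisper(audio_path, model="small"):
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(whisper_command(audio_path, model), check=True)
```

`run_whisper("audio.mp3", model="large")` is then equivalent to typing `whisper audio.mp3 --model large`. The larger model generally trades a bigger download and slower transcription for better accuracy.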

Update

Sindre Sorhus' free Aiko app offers a native macOS/iOS GUI for "the Whisper large v2 model on macOS and the medium or small model on iOS depending on available memory". Both Mac App Store (MAS) and non-MAS versions are available.

/nix | Nov 02, 2022
