Accurate speech recognition has largely been relegated to the cloud, with unfortunate if predictable results:
Apple contractors 'regularly hear confidential details' on Siri recordings
Workers hear drug deals, medical details and people having sex, says whistleblower
This journalist’s Otter.ai scare is a reminder that cloud transcription isn’t completely private
A report recently published by Politico about the automated transcription service Otter.ai serves as a great reminder of how difficult it can be to keep things truly private in the age of cloud-based services. It starts off with a nerve-wracking story — the journalist interviewed Mustafa Aksu, a Uyghur human rights activist who could be a target of surveillance from the Chinese government. But though they took pains to keep their communication confidential, they used Otter to record the call — and a day later, they received a message from Otter asking about the purpose of the conversation with Aksu.
Even Apple's "privacy-respecting" macOS removed offline-only speech recognition in macOS 10.15 Catalina.
Happily, there is an accurate and free (as in speech) option for offline speech-to-text processing: OpenAI's Whisper (GitHub | Hacker News). According to the blurb, it "approaches human level robustness and accuracy on English speech recognition"; daily testing has borne that claim out.
Installation is a breeze:
pip install git+https://github.com/openai/whisper.git
as is basic usage:
whisper audio.mp3
A number of models are available, including the default "small" (which "works well for transcribing English" and weighs 483MB) and "large" at around 3GB.
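As a rough sketch of picking a model from the command line, the helper below loops over the MP3 files in a directory and hands each one to whisper with the --model flag. The function name transcribe_all and the WHISPER_BIN variable are illustrative assumptions, not part of Whisper; --model and --output_dir are flags of the Whisper CLI.

```shell
# Hypothetical helper: transcribe every .mp3 in a directory with a chosen model.
# Usage: transcribe_all DIR [MODEL]   (MODEL falls back to "small")
# WHISPER_BIN is an assumed override, handy for a dry run with `echo`.
transcribe_all() {
  dir="$1"
  model="${2:-small}"
  for f in "$dir"/*.mp3; do
    # Skip the unexpanded pattern when the directory holds no .mp3 files.
    [ -e "$f" ] || continue
    # Transcripts are written next to the audio via --output_dir.
    ${WHISPER_BIN:-whisper} "$f" --model "$model" --output_dir "$dir"
  done
}
```

For example, transcribe_all ~/recordings large would run the large model over everything in ~/recordings (a placeholder path). Larger models take longer and need more memory, so the default is a sensible starting point.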
Sindre Sorhus' free Aiko offers a native macOS/iOS GUI for "the Whisper large v2 model on macOS and the medium or small model on iOS depending on available memory". Both Mac App Store (MAS) and non-MAS versions are available.
/nix | Nov 02, 2022