with Demucs, an open source "state-of-the-art music source separation model, currently capable of separating drums, bass, and vocals from the rest of the accompaniment."
Install: pip3 install demucs soundfile
Run: /path/to/demucs --two-stems=vocals --out=/path/to/output_dir/ /path/to/audio.flac
Listen: Find vocals.wav & no_vocals.wav inside output_dir/audio/.
Including soundfile resolved RuntimeError: Couldn't find appropriate backend to handle uri.
The --jobs flag (e.g., demucs --jobs 10 audio.flac) significantly speeds up processing when multiple cores are available, albeit at the cost of increased memory usage.
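To script the run, the command can be assembled in Python. This is a minimal sketch, not part of Demucs itself: build_demucs_command is a hypothetical helper, and the default cap of 4 jobs is an arbitrary assumption to keep memory use in check. It uses only the flags quoted above.

```python
import os
import shlex


def build_demucs_command(audio_path, out_dir, jobs=None, demucs_bin="demucs"):
    """Return the argv list for a two-stem (vocals) Demucs run.

    Hypothetical helper: it only assembles the flags quoted in the notes.
    """
    if jobs is None:
        # Assumption: one job per core, capped at 4 to limit memory use.
        jobs = min(os.cpu_count() or 1, 4)
    return [
        demucs_bin,
        "--two-stems=vocals",
        f"--out={out_dir}",
        "--jobs",
        str(jobs),
        audio_path,
    ]


cmd = build_demucs_command("/path/to/audio.flac", "/path/to/output_dir/", jobs=10)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing jobs explicitly overrides the capped default, which is the knob to turn if memory runs short.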
vocals.wav contained a few brief instrumental segments. Removed the vocal segments in Fission (by selecting and silencing them), then overlaid the edited vocals.wav with no_vocals.wav to preserve all non-vocal elements, like so:
pip3 install pydub
In Python:
from pydub import AudioSegment

# Load the two WAV files
sound1 = AudioSegment.from_wav("vocals.wav")
sound2 = AudioSegment.from_wav("no_vocals.wav")

# Combine the two audio files
combined = sound1.overlay(sound2)

# Export the combined file
combined.export("combined.wav", format="wav")
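The silencing step was done by hand in Fission, but the idea can be sketched in code. This is an illustration on raw 16-bit mono samples using only the standard library, not what Fission or pydub do internally. silence_ranges, mix, and the millisecond ranges are hypothetical names and values. Stereo WAV data interleaves channels, so it would need extra handling.

```python
from array import array


def silence_ranges(samples, frame_rate, ranges_ms):
    """Return a copy of mono 16-bit samples with each (start_ms, end_ms) range zeroed."""
    out = array("h", samples)
    for start_ms, end_ms in ranges_ms:
        # Convert milliseconds to sample indices, clamped to the clip bounds.
        start = max(0, start_ms * frame_rate // 1000)
        end = min(len(out), end_ms * frame_rate // 1000)
        for i in range(start, end):
            out[i] = 0
    return out


def mix(a, b):
    """Sum two sample sequences, clipping each result to the 16-bit range.

    The result keeps the length of a; extra samples in b are dropped.
    """
    mixed = array("h", a)
    for i in range(min(len(a), len(b))):
        mixed[i] = max(-32768, min(32767, a[i] + b[i]))
    return mixed


frame_rate = 1000  # toy rate so that 1 ms equals 1 sample
vocals = array("h", [100] * 10)    # stand-in for vocals.wav samples
no_vocals = array("h", [5] * 10)   # stand-in for no_vocals.wav samples

edited = silence_ranges(vocals, frame_rate, [(2, 6)])  # hypothetical vocal range
combined = mix(edited, no_vocals)
print(list(combined))
```

Like pydub's overlay, mix keeps the length of its first argument, so the longer file should go first if the two differ.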
/mac | Aug 28, 2024