v0.12.0 Silero VAD

Silero VAD for Cleaner Linux Voice Dictation

Vocalinux now uses Silero neural voice activity detection (VAD) when ONNX Runtime support is available. It detects speech before transcription, drops silence-only audio buffers, and falls back to amplitude-based detection when the neural backend is not installed.

Drops silence-only buffers

Silero VAD helps Vocalinux ignore chunks that do not contain speech before they reach the recognition engine.
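The sketch below illustrates the filtering step. It assumes each chunk already carries a speech probability between 0 and 1, as Silero's model reports, and keeps only chunks at or above a threshold. The `keep_speech_chunks` name and the 0.5 default are assumptions for illustration, not Vocalinux's own code.

```python
from typing import Iterable, Iterator, Tuple


# Hypothetical helper: each item pairs a raw audio chunk with a speech
# probability in [0, 1], as a neural VAD such as Silero would report.
def keep_speech_chunks(
    scored_chunks: Iterable[Tuple[bytes, float]],
    threshold: float = 0.5,  # assumed default; not taken from Vocalinux
) -> Iterator[bytes]:
    """Yield only chunks whose speech probability reaches the threshold."""
    for chunk, speech_prob in scored_chunks:
        if speech_prob >= threshold:
            yield chunk  # speech: pass it on toward the recognizer
        # otherwise the chunk is treated as silence-only and dropped here


if __name__ == "__main__":
    scored = [(b"\x00" * 4, 0.02), (b"\x01" * 4, 0.91), (b"\x00" * 4, 0.10)]
    kept = list(keep_speech_chunks(scored))
    print(len(kept))  # 1
```

Because this is a generator, chunks are filtered as they arrive. Nothing has to be buffered before the decision is made.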

Cleaner dictation sessions

More accurate speech boundaries reduce empty transcriptions and make it less likely that silence turns into stray text.
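Hangover is one common way to get cleaner boundaries. It is a generic technique and is not confirmed as Vocalinux's approach. A segment stays open for a few chunks after the last speech chunk, so a brief pause between words does not split an utterance in two. The `speech_segments` function and its `hangover` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def speech_segments(flags: Sequence[bool], hangover: int = 2) -> List[Tuple[int, int]]:
    """Group per-chunk speech flags into (start, end) chunk ranges, end exclusive.

    A segment closes only after more than `hangover` consecutive non-speech
    chunks; shorter gaps are absorbed into the surrounding segment.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first speech chunk of the open segment
    last_speech = -1             # index of the most recent speech chunk
    for i, is_speech in enumerate(flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech > hangover:
            # Silence outlasted the hangover: close the segment just after
            # the last speech chunk, so trailing silence is excluded.
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:
        segments.append((start, last_speech + 1))
    return segments


if __name__ == "__main__":
    flags = [False, True, True, False, True, False, False, False, True]
    print(speech_segments(flags))  # [(1, 5), (8, 9)]
```

The single silent chunk at index 3 is bridged. The run of three silent chunks starting at index 5 exceeds the hangover of 2, so it ends the first segment.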

Same sensitivity control

The existing 1-5 VAD sensitivity setting works for both Silero and the amplitude fallback.
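One way a single 1-5 setting could drive both backends is to map each level to a backend-specific threshold. Higher levels lower the threshold, so quieter speech still counts. The numbers below are invented for illustration, and `threshold_for` is a hypothetical helper. Vocalinux's actual mapping is not described in these notes.

```python
# Hypothetical threshold tables: the numbers are illustrative only.
# Silero reports a speech probability (0.0 to 1.0); the amplitude fallback
# compares the RMS level of 16-bit samples against a fixed floor.
SILERO_PROB_THRESHOLDS = {1: 0.80, 2: 0.65, 3: 0.50, 4: 0.35, 5: 0.20}
AMPLITUDE_RMS_THRESHOLDS = {1: 2000, 2: 1200, 3: 700, 4: 400, 5: 200}


def threshold_for(backend: str, sensitivity: int) -> float:
    """Return the detection threshold for a backend at sensitivity 1-5.

    Higher sensitivity means a lower threshold, so quieter speech is kept.
    """
    if sensitivity not in range(1, 6):
        raise ValueError(f"sensitivity must be 1-5, got {sensitivity}")
    tables = {"silero": SILERO_PROB_THRESHOLDS, "amplitude": AMPLITUDE_RMS_THRESHOLDS}
    if backend not in tables:
        raise ValueError(f"unknown backend: {backend!r}")
    return tables[backend][sensitivity]
```

Keeping the user-facing scale separate from the backend thresholds means the setting stays the same whichever backend is active.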

How Speech Detection Fits In

  1. Microphone audio is captured locally.
  2. Vocalinux checks each short chunk for speech activity.
  3. Speech chunks are kept for transcription.
  4. Silence-only chunks are dropped before recognition.
  5. If Silero is unavailable, amplitude-based VAD takes over.
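Step 2 in the list above depends on slicing the captured stream into short, fixed-size chunks. The sketch below assumes 16 kHz mono 16-bit PCM and 512-sample chunks (32 ms). That is the window size current Silero VAD models commonly use at 16 kHz, but Vocalinux's real chunk size may differ. `split_into_chunks` is a hypothetical name.

```python
from typing import List

SAMPLE_RATE = 16_000      # assumed capture rate (Hz)
BYTES_PER_SAMPLE = 2      # assumed 16-bit signed PCM, mono
CHUNK_SAMPLES = 512       # 512 / 16000 = 32 ms per chunk


def split_into_chunks(pcm: bytes, chunk_samples: int = CHUNK_SAMPLES) -> List[bytes]:
    """Split raw PCM into full-size chunks, dropping any trailing partial chunk.

    A real capture loop would carry the remainder over to the next read
    instead of discarding it.
    """
    step = chunk_samples * BYTES_PER_SAMPLE
    return [pcm[i:i + step] for i in range(0, len(pcm) - step + 1, step)]


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 1 s of digital silence
    chunks = split_into_chunks(one_second)
    print(len(chunks), len(chunks[0]))  # 31 1024
```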

Settings and Installation

The official installer attempts to install neural VAD support. If ONNX Runtime is not available, Vocalinux logs the fallback and continues with amplitude-based VAD.

pip install "vocalinux[vad]"

The Recognition tab shows which backend is active. Higher VAD sensitivity values are more responsive to quiet speech for both backends.
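The fallback behavior can be approximated with the standard library's `importlib.util.find_spec`, which reports whether a top-level module can be located without importing it. This is an assumed approach, not Vocalinux's actual code, and `select_vad_backend` is hypothetical. A real check would also need to confirm that the Silero model loads successfully.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def select_vad_backend(neural_module: str = "onnxruntime") -> str:
    """Pick 'silero' when the neural runtime is installed, else 'amplitude'.

    find_spec only locates the module; it does not import it, so a missing
    package returns None here instead of raising ImportError.
    """
    if importlib.util.find_spec(neural_module) is not None:
        return "silero"
    logger.warning("%s not available; falling back to amplitude-based VAD", neural_module)
    return "amplitude"
```

The returned string is the kind of value a settings screen could display as the active backend.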

Local by design

Silero runs on your machine. Voice activity detection does not require a cloud service.

Small CPU-side helper

The VAD step is a lightweight pre-filter before whisper.cpp, Whisper, VOSK, or Remote API recognition.
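For comparison, an amplitude-based check can be approximated with a root-mean-square (RMS) energy test. It is cheap enough to run on every chunk before any recognizer is involved. This is a generic sketch, and the function name and default threshold are assumptions rather than Vocalinux's implementation.

```python
import array
import math


def is_speech_amplitude(chunk: bytes, rms_threshold: float = 700.0) -> bool:
    """Return True if the RMS level of a 16-bit PCM chunk exceeds the threshold.

    Assumes mono, signed 16-bit samples in the machine's native byte order;
    the chunk length must be a multiple of 2 bytes.
    """
    samples = array.array("h", chunk)
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > rms_threshold


if __name__ == "__main__":
    print(is_speech_amplitude(bytes(1024)))  # False: digital silence
```

Unlike a neural model, a pure loudness test cannot tell speech from other loud sounds. That is why it serves as the fallback rather than the default.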

Useful for push-to-talk

Cleaner speech boundaries also help short recordings and stop-on-release flows avoid silence-only output.
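In a push-to-talk flow, the whole recording is available when the key is released. One approach is to trim silence from both ends and skip recognition entirely when no chunk contains speech. The sketch assumes per-chunk speech flags from either backend, and `trim_recording` is a hypothetical helper.

```python
from typing import Optional, Sequence


def trim_recording(chunks: Sequence[bytes], flags: Sequence[bool]) -> Optional[bytes]:
    """Return audio from the first to the last speech chunk, or None if none.

    Returning None lets the caller skip the recognizer instead of sending it
    a silence-only buffer that could come back as stray text.
    """
    if len(chunks) != len(flags):
        raise ValueError("chunks and flags must have the same length")
    speech_idx = [i for i, is_speech in enumerate(flags) if is_speech]
    if not speech_idx:
        return None
    # Pauses between speech chunks are kept; only the edges are trimmed.
    return b"".join(chunks[speech_idx[0]:speech_idx[-1] + 1])
```

A caller would pass the result to the recognizer only when it is not `None`.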
