Speech Engine Comparison

whisper.cpp vs Whisper vs VOSK vs Remote API on Linux

If you are choosing a Linux speech-to-text engine, this page gives a practical side-by-side comparison focused on latency, hardware support, install footprint, privacy boundary, and real desktop usage.

whisper.cpp

Speed: Fastest startup, low latency
Hardware: CPU, plus AMD, Intel, or NVIDIA GPU
Accuracy: High (best overall balance)
Footprint: Small models available (~74 MB tiny)
Best for: Most users who want strong speed and quality

Whisper (OpenAI)

Speed: Slower install and startup
Hardware: CPU or NVIDIA CUDA
Accuracy: High
Footprint: Large dependency footprint (~2.3 GB)
Best for: Users already standardized on a PyTorch stack

VOSK

Speed: Very fast realtime on low-end systems
Hardware: CPU
Accuracy: Good for lightweight use
Footprint: Very lightweight (~40 MB model)
Best for: Older hardware and minimal-resource environments

Remote API

Speed: Depends on server and network latency
Hardware: Client CPU, plus a remote Whisper server
Accuracy: Depends on the remote model
Footprint: No local model required
Best for: Powerful LAN servers or shared transcription backends

Switching Between Engines

You can switch between whisper.cpp, Whisper, VOSK, and Remote API from Settings. From v0.10.1 onward, Vocalinux stops recognition before switching engines to prevent crashes. v0.12.0 adds Remote API configuration under Advanced settings for compatible transcription servers.
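The stop-before-switch behavior described above can be illustrated with a minimal sketch. This is not Vocalinux's actual code: the `EngineSwitcher` class, its methods, and the engine identifiers are hypothetical names chosen for the example. It only shows the general pattern of halting recognition under a lock before swapping the active engine.

```python
import threading


class EngineSwitcher:
    """Hypothetical wrapper that stops recognition before swapping engines."""

    def __init__(self, engine_name):
        self._lock = threading.Lock()
        self.engine_name = engine_name  # illustrative identifier, not a real setting
        self.recognizing = False
        self.events = []  # ordered record of what happened, for illustration

    def start(self):
        with self._lock:
            self.recognizing = True
            self.events.append(f"start:{self.engine_name}")

    def switch(self, new_engine):
        # Hold the lock for the whole operation so recognition cannot be
        # started again while the engine is being swapped.
        with self._lock:
            if self.recognizing:
                self.recognizing = False
                self.events.append(f"stop:{self.engine_name}")
            self.engine_name = new_engine
            self.events.append(f"switch:{new_engine}")


switcher = EngineSwitcher("whisper_cpp")
switcher.start()
switcher.switch("vosk")
print(switcher.events)
```

The point of the design is ordering: the old engine is always stopped before the new one becomes active, so no audio is ever fed to an engine that is being torn down.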

When to pick whisper.cpp

Choose whisper.cpp when you want the best speed-to-accuracy ratio and broad hardware support (CPU plus AMD, Intel, or NVIDIA GPUs), which is why it is the default in Vocalinux.

When to pick Whisper

Choose OpenAI Whisper if your environment already depends on PyTorch/CUDA workflows and you can accept its larger dependency footprint and slower install and startup.

When to pick VOSK

Choose VOSK on older laptops, low-RAM systems, or lightweight VMs where small model size and minimal overhead matter most.

When to pick Remote API

Choose Remote API when a trusted server has stronger hardware, larger models, or a shared Whisper backend. Use local engines when your voice data must stay entirely on-device.
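The four "when to pick" guidelines can be summarized as a small decision helper. This is a rough sketch rather than anything Vocalinux ships: the `Environment` fields, the `pick_engine` function, and the order in which the checks run are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    # All fields are hypothetical inputs for this sketch.
    must_stay_on_device: bool = True
    has_trusted_remote_server: bool = False
    low_resource_machine: bool = False
    uses_pytorch_stack: bool = False


def pick_engine(env: Environment) -> str:
    """Return an engine name following the guidance on this page."""
    # Remote API only applies when audio is allowed to leave the device.
    if env.has_trusted_remote_server and not env.must_stay_on_device:
        return "Remote API"
    # Older laptops, low-RAM systems, lightweight VMs.
    if env.low_resource_machine:
        return "VOSK"
    # Environments already built around PyTorch/CUDA.
    if env.uses_pytorch_stack:
        return "Whisper"
    # Default: best speed-to-accuracy balance.
    return "whisper.cpp"


print(pick_engine(Environment(low_resource_machine=True)))
```

The privacy check comes first on purpose: if voice data must stay on-device, the helper never returns Remote API, regardless of how capable the server is.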

Next steps

  • Install by distro: Ubuntu, Fedora, Arch Linux.
  • Use interactive install to detect your hardware and pick the best engine defaults.
  • After install, tune model size, VAD sensitivity, or Remote API settings for your preferred latency and accuracy level.