Speech Engine Comparison
If you are choosing a Linux speech-to-text engine, this page gives a practical side-by-side comparison focused on latency, hardware support, install footprint, privacy boundary, and real desktop usage.
| Engine | Speed | Hardware | Accuracy | Footprint | Best for |
|---|---|---|---|---|---|
| whisper.cpp | Fastest startup + low latency | CPU + AMD/Intel/NVIDIA GPU | High (best overall balance) | Small models available (~74MB tiny) | Most users who want strong speed + quality |
| Whisper (OpenAI) | Slower install and startup | CPU or NVIDIA CUDA | High | Large dependency footprint (~2.3GB) | Users already standardized on a PyTorch stack |
| VOSK | Very fast realtime on low-end systems | CPU | Good for lightweight use | Very lightweight (~40MB model) | Older hardware and minimal-resource environments |
| Remote API | Depends on server + network latency | Client CPU + remote Whisper server | Depends on remote model | No local model required | Powerful LAN servers or shared transcription backends |
You can switch between whisper.cpp, Whisper, VOSK, and Remote API from Settings. From v0.10.1 onward, Vocalinux stops recognition before switching engines to prevent crashes. v0.12.0 adds Remote API configuration under Advanced settings for compatible transcription servers.
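The stop-before-switch behaviour described above can be pictured with a small sketch. This is an illustration of the general pattern only, not Vocalinux's actual code: the `EngineSwitcher` class, its method names, and the engine identifiers are all hypothetical.

```python
import threading


class EngineSwitcher:
    """Hypothetical wrapper: stop recognition first, then swap the engine."""

    # Assumed identifiers for the four engines named on this page.
    SUPPORTED = ("whisper_cpp", "whisper", "vosk", "remote_api")

    def __init__(self, engine="whisper_cpp"):
        self._lock = threading.Lock()
        self.engine = engine
        self.recognizing = False
        self.events = []  # ordered log of what happened, for inspection

    def start(self):
        with self._lock:
            self.recognizing = True
            self.events.append(f"start:{self.engine}")

    def _stop_locked(self):
        # Caller must already hold the lock.
        if self.recognizing:
            self.recognizing = False
            self.events.append(f"stop:{self.engine}")

    def switch(self, new_engine):
        if new_engine not in self.SUPPORTED:
            raise ValueError(f"unknown engine: {new_engine}")
        with self._lock:
            # Stop any active recognition before the engine is replaced,
            # so nothing is still using the old engine during the swap.
            self._stop_locked()
            self.engine = new_engine
            self.events.append(f"switch:{new_engine}")


switcher = EngineSwitcher()
switcher.start()
switcher.switch("vosk")
print(switcher.events)
```

Holding a single lock across the stop and the swap is one way to keep another thread from starting recognition in between. The sketch leaves recognition stopped after the switch; whether to restart it is a separate design choice.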
Choose whisper.cpp when you want the best speed-to-accuracy ratio and broad hardware support. It is the default engine in Vocalinux.
Choose OpenAI Whisper if your environment already depends on PyTorch/CUDA workflows and you prefer that runtime profile.
Choose VOSK on older laptops, low-RAM systems, or lightweight VMs where small model size and minimal overhead matter most.
Choose Remote API when a trusted server has stronger hardware, larger models, or a shared Whisper backend. Use local engines when your voice data must stay entirely on-device.
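The guidance above can be condensed into a small decision helper. This is a sketch under stated assumptions: the `Constraints` fields, the `recommend_engine` function, and the order in which the rules are checked are all hypothetical, and real choices will often weigh several factors at once.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    """Hypothetical description of a user's situation."""
    must_stay_on_device: bool = True        # voice data may not leave the machine
    has_trusted_remote_server: bool = False
    low_resource_machine: bool = False      # older laptop, low RAM, light VM
    uses_pytorch_cuda: bool = False         # environment already built on PyTorch/CUDA


def recommend_engine(c: Constraints) -> str:
    # Remote API only makes sense when a trusted server exists and
    # audio is allowed to leave the device.
    if c.has_trusted_remote_server and not c.must_stay_on_device:
        return "Remote API"
    # Small model size and minimal overhead matter most here.
    if c.low_resource_machine:
        return "VOSK"
    # Reuse an existing PyTorch/CUDA runtime.
    if c.uses_pytorch_cuda:
        return "Whisper (OpenAI)"
    # Default: best speed-to-accuracy ratio and broad hardware support.
    return "whisper.cpp"


print(recommend_engine(Constraints(low_resource_machine=True)))
```

The privacy check comes first so that an on-device requirement always rules out the remote option, whatever the other fields say.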
Remote setup