Tiny or Base
- Best if you care about low latency over absolute transcription quality.
- Great for older CPUs, battery-conscious laptops, and quick chat replies.
Whisper Model Guide
Picking the right model is the biggest performance decision for Linux voice dictation. Use this guide to choose between tiny, base, small, medium, and large based on RAM, speed, and real-world transcription accuracy in Vocalinux.
| Model | Parameters | RAM (CPU) | RAM (GPU) | Accuracy | Speed | Best For |
|---|---|---|---|---|---|---|
| Tiny | ~39M | ~0.8-1.2GB | ~0.5-0.8GB VRAM | Good | Fastest | Older laptops, lowest latency, quick notes |
| Base | ~74M | ~1.2-1.8GB | ~0.8-1.2GB VRAM | Good+ | Very fast | Balanced real-time dictation on most systems |
| Small | ~244M | ~2-3GB | ~1.5-2.5GB VRAM | High | Fast | Most users wanting stronger punctuation and names |
| Medium | ~769M | ~4-6GB | ~3-5GB VRAM | Very high | Moderate | Accuracy-focused dictation, technical writing |
| Large | ~1.55B | ~8GB+ | ~6-10GB VRAM | Best | Slowest | Maximum quality with strong GPU hardware |
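As a rough illustration of how the table above can drive a choice, the sketch below picks the largest model whose CPU RAM estimate fits in the memory currently available. It is not part of Vocalinux: the names `CPU_RAM_GB`, `read_available_ram_gb`, and `largest_model_that_fits` are hypothetical, the thresholds are simply the upper ends of the table's CPU RAM ranges, and the 1 GB headroom default is an assumption you may want to change.

```python
# Hypothetical helper, not a Vocalinux API.
# Upper end of the CPU RAM ranges from the table above, in GB.
# "Large" is listed as ~8GB+, so 8.0 is treated as a minimum here.
CPU_RAM_GB = [
    ("tiny", 1.2),
    ("base", 1.8),
    ("small", 3.0),
    ("medium", 6.0),
    ("large", 8.0),
]


def read_available_ram_gb(meminfo_path="/proc/meminfo"):
    """Return MemAvailable from a Linux meminfo file in GiB, or None if unavailable."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    kib = int(line.split()[1])  # value is reported in kiB
                    return kib / (1024 * 1024)
    except OSError:
        return None
    return None


def largest_model_that_fits(available_gb, headroom_gb=1.0):
    """Pick the largest model whose RAM estimate fits, leaving headroom for other apps."""
    budget = available_gb - headroom_gb
    choice = None
    for name, needed in CPU_RAM_GB:
        if needed <= budget:
            choice = name
    return choice  # None means even tiny may not fit comfortably


if __name__ == "__main__":
    available = read_available_ram_gb()
    if available is None:
        print("Could not read available RAM; consider starting with small.")
    else:
        print(f"{available:.1f} GB available ->", largest_model_that_fits(available))
```

The result is only a ceiling based on memory. A model that fits in RAM can still be too slow on an older CPU, so speed remains something to check by hand.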
Enable GPU acceleration to run larger models faster, especially medium and large.
GPU acceleration guide
See whisper.cpp vs Whisper vs VOSK before deciding your full dictation stack.
Engine comparison
Use the install guide to get Vocalinux running with the model size that fits your hardware.
Installation guide
Start with small if unsure, then move up or down after a day of use. Vocalinux makes model changes simple, so you can tune for your exact speed/accuracy preference.
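The "start with small, then move up or down" advice can be pictured as stepping along an ordered list of models. The snippet below is a minimal sketch of that idea. The function `adjust_model` and the feedback labels `"too_slow"` and `"too_inaccurate"` are made up for illustration and do not correspond to any Vocalinux setting.

```python
# Hypothetical illustration of the tuning advice; not a Vocalinux API.
MODELS = ["tiny", "base", "small", "medium", "large"]  # smallest to largest


def adjust_model(current, feedback):
    """Move one step down if dictation feels slow, one step up if it is inaccurate."""
    i = MODELS.index(current)
    if feedback == "too_slow":
        i = max(i - 1, 0)  # already at tiny: stay there
    elif feedback == "too_inaccurate":
        i = min(i + 1, len(MODELS) - 1)  # already at large: stay there
    return MODELS[i]  # any other feedback keeps the current model


if __name__ == "__main__":
    model = "small"  # the suggested starting point
    model = adjust_model(model, "too_slow")
    print(model)
```

Moving one step at a time keeps each change easy to judge, since neighbouring models differ less in speed and accuracy than models at opposite ends of the list.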
Install Vocalinux