v0.12.0 Remote Engine

Remote API Speech Recognition for Linux

Vocalinux can offload the heavy transcription step to a trusted HTTP server while keeping desktop capture, voice activity detection, shortcut handling, and text injection on your Linux machine.

Use stronger hardware

Run larger Whisper models on a workstation, mini PC, or server while a lightweight laptop stays responsive.

Open protocol choices

Target OpenAI-compatible transcription services or the native whisper.cpp server endpoint.

Same desktop workflow

The Remote API behaves like any other Vocalinux engine, so toggle mode, push-to-talk, VAD, and text injection work the same way they do with local engines.

Supported Remote API Formats

OpenAI-compatible

Use servers that expose the common Whisper transcription API shape, including Speaches, LocalAI, and other compatible backends.

/v1/audio/transcriptions

whisper.cpp server

Point Vocalinux at the HTTP server bundled with whisper.cpp when you want a small LAN transcription appliance.

/inference
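As a minimal sketch of how the two endpoint paths above relate to a server address, the helper below joins a base URL with the path for the chosen format. The function name, the format keys ("openai", "whisper_cpp"), and the example addresses are hypothetical; only the two paths come from this page.

```python
from urllib.parse import urlsplit, urlunsplit

# Paths named in this section; the dictionary keys are hypothetical labels.
ENDPOINT_PATHS = {
    "openai": "/v1/audio/transcriptions",
    "whisper_cpp": "/inference",
}


def endpoint_url(base_url: str, api_format: str) -> str:
    """Join a server base URL with the path for the chosen format."""
    if api_format not in ENDPOINT_PATHS:
        raise ValueError(f"unknown format: {api_format!r}")
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) base URL, got {base_url!r}")
    # Drop a trailing slash so the joined path has exactly one separator.
    path = parts.path.rstrip("/") + ENDPOINT_PATHS[api_format]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For example, a whisper.cpp server at the hypothetical address http://192.168.1.20:8080 would be addressed as http://192.168.1.20:8080/inference.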

How the Request Flows

1. Vocalinux records microphone audio locally as 16 kHz mono PCM.
2. Voice activity detection decides when an utterance is ready.
3. The utterance is packaged as an in-memory WAV upload.
4. The configured server returns JSON with a text field.
5. Vocalinux injects the transcription into the focused Linux app.
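The packaging and response-handling steps above can be sketched with the Python standard library alone. This is an illustration, not Vocalinux's actual code: the function names are hypothetical, and the 16-bit sample width is an assumption, since this page states only the 16 kHz mono format.

```python
import io
import json
import wave

SAMPLE_RATE = 16_000  # 16 kHz, as stated in the flow above
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # assumption: 16-bit signed samples


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM frames in a WAV container without touching the disk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    # The wave module leaves a caller-supplied buffer open, so it is still readable.
    return buf.getvalue()


def extract_text(response_body: bytes) -> str:
    """Pull the transcription out of a JSON response with a "text" field."""
    payload = json.loads(response_body)
    return payload["text"].strip()
```

Because the WAV is built in a BytesIO buffer, the utterance exists only in memory until it is uploaded.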

Privacy and Security Notes

Audio is not written to disk before upload.
Use HTTPS and an API key whenever the server is outside a trusted local network.
Remote API is not the same as fully offline mode because audio is sent to the server you configure.
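One way to act on the HTTPS and API key note is a small pre-flight check on the configured URL. The sketch below is a hypothetical heuristic, not a Vocalinux feature: it treats loopback and private IP addresses as "local" and everything else, including any DNS name, as public. Whether a given private network is actually trusted remains your call.

```python
import ipaddress
from typing import List, Optional
from urllib.parse import urlsplit


def is_private_host(host: str) -> bool:
    """Heuristic: loopback and private-range IP literals count as local."""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: it cannot be classified without resolving, so assume public.
        return False
    return addr.is_private or addr.is_loopback


def transport_warnings(url: str, api_key: Optional[str]) -> List[str]:
    """Return human-readable warnings for a remote server configuration."""
    parts = urlsplit(url)
    warnings = []
    if not is_private_host(parts.hostname or ""):
        if parts.scheme != "https":
            warnings.append("use HTTPS outside a trusted local network")
        if not api_key:
            warnings.append("set an API key outside a trusted local network")
    return warnings
```

An empty list means the heuristic found nothing to flag; it does not prove the connection is safe.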

Remote API Setup Path

Run a trusted transcription server

Start a whisper.cpp server or an OpenAI-compatible Whisper service on a machine your Linux desktop can reach.

Open Settings -> Advanced

Remote API is a power-user engine. Enable advanced settings, then configure the Remote Server section.

Choose the endpoint format

Select OpenAI-compatible or whisper.cpp so Vocalinux sends the multipart fields your server expects.
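To illustrate why the format choice matters, the sketch below builds a multipart/form-data body for each format. The field sets follow the publicly documented conventions of each API (a model field for OpenAI-compatible servers, a temperature field in the whisper.cpp server example), but the exact fields Vocalinux sends are not listed on this page, so treat the names, the "whisper-1" default, and both helper functions as assumptions.

```python
import uuid
from typing import Dict, Tuple


def form_fields(api_format: str, model: str = "whisper-1") -> Dict[str, str]:
    """Text fields sent alongside the audio file (assumed, not confirmed)."""
    if api_format == "openai":
        return {"model": model, "response_format": "json"}
    if api_format == "whisper_cpp":
        return {"temperature": "0.0", "response_format": "json"}
    raise ValueError(f"unknown format: {api_format!r}")


def encode_multipart(fields: Dict[str, str], wav: bytes) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a WAV upload."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # Both formats carry the audio in a part named "file".
    chunks.append(
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n".encode() + wav + b"\r\n"
    )
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can be passed to any HTTP client as the POST payload and Content-Type.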

Test, then dictate

Treat the connection test as a reachability check, then start dictating with the same shortcut flow you use with local engines.
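A reachability check can be as simple as opening a TCP connection to the server's host and port. How Vocalinux implements its own connection test is not described here, so the sketch below is only an assumption about what such a check might look like; the function name and the default timeout are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    if not parts.hostname:
        return False
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection shows only that something is listening at that address. It does not confirm that the endpoint format, model, or API key are correct, so a short test dictation is still worthwhile.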