Healthcare / private practice · Local Voice · shipped

Replacing a $480/mo cloud transcription vendor with on-device voice

An operator running clinical notes through a per-minute cloud vendor moved the entire workflow on-device in a single deploy session.

Recurring cost: $480/mo → $0
Audio leaving device: none
Time to deploy: 1 session

challenge

A solo operator was transcribing clinical notes through a cloud transcription vendor billed per minute, at roughly $480/month. Because every added minute was billed, growing usage could only push the bill higher.

Worse than the cost was the exposure: sensitive patient audio left the operator's machine and sat with a third-party vendor. Depending on someone else's pricing and uptime also brought renewal anxiety.

approach

We deployed a private voice stack on the operator's own hardware: Whisper for high-accuracy speech-to-text, packaged as a self-contained service with a clean local API.
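
A minimal sketch of such a service, assuming the `openai-whisper` package, FastAPI (plus `python-multipart` for file uploads), and ffmpeg on the PATH. The helper names, the `small.en` default, and the `/transcribe` route are hypothetical illustrations, not the deployed code. Heavy imports sit inside functions so the upload-handling core can be exercised with a stub transcriber.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable

# A transcriber maps a local audio file path to a Whisper-style result dict.
Transcriber = Callable[[str], dict[str, Any]]


def load_whisper(model_name: str = "small.en", **options: Any) -> Transcriber:
    """Load an openai-whisper model once; return a path -> result function.

    `model_name` and `options` are hypothetical defaults; `options` are passed
    straight through to `model.transcribe`.
    """
    import whisper  # openai-whisper; decodes audio by calling ffmpeg

    model = whisper.load_model(model_name)  # fetches weights on first use, then caches
    return lambda path: model.transcribe(path, **options)


def transcribe_bytes(data: bytes, filename: str, transcribe: Transcriber) -> str:
    """Spool uploaded audio to a private temp dir, transcribe it, then delete it."""
    suffix = Path(filename).suffix or ".wav"  # keep the extension as a format hint
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / f"upload{suffix}"
        path.write_bytes(data)
        result = transcribe(str(path))
    # The temp directory (and the audio in it) is gone once the block exits.
    return result["text"].strip()


def create_app(transcribe: Transcriber):
    """Build a FastAPI app with a single local endpoint (route is hypothetical)."""
    from fastapi import FastAPI, File, UploadFile

    app = FastAPI(title="local-voice")

    @app.post("/transcribe")
    def transcribe_upload(file: UploadFile = File(...)) -> dict[str, str]:
        # A plain `def` endpoint runs in FastAPI's threadpool, so the
        # CPU-heavy transcription does not block the event loop.
        text = transcribe_bytes(file.file.read(), file.filename or "", transcribe)
        return {"text": text}

    return app
```

To keep the API reachable only from the laptop itself, bind it to loopback, for example `uvicorn.run(create_app(load_whisper()), host="127.0.0.1", port=8000)`, then post a recording with `curl -F file=@visit.m4a http://127.0.0.1:8000/transcribe` (port and file name are placeholders).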

Model selection was tuned for the operator's accent and vocabulary, and the workflow was wired so a recording becomes a structured note without anything leaving the laptop.
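
The source doesn't say which model was chosen or how it was tuned. A minimal sketch, assuming two common openai-whisper levers: the model choice (for example an English-only `.en` variant) and the `initial_prompt` argument of `model.transcribe`, which nudges decoding toward expected terms. It then turns Whisper's result dict (its `text`, `segments`, and `language` keys) into a note. The `Note` layout, the glossary wording, and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


def transcribe_options(vocabulary: list[str], language: str = "en") -> dict[str, Any]:
    """Keyword arguments for openai-whisper's `model.transcribe` (hypothetical helper).

    `initial_prompt` biases decoding toward the listed terms; the list itself
    stands in for the operator's own vocabulary.
    """
    opts: dict[str, Any] = {"language": language}
    if vocabulary:
        opts["initial_prompt"] = "Glossary: " + ", ".join(vocabulary) + "."
    return opts


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str


@dataclass
class Note:
    source: str
    model: str
    language: str
    created_at: str
    segments: list[Segment] = field(default_factory=list)

    def render(self) -> str:
        """Plain-text note: header lines, then one [mm:ss] line per segment."""
        lines = [
            f"Source: {self.source}",
            f"Model: {self.model}",
            f"Language: {self.language}",
            f"Created: {self.created_at}",
            "",
        ]
        for seg in self.segments:
            minutes, seconds = divmod(int(seg.start), 60)
            lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.text}")
        return "\n".join(lines)


def build_note(result: dict[str, Any], source: str, model: str) -> Note:
    """Convert a Whisper result dict into a Note, dropping empty segments."""
    segments = [
        Segment(float(s["start"]), float(s["end"]), s["text"].strip())
        for s in result.get("segments", [])
        if s["text"].strip()
    ]
    return Note(
        source=source,
        model=model,
        language=result.get("language", "unknown"),
        created_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        segments=segments,
    )
```

In practice the output of `model.transcribe(path, **transcribe_options(terms))` would feed `build_note`, and `note.render()` can be written to a file on the same disk.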

The whole deploy — install, model selection, and a working integration — happened in a single session.

architecture
On-device replacement for a cloud transcription vendor
  Recording ──▶  Whisper (local)  ──▶  Structured note
                                            │
                                            ▼
                                  Stays on operator's machine
  [ no cloud vendor · no per-minute meter · no audio leaving ]

result

The $480/month cloud vendor was eliminated entirely. The replacement runs on hardware the operator already owned.

No audio leaves the machine, removing the third-party privacy exposure and the renewal anxiety.

Transcription is now effectively free at the margin, so the operator uses it more, not less.

stack
Whisper · Python · FastAPI · Docker · Local Voice TTS

lessons learned
  • Local voice is no longer a research project — on commodity hardware it's a same-day deploy.
  • For privacy-sensitive work, 'no audio leaves the machine' is a stronger selling point than raw accuracy.
  • Removing per-minute billing changes behavior: people use the tool more once the meter is gone.

doc. v2026.05.r1
Replacing a $480/mo cloud transcription vendor with on-device voice · Stride Techworks