Voice (TTS & STT)

OpenCrabs supports text-to-speech and speech-to-text. Each can run in one of three modes: Off, API (cloud), or Local (on-device, zero cost).

Quick Setup

Run /onboard:voice in the TUI to configure everything interactively. The wizard handles model downloads, voice previews, and API keys.

Speech-to-Text (STT)

Modes

| Mode | Engine | Cost | Latency | Setup |
|------|--------|------|---------|-------|
| API | Groq Whisper (whisper-large-v3-turbo) | Per-minute pricing | ~1s | API key in keys.toml |
| Local | whisper.cpp (on-device) | Free | ~2-5s | Auto-downloads model |

Local STT Models

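| Model | Size | Quality | Speed |
|-------|------|---------|-------|
| local-tiny | ~75 MB | Good for short messages | Fastest |
| local-base | ~142 MB | Better accuracy | Fast |
| local-small | ~466 MB | High accuracy | Moderate |
| local-medium | ~1.5 GB | Best accuracy | Slower |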

Models auto-download from HuggingFace to ~/.local/share/opencrabs/models/whisper/ on first use.
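To see what has already been downloaded, you can inspect that directory yourself. The following is a minimal Python sketch, not part of OpenCrabs. The file names and layout inside the directory are not documented here, so the hypothetical helper `list_model_files` simply reports whatever files it finds.

```python
from pathlib import Path

# Directory named in the docs above. The layout inside it is an unknown,
# so this sketch makes no assumption about file names.
WHISPER_DIR = Path.home() / ".local/share/opencrabs/models/whisper"


def list_model_files(model_dir: Path) -> list[tuple[str, float]]:
    """Return (relative path, size in MB) for every file under model_dir."""
    if not model_dir.is_dir():
        return []  # nothing downloaded yet, or a different data directory
    files = []
    for path in sorted(model_dir.rglob("*")):
        if path.is_file():
            size_mb = path.stat().st_size / 1_000_000
            files.append((str(path.relative_to(model_dir)), size_mb))
    return files


if __name__ == "__main__":
    for name, size_mb in list_model_files(WHISPER_DIR):
        print(f"{name}: {size_mb:.1f} MB")
```

An empty result most likely means local STT has not been used yet, since the download happens on first use.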

Configuration

# config.toml
[voice]
stt_enabled = true
stt_mode = "local"              # "api" or "local"
local_stt_model = "local-tiny"  # local-tiny, local-base, local-small, local-medium

For API mode:

# keys.toml
[providers.stt.groq]
api_key = "your-groq-key"       # From console.groq.com

Text-to-Speech (TTS)

Modes

| Mode | Engine | Cost | Voices | Setup |
|------|--------|------|--------|-------|
| API | OpenAI TTS (gpt-4o-mini-tts) | Per-character pricing | alloy, echo, fable, onyx, nova, shimmer | API key in keys.toml |
| Local | Piper (on-device) | Free | 6 voices | Auto-downloads model |

Local TTS Voices (Piper)

| Voice | Description |
|-------|-------------|
| ryan | US Male (default) |
| amy | US Female |
| lessac | US Female |
| kristin | US Female |
| joe | US Male |
| cori | UK Female |

Models auto-download from HuggingFace to ~/.local/share/opencrabs/models/piper/. A Python venv is created automatically for the Piper runtime.

Configuration

# config.toml
[voice]
tts_enabled = true
tts_mode = "local"              # "api" or "local"
local_tts_voice = "ryan"        # ryan, amy, lessac, kristin, joe, cori

For API mode:

# config.toml
[voice]
tts_mode = "api"
tts_voice = "echo"              # OpenAI voice name
tts_model = "gpt-4o-mini-tts"   # OpenAI model

# keys.toml
[providers.tts.openai]
api_key = "your-openai-key"

Full Configuration Reference

# config.toml
[voice]
# Speech-to-Text
stt_enabled = true
stt_mode = "local"              # "api" or "local"
local_stt_model = "local-tiny"  # local-tiny, local-base, local-small, local-medium

# Text-to-Speech
tts_enabled = true
tts_mode = "local"              # "api" or "local"
tts_voice = "echo"              # API mode: OpenAI voice
tts_model = "gpt-4o-mini-tts"   # API mode: OpenAI model
local_tts_voice = "ryan"        # Local mode: Piper voice
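
A typo in one of these values is easy to make, so it can help to check the [voice] table before starting OpenCrabs. The sketch below is an illustration only, not OpenCrabs' own validation. The allowed values are copied from the comments in the reference above, and `check_voice_config` and `load_voice_table` are hypothetical helper names. Only keys that are present are checked, because how OpenCrabs treats missing keys is not stated here. The API-mode keys tts_voice and tts_model are left unchecked.

```python
# Allowed values, copied from the comments in the configuration reference.
ALLOWED = {
    "stt_mode": ("api", "local"),
    "tts_mode": ("api", "local"),
    "local_stt_model": ("local-tiny", "local-base", "local-small", "local-medium"),
    "local_tts_voice": ("ryan", "amy", "lessac", "kristin", "joe", "cori"),
}


def check_voice_config(voice: dict) -> list[str]:
    """Return one message per key whose value is not in the documented set."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in voice and voice[key] not in allowed:
            problems.append(f"{key}: {voice[key]!r} is not one of {list(allowed)}")
    return problems


def load_voice_table(path: str) -> dict:
    """Read the [voice] table from a config.toml file (needs Python 3.11+)."""
    import tomllib  # imported here so the checker itself works on older Pythons

    with open(path, "rb") as f:
        return tomllib.load(f).get("voice", {})
```

For example, `check_voice_config(load_voice_table("config.toml"))` returns an empty list when every checked key holds a documented value.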

How Voice Messages Work

When a voice message arrives on Telegram, WhatsApp, Discord, or Slack:

  1. Audio is decoded (OGG/Opus or WAV)
  2. Transcribed via STT (local whisper.cpp or Groq API)
  3. Agent processes the text and generates a response
  4. Response is converted to speech via TTS (local Piper or OpenAI API)
  5. Audio is encoded as OGG/Opus and sent back as a voice message

Local mode handles everything on-device: no API calls, no cost, and no data leaves your machine.
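
The five steps above can be read as a chain of functions. The sketch below shows only that shape, under the assumption that each stage is a plain function. It is not OpenCrabs' actual code, and every name in it (`handle_voice_message` and the five stage parameters) is hypothetical.

```python
from typing import Callable


def handle_voice_message(
    audio_in: bytes,
    decode: Callable[[bytes], bytes],    # step 1: OGG/Opus or WAV -> raw audio
    transcribe: Callable[[bytes], str],  # step 2: STT (local or API)
    respond: Callable[[str], str],       # step 3: agent produces a reply
    synthesize: Callable[[str], bytes],  # step 4: TTS (local or API)
    encode: Callable[[bytes], bytes],    # step 5: raw audio -> OGG/Opus
) -> bytes:
    """Run one voice message through the five stages in order."""
    raw_audio = decode(audio_in)
    text = transcribe(raw_audio)
    reply = respond(text)
    speech = synthesize(reply)
    return encode(speech)
```

Passing the stages in as arguments mirrors the mode switch: choosing "local" or "api" would swap the `transcribe` and `synthesize` functions while the rest of the chain stays the same.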

Hardware Requirements

| Feature | CPU Requirement | Notes |
|---------|-----------------|-------|
| Local STT (rwhisper) | AVX2 (Haswell 2013+) | Metal GPU on macOS Apple Silicon |
| Local TTS (Piper) | No restrictions | Tested on a 2007 iMac; works on any x86/ARM |
| Local embeddings | AVX (Sandy Bridge 2011+) | Falls back to FTS-only search |

OpenCrabs detects CPU capabilities at runtime and hides unavailable options in the onboarding wizard. Local TTS (Piper) has no CPU limitations and should work on virtually any machine.
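
To check ahead of time which local options a machine is likely to qualify for, you can read the CPU flags yourself. The sketch below is one way to do that on x86 Linux, where /proc/cpuinfo lists a "flags" line. It is not how OpenCrabs performs its own detection, it does not cover macOS or ARM, and the function names are hypothetical. The requirements are taken from the table above.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect feature flags from /proc/cpuinfo-style text (x86 Linux)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line, e.g. on ARM or with empty input


def local_options(flags: set[str]) -> dict[str, bool]:
    """Map the documented CPU requirements onto a set of flags."""
    return {
        "local_stt": "avx2" in flags,        # table: AVX2 required
        "local_embeddings": "avx" in flags,  # table: AVX required
        "local_tts": True,                   # table: no CPU restrictions
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(local_options(cpu_flags(cpuinfo.read_text())))
```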

Building Without Voice

Voice features are enabled by default. To build without them (smaller binary):

cargo build --release --no-default-features --features telegram,whatsapp,discord,slack,trello

Voice feature flags: local-stt (whisper.cpp) and local-tts (Piper).