Voice (TTS & STT)

OpenCrabs supports text-to-speech (TTS) and speech-to-text (STT) with five provider tiers: Off, API (Groq for STT, OpenAI for TTS), OpenAI-compatible (any /v1/audio endpoint), Voicebox (self-hosted), or Local (on-device, zero cost).

Quick Setup

Run /onboard:voice in the TUI to configure everything interactively. The voice screen has radio selectors for both STT and TTS and shows only the fields that apply to the selected provider. API keys you enter are saved to keys.toml automatically.

Speech-to-Text (STT)

Providers

| Provider | Engine | Cost | Latency | Setup |
|---|---|---|---|---|
| Groq | Whisper (whisper-large-v3-turbo) | Per-minute pricing | ~1s | API key in keys.toml |
| OpenAI-compatible | Any Whisper-compatible endpoint | Varies | ~1-3s | stt_base_url + stt_model + API key |
| Voicebox | Self-hosted, open source | Free | ~2-5s | voicebox_stt_enabled = true + voicebox_stt_base_url |
| Local | whisper.cpp (on-device) | Free | ~2-5s | Model auto-downloads |

Local STT Models

| Model | Size | Quality | Speed |
|---|---|---|---|
| local-tiny | ~75 MB | Good for short messages | Fastest |
| local-base | ~142 MB | Better accuracy | Fast |
| local-small | ~466 MB | High accuracy | Moderate |
| local-medium | ~1.5 GB | Best accuracy | Slower |

Models auto-download from Hugging Face to ~/.local/share/opencrabs/models/whisper/ on first use.
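The table trades download size against accuracy. As a minimal sketch of that trade-off, the Rust below encodes the approximate sizes from the table and picks the most accurate model that fits a disk budget, and it builds the model directory documented above. The helper names (`largest_model_within`, `whisper_model_dir`) and the budget idea are hypothetical, not part of OpenCrabs, and ~1.5 GB is approximated as 1500 MB.

```rust
use std::path::{Path, PathBuf};

/// Approximate download sizes from the Local STT Models table, in megabytes,
/// ordered from fastest/smallest to most accurate/largest.
const LOCAL_STT_MODELS: [(&str, u64); 4] = [
    ("local-tiny", 75),
    ("local-base", 142),
    ("local-small", 466),
    ("local-medium", 1500), // ~1.5 GB, approximated as 1500 MB
];

/// Directory this page documents for downloaded Whisper models, under `home`.
fn whisper_model_dir(home: &Path) -> PathBuf {
    home.join(".local/share/opencrabs/models/whisper")
}

/// Hypothetical helper: the most accurate model whose approximate size fits
/// within `budget_mb`, or None if even local-tiny does not fit.
fn largest_model_within(budget_mb: u64) -> Option<&'static str> {
    LOCAL_STT_MODELS
        .iter()
        .rev() // walk from largest to smallest
        .find(|(_, size)| *size <= budget_mb)
        .map(|(name, _)| *name)
}

fn main() {
    let home = std::env::var("HOME").unwrap_or_else(|_| "/home/user".to_string());
    println!("models go to {}", whisper_model_dir(Path::new(&home)).display());
    for budget in [50, 200, 2000] {
        println!("{budget} MB budget -> {:?}", largest_model_within(budget));
    }
}
```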

Configuration

# config.toml
[voice]
stt_enabled = true
stt_mode = "local"              # "api" or "local"
local_stt_model = "local-tiny"  # local-tiny, local-base, local-small, local-medium

For API mode (Groq), add your key to keys.toml:

# keys.toml
[providers.stt.groq]
api_key = "your-groq-key"       # From console.groq.com

Text-to-Speech (TTS)

Providers

| Provider | Engine | Cost | Voices | Setup |
|---|---|---|---|---|
| OpenAI | gpt-4o-mini-tts | Per-character pricing | alloy, echo, fable, onyx, nova, shimmer | API key in keys.toml |
| OpenAI-compatible | Any /v1/audio/speech endpoint | Varies | Varies by server | tts_base_url + tts_model + tts_voice + API key |
| Voicebox | Self-hosted, async POST /generate | Free | Configurable profiles | voicebox_tts_enabled = true + voicebox_tts_base_url + voicebox_tts_profile_id |
| Local | Piper (on-device) | Free | 6 voices | Model auto-downloads |

Local TTS Voices (Piper)

| Voice | Description |
|---|---|
| ryan | US Male (default) |
| amy | US Female |
| lessac | US Female |
| kristin | US Female |
| joe | US Male |
| cori | UK Female |

Models auto-download from Hugging Face to ~/.local/share/opencrabs/models/piper/. A Python venv is created automatically for the Piper runtime.
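A minimal sketch of how a `local_tts_voice` value could be checked against the six voices above, falling back to the default `ryan`. The trimming, lowercasing, and silent fallback are assumptions for illustration; OpenCrabs may handle an unknown voice name differently (for example, by rejecting it).

```rust
/// Piper voices from the Local TTS Voices table: (config name, description).
const PIPER_VOICES: [(&str, &str); 6] = [
    ("ryan", "US Male"),
    ("amy", "US Female"),
    ("lessac", "US Female"),
    ("kristin", "US Female"),
    ("joe", "US Male"),
    ("cori", "UK Female"),
];

/// The table marks ryan as the default voice.
const DEFAULT_VOICE: &str = "ryan";

/// Hypothetical helper: normalizes a `local_tts_voice` value (trim + lowercase)
/// and falls back to the default when the setting is missing or unknown.
fn resolve_voice(requested: Option<&str>) -> &'static str {
    let wanted = requested.map(|v| v.trim().to_ascii_lowercase());
    PIPER_VOICES
        .iter()
        .map(|(name, _)| *name)
        .find(|name| wanted.as_deref() == Some(*name))
        .unwrap_or(DEFAULT_VOICE)
}

fn main() {
    for (name, description) in PIPER_VOICES {
        println!("{name}: {description}");
    }
    println!("{}", resolve_voice(Some(" Amy ")));
    println!("{}", resolve_voice(Some("alloy")));
}
```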

Configuration

# config.toml
[voice]
tts_enabled = true
tts_mode = "local"              # "api" or "local"
local_tts_voice = "ryan"        # ryan, amy, lessac, kristin, joe, cori

For API mode (OpenAI), set the voice and model in config.toml and the key in keys.toml:

# config.toml
[voice]
tts_mode = "api"
tts_voice = "echo"              # OpenAI voice name
tts_model = "gpt-4o-mini-tts"   # OpenAI model

# keys.toml
[providers.tts.openai]
api_key = "your-openai-key"

Full Configuration Reference

# config.toml
[voice]
# Speech-to-Text
stt_enabled = true
stt_mode = "groq"                 # "groq", "openai_compatible", "voicebox", "local"
local_stt_model = "local-tiny"    # local-tiny, local-base, local-small, local-medium
stt_base_url = "https://..."      # OpenAI-compatible STT endpoint
stt_model = "whisper-1"           # OpenAI-compatible STT model
voicebox_stt_enabled = false
voicebox_stt_base_url = "https://..."

# Text-to-Speech
tts_enabled = true
tts_mode = "openai"               # "openai", "openai_compatible", "voicebox", "local"
tts_voice = "echo"                # Voice name (OpenAI, or one your OpenAI-compatible server supports)
tts_model = "gpt-4o-mini-tts"     # TTS model (for an OpenAI-compatible server, e.g. "tts-1")
local_tts_voice = "ryan"          # Local mode: Piper voice
tts_base_url = "https://..."      # OpenAI-compatible TTS endpoint
voicebox_tts_enabled = false
voicebox_tts_base_url = "https://..."
voicebox_tts_profile_id = "profile-id"

# keys.toml
[providers.stt.groq]
api_key = "your-groq-key"

[providers.stt.openai_compatible]
api_key = "your-api-key"

[providers.tts.openai]
api_key = "your-openai-key"

[providers.tts.openai_compatible]
api_key = "your-api-key"
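The Setup columns of the provider tables imply which settings each mode needs. A minimal Rust sketch, covering only the four `stt_mode` values in this reference (the quick-start `"api"` value is not modeled), that reports which required settings are missing. The enum, the functions, and the `"api_key"` placeholder name are hypothetical; TTS modes would follow the same pattern.

```rust
/// STT providers, using the four `stt_mode` values from the full reference.
#[derive(Debug, PartialEq)]
enum SttMode {
    Groq,
    OpenAiCompatible,
    Voicebox,
    Local,
}

fn parse_stt_mode(value: &str) -> Result<SttMode, String> {
    match value {
        "groq" => Ok(SttMode::Groq),
        "openai_compatible" => Ok(SttMode::OpenAiCompatible),
        "voicebox" => Ok(SttMode::Voicebox),
        "local" => Ok(SttMode::Local),
        other => Err(format!("unknown stt_mode: {other}")),
    }
}

/// Settings each mode needs, taken from the "Setup" column of the STT table.
/// "api_key" stands for the provider's entry in keys.toml.
fn required_settings(mode: &SttMode) -> &'static [&'static str] {
    match mode {
        SttMode::Groq => &["api_key"],
        SttMode::OpenAiCompatible => &["stt_base_url", "stt_model", "api_key"],
        SttMode::Voicebox => &["voicebox_stt_enabled", "voicebox_stt_base_url"],
        SttMode::Local => &[], // the model downloads on first use
    }
}

/// Returns required settings absent from `present` (presence only; values
/// such as voicebox_stt_enabled = true are not checked).
fn missing_settings(mode: &SttMode, present: &[&str]) -> Vec<&'static str> {
    required_settings(mode)
        .iter()
        .copied()
        .filter(|key| !present.contains(key))
        .collect()
}

fn main() {
    let mode = parse_stt_mode("openai_compatible").expect("valid mode");
    println!("{mode:?} is missing {:?}", missing_settings(&mode, &["stt_base_url"]));
}
```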

How Voice Messages Work

When a voice message arrives on Telegram, WhatsApp, Discord, or Slack:

  1. Audio is decoded (OGG/Opus or WAV)
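  2. Transcribed via the configured STT provider (local whisper.cpp, Groq, an OpenAI-compatible endpoint, or Voicebox)
  3. Agent processes the text and generates a response
  4. Response is converted to speech via the configured TTS provider (local Piper, OpenAI, an OpenAI-compatible endpoint, or Voicebox)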
  5. Audio is encoded as OGG/Opus and sent back as a voice message

In local mode, transcription and synthesis run entirely on-device: no API calls, no per-use cost, and no audio is sent to a third-party speech service.
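The five steps can be sketched as a single function. This is a hypothetical outline, not OpenCrabs' internal API: the `SpeechToText` and `TextToSpeech` traits, the stubs, and the representation of audio as `i16` PCM samples are assumptions chosen so the example runs without audio codecs or network access.

```rust
/// Hypothetical STT interface; a real backend would wrap whisper.cpp, Groq,
/// an OpenAI-compatible endpoint, or Voicebox.
trait SpeechToText {
    fn transcribe(&self, pcm: &[i16]) -> Result<String, String>;
}

/// Hypothetical TTS interface; a real backend would wrap Piper, OpenAI,
/// an OpenAI-compatible endpoint, or Voicebox.
trait TextToSpeech {
    fn synthesize(&self, text: &str) -> Result<Vec<i16>, String>;
}

/// The five steps, with the codecs and the agent passed in as functions.
fn handle_voice_message(
    incoming: &[u8],
    decode: impl Fn(&[u8]) -> Result<Vec<i16>, String>,
    stt: &dyn SpeechToText,
    agent: impl Fn(&str) -> String,
    tts: &dyn TextToSpeech,
    encode: impl Fn(&[i16]) -> Result<Vec<u8>, String>,
) -> Result<Vec<u8>, String> {
    let pcm = decode(incoming)?;          // 1. OGG/Opus or WAV -> PCM samples
    let text = stt.transcribe(&pcm)?;     // 2. speech -> text
    let reply = agent(text.as_str());     // 3. agent produces a text reply
    let speech = tts.synthesize(&reply)?; // 4. text -> speech samples
    encode(speech.as_slice())             // 5. PCM -> OGG/Opus voice message
}

// Stubs so the sketch runs without audio libraries or network access.

struct CountingStt;
impl SpeechToText for CountingStt {
    fn transcribe(&self, pcm: &[i16]) -> Result<String, String> {
        if pcm.is_empty() {
            return Err("no audio".to_string());
        }
        Ok(format!("{} samples heard", pcm.len()))
    }
}

struct SilentTts;
impl TextToSpeech for SilentTts {
    fn synthesize(&self, text: &str) -> Result<Vec<i16>, String> {
        Ok(vec![0; text.len()]) // one silent sample per byte of text
    }
}

fn decode_stub(bytes: &[u8]) -> Result<Vec<i16>, String> {
    Ok(bytes.iter().map(|&b| i16::from(b)).collect())
}

fn encode_stub(pcm: &[i16]) -> Result<Vec<u8>, String> {
    Ok(pcm.iter().flat_map(|s| s.to_le_bytes()).collect())
}

fn agent_stub(text: &str) -> String {
    format!("You said: {text}")
}

fn main() {
    let reply = handle_voice_message(&[1, 2, 3], decode_stub, &CountingStt, agent_stub, &SilentTts, encode_stub);
    println!("{:?}", reply.map(|bytes| bytes.len()));
}
```

Passing each stage in as a trait object or function keeps the pipeline independent of which provider tier is configured.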

Hardware Requirements

| Feature | CPU requirement | Notes |
|---|---|---|
| Local STT (rwhisper) | AVX2 (Haswell, 2013+) | Uses the Metal GPU on Apple Silicon Macs |
| Local TTS (Piper) | No restrictions | Tested on a 2007 iMac; works on any x86 or ARM CPU |
| Local embeddings | AVX (Sandy Bridge, 2011+) | Without AVX, falls back to full-text search (FTS) only |

OpenCrabs detects CPU capabilities at runtime and hides unavailable options in the onboarding wizard. Local TTS (Piper) has no CPU limitations and should work on virtually any machine.
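Runtime CPU detection of this kind can be done in Rust with the standard library's `is_x86_feature_detected!` macro. The sketch below maps AVX/AVX2 support to the options in the Hardware Requirements table; the option names and the treatment of non-x86 CPUs are assumptions, and OpenCrabs' own detection may check more than these two flags.

```rust
/// Onboarding options allowed by the Hardware Requirements table, given the
/// CPU flags. Hypothetical helper; option names are illustrative.
fn available_options(has_avx: bool, has_avx2: bool) -> Vec<&'static str> {
    let mut options = vec!["local-tts"]; // Piper: no CPU restrictions
    if has_avx2 {
        options.push("local-stt"); // local Whisper needs AVX2 on x86
    }
    if has_avx {
        options.push("local-embeddings"); // otherwise FTS-only search
    }
    options
}

/// Runtime detection on x86/x86_64 via the standard library macro.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detect_options() -> Vec<&'static str> {
    available_options(
        is_x86_feature_detected!("avx"),
        is_x86_feature_detected!("avx2"),
    )
}

/// Assumption: the AVX/AVX2 rows only apply to x86, so other architectures
/// (e.g. ARM) are treated as meeting them.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detect_options() -> Vec<&'static str> {
    available_options(true, true)
}

fn main() {
    println!("options shown in the wizard: {:?}", detect_options());
}
```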

Building Without Voice

Voice features are enabled by default. To build a smaller binary without them, disable the default features and list only the ones you need:

cargo build --release --no-default-features --features telegram,whatsapp,discord,slack,trello

The voice feature flags are local-stt (whisper.cpp) and local-tts (Piper); add either one to --features to include it.
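Cargo features are compile-time switches: each enabled feature becomes a `feature = "..."` cfg flag, and code behind a disabled flag is left out of the binary, which is why the build above is smaller. A minimal sketch, assuming the voice code is gated on features named `local-stt` and `local-tts` (how OpenCrabs actually gates it is not shown on this page). Compiled on its own with rustc, no features are set, so it reports that no local voice backends are built in.

```rust
/// Lists the local voice backends compiled into this binary.
/// Assumption: gated on the `local-stt` and `local-tts` Cargo features.
fn compiled_local_voice_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    if cfg!(feature = "local-stt") {
        backends.push("local-stt (whisper.cpp)");
    }
    if cfg!(feature = "local-tts") {
        backends.push("local-tts (Piper)");
    }
    backends
}

fn main() {
    let backends = compiled_local_voice_backends();
    if backends.is_empty() {
        println!("built without local voice features");
    } else {
        println!("local voice backends: {}", backends.join(", "));
    }
}
```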