Introduction

OpenCrabs is a self-hosted, provider-agnostic AI orchestration agent that runs as a single Rust binary. It automates your terminal, browser, channels (Telegram/Discord/Slack/WhatsApp/Trello), and codebase — all while respecting your privacy and keeping you in control.

What Makes OpenCrabs Different

🔄 Provider-Agnostic with Native CLI Integration

  • 14 built-in providers + Custom OpenAI Compatible: Anthropic Claude, OpenAI, Gemini, OpenRouter, Qwen (DashScope), MiniMax, Ollama, z.ai GLM, GitHub Copilot, Codex, Codex CLI, OpenCode, OpenCode CLI
  • Claude Code CLI, OpenCode CLI, Codex CLI & Codex OAuth integrated as native providers — use their models without API keys
  • Codex (OAuth) — native OpenAI Codex subscription auth via device-code PKCE flow. No CLI dependency, no API key. Authenticate through browser once; tokens stored with automatic refresh (v0.3.19)
  • Ollama as native provider — run any local model via Ollama API without custom provider setup (v0.3.15)
  • Custom OpenAI-compatible backends now stream thinking tokens, tool calls, and intermediate text exactly like native providers (v0.3.2)
  • Sticky fallback chain — auto-failover to secondary providers on rate limits or errors
  • Health-aware sticky fallback persistence — fallback state survives restarts, checks provider health on creation and advances if primary has 2+ consecutive failures (v0.3.17)
  • OpenRouter response caching — zero cost for identical requests (v0.3.17)
  • TCP keepalive on all HTTP clients — 15s keepalive detects silent TCP drops in ~15-45s instead of waiting for 300s idle timeout (v0.3.17)
  • Prompt caching across Anthropic, OpenRouter, Gemini, Qwen DashScope — reduces costs up to 95% (v0.3.2)
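
The sticky fallback behaviour in the list above can be illustrated with a small sketch. This is not OpenCrabs' implementation: the class name `StickyFallbackChain`, its methods, and the in-memory failure counter are hypothetical. Only the rule "advance once the primary has 2+ consecutive failures" comes from the list.

```python
from dataclasses import dataclass, field

# Assumption: the threshold mirrors the "2+ consecutive failures" rule above.
FAILURE_THRESHOLD = 2


@dataclass
class StickyFallbackChain:
    """Hypothetical sketch: stay on one provider until it fails repeatedly."""
    providers: list[str]
    active: int = 0  # index of the provider currently in use
    failures: dict[str, int] = field(default_factory=dict)

    def current(self) -> str:
        return self.providers[self.active]

    def record_success(self) -> None:
        # A success resets the consecutive-failure count of the active provider.
        self.failures[self.current()] = 0

    def record_failure(self) -> str:
        # Count the failure; move on (and stay there) once the threshold is hit.
        name = self.current()
        self.failures[name] = self.failures.get(name, 0) + 1
        has_next = self.active < len(self.providers) - 1
        if self.failures[name] >= FAILURE_THRESHOLD and has_next:
            self.active += 1
        return self.current()


chain = StickyFallbackChain(["anthropic", "openrouter", "ollama"])
chain.record_failure()  # first failure: still on the primary
chain.record_failure()  # second consecutive failure: advance to the next provider
print(chain.current())
```

Because `active` is never reset on success, the chain stays on the fallback once it has switched, which is the "sticky" part. Persisting `active` and `failures` across restarts is left out of the sketch.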

🤖 Multi-Agent Orchestration

  • Sessions are fully isolated agents — each session is an independent agent with its own brain, provider, model, working directory, and history. Zero context contamination between concurrent sessions, guaranteed by Rust’s async runtime and type system
  • Typed sub-agents: general, explore, plan, code, research — each with tailored tool access
  • Team orchestration: team_create, team_broadcast, team_delete for coordinated workflows
  • Spawn/wait/resume sub-agents with A2A protocol support
  • ALWAYS_EXCLUDED tools per agent type for safety boundaries

🌐 Channel-Native Communication

  • Telegram, Discord, Slack, WhatsApp, Trello — respond to messages, send files, manage threads
  • Cross-channel crash recovery — pending requests route back to originating channel on restart (v0.2.93)
  • DB-persisted channel sessions — state survives restarts
  • Voice support — local Whisper STT + Piper TTS, fully offline. Voicebox with STT/TTS fallback chains, 2s liveness probe, librosa error translator, per-provider fallback chains in config.toml (v0.3.28)
  • Cross-channel stable session suffixes — Discord, Slack, WhatsApp all use [chat:<id>] suffix pattern for reliable session resolution, shared channels::session_resolve module with suffix-first lookup + legacy forward-migration (v0.3.29)
  • Follow-up message = ESC x2 cancel — sending a message during an active agent run cancels the current run and starts fresh, across all channels (v0.3.30)
  • ZIP attachment handling — extracts and processes ZIP files inline (text inlined, images get vision markers, PDFs extracted, 50 files/10MB cap) (v0.3.30)
  • Topic-aware channel_search: topic_id filter for Telegram forum supergroups (v0.3.30, #127)
  • Video uploads across all channels — Slack, Telegram, Discord, WhatsApp, and Trello automatically route video attachments to analyze_video when vision is enabled (v0.3.17)
  • Telegram 20 MB Bot API cap — surfaces “compress to under 20 MB and resend” message instead of silently failing on large uploads (v0.3.17)
  • Telegram dropped video/animation — now downloads and routes through vision processing, including iPhone .mov uploads auto-converted to MP4-backed Animation (v0.3.17)
  • Slack intermediate-vs-final dedup race closed — captures and awaits all IntermediateText JoinHandles before dedup check, with post-completion sweep for late entries (v0.3.17)
  • Clean display text — all channels persist clean text to DB and TUI instead of LLM metadata brackets (v0.3.17)
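
As a rough illustration of the ZIP handling described above, the sketch below walks an archive and stops at the stated caps. It is a hedged example, not the shipped code: the function name `summarize_zip`, the image extension set, the `[PDF: ...]` placeholder, and reading "10MB" as 10 MiB of uncompressed data are all assumptions.

```python
import io
import zipfile

MAX_FILES = 50                       # file cap from the list above
MAX_TOTAL_BYTES = 10 * 1024 * 1024   # assumption: "10MB" read as 10 MiB uncompressed

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}  # hypothetical set


def summarize_zip(data: bytes) -> list[str]:
    """Return one chat-ready entry per archived file, stopping at the caps."""
    out: list[str] = []
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if len(out) >= MAX_FILES or total + info.file_size > MAX_TOTAL_BYTES:
                out.append("[archive truncated: cap reached]")
                break
            total += info.file_size
            name = info.filename
            ext = "." + name.rsplit(".", 1)[-1].lower() if "." in name else ""
            if ext in IMAGE_EXTS:
                out.append(f"<<IMG:{name}>>")    # vision marker for images
            elif ext == ".pdf":
                out.append(f"[PDF: {name}]")     # placeholder; text extraction omitted
            else:
                text = zf.read(info).decode("utf-8", errors="replace")
                out.append(f"--- {name} ---\n{text}")  # inline the text content
    return out
```

Checking `file_size` from the archive directory before reading each entry keeps an oversized member from being decompressed at all.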

🧠 Self-Healing & Self-Improvement (v0.3.7)

  • Recursive Self-Improvement (RSI) — agent analyzes its own performance, identifies patterns, and autonomously rewrites brain files (v0.3.6)
  • Feedback ledger — persistent SQLite table recording every tool success/failure, user correction, provider error (v0.3.6)
  • Phantom tool call detection — catches when the model narrates file changes in prose without executing tools (v0.3.7)
  • Expanded phantom detection — catches “Now <file-op gerund>” phantoms (creating/writing/editing) and build/deploy intent + past-tense completion claims. Gaslighting and phantom detectors extracted into their own module (v0.3.17)
  • RSI escalation for repeat violations — violation counter bumps on existing rules instead of deduping away. Rules that keep getting broken get louder, not silenced (v0.3.17)
  • RSI feedback records actual model used — when helpers remap a mismatched model to the provider default, RSI now records the resolved model instead of the impossible original pair (v0.3.19)
  • Append-only brain files — brain files (SOUL.md, TOOLS.md, etc.) are now append-only with backup-before-write to prevent data loss (v0.3.13)
  • Upstream template sync — automatically syncs brain file templates from the repo with version gating and append-only diffs (v0.3.15)
  • RSI alert suppression — suppresses alerts whose dimension already has a fix commit, stale alerts age out (v0.3.13)
  • Partial JSON repair — closes unterminated strings, balances brackets, strips trailing commas. Wired into 5 drop sites across OpenAI-compatible providers (v0.3.17)
  • Context budget management: 65% soft / 90% hard compaction thresholds with 3-retry LLM fallback. Real-time ctx counter uses provider-reported input_tokens verbatim (v0.3.28, calibration system removed)
  • Stuck stream detection: 2048-byte rolling window catches repeating patterns, auto-recover
  • Gaslighting defense: strips tool-refusal preambles mid-turn across 4+ phrase families
  • Auto-fallback on rate limits — saves state mid-stream, resumes on fallback provider
  • RetryAttempt progress event — TUI shows “⏳ Retry 2/3 — stream dropped” so you see transient recovery in progress (v0.3.17)
  • Mid-stream decode retry — 3x backoff before provider fallback (v0.3.0)
  • Non-streaming compatibility — synthesizes full stream events from non-streaming JSON (v0.3.7)
  • Per-session message queue isolation — prevents cross-session message bleeding in split panes and channels (v0.3.13)
  • Tool loop reasoning markers persisted — reasoning content survives across tool loop iterations (v0.3.19)
  • @ file picker fixed for large repos — skips .git/.hg/.svn directories, raised result cap to 20k (v0.3.19)
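
The partial JSON repair item above names three operations: closing unterminated strings, balancing brackets, and stripping trailing commas. The following is a minimal sketch of those three steps, assuming truncated tool-call arguments as input. The function names are hypothetical and the real repair logic may handle more cases.

```python
import json


def _strip_trailing_comma(out: list[str]) -> None:
    # Drop whitespace and then a single comma sitting at the end of the buffer.
    while out and out[-1].isspace():
        out.pop()
    if out and out[-1] == ",":
        out.pop()


def repair_partial_json(text: str) -> str:
    """Best-effort repair: close strings, drop trailing commas, balance brackets."""
    out: list[str] = []
    stack: list[str] = []  # closers still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            _strip_trailing_comma(out)  # handles "[1, 2, ]"
            if stack and stack[-1] == ch:
                stack.pop()
        out.append(ch)
    if in_string:
        if escaped:
            out.pop()        # drop a dangling backslash
        out.append('"')      # close the unterminated string
    _strip_trailing_comma(out)
    out.extend(reversed(stack))  # close whatever is still open
    return "".join(out)


broken = '{"path": "src/main.rs", "lines": [1, 2,'
print(json.loads(repair_partial_json(broken)))
```

The sketch does not try to invent missing values, so input cut off right after an object key still fails to parse. A caller would treat that as unrecoverable.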

🖥️ Terminal UI Excellence (v0.3.2)

  • Real-time tok/s throughput meter — footer displays live tokens-per-second during streaming (between model info and approval policy pill), counts only active streaming time, persists last rate during idle (v0.3.30)
  • Dynamic plan widget — hides tasks that don’t fit terminal height instead of overflowing (v0.3.30)
  • Per-pane error & notification banners — dedicated error/notification display per TUI pane for better visibility (v0.3.20)
  • Scroll fixes — removed load_more_history() from scroll handler (was causing scroll-up to overshoot hundreds of pages), preserved scroll position during streaming and system messages, skip scroll compensation on first render (v0.3.25)
  • Auto-title fires at end of first turn — works across all channels, not just TUI (v0.3.25)
  • Header card overlay replaces splash screen — animated, responsive, vanishes after load
  • Select/Drag to Copy — native mouse selection in TUI, auto-copies to clipboard on release
  • O(N) input render — tall pastes no longer cause quadratic render cost; scroll-to-cursor preserved
  • Emoji cursor rendering — grapheme cluster extraction for multi-byte emoji highlighting
  • Line navigation in multiline — Up/Down navigates lines inside recalled multi-line input
  • F12 mouse capture toggle — toggle native terminal text selection without exiting TUI
  • Async session load — instant first paint, messages load in background
  • Video attachments in TUI — pasting a video path emits <<VID:path>>, top-right indicator labels each as Video #N, chat display rewrites to [VID: clip.mp4] (v0.3.17)
  • Thinking content persisted to DB — captured on both ResponseComplete and IntermediateText events (v0.3.17)
  • Approval policy read at runtime — loaded from config on every tool request instead of cached at startup (v0.3.17)
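
The throughput meter above counts only active streaming time and keeps showing the last rate while idle. One way to model that is sketched below, with timestamps passed in explicitly so the behaviour is easy to follow. The class `ThroughputMeter` and its methods are hypothetical names, not the TUI's actual types.

```python
from typing import Optional


class ThroughputMeter:
    """Hypothetical tok/s meter: only time spent streaming counts toward the rate."""

    def __init__(self) -> None:
        self.tokens = 0
        self.started_at: Optional[float] = None  # set only while a stream is active
        self.last_rate = 0.0                     # value shown while idle

    def start(self, now: float) -> None:
        # A new stream begins: reset the counters for this measurement.
        self.tokens = 0
        self.started_at = now

    def add_tokens(self, count: int) -> None:
        self.tokens += count

    def rate(self, now: float) -> float:
        if self.started_at is None:
            return self.last_rate  # idle: keep the last measured rate
        elapsed = now - self.started_at
        return self.tokens / elapsed if elapsed > 0 else 0.0

    def stop(self, now: float) -> None:
        # Freeze the final rate so idle time never dilutes it.
        if self.started_at is not None:
            self.last_rate = self.rate(now)
            self.started_at = None


meter = ThroughputMeter()
meter.start(now=10.0)
meter.add_tokens(120)
meter.stop(now=14.0)
print(meter.rate(now=100.0))  # still reports the rate measured during streaming
```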

🔧 Developer Experience

  • Bang operator (!cmd) — run shell commands directly from TUI input, no LLM round-trip (v0.3.1)
  • Full CLI surface: 20+ subcommands (/models, /approve, /compact, /rebuild, /evolve, /new, /doctor, /btw, /mission-control, /skills, /repo-audit, etc.)
  • /btw parallel agent — spawn an isolated sub-agent for side tasks while the main conversation continues (v0.3.15)
  • Mission Control (/mission-control) — full-screen dashboard showing RSI inbox, activity log, and cron schedule in one view (v0.3.16)
  • Skills system (/skills) — browse and launch workflow templates with fuzzy-finding, auto-registered as slash commands (v0.3.16)
  • Programmatic /evolve — bypasses LLM, runs update directly (v0.3.1)
  • Auto-update on startup: [agent] auto_update = true silently installs + hot-restarts (v0.3.1)
  • Dynamic tools — runtime-defined via TOML (HTTP + shell executors)
  • Split panes — tmux-style parallel sessions with layout persistence
  • Usage Dashboard: /usage command shows daily tokens, cost, active models, session categories, project activity (v0.3.9)
  • Onboarding welcome — personalized first-time detection with welcome message and guided setup (v0.3.13)
  • Recent file memory — persists recent file paths across sessions to anchor the agent (v0.3.13)
  • Bash hardening — rejects interactive commands up-front, short-circuits exact same failing command retries, tilde expansion fixed (v0.3.13)
  • SSH askpass detection — detects password prompts on remote servers and aborts gracefully instead of hanging (v0.3.16)
  • Async proactive compaction — at 65% context, compaction runs in background without blocking the chat (v0.3.16)
  • rename_session — agent proactively renames the current session with a descriptive title (3–8 words). Useful for long-running conversations where default titles become unhelpful (v0.3.24)
  • follow_up_question — agent asks the user a multi-choice question with up to 8 button options. Implemented across all channels: Telegram (inline keyboard), Discord (button components), Slack (Block Kit actions), WhatsApp (quick replies) (v0.3.24, #94)
  • Auto-generated session titles — new sessions get titles from the first user message via background LLM call. Never enters conversation context. Thinking-only model fallback extracts title from reasoning block (v0.3.24, v0.3.29 #121)
  • /models picker overhaul — surfaces unconfigured providers with 🔒 lock + setup help text, single-source CLI model list, custom-provider empty-state help (v0.3.30, #126)
  • RTK Token Savings — bundled RTK binary (4MB, v0.40.0) as default feature. Zero-config proxy intercepts tool output, filters via Rust, returns token-optimized version. 100+ commands (git, cargo, npm, docker, kubectl, grep, find, ls, tree, curl), blocklist for interactive commands. /rtk slash command shows savings stats. Real-world: 53.5% token savings (v0.3.25, #102). Sysadmin expansion (v0.3.34): added 11 sysadmin commands (ps, top, lsof, netstat, ss, journalctl, dmesg, dig, nslookup, host, traceroute) that were bypassing RTK entirely, plus bundled rtk_filters.toml.example with 8 conservative starter rules
  • Tool call stacking — 3+ consecutive tool call groups collapse into single summary line in TUI. Ctrl+O expands/collapses. Shows “N tool calls” or “N tool calls (M groups)” (v0.3.25)
  • hashline_edit tool — hash-anchored file editing. Each line gets 2-char content hash from read_file(hashline=true). Reference lines as LINE#ID, stale hashes rejected before changes applied. Batch edits supported. Collision detection escalates to edit_file fallback (v0.3.25, #60; v0.3.26 #105)
  • Sensitive data redaction — applied to tool output in TUI and all channels. Patterns: env var suffixes (_pass=, _password=, _secret=, _token=, _key=, _apikey=, _api_key=, _credential=, auth=), piped secrets, plus existing (sk-*, ghp, xoxb-, AWS keys, Bearer tokens) (v0.3.25)
  • Context budget footer for channels — every channel (Telegram, Discord, Slack, WhatsApp) appends “ctx: 8K/200K 4%” footer to final message, matching TUI footer. Always delivered even when body fully consumed (v0.3.25, #104)
  • Generic deliver_api_key for cron jobs — HTTP webhook Bearer token auth configurable per-job via cron_manage tool (v0.3.18)
  • File paths starting with / are no longer treated as slash command typos: /Users/.../file.pdf yo crabs check this works correctly (v0.3.18)
  • Truncation continuations no longer trigger provider fallback — mid-sentence continuations stay on the same provider (v0.3.18)
  • Fallback error reason surfaced in TUI — when fallback fires, the underlying error shows as a system message (v0.3.18)
  • OpenAI-compatible embedding API — configure external embedding providers (OpenAI, Ollama, Jina, LM Studio) instead of downloading 300MB GGUF model. Dynamic vector dimensions from API response (v0.3.19)
  • FTS5-only memory mode for VPS — pure keyword search with zero RAM overhead. Auto-detects VPS environments and configures automatically (v0.3.19)
  • img2img for generate_image — optional image parameter (local path or HTTPS URL) feeds Gemini inlineData for editing user-uploaded images. OpenAI-shaped backends reject with clear error pointing at Gemini (v0.3.30)
  • PDF page_range param — pass "1-30", "5,7,10-15", or "3" for targeted extraction. Text-first routing skips Gemini for text-native PDFs, inline cap raised to ~60 pages (v0.3.31)
  • Telegram forum topic routing: thread_id carries through the full pipeline, new list_topics action surfaces topic names → IDs for proactive sends and replies (v0.3.31, #130, #131)
  • Agent self-awareness — compiled features surface in system prompt with check-first directive, Known paths section for logs so the agent stops guessing file locations (v0.3.31)
  • RSI skill proposals: skill as third proposal kind alongside tool/command, writes SKILL.md brain file (v0.3.31)
  • Fun POST-COMPACTION PROTOCOL prompts — compaction now delivers delightful re-orientation prompts; opt out with [agent] silent_compaction = true (v0.3.31)
  • Evolve hardening — remove+rename dance for busy Linux binaries, delayed systemd-run restart, structured tracing on every failure branch, pre-flight count_matching_systemd_units check (v0.3.32, #136)
  • User-correction metadata — captures actual user message via display_text_override instead of the 236-char Telegram channel prefix that was polluting the feedback ledger (v0.3.33, #138, PR #140)
  • Phantom post-success exemption — turn-scoped tool_calls_completed_this_turn counter + phantom_eligible gate prevent the phantom detector from firing on completion acks (“Pushed.”, “Done.”) after real tool runs. New FINISHING A TURN directive enforces one-line ack, no verification re-runs, no restating conclusions (unreleased)
  • follow_up_question intermediate flush — Telegram, Discord, Slack, and WhatsApp now flush pending intermediate text before sending the question, closing a race where buttons arrived before the explanatory text that preceded them (issue #142)
  • Provider registry single source of truth — fixed opencode/ollama/bedrock/vertex silent TUI omission; one 16-entry table instead of drifted if-else chains (unreleased, #141)
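
The PDF page_range parameter above accepts forms such as "1-30", "5,7,10-15", or "3". The sketch below shows one plausible way to expand such a spec into page numbers. The function name `parse_page_range` is hypothetical, and whether OpenCrabs sorts, deduplicates, or rejects malformed segments in this way is an assumption.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec such as "5,7,10-15" into a sorted list of 1-based pages."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in page range: {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range segment: {part!r}")
        pages.update(range(start, end + 1))  # ranges are inclusive on both ends
    return sorted(pages)


print(parse_page_range("5,7,10-15"))
```

Collecting pages in a set means overlapping segments such as "1-5,3-8" yield each page once.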

🌐 Browser Automation

  • Full CDP support: navigate, click, type, screenshot, JS eval, wait for selectors, find elements
  • Multi-step navigation hardening: text=/xpath= selector prefixes, recovery hints on click failures, semantic loop detection (4+ screenshots in 8 iterations triggers abort), no-op screenshot rejection, same-URL short-circuit (v0.3.28)
  • browser_find tool — enumerate elements by CSS, XPath, text, or aria-label with stable selectors (v0.3.13)
  • browser_close tool — close browser tabs and free CDP sessions, prevents stale page reuse (v0.3.18)
  • Headless or headed mode, element-specific screenshots
  • Cookie/session persistence across browser sessions
  • Per-session tab isolation — no cross-session DOM stomping (v0.3.13)
  • Smart default browser detection — auto-detects your default Chromium on macOS, Linux, and Windows (v0.3.13)
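
The semantic loop detection mentioned above (4+ screenshots in 8 iterations triggers an abort) amounts to a sliding-window count. A minimal sketch follows. The class name and the action label "screenshot" are hypothetical, and the real detector may weigh other signals as well.

```python
from collections import deque

WINDOW = 8            # iterations to look back over (from the list above)
MAX_SCREENSHOTS = 4   # this many screenshots inside the window counts as a loop


class ScreenshotLoopDetector:
    """Hypothetical detector: flags a loop when screenshots dominate recent steps."""

    def __init__(self) -> None:
        # deque(maxlen=...) silently drops the oldest action once the window is full.
        self.recent = deque(maxlen=WINDOW)

    def record(self, action: str) -> bool:
        """Record one browser action; return True when the run should abort."""
        self.recent.append(action)
        shots = sum(1 for a in self.recent if a == "screenshot")
        return shots >= MAX_SCREENSHOTS


detector = ScreenshotLoopDetector()
for step in ["navigate", "screenshot", "click", "screenshot", "screenshot", "screenshot"]:
    if detector.record(step):
        print("loop detected, aborting")
```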

🎥 Video Vision (v0.3.17)

  • analyze_video tool — routes video attachments through Gemini’s multimodal API. Inline bytes for files ≤18 MB, resumable Files API upload for larger files
  • Video uploads across all channels — Slack, Telegram, Discord, WhatsApp, and Trello automatically route video attachments to analyze_video
  • <<VID:path>> marker — analogous to <<IMG:>> for images. Supports mp4/m4v/mov/webm/mkv/avi/3gp/flv
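
The size-based routing and the marker format above can be summarised in a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, "18 MB" is read as 18 MiB, and the strategy labels "inline" and "files_api" are illustrative strings rather than real API values.

```python
from pathlib import Path

INLINE_LIMIT_BYTES = 18 * 1024 * 1024  # assumption: "18 MB" read as 18 MiB
VIDEO_EXTS = {".mp4", ".m4v", ".mov", ".webm", ".mkv", ".avi", ".3gp", ".flv"}


def is_video(path: str) -> bool:
    # Extension check is case-insensitive, so clip.MOV also matches.
    return Path(path).suffix.lower() in VIDEO_EXTS


def upload_strategy(size_bytes: int) -> str:
    """Pick how a video would reach the model, based on its size."""
    return "inline" if size_bytes <= INLINE_LIMIT_BYTES else "files_api"


def video_marker(path: str) -> str:
    """Build the marker that stands in for a video attachment in a message."""
    if not is_video(path):
        raise ValueError(f"unsupported video extension: {path}")
    return f"<<VID:{path}>>"


print(video_marker("/tmp/clip.mp4"), upload_strategy(5 * 1024 * 1024))
```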

📊 Usage Analytics (v0.3.9)

  • Interactive dashboard: /usage command with daily token counts, cost estimates, active models, session categories
  • Session auto-categorizer — heuristic classification (dev, ops, research, chat, etc.)
  • Tool execution tracking — DB records every tool call for per-project analytics
  • Project activity view — normalized paths, category breakdown, token distribution
  • Soft-delete sessions — metadata preserved even after session removal

🔐 Security & Privacy

  • Zero telemetry — nothing sent anywhere, ever
  • API key security: zeroize on drop, separate keys.toml (chmod 600)
  • Tool path resolution centralized — tilde expansion, relative paths, symlink handling in one place (v0.3.2)
  • Auto-approve propagation: approval_policy = "auto-always" actually reaches the tool loop (v0.3.2)

📊 Testing & Quality

  • 3,616+ tests covering providers, tools, channels, TUI, self-healing, crash recovery, browser automation
  • CI/CD: GitHub Actions, CodeQL, cargo audit security checks, release automation

🔧 Built-in Skills (v0.3.17)

  • 5 safe built-in skills: opencli, browser-cdp, a2a-gateway, dynamic-tools, repo-audit
  • SKILLS section added to help screen and splash integration
  • /repo-audit skill — language-agnostic repository health checks. 5-phase pipeline: language detection → native tool execution → git metrics → AST analysis → scoring + recommendations (v0.3.18)

Quick Start

# Install (Linux/macOS)
ARCH=$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/')
# Release assets are named "macos", while uname reports "darwin"
OS=$(uname -s | tr '[:upper:]' '[:lower:]' | sed 's/darwin/macos/')
# Uses jq for reliable tag parsing; falls back to grep if jq is unavailable
RELEASE_JSON=$(curl -fsSL https://api.github.com/repos/adolfousier/opencrabs/releases/latest)
TAG=$(command -v jq >/dev/null 2>&1 && printf '%s' "$RELEASE_JSON" | jq -r .tag_name || printf '%s' "$RELEASE_JSON" | grep -o '"tag_name": *"[^"]*"' | cut -d'"' -f4)
curl -fsSL "https://github.com/adolfousier/opencrabs/releases/download/${TAG}/opencrabs-${TAG}-${OS}-${ARCH}.tar.gz" | tar xz
./opencrabs

# Or via Cargo (requires Rust 1.94+)
cargo install opencrabs --locked

# Auto-update enabled by default; disable with [agent] auto_update = false in ~/.opencrabs/config.toml

Architecture Overview

┌─────────────────────────────────────────┐
│           OpenCrabs Binary              │
│  (Single 17-22 MB Rust executable)      │
├─────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────────┐  │
│  │   TUI       │  │   CLI Daemon    │  │
│  │  (crossterm)│  │(systemd/launchd)│  │
│  └─────────────┘  └─────────────────┘  │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │        Provider Registry         │   │
│  │  • Native: Anthropic, OpenAI... │   │
│  │  • CLI: Claude Code, OpenCode   │   │
│  │  • Custom: any OpenAI-compatible│   │
│  │  • Fallback chain w/ sticky swap│   │
│  └─────────────────────────────────┘   │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │        Tool Layer                │   │
│  │  • 50+ built-in tools           │   │
│  │  • Dynamic tools via TOML       │   │
│  │  • ALWAYS_EXCLUDED per agent    │   │
│  │  • Centralized path resolution  │   │
│  └─────────────────────────────────┘   │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │        Channel Adapters          │   │
│  │  • Telegram/Discord/Slack/      │   │
│  │    WhatsApp/Trello/Voice        │   │
│  │  • Cross-channel crash recovery │   │
│  └─────────────────────────────────┘   │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │        Self-Healing Layer       │   │
│  │  • Context budget management    │   │
│  │  • Stuck stream detection       │   │
│  │  • Gaslighting refusal strip    │   │
│  │  • Panic recovery/cancel persist│   │
│  └─────────────────────────────────┘   │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │        Persistence              │   │
│  │  • SQLite sessions + memory DB  │   │
│  │  • Brain files (~/.opencrabs/)  │   │
│  │  • Hybrid FTS5 + vector search  │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘

Installation

Three ways to get OpenCrabs running.

Option 1: Pre-built Binary

Grab a pre-built binary from GitHub Releases.

Linux (amd64)

sudo apt install -y jq libgomp1
TAG=$(curl -s https://api.github.com/repos/adolfousier/opencrabs/releases/latest | jq -r .tag_name)
curl -fsSL "https://github.com/adolfousier/opencrabs/releases/download/${TAG}/opencrabs-${TAG}-linux-amd64.tar.gz" | tar xz
./opencrabs

Linux (arm64)

sudo apt install -y jq libgomp1
TAG=$(curl -s https://api.github.com/repos/adolfousier/opencrabs/releases/latest | jq -r .tag_name)
curl -fsSL "https://github.com/adolfousier/opencrabs/releases/download/${TAG}/opencrabs-${TAG}-linux-arm64.tar.gz" | tar xz
./opencrabs

macOS (arm64 / Apple Silicon)

TAG=$(curl -s https://api.github.com/repos/adolfousier/opencrabs/releases/latest | jq -r .tag_name)
curl -fsSL "https://github.com/adolfousier/opencrabs/releases/download/${TAG}/opencrabs-${TAG}-macos-arm64.tar.gz" | tar xz
./opencrabs

Windows

Download from GitHub Releases.

The onboarding wizard handles everything on first run.

Terminal permissions required. OpenCrabs reads/writes brain files, config, and project files. Your terminal app needs filesystem access or the OS will block operations.

| OS | What to do |
| --- | --- |
| macOS | System Settings → Privacy & Security → Full Disk Access → toggle your terminal app ON (Alacritty, iTerm2, Terminal, etc.). If not listed, click “+” and add it from /Applications/. Without this, macOS repeatedly prompts “would like to access data from other apps”. |
| Windows | Run your terminal (Windows Terminal, PowerShell, cmd) as Administrator on first run, or grant the terminal write access to %USERPROFILE%\.opencrabs\ and your project directories. Windows Defender may also prompt; click “Allow”. |
| Linux | Ensure your user owns ~/.opencrabs/ and project directories. On SELinux/AppArmor systems, the terminal process needs read/write access to those paths. Flatpak/Snap terminals may need --filesystem=home or equivalent permission. |

/rebuild works even with pre-built binaries — it auto-clones the source to ~/.opencrabs/source/ on first use, then builds and hot-restarts.

Option 2: Build from Source

Required for /rebuild, adding custom tools, or modifying the agent.

The setup script auto-detects your platform (macOS, Debian/Ubuntu, Fedora/RHEL, Arch) and installs all build dependencies + Rust:

# Install all dependencies
curl -fsSL https://raw.githubusercontent.com/adolfousier/opencrabs/main/scripts/setup.sh | bash

# Clone and build
git clone https://github.com/adolfousier/opencrabs.git
cd opencrabs
cargo build --release
./target/release/opencrabs

Manual setup

If you prefer to install dependencies yourself:

  • Rust stable: install Rust. The stable toolchain works since v0.2.85
  • An API key from at least one supported provider
  • SQLite (bundled via rusqlite)
  • macOS: brew install cmake pkg-config
  • Debian/Ubuntu: sudo apt install build-essential pkg-config libssl-dev cmake
  • Fedora/RHEL: sudo dnf install gcc gcc-c++ make pkg-config openssl-devel cmake
  • Arch: sudo pacman -S base-devel pkg-config openssl cmake

git clone https://github.com/adolfousier/opencrabs.git
cd opencrabs
cargo build --release
./target/release/opencrabs

OpenCrabs uses keys.toml instead of .env for API keys. The onboarding wizard will help you set it up, or edit ~/.opencrabs/keys.toml directly.

Option 3: Docker

Run OpenCrabs in an isolated container. Build takes ~15min (Rust release + LTO).

git clone https://github.com/adolfousier/opencrabs.git
cd opencrabs
docker compose -f src/docker/compose.yml up --build

Config, workspace, and memory DB persist in a Docker volume across restarts. API keys in keys.toml are mounted into the container at runtime — never baked into the image.

Autostart on Boot

Keep OpenCrabs running as a background daemon that starts with your system.

Linux (systemd)

# Create the user unit directory if it does not exist yet
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/opencrabs.service << 'EOF'
[Unit]
Description=OpenCrabs AI Agent
After=network.target

[Service]
ExecStart=%h/.cargo/bin/opencrabs daemon
Restart=on-failure
RestartSec=5
Environment=OPENCRABS_HOME=%h/.opencrabs

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable opencrabs
systemctl --user start opencrabs

Check status: systemctl --user status opencrabs | Logs: journalctl --user -u opencrabs -f

macOS (launchd)

cat > ~/Library/LaunchAgents/com.opencrabs.agent.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.opencrabs.agent</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/opencrabs</string>
        <string>daemon</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/opencrabs.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/opencrabs.err</string>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.opencrabs.agent.plist

Update the path in ProgramArguments to match your install location.

Windows (Task Scheduler)

  1. Win + R → taskschd.msc
  2. Create Basic Task → Name: OpenCrabs
  3. Trigger: When I log on
  4. Action: Start a program → C:\Users\<you>\.cargo\bin\opencrabs.exe, Arguments: daemon
  5. In Properties > Settings, check If the task fails, restart every 1 minute

Or via PowerShell:

$action = New-ScheduledTaskAction -Execute "$env:USERPROFILE\.cargo\bin\opencrabs.exe" -Argument "daemon"
$trigger = New-ScheduledTaskTrigger -AtLogon
$settings = New-ScheduledTaskSettingsSet -RestartCount 3 -RestartInterval (New-TimeSpan -Minutes 1)
Register-ScheduledTask -TaskName "OpenCrabs" -Action $action -Trigger $trigger -Settings $settings

Updating

  • Binary users: Type /evolve in the TUI to download the latest release
  • Source users: git pull && cargo build --release, or type /rebuild in the TUI
  • Docker users: docker compose pull && docker compose up -d

Configuration

OpenCrabs uses two config files stored at ~/.opencrabs/:

| File | Purpose |
| --- | --- |
| config.toml | Provider settings, features, channel connections |
| keys.toml | API keys and secrets (never committed to git) |

Workspace Layout

~/.opencrabs/
├── config.toml          # Main configuration
├── keys.toml            # API keys (gitignored)
├── commands.toml        # Custom slash commands
├── opencrabs.db         # SQLite database
├── SOUL.md              # Agent personality
├── IDENTITY.md          # Agent identity
├── USER.md              # Your profile
├── MEMORY.md            # Long-term memory
├── AGENTS.md            # Agent behavior docs
├── TOOLS.md             # Tool reference
├── SECURITY.md          # Security policies
├── HEARTBEAT.md         # Periodic check tasks
├── memory/              # Daily memory notes
│   └── YYYY-MM-DD.md
├── images/              # Generated images
├── logs/                # Application logs
└── skills/              # Custom skills/plugins

Provider Configuration

See Provider Setup for detailed provider configuration.

Quick example — add Anthropic:

# config.toml
[providers.anthropic]
enabled = true
default_model = "claude-sonnet-4-20250514"

# keys.toml
[providers.anthropic]
api_key = "sk-ant-..."

Provider Priority

When multiple providers are enabled, the first one found in this order is used for new sessions:

MiniMax > OpenRouter > Anthropic > OpenAI > Gemini > Custom

Each session remembers which provider and model it was using. Switch providers per-session via /models.
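
The selection rule above is a first-match scan over a fixed order. A minimal sketch, assuming lowercase provider identifiers and a hypothetical `default_provider` helper (neither is confirmed by the text above):

```python
from typing import Optional

# Priority order as stated above; lowercase identifiers are an assumption.
PRIORITY = ["minimax", "openrouter", "anthropic", "openai", "gemini", "custom"]


def default_provider(enabled: set) -> Optional[str]:
    """Return the first enabled provider in priority order, or None."""
    for name in PRIORITY:
        if name in enabled:
            return name
    return None


# With Anthropic and OpenAI both enabled, Anthropic comes first in the order.
print(default_provider({"openai", "anthropic"}))
```

This only covers the default for new sessions. An existing session keeps whatever provider and model it last used.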

Feature Flags

# config.toml
[agent]
working_directory = "/path/to/default/dir"
thinking = "on"                # "on", "off", or "budget_XXk"

[a2a]
enabled = false
bind = "127.0.0.1"
port = 18790

[image.generation]
enabled = true
model = "gemini-3.1-flash-image-preview"

[image.vision]
enabled = true
model = "gemini-3.1-flash-image-preview"

First Session

When you launch OpenCrabs for the first time, the onboarding wizard walks you through setup.

Onboarding Flow

The wizard is a keyboard-driven TUI with 8 steps. Navigate with arrow keys, Tab to advance, Esc to go back.

| Step | Screen | What you do |
| --- | --- | --- |
| 1 | Mode Select | Choose QuickStart (skip channels) or Advanced |
| 2 | Workspace | Pick a working directory for file operations |
| 3 | Provider & Auth | Select provider → paste API key → pick model (fetched live) |
| 4 | Channels | Space to toggle channels on/off → Enter on each to configure |
| 5 | Voice | STT provider (Groq, local Whisper, or off) + TTS voice |
| 6 | Image | Vision toggle + generation model + API key |
| 7 | Daemon | Install background daemon (optional) |
| 8 | Brain Setup | Auto-generate SOUL.md, IDENTITY.md from your profile |

Channel Setup (Step 4)

The channels screen lists 5 integrations: Telegram, Discord, WhatsApp, Slack, Trello.

  • Space toggles a channel on/off
  • Enter on an enabled channel opens its setup screen (token, IDs, allowlists)
  • Enter on Continue or Tab skips to the next step
  • Each channel setup screen has a Test Connection button

See Channels Overview for the full navigation guide.

Re-running Setup

You can jump to any step without re-running the full wizard:

| Command | Step |
| --- | --- |
| /onboard | Full wizard |
| /onboard:provider | Provider & model selection |
| /onboard:channels | Channel picker |
| /onboard:voice | Voice setup |
| /onboard:image | Image setup |
| /onboard:brain | Brain file generation |

After onboarding, your agent boots up and introduces itself. It reads its brain files (SOUL.md, IDENTITY.md, AGENTS.md, TOOLS.md) and starts a conversation.

Bootstrap

On the very first run, the agent goes through a bootstrap phase:

  • Gets to know you (name, preferences, work style)
  • Establishes its identity (name, personality, emoji)
  • Opens SOUL.md together to discuss values
  • Sets up USER.md with your profile

The bootstrap file (BOOTSTRAP.md) deletes itself when complete.

Key Commands

| Command | Description |
| --- | --- |
| /help | Show all available commands |
| /models | Switch provider or model |
| /new | Create a new session |
| /sessions | Switch between sessions |
| /cd | Change working directory |
| /compact | Manually compact context |
| /evolve | Download latest version |
| /rebuild | Build from source |
| /approve | Set approval policy |

Approval Modes

Control how much autonomy the agent has:

| Mode | Behavior |
| --- | --- |
| /approve | Ask before every tool use (default) |
| /approve auto | Auto-approve for this session |
| /approve yolo | Auto-approve always (persists) |
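
The three modes above differ in two respects: whether a tool call needs confirmation, and whether the setting outlives the session. The sketch below models just that. The enum and helper names are hypothetical, and how OpenCrabs stores the policy internally is not covered here.

```python
from enum import Enum


class ApprovalMode(Enum):
    ASK = "ask"    # /approve: confirm every tool use (default)
    AUTO = "auto"  # /approve auto: auto-approve for this session
    YOLO = "yolo"  # /approve yolo: auto-approve always, persisted


def parse_approve_command(line: str) -> ApprovalMode:
    """Map an /approve command line to a mode; a bare /approve means ASK."""
    parts = line.split()
    if not parts or parts[0] != "/approve":
        raise ValueError(f"not an /approve command: {line!r}")
    if len(parts) == 1:
        return ApprovalMode.ASK
    modes = {"auto": ApprovalMode.AUTO, "yolo": ApprovalMode.YOLO}
    if parts[1] not in modes:
        raise ValueError(f"unknown approval mode: {parts[1]!r}")
    return modes[parts[1]]


def needs_confirmation(mode: ApprovalMode) -> bool:
    return mode is ApprovalMode.ASK


def survives_restart(mode: ApprovalMode) -> bool:
    # Only the yolo mode is described above as persisting.
    return mode is ApprovalMode.YOLO


mode = parse_approve_command("/approve auto")
print(mode, needs_confirmation(mode), survives_restart(mode))
```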

Working Directory

The agent operates within a working directory for file operations. Change it with:

  • /cd command in chat
  • Directory picker in the TUI (Tab to select)
  • config_manager set_working_directory tool

The working directory is persisted per-session — switching sessions restores the directory automatically.

CLI Commands

OpenCrabs has a full CLI with 20+ subcommands for managing every aspect of the agent.

Usage

opencrabs [COMMAND] [OPTIONS]

Commands

| Command | Description |
| --- | --- |
| chat (default) | Launch the TUI chat interface |
| daemon | Run in background (channels only, no TUI) |
| agent | Interactive multi-turn chat or single-message mode |
| cron | Manage scheduled tasks (add/list/remove/enable/disable/test) |
| channel | Channel management (list, doctor) |
| memory | Memory management (list, get, stats) |
| session | Session management (list, get) |
| db | Database management (init, stats, clear) |
| logs | Log management (status, view, clean, open) |
| service | System service management (install/start/stop/restart/status/uninstall) |
| status | Show agent status |
| doctor | Run connection health check |
| onboard | Run the setup wizard |
| completions | Generate shell completions (bash/zsh/fish/powershell) |
| version | Show version info |

The following are typed inside a chat session rather than passed as opencrabs subcommands:

| Command | Description |
| --- | --- |
| !command | Bang operator: runs any shell command instantly without an LLM round-trip. Output is shown as a system message, e.g. !git status, !ls -la |
| /evolve | Auto-update: downloads the latest release and hot-restarts. Runs automatically on startup when [agent] auto_update = true |
| /btw | Parallel agent: spawns an isolated sub-agent for a side task while the main conversation continues, e.g. /btw research the latest Rust async patterns |
| /mission-control | Mission Control: full-screen dashboard showing RSI inbox (pending proposals), activity log (improvements applied), and cron schedule. Navigate with vim keys, apply/reject proposals with a/r. |
| /skills | Skills picker: browse and launch workflow templates with fuzzy-finding. Every loaded skill auto-registers as a slash command. |
| /security-audit | Security audit: comprehensive language-agnostic security & CVE audit. Detects project type, runs the right scanner, reviews recent diff for injection/auth/crypto patterns, scores 0-100. |
| /cost-estimate | Cost estimate: codebase cost-to-build estimate, AI-assisted ROI breakdown, and fair-market valuation. Asks for business context before producing the valuation range. |
| /repo-audit | Repo audit: language-agnostic repository health checks. 5-phase pipeline: language detection → native tool execution → git metrics → AST analysis → scoring + recommendations. Covers Rust, JS/TS, Python, Go. |

Configuration Flags

| Flag | Default | Description |
| --- | --- | --- |
| [agent] auto_update | true | Auto-install new releases on startup and hot-restart. Set to false to keep the manual prompt dialog. |

Keyboard Shortcuts (TUI)

| Shortcut | Action |
| --- | --- |
| F12 | Toggle mouse capture on/off for native terminal text selection |

Startup Update Prompt

When a new version is available and auto_update is set to false, a centered dialog appears on the splash screen asking you to accept (Enter) or skip (Esc). Accepting triggers /evolve automatically. After the update, the binary restarts and the splash screen shows the new version.

Channel Commands

/doctor, /help, /usage, /evolve, and system commands work directly on Telegram, Discord, Slack, and WhatsApp without going through the LLM. They execute instantly and return results in the channel.

All channel command logic is centralized in src/channels/commands.rs (847 lines), a shared handler that eliminates duplicated command logic across the 5 channel implementations. Each channel delegates to try_execute_text_command() for consistent behavior.

/evolve on channels now runs directly (downloads + installs the binary) without requiring an LLM round-trip. Previously it was routed through the agent.

Chat Mode

# Default — launch TUI
opencrabs

# Same as above
opencrabs chat

Agent Mode

Chat with the agent from the command line, either interactively or with a single message for scripting and automation:

# Interactive multi-turn chat
opencrabs agent

# Single-message mode
opencrabs agent -m "What files changed today?"

Daemon Mode

Run OpenCrabs without the TUI — useful for servers where you only need channel bots. Supports a health endpoint for monitoring.

opencrabs daemon

The agent processes messages from all connected channels (Telegram, Discord, Slack, WhatsApp) but without the terminal UI. Channel bots auto-reconnect on network failures with 5-second backoff.

Health Endpoint

Add to config.toml to expose a health check:

[daemon]
health_port = 8080

Then GET http://localhost:8080/health returns 200 OK with JSON status. Useful for systemd watchdog, uptime monitors, or load balancers.
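
A monitoring script can poll this endpoint. The sketch below is one possible probe, assuming only what is stated above (HTTP 200 with a JSON body on success). The function name check_health is hypothetical, and the fields inside the JSON payload are not documented here, so the code returns the parsed body without interpreting it.

```python
import json
import urllib.error
import urllib.request


def check_health(url="http://localhost:8080/health", timeout=5.0):
    """Return (ok, payload).

    ok is True only when the endpoint answers with HTTP 200.
    payload is the parsed JSON body, or None if it is missing or not JSON.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = resp.status == 200
            body = resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False, None
    try:
        return ok, json.loads(body)
    except json.JSONDecodeError:
        return ok, None
```

Calling check_health() with no arguments targets the port from the config example above; pass a different URL if health_port is set to another value.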

Service Management

Install OpenCrabs as a system service (launchd on macOS, systemd on Linux):

opencrabs service install
opencrabs service start
opencrabs service stop
opencrabs service restart
opencrabs service status
opencrabs service uninstall

Cron Management

# List all cron jobs
opencrabs cron list

# Add a new cron job
opencrabs cron add \
  --name "Daily Report" \
  --cron "0 9 * * *" \
  --tz "America/New_York" \
  --prompt "Check emails and summarize" \
  --provider anthropic \
  --model claude-sonnet-4-20250514 \
  --thinking off \
  --deliver-to telegram:123456

# Remove a cron job (accepts name or ID)
opencrabs cron remove "Daily Report"

# Enable/disable (accepts name or ID)
opencrabs cron enable "Daily Report"
opencrabs cron disable "Daily Report"

TUI Keyboard Shortcuts

| Key | Action |
| --- | --- |
| Enter | Send message |
| Esc | Cancel / dismiss |
| Ctrl+N | New session |
| Ctrl+L | Sessions screen |
| Ctrl+K | Clear current session |
| Ctrl+O | Toggle tool group collapse |
| \| (pipe) | Split pane horizontally |
| _ (underscore) | Split pane vertically |
| Ctrl+X | Close focused pane |
| Tab | Cycle pane focus / Accept autocomplete |
| Up/Down | Navigate suggestions / sessions |
| / | Start slash command (e.g. /help, /models) |
| : | Start emoji picker |

Troubleshooting

Common issues and how to fix them.

Windows Defender Blocking OpenCrabs

Windows Defender may flag opencrabs.exe as suspicious because it’s an unsigned binary that executes shell commands and makes network requests. This is a false positive.

Add an exclusion:

  1. Open Windows Security → Virus & threat protection
  2. Virus & threat protection settings → Manage settings
  3. Exclusions → Add or remove exclusions
  4. Add an exclusion → File → select opencrabs.exe

Or via PowerShell (admin):

Add-MpPreference -ExclusionPath "C:\path\to\opencrabs.exe"

If SmartScreen blocks the first run, click More info → Run anyway.


Binary Won’t Start or Crashes

Startup Errors

Run with debug logging to see what’s failing:

opencrabs -d chat

Logs are written to ~/.opencrabs/logs/.

Download a Previous Version

If the latest release crashes on your machine, download a previous working version from GitHub Releases:

# List all releases
gh release list -R adolfousier/opencrabs

# Download a specific version
gh release download v0.2.66 -R adolfousier/opencrabs -p "opencrabs-*$(uname -m)*$(uname -s | tr A-Z a-z)*"

/evolve — Update & Rollback

/evolve downloads the latest release from GitHub and hot-swaps the binary. It has built-in safety checks:

  1. Download — Fetches the platform-specific binary from GitHub Releases
  2. Pre-swap health check — Runs opencrabs health-check on the new binary (10s timeout). If it fails, the new binary is deleted and your current version stays untouched.
  3. Backup — Creates a backup at <binary-path>.evolve_backup
  4. Atomic swap — Replaces the current binary
  5. Post-swap health check — Verifies the swapped binary works. If it fails, auto-rolls back to the backup.
  6. Restart — exec()-restarts into the new version
  7. Brain update prompt — After restart, your crab announces the new version, diffs brain templates against your local files, and offers to update them
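
The check, backup, swap, and rollback sequence in steps 2 to 5 can be sketched as follows. This is an illustration of the pattern, not the OpenCrabs implementation: swap_binary and the health_check callable are hypothetical names, and the download, restart, and brain update steps are left out.

```python
import os
import shutil
from pathlib import Path
from typing import Callable


def swap_binary(current: Path, candidate: Path,
                health_check: Callable[[Path], bool]) -> bool:
    """Replace `current` with `candidate`, rolling back if a check fails.

    Returns True if the new binary is in place, False if the old one was kept.
    """
    # Pre-swap check: reject a bad download before touching the installed binary.
    if not health_check(candidate):
        candidate.unlink(missing_ok=True)
        return False

    # Keep a copy of the working binary next to it.
    backup = current.with_name(current.name + ".evolve_backup")
    shutil.copy2(current, backup)

    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(candidate, current)

    # Post-swap check: restore the backup if the installed binary misbehaves.
    if not health_check(current):
        shutil.copy2(backup, current)
        return False
    return True
```

In this sketch a health_check could, for example, run the binary's health-check subcommand with a 10-second timeout, matching step 2. The important design point is the ordering: the backup is written before the swap, so every failure path leaves a working binary on disk.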

If /evolve Fails

The most common reason is the health check caught an issue — your current version stays safe. If something went wrong after the swap:

# Restore the backup manually
cp /path/to/opencrabs.evolve_backup /path/to/opencrabs
chmod +x /path/to/opencrabs

Cargo Install Fallback

When /evolve uses cargo install (building from source), it tries the stable toolchain first. If that fails, it automatically falls back to cargo +nightly. The progress message shows which toolchain succeeded.

Check-Only Mode

The agent can check for updates without installing:

/evolve check_only=true

Bash Tool Safety

The bash tool includes a hard command blocklist that prevents catastrophic commands even if accidentally approved:

  • rm -rf /, sudo rm -rf .
  • mkfs, dd to /dev/
  • Fork bombs
  • /etc overwrites, /proc writes
  • Sensitive file exfiltration
  • Crypto mining commands

These are blocked at the tool level — no configuration needed.
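
To illustrate what a tool-level blocklist means, here is a minimal sketch of pattern-based command screening. The patterns and the is_blocked function are assumptions made for illustration; they cover only a few of the categories listed above and are not the rules OpenCrabs ships.

```python
import re

# Illustrative patterns only; not OpenCrabs' actual blocklist.
BLOCKED_PATTERNS = [
    # Recursive rm aimed at the filesystem root (rm -rf /, rm -fr / ...).
    re.compile(r"\brm\s+-[a-z]*r[a-z]*\s+/(\s|$)", re.IGNORECASE),
    # Filesystem creation (mkfs, mkfs.ext4, ...).
    re.compile(r"\bmkfs(\.\w+)?\b"),
    # dd writing straight to a device node.
    re.compile(r"\bdd\b.*\bof=/dev/"),
    # The classic shell fork bomb.
    re.compile(r":\(\)\s*\{.*\}\s*;\s*:"),
]


def is_blocked(command: str) -> bool:
    """Return True if the command matches any blocked pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)
```

Because the check runs before execution and does not consult the approval policy, a match is refused even under /approve yolo in this model.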


Older CPUs (Pre-2011 / No AVX)

Some features require AVX/AVX2 instructions. Since v0.2.67, OpenCrabs detects CPU capabilities at runtime and automatically hides unavailable options in the onboarding wizard.

What’s Affected

| Feature | CPU Requirement | Fallback |
| --- | --- | --- |
| Local embeddings (memory search) | AVX (Sandy Bridge 2011+) | FTS-only keyword search (still works) |
| Local STT (rwhisper/candle) | AVX2 (Haswell 2013+) | API mode (Groq Whisper) or disabled |
| Local TTS (Piper) | None (tested on a 2007 iMac) | Works on any x86/ARM CPU |

Symptoms

  • Local STT option doesn’t appear in /onboard:voice — your CPU lacks AVX2
  • Local TTS (Piper) should always be available — no CPU restrictions, works on machines as old as 2007
  • Memory search falls back to text-only FTS silently
  • Crash with “illegal instruction” on very old CPUs
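
To confirm which case applies on a Linux x86 machine, you can read the CPU feature flags yourself. The sketch below parses the flags line of /proc/cpuinfo and maps it onto the requirements in the table above. The function names are hypothetical, and this is a manual check, not how OpenCrabs performs its own runtime detection.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Extract feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def local_feature_support(flags: set[str]) -> dict[str, bool]:
    """Map CPU flags onto the features from the requirements table."""
    return {
        "local_embeddings": "avx" in flags,   # needs AVX
        "local_stt": "avx2" in flags,         # needs AVX2
    }
```

On Linux, local_feature_support(cpu_flags(open("/proc/cpuinfo").read())) reports both features. The flags line exists only on x86; ARM kernels expose a differently named field, so an empty result there does not mean the features are unavailable.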

Fix: Build from Source with CPU Targeting

# For your specific CPU (best performance)
RUSTFLAGS="-C target-cpu=native" cargo build --release

# For Sandy Bridge (AVX but no AVX2)
RUSTFLAGS="-C target-cpu=sandybridge" cargo build --release

macOS with Apple Silicon

Local STT uses Metal GPU acceleration on macOS — no CPU flags needed. Works out of the box on M1/M2/M3/M4.


Config Issues

Config Won’t Load

If config.toml has a syntax error, OpenCrabs will fail to start. Restore from backup:

# Check if a backup exists
ls ~/.opencrabs/config.toml.backup

# Restore it
cp ~/.opencrabs/config.toml.backup ~/.opencrabs/config.toml

Or reinitialize with defaults:

opencrabs init --force

Warning: --force overwrites your config. Back up keys.toml first — it contains your API keys.

Manual Backup

Always keep a backup of your critical files:

cp ~/.opencrabs/config.toml ~/.opencrabs/config.toml.backup
cp ~/.opencrabs/keys.toml ~/.opencrabs/keys.toml.backup
cp ~/.opencrabs/commands.toml ~/.opencrabs/commands.toml.backup

Channel Issues

Telegram

Bot not responding:

  1. Verify token from @BotFather is in keys.toml
  2. Check your numeric user ID is in allowed_users
  3. If respond_to = "mention", you must @mention the bot in groups

Regenerate bot token:

  1. Open @BotFather on Telegram
  2. /mybots → select bot → API Token → Revoke
  3. Copy new token to keys.toml under [channels.telegram]
  4. Restart OpenCrabs

Re-setup from scratch: Run /onboard:channels in the TUI.

WhatsApp

QR code / session expired:

WhatsApp sessions are stored at ~/.opencrabs/whatsapp/session.db. To reconnect:

# Delete the session file
rm ~/.opencrabs/whatsapp/session.db

# Re-pair via onboarding
opencrabs chat --onboard

Or press R on the WhatsApp onboarding screen to reset and get a fresh QR code.

Messages not received:

  • Verify phone number is in allowed_phones using E.164 format: "+15551234567"
  • Empty allowed_phones = [] means accept from everyone
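
A quick way to check entries before adding them to allowed_phones is to validate the E.164 shape: a leading +, a non-zero first digit, and at most 15 digits in total, with no spaces or punctuation. The sketch below is a standalone helper with a hypothetical name and is not part of OpenCrabs.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not 0.
E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` is formatted as E.164."""
    return E164.fullmatch(number) is not None
```

Numbers copied from a contact card often contain spaces, dashes, or parentheses; strip those and add the country code before listing the number.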

Discord

Bot not receiving messages:

  1. Ensure Message Content Intent is enabled in Discord Developer Portal → Bot settings
  2. Required intents: gateway, guild_messages, direct_messages, message_content
  3. Use the bot token (starts with MTk...), not the application ID

Regenerate token: Discord Developer Portal → Bot → Regenerate Token

Slack

Both tokens required:

  • Bot token (xoxb-...): For sending messages
  • App token (xapp-...): For Socket Mode (receiving events)

Without the app token, the bot can send but not receive messages.

Socket Mode: Must be enabled in app settings → Features → Socket Mode → ON

Trello

Setup:

  1. Get API key: trello.com/app-key
  2. Generate token from the same page
  3. Add board_ids to config — the bot only monitors listed boards
  4. Set poll_interval_secs > 0 to enable polling (default 0 = disabled)

General: Re-run Channel Setup

For any channel issues, re-run the onboarding wizard:

opencrabs chat --onboard

Or type /onboard:channels in the TUI.


Local STT (Speech-to-Text)

Since v0.2.67, local STT uses rwhisper (candle, pure Rust) instead of whisper-rs/ggml. On macOS, it uses Metal GPU acceleration automatically.

Models

| Model | Size | Quality |
| --- | --- | --- |
| quantized-tiny | ~42 MB | Good for short messages |
| base-en | ~142 MB | Better accuracy (English) |
| small-en | ~466 MB | High accuracy |
| medium-en | ~1.5 GB | Best accuracy |

Models download automatically from HuggingFace on first use.

Common Issues

Local STT option not showing in wizard: Your CPU lacks AVX2. Use API mode (Groq Whisper) instead, or build from source with RUSTFLAGS="-C target-cpu=native".

“No audio samples decoded”: Audio file is corrupt or unsupported format. Supported: OGG/Opus, WAV.

Transcription hangs: Times out after 300 seconds. Try a smaller model (quantized-tiny).

Model download fails: Check network connection. Models are fetched from HuggingFace.

Audio too short: Messages under 1 second are automatically padded to prevent tensor errors.

Disabling

[voice]
stt_enabled = false

Local TTS (Text-to-Speech)

Requirements

  • Python 3 must be installed and in PATH
  • Piper installs automatically in a venv at ~/.opencrabs/models/piper/venv/

Voices

| Voice | Description | Size |
| --- | --- | --- |
| ryan | US Male (default) | ~200-400 MB |
| amy | US Female | ~200-400 MB |
| lessac | US Female | ~200-400 MB |
| kristin | US Female | ~200-400 MB |
| joe | US Male | ~200-400 MB |
| cori | UK Female | ~200-400 MB |

Common Issues

“python3 -m venv failed”: Install Python 3. On Ubuntu: sudo apt install python3 python3-venv. On macOS: brew install python3.

“pip install piper-tts failed”: Network issue or pip corrupted. Fix pip first:

python3 -m pip install --upgrade pip

Telegram voice messages show no waveform: This was fixed in v0.2.64 — audio is now properly encoded as OGG/Opus (RFC 7845). Update to latest version.

Voice preview not playing: Preview uses afplay (macOS), aplay (Linux), or powershell (Windows). Ensure audio output is available.

Re-setup

Run /onboard:voice in the TUI to reconfigure STT/TTS mode and re-download models.

Disabling

[voice]
tts_mode = "off"

Local Embeddings (Memory Search)

The memory search engine uses a ~300 MB embedding model (llama.cpp) for semantic search. It requires AVX on x86 CPUs.

Fallback

If embeddings can’t initialize (no AVX, download failed, disk full), memory search falls back to FTS-only (keyword matching). It still works, just less semantic.

Fix for Older CPUs

Build from source with CPU targeting (see Older CPUs above).

Model Location

Models are stored in ~/.local/share/opencrabs/models/ (platform-specific data directory).


Database Issues

Location

  • Main database: ~/.opencrabs/opencrabs.db (SQLite + WAL)
  • WhatsApp session: ~/.opencrabs/whatsapp/session.db

Database Corruption

SQLite with WAL mode is very resilient, but if corruption occurs:

# Back up the corrupted file first
cp ~/.opencrabs/opencrabs.db ~/.opencrabs/opencrabs.db.corrupt

# Reinitialize (WARNING: loses all history)
opencrabs db init

Migration Errors

The database automatically migrates on startup (11 migrations). If migrating from an older version with sqlx, the transition is handled automatically — no manual steps needed.


Building from Source

Quick Setup

curl -fsSL https://raw.githubusercontent.com/adolfousier/opencrabs/main/scripts/setup.sh | bash
git clone https://github.com/adolfousier/opencrabs.git && cd opencrabs
cargo build --release

Build Without Voice (Smaller Binary)

cargo build --release --no-default-features --features telegram,whatsapp,discord,slack,trello

Feature Flags

| Flag | Default | Description |
| --- | --- | --- |
| local-stt | On | rwhisper (candle) for local speech-to-text |
| local-tts | On | Piper for local text-to-speech |
| telegram | On | Telegram channel |
| whatsapp | On | WhatsApp channel |
| discord | On | Discord channel |
| slack | On | Slack channel |
| trello | On | Trello channel |

Debug Mode

Run with -d for verbose logging:

opencrabs -d chat

Logs go to ~/.opencrabs/logs/ with 7-day retention.

Supported AI Providers

OpenCrabs supports 14 built-in providers + Custom OpenAI Compatible. Switch between them at any time via /models in the TUI or any channel.

| Provider | Auth | Models | Streaming | Tools | Notes |
| --- | --- | --- | --- | --- | --- |
| Anthropic Claude | API key | Claude Opus 4.6, Sonnet 4.5, Haiku 4.5 | Yes | Yes | Extended thinking, 200K context |
| OpenAI | API key | GPT-5 Turbo, GPT-5, o3/o4-mini | Yes | Yes | Models fetched live |
| GitHub Copilot | OAuth | GPT-4o, Claude Sonnet 4+ | Yes | Yes | Uses your Copilot subscription; no API charges |
| OpenRouter | API key | 400+ models | Yes | Yes | Free models available. Reasoning output support (Qwen 3.6 Plus, etc.) |
| Google Gemini | API key | Gemini 2.5 Flash, 2.0, 1.5 Pro | Yes | Yes | 1M+ context, vision, image generation |
| MiniMax | API key | M2.7, M2.5, M2.1, Text-01 | Yes | Yes | Competitive pricing, auto-configured vision |
| z.ai GLM | API key | GLM-4.5 through GLM-5 Turbo | Yes | Yes | General API + Coding API endpoints |
| Claude CLI | CLI auth | Via claude binary | Yes | Yes | Uses your Claude Code subscription |
| Codex | OAuth (PKCE) | GPT-5.5, GPT-5.4, GPT-5.3-Codex | Yes | Yes | Native Codex subscription auth via device-code PKCE; no CLI, no API key |
| Codex CLI | CLI auth | Via @openai/codex binary | Yes | Yes | Uses your Codex subscription; free tier available |
| Qwen/DashScope | API key | qwen3.6-plus (default) | Yes | Yes | DashScope API-key provider (replaced OAuth rotation). Local model tool-call extraction from text (bare JSON, Claude-style XML, Qwen formats). Prompt caching via cache_control, rate limit retry with exponential backoff |
| Ollama | Optional | Any Ollama model | Yes | Yes | Native local provider; run any model via Ollama API |
| OpenCode | None | Any OpenAI-compatible model | Yes | Yes | Non-CLI OpenAI-compatible provider |
| OpenCode CLI | None | Free models (Mimo, etc.) | Yes | Yes | Free; no API key or subscription needed |
| Custom | Optional | Any | Yes | Yes | LM Studio, Groq, NVIDIA, any OpenAI-compatible API |

How It Works

  • One provider active at a time per session — switch with /models
  • Per-session isolation — each session remembers its own provider and model. Changing provider in the TUI does not affect other active sessions (Telegram, Discord, Slack)
  • Fallback chain — configure automatic failover when the primary provider goes down
  • Models fetched live — no binary update needed when providers add new models
  • Function calling detection — OpenCrabs detects when a model doesn’t support tool use and warns you with a model switch suggestion, rather than silently failing
  • tool_choice: "auto" — sent automatically for OpenAI-compatible providers when tools are active, enabling function calling on models that require explicit opt-in
  • vision_model works on ANY provider — add vision_model = "..." to any built-in or custom provider block and the agent routes incoming images through that model on the same endpoint. No second API key, no Gemini dependency. See Image Generation & Vision for the full two-path setup

Custom Provider Onboarding (v0.3.24)

Adding a custom OpenAI-compatible provider is now smoother:

  • Paste-by-default: Ctrl+V / Cmd+V on the API key field pastes immediately — no need to tab into the field first
  • Enter-to-load: type a model name not in the fetched list and press Enter — it’s added to the list and selected
  • Field refresh: saved values (base URL, API key, model list) appear instantly without restarting the dialog

Provider Registry (v0.3.34)

All provider resolution now routes through a single registry source of truth — no more hardcoded if-else ladders scattered across the codebase. The registry correctly enforces api_key requirements for API providers (Anthropic, OpenAI, GitHub Copilot, Gemini, OpenRouter, MiniMax), so resolution skips them cleanly when keys are missing instead of silently falling back. Adding a new provider is now a one-file change.

Qwen Cache Auto-Enable (v0.3.30)

Custom providers targeting Qwen-shaped endpoints (base URLs containing dashscope, aliyun, aliyuncs, dialagram, or models prefixed with qwen-*) automatically get ephemeral cache_control markers on the system prompt, last streaming message, and last tool call. Zero-config cost savings for Qwen custom providers — no API key or flag needed.
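
The detection rule described above can be read as a simple predicate over the base URL and model name. The sketch below is a literal reading of that sentence; the function name is hypothetical, and case-insensitive matching is an assumption.

```python
# Substrings that mark a Qwen-shaped endpoint, as listed in the text above.
QWEN_URL_MARKERS = ("dashscope", "aliyun", "aliyuncs", "dialagram")


def looks_like_qwen(base_url: str, model: str) -> bool:
    """Return True if the endpoint or model name matches the Qwen heuristics."""
    url = base_url.lower()
    if any(marker in url for marker in QWEN_URL_MARKERS):
        return True
    return model.lower().startswith("qwen-")
```

Under this literal reading, a model named qwen3.6-plus on a non-matching URL would not qualify, because it lacks the qwen- prefix. Whether the real matcher is that strict is not stated here.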

/models Picker (v0.3.30)

The /models command now surfaces every known provider including unconfigured ones, marked with a 🔒 lock icon and setup help text. This helps users discover available providers without needing to know which ones need API keys. Custom providers with no configured models show a helpful empty-state message instead of an inert button.

OpenRouter Reasoning

For models that support extended reasoning (e.g. Qwen 3.6 Plus), OpenCrabs sends include_reasoning: true automatically when using OpenRouter. Thinking/reasoning output is displayed in collapsible sections:

▶ Thinking... (click to expand)
  The user wants to refactor...

Reasoning text wraps to screen width instead of truncating.

See Provider Setup for configuration details and API key setup.

AI Provider Setup

OpenCrabs supports 15 providers (14 built-in + Custom OpenAI-Compatible). Configure them through the onboarding wizard or manually via config.toml and keys.toml at ~/.opencrabs/.

Setup via Onboarding Wizard

The fastest way to configure a provider is the interactive wizard. Run /onboard:provider (or /onboard and navigate to step 3).

Keyboard Navigation

| Key | Action |
| --- | --- |
| ↑ / ↓ or j / k | Move between providers / models |
| Enter or Tab | Advance to next field |
| BackTab or Shift+Tab | Go back to previous field |
| Esc | Go back to previous wizard step |
| Type any character | Filter model list (when on model picker) |

For providers with a /v1/models API endpoint, the wizard fetches the model list live after you enter your API key.

Supported: Anthropic, OpenAI, OpenRouter, Gemini, MiniMax, Qwen (DashScope), Ollama, z.ai GLM

Flow:

  1. Use ↑ / ↓ to select your provider (e.g. OpenRouter)
  2. Press Enter to advance to the API key field
  3. Paste your API key (e.g. sk-or-...)
  4. Press Enter to trigger a live fetch from the provider’s /v1/models endpoint
  5. Use ↑ / ↓ to browse models, or type to filter (case-insensitive substring match)
  6. Press Enter on your chosen model — saves config and advances

Tip: If you’ve already configured a key, the wizard detects it (shown as ••••••••) and skips straight to the model picker. Press Enter to re-fetch models with the existing key.
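
The model filter in step 5 is described as a case-insensitive substring match. A minimal sketch of that behavior, with a hypothetical function name, looks like this:

```python
def filter_models(models: list[str], query: str) -> list[str]:
    """Keep models whose name contains `query`, ignoring case."""
    q = query.casefold()
    return [m for m in models if q in m.casefold()]
```

Typing GEM or gem therefore both match google/gemini-2.5-flash, and an empty query leaves the list unchanged.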

OAuth Providers (GitHub Copilot, Codex)

No API key needed — authenticate through your browser.

Flow:

  1. Select GitHub Copilot or Codex
  2. Press Enter — starts the device-code PKCE flow
  3. A user code and URL appear (e.g. github.com/login/device)
  4. Open the URL in your browser, enter the code, and authorize
  5. Tokens are saved automatically to ~/.opencrabs/auth/
  6. Models are fetched live — pick one and press Enter

CLI Providers (Claude CLI, OpenCode CLI, Codex CLI)

Use your existing CLI subscription — no separate API key.

Flow:

  1. Select the CLI provider (e.g. Claude CLI)
  2. Press Enter — skips the API key field (none needed)
  3. Models are fetched from the local binary (claude models, opencode models, etc.)
  4. Pick a model and press Enter

Requirements: The CLI binary must be installed, available in your PATH, and authenticated.

z.ai GLM (Zhipu AI)

z.ai has two endpoint types. The wizard asks which one before the API key.

Flow:

  1. Select z.ai GLM
  2. Press Enter — advances to Endpoint Type selector
  3. Use ↑ / ↓ to choose API (general) or Coding (CodeGeeX)
  4. Press Enter — advances to API key field
  5. Paste your key, press Enter — fetches models
  6. Pick a model and press Enter

Custom OpenAI-Compatible

For Ollama, LM Studio, LocalAI, Groq, NVIDIA, vLLM, or any OpenAI-compatible endpoint.

Flow:

  1. Select Custom OpenAI-Compatible (last in the list)
  2. Press Enter — advances to Name field
  3. Name — type a provider identifier (e.g. ollama, lm-studio, nvidia). Press Enter — normalized to a TOML-safe key
  4. Base URL — paste your endpoint (e.g. http://localhost:1234/v1). Press Enter
  5. API Key — paste if required, or leave empty for local endpoints. Press Enter
  6. Model — you have two options:
    • Type or paste a model name — use this for newly-launched models not yet available on the live API (e.g. qwen3.6-35b-a3b-gguf)
    • Press Enter on empty field — triggers a live fetch from {base_url}/models, then pick from the list
  7. Context Window — enter the token limit (e.g. 128000). Press Enter — saves and advances

Context Window Recommendation: 200000 (200K tokens) gives the best results. OpenCrabs auto-compacts large contexts as they fill, so no manual intervention is needed.

Local LLMs: No API key needed — just set base URL and model name. If the model is already running, paste the name directly. If you want to browse available models, leave the Model field empty and press Enter to fetch the list from your local server.

Re-running Provider Setup

| Command | What it does |
| --- | --- |
| /onboard:provider | Jump to provider setup, return to chat when done |
| /models | Switch provider/model for the current session |
| /onboard | Full wizard (all steps) |

Manual Configuration (advanced)

If you prefer editing files directly, configure providers in config.toml and keys.toml.


Anthropic Claude

Models: claude-opus-4-6, claude-sonnet-4-5, claude-haiku-4-5, and legacy models — fetched live from the API.

# keys.toml
[providers.anthropic]
api_key = "sk-ant-..."

# config.toml
[providers.anthropic]
enabled = true
default_model = "claude-sonnet-4-20250514"

Features: Streaming, tool use, extended thinking, vision, 200K context window.

OpenAI

Models: GPT-5 Turbo, GPT-5, and others — fetched live.

# keys.toml
[providers.openai]
api_key = "sk-YOUR_KEY"

OpenRouter — 400+ Models

Access 400+ models from every major provider through a single API key. Includes free models (DeepSeek-R1, Llama 3.3, Gemma 2, Mistral 7B).

# keys.toml
[providers.openrouter]
api_key = "sk-or-YOUR_KEY"

Get a key at openrouter.ai/keys. Model list is fetched live — no binary update needed when new models are added.

Google Gemini

Models: gemini-2.5-flash, gemini-2.0-flash, gemini-1.5-pro — fetched live.

# keys.toml
[providers.gemini]
api_key = "AIza..."

# config.toml
[providers.gemini]
enabled = true
default_model = "gemini-2.5-flash"

Features: Streaming, tool use, vision, 1M+ token context window.

Gemini also powers the separate image generation and vision tools. See Image Generation & Vision.

GitHub Copilot

Use your existing GitHub Copilot subscription — no separate API charges. Authenticates via OAuth device flow.

# config.toml
[providers.github_copilot]
enabled = true

Setup: Run /onboard:provider → select GitHub Copilot → follow the device code flow at github.com/login/device. Models are fetched live from the Copilot API.

Requirements: An active GitHub Copilot subscription (Individual, Business, or Enterprise).

z.ai (Zhipu AI)

Models: GLM-4-Plus, GLM-4-Flash, GLM-4-0520, CodeGeeX — fetched live. Two endpoint types: General API and Coding API.

# keys.toml
[providers.zai]
api_key = "your-api-key"

# config.toml
[providers.zai]
enabled = true
default_model = "glm-4-plus"

Get your API key at open.bigmodel.cn.

Claude CLI

Use your existing Claude Code subscription through the local claude binary — no separate API key needed. Supports streaming and extended thinking.

# config.toml
[providers.claude_cli]
enabled = true

Requirements: The claude CLI must be installed and authenticated. Models are detected automatically.

Ollama

Run any Ollama model natively — no custom provider setup needed. Supports both local (localhost:11434) and cloud (api.ollama.com) instances.

# config.toml
[providers.ollama]
enabled = true
default_model = "llama3"

# keys.toml (optional, only for cloud Ollama)
[providers.ollama]
api_key = "your-api-key"

Features: Streaming, tool use, local model tool-call extraction from text. Models are fetched live from the Ollama API.

Requirements: Ollama must be running locally (ollama serve) or you must have a cloud Ollama API key.

OpenCode CLI

Use the local opencode binary for free LLM completions — no API key or subscription needed. Supports NDJSON streaming and extended thinking.

# config.toml
[providers.opencode_cli]
enabled = true

Requirements: The opencode binary must be installed and available in your PATH. Models are fetched live via opencode models.

Codex CLI

Use OpenAI’s @openai/codex CLI as a native provider. You authenticate once via the codex CLI, and OpenCrabs reuses the cached credentials, so it never handles an API key. Non-interactive mode runs via codex exec --json with JSONL streaming.

# config.toml
[providers.codex_cli]
enabled = true

Models: GPT-5.5, GPT-5.4, GPT-5.3-Codex

Requirements: The codex CLI must be installed (npm install -g @openai/codex) and authenticated. Models are detected automatically.

Codex OAuth

Native OpenAI Codex subscription auth via device-code PKCE flow. No CLI dependency, no API key. You authenticate through the browser once; tokens are stored in ~/.opencrabs/auth/codex.json with automatic refresh and background rotation.

# config.toml
[providers.codex]
enabled = true

Models: GPT-5.5, GPT-5.4, GPT-5.3-Codex (curated GPT-5 model list)

Setup: Run /onboard:provider → select Codex OAuth → follow the device code flow at auth.openai.com/codex/device. Two-step PKCE exchange: device auth poll → authorization code → token exchange.

Requirements: An active OpenAI Codex subscription. No CLI installation needed.

MiniMax

Models: MiniMax-M2.7, MiniMax-M2.5, MiniMax-M2.1, MiniMax-Text-01

# keys.toml
[providers.minimax]
api_key = "your-api-key"

Get your API key from platform.minimax.io. Model list comes from config.toml (no /models endpoint).

Custom (OpenAI-Compatible)

Use for Ollama, LM Studio, LocalAI, Groq, or any OpenAI-compatible API.

# config.toml
[providers.custom.lm_studio]
enabled = true
base_url = "http://localhost:1234/v1"
default_model = "qwen2.5-coder-7b-instruct"
models = ["qwen2.5-coder-7b-instruct", "llama-3-8B"]

Local LLMs: No API key needed — just set base_url and default_model.

Remote APIs (Groq, etc.): Add the key in keys.toml:

[providers.custom.groq]
api_key = "your-api-key"

Multiple Custom Providers

Define as many as you need with different names:

[providers.custom.lm_studio]
enabled = true
base_url = "http://localhost:1234/v1"
default_model = "qwen2.5-coder-7b-instruct"

[providers.custom.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
default_model = "mistral"

Free Prototyping with NVIDIA API

Kimi K2.5 is available for free on the NVIDIA API Catalog — no billing required.

# config.toml
[providers.custom.nvidia]
enabled = true
base_url = "https://integrate.api.nvidia.com/v1"
default_model = "moonshotai/kimi-k2.5"

# keys.toml
[providers.custom.nvidia]
api_key = "nvapi-..."

Fallback Provider Chain

Configure automatic failover when the primary provider fails (rate limits, outages, errors). Fallbacks are tried in order until one succeeds.

# config.toml
[providers.fallback]
enabled = true
providers = ["openrouter", "anthropic"]  # Tried in order on failure

Each fallback provider must have its API key configured in keys.toml. Both complete() and stream() calls are retried transparently — no changes needed downstream.
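
Conceptually, the chain tries each provider in order and returns the first successful reply. The sketch below illustrates that control flow only, under the assumption that every provider is a callable taking a prompt. The names complete_with_fallback and AllProvidersFailed are hypothetical, and the sticky, health-aware persistence that OpenCrabs adds on top is not modeled.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order.

    Returns (name, reply) from the first provider that succeeds.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # rate limit, outage, or any other provider error
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

The caller sees a single result regardless of which provider answered, which is the sense in which the retry is transparent to downstream code.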

Single fallback shorthand:

[providers.fallback]
enabled = true
provider = "openrouter"

Or just ask your Crab: “Set up fallback providers using openrouter and anthropic” — it will configure config.toml for you at runtime.

Vision Model

When your default chat model doesn’t support vision, set vision_model to a vision-capable model on the same provider. This registers a vision tool that the agent can call — it sends the image to the vision model, gets a description back, and the chat model uses that context to reply.

# config.toml
[providers.minimax]
enabled = true
default_model = "MiniMax-M2.5"
vision_model = "MiniMax-Text-01"  # Agent calls vision tool → this model describes image → M2.5 replies

[providers.openai]
enabled = true
default_model = "gpt-5-nano"
vision_model = "gpt-5-nano"

MiniMax auto-configures vision_model = "MiniMax-Text-01" on first run. You can also ask your Crab to set it up: “Configure vision model for MiniMax” — it will update config.toml at runtime.

This is separate from the Gemini image tools which provide dedicated generate_image and analyze_image tools.

Per-Session Providers

Each session remembers its provider and model. Switch to Claude in one session, Gemini in another — switching sessions restores the provider automatically.

Image Generation & Vision

OpenCrabs supports image generation (text-to-image and img2img) and vision analysis (image-to-text). Vision works through two paths — pick whichever fits your provider setup.

Vision: Two Paths

Path A: vision_model on Your Active Provider (Preferred)

Set vision_model = "<model>" inside the provider block you’re already using. Works for every built-in and custom provider. No second API key needed — the agent calls analyze_image against the vision model on the same provider endpoint.

# keys.toml
[providers.openrouter]
api_key = "sk-or-..."

# config.toml
[providers.openrouter]
model = "anthropic/claude-sonnet-4"
vision_model = "google/gemini-2.5-flash"   # ← any vision-capable model on the same endpoint

When a user sends an image and the chat model can’t handle it natively, the agent routes the image through vision_model, gets a text description back, and replies with that context.

Example: User sends an image while you’re on MiniMax M2.5 (no native vision). The agent calls the vision tool, which sends the image to MiniMax-Text-01 (or any model you set), gets the description, and M2.5 replies using that context.

Why this is preferred:

  • Single API key, single billing account
  • Works on any OpenAI-compatible endpoint (OpenRouter, Ollama, LM Studio, vLLM, Groq, custom)
  • No extra onboarding step — just add one line to your existing provider block

Path B: Gemini Global Fallback

Use this only when your active provider has no vision-capable model. Gemini acts as a dedicated vision+image backend, independent of your chat provider.

# keys.toml
[image]
api_key = "AIza..."    # ← MUST go here. See gotcha below.

# config.toml
[image.generation]
enabled = true
model = "gemini-3.1-flash-image-preview"

[image.vision]
enabled = true
model = "gemini-3.1-flash-image-preview"

Get a free API key from aistudio.google.com. Configure interactively with /onboard:image.

⚠️ Gotcha: #[serde(skip)] on [image.vision] api_key

The api_key field under [image.vision] in config.toml is silently ignored — it’s marked #[serde(skip)] in the source. Always put the Gemini key in keys.toml under [image], never in config.toml. If vision reports as unavailable despite a key being set, this is almost always the cause.

Diagnostic: Why Is Vision Unavailable?

is_vision_available logs the exact reason at INFO level with target=vision. Search your daily log:

grep 'target=vision' ~/.opencrabs/logs/opencrabs.$(date -u +%Y-%m-%d)

Common causes surfaced:

  • Missing vision_model on active provider
  • Missing api_key for that provider
  • Missing Gemini [image] api_key in keys.toml
  • Key placed in config.toml where #[serde(skip)] drops it
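
The causes listed above can be checked mechanically. The following Python sketch is an assumption-laden illustration, not the actual `is_vision_available` logic: `vision_status` is a hypothetical helper, and `config` and `keys` are plain dicts shaped like the parsed contents of config.toml and keys.toml. It also simplifies by requiring an API key for Path A, which does not hold for local providers.

```python
def vision_status(active, config, keys):
    """Return which vision path applies, or the reasons none does."""
    provider_cfg = config.get("providers", {}).get(active, {})
    provider_key = keys.get("providers", {}).get(active, {}).get("api_key")

    # Path A: vision_model on the active provider.
    if not provider_cfg.get("vision_model"):
        path_a = f"missing vision_model on provider {active!r}"
    elif not provider_key:
        path_a = f"missing api_key for provider {active!r}"
    else:
        return "path A: vision_model on active provider"

    # Path B: Gemini fallback. The key must live in keys.toml under [image].
    vision_cfg = config.get("image", {}).get("vision", {})
    gemini_key = keys.get("image", {}).get("api_key")
    if gemini_key and vision_cfg.get("enabled"):
        return "path B: Gemini fallback"
    if gemini_key:
        path_b = "[image.vision] is not enabled in config.toml"
    elif vision_cfg.get("api_key"):
        path_b = "Gemini key is in config.toml, where it is ignored"
    else:
        path_b = "missing [image] api_key in keys.toml"
    return f"unavailable: {path_a}; {path_b}"


print(vision_status(
    "minimax",
    {"image": {"vision": {"enabled": True, "api_key": "AIza..."}}},
    {},
))
```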

Agent Tools

When vision or image generation is enabled, these tools become available:

| Tool | Description |
|------|-------------|
| generate_image | Generate an image from a text prompt; saves to ~/.opencrabs/images/ |
| analyze_image | Analyze an image file or URL via the active vision path (Path A provider or Gemini fallback) |

Example prompts:

  • “Generate a pixel art crab logo” — agent calls generate_image, returns file path
  • “What’s in this image: /tmp/screenshot.png” — agent calls analyze_image

img2img: Edit Images with Context

generate_image accepts an optional image parameter (local file path or HTTPS URL). When provided, the model modifies, restyles, or composites onto that image instead of generating from scratch.

User: "Make this logo darker and add a border"
Agent: generate_image(prompt="dark background with thin white border", image="/tmp/logo.png")

  • Gemini backend: full img2img support via inlineData
  • OpenAI-shaped backends: reject with a clear error pointing at Gemini (img2img not supported)

Use cases: replace elements, restyle photos, composite logos onto backgrounds, modify user-uploaded images in-place.

Incoming Images from Channels

When a user sends an image from any channel, it arrives as <<IMG:/tmp/path>> in the message. The file is already downloaded — the agent can:

  • See it directly (if the chat model supports vision natively)
  • Pass the path to analyze_image for vision processing
  • Use the path in bash commands or any tool that accepts file paths
  • Reference it in replies with <<IMG:path>> to forward to channels
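
As a rough illustration of the `<<IMG:...>>` marker format described above, the Python sketch below separates the text of an incoming message from its image paths. `split_images` is a hypothetical helper written for this example; it is not part of OpenCrabs.

```python
import re

# Matches <<IMG:/some/path>> and captures the path (non-greedy, up to the first ">>").
IMG_MARKER = re.compile(r"<<IMG:(.+?)>>")


def split_images(message):
    """Return (text_without_markers, [image_paths]) for a channel message."""
    paths = IMG_MARKER.findall(message)
    # Replace each marker with a space, then collapse runs of whitespace.
    text = " ".join(IMG_MARKER.sub(" ", message).split())
    return text, paths


text, paths = split_images("What is in this picture? <<IMG:/tmp/photo_123.jpg>>")
print(text)
print(paths)
```

The extracted paths could then be passed to analyze_image or to any tool that accepts file paths.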

Model Choices

  • Path A: any vision-capable model on your active provider. On OpenRouter: google/gemini-2.5-flash, anthropic/claude-sonnet-4, openai/gpt-4o. On Ollama: llava, bakllava. On custom endpoints: whatever the server offers.
  • Path B: gemini-3.1-flash-image-preview handles both vision input and image output in a single request.

Channel Integrations

OpenCrabs connects to multiple messaging platforms simultaneously. All channels share the TUI session by default, with per-user sessions for non-owners.

Setting Up Channels

Channels are configured through the onboarding wizard, not by editing TOML files manually.

Running the Wizard

  • First launch — the wizard runs automatically
  • Re-run — type /onboard in chat, or /onboard:channels to jump straight to the channels step
  • Quick jump: /onboard:channels opens the channel picker and returns to chat when done

The channel picker is a keyboard-driven TUI screen:

| Key | Action |
|-----|--------|
| ↑ / ↓ or j / k | Move focus between channels |
| Space | Toggle the focused channel on/off |
| Enter on an enabled channel | Open that channel’s setup screen |
| Enter on Continue | Skip remaining setup and advance |
| Tab | Same as Continue: advance to the next wizard step |
| Esc | Go back to the previous step |

Channel Setup Screens

When you press Enter on an enabled channel, a dedicated setup screen opens with the fields needed for that platform (bot token, channel ID, allowed users, etc.). Each field:

  • Auto-detects existing values from config.toml / keys.toml (shown as masked •••••••• for secrets, plain text for IDs)
  • Tab moves to the next field
  • Enter on the last field (or the Test Connection button) saves and returns to the channel list
  • BackTab moves to the previous field

The Five Channels

| # | Channel | Setup Fields | Test |
|---|---------|--------------|------|
| 0 | Telegram | Bot Token, Owner User ID, Respond To | Send test message |
| 1 | Discord | Bot Token, Channel ID, Allowed Users, Respond To | Send test message |
| 2 | WhatsApp | QR Code scan, Phone Allowlist | Connection status |
| 3 | Slack | Bot Token, App Token, Channel ID, Allowed Users, Respond To | Send test message |
| 4 | Trello | API Key, API Token, Board ID, Allowed Users | Board access check |

After enabling and configuring your channels, the wizard saves everything to config.toml and keys.toml automatically. You can always re-run /onboard:channels to modify settings.

Supported Channels

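| Channel | Protocol | Images In | Voice In | Image Gen Out | Setup |
|---------|----------|-----------|----------|---------------|-------|
| Telegram | Long polling | Vision pipeline | STT | Native photo | Bot token |
| Discord | WebSocket | Vision pipeline | STT | File attachment | Bot token |
| Slack | Socket Mode | Vision pipeline | STT | File upload | Bot + App token |
| WhatsApp | QR pairing | Vision pipeline | STT | Native image | QR code |
| Trello | REST API | Card attachments | - | Card attachment | API key + token |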

Cross-Channel Session Resolution (v0.3.29)

All messaging channels now share a stable [chat:<id>] suffix pattern for reliable session lookup. Previously only Telegram had this; Discord, Slack, and WhatsApp used exact-title matching which broke when the agent auto-renamed sessions (creating duplicates on every message).

The shared channels::session_resolve module provides:

  • Suffix-first lookup — fast path using [chat:discord-dm-<user_id>], [chat:slack-<channel_id>], [chat:wa-<phone>] etc.
  • Legacy forward-migration — pre-suffix rows are migrated to the suffix format on first lookup
  • /sessions binding — explicit chat→session binding on switch so user choices win over suffix lookup
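
The lookup order above can be sketched in a few lines of Python. This is an illustration of the idea only, not the `channels::session_resolve` module itself: `resolve_session`, the `bindings` mapping, and the dict-based session rows are assumptions made for the example.

```python
def resolve_session(sessions, chat_id, legacy_title, bindings=None):
    """Find the session for a chat.

    sessions: list of dicts with "id" and "title" keys (hypothetical row shape).
    bindings: optional chat_id -> session id map set by an explicit /sessions switch.
    """
    bindings = bindings or {}

    # 1. An explicit binding wins over any title-based lookup.
    bound_id = bindings.get(chat_id)
    for session in sessions:
        if bound_id is not None and session["id"] == bound_id:
            return session

    # 2. Suffix-first lookup survives automatic session renames.
    suffix = f"[chat:{chat_id}]"
    for session in sessions:
        if session["title"].endswith(suffix):
            return session

    # 3. Legacy row: exact-title match, migrated to the suffix format on first lookup.
    for session in sessions:
        if session["title"] == legacy_title:
            session["title"] = f"{legacy_title} {suffix}"
            return session
    return None


sessions = [
    {"id": 1, "title": "Renamed by the agent [chat:discord-dm-42]"},
    {"id": 2, "title": "Slack general"},
]
print(resolve_session(sessions, "discord-dm-42", "Discord DM 42"))
print(resolve_session(sessions, "slack-C123", "Slack general"))
```

Because the suffix is matched at the end of the title, the agent can rename the rest of the title freely without creating duplicate sessions.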

Follow-Up Cancel (v0.3.30)

Sending a message while the agent is mid-run now acts as ESC x2 (cancel current run) across all channels. The cancelled partial content is preserved, and the new message starts a fresh agent turn.

ZIP Attachments (v0.3.30)

ZIP file attachments from users are extracted and processed inline:

  • Text files are inlined into the conversation
  • Images get vision markers for multimodal processing
  • PDFs get text extraction
  • Capped at 50 files / 10 MB per ZIP entry
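
A minimal Python sketch of this kind of ZIP triage follows. It is not the OpenCrabs code path: `classify_zip` and the extension list are hypothetical, and the cap is interpreted here as "at most 50 entries, each at most 10 MB uncompressed", which is an assumption about the wording above.

```python
import io
import zipfile

MAX_FILES = 50                       # assumed: maximum entries processed per ZIP
MAX_ENTRY_BYTES = 10 * 1024 * 1024   # assumed: maximum uncompressed size per entry
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp")  # illustrative list


def classify_zip(data):
    """Return (filename, kind) pairs for the entries of an in-memory ZIP."""
    results = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        entries = [info for info in zf.infolist() if not info.is_dir()]
        for info in entries[:MAX_FILES]:
            name = info.filename.lower()
            if info.file_size > MAX_ENTRY_BYTES:
                results.append((info.filename, "skipped: too large"))
            elif name.endswith(IMAGE_EXTS):
                results.append((info.filename, "image"))   # would get a vision marker
            elif name.endswith(".pdf"):
                results.append((info.filename, "pdf"))     # would get text extraction
            else:
                try:
                    zf.read(info).decode("utf-8")
                    results.append((info.filename, "text"))  # would be inlined
                except UnicodeDecodeError:
                    results.append((info.filename, "skipped: not UTF-8 text"))
    return results


# Build a small archive in memory to exercise the function.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("notes.txt", "hello")
    zf.writestr("shot.PNG", b"\x89PNG")
    zf.writestr("doc.pdf", b"%PDF-1.4")
report = classify_zip(buf.getvalue())
print(report)
```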

Common Features

All messaging channels support:

  • Shared session with TUI (owner) or per-user sessions (non-owners)
  • Slash commands: /help, /models, /new, /sessions, custom commands
  • Inline buttons — Provider picker, model picker, session switcher (Telegram, Discord, Slack)
  • User allowlists — Restrict access by user ID, chat ID, or phone number
  • respond_to filter: all, dm_only, or mention (respond only when @mentioned)
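
The allowlist and respond_to settings combine into a simple gate. The Python sketch below shows one plausible reading; `should_respond` is a hypothetical function, and two points are assumptions rather than documented behavior: an empty allowlist admits everyone on every channel, and direct messages always count as addressed to the bot in mention mode.

```python
def should_respond(respond_to, is_dm, mentioned, user_id, allowed_users):
    """Decide whether an incoming channel message should trigger a reply."""
    # Assumption: an empty allowlist means "allow everyone".
    if allowed_users and user_id not in allowed_users:
        return False
    if respond_to == "all":
        return True
    if respond_to == "dm_only":
        return is_dm
    if respond_to == "mention":
        # Assumption: a DM is treated as addressed to the bot.
        return is_dm or mentioned
    raise ValueError(f"unknown respond_to value: {respond_to!r}")


print(should_respond("mention", is_dm=False, mentioned=True, user_id="42", allowed_users=[]))
print(should_respond("dm_only", is_dm=False, mentioned=True, user_id="42", allowed_users=[]))
```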

File & Media Support

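| Channel | Images (in) | Text files (in) | Documents (in) | Audio (in) | Image gen (out) |
|---------|-------------|-----------------|----------------|------------|-----------------|
| Telegram | Vision pipeline | Extracted inline | PDF note | STT | Native photo |
| WhatsApp | Vision pipeline | Extracted inline | PDF note | STT | Native image |
| Discord | Vision pipeline | Extracted inline | PDF note | STT | File attachment |
| Slack | Vision pipeline | Extracted inline | PDF note | STT | File upload |
| Trello | Card attachments → vision | Extracted inline | - | - | Card attachment |
| TUI | Paste path → vision | Paste path → inline | - | STT | [IMG: name] display |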

Images are passed to the active model’s vision pipeline if it supports multimodal input; otherwise they are routed to the analyze_image tool, which uses the active vision path (the provider’s vision_model, or the Gemini fallback). Text files are extracted as UTF-8 and included inline up to 8,000 characters.

Proactive Channel Tools

The agent can send messages and take actions proactively:

| Tool | Actions |
|------|---------|
| discord_send | 17 actions: send, reply, react, edit, delete, pin, create_thread, send_embed, etc. |
| slack_send | 17 actions: send, reply, react, edit, delete, pin, set_topic, send_blocks, send_file (TTS voice via OGG/Opus) |
| trello_send | 22 actions: create_card, move_card, add_comment, add_checklist, search, etc. |

Channel Voice Parity

All four messaging channels (Telegram, Discord, WhatsApp, Slack) now share a single code path via crate::channels::voice::{transcribe, synthesize}. Bot replies are recorded in the channel_messages table for conversation context — previously only user messages were stored.

Telegram

Connect OpenCrabs to Telegram for DMs and group chats.

Setup

Step 1: Create a Bot with BotFather

  1. Message @BotFather on Telegram
  2. Send /newbot and follow the prompts
  3. Copy the bot token (format: 123456:ABC-DEF...)

Step 2: Configure via the Onboarding Wizard

Run /onboard:channels (or /onboard and navigate to the Channels step):

  1. Use ↑ / ↓ to focus Telegram
  2. Press Space to toggle it on
  3. Press Enter to open the Telegram setup screen
  4. Fill in the fields:
    • Bot Token — paste the token from BotFather
    • Owner User ID — your numeric Telegram chat ID
    • Respond To: all, dm_only, or mention (when to respond in groups)
  5. Press Enter on Test Connection to verify the bot works
  6. Press Enter to save and return to the channel list

Get your chat ID by messaging @userinfobot on Telegram.

Manual Configuration (advanced)

If you prefer editing files directly, the wizard writes to ~/.opencrabs/keys.toml and ~/.opencrabs/config.toml:

# keys.toml
[channels.telegram]
bot_token = "123456:ABC..."

# config.toml
[channels.telegram]
enabled = true
allowed_users = ["123456789"]
respond_to = "all"

Features

  • DMs and groups — Works in private chats and group conversations
  • Forum topic routing (v0.3.31) — In supergroups with topics enabled, the bot tracks thread_id through the full pipeline. Use list_topics action to map topic names (e.g. #announcements) to numeric IDs, then pass thread_id to send / reply / send_photo to route into a specific topic
  • Context-aware pre-tool status (v0.3.31) — While a tool runs, the bot shows a live status message naming the tool, elapsed time, and either a reasoning excerpt or an anchored phrase from the user’s request
  • follow_up_question polish (v0.3.34) — Telegram keyboard is now single-column with a 40-character label cap (longer options rejected with a clear error). The rolling “Running follow_up_question (16s)” status is suppressed while the keyboard is pending so buttons don’t get visually buried. The LLM is instructed to call the tool silently without echoing the question text in surrounding prose
  • Inline buttons — Provider picker, model picker, session switcher use Telegram inline keyboards
  • Image support — Send images to the bot, receive generated images
  • Voice messages — STT transcription + TTS response
  • All slash commands: /help, /models, /new, /sessions, custom commands
  • Owner vs non-owner — Owner uses the shared TUI session, non-owners get per-user sessions
  • Onboarding overhaul (v0.3.30) — Auto-detects owner user ID from getUpdates, persists partial config on cancel, only Enter on the last step commits (Tab no longer silently rewrites ~30 config keys)

Agent Tools

The agent can use telegram_send with 20+ actions. The thread_id field on send / reply / send_photo targets a specific forum topic in supergroups with topics enabled.

| Action | Description |
|--------|-------------|
| send | Send text message (with optional thread_id for forum topics) |
| reply | Reply to a message |
| send_photo | Send image file |
| send_document | Send document |
| send_voice | Send voice message |
| list_topics | Returns (thread_id, topic_name) pairs the bot has observed; use it to translate #announcements into a numeric thread_id |
| pin / unpin | Pin or unpin a message |
| set_reaction | Add an emoji reaction |
And more…
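
The topic-routing flow (list_topics, then send with thread_id) can be illustrated with a short Python sketch. The helper names and the "text" argument are hypothetical; only the action names and the thread_id field come from the description above, and the real tool-call schema may differ.

```python
def resolve_topic(topics, name):
    """Map a topic name such as "#announcements" to its numeric thread_id.

    topics: (thread_id, topic_name) pairs, as returned by list_topics.
    """
    wanted = name.lstrip("#").lower()
    for thread_id, topic_name in topics:
        if topic_name.lstrip("#").lower() == wanted:
            return thread_id
    return None


def build_send(text, topics, topic=None):
    """Build arguments for a telegram_send "send" call ("text" is an assumed field name)."""
    args = {"action": "send", "text": text}
    if topic is not None:
        thread_id = resolve_topic(topics, topic)
        if thread_id is None:
            raise ValueError(f"unknown topic: {topic!r}")
        args["thread_id"] = thread_id
    return args


observed = [(12, "announcements"), (34, "Dev Chat")]
print(build_send("Release is out", observed, "#announcements"))
```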

Group Chat Behavior

In groups, the agent:

  • Responds when mentioned by name or replied to
  • Stays quiet when the conversation doesn’t involve it
  • Tracks context from group messages passively

Discord

Connect OpenCrabs to Discord for server and DM interactions.

Setup

Step 1: Create a Discord Bot

  1. Go to discord.com/developers/applications
  2. Create a new application
  3. Go to Bot section, create a bot
  4. Enable MESSAGE CONTENT Intent (required — under Privileged Gateway Intents)
  5. Copy the bot token
  6. Under OAuth2 → URL Generator, select bot scope with Send Messages and Read Message History permissions
  7. Use the generated URL to invite the bot to your server

Step 2: Configure via the Onboarding Wizard

Run /onboard:channels (or /onboard and navigate to the Channels step):

  1. Use ↑ / ↓ to focus Discord
  2. Press Space to toggle it on
  3. Press Enter to open the Discord setup screen
  4. Fill in the fields:
    • Bot Token — paste the token from the Developer Portal
    • Channel ID — the Discord channel to send the welcome message to (right-click a channel with Developer Mode on → Copy Channel ID)
    • Allowed Users — comma-separated Discord user IDs (leave empty to allow everyone)
    • Respond To: all, dm_only, or mention
  5. Press Enter on Test Connection to verify
  6. Press Enter to save and return to the channel list

Enable Developer Mode in Discord: Settings → Advanced → Developer Mode

Manual Configuration (advanced)

# keys.toml
[channels.discord]
token = "your-bot-token"

# config.toml
[channels.discord]
enabled = true
allowed_channels = ["123456789"]
allowed_users = []
respond_to = "all"

Features

  • Server channels and DMs — Works in text channels and direct messages
  • Button interactions — Provider picker, model picker, session switcher use Discord buttons
  • Image support — Send and receive images
  • Embed suppression — Agent wraps multiple links in <> to suppress embeds
  • Slash commands — All built-in and custom commands work
  • Reactions — Agent can add emoji reactions to messages

Formatting Notes

  • No markdown tables in Discord — use bullet lists instead
  • Wrap multiple links in <url> to suppress embeds
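
As a sketch of the embed-suppression rule, the Python function below wraps bare links in angle brackets when a message contains two or more of them. `suppress_embeds` is a hypothetical helper, and the URL pattern is deliberately simple; it is not how the agent necessarily does it.

```python
import re

# Bare http(s) URLs that are not already preceded by "<".
URL = re.compile(r"(?<!<)https?://[^\s<>]+")


def _wrap(match):
    url = match.group(0)
    # Keep trailing sentence punctuation outside the angle brackets.
    trimmed = url.rstrip(".,;:!?)")
    return f"<{trimmed}>{url[len(trimmed):]}"


def suppress_embeds(text):
    """Wrap links in <...> when the message has two or more bare links."""
    if len(URL.findall(text)) < 2:
        return text
    return URL.sub(_wrap, text)


print(suppress_embeds("Docs: https://a.example/x and https://b.example/y."))
```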

Slack

Connect OpenCrabs to Slack workspaces.

Setup

Step 1: Create a Slack App

  1. Go to api.slack.com/apps
  2. Create a new app (From Scratch)
  3. Enable Socket Mode under Settings
  4. Generate an App-Level Token (Settings → Basic Information → App-Level Tokens) with connections:write scope
  5. Under OAuth & Permissions, add bot scopes: chat:write, channels:history, groups:history, im:history, reactions:write
  6. Install the app to your workspace
  7. Copy the Bot Token (xoxb-...) and App-Level Token (xapp-...)

Step 2: Configure via the Onboarding Wizard

Run /onboard:channels (or /onboard and navigate to the Channels step):

  1. Use ↑ / ↓ to focus Slack
  2. Press Space to toggle it on
  3. Press Enter to open the Slack setup screen
  4. Fill in the fields:
    • Bot Token — the xoxb-... token
    • App Token — the xapp-... token
    • Channel ID — right-click a channel → View channel details → copy the Channel ID at the bottom
    • Allowed Users — comma-separated Slack user IDs (Profile → ⋯ → Copy member ID)
    • Respond To: all, dm_only, or mention
  5. Press Enter on Test Connection to verify
  6. Press Enter to save

Manual Configuration (advanced)

# keys.toml
[channels.slack]
bot_token = "xoxb-..."
app_token = "xapp-..."

# config.toml
[channels.slack]
enabled = true
allowed_channels = ["C12345678"]
allowed_users = []
respond_to = "all"

Features

  • Channels and DMs — Works in public/private channels and direct messages
  • Action buttons — Provider picker, model picker, session switcher use Slack action buttons
  • Thread support — Responds in threads when appropriate
  • Slash commands — All built-in and custom commands work
  • Reactions — Agent can add emoji reactions
  • TTS voice replies — Voice responses sent as OGG/Opus files via files.upload with inline waveform UI

Socket Mode

Slack uses Socket Mode (WebSocket) instead of HTTP webhooks — no public URL or ngrok needed. The connection is outbound from your machine.

WhatsApp

Connect OpenCrabs to WhatsApp via QR code pairing.

Setup

Configure via the Onboarding Wizard

Run /onboard:channels (or /onboard and navigate to the Channels step):

  1. Use ↑ / ↓ to focus WhatsApp
  2. Press Space to toggle it on
  3. Press Enter to open the WhatsApp setup screen
  4. The setup screen has two fields:
    • Connection — shows the QR code or connection status
    • Phone Allowlist — comma-separated phone numbers in E.164 format (e.g. +15551234567). Leave empty to accept all messages.
  5. When not yet paired, a QR code appears in the terminal:
    • Open WhatsApp on your phone → Settings → Linked Devices → Link a Device
    • Scan the QR code
    • The status updates to “Connected” automatically
  6. When already paired, press R on the Connection field to reset and re-pair
  7. Press Enter to save

The pairing session persists across restarts — no need to re-scan.
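
For the Phone Allowlist field, a small Python sketch of E.164 validation may be useful when preparing the value. The helper names are hypothetical and the check is only a format check (a leading +, then up to 15 digits, not starting with 0); it does not verify that a number exists.

```python
import re

# E.164: "+" followed by 2 to 15 digits, the first of which is not 0.
E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def parse_phone_allowlist(raw):
    """Split a comma-separated allowlist and reject entries that are not E.164."""
    phones = [part.strip() for part in raw.split(",") if part.strip()]
    invalid = [p for p in phones if not E164.fullmatch(p)]
    if invalid:
        raise ValueError(f"not E.164: {invalid}")
    return phones


def is_allowed(sender, allowlist):
    """An empty allowlist accepts all messages, as described in the setup steps."""
    return not allowlist or sender in allowlist


allowlist = parse_phone_allowlist("+15551234567, +447700900123")
print(allowlist, is_allowed("+15551234567", allowlist))
```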

Manual Configuration (advanced)

# config.toml
[channels.whatsapp]
enabled = true
allowed_phones = ["+15551234567"]

Features

  • Personal and group chats — Works in DMs and group conversations
  • Image support — Send and receive images
  • Voice messages — STT transcription + TTS response
  • Plain text UI — No buttons (WhatsApp limitation), uses text-based menus
  • Slash commands — All built-in and custom commands work

Formatting Notes

  • No markdown tables — use bullet lists
  • No headers — use bold or CAPS for emphasis
  • Links render natively

Voice Message Handling

When receiving a voice message:

  1. Agent downloads and transcribes via STT
  2. Sends text response first (searchable)
  3. Optionally generates TTS audio response

Trello

OpenCrabs integrates with Trello for board and card management via the trello_send tool.

Setup

Step 1: Get Trello Credentials

  1. Go to trello.com/power-ups/admin
  2. Create a new Power-Up to get your API Key
  3. Click the “Token” link next to your API key to generate an API Token

Step 2: Configure via the Onboarding Wizard

Run /onboard:channels (or /onboard and navigate to the Channels step):

  1. Use ↑ / ↓ to focus Trello
  2. Press Space to toggle it on
  3. Press Enter to open the Trello setup screen
  4. Fill in the fields:
    • API Key — from the Trello Power-Up admin page
    • API Token — generated alongside the API key
    • Board ID — board name or 24-character hex ID (names are resolved automatically)
    • Allowed Users — Trello member IDs allowed to interact with the bot (leave empty for all members)
  5. Press Enter on Test Connection to verify board access
  6. Press Enter to save
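
The Board ID field accepts either a name or a 24-character hex ID. The Python sketch below shows one way such a value could be resolved; `resolve_board` and the name-to-ID mapping are assumptions for illustration, not the actual resolution code.

```python
import re

BOARD_ID = re.compile(r"[0-9a-fA-F]{24}")


def resolve_board(value, boards):
    """Return a board ID for a name or an ID.

    boards: hypothetical mapping of board name -> 24-character hex ID,
    for example built from the output of the list_boards action.
    """
    value = value.strip()
    if BOARD_ID.fullmatch(value):
        return value  # already an ID, nothing to look up
    for name, board_id in boards.items():
        if name.lower() == value.lower():
            return board_id
    raise KeyError(f"no board named {value!r}")


boards = {"Roadmap": "5f1a2b3c4d5e6f7a8b9c0d1e"}
print(resolve_board("roadmap", boards))
```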

Manual Configuration (advanced)

# keys.toml
[channels.trello]
api_key = "your-api-key"
token = "your-token"

# config.toml
[channels.trello]
enabled = true
boards = ["Board Name or ID"]
allowed_users = []
# poll_interval_secs = 30  # Poll for new card comments

Tool Actions

The trello_send tool supports 22 actions:

| Action | Description |
|--------|-------------|
| create_card | Create a new card |
| get_card | Get card details |
| update_card | Update card fields |
| move_card | Move card to another list |
| archive_card | Archive a card |
| find_cards | Search for cards |
| add_comment | Add a comment to a card |
| get_card_comments | Read card comments |
| add_checklist | Add a checklist to a card |
| add_checklist_item | Add an item to a checklist |
| complete_checklist_item | Mark checklist item done |
| add_label_to_card | Add a label |
| remove_label_from_card | Remove a label |
| add_member_to_card | Assign a member |
| remove_member_from_card | Unassign a member |
| add_attachment | Attach a file or URL |
| list_boards | List accessible boards |
| list_lists | List columns in a board |
| get_board_members | Get board members |
| search | Search across boards |
| get_notifications | Get notifications |
| mark_notifications_read | Mark notifications read |

Behavior

  • Tool-only by default — The agent acts on Trello only when explicitly asked
  • Optional polling — Set poll_interval_secs to enable monitoring for @bot_username mentions
  • Image attachments — Generated images are sent as card attachments with embedded previews
  • File attachments — Card attachments (images, documents) are fetched and processed through the vision pipeline

Built-in Tools

OpenCrabs ships with 50+ tools available to the agent out of the box, plus support for user-defined dynamic tools.

File Operations

| Tool | Parameters | Description |
|------|------------|-------------|
| ls | path | List directory contents |
| glob | pattern, path | Find files by glob pattern |
| grep | pattern, path, include | Search file contents with regex |
| read_file | path, line_start, line_end | Read file contents |
| edit_file | path, old_string, new_string | Edit files with search/replace |
| write_file | path, content | Write new files |

Code Execution

| Tool | Parameters | Description |
|------|------------|-------------|
| bash | command, timeout | Execute shell commands |
| execute_code | language, code | Run code in sandboxed environment |

Web & Network

| Tool | Parameters | Description |
|------|------------|-------------|
| web_search | query | Search the web (Brave Search) |
| http_request | method, url, headers, body | Make HTTP requests |

Session & Memory

| Tool | Parameters | Description |
|------|------------|-------------|
| session_search | query, limit | Semantic search across sessions |
| session_context | action | Read/write session context |
| task_manager | action, various | Manage plans and tasks |

Image & Video

| Tool | Parameters | Description |
|------|------------|-------------|
| generate_image | prompt, filename | Generate images via Gemini |
| analyze_image | image, question | Analyze images via Gemini vision |
| analyze_video | video, question | Analyze videos via Gemini multimodal vision. Supports mp4/m4v/mov/webm/mkv/avi/3gp/flv. Inline bytes for ≤18 MB, resumable Files API upload for larger files (v0.3.17) |

Channel Integrations

| Tool | Parameters | Description |
|------|------------|-------------|
| telegram_send | action, various | Telegram operations (19 actions) |
| discord_send | action, various | Discord operations (17 actions) |
| slack_send | action, various | Slack operations (17 actions) |
| trello_send | action, various | Trello operations (22 actions) |

Sub-Agent Orchestration

Agents can spawn independent child agents for parallel task execution:

| Tool | Parameters | Description |
|------|------------|-------------|
| spawn_agent | label, agent_type, prompt | Spawn a typed child agent in an isolated session |
| wait_agent | agent_id, timeout_secs | Wait for a child agent to complete and return output |
| send_input | agent_id, text | Send follow-up input to a running agent (multi-turn) |
| close_agent | agent_id | Terminate a running agent and clean up resources |
| resume_agent | agent_id, prompt | Resume a completed/failed agent with new prompt (preserves context) |
| team_create | team_name, agents[] | Spawn N typed agents as a named team (parallel) |
| team_broadcast | team_name, message | Fan-out message to all running agents in a team |
| team_delete | team_name | Cancel and clean up all agents in a team |

Agent Types

When spawning, agent_type selects a specialized role with a curated tool registry:

| Type | Role | Tool Access |
|------|------|-------------|
| general | Full-capability (default) | All parent tools minus recursive/dangerous |
| explore | Fast read-only codebase navigation | read_file, glob, grep, ls |
| plan | Architecture planning | read_file, glob, grep, ls, bash |
| code | Implementation with full write access | All parent tools minus recursive/dangerous |
| research | Web search + documentation lookup | read_file, glob, grep, ls, web_search, http_request |

ALWAYS_EXCLUDED tools (no agent type has these): spawn_agent, resume_agent, wait_agent, send_input, close_agent, rebuild, evolve – no recursive spawning, no self-modification from subagents.
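
The per-type registries and the exclusion list can be expressed as a small filter. The Python sketch below is a hedged illustration: `tools_for` is a hypothetical function, and it treats "recursive/dangerous" as exactly the ALWAYS_EXCLUDED set, which is an assumption about how the general and code types are filtered.

```python
ALWAYS_EXCLUDED = {
    "spawn_agent", "resume_agent", "wait_agent", "send_input",
    "close_agent", "rebuild", "evolve",
}
READ_ONLY = ["read_file", "glob", "grep", "ls"]
TYPE_TOOLS = {
    "explore": READ_ONLY,
    "plan": READ_ONLY + ["bash"],
    "research": READ_ONLY + ["web_search", "http_request"],
}


def tools_for(agent_type, parent_tools):
    """Return the tools a child agent of the given type may use."""
    if agent_type in ("general", "code"):
        allowed = list(parent_tools)
    elif agent_type in TYPE_TOOLS:
        allowed = [t for t in parent_tools if t in TYPE_TOOLS[agent_type]]
    else:
        raise ValueError(f"unknown agent_type: {agent_type!r}")
    # No agent type may spawn agents or modify the binary.
    return [t for t in allowed if t not in ALWAYS_EXCLUDED]


parent = ["read_file", "glob", "grep", "ls", "bash", "write_file",
          "web_search", "http_request", "spawn_agent", "evolve"]
print(tools_for("research", parent))
```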

Browser Automation

Native headless Chrome control via Chrome DevTools Protocol (CDP):

| Tool | Parameters | Description |
|------|------------|-------------|
| navigate | url | Open a URL in the browser |
| click | selector | Click an element by CSS selector |
| type | selector, text | Type text into an input field |
| screenshot | selector | Capture a screenshot |
| eval_js | code | Execute JavaScript in the page context |
| extract_content | selector | Extract text content from elements |
| wait_for_element | selector, timeout | Wait for an element to appear |
| find | pattern, mode | Find elements by CSS, XPath, text, or aria-label. Returns stable selectors |
| browser_close | - | Close browser tab and free CDP session. Prevents stale page reuse across browser actions (v0.3.18) |

Auto-detects your default Chromium browser. Feature-gated under browser (enabled by default).

Dynamic Tools

Define custom tools at runtime via ~/.opencrabs/tools.toml. See Dynamic Tools for details.

| Tool | Parameters | Description |
|------|------------|-------------|
| tool_manage | action, various | Create, remove, or reload dynamic tools |

System

| Tool | Parameters | Description |
|------|------------|-------------|
| slash_command | command, args | Execute slash commands (/cd, /compact, etc.) |
| config_manager | action, various | Read/write config, manage commands |
| evolve | check_only | Download latest release |
| rebuild | - | Build from source and restart |
| plan | action, various | Create and manage execution plans |

Error Handling

v0.2.92 improved error surfacing across all tool connections. Channel connect tools (slack_connect, whatsapp_connect, trello_connect) now surface actual connection errors instead of silently swallowing them. Tool call status correctly transitions from “running” to success/failure instead of showing a perpetual spinner.

System CLI Tools

OpenCrabs runs in a TUI with full terminal access. The agent can execute any CLI tool installed on the host via the bash tool – no plugins, no wrappers. If it’s on your system, the agent can use it. Common ones:

| Tool | Purpose | Check |
|------|---------|-------|
| gh | GitHub CLI: issues, PRs, repos, releases, actions | gh --version |
| gog | Google CLI: Gmail, Calendar (OAuth) | gog --version |
| docker | Container management | docker --version |
| ssh | Remote server access | ssh -V |
| node | Run JavaScript/TypeScript tools | node --version |
| python3 | Run Python scripts and tools | python3 --version |
| ffmpeg | Audio/video processing | ffmpeg -version |
| curl | HTTP requests (fallback when http_request insufficient) | curl --version |

GitHub CLI (gh)

Authenticated GitHub CLI for full repo management:

gh issue list / view / create / close / comment
gh pr list / view / create / merge / checks
gh release list / create
gh run list / view / watch

Google CLI (gog)

OAuth-authenticated Google Workspace CLI. Supports Gmail and Calendar:

gog calendar events --max 10
gog gmail search "is:unread" --max 20
gog gmail send --to user@email.com --subject "Subject" --body "Body"

Requires GOG_KEYRING_PASSWORD and GOG_ACCOUNT env vars.

Companion Tools

SocialCrabs — Social Media Automation

SocialCrabs is a social media automation tool with human-like behavior simulation (Playwright). Supports Twitter/X, Instagram, and LinkedIn.

The agent calls SocialCrabs CLI commands via bash:

node dist/cli.js x tweet "Hello world"
node dist/cli.js x mentions -n 5
node dist/cli.js ig like <post-url>
node dist/cli.js linkedin connect <profile-url>

Read operations are safe. Write operations (tweet, like, follow, comment) require explicit user approval.

WhisperCrabs — Floating Voice-to-Text

WhisperCrabs is a floating voice-to-text widget controllable via D-Bus. Click to record, click to stop, text goes to clipboard. The agent can start/stop recording, switch providers, and view transcription history via D-Bus commands.

Custom Commands

Define your own slash commands in ~/.opencrabs/commands.toml. Commands work from the TUI and all channels (Telegram, Discord, Slack, WhatsApp).

Configuration

# ~/.opencrabs/commands.toml

[commands.credits]
description = "Show remaining API credits"
action = "prompt"
value = "Check my API credit balance across all providers and give me a summary"

[commands.deploy]
description = "Deploy to production"
action = "prompt"
value = "Run the production deployment pipeline: git pull, build, test, deploy"

[commands.status]
description = "Show system status"
action = "system"
value = "System is operational. All channels connected."

Action Types

| Action | Behavior |
|--------|----------|
| prompt | Sends the value as a message to the agent; the agent processes it like any user message |
| system | Displays the value directly as a system message; no agent involvement |

Using Commands

Type /commandname in the TUI or any connected channel:

/credits     → agent checks API balances
/deploy      → agent runs deployment
/status      → shows static system message
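
The two action types amount to a small dispatch step. The Python sketch below illustrates it under stated assumptions: `dispatch` is a hypothetical function, and `COMMANDS` is a plain dict shaped like the parsed commands.toml above rather than something loaded by OpenCrabs.

```python
COMMANDS = {
    "credits": {
        "action": "prompt",
        "value": "Check my API credit balance across all providers and give me a summary",
    },
    "status": {
        "action": "system",
        "value": "System is operational. All channels connected.",
    },
}


def dispatch(line, commands):
    """Return ("agent", text), ("system", text), or None if no custom command matches."""
    if not line.startswith("/"):
        return None
    parts = line[1:].split()
    if not parts:
        return None
    command = commands.get(parts[0])
    if command is None:
        return None
    if command["action"] == "prompt":
        return ("agent", command["value"])    # forwarded as a normal user message
    if command["action"] == "system":
        return ("system", command["value"])   # shown directly, no agent call
    raise ValueError(f"unknown action type: {command['action']!r}")


print(dispatch("/status", COMMANDS))
```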

Visibility

Custom commands appear in:

  • /help output (TUI and channels) under a “Custom Commands” section
  • TUI slash autocomplete when typing /

Commands are sorted alphabetically and show their description.

Memory System

OpenCrabs uses a 3-tier memory system for persistent context across sessions.

Memory Tiers

1. Daily Notes (memory/YYYY-MM-DD.md)

Automatic daily files for session-specific observations:

~/.opencrabs/memory/2026-03-07.md

The agent writes here during conversations — new integrations, bugs fixed, decisions made, server changes.

2. Long-term Memory (MEMORY.md)

Curated knowledge that persists across all sessions:

  • Server details, SSH access, credentials locations
  • User preferences and workflows
  • Integration configurations
  • Lessons learned from debugging

3. Session Search (session_search)

Full-text search across all past sessions stored in SQLite. The agent can query:

  • Previous conversations
  • Tool execution history
  • Past decisions and context

The agent uses session_search for fast memory lookups (~500 tokens) instead of reading full memory files (~15K tokens). This is the primary recall mechanism.

Embedding Modes

OpenCrabs supports three embedding configurations:

  1. Local GGUF (default) — downloads a 300MB embedding model and runs it locally via llama.cpp
  2. OpenAI-compatible API — configure external embedding providers (OpenAI text-embedding-3-small, Ollama nomic-embed-text, Jina, LM Studio, or any /v1/embeddings endpoint) via [memory.embedding] config with url, model, api_key, dimensions
  3. FTS5-only — pure keyword search with zero RAM overhead. Set [memory] vector_enabled = false. Auto-detects VPS environments and configures automatically

Context Compaction

When context reaches ~80% capacity, OpenCrabs automatically compacts:

  1. Summarizes the conversation so far into a comprehensive continuation document
  2. Clears old messages from context
  3. Continues with the summary as context

Manual compaction: type /compact in chat.
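
The compaction steps above follow a threshold-then-summarize pattern, sketched below in Python. This is an illustration only: `maybe_compact`, the word-count tokenizer, and the stub summarizer are hypothetical, and the real trigger and summary format are not specified here beyond the roughly 80% figure.

```python
def maybe_compact(messages, context_limit, count_tokens, summarize, threshold=0.8):
    """Replace the history with a summary once usage reaches the threshold.

    Returns (messages, compacted_flag).
    """
    used = sum(count_tokens(m) for m in messages)
    if used < threshold * context_limit:
        return messages, False
    # Summarize everything so far, drop the old messages, continue from the summary.
    return [summarize(messages)], True


def word_count(message):
    # Stand-in tokenizer for the demo: one token per whitespace-separated word.
    return len(message.split())


def stub_summary(messages):
    # Stand-in for an LLM-written continuation document.
    return f"Summary of {len(messages)} earlier messages"


history = ["one two three four"] * 3            # 12 of 20 tokens: below 80%
kept, compacted = maybe_compact(history, 20, word_count, stub_summary)
history = history + ["five six seven eight"]    # 16 of 20 tokens: at 80%
kept_after, compacted_after = maybe_compact(history, 20, word_count, stub_summary)
print(compacted, compacted_after, kept_after)
```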

Auto-Save Triggers

The agent saves to memory when:

  • New integrations are connected
  • Server/infrastructure changes occur
  • Bugs are found and fixed
  • New tools are configured
  • Credentials are rotated
  • Architecture decisions are made
  • You say “remember this”
  • Errors take >5 minutes to debug

Brain Files

See Brain Files for the full list of files the agent reads on startup.

Brain Files

Brain files define the agent’s personality, knowledge, and behavior. They live at ~/.opencrabs/ and are loaded on every session start.

Startup Read Order

  1. SOUL.md — Personality and values
  2. USER.md — Your profile and preferences
  3. memory/YYYY-MM-DD.md — Today’s notes
  4. MEMORY.md — Long-term memory
  5. AGENTS.md — Agent behavior guidelines
  6. TOOLS.md — Tool reference and custom notes
  7. CODE.md — Coding standards and file organization
  8. SECURITY.md — Security policies
  9. HEARTBEAT.md — Periodic check tasks
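
As a rough illustration, the read order above maps to paths on disk like this. The helper name `startup_read_order` is hypothetical, and the sketch assumes every file sits directly under ~/.opencrabs/ with daily notes under memory/, as described in this chapter.

```python
from datetime import date
from pathlib import Path


def startup_read_order(home: Path, today: date) -> list[Path]:
    """Return brain file paths in the documented startup read order."""
    root = home / ".opencrabs"
    return [
        root / "SOUL.md",
        root / "USER.md",
        root / "memory" / f"{today.isoformat()}.md",  # today's notes
        root / "MEMORY.md",
        root / "AGENTS.md",
        root / "TOOLS.md",
        root / "CODE.md",
        root / "SECURITY.md",
        root / "HEARTBEAT.md",
    ]


# Report which brain files exist on this machine.
for path in startup_read_order(Path.home(), date.today()):
    status = "found" if path.exists() else "missing"
    print(f"{status:8} {path}")
```

Running it is a quick way to spot a missing template before starting a session.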

File Reference

SOUL.md

Agent personality. Core truths: strong opinions, brevity, resourcefulness, honesty. Hard rules: never delete files without approval, never send emails without request, never commit code directly.

IDENTITY.md

Agent identity created during bootstrap: name, creature type, vibe, emoji, prohibited patterns.

USER.md

Your profile: name, location, timezone, role, specialties, communication preferences, pet peeves.

AGENTS.md

Comprehensive agent behavior docs: memory system, safety rules, git rules, workspace vs repository separation, cron best practices, platform formatting, heartbeat guidelines.

TOOLS.md

Tool parameter reference, system CLI tools, provider configuration, integration details for all channels and services.

CODE.md

Coding standards brain template. Enforces: no file over 500 lines (target 100–250), types in types.rs, one responsibility per file, mandatory tests for every feature, security-first patterns. Rust-first philosophy — single binary, no runtime dependencies. The agent follows these rules when writing or reviewing code.

SECURITY.md

Security policies: third-party code review, attack playbook awareness, network security, data handling, incident response.

HEARTBEAT.md

Tasks for periodic proactive checks. Keep empty to skip heartbeat API calls. Add tasks for the agent to rotate through (email checks, calendar, weather, etc.).

BOOT.md

Startup procedures: check git log, verify build, greet human with context awareness.

Customization

These files are yours. The agent reads them but you control the content. Templates are at src/docs/reference/templates/ in the source repo — compare your local files against templates when updating to pick up new sections without losing custom content.

New installs (v0.2.72+): CODE.md and SECURITY.md are automatically seeded on first run. Existing users can ask their crab: “Check my brain templates and update them if any are missing or outdated.”

Upgrading: Brain files are never overwritten by /evolve or /rebuild. After updating, ask your crab to compare templates against local files and patch in new sections.

Sessions

Sessions as Isolated Agents

A session in OpenCrabs is not a tab, not a window, not a chat thread. Each session is a fully independent agent with its own brain: conversation history, provider, model, working directory, tool state, approval policy, and context window. When you create a new session, you are spinning up a separate agent that knows nothing about any other session and shares nothing with them.

This is the core mental model: one session = one agent = one context. You can run dozens of sessions in parallel and they will never interfere with each other.

Zero Context Contamination

Sessions are completely isolated at every layer:

  • Separate message queues — each session has its own queue. Messages are routed strictly to their originating session. No cross-session bleeding, even when 10 sessions are processing simultaneously.
  • Separate provider and model — switching to Gemini in session A does not affect session B running Claude. Each session remembers its own provider independently.
  • Separate working directory: running /cd in one session does not change the working directory of any other session.
  • Separate conversation history — full SQLite-backed history per session. No shared memory, no prompt pollution, no context bleed.
  • Separate token tracking — cumulative usage, cost, and context window are tracked per session.

This isolation is guaranteed by Rust’s thread safety and async runtime. Each session runs as an independent tokio task with its own state, and the type system prevents accidental sharing at compile time. You can run split panes, channel sessions, background agents, and sub-agents all at once with zero risk of one session’s output leaking into another.
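
The routing model can be illustrated with a small sketch. Python's asyncio stands in for tokio here, and the `Session` class, its fields, and `run_session` are hypothetical names, not OpenCrabs types. Python also cannot enforce isolation at compile time the way Rust does, so this only shows the per-session queue and state idea.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-session state; nothing here is shared between instances."""
    name: str
    provider: str
    cwd: str
    queue: asyncio.Queue = field(default_factory=asyncio.Queue)
    history: list[str] = field(default_factory=list)


async def run_session(session: Session) -> None:
    """Drain this session's own queue until a None sentinel arrives."""
    while True:
        message = await session.queue.get()
        if message is None:
            break
        session.history.append(message)


async def demo_sessions() -> dict[str, list[str]]:
    ops = Session("devops", provider="gemini", cwd="/srv/ops")
    mobile = Session("mobile", provider="claude", cwd="/srv/mobile")
    workers = [asyncio.create_task(run_session(s)) for s in (ops, mobile)]
    await ops.queue.put("deploy nginx")
    await mobile.queue.put("fix login screen")
    for s in (ops, mobile):
        await s.queue.put(None)  # tell each worker to stop
    await asyncio.gather(*workers)
    return {s.name: s.history for s in (ops, mobile)}


print(asyncio.run(demo_sessions()))
```

Each message ends up only in the history of the session whose queue received it.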

Workflow Patterns: One Session Per Context

The power of isolated sessions becomes clear when you treat each one as a dedicated agent for a specific domain:

Pattern 1: DevOps for a Server

Create a session for a specific server or infrastructure concern. Send a first message like "Devops for server XYZ — monitor nginx builds, manage cronjobs, handle deployments, run log cleanups". The session locks into that context: working directory set to the server’s codebase, provider set to a fast model for quick ops tasks, history filled with that server’s deployment patterns. Come back to it days later and the agent remembers every previous deploy, every cron change, every nginx config tweak.

Pattern 2: Mobile App Development with a Co-Founder

Create a session named mybrand-mobile and connect it to a Telegram group with your co-founder. The agent is locked into the Dart/Flutter codebase, the product design context, and the mobile-specific toolchain. Your co-founder can ask questions, request design changes, or review PRs directly in Telegram while you work on backend tasks in a separate session. The two contexts never mix.

Pattern 3: Production Logs with a Team on Slack

Create a session named mybrand-prod-logs-debug and connect it to a Slack channel. Your team can ask questions about production, staging, or dev logs without you having to context-switch. The agent stays locked into log analysis mode with the right SSH aliases, the right log paths, and the right debugging tools. Meanwhile, your main TUI session is free for development work.

The key insight: you never have to explain the full context again. Once a session is locked into a domain, every follow-up message inherits that context automatically.

Creating Sessions

  • TUI: Press Ctrl+N or type /new
  • Channels: Type /new in any channel

The First Message Matters

When you create a new session, the first message you send becomes the seed for the entire session’s context. OpenCrabs uses it to:

  1. Auto-generate a session title — a background LLM call extracts a 3-8 word descriptive title from your first message. This runs asynchronously and never enters the conversation context.
  2. Anchor the agent’s context — the initial message establishes what this session is about, what codebase it should focus on, what tools it should prioritize.

Good first messages are specific and contextual:

  • "Devops for server XYZ — nginx, cronjobs, deployments, log cleanups"
  • "Flutter mobile app for mybrand — Dart codebase at ~/srv/mobile/mybrand"
  • "Debug production logs for mybrand staging and dev environments"

The auto-title will generate something like Devops Server XYZ, Mybrand Mobile Flutter, or Mybrand Prod Logs Debug. You can always rename it later.

Switching Sessions

  • TUI: Press Ctrl+L to open the sessions screen, navigate with arrow keys, press Enter to select
  • Channels: Type /sessions to see recent sessions with inline buttons

Renaming Sessions

Auto-generated titles are a starting point, not a final name. You can rename any session:

  • TUI: Press Ctrl+L, navigate to the session, press r to rename
  • Agent-initiated: the rename_session tool lets the agent rename the current session with a descriptive title when the conversation evolves beyond its original scope

Empty or whitespace-only titles are rejected (v0.3.30, #128).

Session Screen

The sessions screen shows:

  • Session name
  • Created date
  • Provider/model badge
  • Working directory
  • Token usage
  • Context window usage (current session)
  • Status indicators (processing spinner, pending approval, unread)

Per-Session State

Each session remembers:

  • Provider and model — Switch to Claude in one, Gemini in another
  • Working directory: /cd persists per session
  • Conversation history — Full message history in SQLite
  • Token count and cost — Cumulative usage tracking

Session Management

| Action | TUI | Channels |
|---|---|---|
| New | Ctrl+N / /new | /new |
| Switch | Ctrl+L + Enter | /sessions |
| Rename | R on sessions screen | |
| Delete | D on sessions screen | |

Background Processing

Sessions can process in the background while you work in another session. The sessions screen shows:

  • Spinner for actively processing sessions
  • ! for sessions waiting for tool approval
  • Dot for sessions with unread messages

Split Panes

Run multiple sessions side by side with tmux-style pane splitting. Each pane is a fully isolated agent — see Split Panes for details.

State Management

v0.2.92 improved session state tracking:

  • Session reload after cancellation — After Esc+Esc cancel, session context reloads from DB to pick up any changes made during the cancelled operation
  • Cached state cleanup — Deleting a session now clears stale pane cache entries, preventing phantom state on restart
  • CLI tool segment persistence — Tool results from CLI providers (Claude CLI, OpenCode CLI) are now saved to DB alongside regular messages, preserving correct text/tool interleaving across restarts
  • Case-insensitive tool input — Tool input descriptions use case-insensitive key lookup, fixing failures when providers return different casing

Channel Sessions

All channels (Telegram, Discord, Slack, WhatsApp, Trello) persist sessions in SQLite by channel/group title. Sessions survive process restarts — no more lost context after daemon restart. Each channel group gets its own isolated session, while owner DMs share the TUI session. Cross-channel stable session suffixes ([chat:<id>]) ensure reliable session resolution across Discord, Slack, and WhatsApp (v0.3.29).

Split Panes

OpenCrabs supports tmux-style pane splitting in the TUI. Run multiple sessions side by side, each with its own provider, model, and context — all processing in parallel.

Splitting

| Action | Shortcut |
|---|---|
| Split horizontal | \| (pipe) |
| Split vertical | _ (underscore) |
| Cycle focus | Tab |
| Close pane | Ctrl+X |

How It Works

Each pane runs an independent session. You can have one pane writing code with Claude while another reviews tests with Gemini. The status bar shows [n/total] to indicate which pane is focused.

  • Independent providers — Each pane can use a different AI provider and model
  • Independent context — Conversation history is isolated per pane
  • Parallel processing — All panes process concurrently via Tokio
  • Persistent sessions — Each pane’s session is saved to SQLite like any other session

Example Layout

┌──────────────────────┬──────────────────────┐
│  Session 1 (Claude)  │  Session 2 (Gemini)  │
│  Writing code...     │  Reviewing PR...     │
├──────────────────────┴──────────────────────┤
│  Session 3 (OpenRouter)                      │
│  Running tests...                            │
└──────────────────────────────────────────────┘

Split vertically with _, then horizontally with | in the top pane.

Persistent Layout

Split pane configuration (splits, sizes, focused pane) saves to ~/.opencrabs/pane_layout.json on quit and Ctrl+C. On restart, your layout is restored exactly as you left it. Each restored pane preloads its session messages from the database, so content is visible immediately instead of blank.

Non-Focused Panes

Non-focused panes show compact tool call summaries and stripped reasoning text. Tool groups display as single collapsed lines matching the focused pane style. All panes auto-scroll to the bottom when new messages arrive.

v0.2.92 fixed several rendering issues:

  • Tool calls no longer show a perpetual “running” spinner after completion
  • Scroll position correctly tracks for inactive panes
  • Stale cache is cleared when sessions are updated or deleted

State Management

Deleting a session now properly cleans up cached pane state. Previously, deleting a session left stale entries in the pane cache, which could cause phantom panes on restart.

Limits

There is no hard limit on pane count – you can run as many as your terminal fits. Each pane is a full session with its own token tracking and working directory.

Dynamic Tools

Define custom tools at runtime without recompiling. Tools are defined in ~/.opencrabs/tools.toml and can be created, removed, and reloaded on the fly.

Defining Tools

Create ~/.opencrabs/tools.toml:

[[tools]]
name = "deploy"
description = "Deploy the application to production"
executor = "shell"
command = "cd {{project_dir}} && ./deploy.sh {{environment}}"

[[tools]]
name = "check-status"
description = "Check service health"
executor = "http"
method = "GET"
url = "https://api.example.com/health"

Executors

| Executor | Description |
|---|---|
| shell | Runs a shell command |
| http | Makes an HTTP request |

Template Parameters

Use {{param}} syntax for dynamic values. The agent fills these in when calling the tool:

[[tools]]
name = "search-logs"
description = "Search application logs for a pattern"
executor = "shell"
command = "grep -r '{{pattern}}' /var/log/myapp/ --include='*.log' -l"
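
To make the substitution step concrete, here is a minimal Python sketch of `{{param}}` rendering. It is not the OpenCrabs engine: `render_command` is a hypothetical helper, and shell-quoting each value is this sketch's own safety choice (the documentation does not say whether OpenCrabs escapes values), which is why the example template leaves the placeholder unquoted.

```python
import re
import shlex

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_command(template: str, params: dict[str, str]) -> str:
    """Replace each {{name}} in the template with its shell-quoted value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing template parameter: {name}")
        # Quoting is done here, so the template should not add its own quotes.
        return shlex.quote(params[name])

    return PLACEHOLDER.sub(substitute, template)


print(render_command(
    "grep -r {{pattern}} /var/log/myapp/ -l",
    {"pattern": "timeout error"},
))
```

A value containing spaces or shell metacharacters is passed as a single argument instead of being split or interpreted by the shell.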

Runtime Management

The tool_manage meta-tool lets the agent manage dynamic tools during a session:

  • Create — Add a new tool definition
  • Remove — Delete an existing dynamic tool
  • Reload — Re-read tools.toml without restarting

Dynamic tools appear alongside built-in tools in the agent’s tool list. Enable or disable individual tools without restarting the process.

Per-Parameter Value Coercion (v0.3.24)

Dynamic tools defined in tools.toml can now handle empty-string or null parameters gracefully. When a parameter arrives as "" or null, the engine substitutes a configured value before rendering the command template.

| Field | Purpose |
|---|---|
| coerce_empty_to | Substitute when parameter is "" |
| coerce_null_to | Substitute when parameter is null |

[[tools]]
name = "deploy"
description = "Deploy to environment with optional verbose flag"
executor = "shell"
command = "cd {{project_dir}} && ./deploy.sh --env {{environment}} {{verbose}}"

[[tools.deploy.params]]
name = "verbose"
type = "string"
required = false
coerce_empty_to = "--quiet"

A shell tool with an optional --verbose flag no longer breaks when the parameter is omitted. The engine substitutes --quiet (or any configured default) instead of passing an empty string.

(#95)
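
The coercion rule can be sketched as a small function applied to each parameter before the command template is rendered. This is an assumption-laden illustration: `coerce_param` is a hypothetical name, the spec is modelled as a plain dict, and rendering an unconfigured null as an empty string is this sketch's choice, not documented behavior.

```python
from typing import Optional


def coerce_param(value: Optional[str], spec: dict) -> str:
    """Apply coerce_null_to / coerce_empty_to before template rendering."""
    if value is None and "coerce_null_to" in spec:
        return spec["coerce_null_to"]
    if value == "" and "coerce_empty_to" in spec:
        return spec["coerce_empty_to"]
    # Assumption: with no rule configured, null renders as an empty string.
    return "" if value is None else value


verbose_spec = {"name": "verbose", "required": False, "coerce_empty_to": "--quiet"}
print(coerce_param("", verbose_spec))           # empty string is replaced
print(coerce_param("--verbose", verbose_spec))  # real values pass through
```

With the `verbose` parameter from the example above, an empty value becomes `--quiet` while an explicit `--verbose` is left untouched.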

Thanks to an external contribution, tools.toml is now loaded in run mode and agent mode, not just the TUI. Previously, dynamic tools only worked in the interactive TUI session. They are now available in all modes, so headless automation and scripted workflows can use custom tools.

(#79 — thanks @leshchenko)

Browser Automation

OpenCrabs includes native headless Chrome control via the Chrome DevTools Protocol (CDP). No Selenium, no Playwright — direct browser control built into the binary.

Requirements

  • A Chromium-based browser installed (Chrome, Brave, Edge, or Chromium)
  • Feature flag: browser (enabled by default)

OpenCrabs auto-detects your default Chromium browser — no manual path configuration needed.

Browser Tools

| Tool | Description |
|---|---|
| navigate | Open a URL in the browser |
| click | Click an element by CSS selector |
| type | Type text into an input field |
| screenshot | Capture a screenshot of the page |
| eval_js | Execute JavaScript in the page context |
| extract_content | Extract text content from elements |
| wait_for_element | Wait for an element to appear |
| find | Find elements matching a pattern (CSS, XPath, text, or aria-label). Returns stable selectors for subsequent click/type operations |

How It Works

The browser is lazy-initialized as a singleton — it only launches when the agent first needs it. It runs in stealth mode with a persistent profile directory, so cookies and sessions survive across tool calls.

On macOS, display auto-detection enables headed mode when a display is available, falling back to headless in CI or daemon environments.

Example

Ask the agent:

“Go to our staging site, log in with the test account, navigate to the dashboard, and take a screenshot”

The agent will chain navigate → type (username) → type (password) → click (login button) → navigate (dashboard) → screenshot, all autonomously.

Configuration

No configuration needed. The browser feature is enabled by default. To disable it at build time:

cargo build --release --no-default-features --features "telegram,discord,slack"

Cron Jobs

Schedule tasks to run on a recurring schedule. Cron jobs can run in isolated sessions or wake the main session.

CLI Management

# Add a job
opencrabs cron add \
  --name "Morning Report" \
  --cron "0 9 * * *" \
  --tz "Europe/London" \
  --prompt "Check emails, calendar, and give me a morning briefing" \
  --deliver-to telegram:123456

# List all jobs
opencrabs cron list

# Enable/disable (accepts name or ID)
opencrabs cron enable "Morning Report"
opencrabs cron disable "Morning Report"

# Remove (accepts name or ID)
opencrabs cron remove "Morning Report"

Agent Management

The agent can also manage cron jobs via the cron_manage tool:

"Create a cron job that checks my emails every morning at 9am"

Options

| Flag | Description |
|---|---|
| --name | Job name (unique identifier) |
| --cron | Cron expression (e.g. 0 9 * * *) |
| --tz | Timezone (e.g. America/New_York) |
| --prompt | The prompt to send to the agent |
| --provider | AI provider to use (optional) |
| --model | Model to use (optional) |
| --thinking | Thinking mode: on, off, budget_XXk |
| --deliver-to | Channel delivery: telegram:CHAT_ID, discord:CHANNEL_ID, HTTP webhook URL, or comma-separated multiple targets |
| --auto-approve | Auto-approve tool use for this job |

Multi-Target Delivery

deliver_to accepts comma-separated targets to send results to multiple destinations simultaneously:

opencrabs cron add \
  --name "Morning Report" \
  --cron "0 9 * * *" \
  --prompt "Give me a morning briefing" \
  --deliver-to "telegram:-12345,http://webhook.example.com/notify"

Supported targets in any combination:

  • telegram:CHAT_ID or telegram:-GROUP_ID
  • discord:CHANNEL_ID
  • slack:CHANNEL_ID
  • http://... or https://... (webhook URL)

Results are stored in the DB via the cron_results table regardless of delivery target, so you can query past execution results with opencrabs cron results <name>.
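
For readers scripting around the CLI, the target syntax above can be parsed roughly as follows. This is a sketch of the documented format only: `parse_deliver_to` is a hypothetical helper, the "webhook" label is made up for illustration, and OpenCrabs' own validation rules may differ.

```python
CHANNEL_KINDS = {"telegram", "discord", "slack"}


def parse_deliver_to(value: str) -> list[tuple[str, str]]:
    """Split a comma-separated deliver-to string into (kind, target) pairs."""
    targets = []
    for raw in value.split(","):
        item = raw.strip()
        if not item:
            continue  # tolerate stray commas
        if item.startswith(("http://", "https://")):
            targets.append(("webhook", item))
            continue
        kind, sep, ident = item.partition(":")
        if not sep or kind not in CHANNEL_KINDS or not ident:
            raise ValueError(f"unsupported delivery target: {item!r}")
        targets.append((kind, ident))
    return targets


print(parse_deliver_to("telegram:-12345,http://webhook.example.com/notify"))
```

Because the list is split on commas, a webhook URL that itself contains a comma would need to be percent-encoded first.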

Heartbeat vs Cron

Use heartbeat (HEARTBEAT.md) when:

  • Checks are periodic but timing is flexible (~30 min)
  • You want to reduce API calls by batching
  • Tasks share the main session context

Use cron when:

  • Exact timing matters (“9:00 AM every Monday”)
  • Task needs isolation from main session
  • You want a different model or thinking level
  • Output should deliver to a specific channel

Plans

Plans provide structured multi-step task execution with a live progress widget in the TUI.

Creating a Plan

Ask the agent to plan a complex task:

"Plan the migration from PostgreSQL to SQLite"

The agent uses the plan tool internally to create a plan with:

  • Title and description
  • Technical stack
  • Risk assessment
  • Test strategy
  • Ordered tasks with dependencies and complexity ratings

Plan Lifecycle

  1. Draft — Agent creates the plan and adds tasks
  2. Finalize — Agent calls finalize which triggers the tool approval dialog
  3. Approved — You approve in the tool dialog, plan status becomes Approved, and the agent begins executing tasks immediately
  4. In Progress — Tasks execute in dependency order
  5. Completed — All tasks done

In ask mode (default), the finalize step triggers the tool approval dialog — you review the full plan before execution begins. In auto-approve mode, finalize is auto-approved and the agent plans and executes without pausing.
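
"Dependency order" in step 4 is a topological ordering of the task graph. The sketch below shows the idea with Python's standard graphlib. The task names are invented for the PostgreSQL to SQLite example, and `execution_order` is a hypothetical helper, not the plan tool's API.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return task names so that every task comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# Each task maps to the set of tasks it depends on.
plan = {
    "export schema": set(),
    "convert data": {"export schema"},
    "switch app config": {"convert data"},
    "run test suite": {"switch app config"},
}

for number, task in enumerate(execution_order(plan), start=1):
    print(f"{number}. {task}")
```

A cycle in the dependencies raises `graphlib.CycleError`, which is the kind of problem worth catching while the plan is still a draft.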

Task States

Each task in a plan can be:

  • Pending (·) — Waiting for dependencies
  • InProgress (▶) — Currently executing
  • Completed (✓) — Done
  • Skipped (✓) — Manually skipped
  • Failed (✗) — Execution failed
  • Blocked (·) — Dependencies not met

TUI Plan Widget

When a plan is active, a live checklist panel appears above the input box showing:

  • Plan title and progress counter (e.g. 3/7)
  • Progress bar — Visual ██████░░░░ bar with percentage
  • Task list — Up to 6 tasks visible with status icons and task numbers
  • Overflow indicator: ... (N more) when tasks exceed the visible limit

The widget updates in real-time as the agent completes each task.
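
The widget's text layout can be approximated in a few lines. This is only a sketch of the rendering described above: `render_progress` and `render_tasks` are hypothetical helpers, the bar width of 10 is taken from the example bar, and the exact formatting in the TUI may differ.

```python
def render_progress(done: int, total: int, width: int = 10) -> str:
    """Render a bar like the one in the plan widget, with counter and percent."""
    if total <= 0:
        return "░" * width + " 0/0 (0%)"
    filled = round(width * done / total)
    percent = round(100 * done / total)
    return "█" * filled + "░" * (width - filled) + f" {done}/{total} ({percent}%)"


def render_tasks(tasks: list[str], limit: int = 6) -> list[str]:
    """Show up to `limit` numbered tasks, then an overflow line."""
    lines = [f"{i}. {name}" for i, name in enumerate(tasks[:limit], start=1)]
    if len(tasks) > limit:
        lines.append(f"... ({len(tasks) - limit} more)")
    return lines


print(render_progress(6, 10))
print("\n".join(render_tasks([f"task {n}" for n in range(1, 9)])))
```

With eight tasks and the default limit, the last line reads `... (2 more)`.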

Managing Plans

Plans are managed through natural language:

"Approve the plan"
"Reject the plan"
"What's the plan status?"
"Skip task 3"

The agent handles plan creation, approval, execution, and status reporting through the plan tool.

Multi-Agent Orchestration

OpenCrabs supports spawning specialized sub-agents that run autonomously in isolated sessions. Each child agent gets its own context, tool registry, and cancel token. Introduced in v0.2.97 with a typed agent system and team orchestration.

Agent Types

When spawning an agent, an agent_type parameter selects a specialized role with a curated tool set:

| Type | Role | Tools |
|---|---|---|
| general | Full-capability agent (default) | All parent tools minus recursive/dangerous |
| explore | Fast codebase navigation (read-only) | read_file, glob, grep, ls |
| plan | Architecture planning (read + analysis) | read_file, glob, grep, ls, bash |
| code | Implementation (full write access) | All parent tools minus recursive/dangerous |
| research | Web search + documentation lookup | read_file, glob, grep, ls, web_search, http_request |

Each type receives a role-specific system prompt that shapes its behavior. Explore agents are fast and lightweight – they only read files. Code agents can modify anything. Research agents can search the web and read files, but have no tools for modifying your filesystem.

Safety: ALWAYS_EXCLUDED Tools

No agent type has access to these tools, preventing dangerous or recursive operations:

  • spawn_agent – no spawning agents from agents
  • resume_agent, wait_agent, send_input, close_agent – no managing siblings
  • rebuild – no building from source
  • evolve – no self-updating
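
The filtering rules can be summarized in a short sketch. The allowlists mirror the Agent Types table and the exclusion set mirrors the list above, but `tools_for` is a hypothetical function, `write_file` is an invented tool name used only for the demo, and treating "recursive/dangerous" as exactly the ALWAYS_EXCLUDED set is an assumption.

```python
ALWAYS_EXCLUDED = {
    "spawn_agent", "resume_agent", "wait_agent", "send_input",
    "close_agent", "rebuild", "evolve",
}

READ_ONLY = ["read_file", "glob", "grep", "ls"]
ALLOWLISTS = {
    "explore": READ_ONLY,
    "plan": READ_ONLY + ["bash"],
    "research": READ_ONLY + ["web_search", "http_request"],
}


def tools_for(agent_type: str, parent_tools: list[str]) -> list[str]:
    """Return the tools a child agent of the given type would receive."""
    if agent_type in ("general", "code"):
        candidates = parent_tools
    elif agent_type in ALLOWLISTS:
        allowed = ALLOWLISTS[agent_type]
        candidates = [t for t in parent_tools if t in allowed]
    else:
        raise ValueError(f"unknown agent type: {agent_type}")
    # The exclusion list is applied last, so no type can bypass it.
    return [t for t in candidates if t not in ALWAYS_EXCLUDED]


parent = ["read_file", "glob", "grep", "ls", "bash", "write_file",
          "web_search", "http_request", "spawn_agent", "evolve"]
print(tools_for("explore", parent))
print(tools_for("code", parent))
```

Applying the exclusion set as the final step means even a full-capability agent never sees `spawn_agent` or `evolve`.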

Five Orchestration Tools

| Tool | Description |
|---|---|
| spawn_agent | Create a typed child agent to handle a sub-task autonomously in the background |
| wait_agent | Wait for a spawned agent to complete and return its output (configurable timeout) |
| send_input | Send follow-up instructions to a running agent (multi-turn conversation) |
| close_agent | Terminate a running agent and clean up its resources |
| resume_agent | Resume a completed or failed agent with a new prompt (preserves prior context) |

Spawn an Agent

spawn_agent(
  label: "refactor-auth",      # Human-readable label
  agent_type: "code",          # general | explore | plan | code | research
  prompt: "Refactor auth..."   # Task instruction
)

The agent runs in its own session with auto-approved tools. No blocking – it executes in the background while the parent continues.

Wait for Completion

wait_agent(
  agent_id: "abc-123",
  timeout_secs: 300            # Max wait time (default: 300s)
)
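
The spawn-then-wait pattern resembles waiting on a background task with a timeout. The sketch below uses Python's asyncio as an analogy; `child_agent` and this `wait_agent` function are stand-ins, not the real tools. It assumes that a timed-out wait leaves the child running, which the documentation does not state explicitly.

```python
import asyncio
from typing import Optional


async def child_agent(prompt: str, seconds: float) -> str:
    """Stand-in for a spawned agent doing background work."""
    await asyncio.sleep(seconds)
    return f"done: {prompt}"


async def wait_agent(task: asyncio.Task, timeout_secs: float = 300.0) -> Optional[str]:
    """Wait for the child; return None on timeout without cancelling it."""
    try:
        # shield() keeps the child alive if the wait itself times out.
        return await asyncio.wait_for(asyncio.shield(task), timeout=timeout_secs)
    except asyncio.TimeoutError:
        return None


async def demo_wait() -> tuple:
    task = asyncio.create_task(child_agent("refactor auth", 0.05))
    first = await wait_agent(task, timeout_secs=0.01)   # too short, times out
    second = await wait_agent(task, timeout_secs=1.0)   # child finishes in time
    return first, second


print(asyncio.run(demo_wait()))
```

The first wait gives up early and returns None; the second returns the child's output because the task kept running in the background.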

Multi-Turn with send_input

After spawning, you can send additional instructions without restarting:

send_input(
  agent_id: "abc-123",
  text: "Also add unit tests for the new module"
)

The child agent processes the input on its next iteration. This enables iterative workflows – review the agent’s output, then ask it to refine or continue.

Resume a Completed Agent

resume_agent(
  agent_id: "abc-123",
  prompt: "Now port the same changes to the other two files"
)

The agent continues in its original session, preserving all prior context. No need to re-explain the codebase.

Team Orchestration

The TeamManager coordinates named groups of agents for parallel execution. Three team-specific tools:

Create a Team

team_create(
  team_name: "backend-refactor",
  agents: [
    { label: "auth", agent_type: "code", prompt: "Refactor auth module" },
    { label: "tests", agent_type: "code", prompt: "Write tests for auth" },
    { label: "docs", agent_type: "general", prompt: "Update documentation" }
  ]
)

All agents spawn simultaneously and run in parallel. Returns the team name and all agent IDs.

Broadcast to a Team

team_broadcast(
  team_name: "backend-refactor",
  message: "Use the new AuthError enum instead of plain strings"
)

Sends a message to all running agents in the team. Non-running agents are skipped. Useful for sharing context or direction changes.

Delete a Team

team_delete(team_name: "backend-refactor")

Cancels all running agents and cleans up resources. Completed agents are left in the subagent manager for reference.

Subagent Provider/Model Config

By default, every spawned agent inherits the parent session’s provider and model. You can override this globally in config.toml so child agents route to a different (usually cheaper or faster) backend:

[agent]
subagent_provider = "openrouter"   # Provider for child agents
subagent_model    = "qwen/qwen3-235b" # Model override

# Omit both keys and child agents inherit the parent session's provider
# and run on that provider's default model.

The override applies to spawn_agent, resume_agent, and every member of a team_create team. Per-call overrides on the spawn tools are a planned follow-up — until then, the config is the single knob. Changes take effect on next session start; running sessions keep their existing provider.

Why It Matters

The common pattern is premium parent, cheap children. Your main conversation stays on a reasoning-capable model (Opus, GPT-5, Gemini 2.5 Pro) while subtasks — file exploration, test writing, web research, bulk refactors — run on a faster, cheaper model. With a 4-agent team running 10 minutes each, the cost delta between Opus and Qwen on the children is roughly 50x.

Concrete Examples

OpenRouter parent, Qwen children — best bang-for-buck on mixed workloads:

[providers.openrouter]
enabled = true
api_key = "sk-or-..."
model = "anthropic/claude-opus-4"

[agent]
subagent_provider = "openrouter"
subagent_model    = "qwen/qwen3-235b-a22b-instruct"

Kimi on custom OpenCode provider — fast code generation for code and explore agents:

[providers.opencode-kimi]
enabled = true
base_url = "https://api.kimi.com/v1"
api_key  = "..."
model    = "kimi-k2.5"

[agent]
subagent_provider = "opencode-kimi"
subagent_model    = "kimi-k2.6"

Local Ollama children — zero cost, fully offline, good for explore agents that just read files:

[providers.ollama]
enabled = true
model = "qwen3:14b"

[agent]
subagent_provider = "ollama"
subagent_model    = "qwen3:14b"

Gemini parent, Gemini Flash children — single billing account, reasoning on main, flash on team:

[providers.gemini]
enabled = true
model = "gemini-2.5-pro"

[agent]
subagent_provider = "gemini"
subagent_model    = "gemini-2.5-flash"

Gotchas

  • The subagent provider must be enabled and have a valid API key (or be a CLI/none-auth provider). Missing keys cause the spawn to fail with a provider resolution error.
  • subagent_model must be a model the provider actually serves. qwen/qwen3-235b works on OpenRouter, not on Anthropic. Check /models on the target provider to confirm.
  • team_create members all share the same subagent config. If you need heterogeneous routing (e.g. a research agent on web-search model, a code agent on code-specialized model), spawn them individually with spawn_agent under different config profiles.
  • The CLI model override is surfaced in the spawn_agent, resume_agent, and team_create tool descriptions themselves, so the LLM knows to mention these keys to you instead of inventing per-call overrides.

If subagent_provider or subagent_model is not set, the spawned agent loads from the parent session’s provider and runs on that provider’s default model.
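
The fallback rule can be written as a small resolver. This is a sketch, not the actual config loader: `resolve_subagent_backend` and the `provider_defaults` mapping are hypothetical, and resolving the two keys independently (for example, a provider override with no model override) is an assumption about behavior the text leaves open.

```python
def resolve_subagent_backend(
    agent_cfg: dict,
    parent_provider: str,
    provider_defaults: dict[str, str],
) -> tuple[str, str]:
    """Pick the (provider, model) pair a child agent would run on."""
    provider = agent_cfg.get("subagent_provider") or parent_provider
    # Without a model override, fall back to the chosen provider's default.
    model = agent_cfg.get("subagent_model") or provider_defaults[provider]
    return provider, model


defaults = {"gemini": "gemini-2.5-pro", "ollama": "qwen3:14b"}

# No [agent] overrides: the child inherits the parent session's provider.
print(resolve_subagent_backend({}, "gemini", defaults))

# Both keys set: the child is routed to the configured backend.
override = {"subagent_provider": "gemini", "subagent_model": "gemini-2.5-flash"}
print(resolve_subagent_backend(override, "gemini", defaults))
```

A provider name missing from `provider_defaults` raises a KeyError here, loosely mirroring the provider resolution error mentioned in the gotchas.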

Workflow Patterns

Parallel Research + Implementation

team_create("feature-research", [
  { label: "research", agent_type: "research", prompt: "Find best practices for rate limiting in Rust" },
  { label: "explore", agent_type: "explore", prompt: "Find all middleware files in the codebase" }
])

Wait for results, then spawn a code agent with the combined context.

Iterative Code Review

# 1. Spawn a code agent
spawn_agent(label: "impl", agent_type: "code", prompt: "Implement rate limiting middleware")

# 2. Wait for completion
wait_agent(agent_id: "impl-id")

# 3. Resume with refinements
resume_agent(agent_id: "impl-id", prompt: "Add tests for the edge cases we discussed")

Large-Scale Refactoring

team_create("refactor-team", [
  { label: "module-a", agent_type: "code", prompt: "Refactor module A to use the new trait" },
  { label: "module-b", agent_type: "code", prompt: "Refactor module B to use the new trait" },
  { label: "module-c", agent_type: "code", prompt: "Refactor module C to use the new trait" },
  { label: "tests", agent_type: "code", prompt: "Update all tests for the new trait signature" }
])

Testing

84 tests cover the entire multi-agent system:

  • Manager state machine (spawn, wait, close lifecycle)
  • SendInput wiring and input loop
  • CloseAgent cleanup
  • WaitAgent timeout behavior
  • AgentType tool filtering
  • TeamManager, TeamDelete, TeamBroadcast
  • Registry exclusion (ALWAYS_EXCLUDED enforcement)

Agent-to-Agent (A2A) Protocol

OpenCrabs includes a built-in A2A gateway implementing the A2A Protocol RC v1.0 for peer-to-peer agent communication.

Enabling

# config.toml
[a2a]
enabled = true
bind = "127.0.0.1"   # Loopback only (default) — use "0.0.0.0" to expose externally
port = 18790
# api_key = "your-secret"  # Optional Bearer token auth for incoming requests
# allowed_origins = ["http://localhost:3000"]  # CORS

Configuration Options

| Option | Default | Description |
|---|---|---|
| enabled | false | Enable the A2A gateway |
| bind | 127.0.0.1 | Bind address — use 0.0.0.0 to accept external connections |
| port | 18790 | Gateway port |
| api_key | (none) | Bearer token for authenticating incoming requests. If set, all JSON-RPC requests must include Authorization: Bearer <key> |
| allowed_origins | [] | CORS allowed origins — no cross-origin requests unless explicitly set |

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /.well-known/agent.json | GET | Agent Card — discover capabilities (auto-populated from tool registry) |
| /a2a/v1 | POST | JSON-RPC 2.0 — message/send, message/stream, tasks/get, tasks/cancel |
| /a2a/health | GET | Health check |

Methods

  • message/send — Send a message to the agent, creates a task. Returns the task with result.
  • message/stream — Same as message/send but returns an SSE stream with real-time status updates and artifact chunks as the agent works.
  • tasks/get — Poll a task by ID to check status and retrieve results.
  • tasks/cancel — Cancel a running task.

Active tasks are persisted to the database and restored on restart.
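
For clients written in Python, the JSON-RPC envelope for the methods above might be built like this. The helper name `build_a2a_request` is hypothetical, the sketch uses only the standard library, and it constructs the request without sending it, so it runs without a gateway.

```python
import json
import urllib.request
from typing import Optional


def build_a2a_request(
    base_url: str,
    method: str,
    params: dict,
    request_id: int,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 POST request for the /a2a/v1 endpoint."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/a2a/v1", data=body, headers=headers, method="POST"
    )


request = build_a2a_request(
    "http://127.0.0.1:18790",
    "message/send",
    {"message": {"role": "user", "parts": [{"text": "What tools do you have?"}]}},
    request_id=1,
    api_key="your-shared-secret",
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the result to `urllib.request.urlopen(request)` would send it to a running gateway. For tasks/get and tasks/cancel, the params would be `{"id": "TASK_ID"}`.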

The a2a_send Tool

The agent has a built-in a2a_send tool that lets it proactively communicate with remote A2A agents. This enables true bidirectional agent-to-agent communication.

Actions:

| Action | Description |
|---|---|
| discover | Fetch a remote agent’s Agent Card to see its capabilities and skills |
| send | Send a task to a remote agent and wait for the result |
| get | Poll a task by ID on a remote agent |
| cancel | Cancel a running task on a remote agent |

The agent can use this tool autonomously — for example, delegating subtasks to a specialized remote agent.

Connecting Two Agents

Example: VPS + Local Machine

On VPS (~/.opencrabs/config.toml):

[a2a]
enabled = true
bind = "0.0.0.0"
port = 18790
api_key = "your-shared-secret"

On local machine (~/.opencrabs/config.toml):

[a2a]
enabled = true
bind = "127.0.0.1"
port = 18790

Connectivity Options

  1. SSH tunnel (recommended) — No ports to open, encrypted:

    # From local machine, tunnel VPS A2A to localhost:18791
    ssh -L 18791:127.0.0.1:18790 user@your-vps
    

    Local agent talks to http://127.0.0.1:18791/a2a/v1

  2. Direct — Open port 18790 on VPS firewall. Simple but exposes the port. Always use api_key with this approach.

  3. Reverse proxy — Nginx/Caddy on VPS with TLS + Bearer auth via api_key.

Examples

# Discover the agent
curl http://127.0.0.1:18790/.well-known/agent.json | jq .

# Send a message (with Bearer auth)
curl -X POST http://127.0.0.1:18790/a2a/v1 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-shared-secret" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "message/send",
    "params": {
      "message": {
        "role": "user",
        "parts": [{"text": "What tools do you have?"}]
      }
    }
  }'

# Stream a task (SSE)
curl -N -X POST http://127.0.0.1:18790/a2a/v1 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-shared-secret" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "message/stream",
    "params": {
      "message": {
        "role": "user",
        "parts": [{"text": "Analyze the system status"}]
      }
    }
  }'

# Poll a task
curl -X POST http://127.0.0.1:18790/a2a/v1 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-shared-secret" \
  -d '{"jsonrpc":"2.0","id":3,"method":"tasks/get","params":{"id":"TASK_ID"}}'

# Cancel a task
curl -X POST http://127.0.0.1:18790/a2a/v1 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-shared-secret" \
  -d '{"jsonrpc":"2.0","id":4,"method":"tasks/cancel","params":{"id":"TASK_ID"}}'

# Health check
curl http://127.0.0.1:18790/a2a/health | jq .

Bee Colony Debate

Multi-agent structured debate via confidence-weighted voting (based on ReConcile, ACL 2024). Multiple “bee” agents argue across configurable rounds, enriched with knowledge context, then converge on a consensus answer.

Security

  • Loopback only by default — binds to 127.0.0.1
  • Bearer auth — set api_key to require Authorization: Bearer <key> on all JSON-RPC requests
  • CORS locked down — no cross-origin requests unless allowed_origins is set
  • For public exposure, use a reverse proxy with TLS + the api_key Bearer auth

Self-Healing

OpenCrabs monitors its own health and automatically recovers from failures without user intervention. All recovery events surface as visible notifications across TUI and all channels.

How It Differs from Crash Recovery

OpenCrabs has had crash recovery since early versions – if the process dies mid-request, pending requests are tracked in SQLite and automatically resumed on restart (see Pending Request Recovery below).

Self-healing (v0.2.92) goes further: the agent detects and fixes problems while it’s still running – corrupted config, degraded providers, context overflow, stuck streams, DB corruption – without restarting. Crash recovery is the safety net; self-healing prevents the fall.

Config Recovery

Every successful write to config.toml creates a snapshot at ~/.opencrabs/config.last_good.toml. When the config becomes corrupted or unparseable, OpenCrabs restores from the last-known-good snapshot automatically.

⚠️ Config was corrupted — restored from last-known-good snapshot (2 minutes ago)

A CONFIG_RECOVERED atomic flag tracks whether recovery happened during the current session, so downstream code can react accordingly.

Unknown Key Detection

Unknown top-level keys in config.toml trigger a startup warning listing the unrecognized entries. This catches typos like [teelgram] or [a2a_gatway] before they cause silent misconfiguration.

Known valid sections: [crabrace], [database], [logging], [debug], [providers], [channels], [agent], [daemon], [a2a], [image], [cron].

The [a2a] section also accepts gateway as an alias (via serde), which absorbs a common typo.
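A minimal sketch of the unknown-key check, assuming the top-level keys have already been pulled out of the parsed config.toml. The constant and function names are hypothetical and not taken from the OpenCrabs source; the section list is the one given above, and alias handling is left out.

```rust
/// Top-level sections listed as valid in this chapter.
const KNOWN_SECTIONS: &[&str] = &[
    "crabrace", "database", "logging", "debug", "providers", "channels",
    "agent", "daemon", "a2a", "image", "cron",
];

/// Returns the top-level keys that are not in `KNOWN_SECTIONS`, in input order.
/// Hypothetical helper: the real check may work on a parsed TOML table instead.
pub fn unknown_sections<'a>(top_level_keys: &[&'a str]) -> Vec<&'a str> {
    top_level_keys
        .iter()
        .copied()
        .filter(|key| !KNOWN_SECTIONS.contains(key))
        .collect()
}
```

A startup routine could log one warning that lists every returned key, so several typos are reported together.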

Custom Provider Name Normalization

Provider names with mixed case or whitespace (e.g. "My Provider" vs "my provider") are normalized on load and save, preventing duplicate entries that would confuse the provider registry.
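One way to picture the normalization, as a sketch only. It assumes the target form is lowercase with whitespace runs collapsed to a single hyphen, which is inferred from the "my-provider" example that appears later in this chapter; `normalize_provider_name` is a hypothetical name.

```rust
/// Lowercases a provider name and joins its whitespace-separated words
/// with hyphens. Assumption: this mirrors the "my-provider" form shown in
/// the docs; the real rule may differ in detail.
pub fn normalize_provider_name(raw: &str) -> String {
    raw.split_whitespace()
        .map(str::to_lowercase)
        .collect::<Vec<_>>()
        .join("-")
}
```

Applying the same function on both load and save means two spellings of one provider always map to the same registry key.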

Provider Health Tracking

Per-provider success/failure history is persisted to ~/.opencrabs/provider_health.json. Each provider tracks:

  • last_success and last_failure (epoch seconds)
  • last_error (truncated to 200 chars)
  • consecutive_failures count (resets on success)
{
  "anthropic": {
    "last_success": 1743250500,
    "consecutive_failures": 0
  },
  "openai": {
    "last_success": 1743249800,
    "last_failure": 1743249700,
    "last_error": "rate_limit_exceeded",
    "consecutive_failures": 0
  }
}

The /doctor command surfaces health stats for every configured provider. Combined with the fallback provider chain, OpenCrabs detects degraded providers and routes to healthy ones automatically.

Source: src/config/health.rs (120 lines), integrated into src/brain/agent/service/helpers.rs.
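The bookkeeping described above can be sketched as follows. The field names follow the JSON example; the method names, the `MAX_ERROR_CHARS` constant, and the choice to pass the timestamp in explicitly are assumptions made for illustration, not the contents of src/config/health.rs.

```rust
/// Per-provider health record, mirroring the JSON fields shown above.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ProviderHealth {
    pub last_success: Option<u64>,
    pub last_failure: Option<u64>,
    pub last_error: Option<String>,
    pub consecutive_failures: u32,
}

/// Assumed cap, taken from "truncated to 200 chars".
const MAX_ERROR_CHARS: usize = 200;

impl ProviderHealth {
    /// Records a success at `now` (epoch seconds) and resets the failure streak.
    pub fn record_success(&mut self, now: u64) {
        self.last_success = Some(now);
        self.consecutive_failures = 0;
    }

    /// Records a failure at `now`, keeping at most 200 characters of the error.
    pub fn record_failure(&mut self, now: u64, error: &str) {
        self.last_failure = Some(now);
        self.last_error = Some(error.chars().take(MAX_ERROR_CHARS).collect());
        self.consecutive_failures += 1;
    }
}
```

Truncating by characters rather than bytes avoids cutting a multi-byte character in half.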

DB Integrity Check

SQLite PRAGMA integrity_check runs at startup. If corruption is detected, a notification appears in TUI and all connected channels instead of silently failing.

Error Surfacing

v0.2.92 eliminated 14+ instances of silently swallowed errors across:

  • Config writes
  • Channel sends (Telegram, Discord, Slack, WhatsApp)
  • Tool connections (Slack, WhatsApp, Trello connect tools)
  • Pane state persistence

Before: let _ = ... and .ok() everywhere, so errors vanished.

After: every error surfaces via logging or user notification.

Onboarding config writes use try_write! macros that batch errors during wizard steps and report them all at the end, so users see exactly what failed.

AgentService Config Propagation

AgentService::new() now requires an explicit &Config parameter instead of calling Config::load() internally. This eliminates hidden I/O, makes dependencies explicit, and enables test injection via AgentService::new_for_test().

Render, dialogs, messaging, and cron modules no longer call Config::load() internally – errors propagate up the call stack instead of being swallowed.

Context Budget Management

The agent enforces a 65% context budget threshold. When token usage reaches 65% of the effective context window (context limit minus tool schema overhead), automatic LLM compaction fires:

  1. Detect context usage ≥ 65% of effective max tokens
  2. Compact via LLM summarization (preserves meaning, not just truncation)
  3. Retry up to 3 times if compaction fails
  4. Second pass with tighter budget if still over threshold

The 65% threshold exists because providers like MiniMax degrade on function-calling quality well before hitting theoretical context limits – tool calls break around ~133k tokens of a 200k limit.
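The trigger condition can be written down in a few lines. This is a sketch under the assumption that token counts are plain integers; the function names and the tool-schema figure used in the note below are illustrative, not values from the source.

```rust
/// Soft trigger from the text: 65% of the effective context window.
pub const COMPACTION_THRESHOLD_PCT: u64 = 65;

/// Effective window = provider context limit minus tool schema overhead.
pub fn effective_window(context_limit: u64, tool_schema_tokens: u64) -> u64 {
    context_limit.saturating_sub(tool_schema_tokens)
}

/// True once usage reaches 65% of the effective window.
/// Integer math avoids floating-point rounding at the boundary.
pub fn should_compact(used: u64, context_limit: u64, tool_schema_tokens: u64) -> bool {
    let effective = effective_window(context_limit, tool_schema_tokens);
    used * 100 >= effective * COMPACTION_THRESHOLD_PCT
}
```

With a 200k limit and a hypothetical 10k tokens of tool schemas, the effective window is 190k and compaction would start at 123,500 tokens.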

Async Proactive Compaction (v0.3.16)

At 65% context, compaction now runs asynchronously in the background instead of blocking the chat. The agent continues processing while the LLM summarizes older messages. Once compaction completes, the context is swapped seamlessly. No more frozen UI during compaction.

Source: src/brain/agent/service/tool_loop.rs (lines 14-112)

Emergency Compaction (ARG_MAX Recovery)

When a CLI provider's conversation context exceeds the OS ARG_MAX limit (~1MB on macOS), the agent recovers in four steps:

  1. Catch the “Argument list too long” or “prompt too large” error
  2. Emergency compact the conversation with an LLM summarization pass
  3. Insert a system marker so the agent knows context was compacted
  4. Retry the request

If compaction still fails, hard truncation kicks in: it keeps the last 24 messages (12 conversation pairs) and adds a marker telling the agent to use search_session for older context. Both markers persist to the DB for recovery across sessions.

Both actions emit SelfHealingAlert progress events so users see exactly what happened.
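The hard-truncation fallback might look roughly like this. It is a sketch that treats messages as plain strings; the marker wording and both constant names are invented for the example, and only the 24-message tail and the search_session hint come from the text.

```rust
/// Number of trailing messages kept by hard truncation (12 pairs).
pub const KEEP_LAST: usize = 24;

/// Hypothetical marker text; the real wording is not given in the docs.
pub const TRUNCATION_MARKER: &str =
    "[context truncated: use search_session for older messages]";

/// Keeps the last `KEEP_LAST` messages and prepends a marker.
/// Conversations that already fit are returned unchanged.
pub fn hard_truncate(messages: &[String]) -> Vec<String> {
    if messages.len() <= KEEP_LAST {
        return messages.to_vec();
    }
    let tail = &messages[messages.len() - KEEP_LAST..];
    let mut kept = Vec::with_capacity(KEEP_LAST + 1);
    kept.push(TRUNCATION_MARKER.to_string());
    kept.extend_from_slice(tail);
    kept
}
```

Because this path involves no LLM call, it can serve as the last resort when summarization itself fails.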

Source: src/brain/agent/service/tool_loop.rs (lines 550-687), tested with ArgTooLongMockProvider and ContextLengthMockProvider in src/tests/cli_arg_too_long_test.rs (352 lines).

Stream Resilience

Stuck Loop Detection

Some streaming providers (notably MiniMax) occasionally loop the same content indefinitely without sending a stop signal. The agent detects this:

  • Maintains a 2048-byte rolling window of recent streamed text
  • When a 200+ byte substring from the second half appears in the first half, it’s a repeat
  • Stream is terminated immediately and retry logic fires

Source: src/brain/agent/service/helpers.rs → detect_text_repetition(), tested in src/tests/stream_loop_test.rs (15 tests)
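The rule in the bullets above can be sketched with a byte buffer and a hash set. This is an illustration of the idea, not the code of detect_text_repetition(); `RollingWindow`, its methods, and the exact matching strategy (any 200-byte slice of the second half found in the first half) are assumptions.

```rust
use std::collections::HashSet;

pub const WINDOW_BYTES: usize = 2048;
pub const MIN_REPEAT_BYTES: usize = 200;

/// Keeps the most recent `WINDOW_BYTES` bytes of streamed text.
#[derive(Default)]
pub struct RollingWindow {
    buf: Vec<u8>,
}

impl RollingWindow {
    /// Appends a streamed chunk and drops the oldest bytes beyond the window.
    pub fn push(&mut self, chunk: &str) {
        self.buf.extend_from_slice(chunk.as_bytes());
        if self.buf.len() > WINDOW_BYTES {
            let excess = self.buf.len() - WINDOW_BYTES;
            self.buf.drain(..excess);
        }
    }

    /// True if some 200-byte slice of the second half also occurs in the
    /// first half. Working on bytes sidesteps UTF-8 boundary issues.
    pub fn is_repeating(&self) -> bool {
        let mid = self.buf.len() / 2;
        let (first, second) = self.buf.split_at(mid);
        let seen: HashSet<&[u8]> = first.windows(MIN_REPEAT_BYTES).collect();
        second.windows(MIN_REPEAT_BYTES).any(|w| seen.contains(w))
    }
}
```

A stream reader would call `push` for every text delta and terminate the stream as soon as `is_repeating` returns true. Windows shorter than 400 bytes can never match, so short replies are not affected.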

Idle Timeout

If a stream goes silent for 60 seconds (API providers) or 10 minutes (CLI providers) with no events, it’s treated as a dropped connection.

CLI providers (Claude CLI, OpenCode CLI) run internal tools — cargo builds, tests, gh commands — that can take several minutes without producing stream events. The 60-second timeout caused premature termination on these, so CLI providers now get a 10-minute window before timeout fires.

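The 60-second API-provider timeout is defined as a constant:

use std::time::Duration;

// Idle timeout for API providers; CLI providers get a 10-minute window.
const STREAM_IDLE_TIMEOUT: Duration = Duration::from_secs(60);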

The tokio::select! loop races the stream against the timeout and the user’s cancellation token. Timeout triggers retry, not a hard error.
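To show the timeout logic without pulling in an async runtime, here is a standard-library analogue that swaps tokio::select! for a channel with recv_timeout. It is a sketch only: the type and function names are hypothetical, and the cancellation-token branch is left out.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

pub enum ProviderKind {
    Api,
    Cli,
}

/// Idle windows from the text: 60 seconds for API providers, 10 minutes for CLI.
pub fn idle_timeout(kind: ProviderKind) -> Duration {
    match kind {
        ProviderKind::Api => Duration::from_secs(60),
        ProviderKind::Cli => Duration::from_secs(600),
    }
}

#[derive(Debug, PartialEq)]
pub enum StreamEnd {
    /// The sender closed the stream normally.
    Finished,
    /// No event arrived within the idle window; the caller should retry.
    IdleTimeout,
}

/// Collects chunks into `out` until the stream closes or goes silent.
/// The idle clock restarts after every received chunk.
pub fn drain_stream(rx: &Receiver<String>, idle: Duration, out: &mut String) -> StreamEnd {
    loop {
        match rx.recv_timeout(idle) {
            Ok(chunk) => out.push_str(&chunk),
            Err(RecvTimeoutError::Disconnected) => return StreamEnd::Finished,
            Err(RecvTimeoutError::Timeout) => return StreamEnd::IdleTimeout,
        }
    }
}
```

Returning `IdleTimeout` as a value rather than an error matches the behaviour described above, where a timeout leads to a retry instead of a hard failure.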

Pending Request Recovery

Crash recovery tracks every in-flight agent request in a pending_requests SQLite table. When a request starts, a row is inserted; when it completes (success or failure), the row is deleted.

On startup, any surviving rows mean the process crashed mid-request:

  1. Query pending_requests for interrupted rows
  2. Clear all rows (prevents double-recovery if this run also crashes)
  3. Dedup by session_id (resume each session only once)
  4. Spawn background tasks with a continuation prompt:

    “A restart just occurred while you were processing a request. Read the conversation context and continue where you left off naturally.”

  5. Emit TuiEvent::PendingResumed so the TUI shows a recovery notification

Source: src/db/repository/pending_request.rs, src/cli/ui.rs (lines 705-790)
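Step 3 of the startup sequence, deduplicating by session, could be sketched as below. The `PendingRequest` struct is a reduced stand-in with only two hypothetical fields; the real row in pending_requests carries more data.

```rust
use std::collections::HashSet;

/// Reduced stand-in for a row of the pending_requests table.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingRequest {
    pub id: i64,
    pub session_id: String,
}

/// Returns each interrupted session once, in first-seen order, so that
/// a session with several in-flight rows is only resumed a single time.
pub fn sessions_to_resume(rows: &[PendingRequest]) -> Vec<String> {
    let mut seen = HashSet::new();
    rows.iter()
        .filter(|row| seen.insert(row.session_id.as_str()))
        .map(|row| row.session_id.clone())
        .collect()
}
```

Each returned session ID would then get one background task with the continuation prompt quoted above.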

Cross-Channel Crash Recovery (v0.2.93)

Before v0.2.93, pending request recovery always responded via the TUI — even if the original request came from Telegram, Discord, Slack, or WhatsApp. The resumed response would appear in the wrong place.

Now each channel passes its name and chat_id into run_tool_loop, which stores them in pending_requests. On restart, recovery routes responses back to the originating channel:

| Original channel | Recovery response goes to |
|---|---|
| Telegram | Same Telegram chat |
| Discord | Same Discord channel |
| Slack | Same Slack channel |
| WhatsApp | Same WhatsApp chat |
| Trello | Same Trello board |
| TUI | TUI (as before) |

The pending_requests table gained channel and channel_chat_id columns via a DB migration. get_interrupted_for_channel lets each channel handler query only its own pending rows. Selective delete_ids prevents one channel from clearing another channel’s recovery entries.

State Cleanup

Session deletion triggers cascade deletes across all related data:

  • Messages (full conversation history)
  • Usage ledger entries (token/cost records)
  • Channel messages (Telegram, Discord, Slack, WhatsApp delivery records)
  • Plans (autonomous plans created in the session)
  • Cron jobs (scheduled tasks bound to the session)
  • Cached pane state (stale split pane entries)

Custom provider names are normalized on load and save ("My Provider" → "my-provider"), preventing duplicate entries that would confuse the provider registry.

Model Selector Safety

Pressing Enter in the model selector no longer clears existing API keys. The selector preserves current configuration while switching models.

Model switching errors now surface the actual error with a ⚠️ prefix on all channels, instead of always showing “Model switched” even on failure.

UTF-8 Safety

split_message() across all 5 channel handlers (Telegram, Discord, Slack, WhatsApp, Trello) now uses is_char_boundary() to find safe split points, preventing panics on multi-byte characters (emojis, CJK, accented characters).
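A minimal sketch of boundary-safe splitting with is_char_boundary(). The signature is an assumption (the real split_message() in each channel handler may differ), and the byte limit is a parameter because per-channel limits are not stated here.

```rust
/// Splits `text` into pieces of at most `max_bytes` bytes without ever
/// cutting inside a multi-byte UTF-8 character.
/// Requires `max_bytes >= 4`, the longest UTF-8 encoding of one `char`,
/// so every piece can hold at least one character.
pub fn split_message(text: &str, max_bytes: usize) -> Vec<&str> {
    assert!(max_bytes >= 4, "max_bytes must fit at least one UTF-8 character");
    let mut parts = Vec::new();
    let mut rest = text;
    while rest.len() > max_bytes {
        // Walk back from the limit to the nearest character boundary.
        let mut cut = max_bytes;
        while !rest.is_char_boundary(cut) {
            cut -= 1;
        }
        let (head, tail) = rest.split_at(cut);
        parts.push(head);
        rest = tail;
    }
    if !rest.is_empty() {
        parts.push(rest);
    }
    parts
}
```

Slicing a string at a non-boundary index panics in Rust, which is why the backward walk is needed before calling split_at.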

Cancel Persistence (v0.2.97)

When a user double-Escapes to abort a streaming response, the partial content is now persisted to the database before handle.abort() fires. This means cancelled content survives a session reload – you can scroll back and see exactly what the agent was saying before you stopped it.

Claude CLI Subprocess Cleanup

Previously, aborting a Claude CLI request would orphan the underlying claude subprocess. Now the stream reader loop monitors tx.closed() via tokio::select! and kills the child process when the receiver drops, so leaked subprocesses no longer accumulate in the background.

Telegram Stale Delivery Suppression

When a request is cancelled mid-flight, the agent sometimes continued processing and delivered a stale response to Telegram. A cancel_token.is_cancelled() guard now fires before final delivery, preventing old agent results from posting after cancellation.

Config Overwrite Protection

The onboarding wizard previously overwrote existing channel settings on every save, causing data loss when re-running /onboard. apply_config() now scopes writes to only the current onboarding step. from_config() sets EXISTING_KEY_SENTINEL for all existing channel data, ensuring untouched fields are never overwritten.

Tool Description Wrapping

Tool call descriptions were previously truncated at 80 characters in the TUI. render_tool_group now wraps description headers and value lines to terminal width, and the 80-char pre-truncation of bash commands in format_tool_description has been removed. Long commands and file paths display fully.

Auto-Fallback on Rate Limits (v0.2.98)

When the primary provider hits a rate or account limit mid-stream, OpenCrabs catches the RateLimitExceeded error, saves the current conversation state, and resumes the same conversation on a fallback provider configured in [providers.fallback]:

[providers.fallback]
enabled = true
providers = ["openrouter", "anthropic"]  # tried in order

The fallback chain reads from config at startup. has_fallback_provider() and try_get_fallback_provider() are available at runtime for dynamic queries.

Two-Tier Context Budget Enforcement

Compaction budget scales proportionally to max_tokens instead of a hardcoded 170k, supporting custom providers with different context windows:

  • 65% soft trigger — LLM compaction with retries (preserves meaning)
  • 90% hard floor — Forced truncation to 75% (cannot fail)
  • Pre-truncate target: 85% of max_tokens
  • Compaction is silent to user — summary written to memory log only, no chat spam
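The two tiers above can be expressed as a small classification function. This is a sketch with hypothetical names; it covers the 65% and 90% thresholds and the 75% truncation target, and leaves out the 85% pre-truncate target because the text does not spell out where it applies.

```rust
#[derive(Debug, PartialEq)]
pub enum BudgetAction {
    /// Usage is below the soft trigger; nothing to do.
    Within,
    /// At or above 65%: LLM compaction with retries.
    SoftCompact,
    /// At or above 90%: forced truncation down to `target_tokens`.
    HardTruncate { target_tokens: u64 },
}

/// Classifies current usage against thresholds that scale with `max_tokens`.
pub fn budget_action(used: u64, max_tokens: u64) -> BudgetAction {
    let pct = |p: u64| max_tokens * p / 100;
    if used >= pct(90) {
        BudgetAction::HardTruncate { target_tokens: pct(75) }
    } else if used >= pct(65) {
        BudgetAction::SoftCompact
    } else {
        BudgetAction::Within
    }
}
```

Because every threshold derives from `max_tokens`, the same function works for a 32k custom provider and a 200k hosted model.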

Mid-Stream Decode Retry (v0.3.0)

Transient stream decoding errors now trigger a 3x backoff retry before falling back to the provider fallback chain. This reduces false provider switches caused by momentary network glitches.
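A generic retry helper illustrates the idea. It is a blocking, standard-library sketch that assumes "3x" means up to three retries with a doubling delay; the function name, the delay schedule, and the synchronous style are all assumptions rather than the actual implementation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `op` once, then retries up to `max_retries` times on error,
/// doubling the delay after each failed attempt.
/// Returns the last error if every attempt fails, at which point a
/// caller could hand over to the provider fallback chain.
pub fn retry_with_backoff<T, E>(
    max_retries: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                sleep(base_delay * 2u32.pow(attempt));
                attempt += 1;
            }
        }
    }
}
```

With `max_retries = 3` the operation runs at most four times before the error is returned.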

SIGINT Handler + Panic Hook (v0.3.0)

Proper terminal restoration on crash or Ctrl+C via custom SIGINT handler and panic hook. No more garbled terminal after interrupt — the handler restores raw mode, cursor visibility, and alternate screen before exiting.

Proactive Rate Limiting (v0.2.99)

For OpenRouter :free models, OpenCrabs paces requests automatically using a shared global static limiter to avoid account-level bans. The rate limiter’s first-call sentinel (last_granted=0) no longer causes an unnecessary sleep.

RSI Alert Suppression (v0.3.13)

RSI alerts are now suppressed when the feedback dimension already has a fix commit in the recent git history. This prevents the agent from alerting on issues that have already been addressed. Stale alerts also age out via a sliding window on tool failure stats.

Expanded Phantom Detection (v0.3.17)

The phantom detector now catches additional patterns:

  • “Now <file-op gerund>” phantoms — catches phrases like “Now creating…”, “Now writing…”, “Now editing…” where the model narrates a file operation without actually executing it
  • Build/deploy intent + past-tense completion claims — catches when the model claims to have built or deployed something without running the actual commands
  • Module extraction — gaslighting and phantom detectors extracted into their own dedicated module for cleaner maintenance

RSI Escalation for Repeat Violations (v0.3.17)

RSI now bumps a violation counter on existing rules instead of deduping repeat violations away. Rules that keep getting broken get louder, not silenced. This prevents the agent from ignoring persistent failure patterns.

Partial JSON Repair (v0.3.17)

A new json_repair module automatically fixes common JSON corruption:

  • Closes unterminated strings
  • Balances brackets
  • Strips trailing commas
  • Drops trailing keys-without-value

Wired into 5 drop sites across OpenAI-compatible providers and the ContentBlockStop finalizer. Unrecoverable input returns a {"_partial": ..., "_repair_failed": true} envelope instead of crashing the turn.
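Three of the four repairs listed above (closing strings, balancing brackets, stripping trailing commas) fit in a single pass over the input. The sketch below is not the json_repair module: `repair_json` is a hypothetical name, it does not drop trailing keys without a value, and it does not produce the `_repair_failed` envelope.

```rust
/// Best-effort repair of truncated JSON text.
pub fn repair_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 8);
    let mut closers: Vec<char> = Vec::new(); // closing brackets still owed
    let mut in_string = false;
    let mut escaped = false;

    for c in input.chars() {
        if in_string {
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => { in_string = true; out.push(c); }
            '{' => { closers.push('}'); out.push(c); }
            '[' => { closers.push(']'); out.push(c); }
            '}' | ']' => {
                strip_trailing_comma(&mut out);
                if closers.last() == Some(&c) {
                    closers.pop();
                }
                out.push(c);
            }
            _ => out.push(c),
        }
    }

    if in_string {
        if escaped {
            out.pop(); // drop a dangling backslash
        }
        out.push('"');
    }
    strip_trailing_comma(&mut out);
    while let Some(closer) = closers.pop() {
        out.push(closer);
    }
    out
}

/// Removes a final comma (and any whitespace after it) from `out`.
fn strip_trailing_comma(out: &mut String) {
    let trimmed = out.trim_end().len();
    if out[..trimmed].ends_with(',') {
        out.truncate(trimmed - 1);
    }
}
```

Tracking string state first matters: brackets and commas inside a string literal must not be counted or stripped.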

Upstream Template Sync (v0.3.15)

Brain file templates are now automatically synced from the upstream OpenCrabs repo. The sync uses version gating (only applies templates from newer versions) and append-only diffs (never overwrites existing content). This ensures you always get the latest brain file improvements without losing your customizations.

Browser Resilience (v0.3.18)

Multiple browser reliability improvements:

  • Network idle wait after navigate — now waits for networkIdle instead of just CDP load event, catching async fetches
  • CDP manager lock released before await — lock was held during screenshot await, blocking concurrent browser operations
  • CDP pre-flight health check — added health check before screenshot capture to prevent stale connection failures
  • Browser navigate errors logged — navigate errors no longer silently swallowed with let _ =, now logged at WARN

Cloud Handshake Timeout (v0.3.18)

Bumped cloud provider handshake timeout from 30s to 60s. Routing proxies like dialagram legitimately take 20-45s; 30s was killing mid-request on slow-but-healthy providers.

Gemini API Key Security Fix (v0.3.18)

Fixed CodeQL #64 (HIGH): Gemini API key was leaked in URL query string (?key=...) in analyze_video’s resumable upload init and file-state polling. Moved to x-goog-api-key header, matching analyze_image and generate_image.

Stream & TUI Fixes (v0.3.18)

  • File paths starting with / no longer treated as slash command typos — /Users/.../file.pdf yo crabs check this triggered “Unknown command”. Added looks_like_file_path() helper gating both TUI and channel handlers.
  • Truncation continuations no longer trigger provider fallback — mid-sentence continuations should stay on the same provider. Fallback now skipped for truncation paths.
  • Fallback error reason surfaced in TUI — when fallback fired, the underlying error was swallowed. Now shows as a system message.
  • Pipe-delimited rows hard-broken — when not recognized as a table, pipe rows ran together. Added hard-break between rows.

v0.3.25 Fixes

  • Compaction dropped 55% kept-tail — summary IS the conversation now, no more redundant tail retention
  • Self-heal 5-nudge budget — reasoning-only turns get 5 nudges before sticky fallback, preventing empty replies from silently dropping
  • Completion-escape clause — phantom enforcement messages now have escape clause to prevent infinite loops
  • Scroll fixes — removed load_more_history() from scroll handler (overshoot fix), preserved scroll during streaming, skip first-render compensation
  • Brain file cleanup_intent — write_opencrabs_file now accepts cleanup_intent flag for user-driven maintenance. RSI agent blocked from shrinking brain files (issue #103)
  • Channel improvements — WhatsApp photo batching for multi-image uploads, Telegram media_group_id-based batching, Gemini schema strips default/example from tool schemas (#101, @leshchenko1979)
  • Custom provider model selection persistence — properly saves and displays custom provider model selection
  • Compaction prompt dominance fix — plan tool descriptions and scroll sensitivity improvements

v0.3.23 Fixes (Hotfix Release)

  • Phantom detection restored — v0.3.21’s turn-level tools_executed_this_turn gate was too aggressive: once any tool ran in a turn, phantom detection went silent for the rest of the turn, letting fabricated wrap-up text reach the TUI. Dropped the gate from all three phantom branches.
  • Self-heal never aborts — stuck-intent-loop now fast-escalates to sticky fallback instead of aborting; cap-exhaustion resets retry counter and injects hard nudge; phantom_retries_used now tracks consecutive phantoms since last real tool. Recovery always retries or falls back.
  • Brain file guardrail — generic write_file / edit_file now refuse to modify protected brain files (SOUL.md, USER.md, TOOLS.md, etc.), preventing accidental clobber. Routed through write_opencrabs_file instead.
  • A2A approval policy wired — A2A message/send tasks now resolve approval policy via check_approval_policy(). With auto-always set, tools auto-approve; otherwise returns warning. Fixes “Tool requires approval but no approval mechanism configured” errors.
  • Channel /new session switching fixed — /new now uses per-message resolver’s title format everywhere (Telegram, Discord, Slack), so session switching works across all channels.
  • Version-aware model sort — when OpenAI-compatible servers return zero or identical created timestamps, extracts numeric segments from model names and sorts newest version first. Fixes meaningless model lists on vLLM/llama.cpp.

v0.3.22 Fixes

  • Compaction typing without banner — reverted the visible “🗜️ Compacting context” banner text. Now uses typing-only refresh (Telegram send_chat_action(Typing), Discord broadcast_typing loop) keeping the “is typing” indicator alive during the 10-60s compaction window silently.
  • Channel /new archive consistency — unified archive behavior across all channels: non-owner sessions get archived (so next title lookup resolves cleanly), owner sessions stay non-archived and remain visible in /sessions.

v0.3.21 Fixes

  • Multi-language phantom detection via compile-time TOML — replaced regex patterns per language with TOML-defined char sets compiled into build-time match arms. New languages added by editing TOML, no Rust changes. Cross-language regression test added.
  • Self-heal pipeline hardened — phantom detection gated on turn-level tool execution, phantom iterations no longer persisted to DB, phantom text stripped from context before next turn, sticky fallback applied on exhaust.
  • OpenAI-compatible image generation — new image generation backend calling any /v1/images/generations endpoint. Providers override generation model independently via generation_model config field.
  • Working directory visible across tools — working directory now visible to all tools within the same iteration.
  • Compaction banner stripped from context — compaction banner text no longer fed to LLM context, preventing models from echoing it back.
  • Pipe-separate model callback — custom-provider model callbacks now pipe-separated so colons in provider names (e.g. “Qwen: DashScope”) survive parse.
  • Custom-provider model selection persists — /models dialog now correctly saves and syncs live model list for custom providers.
  • one_shot_pct display corrected — fixed incorrect percentage display in usage dashboard.
  • Session updated_at touched on switch — session last-modified timestamp updated when switching sessions via Telegram, preventing stale session resolution.

v0.3.19 Fixes

  • Cron provider/model cross-contamination fixed — cron’s execute_job called global swap_provider() instead of session-scoped swap_provider_for_session(), so concurrent cron jobs on the shared Cron session overwrote each other’s provider. Now each job swaps on its own session ID.
  • Cron mismatched pair validation — reversed cron config (e.g. default_model = "zhipu" where zhipu is a provider name) produced impossible pairs like dialagram/zhipu that timed out with no diagnostics. Added validation: if effective_model is not in the provider’s supported_models(), the job is skipped with a loud error.
  • Windows CI test failures fixed — tool_loop_helpers_test.rs used hardcoded Unix /tmp/ paths and /etc/hosts assertions. Added platform-specific test variants with #[cfg(unix)] / #[cfg(windows)].
  • CI Node 24 forced upgrade removed — removed FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true env var that broke actions/cache@v4 with punycode deprecation on Node 21+.
  • Codex OAuth device flow field names fixed — OpenAI’s device auth API uses non-standard field names (device_auth_id instead of device_code, string interval instead of number, expires_at instead of expires_in). Fixed with serde aliases and custom deserializer.
  • Codex OAuth verification URL corrected — was hardcoded to non-existent auth.openai.com/verify, changed to auth.openai.com/codex/device matching Codex CLI.
  • Codex OAuth model list curated — /models dialog showed non-OpenAI models (Phi-4, Llama, Mistral) because the codex provider ID wasn’t mapped to the curated GPT-5 model list.

v0.3.26 Fixes

  • Hashline collision detection (#105) — pure content hash prevents line-shift avalanche when lines are inserted/deleted above a hash anchor. On collision, escalates to edit_file fallback instead of corrupting the edit
  • RSI brain file hygiene (#111) — rejects raw failure-event logs from being written to brain files. RSI now sanitizes feedback dimensions before persisting
  • Tool error output (#113) — tool errors now include stdout/stderr in error content, ANSI escape sequences stripped, 8000 char cap to prevent context blowout

v0.3.27 Fixes

  • Ctx budget baseline on channel /new — shows calibrated baseline immediately after /new instead of waiting for first message
  • Auto-title session fix (#114) — preserves [chat:ID] suffix to prevent title duplication on subsequent auto-title fires
  • Sessions display (#115) — arrow prefix + “current” label instead of checkmark for clearer session switching UI

v0.3.28 Fixes

  • Voicebox + STT/TTS fallback chains — 2s liveness probe detects dead audio devices, librosa error translator surfaces actionable messages instead of Python tracebacks, per-provider fallback chains configured in config.toml
  • Browser multi-step navigation hardening (6 commits) — text=/xpath= selector prefixes, recovery hints on click failures, semantic loop detection (4+ screenshots in 8 iterations triggers abort), no-op screenshot rejection, same-URL short-circuit
  • Tool-call shape recovery — dict-by-call-id extraction for Qwen-3.7-max-preview regression where tool calls arrive as flat dicts instead of nested arrays
  • Edit tool improvements (#117) — fuzzy line-sequence fallback when exact match fails, hashline docs clarification
  • Brain backup rotation — max 5 backups per file, max 7 days old, preventing unbounded .bak accumulation
  • Auto-title fixes (#118, #120) — fires on FIRST turn (not second), retries on LLM failure instead of giving up
  • Ctx counter real-time only (#119) — ripped out calibration system entirely, uses provider-reported input_tokens verbatim. No more “0/max” for uncalibrated providers
  • Profile brain-template seeding — seeds 8 templates on profile create, recovery path for empty profiles

v0.3.29 Fixes

  • Auto-title thinking-block fallback (#121) — reasoning models returning only a Thinking block (no Text block) now get a title extracted from the thinking content instead of dropping silently. extract_title_candidate falls back to pluck_title_from_thinking (last quoted phrase, then last short sentence)
  • Telegram label-drift fix (PR #123 @leshchenko1979) — auto-titled sessions no longer overwritten on every subsequent message. should_refresh_label policy only refreshes default→default-different or group label changes, never auto-titled or custom titles. Chat→session binding on /sessions switch

v0.3.30 Fixes

  • 5-language deferment stall detection — self-heal catches “I need to X” / “I have to X” / “I must X” / “I should X” patterns in English, Spanish, Portuguese, French, and Russian
  • Follow-up message = ESC x2 cancel — all four channel handlers treat a follow-up message during an active agent run as a double-Escape cancel, then start fresh
  • Dynamic Telegram status messages — replaced hardcoded quips with context-aware messages showing actual tool being called, tokens streamed, and elapsed time
  • rename_session rejects empty titles (#128) — whitespace-only titles rejected so sessions can’t become unidentifiable

v0.3.31 Fixes

  • Fun POST-COMPACTION PROTOCOL prompts — after compaction, the agent receives a playful system prompt instead of a sterile summary marker. These rotating prompts (e.g. “You just woke up from a nap. The summary above is everything you remember.”) make the post-compaction experience less robotic. Users can opt out with [agent] silent_compaction = true in config.toml.
  • Telegram forum topic routing — in supergroups with topics enabled, thread_id is tracked through the full pipeline. The agent can use list_topics to map topic names to IDs, then route responses to specific topics via thread_id on send/reply/send_photo.
  • PDF page_range param — parse_document now accepts page_range strings like "1-30", "5,7,10-15", or "3" for targeted extraction. Text-first routing skips Gemini for text-native PDFs. Inline preview cap raised to ~60 pages.

v0.3.32 Fixes

  • Evolve hardening (#136) — the /evolve command now handles busy Linux binaries with a remove+rename dance (can’t overwrite a running binary on Linux), delayed systemd-run restart to let the current process finish cleanly, structured tracing for better error diagnosis, and a pre-flight count_matching_systemd_units check to avoid restarting when multiple OpenCrabs instances are running.

v0.3.33 Fixes

  • User-correction metadata (#138, PR #140) — display_text_override now captures the actual user message text instead of the 236-character Telegram channel prefix that was previously stored. This makes user correction entries in the feedback ledger readable and actionable.

v0.3.34 Fixes

  • follow_up_question race fix (closes #142) — all four channels (Telegram, Discord, Slack, WhatsApp) now flush intermediate text handles before presenting the follow-up keyboard. Prevents the race where the bot’s in-progress message got orphaned or duplicated when the user tapped a button mid-stream. Each channel got its own atomic commit with per-channel regression tests pinning the flush-before-keyboard sequence.
  • follow_up_question display polish (closes #148) — Telegram keyboard is now single-column with a 40-character label cap (rejects options longer than 40 chars in the tool validator with a clear error). Rolling “Running follow_up_question (16s)” status is suppressed while the keyboard is pending, and the LLM is now instructed to call the tool silently without echoing the question text in surrounding prose. Discord left alone due to its 5-ActionRow-per-message hard limit.
  • Phantom detector hardening — two narration shapes had been leaking past the phantom detector: pronounless deferment (Need to read the X) and bare gerunds (Reading the current state of the affected files). Added 28 pronounless EN variants, 15 telegraphic FR besoin de variants, and gerund+determiner bigrams. New regression file pins both leaked sentences verbatim. Follow-up fixed French accent detection: detect_language missed é/è/ë/ü, so French narration fell through to English and the new besoin de phrases never matched. Added the 4 markers.
  • Fallback provider cascade (closes #152) — /models swaps and session restores were storing a raw provider instead of wrapping it in FallbackProvider, so the fallback cascade could not fire on 5xx/429 errors after a model switch. Every active provider now gets wrapped unless it is already a chain or no fallbacks are configured. 174-line integration test simulating 5xx cascades across swapped providers.
  • Error persistence — agent failures now persist as permanent chat bubbles with actionable wording on TUI and channels instead of vanishing after the turn. UTF-8 panic after redact-prefix scan fixed by snapping to char boundary.
  • FINISHING A TURN rewrite — split the brain preamble directive into side-effect vs analysis response shapes, added a nudge on empty data-fetch closes so the agent never ends a turn silently after running research tools, and requires an explicit acknowledgement sentence instead of letting finish_reason: stop with no content reach the user.
  • Claude CLI model auto-learn — footer showed Opus 4.7 after Anthropic shipped 4.8 because default_for_alias hardcoded opus -> opus-4-7. Now the provider learns the CLI-resolved version from message_start events, persists to ~/.opencrabs/claude_cli_models.json (rewriting only when the value changes), and the TUI refreshes the session model live so the footer self-corrects to the actual version without code changes. default_for_alias prefers the learned cache and falls back to a build-time seed only on a fresh install.
  • tok/s in channel footers — channel context budget footers showed only ctx: XK/YK Z% while the TUI also showed | N tok/s. Added tokens_per_second: Option<f64> to AgentResponse, extended format_ctx_footer to accept a third tps parameter, computed tok/s from total_output_tokens / turn_duration across the whole turn in tool_loop.rs, and wired it through all four channels.
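
The tok/s footer change above can be sketched in a few lines. This is an illustrative Python sketch, not the Rust code in tool_loop.rs: the rounding, the K-unit formatting, and the function signatures are assumptions; only the footer shape (ctx: XK/YK Z% | N tok/s) and the total_output_tokens / turn_duration formula come from the text.

```python
from typing import Optional


def tokens_per_second(total_output_tokens: int, turn_duration_s: float) -> Optional[float]:
    """Return output tokens per second for a whole turn, or None if undefined."""
    if turn_duration_s <= 0 or total_output_tokens <= 0:
        return None
    return total_output_tokens / turn_duration_s


def format_ctx_footer(used: int, limit: int, tps: Optional[float] = None) -> str:
    """Render 'ctx: XK/YK Z%' and append '| N tok/s' only when a rate is known."""
    pct = round(100 * used / limit) if limit else 0
    footer = f"ctx: {used // 1000}K/{limit // 1000}K {pct}%"
    if tps is not None:
        footer += f" | {tps:.0f} tok/s"
    return footer
```

Keeping the rate optional mirrors the Option<f64> field described above: a turn with no measurable output simply omits the suffix instead of printing a zero.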

Unreleased (post-v0.3.33)

  • Phantom post-success exemption — the phantom detector used to fire on short completion acknowledgments like “Pushed.”, “Done.”, or “Committed as abc123” because those look like past-tense completion claims without a tool call. But when the agent just finished a real tool run, that one-line ack is the correct behavior. A turn-scoped tool_calls_completed_this_turn counter and a phantom_eligible gate now suppress phantom detection once real tool calls have landed in the current turn. The complementary FINISHING A TURN brain preamble directive tells the agent to reply with one short ack, skip verification re-runs, and stop restating conclusions in different wording.
  • follow_up_question intermediate flush (issue #142) — when the agent called follow_up_question after typing an explanatory preamble, Telegram/Discord/Slack/WhatsApp sometimes delivered the button block before the preamble text because intermediate text sat in a 500ms-polled queue while follow_up_question sent directly. All four channel handlers now flush pending intermediate JoinHandles before dispatching the question, guaranteeing the explanatory text renders above the buttons.
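
The post-success exemption can be pictured as a small gate on turn state. The sketch below is a Python illustration under assumptions: the names tool_calls_completed_this_turn and phantom_eligible come from the text, while the TurnState class and the claims_completion flag are hypothetical stand-ins for the real detector inputs.

```python
from dataclasses import dataclass


@dataclass
class TurnState:
    # Counts real tool calls that finished during the current turn.
    tool_calls_completed_this_turn: int = 0

    def record_tool_call(self) -> None:
        self.tool_calls_completed_this_turn += 1

    @property
    def phantom_eligible(self) -> bool:
        # Once a real tool call has landed, a short "Done." is a legitimate ack.
        return self.tool_calls_completed_this_turn == 0


def should_flag_phantom(turn: TurnState, claims_completion: bool) -> bool:
    """Flag only completion claims made in a turn with no real tool calls."""
    return turn.phantom_eligible and claims_completion
```

Because the counter is turn-scoped, a fresh TurnState per turn means an unbacked "Pushed." in the next turn is still caught.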

Notifications

All self-healing events are delivered to:

  • TUI (status bar notification)
  • Telegram, Discord, Slack, WhatsApp (if connected)

Nothing happens silently. If the crab fixes itself, it tells you what it fixed.

Self-Improvement (RSI)

OpenCrabs improves itself over time through Recursive Self-Improvement (RSI). The agent analyzes its own performance, identifies patterns, and autonomously updates its own brain files.

How It Works

1. Feedback Collection

Every tool execution, user correction, and interaction is automatically logged to the feedback ledger. Categories include:

  • tool_success / tool_failure — whether tool calls worked
  • user_correction — when you corrected the agent’s behavior
  • provider_error — LLM stream drops, rate limits, timeouts
  • pattern_observed — recurring behaviors the agent notices

2. Pattern Analysis

The agent calls feedback_analyze to review its performance:

  • Per-tool success rates
  • Recent failure patterns
  • User correction frequency
  • Provider reliability trends

3. Autonomous Improvement

When patterns are identified, the agent calls self_improve to:

  • read: Load a brain file (SOUL.md, TOOLS.md, etc.) before modifying
  • apply: Append new instructions based on observed patterns
  • update: Surgically replace existing sections that need refinement
  • list: Show all previously applied improvements

4. Change Tracking

Every improvement is logged to ~/.opencrabs/rsi/improvements.md with:

  • Timestamp
  • Target file modified
  • Description of the change
  • Rationale (which feedback event triggered it)

Old improvements are archived to ~/.opencrabs/rsi/history/ to keep the active file lean.

Example

User: "stop including testing steps in your output"
  → feedback_record(event_type="user_correction", dimension="output_hygiene")
  
Agent notices pattern of 5+ corrections on output hygiene:
  → feedback_analyze(query="failures")
  → self_improve(action="apply", target_file="SOUL.md", 
    content="Never include testing steps or verification commands in user-facing output.")
  → Logged to rsi/improvements.md

Key Rules

  • No human approval needed for self-improvements — the agent identifies patterns and applies fixes directly
  • Surgical updates only — replaces specific sections, doesn’t rewrite entire files
  • Always reads before modifying — never blindly overwrites brain files
  • Archives old improvements — keeps the improvement log manageable

RSI Engine Architecture

The RSI engine is a background task that runs continuously alongside OpenCrabs. Here’s how it works at each layer:

Feedback Ledger

Every tool execution, user correction, provider error, and self-heal event is automatically logged to a SQLite-backed feedback ledger. Event types:

| Event Type | What It Tracks |
|---|---|
| tool_success / tool_failure | Whether tool calls worked, with args and error details |
| user_correction | When you corrected the agent’s behavior |
| provider_error | LLM stream drops, rate limits, timeouts |
| pattern_observed | Recurring behaviors the agent notices |
| context_compaction | Context budget exceeded |
| improvement_applied | RSI applied a fix to a brain file |
| self_heal_trigger | Runtime self-heal caught and fixed an issue |
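
To make the ledger idea concrete, here is a minimal Python sketch of an SQLite-backed event log. The table name, columns, and helper functions are hypothetical; the real schema of the feedback database is not documented in this section, so treat this only as one plausible shape.

```python
import sqlite3

# Hypothetical schema: the real table layout is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feedback (
    id INTEGER PRIMARY KEY,
    event_type TEXT NOT NULL,
    dimension TEXT NOT NULL,
    detail TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the ledger and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, event_type, dimension, detail=None):
    """Append one feedback event; parameters are bound, never interpolated."""
    conn.execute(
        "INSERT INTO feedback (event_type, dimension, detail) VALUES (?, ?, ?)",
        (event_type, dimension, detail),
    )


def counts_by_type(conn):
    """Return {event_type: count} across the whole ledger."""
    rows = conn.execute(
        "SELECT event_type, COUNT(*) FROM feedback GROUP BY event_type"
    )
    return dict(rows.fetchall())
```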

Cycle Flow

  1. Startup — writes a digest of feedback stats to ~/.opencrabs/rsi/digest.md
  2. Every hour — checks for new feedback entries since the last cycle
  3. Opportunity detection — identifies tools with >20% failure rate (7-day window), user correction patterns, and provider errors
  4. Git-aware suppression — checks if a fix commit already landed for the tool in question. If yes, suppresses the alert instead of re-reporting stale issues
  5. Autonomous agent spawn — if opportunities are found, spawns a lightweight agent with RSI-only tools (feedback_analyze, self_improve, rsi_propose) that analyzes the data and applies targeted fixes
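
Step 3 of the cycle (tools above a 20% failure rate in a 7-day window) reduces to a filter plus a ratio. The Python sketch below assumes a hypothetical event shape of (timestamp, tool_name, succeeded) tuples; the threshold and window come from the text, everything else is illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def failing_tools(events, now, window_days=7, threshold=0.20):
    """Return {tool: failure_rate} for tools above the threshold in the window.

    `events` is an iterable of (timestamp, tool_name, succeeded) tuples,
    a hypothetical shape chosen for this sketch.
    """
    cutoff = now - timedelta(days=window_days)
    totals = defaultdict(int)
    failures = defaultdict(int)
    for ts, tool, ok in events:
        if ts < cutoff:
            continue  # outside the rolling window
        totals[tool] += 1
        if not ok:
            failures[tool] += 1
    return {
        tool: failures[tool] / totals[tool]
        for tool in totals
        if failures[tool] / totals[tool] > threshold
    }
```

The comparison is strictly greater than the threshold, matching the ">20%" wording: a tool failing exactly one call in five is not reported.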

Brain File Taxonomy

RSI routes improvements to the correct brain file based on what went wrong:

| Brain File | What It Controls | When RSI Writes Here |
|---|---|---|
| SOUL.md | Behavior, tone, reasoning patterns | Phantom tool calls, verbose responses, wrong tone |
| TOOLS.md | Tool usage, argument formats, pitfalls | Repeated tool failures with similar args |
| USER.md | User preferences and corrections | Repeated user corrections |
| MEMORY.md | Persistent knowledge and context | Agent lacks context it should retain |
| AGENTS.md | Workspace rules, safety policies | Agent-level behavior issues |
| CODE.md | Coding standards | Code quality feedback |
| SECURITY.md | Security policies | Security-related feedback |
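
The routing in the table above amounts to a lookup from problem category to target file. In this Python sketch the category keys are hypothetical labels invented for illustration; only the file names and the kinds of problems routed to each come from the table.

```python
# Hypothetical category names; the targets mirror the taxonomy table above.
BRAIN_FILE_ROUTES = {
    "phantom_tool_call": "SOUL.md",
    "verbose_response": "SOUL.md",
    "wrong_tone": "SOUL.md",
    "repeated_tool_failure": "TOOLS.md",
    "user_correction": "USER.md",
    "missing_context": "MEMORY.md",
    "agent_behavior": "AGENTS.md",
    "code_quality": "CODE.md",
    "security": "SECURITY.md",
}


def route_improvement(category: str) -> str:
    """Pick the brain file for a category, failing loudly on unknown ones."""
    try:
        return BRAIN_FILE_ROUTES[category]
    except KeyError:
        raise ValueError(f"no brain file route for category: {category}") from None
```

Raising on an unknown category, instead of defaulting to one file, keeps unrelated rules from piling up in SOUL.md; that choice is a design assumption of the sketch.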

Repeat-Violation Escalation

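RSI escalates rules that keep getting broken instead of deduplicating repeat violations away. Each repeat appends evidence (dates, session IDs) to the rule, so rules that keep getting broken get louder, not silenced. Earlier releases also bumped a violation counter stored inline in the brain file; v0.3.34 removed those inline counters because nothing read them, leaving the SQLite feedback ledger as the source of truth for counts. This escalation pattern is what makes RSI effective at fixing persistent bad habits.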

RSI Proposals

The RSI loop can propose new dynamic tools and slash commands based on gaps it observes in the agent’s capabilities. Proposals land in TOML inboxes at:

~/.opencrabs/rsi/
├── proposed_tools.toml      # pending tool proposals
├── proposed_commands.toml   # pending command proposals
├── applied/                  # accepted proposals (daily archive)
│   ├── 2026-05-01-tools.toml
│   └── 2026-05-01-commands.toml
└── rejected/                 # rejected proposals (daily archive)
    ├── 2026-05-01-tools.toml
    └── 2026-05-01-commands.toml

How Proposals Work

  1. RSI analyzes feedback and notices the agent repeatedly working around a missing capability
  2. RSI drafts a tool or command definition with a rationale citing the evidence
  3. Proposal lands in the inbox — reviewed via Mission Control or the rsi_proposals tool
  4. User applies or rejects — applied entries go to tools.toml/commands.toml, rejected entries are archived with an optional reason

When RSI Proposes a Tool

  • A specific bash command appears repeatedly across sessions (e.g. gh issue list, docker ps)
  • The agent calls http_request to the same endpoint multiple times with similar payloads
  • Only safe-by-default tools are proposed (read-only verbs, GET requests). Shell-based tools always set requires_approval=true

When RSI Proposes a Command

  • The user types /something repeatedly that doesn’t exist
  • A common multi-step prompt gets reused verbatim — a slash command saves typing

Safety Guardrails

  • RSI never installs directly — proposals require user approval via Mission Control or the rsi_proposals tool
  • No destructive proposals — RSI will never propose rm, dd, mv, or any shell tool with destructive side effects
  • Deduplication — if a proposal was already filed and not applied, RSI won’t repropose it
  • One proposal per cycle — quality over quantity
  • Evidence required — every proposal cites the feedback events that drove it
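
The guardrails above can be read as a checklist applied to each draft before it reaches the inbox. The Python sketch below is an assumption-heavy illustration: the proposal dictionary keys (name, command, requires_approval, evidence) are hypothetical, and the destructive-verb check is a naive word match, not the actual policy engine.

```python
import shlex

DESTRUCTIVE = {"rm", "dd", "mv"}  # the verbs named in the guardrail list


def validate_tool_proposal(proposal: dict, already_filed: set) -> list:
    """Return a list of guardrail violations (an empty list means acceptable)."""
    problems = []
    words = shlex.split(proposal.get("command", ""))
    if any(word in DESTRUCTIVE for word in words):
        problems.append("destructive command")
    if words and not proposal.get("requires_approval", False):
        problems.append("shell tool must set requires_approval=true")
    if proposal.get("name") in already_filed:
        problems.append("duplicate proposal")
    if not proposal.get("evidence"):
        problems.append("missing evidence")
    return problems
```

Returning every violation, not just the first, makes a rejected draft easier to audit. A production check would need to parse pipelines and subshells, which this word match does not attempt.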

RSI Hardening (v0.3.13)

  • Append-only brain files — brain files (SOUL.md, TOOLS.md, etc.) are now append-only with backup-before-write. The agent can only add new content, never delete or overwrite existing lines. This prevents accidental data loss from bad self-improvements.
  • Upstream template sync — brain file templates are automatically synced from the upstream repo with version gating and append-only diffs. You get the latest improvements without losing your customizations.
  • RSI alert suppression — alerts are suppressed when the dimension already has a fix commit, preventing noise on already-addressed issues.

RSI Autonomous Proposals (v0.3.16)

The RSI loop can now propose new tools and slash commands autonomously. Proposals land in the Mission Control inbox for review — the agent identifies gaps from feedback data and drafts solutions, but installation requires human approval via the inbox UI or /mission-control.

RSI Escalation for Repeat Violations (v0.3.17)

RSI now bumps a violation counter on existing rules instead of deduping repeat violations away. When a rule keeps getting broken across multiple sessions, the escalation counter increases and the agent prioritizes fixing that pattern. This prevents persistent bad habits from being silently ignored.

v0.3.10 Additions

  • Cycle summaries no longer truncated — full text displays in TUI instead of cutting off mid-sentence
  • Phantom detection reduced to 2-signal requirement — needs both intent keyphrase AND zero tool calls before flagging, eliminating spurious self-heal triggers
  • Uses active provider — respects current provider/model config instead of hardcoded Anthropic
  • Persistent session reuse — one session per cycle, survives app restarts by persisting last_cycle timestamp
  • Skips unchanged feedback — if feedback count hasn’t changed, skips analysis to avoid wasted LLM calls

v0.3.11 Additions

  • DashScope migration — Qwen OAuth rotation replaced with simple API-key provider, deleting ~2,500 lines of complexity
  • Local model tool-call extraction — auto-extracts tool calls from text content: bare JSON {"tool_calls":[...]}, Claude-style XML <TOOLNAME><PARAM>value</PARAM></TOOLNAME>, and Qwen-specific <!-- tool_calls --> markers
  • 40+ TUI/self-heal fixes — narrowed phantom gate, split thinking per iteration, anti-code-block nudge for local models, tighter phantom scope, mid-turn “Let me see:” catch, backtick code reference detection
  • Per-session provider isolation — each session carries its own provider instance; no global swap affecting all sessions
  • Sub-agent AwaitingInput state — wait_agent polls state and returns partial progress on timeout instead of deadlocking
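
Of the three shapes named in the tool-call extraction bullet, the bare JSON one is the easiest to illustrate. The Python sketch below scans free text for the first JSON object carrying a tool_calls list; it is a simplified stand-in for the real extractor and does not cover the XML or comment-marker shapes.

```python
import json


def extract_tool_calls(text: str) -> list:
    """Find the first JSON object in `text` that has a list under "tool_calls"."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("tool_calls"), list):
            return obj["tool_calls"]
        start = text.find("{", start + 1)
    return []
```

Trying each opening brace in turn lets the parser skip prose braces and unrelated objects without a hand-written bracket matcher.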

v0.3.20 Additions

  • RSI home directory resolution fixed — RSI now resolves ~ to the actual home directory instead of using CWD-relative paths, preventing brain file writes to wrong locations
  • Bare tool-call arrays caught — top-level arrays from models no longer crash RSI’s feedback dimension parsing; wrapped correctly before recording

v0.3.21 Additions

  • Multi-language phantom detection — compile-time TOML char sets replaced language-specific regex patterns. RSI feedback now works with all supported languages via the new char-set system. Cross-language regression test added.
  • RSI cycle output dedup by hashing — cycle output dedup now uses hash comparison of assembled opportunities instead of string matching, preventing duplicate cycle reports.
  • Sticky fallback on phantom exhaust — when phantom detection exhausts retries, RSI applies sticky fallback provider to prevent cascading failures.
  • Phantom iterations not persisted — phantom iterations no longer written to DB, keeping history clean of failed self-heal attempts.
  • OpenAI-compatible image generation — image generation via any /v1/images/generations endpoint with configurable generation_model override.
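
The hash-based cycle dedup can be sketched as "fingerprint the findings, compare with the previous fingerprint". In this Python sketch the choice of SHA-256, the JSON canonicalization, and the order-independent sorting are assumptions; the text only says that a hash of the assembled opportunities replaced string matching.

```python
import hashlib
import json


def opportunities_digest(opportunities: list) -> str:
    """Hash the assembled opportunities in a stable, order-independent way."""
    canonical = json.dumps(sorted(opportunities))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CycleReporter:
    """Remembers the previous cycle's digest and suppresses repeats."""

    def __init__(self):
        self._last_digest = None

    def should_report(self, opportunities: list) -> bool:
        digest = opportunities_digest(opportunities)
        if digest == self._last_digest:
            return False  # same findings as the previous cycle
        self._last_digest = digest
        return True
```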

v0.3.23 Additions

  • Brain file clobber guardrail — generic write_file / edit_file now refuse protected brain files, routing through write_opencrabs_file which enforces append-only, dedup-aware shrink, and .bak snapshots.
  • A2A approval policy — A2A tasks now resolve approval policy correctly. auto-always and auto-session policies work for remote agents.

v0.3.25 Additions

  • Brain file cleanup_intent — write_opencrabs_file accepts cleanup_intent flag for user-driven brain file maintenance. RSI agent explicitly blocked from shrinking brain files, preventing autonomous self-improvement from accidentally wiping content (issue #103).
  • RTK Token Savings integration — bundled RTK binary (4MB, v0.40.0) as default feature with zero-config. Works as direct proxy: agent runs git status, RTK intercepts output through Rust, filters it, returns token-optimized version. 100+ commands supported (git, cargo, npm, pnpm, docker, kubectl, grep, find, ls, tree, curl), blocklist for interactive/REPL commands (vim, ssh, python, mysql). Binary discovery checks bundled location first, falls back to PATH. /rtk slash command shows savings stats. Real-world results: 53.5% token savings across 180 commands (PR #102).

v0.3.19 Additions

  • RSI feedback records actual model used — when helpers remap a mismatched model to the provider default, RSI now records the resolved model instead of the impossible original pair. All 3 recording sites in tool_loop.rs now resolve the actual model before constructing the feedback dimension
  • Tool loop reasoning markers persisted — reasoning content persisted in non-CLI content column so thinking state survives across tool loop iterations
  • @ file picker fixed for large repos — recursive walk now skips .git/.hg/.svn directories and raised result cap from 5k to 20k, preventing pack/ref files from exhausting the cap
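
The file picker fix above (skip version-control directories, cap the result count) maps onto a pruned directory walk. This Python sketch illustrates the idea with os.walk; the function name and return shape are hypothetical, while the skipped directory names and the 20k cap come from the text.

```python
import os

SKIP_DIRS = {".git", ".hg", ".svn"}


def list_files(root: str, cap: int = 20_000) -> list:
    """Collect relative file paths under `root`, never entering VCS metadata."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            results.append(os.path.relpath(os.path.join(dirpath, name), root))
            if len(results) >= cap:
                return results  # stop early once the cap is reached
    return results
```

Pruning before descent matters more than the larger cap: pack and ref files are never visited, so they cannot consume the budget.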

v0.3.26 Additions

  • RSI brain file hygiene — rejects raw failure-event logs from being written to brain files. Feedback dimensions are sanitized before persisting, preventing noise accumulation in SOUL.md and TOOLS.md
  • Hashline collision escalation — when hashline_edit detects a collision (two lines with identical content hashes), RSI escalates to edit_file fallback instead of applying a corrupted edit
  • Dynamic help screen — help screen auto-generates from SLASH_COMMANDS constant, so new commands appear automatically without manual help text updates

v0.3.28 Additions

  • Brain backup rotation — max 5 backups per file, max 7 days old. Prevents unbounded .bak accumulation in ~/.opencrabs/ from repeated RSI writes
  • Profile brain-template seeding — profile create now seeds 8 brain file templates automatically, with recovery path for empty profiles. Ensures new profiles start with complete brain file sets
  • Auto-title retry on LLM failure — auto-title no longer gives up on first LLM error; retries with backoff before falling back to truncated first message
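
The backup rotation rule (at most 5 backups per file, none older than 7 days) can be expressed as a pure selection function. The Python sketch below assumes a hypothetical input of (name, created_at) pairs for a single brain file and leaves the actual deletion to the caller.

```python
from datetime import datetime, timedelta


def backups_to_delete(backups, now, max_count=5, max_age_days=7):
    """Return names of backups to remove, newest backups kept first.

    `backups` is a list of (name, created_at) pairs for one brain file.
    """
    cutoff = now - timedelta(days=max_age_days)
    newest_first = sorted(backups, key=lambda b: b[1], reverse=True)
    doomed = []
    for index, (name, created_at) in enumerate(newest_first):
        # A backup goes if it is beyond the count limit or past the age limit.
        if index >= max_count or created_at < cutoff:
            doomed.append(name)
    return doomed
```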

v0.3.30 Additions

  • RSI rejects trivial content — self_improve apply action now rejects trivial test content before it can pollute brain files, preventing noise from accumulating in SOUL.md and TOOLS.md

v0.3.31 Additions

  • RSI skill proposals — skill is now a third proposal kind alongside tool and command. When RSI identifies a multi-stage workflow pattern that recurs across sessions, it proposes a SKILL.md file instead of a simple tool or command. Applied skill proposals write to ~/.opencrabs/skills/<name>/SKILL.md and become immediately invocable as /<name> across all channels.
  • Bash command visibility — RSI now sees the actual bash command text plus a subsystem classifier (git, cargo, docker, npm, etc.) in feedback events. This lets RSI identify recurring shell patterns more accurately and propose targeted tools or skills.
  • Successful patterns surface as proposals — RSI doesn’t only react to failures. When a tool/command/skill pattern works reliably across multiple sessions, RSI surfaces it as a proposal to make the pattern more discoverable or ergonomic.

v0.3.34 Additions

  • Brain dedup scan (closes #147) — new RSI proposal kind BrainDedup that scans all 11 brain files daily, clusters duplicate lines (minimum 10 chars, skips structural markdown like headings and separators), and files dedup proposals into Mission Control with a soft purple badge. Runs every 24 RSI cycles (about once per day at 1-hour intervals), never auto-applies — human approval required through the existing rsi_proposals apply/reject flow. Core scan logic in dedup_scan.rs (393 lines), hooked into the RSI cycle with periodicity gating, 14 regression tests covering empty files, short-line filtering, cross-file detection, proposal format, and canonical selection.
  • Skill description injection (closes #151) — skill descriptions were documented in TOOLS.md as LLM auto-invoking triggers but were never actually injected into the system prompt, so the LLM could not auto-invoke from description alone. Added push_skills_section() to prompt_builder.rs that loads all skills via crate::brain::skills::load_all_skills() and formats each as - skill_name: description, appending an ## Available Skills block to both build_core_brain() and build_system_brain(). 2 regression tests.
  • RSI decorative counters removed (closes #149, PR #150) — removed the counter-bumping logic that incremented inline counters in SOUL.md like phantom_tool_call: 219. These counters were decorative only, nothing read them, and the real canonical source is the SQLite feedback ledger at ~/.opencrabs/feedback.db. Counters went stale (ledger showed 302, SOUL.md showed 219) and got wiped by upstream template sync. Replaced with evidence appends (date/session). DB stays the single source of truth. Follow-up commit escaped unescaped double quotes the PR introduced in the prompt string literal and added text regression tests.
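
The brain dedup scan described above (minimum 10 characters, structural markdown skipped, cross-file detection, one run every 24 cycles) can be approximated as follows. This Python sketch is not the logic in dedup_scan.rs: exact-match clustering after trimming whitespace and the is_structural heuristic are assumptions made for illustration.

```python
from collections import defaultdict


def is_structural(line: str) -> bool:
    """Headings and separators are markdown structure, not content."""
    return line.startswith("#") or set(line) <= set("-*_=| ")


def find_duplicate_clusters(files: dict, min_chars: int = 10) -> dict:
    """Map each duplicated line to the (file, line_number) places it occurs."""
    seen = defaultdict(list)
    for filename, text in files.items():
        for number, raw in enumerate(text.splitlines(), start=1):
            line = raw.strip()
            if len(line) < min_chars or is_structural(line):
                continue
            seen[line].append((filename, number))
    return {line: places for line, places in seen.items() if len(places) > 1}


def dedup_due(cycle_number: int, every: int = 24) -> bool:
    """Gate the scan so it runs once every `every` hourly cycles."""
    return cycle_number % every == 0
```

The scan only reports clusters. Consistent with the text, nothing here edits a file; a proposal built from the result would still go through the apply/reject flow.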

Self-Healing vs Self-Improvement

| Self-Healing | Self-Improvement |
|---|---|
| Fixes runtime errors (config corruption, DB issues) | Fixes behavioral patterns (bad habits, user corrections) |
| Automatic, no analysis needed | Requires feedback analysis first |
| Protects the system from crashing | Makes the agent better over time |
| Immediate | Accumulates across sessions |

Mission Control

Mission Control is a full-screen TUI dashboard that brings RSI activity, inbox proposals, and scheduled jobs into one place. Open it with /mission-control.

The Three Panels

The screen is divided into three panels: a large Inbox on the left, and Activity + Schedule stacked on the right.

Inbox

Pending RSI proposals displayed as cards. Each card shows:

  • Tool proposals (orange tool badge) — new dynamic tools RSI thinks you need, with the shell command template
  • Command proposals (teal command badge) — new slash commands RSI drafted based on usage patterns
  • Skill proposals — new SKILL.md files RSI drafted when it detects a repeated multi-step workflow that isn’t covered by an existing skill

Each card shows the proposal name, type badge, description or command template, and how long ago it was proposed.

Apply or reject proposals inline:

  • a — apply the selected proposal (installs tool/command to config, or creates skill directory)
  • r — reject the selected proposal (archives with optional reason)

Applied and rejected entries are archived daily to ~/.opencrabs/rsi/applied/ and ~/.opencrabs/rsi/rejected/ so the trail is auditable.

A banner on session start shows the count of pending inbox items.

Activity

A chronological feed of the last 100 RSI improvements. Shows what the autonomous engine did, when, and why:

  • Brain file modifications (SOUL.md, MEMORY.md, TOOLS.md, etc.)
  • Template syncs from upstream
  • Hard rule additions
  • Feedback analysis summaries
  • Violation count updates

Each entry shows the time ago, a summary of the change, and the target file.

Schedule

Your cron job queue with paused/active state. Each job shows:

  • Job name
  • Cron expression
  • Next run time (when active)
  • paused label (when paused)

See Cron Jobs for full cron documentation.

Cron BLOB Recovery (v0.3.20)

Cron jobs with legacy BLOB-typed prompt rows in the database are now tolerated instead of causing silent failures. The schedule panel resumes showing jobs normally.

Compaction Typing Without Banner (v0.3.22)

The visible compaction banner text has been removed. The schedule panel now uses typing-only indicators during compaction windows (10-60s), keeping the experience clean.

Context Counter Evolution (v0.3.26→v0.3.28)

  • v0.3.26 — introduced per-provider tokenizer calibration. Uncalibrated providers showed 0/max until first message calibrated the ratio
  • v0.3.28 — calibration system removed entirely. Context counter now uses provider-reported input_tokens verbatim, showing real-time usage without calibration overhead

Keyboard Navigation

| Key | Action |
|---|---|
| Tab / Shift+Tab | Cycle focus between panels (Inbox → Activity → Schedule) |
| ↑ / ↓ | Move selection within a panel |
| Enter | Open detail popup for selected item |
| a | Apply selected inbox proposal |
| r | Reject selected inbox proposal |
| Esc | Close popup or exit Mission Control |

Architecture

Mission Control is split into three module trees:

| Layer | Path | Purpose |
|---|---|---|
| Data services | brain/mission_control/ | Fetches inbox proposals, activity log, schedule items |
| Panel renderers | tui/render/mission_control/ | Draws each panel (inbox, activity, schedule, detail popup) |
| App state + input | tui/app/mission_control/ | Focus management, keystroke handling, actions |

Layout and keystroke contracts are unit-testable without spinning up a full App instance.

Multi-Profile Support

Run multiple isolated OpenCrabs instances from a single installation. Each profile gets its own config, memory, sessions, brain files, skills, cron jobs, and gateway service.

Introduced in v0.2.94.

Why Profiles?

Common use cases:

  • Work vs personal — separate API keys, brain files, Telegram bots
  • Multiple clients — different persona and config per customer
  • Model experimentation — compare different provider setups without clobbering your main config
  • Staging vs production — test brain file changes on a staging profile before rolling to your main agent

Creating a Profile

# Create a new profile
opencrabs profile create hermes

# List all profiles
opencrabs profile list

# Show details for a profile
opencrabs profile show hermes

# Delete a profile
opencrabs profile delete hermes

Switching Profiles

There are two ways to use a non-default profile:

# CLI flag (per-session)
opencrabs -p hermes

# Environment variable (persistent)
export OPENCRABS_PROFILE=hermes
opencrabs

The default profile (~/.opencrabs/) works exactly as before — zero breaking changes.

Directory Structure

Each profile gets its own directory under ~/.opencrabs/profiles/<name>/:

~/.opencrabs/
├── config.toml          # default profile config
├── memory/              # default profile memory
├── sessions.db          # default profile sessions
└── profiles/
    ├── hermes/
    │   ├── config.toml
    │   ├── memory/
    │   ├── sessions.db
    │   ├── logs/
    │   └── layout/
    └── assistant/
        ├── config.toml
        └── ...
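
The mapping from profile name to directory can be sketched as a small resolver. In this Python sketch the precedence (the -p flag wins over OPENCRABS_PROFILE) and the treatment of the name "default" are assumptions; the directory layout itself follows the tree above.

```python
from pathlib import Path
from typing import Optional


def resolve_profile_dir(cli_profile: Optional[str], env: dict, home: Path) -> Path:
    """Return the data directory for the selected profile.

    Assumed precedence for this sketch: CLI flag, then environment, then default.
    """
    base = home / ".opencrabs"
    name = cli_profile or env.get("OPENCRABS_PROFILE")
    if not name or name == "default":
        return base  # the default profile lives at the top level
    return base / "profiles" / name
```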

Profile Migration

Copy config and brain files from one profile to another:

# Copy from default to hermes
opencrabs profile migrate --from default --to hermes

# Overwrite existing files in target
opencrabs profile migrate --from default --to hermes --force

Migration copies all .md and .toml files plus the memory/ directory. It excludes the database, sessions, logs, and layout state — so the target profile starts fresh with the source’s personality and configuration, not its history.
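
The copy rules above (all .md and .toml files plus memory/, nothing else) are easy to picture as a filter over the source directory. This Python sketch is illustrative only: treating --force as "overwrite existing targets" follows the CLI comment earlier, while skipping an existing memory/ directory when force is off is an assumption.

```python
import shutil
from pathlib import Path


def migrate_profile(src: Path, dst: Path, force: bool = False) -> list:
    """Copy top-level .md/.toml files and memory/ from src to dst."""
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix in {".md", ".toml"}:
            target = dst / path.name
            if target.exists() and not force:
                continue  # keep the target's file unless force was requested
            shutil.copy2(path, target)
            copied.append(path.name)
    memory = src / "memory"
    mem_target = dst / "memory"
    if memory.is_dir() and (force or not mem_target.exists()):
        shutil.copytree(memory, mem_target, dirs_exist_ok=True)
        copied.append("memory/")
    return copied
```

Databases, logs, and layout state are excluded simply because they match neither the suffix filter nor the memory/ special case.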

Export and Import

Share profiles as portable archives:

# Export a profile as .tar.gz
opencrabs profile export hermes
# → creates hermes.tar.gz in current directory

# Import on another machine
opencrabs profile import ./hermes.tar.gz

Token-Lock Isolation

Two profiles cannot bind the same bot token simultaneously. Before connecting a Telegram, Discord, Slack, or Trello channel, OpenCrabs checks for existing token locks using PID-based lock files:

~/.opencrabs/locks/telegram_<token_hash>.lock

If another profile (still running) holds the lock, startup fails with a clear message. Stale locks (process dead) are automatically cleaned up.

This prevents split-brain scenarios where two agents fight over the same bot.
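
A PID-based token lock of the kind described above can be sketched as follows. This is a Python illustration under stated assumptions: the hash function (SHA-256 truncated to 16 hex characters), the lock file contents (a bare PID), and the error type are hypothetical, and the liveness probe relies on POSIX signal 0 semantics.

```python
import hashlib
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_token_lock(lock_dir: Path, channel: str, token: str) -> Path:
    """Create a PID lock for a bot token, or raise if a live process holds it."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    token_hash = hashlib.sha256(token.encode()).hexdigest()[:16]
    lock = lock_dir / f"{channel}_{token_hash}.lock"
    if lock.exists():
        try:
            holder = int(lock.read_text().strip())
        except ValueError:
            holder = None  # unreadable lock content is treated as stale
        if holder is not None and holder != os.getpid() and pid_alive(holder):
            raise RuntimeError(f"{channel} token is locked by PID {holder}")
        lock.unlink()  # stale lock: the holder is gone
    lock.write_text(str(os.getpid()))
    return lock
```

Hashing the token keeps the secret out of file names. The check-then-write sequence here is not atomic, so a real implementation would need an exclusive create or a file lock to close the race between two profiles starting at once.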

Profile-Aware Daemons

Install a separate OS service per profile:

# Install daemon for the hermes profile
opencrabs -p hermes service install

# Start it
opencrabs -p hermes service start

# macOS: creates com.opencrabs.daemon.hermes.plist
# Linux: creates opencrabs-hermes.service

Multiple profile daemons can run simultaneously as separate OS services, each with its own ports, bot connections, and config.

Per-Session Provider Isolation

Changing the provider in one session does not affect other sessions or profiles. Each session remembers its own provider independently. See Sessions for the full isolation story.

Voice (TTS & STT)

OpenCrabs supports text-to-speech and speech-to-text with five provider tiers: Off, Groq (API), OpenAI-compatible (any /v1/audio endpoint), Voicebox (self-hosted), or Local (on-device, zero cost).

Quick Setup

Run /onboard:voice in the TUI to configure everything interactively. The voice screen has radio selectors for both STT and TTS, with fields shown/hidden based on the selected provider. API keys are wired to keys.toml automatically.

Speech-to-Text (STT)

Providers

| Provider | Engine | Cost | Latency | Setup |
|---|---|---|---|---|
| Groq | Whisper (whisper-large-v3-turbo) | Per-minute pricing | ~1s | API key in keys.toml |
| OpenAI-compatible | Any Whisper-compatible endpoint | Varies | ~1-3s | stt_base_url + stt_model + API key |
| Voicebox | Self-hosted open-source | Free | ~2-5s | voicebox_stt_enabled=true + voicebox_stt_base_url |
| Local | whisper.cpp (on-device) | Free | ~2-5s | Auto-downloads model |

Local STT Models

| Model | Size | Quality | Speed |
|---|---|---|---|
| local-tiny | ~75 MB | Good for short messages | Fastest |
| local-base | ~142 MB | Better accuracy | Fast |
| local-small | ~466 MB | High accuracy | Moderate |
| local-medium | ~1.5 GB | Best accuracy | Slower |

Models auto-download from HuggingFace to ~/.local/share/opencrabs/models/whisper/ on first use.

Configuration

# config.toml
[voice]
stt_enabled = true
stt_mode = "local"              # "api" or "local"
local_stt_model = "local-tiny"  # local-tiny, local-base, local-small, local-medium

For API mode:

# keys.toml
[providers.stt.groq]
api_key = "your-groq-key"       # From console.groq.com

Text-to-Speech (TTS)

Providers

| Provider | Engine | Cost | Voices | Setup |
|---|---|---|---|---|
| OpenAI | gpt-4o-mini-tts | Per-character pricing | alloy, echo, fable, onyx, nova, shimmer | API key in keys.toml |
| OpenAI-compatible | Any /v1/audio/speech endpoint | Varies | Varies by server | tts_base_url + tts_model + tts_voice + API key |
| Voicebox | Self-hosted async POST /generate | Free | Configurable profiles | voicebox_tts_enabled=true + voicebox_tts_base_url + voicebox_tts_profile_id |
| Local | Piper (on-device) | Free | 6 voices | Auto-downloads model |

Local TTS Voices (Piper)

| Voice | Description |
|---|---|
| ryan | US Male (default) |
| amy | US Female |
| lessac | US Female |
| kristin | US Female |
| joe | US Male |
| cori | UK Female |

Models auto-download from HuggingFace to ~/.local/share/opencrabs/models/piper/. A Python venv is created automatically for the Piper runtime.

Configuration

# config.toml
[voice]
tts_enabled = true
tts_mode = "local"              # "api" or "local"
local_tts_voice = "ryan"        # ryan, amy, lessac, kristin, joe, cori

For API mode:

# config.toml
[voice]
tts_mode = "api"
tts_voice = "echo"              # OpenAI voice name
tts_model = "gpt-4o-mini-tts"   # OpenAI model

# keys.toml
[providers.tts.openai]
api_key = "your-openai-key"

Full Configuration Reference

# config.toml
[voice]
# Speech-to-Text
stt_enabled = true
stt_mode = "groq"                 # "groq", "openai_compatible", "voicebox", "local"
local_stt_model = "local-tiny"    # local-tiny, local-base, local-small, local-medium
stt_base_url = "https://..."      # OpenAI-compatible STT endpoint
stt_model = "whisper-1"           # OpenAI-compatible STT model
voicebox_stt_enabled = false
voicebox_stt_base_url = "https://..."

# Text-to-Speech
tts_enabled = true
tts_mode = "openai"               # "openai", "openai_compatible", "voicebox", "local"
tts_voice = "echo"                # OpenAI TTS voice name
tts_model = "gpt-4o-mini-tts"     # OpenAI TTS model
local_tts_voice = "ryan"          # Local mode: Piper voice
tts_base_url = "https://..."      # OpenAI-compatible TTS endpoint
# tts_model = "tts-1"             # OpenAI-compatible TTS model (use in place of the tts_model above; TOML allows a key only once per table)
voicebox_tts_enabled = false
voicebox_tts_base_url = "https://..."
voicebox_tts_profile_id = "profile-id"

# keys.toml
[providers.stt.groq]
api_key = "your-groq-key"

[providers.stt.openai_compatible]
api_key = "your-api-key"

[providers.tts.openai]
api_key = "your-openai-key"

[providers.tts.openai_compatible]
api_key = "your-api-key"

How Voice Messages Work

When a voice message arrives on Telegram, WhatsApp, Discord, or Slack:

  1. Audio is decoded (OGG/Opus or WAV)
  2. Transcribed via STT (local whisper.cpp or Groq API)
  3. Agent processes the text and generates a response
  4. Response is converted to speech via TTS (local Piper or OpenAI API)
  5. Audio is encoded as OGG/Opus and sent back as a voice message

Local mode handles everything on-device — no API calls, no cost, no data leaves your machine.

Hardware Requirements

| Feature | CPU Requirement | Notes |
|---|---|---|
| Local STT (rwhisper) | AVX2 (Haswell 2013+) | Metal GPU on macOS Apple Silicon |
| Local TTS (Piper) | No restrictions | Tested on 2007 iMac — works on any x86/ARM |
| Local embeddings | AVX (Sandy Bridge 2011+) | Falls back to FTS-only search |

OpenCrabs detects CPU capabilities at runtime and hides unavailable options in the onboarding wizard. Local TTS (Piper) has no CPU limitations and should work on virtually any machine.

Building Without Voice

Voice features are enabled by default. To build without them (smaller binary):

cargo build --release --no-default-features --features telegram,whatsapp,discord,slack,trello

Feature flags: local-stt (whisper.cpp), local-tts (Piper).

Skills System

Skills are reusable workflow templates that extend OpenCrabs with specialized capabilities. They work across Claude Code, Anthropic managed agents, and OpenClaw using a shared SKILL.md format.

How Skills Work

Each skill lives in its own directory under ~/.opencrabs/skills/:

~/.opencrabs/skills/
├── security-audit/
│   └── SKILL.md
├── cost-estimate/
│   └── SKILL.md
└── my-custom-skill/
    └── SKILL.md

Skill Format

Every skill is a markdown file with YAML frontmatter:

---
name: security-audit
description: Language-agnostic security & CVE audit for any codebase
---

# Security Audit

You are a senior security engineer performing a comprehensive
security audit of the codebase in the current working directory...

## Stage 1 — Project detection
...

The name and description fields in the frontmatter are required. The markdown body becomes the prompt that gets injected when the skill runs.
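
To show how the frontmatter and body separate, here is a minimal Python sketch of a SKILL.md reader. It handles only flat key: value lines, not full YAML, and the parse_skill function and its return shape are hypothetical; the required fields (name, description) and the "body becomes the prompt" rule come from the text.

```python
def parse_skill(text: str) -> dict:
    """Split a SKILL.md into frontmatter fields and the prompt body."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    if not stripped or stripped[0] != "---":
        raise ValueError("missing frontmatter")
    try:
        end = stripped.index("---", 1)  # the closing delimiter
    except ValueError:
        raise ValueError("unterminated frontmatter") from None
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    for field in ("name", "description"):
        if not meta.get(field):
            raise ValueError(f"missing required field: {field}")
    return {**meta, "prompt": "\n".join(lines[end + 1:]).strip()}
```

A loader built this way can reject a malformed skill at discovery time, before it appears in a picker or is registered as a slash command.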

Built-in vs User Skills

Skills come from two sources:

  • Built-in (orange badge) — ship with the OpenCrabs binary via include_str!. Always available.
  • User (teal badge) — created by you in ~/.opencrabs/skills/<name>/SKILL.md. Override built-ins by file presence.

Built-in Skills

| Skill | Description |
|---|---|
| opencli | Reference for all 25+ opencli-rs dynamic tools (news, social, search, web). Use when user asks about trending topics, news, social media, jobs, or web search. |
| browser-cdp | Native CDP browser automation reference. Headless/headed Chrome control, screenshots, JS evaluation. |
| a2a-gateway | Agent-to-Agent (A2A) protocol gateway reference. JSON-RPC 2.0 peer-to-peer agent communication. |
| dynamic-tools | Runtime tool management with tool_manage and tools.toml format. Create, enable, disable, reload tools without restart. |
| security-audit | Language-agnostic security & CVE audit. Detects project type from manifests, runs the appropriate scanner, reviews the diff for injection / auth / crypto / deserialization / path-traversal patterns, and scores 0-100. |
| cost-estimate | Codebase cost-to-build estimate, AI-assisted ROI breakdown, and fair-market valuation. |
| repo-audit | Language-agnostic repository health checks. 5-phase pipeline: language detection, native tool execution, git metrics, AST analysis, scoring + recommendations. Covers Rust, JS/TS, Python, Go. (v0.3.18) |

Running Skills

Skills Picker (/skills)

Type /skills to open the full-screen filterable picker. The top shows a filter bar with the total skill count. The main area lists all skills, each showing:

  • Skill name as a slash command (e.g. /security-audit)
  • Type badge — orange built-in or teal user
  • Description of what the skill does
  • Keywords for search matching in parentheses

| Key | Action |
| --- | --- |
| Tab / ↑↓ | Navigate the skill list |
| Enter | Run the selected skill |
| Esc | Close the picker |
| Type | Filter skills by name and description (case-insensitive) |

When the filter narrows to a single match, Enter fires it immediately.
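
The filtering behavior described above can be approximated with a case-insensitive substring match. This is a sketch under that assumption; `SkillEntry`, `filter_skills`, and `single_match` are hypothetical names, and the real picker may match differently (for example on keywords).

```rust
/// One row in the picker. Hypothetical type for illustration.
struct SkillEntry {
    name: &'static str,
    description: &'static str,
}

/// Keeps the skills whose name or description contains `query`, ignoring case.
fn filter_skills<'a>(skills: &'a [SkillEntry], query: &str) -> Vec<&'a SkillEntry> {
    let needle = query.to_lowercase();
    skills
        .iter()
        .filter(|s| {
            s.name.to_lowercase().contains(&needle)
                || s.description.to_lowercase().contains(&needle)
        })
        .collect()
}

/// Returns the only remaining match, if the filter left exactly one.
fn single_match<'a>(matches: &[&'a SkillEntry]) -> Option<&'a SkillEntry> {
    match matches {
        [only] => Some(*only),
        _ => None,
    }
}
```

An empty query matches every skill, so the full list is shown until you start typing.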

Slash Commands

Type any skill name directly as a slash command:

/security-audit

Channels

Skills auto-register as slash commands across all connected channels (Telegram, Discord, Slack, WhatsApp). No commands.toml entry needed. Just type /<skill-name> in any channel to run it.

Creating Custom Skills

  1. Create a directory under ~/.opencrabs/skills/:
    mkdir -p ~/.opencrabs/skills/my-skill
  2. Create SKILL.md with frontmatter and prompt:
    ---
    name: my-skill
    description: What this skill does
    keywords: [my-skill, custom, example]
    ---

    # My Skill

    Instructions for the agent when this skill runs...
  3. The skill immediately appears in /skills (with a user badge) and as /my-skill in TUI and all channels.

Cross-Harness Compatibility

The SKILL.md format works identically on:

  • OpenCrabs — native support via /skills picker and slash commands
  • Claude Code — drop the same SKILL.md file into Claude Code’s skills directory
  • Anthropic managed agents — compatible with managed agent skill loading
  • OpenClaw — works with OpenClaw’s skill system

Write a skill once, use it everywhere.

RSI-Proposed Skills

The RSI engine can propose new skills based on usage patterns it observes in the feedback ledger. For example, if the agent repeatedly performs a multi-step workflow that isn’t covered by an existing skill, RSI will draft a skill and file it in the Mission Control inbox for your review.

This is part of the RSI Proposals system — RSI identifies gaps in the agent’s capabilities and drafts solutions, but installation always requires your approval.

Usage Dashboard

The Usage Dashboard shows your token usage, costs, models, tools, and project breakdown. Open it with /usage.

Overview

The header shows your totals:

| Metric | Description |
| --- | --- |
| Tokens | Total tokens consumed (in millions) |
| Cost | Total spend in USD |
| Sessions | Number of sessions |
| Calls | Total API calls made |

The Four Panels

The dashboard is a 2x2 grid of panels:

Daily Activity (top-left)

A horizontal bar chart showing token usage per day. Peak days stand out clearly. Useful for spotting burst activity or debugging unexpected spikes.

By Project (top-right)

A ranked table of projects by cost:

| Column | Description |
| --- | --- |
| Project | Working directory name |
| $ | Total cost |
| M | Tokens in millions |
| s | Total session time |

By Model (bottom-left)

A ranked table of every model used:

| Column | Description |
| --- | --- |
| Model | Provider + model name |
| $ | Total cost |
| M | Tokens in millions |
| C | Number of API calls |

The selected row is highlighted in orange. Use this to spot expensive models or optimize your provider mix.

By-Model Quantization Tree View (v0.3.20)

Model variants are grouped under parent rows with tree connectors:

| Column | Description |
| --- | --- |
| Model | Provider + model name (parent row, bold) |
| ├─ / └─ | Variant rows (e.g. qwen3.6-35b-a3b-gguf-oq2, qwen3.6-35b-a3b-gguf-oq4) |

Parent rows show aggregated stats (total tokens, cost, calls) across all quant variants. This eliminates the noisy duplication where qwen3.6-35b-a3b-gguf, -oq2, -oq4, -iq4_xs each appeared as separate rows.

  • Before: 6 separate rows for one model family
  • After: 1 parent row + 3 variant rows with aggregated parent stats
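
The aggregation can be sketched as a group-by over usage rows. This is an illustration only: the `family` key is assumed to be supplied with each row, because how OpenCrabs derives a parent model from a variant name is not described here, and all type and function names are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Stats {
    tokens: u64,
    cost_cents: u64, // integer cents avoid float rounding in this sketch
    calls: u64,
}

struct UsageRow {
    family: &'static str,  // assumed parent key, e.g. the model name without quant suffix
    variant: &'static str, // full variant name as recorded
    stats: Stats,
}

struct FamilyGroup {
    total: Stats,
    variants: Vec<(&'static str, Stats)>,
}

/// Groups variant rows under their family and sums the stats into the parent.
fn group_by_family(rows: &[UsageRow]) -> BTreeMap<&'static str, FamilyGroup> {
    let mut groups: BTreeMap<&'static str, FamilyGroup> = BTreeMap::new();
    for row in rows {
        let group = groups.entry(row.family).or_insert_with(|| FamilyGroup {
            total: Stats::default(),
            variants: Vec::new(),
        });
        group.total.tokens += row.stats.tokens;
        group.total.cost_cents += row.stats.cost_cents;
        group.total.calls += row.stats.calls;
        group.variants.push((row.variant, row.stats));
    }
    groups
}

/// Renders one parent line per family followed by its variants with tree connectors.
fn render_tree(groups: &BTreeMap<&'static str, FamilyGroup>) -> Vec<String> {
    let mut lines = Vec::new();
    for (family, group) in groups {
        lines.push(format!("{family}  calls={}", group.total.calls));
        let last = group.variants.len().saturating_sub(1);
        for (i, (variant, stats)) in group.variants.iter().enumerate() {
            let connector = if i == last { "└─" } else { "├─" };
            lines.push(format!("{connector} {variant}  calls={}", stats.calls));
        }
    }
    lines
}
```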

Core Tools (bottom-right)

A horizontal bar chart ranking your most-used tools. bash and read_file typically dominate. Useful for understanding your agent’s workflow patterns.

A summary table shows cost and turns by activity category (Development, CI/Deploy, Features), plus the 1-shot success rate for each.

Time Filters

| Key | Filter |
| --- | --- |
| T | Today |
| W | This week |
| M | This month |
| A | All time |
| Esc | Close dashboard |

| Key | Action |
| --- | --- |
| Tab | Cycle focus between panels |
| Enter | Open details for selected item |
| Esc | Close dashboard |

Building from Source

Prerequisites

  • Rust 1.94+ (stable, nightly not required)
  • SQLite3 development headers
  • OpenSSL development headers (vendored by default)
  • pkg-config (Linux/macOS)

macOS

brew install sqlite3 pkg-config

Ubuntu / Debian

sudo apt install build-essential pkg-config libsqlite3-dev libssl-dev

Arch Linux

sudo pacman -S base-devel sqlite openssl pkg-config

Clone and Build

git clone https://github.com/adolfousier/opencrabs.git
cd opencrabs
cargo build --release

The binary is at target/release/opencrabs.

Feature Flags

OpenCrabs uses Cargo features to toggle channels and other optional components:

| Feature | Default | Description |
| --- | --- | --- |
| telegram | Yes | Telegram bot via teloxide |
| discord | Yes | Discord bot via serenity |
| slack | Yes | Slack bot via slack-morphism |
| whatsapp | Yes | WhatsApp via whatsapp-rust |
| trello | Yes | Trello integration |
| browser | Yes | Headless Chrome automation via CDP |
| profiling | No | pprof flamegraphs (Unix only) |

Build with specific features:

# Minimal — TUI only, no channels
cargo build --release --no-default-features

# Only Telegram
cargo build --release --no-default-features --features telegram
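
For readers new to Cargo features, the sketch below shows how a feature flag changes what gets compiled. It is a generic illustration using the feature names from the table; `compiled_channels` is a hypothetical helper and does not reflect how OpenCrabs structures its own gating.

```rust
/// Lists the channel integrations compiled into this build.
/// `cfg!(feature = "...")` is resolved at compile time from the flags passed to Cargo.
#[allow(unexpected_cfgs)] // feature names are declared in Cargo.toml, which a bare rustc run cannot see
fn compiled_channels() -> Vec<&'static str> {
    let mut channels = Vec::new();
    if cfg!(feature = "telegram") {
        channels.push("telegram");
    }
    if cfg!(feature = "discord") {
        channels.push("discord");
    }
    if cfg!(feature = "slack") {
        channels.push("slack");
    }
    if cfg!(feature = "whatsapp") {
        channels.push("whatsapp");
    }
    if cfg!(feature = "trello") {
        channels.push("trello");
    }
    channels
}
```

Built with --no-default-features, a function like this would return an empty list; adding --features telegram would make it return only that channel.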

Release Profile

The release profile is optimized for size and speed:

[profile.release]
opt-level = 3
lto = "fat"
codegen-units = 1
strip = true
panic = "abort"

There’s also a release-small profile for minimal binary size:

cargo build --profile release-small

Running Tests

cargo test --all-features

Linting

Always use clippy with all features:

cargo clippy --all-features

Self-Update

If you build from source, use git pull && cargo build --release instead of /evolve. The /evolve command downloads pre-built binaries from GitHub Releases.

Architecture

High-Level Overview

┌─────────────────────────────────────────────────┐
│          TUI (ratatui) + Split Panes             │
├────────┬────────┬──────────┬────────────────────┤
│Telegram│Discord │  Slack   │     WhatsApp       │
├────────┴────────┴──────────┴────────────────────┤
│                 Brain (Agent Core)               │
│  ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│  │ Providers│ │  Tools   │ │  Memory (3-tier) │ │
│  │ Registry │ │ +Dynamic │ │                  │ │
│  └──────────┘ └──────────┘ └──────────────────┘ │
├─────────────────────────────────────────────────┤
│   Services / DB (SQLite) │ Browser (CDP)         │
├─────────────────────────────────────────────────┤
│   A2A Gateway │ Cron Scheduler │ Sub-Agents      │
├─────────────────────────────────────────────────┤
│   Shared Channel Commands (commands.rs — 847 lines) │
├─────────────────────────────────────────────────┤
│   Self-Healing (config recovery, provider health, │
│   ARG_MAX compaction, error surfacing)             │
├─────────────────────────────────────────────────┤
│   Daemon Mode (health endpoint, auto-reconnect)  │
└─────────────────────────────────────────────────┘

Source Layout

src/
├── main.rs              # Entry point, CLI parsing
├── lib.rs               # Library root
├── cli/                 # CLI argument parsing (clap)
├── config/              # Configuration types, loading, health tracking
│   └── health.rs        # Provider health persistence (120 lines)
├── db/                  # SQLite database layer
│   ├── models.rs        # Data models (Session, Message, etc.)
│   └── repository/      # Query functions per entity
├── migrations/          # SQL migration files
├── services/            # Business logic layer
│   └── session.rs       # Session management service
├── brain/               # Agent core
│   ├── agent/           # Agent service, context, tool loop
│   │   └── service/     # Builder, context, helpers, tool_loop
│   ├── provider/        # LLM provider implementations
│   ├── tools/           # 50+ tool implementations
│   └── memory/          # 3-tier memory system
├── tui/                 # Terminal UI (ratatui + crossterm)
│   ├── app/             # App state, input, messaging
│   └── render/          # UI rendering modules
├── channels/            # Messaging platform integrations
│   ├── commands.rs      # Shared text command handler (847 lines)
│   ├── telegram/        # Teloxide-based bot
│   ├── discord/         # Serenity-based bot
│   ├── slack/           # Slack Socket Mode
│   └── whatsapp/        # WhatsApp Web pairing
├── a2a/                 # Agent-to-Agent gateway (axum)
├── cron/                # Cron job scheduler
├── memory/              # Vector search + FTS5
├── docs/                # Embedded doc templates
├── tests/               # Integration tests
└── benches/             # Criterion benchmarks

Key Crates

| Crate | Purpose |
| --- | --- |
| ratatui + crossterm | Terminal UI rendering and input |
| rusqlite + deadpool-sqlite | SQLite database with connection pooling |
| reqwest | HTTP client for LLM APIs |
| axum + tower-http | A2A HTTP gateway |
| crabrace | Provider registry and routing |
| teloxide | Telegram Bot API |
| serenity | Discord gateway |
| slack-morphism | Slack API |
| qmd + llama-cpp-2 | Memory search (FTS5 + embeddings) |
| rwhisper (candle) | Local STT: pure Rust, Metal GPU on macOS |
| piper (Python venv) | Local TTS with OGG/Opus encoding |
| syntect | Syntax highlighting in TUI |
| tiktoken-rs | Token counting |

Data Flow

  1. Input arrives from TUI, channel, A2A, or cron trigger
  2. Channel commands (/doctor, /help, /usage, /evolve) execute directly via the shared handler without LLM routing
  3. Brain builds context (system prompt + brain files + memory + conversation)
  4. Provider streams the LLM response from the selected provider; health is tracked per provider
  5. Tool Loop executes any tool calls, feeds results back to the LLM. CLI provider segments (text/tool interleaving) are tracked for correct ordering
  6. Response is delivered back to the originating channel
  7. DB persists messages, token usage, session state, and CLI tool segments
  8. Self-healing monitors for config corruption, context budget overflow (65% threshold), ARG_MAX limits, stuck streams (2048-byte repeat detection), idle timeouts (60s), provider failures (per-provider health tracking with auto-failover), and DB integrity. Crash recovery replays pending requests on restart. All errors are surfaced; none are swallowed silently
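
Step 2 of the flow above, where channel commands bypass the LLM, can be sketched as a small routing function. The `Route` enum and `route` function are hypothetical, and matching on the first word of the input is an assumption rather than the documented behavior of commands.rs.

```rust
#[derive(Debug, PartialEq)]
enum Route<'a> {
    /// Handled by the shared command handler; no LLM call is made.
    DirectCommand(&'a str),
    /// Forwarded to the brain for context building and an LLM call.
    Llm(&'a str),
}

/// The commands named in step 2. The real handler may know more.
const DIRECT_COMMANDS: [&str; 4] = ["/doctor", "/help", "/usage", "/evolve"];

fn route(input: &str) -> Route<'_> {
    let trimmed = input.trim();
    // Compare only the first word so arguments after the command do not matter.
    let first_word = trimmed.split_whitespace().next().unwrap_or("");
    if DIRECT_COMMANDS.contains(&first_word) {
        Route::DirectCommand(trimmed)
    } else {
        Route::Llm(trimmed)
    }
}
```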

Database

SQLite with WAL mode. Tables:

  • sessions — Session metadata, provider, model, working directory
  • messages — Conversation history per session
  • usage_ledger — Permanent token/cost tracking
  • memory_* — FTS5 and vector tables for semantic memory

Migrations run automatically on startup from src/migrations/.

Concurrency Model

  • Tokio async runtime with multi-threaded scheduler
  • Each channel runs as an independent tokio task
  • Sessions are isolated — each has its own conversation state
  • Tool execution uses tokio::task::block_in_place for sync operations
  • A2A gateway runs as a separate axum server task
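
The isolation idea can be illustrated without any dependencies. The sketch below deliberately swaps tokio tasks for std::thread so that it runs with the standard library alone; it is an analogue of the model above, not the OpenCrabs implementation. `run_channels` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;

/// Starts one worker per channel. Each worker owns its own state and only
/// communicates by sending messages, so workers cannot touch each other's data.
fn run_channels(channels: &[&'static str]) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for &name in channels {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // State local to this worker; nothing else can reach it.
            let mut history: Vec<String> = Vec::new();
            history.push(format!("{name}: connected"));
            tx.send(history).expect("receiver is alive");
        }));
    }
    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    let mut events: Vec<String> = rx.into_iter().flatten().collect();
    events.sort(); // arrival order is nondeterministic, so sort for stable output
    events
}
```

In a tokio-based design the same shape would typically use spawned tasks and async channels, with tokio::task::block_in_place wrapping any blocking call made from inside a task.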

Testing Guide

OpenCrabs has comprehensive test coverage. Run all tests with:

cargo test --all-features

Quick Reference

| Category | Tests | Location |
| --- | --- | --- |
| Tests — CLI Parsing | 28 | src/tests/cli_test.rs |
| Tests — Cron Jobs & Scheduling | 49 | src/tests/cron_test.rs |
| Tests — Channel Search | 24 | src/tests/channel_search_test.rs |
| Tests — Voice STT Dispatch | 11 | src/tests/voice_stt_dispatch_test.rs |
| Tests — Voice Onboarding | 62 | src/tests/voice_onboarding_test.rs |
| Tests — Candle Whisper | 6 | src/tests/candle_whisper_test.rs |
| Tests — Evolve (Self-Update) | 23 | src/tests/evolve_test.rs |
| Tests — Session & Working Dir | 15 | src/tests/session_working_dir_test.rs |
| Tests — Message Compaction | 24 | src/tests/compaction_test.rs |
| Tests — Fallback Vision | 35 | src/tests/fallback_vision_test.rs |
| Tests — GitHub Copilot Provider | 38 | src/tests/github_provider_test.rs |
| Tests — File Extract | 36 | src/tests/file_extract_test.rs |
| Tests — Image Utils | 9 | src/tests/image_util_test.rs |
| Tests — Onboarding Brain | 21 | src/tests/onboarding_brain_test.rs |
| Tests — Onboarding Navigation | 26 | src/tests/onboarding_navigation_test.rs |
| Tests — Onboarding Types | 16 | src/tests/onboarding_types_test.rs |
| Tests — Onboarding Keys | 4 | src/tests/onboarding_keys_test.rs |
| Tests — OpenAI Provider | 16 | src/tests/openai_provider_test.rs |
| Tests — Plan Document | 15 | src/tests/plan_document_test.rs |
| Tests — TUI Error | 16 | src/tests/tui_error_test.rs |
| Tests — Queued Messages | 15 | src/tests/queued_message_test.rs |
| Tests — Custom Provider | 27 | src/tests/custom_provider_test.rs |
| Tests — Context Window | 14 | src/tests/context_window_test.rs |
| Tests — Onboarding Field Nav | 46 | src/tests/onboarding_field_nav_test.rs |
| Tests — Provider Sync | 10 | src/tests/provider_sync_test.rs |
| Tests — Brain Templates | 8 | src/tests/brain_templates_test.rs |
| Tests — Collapse Build Output | 9 | src/tests/collapse_build_output_test.rs |
| Tests — Reasoning Lines | 6 | src/tests/reasoning_lines_test.rs |
| Tests — AltGr Input | 8 | src/tests/altgr_input_test.rs |
| Tests — System Continuation | 6 | src/tests/system_continuation_test.rs |
| Tests — QR Render | 8 | src/tests/qr_render_test.rs |
| Tests — WhatsApp State | 7 | src/tests/whatsapp_state_test.rs |
| Tests — Post-Evolve | 5 | src/tests/post_evolve_test.rs |
| Tests — Stream Loop Detection | 15 | src/tests/stream_loop_test.rs |
| Tests — XML Tool Fallback | 10 | src/tests/xml_tool_fallback_test.rs |
| Tests — TUI Render Clear | 4 | src/tests/tui_render_clear_test.rs |
| Tests — Split Panes | 21 | src/tests/split_pane_test.rs |
| Tests — Slack Formatting | 21 | src/tests/slack_formatting_test.rs |
| Tests — Daemon Health | 10 | src/tests/daemon_health_test.rs |
| Tests — Claude CLI Cache | 5 | src/tests/claude_cli_cache_test.rs |
| Tests — Browser Headless | 4 | src/tests/browser_headless_test.rs |
| Tests — Provider Registry | 8 | src/tests/provider_registry_test.rs |
| Tests — Self-Healing System | 27 | src/tests/self_healing_test.rs |
| Tests — Emergency Compaction | 2 | src/tests/compaction_test.rs |
| Tests — Cross-Channel Crash Recovery | 12 | src/tests/pending_request_test.rs |
| Tests — Profile System | 57 | src/tests/profile_test.rs |
| Tests — Token Tracking | 29 | src/tests/token_tracking_test.rs |
| Tests — Cron Execution Storage | 6 | src/tests/cron_results_test.rs |
| Tests — LLM Artifact Stripping | 8 | src/tests/artifact_strip_test.rs |
| Tests — Subagent & Team Orchestration | 84 | src/tests/subagent_test.rs |
| Tests — Telegram Resume Pipeline | 55 | src/tests/telegram_resume_test.rs |
| Brain (all modules) | 365 | src/brain/ |
| TUI (all modules) | 141 | src/tui/ |
| Channels (all modules) | 105 | src/channels/ |
| Utils (all modules) | 56 | src/utils/ |
| Services (all modules) | 44 | src/services/ |
| DB (all modules) | 40 | src/db/ |
| Config (all modules) | 32 | src/config/ |
| A2A (all modules) | 21 | src/a2a/ |
| Usage | 17 | src/usage.rs |
| Pricing | 17 | src/pricing.rs |
| Memory | 9 | src/memory/ |
| Logging | 4 | src/logging/ |
| Total | 3,616 | |

Feature-Gated Tests

Some tests only compile/run with specific feature flags:

| Feature | Tests |
| --- | --- |
| local-stt | Local whisper inline tests, candle whisper tests, STT dispatch local-mode tests, codec tests, availability cycling tests |
| local-tts | TTS voice cycling, Piper voice Up/Down |

All feature-gated tests use #[cfg(feature = "...")] and are automatically included when running with --all-features.


Running Tests

# Run all tests (recommended)
cargo test --all-features

# Run a specific test module
cargo test --all-features -- voice_onboarding_test

# Run a single test
cargo test --all-features -- is_newer_major_bump

# Run with output (for debugging)
cargo test --all-features -- --nocapture

# Run only local-stt tests
cargo test --features local-stt -- local_whisper

Disabled Test Modules

These modules exist but are commented out in src/tests/mod.rs because they require network access or external services:

| Module | Reason |
| --- | --- |
| error_scenarios_test | Requires mock API server |
| integration_test | End-to-end with LLM provider |
| plan_mode_integration_test | End-to-end plan workflow |
| streaming_test | Requires streaming API endpoint |

Contributing

Getting Started

  1. Fork the repository on GitHub
  2. Clone your fork and create a branch:
    git clone https://github.com/YOUR_USERNAME/opencrabs.git
    cd opencrabs
    git checkout -b my-feature
    
  3. Build and test:
    cargo clippy --all-features
    cargo test --all-features
    

Code Style

  • Run cargo clippy --all-features before committing — never cargo check
  • Follow existing patterns in the codebase
  • Keep changes focused — one feature or fix per PR
  • Add tests for new functionality in src/tests/

Pull Requests

  • Write a clear title and description
  • Reference any related issues
  • Ensure all tests pass
  • Keep PRs small and reviewable

Adding a New Tool

  1. Create a new file in src/brain/tools/
  2. Implement the tool handler function
  3. Register it in the tool registry
  4. Add the tool description to src/docs/reference/templates/TOOLS.md
  5. Add tests in src/tests/

Adding a New Provider

  1. Implement the provider in src/brain/provider/
  2. Register it in the provider registry via crabrace
  3. Add configuration docs to src/docs/reference/templates/
  4. Document setup in docs/src/brain/providers.md

Reporting Issues

Open an issue at github.com/adolfousier/opencrabs/issues with:

  • OpenCrabs version (opencrabs --version)
  • OS and architecture
  • Steps to reproduce
  • Expected vs actual behavior
  • Relevant log output (from ~/.opencrabs/logs/)

License

OpenCrabs is MIT licensed. By contributing, you agree that your contributions will be licensed under the same terms.

Security

Threat Model

OpenCrabs runs locally on your machine with access to your filesystem and shell. Security focuses on:

  1. API key protection — Keys never leave your machine except to their respective providers
  2. Network exposure — Minimal attack surface by default
  3. Tool execution — Sandboxed with user approval

API Key Storage

Keys are stored in ~/.opencrabs/keys.toml:

  • File permissions: 600 (owner read/write only)
  • Keys are loaded into memory with zeroize — zeroed on drop
  • Keys are never logged or included in conversation history
  • Keys are never sent to any provider other than their own
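
The two storage properties above can be sketched with the standard library on Unix. This is an illustration, not the OpenCrabs code: `write_private` and `Secret` are hypothetical names, and the hand-rolled wipe stands in for the zeroize crate.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Writes `contents` to `path` with owner-only read/write permissions (0o600).
fn write_private(path: &Path, contents: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // applied when the file is created
        .open(path)?;
    // `mode` is ignored for a file that already exists, so tighten it explicitly.
    fs::set_permissions(path, fs::Permissions::from_mode(0o600))?;
    file.write_all(contents.as_bytes())
}

/// Stand-in for a zeroize-backed secret: the bytes are wiped when it is dropped.
struct Secret(Vec<u8>);

impl Secret {
    fn wipe(&mut self) {
        for byte in self.0.iter_mut() {
            // Volatile writes keep the compiler from optimizing the wipe away.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
    }
}

impl Drop for Secret {
    fn drop(&mut self) {
        self.wipe();
    }
}
```

To check an existing installation, ls -l ~/.opencrabs/keys.toml should show -rw------- for the file.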

Network Security

A2A Gateway

  • Binds to 127.0.0.1 (loopback) by default
  • CORS disabled unless explicitly configured
  • No authentication built-in — use a reverse proxy for public exposure

Channel Connections

  • All channel APIs use TLS (HTTPS/WSS)
  • Telegram: long polling over HTTPS
  • Discord: WebSocket with TLS
  • Slack: Socket Mode (WebSocket)
  • WhatsApp: Noise protocol encryption

Tool Approval

Tools that modify your system require approval:

  • File writes — Shows the file path and diff
  • Shell commands — Shows the exact command before execution
  • Git operations — Push, commit, branch operations

Auto-approve mode (--auto-approve) bypasses this for automation use cases like cron jobs.
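
The approval rule reduces to a small decision function. The sketch below assumes that read-only actions never prompt; `ToolAction`, `Decision`, and `decide` are hypothetical names for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolAction {
    ReadFile, // assumed read-only, so it never prompts in this sketch
    WriteFile,
    ShellCommand,
    GitOperation,
}

#[derive(Debug, PartialEq)]
enum Decision {
    RunNow,
    AskUser,
}

/// True for the action kinds listed above as modifying your system.
fn modifies_system(action: ToolAction) -> bool {
    matches!(
        action,
        ToolAction::WriteFile | ToolAction::ShellCommand | ToolAction::GitOperation
    )
}

/// Prompts for modifying actions unless auto-approve is enabled.
fn decide(action: ToolAction, auto_approve: bool) -> Decision {
    if modifies_system(action) && !auto_approve {
        Decision::AskUser
    } else {
        Decision::RunNow
    }
}
```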

Data Storage

  • All data stored locally in ~/.opencrabs/opencrabs.db (SQLite)
  • No telemetry or analytics
  • No data sent to OpenCrabs servers (there are none)
  • Conversation history stays on your machine

Reporting Vulnerabilities

If you discover a security vulnerability, please report it responsibly:

  • Email: adolfo@meetneura.ai
  • Do not open a public issue for security vulnerabilities
  • We will acknowledge receipt within 48 hours

Changelog
