# Enable Voice

Set up Talk Mode with speech-to-text and text-to-speech so you can speak to your agent.

By the end of this page, you'll be able to speak to your agent and hear it respond out loud.

Time: ~30 minutes
## What Talk Mode is
Talk Mode is OpenClaw's voice interface. It works like this:
- You press a wake key (or say a wake word)
- You speak — OpenClaw transcribes your speech using Whisper (STT)
- Your agent processes the message and generates a response
- The response is spoken aloud via ElevenLabs or system TTS
Transcription runs locally on your machine: raw audio goes mic → Whisper (local) → your AI provider → TTS → speakers. Your voice itself is never uploaded; only the transcribed text goes to your AI provider (and, if you use ElevenLabs, the response text for synthesis).
## Prerequisites
- A microphone (built-in laptop mic works)
- An ElevenLabs API key (free tier is sufficient) — or use macOS system TTS as a free alternative
- Python 3.10+ (for Whisper — the installer handles this)
## Step 1 — Enable Talk Mode
Open `~/.openclaw/openclaw.json` and add a `talk_mode` block:

```json
{
  "talk_mode": {
    "enabled": true,
    "stt": {
      "provider": "whisper",
      "model": "base"
    },
    "tts": {
      "provider": "elevenlabs",
      "apiKey": "YOUR_ELEVENLABS_API_KEY",
      "voiceId": "YOUR_VOICE_ID"
    },
    "wakeKey": "ctrl+space"
  }
}
```

STT models: `tiny` (fastest, less accurate), `base` (good balance), `small` (slower, more accurate). Start with `base`.
TTS provider options:

- `elevenlabs` — high quality, requires API key, free tier = 10,000 chars/month
- `system` — uses the macOS `say` command, free but robotic
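Those two option lists combine naturally into a free, lower-latency setup. A sketch, assuming the `system` provider needs no `apiKey` or `voiceId` (check your installed version's config reference to confirm):

```json
{
  "talk_mode": {
    "enabled": true,
    "stt": {
      "provider": "whisper",
      "model": "tiny"
    },
    "tts": {
      "provider": "system"
    },
    "wakeKey": "ctrl+space"
  }
}
```

This trades accuracy and voice quality for speed: `tiny` transcribes fastest, and `system` skips the network round-trip to ElevenLabs entirely.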
## Step 2 — Get an ElevenLabs voice ID
- Create a free account at elevenlabs.io
- Go to Voices → browse the default voices
- Click a voice → copy the Voice ID from the URL
Or use the ElevenLabs API to list available voices:

```shell
curl -H "xi-api-key: YOUR_KEY" https://api.elevenlabs.io/v1/voices \
  | jq '.voices[] | {name: .name, id: .voice_id}'
```

## Step 3 — Install Whisper dependencies
```shell
openclaw plugins install voice
```

This installs the voice plugin, including the Python Whisper package, and downloads the selected model. For `base`, expect a ~150 MB download.
## Step 4 — Restart and test
```shell
openclaw gateway restart
```

Then start the voice plugin:

```shell
openclaw plugins run voice
```

Press `ctrl+space` (or your configured wake key), speak a sentence, and release. Wait for the response. You should hear your agent reply out loud.
## Use a wake word instead
If you want always-on listening instead of a hotkey:
```json
{
  "talk_mode": {
    "wakeWord": "hey claw",
    "wakeKey": null
  }
}
```

Wake word detection uses a lightweight local model — it doesn't send audio to any server until after the wake word is detected.
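The snippet above shows only the two wake-related keys, and it isn't documented whether OpenClaw merges a partial `talk_mode` block over your existing settings, so the safest edit is to keep your full block and change just those keys. A complete always-on example combining the Step 1 settings:

```json
{
  "talk_mode": {
    "enabled": true,
    "stt": {
      "provider": "whisper",
      "model": "base"
    },
    "tts": {
      "provider": "elevenlabs",
      "apiKey": "YOUR_ELEVENLABS_API_KEY",
      "voiceId": "YOUR_VOICE_ID"
    },
    "wakeWord": "hey claw",
    "wakeKey": null
  }
}
```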
## Troubleshooting

| Problem | Fix |
|---|---|
| No audio transcribed | Check mic permissions in System Preferences → Privacy → Microphone |
| TTS not playing | Check that the ElevenLabs API key and voice ID are correct |
| High latency | Switch STT to the `tiny` model and TTS to `system` to reduce processing time |
| Whisper install fails | Run `pip3 install openai-whisper` manually, then retry `openclaw plugins install voice` |