HowOpenClaw v2026.3.24

Make Voice Feel Natural

Tune SOUL.md and prompt design so your agent sounds good when spoken aloud.

By the end of this page, your agent's voice responses will sound natural — not like it's reading a list of bullet points out loud.

Time: ~15 minutes


The problem with default responses in voice mode

Your agent was probably configured for text. Text responses use markdown — bullet points, headers, code blocks. Read aloud, these sound like:

"Dash dash dash. Number one, install the package. Number two, run the command. Number three..."

You need a voice-specific SOUL.md configuration.


Voice-specific SOUL.md section

Add a section to ~/.openclaw/SOUL.md specifically for voice:

## Voice mode (Talk Mode)
When responding in Talk Mode (voice):
- No markdown. No bullet points, no headers, no asterisks.
- Write in natural spoken sentences. "First you'll want to... then you..." not "1. ... 2. ..."
- Keep responses short. 2-3 sentences for simple questions. 4-5 for complex ones.
- Never read out URLs. Say "I'll send you the link" or "check the link I'm sending" instead.
- End responses naturally, not with "Is there anything else?"
- For lists, use "and" and "also" — not enumeration.

OpenClaw automatically passes a voice: true flag when Talk Mode is active, so your agent knows to apply these rules.
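To check whether the rules above are taking effect, you can paste a transcript of a voice response into a small script that looks for leftover markdown. The following is a minimal sketch, not part of OpenClaw: the function name `find_voice_problems` and the pattern set are assumptions made for illustration, and the regular expressions only catch the most common constructs.

```python
import re

# Hypothetical patterns for constructs that sound wrong when spoken aloud.
# This is an illustrative set, not an exhaustive markdown parser.
MARKDOWN_PATTERNS = {
    "header": re.compile(r"^\s{0,3}#{1,6}\s", re.MULTILINE),
    "bullet": re.compile(r"^\s*[-*+]\s+", re.MULTILINE),
    "numbered list": re.compile(r"^\s*\d+[.)]\s+", re.MULTILINE),
    "bold": re.compile(r"(\*\*|__)\S"),
    "code fence": re.compile(r"```"),
    "url": re.compile(r"https?://\S+"),
}


def find_voice_problems(response: str) -> list[str]:
    """Return the names of the constructs found in a response transcript."""
    return [
        name
        for name, pattern in MARKDOWN_PATTERNS.items()
        if pattern.search(response)
    ]


if __name__ == "__main__":
    # A text-style answer that would be read out as "Number one, ..."
    print(find_voice_problems("1. Install the package\n2. Run the command"))
    # A spoken-style answer should come back clean.
    print(find_voice_problems("First you'll want to install the package, then run the command."))
```

An empty list means the response contains none of the checked constructs. It does not guarantee that the response sounds natural.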


Prompt design for voice

For questions that would normally get a structured response, prime the format by asking for a spoken explanation:

Instead of:

"What are the steps to deploy a Docker container?"

Try:

"Walk me through deploying a Docker container like you're explaining it to me out loud"

The two prompts produce noticeably different output. The second gets natural speech; the first tends to get a numbered list.

You can also add to SOUL.md:

When I ask "how do I [X]?" in voice mode, give me a narrative walkthrough, not steps.

Response length tuning

Voice responses that run too long interrupt the flow of a conversation. Set explicit length limits in SOUL.md:

## Voice response length
- Single question → 1-2 sentences
- "Explain [topic]" → 3-4 sentences max
- "Walk me through [process]" → 5-6 sentences, then pause and ask if I want to continue

If you want the agent to pause mid-response for long topics:

For long explanations in voice: give the first part, end with "Want me to continue?" and wait.
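If you want to verify that responses stay within the limits above, a rough sentence counter is enough. This is a sketch under stated assumptions: the labels `single`, `explain`, and `walkthrough` are hypothetical names for the three request types, and the sentence splitter is naive, so abbreviations such as "e.g." will be miscounted.

```python
import re

# Upper sentence limits taken from the length rules above.
# The keys are hypothetical labels, not OpenClaw settings.
MAX_SENTENCES = {"single": 2, "explain": 4, "walkthrough": 6}


def count_sentences(text: str) -> int:
    """Count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def within_limit(text: str, kind: str) -> bool:
    """Return True if the response fits the limit for this kind of request."""
    return count_sentences(text) <= MAX_SENTENCES[kind]


if __name__ == "__main__":
    reply = "Install the package. Then run the command."
    print(count_sentences(reply), within_limit(reply, "single"))
```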


Link fallback

When your agent mentions something that has a URL (documentation, an article), you can have it send the link as a text message after speaking.

Add to config:

{
  "talk_mode": {
    "fallbackChannel": "telegram"
  }
}

With this set, when your agent says "I'll send you the link", it automatically sends the URL to you on Telegram. Voice carries the conversation; text carries links and long content.
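The idea behind the fallback can be pictured as splitting one response into a spoken part and a list of links to send separately. The sketch below only illustrates that split. It is not how OpenClaw implements the fallback channel, and `split_for_voice` and the spoken placeholder phrase are hypothetical.

```python
import re

# Matches http(s) URLs up to the next whitespace or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")

# Hypothetical phrase spoken in place of each URL.
PLACEHOLDER = "the link I'm sending"


def split_for_voice(response: str) -> tuple[str, list[str]]:
    """Return (spoken_text, links) with every URL removed from the spoken text."""
    links: list[str] = []

    def replace(match: re.Match) -> str:
        raw = match.group(0)
        url = raw.rstrip(".,;:!?")      # keep sentence punctuation out of the URL
        links.append(url)
        return PLACEHOLDER + raw[len(url):]  # put the punctuation back

    spoken = URL_PATTERN.sub(replace, response)
    return spoken, links


if __name__ == "__main__":
    spoken, links = split_for_voice("The docs are at https://example.com/docs.")
    print(spoken)  # text to hand to speech
    print(links)   # URLs to send over the text channel
```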


A good voice SOUL.md setup

Here's a complete voice section that works well:

## Voice mode
In Talk Mode:
- Respond in natural spoken language. No markdown, no lists, no asterisks.
- Keep it short. 2-3 sentences unless I ask for more.
- Never read URLs aloud — say "sending you the link" and use the fallback channel.
- For step-by-step instructions: narrate them like a person, not like a numbered list.
- Sound like a calm, knowledgeable person, not a text-to-speech demo.
- Don't end with "Is there anything else I can help with?"
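
To make the "narrate them like a person" rule concrete, here is a small sketch that turns a list of short steps into one spoken-style sentence. It only shows the target shape of the output. The agent does this through the SOUL.md instructions, not through code, and `narrate_steps` is a hypothetical helper.

```python
def narrate_steps(steps: list[str]) -> str:
    """Join short imperative steps into a single spoken-style sentence."""
    if not steps:
        return ""
    if len(steps) == 1:
        return f"Just {steps[0]}."
    # "First" for the opening step, "then" for middle steps, "and finally" for the last.
    connectors = ["First"] + ["then"] * (len(steps) - 2) + ["and finally"]
    parts = [f"{connector} {step}" for connector, step in zip(connectors, steps)]
    return ", ".join(parts) + "."


if __name__ == "__main__":
    print(narrate_steps(["install the package", "run the command", "check the logs"]))
```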

Goal complete

Your voice setup:

  • Press Ctrl+Space (or say your wake word) → speak → hear a natural response
  • Responses are tuned for speech, not reading
  • Links and long content fall back to your text channel

Where to go next