
Pronouncing digits one‑by‑one

Need your TTS to spell out numbers digit‑by‑digit instead of reading them as whole numbers? Wrap the sequence in a <digits> block.
Element | Required | Description
<digits> | ✓ | Opening tag that starts the digit‑override region.
Digits & separators | ✓ | Any digits (0‑9) and optional separators: spaces, dashes, parentheses, periods. Separators are skipped (or introduce brief pauses) during pronunciation.
</digits> | ✓ | Closing tag.
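As a rough illustration of the table above, the following Python sketch checks a string before it is wrapped in <digits>. The function name is_valid_digits_body is hypothetical and not part of the Async Voice API, and requiring at least one digit is an assumption. It treats "dashes" as the ASCII hyphen only, and the API itself may be more or less permissive.

```python
# Hypothetical client-side check; not part of the Async Voice API.
# Assumption: the allowed separators are exactly those listed in the table
# (space, dash, parentheses, period), with "dash" meaning the ASCII hyphen.
ALLOWED_SEPARATORS = set(" -().")
ASCII_DIGITS = set("0123456789")


def is_valid_digits_body(body: str) -> bool:
    """Return True if body holds only digits and listed separators, with at least one digit."""
    has_digit = False
    for ch in body:
        if ch in ASCII_DIGITS:
            has_digit = True
        elif ch not in ALLOWED_SEPARATORS:
            return False  # any other character is outside the documented set
    return has_digit


print(is_valid_digits_body("(123) 456-7890"))  # True
print(is_valid_digits_body("12a4"))            # False
```

A check like this can catch stray letters before the text is sent for synthesis. How the engine handles unsupported characters inside the block is not stated here.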
The tag works across all languages supported by the model. Digits are pronounced in whichever language is currently active:
  • the auto-detected language, or
  • the language explicitly forced with the language flag.
Example
My phone number is <digits>(123) 456‑7890</digits> and my credit card is <digits>1234‑5678‑9012‑3456</digits>.
The engine will voice:
"My phone number is one two three four five six seven eight nine zero and my credit card is one two three four five six seven eight nine zero one two three four five six."
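The expansion described above can be emulated locally to preview what a transcript should sound like. This is only a sketch of the documented behavior for English, not the engine's implementation. The names expand_digits and DIGIT_WORDS are illustrative, and any pauses that separators may introduce are ignored.

```python
import re

# Illustrative English digit names; the real engine speaks the active language.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Non-greedy match so each <digits>...</digits> region is handled separately.
DIGITS_BLOCK = re.compile(r"<digits>(.*?)</digits>", re.DOTALL)


def expand_digits(text: str) -> str:
    """Replace each <digits> block with its digits spoken one by one."""
    def speak(match: re.Match) -> str:
        # Keep ASCII digits only; every other character is treated as a separator.
        digits = [ch for ch in match.group(1) if ch in "0123456789"]
        return " ".join(DIGIT_WORDS[int(ch)] for ch in digits)

    return DIGITS_BLOCK.sub(speak, text)


print(expand_digits("Call <digits>(123) 456-7890</digits> now."))
# Call one two three four five six seven eight nine zero now.
```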

Inline digits‑pronunciation example

{
  "model_id": "asyncflow_v2.0",
  "transcript": "My phone number is <digits>(123) 456-7890</digits> and my credit card is <digits>1234-5678-9012-3456</digits>.",
  "voice": { "mode": "id", "id": "e0f39dc4-f691-4e78-bba5-5c636692cc04" },
  "output_format": {
    "container": "raw",
    "encoding": "pcm_s16le",
    "sample_rate": 44100
  }
}
The engine will render:
"My phone number is one two three four five six seven eight nine zero and my credit card is one two three four five six seven eight nine zero one two three four five six."
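A request body like the one above can also be assembled programmatically. The sketch below is a minimal Python example that reuses only the field names and values from the JSON sample. The helpers wrap_digits and build_payload are hypothetical. The endpoint URL and authentication are not covered in this section, so the payload is only serialized, not sent.

```python
import json


def wrap_digits(sequence: str) -> str:
    """Wrap a digit sequence so it is read out digit by digit."""
    return f"<digits>{sequence}</digits>"


def build_payload(transcript: str, voice_id: str) -> dict:
    """Build a request body mirroring the JSON sample in this section."""
    return {
        "model_id": "asyncflow_v2.0",
        "transcript": transcript,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {
            "container": "raw",
            "encoding": "pcm_s16le",
            "sample_rate": 44100,
        },
    }


phone = wrap_digits("(123) 456-7890")
payload = build_payload(
    f"My phone number is {phone}.",
    "e0f39dc4-f691-4e78-bba5-5c636692cc04",  # voice id taken from the sample above
)
print(json.dumps(payload, indent=2))
```

Wrapping the sequence in a helper keeps the tag out of application strings, so the markup is applied in one place.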

Modified at 2025-12-04 10:00:17