    Models

    Async offers three text-to-speech models, each tuned for different use cases. All models are available via the Streaming HTTP and WebSocket endpoints.

    Model Overview

    | Feature | Async Pro v1.0 | Async Flash v1.5 | Async Flash v1.0 |
    |---|---|---|---|
    | Model ID | async_pro_v1.0 | async_flash_v1.5 | async_flash_v1.0 |
    | Best for | Highest-quality, low-latency English speech | Low-latency streaming with built-in text normalization | Real-time multilingual applications |
    | Languages | English | English, Spanish, French, German, Italian, Portuguese | English, French, Spanish, German, Italian, Portuguese, Arabic, Russian, Romanian, Japanese, Hebrew, Armenian, Turkish, Hindi, Chinese |
    | Supported endpoints | Streaming, WebSocket | Streaming, WebSocket | All (Streaming, WebSocket, HTTP sync, Timestamps) |
    | Text normalization | Built-in | Built-in | Not available |
    | speed_control | Not available | Not available | 0.7 – 2.0 |
    | stability | Not available | Not available | 0 – 100 |

    Async Pro v1.0

    Our highest-quality English TTS model, designed for natural speech generation with strong handling of non-standard text and fast inference. It is optimized for low-latency streaming, so it suits real-time conversational applications as well as long-form narration.
    Model ID: async_pro_v1.0
    Languages: English
    Endpoints: Streaming (POST /text_to_speech/streaming), WebSocket (WSS /text_to_speech/websocket/ws)
    Text normalization: Automatically handles dates, currencies, numbers, and abbreviations — no preprocessing needed
    Best for: Voice assistants, conversational AI, audiobooks, narration, podcasts, and any use case where English voice quality is the top priority
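    The snippet below is a minimal sketch of how async_pro_v1.0 might be requested on the streaming endpoint using only Python's standard library. The endpoint path and model ID come from this page; the base URL, the Authorization header, and the body field names (model_id, transcript, voice_id) are hypothetical placeholders, so check them against the API Reference before use. Because this model normalizes dates, currencies, and numbers itself, the raw sentence is passed through unchanged. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: take the real base URL and auth scheme
# from "Getting Started with the Async Voice API".
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_streaming_request(
    text: str, voice_id: str, model_id: str = "async_pro_v1.0"
) -> urllib.request.Request:
    """Build (but do not send) a POST to the streaming TTS endpoint."""
    body = {
        "model_id": model_id,  # documented model ID; field name is assumed
        "transcript": text,    # hypothetical field name
        "voice_id": voice_id,  # hypothetical field name
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/text_to_speech/streaming",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


# No preprocessing of "May 3rd" or "$19.99": the model normalizes them.
req = build_streaming_request("Your order ships on May 3rd for $19.99.", "VOICE_ID")

# To send it, one option is to read the response in chunks, e.g.:
# with urllib.request.urlopen(req) as resp:
#     for chunk in iter(lambda: resp.read(4096), b""):
#         handle_audio(chunk)  # hypothetical callback
```

    The same request shape would apply to async_flash_v1.5 by passing model_id="async_flash_v1.5", since both models share the streaming endpoint.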

    Async Flash v1.5

    A latency-optimized streaming TTS model supporting 6 languages, with strong built-in handling of non-standard text such as dates, currencies, numbers, and abbreviations.
    Model ID: async_flash_v1.5
    Languages: English, Spanish, French, German, Italian, Portuguese
    Endpoints: Streaming (POST /text_to_speech/streaming), WebSocket (WSS /text_to_speech/websocket/ws)
    Text normalization: Automatically handles dates, currencies, numbers, and abbreviations — no preprocessing needed
    Best for: Real-time voice assistants, chatbots, and conversational AI across European languages

    Async Flash v1.0

    Our legacy low-latency multilingual TTS model, supporting 15 languages and optimized for fast real-time speech generation.
    Model ID: async_flash_v1.0
    Languages: English, French, Spanish, German, Italian, Portuguese, Arabic, Russian, Romanian, Japanese, Hebrew, Armenian, Turkish, Hindi, Chinese
    Endpoints: All — Streaming, WebSocket, HTTP sync (POST /text_to_speech), Timestamps (POST /text_to_speech/with_timestamps)
    Unique parameters:
    speed_control (0.7 – 2.0) — Adjusts the speaking speed of the synthesized voice
    stability (0 – 100) — Adjusts how stable or expressive the synthesized voice sounds
    Best for: Applications requiring broad language coverage or synchronous audio generation
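    Because speed_control and stability only apply to async_flash_v1.0, it can help to validate them on the client before sending a request. The sketch below assumes the documented ranges are inclusive and that both parameters sit at the top level of the request body next to a model_id field. Neither point is confirmed on this page, and the helper names are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Documented ranges for async_flash_v1.0; inclusivity is an assumption.
SPEED_CONTROL_RANGE: Tuple[float, float] = (0.7, 2.0)
STABILITY_RANGE: Tuple[float, float] = (0, 100)


def _check_range(name: str, value: float, bounds: Tuple[float, float]) -> None:
    """Raise ValueError if value falls outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def flash_v1_params(
    speed_control: Optional[float] = None,
    stability: Optional[float] = None,
) -> Dict[str, Any]:
    """Return model-specific request fields for async_flash_v1.0.

    Field placement in the request body is hypothetical.
    """
    params: Dict[str, Any] = {"model_id": "async_flash_v1.0"}
    if speed_control is not None:
        _check_range("speed_control", speed_control, SPEED_CONTROL_RANGE)
        params["speed_control"] = speed_control
    if stability is not None:
        _check_range("stability", stability, STABILITY_RANGE)
        params["stability"] = stability
    return params
```

    For example, flash_v1_params(speed_control=1.2, stability=50) returns the model ID plus both values, while flash_v1_params(speed_control=2.5) raises ValueError before any request is made.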

    Choosing the Right Model

    Need the best English voice quality?
    Use async_pro_v1.0. It produces the most natural-sounding speech, handles non-standard text out of the box, and is fast enough for real-time streaming use cases.
    Need low latency across European languages?
    Use async_flash_v1.5. It combines fast streaming with built-in text normalization for 6 languages.
    Need broad multilingual support?
    Use async_flash_v1.0. It covers 15 languages and is the only model available on the synchronous HTTP and timestamps endpoints.
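    The decision guide above can be expressed as a small helper. This is one hypothetical reading of the guide: it prefers async_pro_v1.0 for English, async_flash_v1.5 for the other European languages it covers, and async_flash_v1.0 otherwise. It also forces async_flash_v1.0 when the caller needs the synchronous or timestamps endpoints, speed_control, or stability. The language names are plain strings chosen for illustration, not API values.

```python
from typing import FrozenSet

# Language lists copied from the model descriptions on this page.
PRO_V1_0: FrozenSet[str] = frozenset({"English"})
FLASH_V1_5: FrozenSet[str] = frozenset(
    {"English", "Spanish", "French", "German", "Italian", "Portuguese"}
)
FLASH_V1_0: FrozenSet[str] = FLASH_V1_5 | {
    "Arabic", "Russian", "Romanian", "Japanese", "Hebrew",
    "Armenian", "Turkish", "Hindi", "Chinese",
}


def choose_model(language: str, needs_sync_or_tuning: bool = False) -> str:
    """Pick a model ID following the decision guide (hypothetical helper).

    needs_sync_or_tuning: True if the caller needs POST /text_to_speech,
    POST /text_to_speech/with_timestamps, speed_control, or stability,
    all of which only async_flash_v1.0 supports.
    """
    if needs_sync_or_tuning:
        if language in FLASH_V1_0:
            return "async_flash_v1.0"
        raise ValueError(f"async_flash_v1.0 does not list {language!r}")
    if language in PRO_V1_0:
        return "async_pro_v1.0"    # best English quality, still real-time
    if language in FLASH_V1_5:
        return "async_flash_v1.5"  # low latency plus text normalization
    if language in FLASH_V1_0:
        return "async_flash_v1.0"  # broadest language coverage
    raise ValueError(f"No Async model lists {language!r} as supported")
```

    The flag is checked first on purpose: without it, an English caller that needs timestamps would be routed to async_pro_v1.0, which is not available on that endpoint.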

    Important Notes

    async_pro_v1.0 and async_flash_v1.5 are streaming and WebSocket only. They are not available on the synchronous POST /text_to_speech or POST /text_to_speech/with_timestamps endpoints.
    The speed_control and stability parameters are only supported by async_flash_v1.0. They have no effect on the other models.
    All models support the same voice library — you can use any voice with any model.
    All models support custom phonemes, digit pronunciation, and silent pauses.
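    The first two notes lend themselves to a pre-flight check. The sketch below flags a streaming-only model sent to a synchronous endpoint, and flags speed_control or stability passed to a model they have no effect on. The function and its string-based inputs are hypothetical, and how the API itself responds to an unavailable model on a synchronous endpoint is not described on this page.

```python
from typing import Any, Dict, List

SYNC_ENDPOINTS = {"/text_to_speech", "/text_to_speech/with_timestamps"}
STREAMING_ONLY_MODELS = {"async_pro_v1.0", "async_flash_v1.5"}
FLASH_V1_0_ONLY_PARAMS = {"speed_control", "stability"}


def check_request(model_id: str, endpoint_path: str, params: Dict[str, Any]) -> List[str]:
    """Return problems found; an empty list means no known conflict."""
    problems: List[str] = []
    # Pro v1.0 and Flash v1.5 are streaming and WebSocket only.
    if model_id in STREAMING_ONLY_MODELS and endpoint_path in SYNC_ENDPOINTS:
        problems.append(f"{model_id} is not available on POST {endpoint_path}")
    # speed_control and stability only affect async_flash_v1.0.
    if model_id != "async_flash_v1.0":
        for name in sorted(params):
            if name in FLASH_V1_0_ONLY_PARAMS:
                problems.append(f"{name} has no effect on {model_id}")
    return problems
```

    For example, check_request("async_pro_v1.0", "/text_to_speech/with_timestamps", {}) reports that the model is not available on that endpoint, while the same call with async_flash_v1.0 returns an empty list.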
    Modified at 2026-05-18 13:01:10