# Integrate with Twilio


Bring low‑latency, high‑quality speech from **Async** into any Twilio voice call. This guide shows how to:

1. Connect to Async Text‑to‑Speech (WebSocket)
2. Spin up a local WebSocket server for Twilio `<Stream/>` media
3. Expose that server through **ngrok**
4. Dial a phone call and pipe the generated audio into it



## 1  Prerequisites

| Tool               | Notes                                                        |
| ------------------ | ------------------------------------------------------------ |
| **Node.js 18+**    | Modern syntax (`??`, `?.`) and current WebSocket APIs        |
| **Twilio account** | Copy your *Account SID* and *Auth Token*; buy/verify numbers |
| **Async voice API account**  | Copy your *API key* and pick a *Voice ID*                    |
| **ngrok (free)**   | Exposes your local WS server to Twilio’s cloud               |

```bash
npm i twilio ws ngrok dotenv
```

---

## 2  Environment variables

Create a `.env` file next to the script:

```bash
TWILIO_ACCOUNT_SID=ACXXXXXXXXXXXXXXXXXXXXXXXXXXXX
TWILIO_AUTH_TOKEN=your_twilio_token
ASYNC_API_KEY=your_async_key
OUTBOUND_NUMBER=+15551230001   # number to call
INBOUND_NUMBER=+15557654321    # your Twilio number
```

Any variable not set in the environment can instead be passed as a `KEY=value` argument, e.g. `node async-twilio.js OUTBOUND_NUMBER=+1555…`; values from the environment take precedence.
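
The lookup order can be sketched as a pure function. This is a minimal illustration rather than the script's own helper: `resolveSetting` and its parameters are hypothetical names, and it assumes overrides use the `KEY=value` argument form from the example command. It keeps everything after the first `=`, so values that themselves contain `=` are not truncated.

```javascript
// Hypothetical helper: resolve one setting from the environment first,
// then from KEY=value command-line arguments, then from a fallback.
function resolveSetting(key, env, argv, fallback) {
  if (env[key] != null) return env[key];

  const prefix = `${key}=`;
  const arg = argv.find(a => a.startsWith(prefix));
  // slice() keeps everything after the first "=", even if the value contains "="
  if (arg !== undefined) return arg.slice(prefix.length);

  return fallback;
}

// Example: the environment wins over the command line when both are set.
const sampleEnv  = { ASYNC_API_KEY: 'from-env' };
const sampleArgv = ['node', 'async-twilio.js', 'ASYNC_API_KEY=from-cli', 'OUTBOUND_NUMBER=+15551230001'];

console.log(resolveSetting('ASYNC_API_KEY', sampleEnv, sampleArgv));            // from-env
console.log(resolveSetting('OUTBOUND_NUMBER', sampleEnv, sampleArgv));          // +15551230001
console.log(resolveSetting('INBOUND_NUMBER', sampleEnv, sampleArgv, 'unset'));  // unset
```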

---

## 3  How the bridge works

| Step  | Flow                                                                                         |
| ----- | -------------------------------------------------------------------------------------------- |
| **1** | Script connects to **Async** over WebSocket, sends an *init* frame (model, voice, codec).    |
| **2** | A lightweight HTTP + WS server starts locally (`ws://localhost:<port>`).                     |
| **3** | `ngrok` publishes that port; you get a public **wss\://** URL.                               |
| **4** | Script tells **Twilio** to dial `<OUTBOUND_NUMBER>` and stream call audio to that URL.       |
| **5** | On Twilio `start`, script streams text → Async.                                              |
| **6** | Async replies with μ‑law PCM chunks; script forwards each chunk to Twilio as `media` frames. |
| **7** | After the final chunk, script sends a `mark`, waits for Twilio to echo it, then ends the call. |

```mermaid
sequenceDiagram
  participant App as "Script"
  participant Async
  participant Twilio
  participant PSTN as "Phone Call"

  App->>Async: open WS, send init
  App->>Twilio: POST /Calls (<Stream>)
  Twilio->>App: WS connect (start)
  App->>Async: send transcript text
  Async-->>App: audio chunks (JSON + base64)
  App-->>Twilio: {event:"media", ...}
  Twilio-->>PSTN: plays audio
  Async-->>App: {final:true}
  App->>Twilio: update call → completed
```
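
For reference, the messages the script sends back to Twilio in steps 6 and 7 are small JSON objects. The builders below are a sketch with hypothetical function names; the `event`, `streamSid`, `media.payload`, and `mark.name` fields follow Twilio's Media Streams message format, and the sample stream SID is made up.

```javascript
// Hypothetical builders for the frames sent to Twilio over the media WebSocket.

// Wrap one base64-encoded μ-law chunk in a Twilio `media` frame.
function buildMediaFrame(streamSid, base64Audio) {
  return JSON.stringify({
    event: 'media',
    streamSid,
    media: { payload: base64Audio },
  });
}

// Ask Twilio to echo `name` back once the audio queued before it has played.
function buildMarkFrame(streamSid, name) {
  return JSON.stringify({ event: 'mark', streamSid, mark: { name } });
}

// Example with a made-up stream SID and two bytes of μ-law silence (0xFF).
const samplePayload = Buffer.from([0xff, 0xff]).toString('base64');
console.log(buildMediaFrame('MZ-example', samplePayload));
console.log(buildMarkFrame('MZ-example', 'async_end'));
```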

---

## 4  Sample code


<Steps>
  <Step title="Set up the required variables and helper functions">
```js
const twilio      = require('twilio');
const WebSocket   = require('ws');
const http        = require('http');
const ngrok       = require('ngrok');
const dotenv      = require('dotenv');

dotenv.config();

// ────────────────────────────────────────────
//  Helpers & configuration
// ────────────────────────────────────────────
function get(key, fallback = undefined) {
  return (
    process.env[key] ??
    process.argv.find(a => a.startsWith(`${key}=`))?.split('=')[1] ??
    fallback
  );
}

const CFG = {
  TWILIO_ACCOUNT_SID : get('TWILIO_ACCOUNT_SID'),
  TWILIO_AUTH_TOKEN  : get('TWILIO_AUTH_TOKEN'),
  ASYNC_API_KEY      : get('ASYNC_API_KEY'),
  OUTBOUND_NUMBER    : get('OUTBOUND_NUMBER'),
  INBOUND_NUMBER     : get('INBOUND_NUMBER'),
  ASYNC_MODEL_ID     : 'async_flash_v1.0',
  ASYNC_VOICE_ID     : 'e0f39dc4-f691-4e78-bba5-5c636692cc04',
  TEST_SENTENCE      : "Hi there, welcome to Async. Hope you're having a great day!",
};

const REQUIRED = [
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'ASYNC_API_KEY',
  'OUTBOUND_NUMBER',
  'INBOUND_NUMBER',
];
for (const k of REQUIRED) {
  if (!CFG[k]) throw new Error(`Missing required config: ${k}`);
}

// ────────────────────────────────────────────
//  Globals
// ────────────────────────────────────────────
let asyncWs;            // → Async TTS
let twilioWs;           // ↔ Twilio media
let callSid;            // Outbound call SID
let streamSid;          // Media stream SID

```
  </Step>
  <Step title="Connect to Async TTS">
```js
function connectAsyncTTS() {
  const url =
    `wss://api.async.com/text_to_speech/websocket/ws` +
    `?api_key=${CFG.ASYNC_API_KEY}&version=v1`;

  return new Promise((res, rej) => {
    asyncWs = new WebSocket(url);

    asyncWs.on('open', () => {
      asyncWs.send(JSON.stringify({
        model_id     : CFG.ASYNC_MODEL_ID,
        voice        : { mode: 'id', id: CFG.ASYNC_VOICE_ID },
        output_format: { container: 'raw', encoding: 'pcm_mulaw', sample_rate: 8000 },
      }));
      res();
    });

    asyncWs.on('error', rej);
    asyncWs.on('close', (_, r) => rej(new Error(`Async WS closed: ${r}`)));
  });
}
```
  </Step>
  <Step title="Set up a local WebSocket server for Twilio">
```js
function createTwilioWSS() {
  return new Promise((res, rej) => {
    const server = http.createServer((_, resp) => resp.writeHead(200).end('OK'));
    const wss = new WebSocket.Server({ server });

    wss.on('connection', socket => {
      twilioWs = socket;

      socket.on('message', msg => {
        const data = JSON.parse(msg);

        if (data.event === 'start') {
          streamSid = data.start.streamSid;
          asyncWs.send(JSON.stringify({ transcript: CFG.TEST_SENTENCE + ' ', force: true }));
          asyncWs.send(JSON.stringify({ transcript: '' }));

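        } else if (data.event === 'stop') {
          // endCall() is async; surface failures instead of leaving an unhandled rejection
          endCall().catch(console.error);
        }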
      });

      socket.on('close', () => console.log('❌ Twilio WS closed'));
    });

    server.listen(0, () => res({ server, port: server.address().port }));
    server.on('error', rej);
  });
}
```
  </Step>
  <Step title="Pipe Async → Twilio">
```js
function wireAudioPipe() {
  asyncWs.on('message', buf => {
    if (!streamSid || !twilioWs) return;

    const { audio, final } = JSON.parse(buf);

    // 1) forward the chunk (skip frames that carry no audio)
    if (audio) {
      twilioWs.send(JSON.stringify({
        event : 'media',
        streamSid,
        media : { payload: audio, track: 'outbound' }
      }));
    }

    // 2) if it was the last chunk, send a mark
    if (final) {
      console.log('📦 Final chunk received, sending async_end mark');
      twilioWs.send(JSON.stringify({
        event    : 'mark',
        streamSid: streamSid,
        mark     : { name: 'async_end' } 
      }));

      // 3) wait for Twilio to echo the mark, then hang up
      const waitForMark = new Promise(resolve => {
        const handler = msg => {
          const data = JSON.parse(msg);
          if (data.event === 'mark' && data.mark?.name === 'async_end') {
            twilioWs.off('message', handler);
            console.log('✅ async_end mark echoed back');
            resolve();
          }
        };
        twilioWs.on('message', handler);
      });

      // Reuse endCall() so callSid is cleared once the call is completed
      waitForMark
        .then(endCall)
        .catch(console.error);
    }
  });
}
```
  </Step>
  <Step title="Twilio helper functions">
```js
const twilioClient = twilio(CFG.TWILIO_ACCOUNT_SID, CFG.TWILIO_AUTH_TOKEN);

async function startCall(wssUrl) {
  const { sid } = await twilioClient.calls.create({
    to   : CFG.OUTBOUND_NUMBER,
    from : CFG.INBOUND_NUMBER,
    twiml: `<Response><Connect><Stream url="${wssUrl}" /></Connect></Response>`,
  });
  callSid = sid;
  console.log('📞 Call started:', sid);
}

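async function endCall() {
  if (!callSid) return;            // already ended, or never started
  const sid = callSid;
  callSid = null;                  // clear first so overlapping callers don't hang up twice
  await twilioClient.calls(sid).update({ status: 'completed' });
  console.log('🏁 Call ended');
}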
```
  </Step>
    
  <Step title="Run the bridge">
```js
(async () => {
  try {
    console.log('🚀 Connecting to Async TTS…');
    await connectAsyncTTS();

    console.log('🛫 Launching local WS…');
    const { port } = await createTwilioWSS();

    console.log('🌏 Creating ngrok tunnel…');
    const publicUrl = (await ngrok.connect(port)).replace('https://', 'wss://');

    wireAudioPipe();
    console.log('📞 Dialling…');
    await startCall(publicUrl);
  } catch (err) {
    console.error('❌ Fatal:', err.message);
    process.exit(1);
  }
})();
```
  </Step>
</Steps>
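
The TwiML passed to `calls.create()` is built by plain string interpolation. If you later need to hand metadata to the bridge, one option is Twilio's `<Parameter>` noun inside `<Stream>`, since the stream `url` does not carry query-string parameters. The sketch below uses hypothetical names (`escapeXml`, `toWebSocketUrl`, `buildStreamTwiml`) and a made-up tunnel URL.

```javascript
// Escape the five XML special characters so values are safe inside attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Turn an http(s) tunnel URL into the matching ws(s) URL.
function toWebSocketUrl(httpUrl) {
  return httpUrl.replace(/^http/, 'ws');
}

// Hypothetical builder: <Connect><Stream> TwiML with optional <Parameter> children.
function buildStreamTwiml(wssUrl, params = {}) {
  const children = Object.entries(params)
    .map(([name, value]) =>
      `<Parameter name="${escapeXml(name)}" value="${escapeXml(value)}" />`)
    .join('');
  return `<Response><Connect><Stream url="${escapeXml(wssUrl)}">${children}</Stream></Connect></Response>`;
}

const sampleUrl = toWebSocketUrl('https://example.ngrok.io');   // made-up tunnel URL
console.log(buildStreamTwiml(sampleUrl, { greeting: 'hello & welcome' }));
```

Twilio delivers such values in the `start` message under `start.customParameters`, so the bridge could read them in the same handler that captures `streamSid`.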




---

## 5  Customising the integration

| Goal                       | Where to change                                                       |
| -------------------------- | --------------------------------------------------------------------- |
| **Different voice**        | `CFG.ASYNC_VOICE_ID`                                                  |
| **Different codec / rate** | `output_format` in `connectAsyncTTS()`                                |
| **Stream arbitrary text**  | Replace `CFG.TEST_SENTENCE`, or feed user input into `asyncWs.send()` |
| **Keep the call open**     | Remove the `async_end` mark handling in `wireAudioPipe()` that ends the call |
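
To stream longer or dynamic text, one approach is to split it into sentences and send one transcript frame per sentence, followed by the empty transcript frame that the sample sends after its text. The sketch below only builds the frames. `toTranscriptFrames` is a hypothetical name, the sentence splitting is deliberately naive, and it assumes Async accepts several transcript frames on one connection.

```javascript
// Hypothetical helper: turn free text into Async transcript frames (JSON strings).
function toTranscriptFrames(text) {
  // Naive split: break after ., ! or ? when followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/);

  const frames = sentences
    .map(s => s.trim())
    .filter(s => s.length > 0)
    // Trailing space and `force: true` mirror the sample's transcript frame.
    .map(s => JSON.stringify({ transcript: s + ' ', force: true }));

  // The sample sends an empty transcript after its text; this sketch does the same.
  frames.push(JSON.stringify({ transcript: '' }));
  return frames;
}

for (const frame of toTranscriptFrames('Hello there. How are you today?')) {
  console.log(frame);   // in the bridge this would be asyncWs.send(frame)
}
```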

---

## 6  Debugging tips

* **No audio?** Make sure Twilio can reach your ngrok URL (port 443, wss).
* **Choppy playback?** Forward Async chunks to Twilio as soon as they arrive; don’t buffer them.
* **Delay before speech starts?** Use `force: true` in the transcript frame to synthesize short text immediately.
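
When diagnosing choppy or delayed audio, it can help to log how much playback time each chunk carries. With 8 kHz μ-law (one byte per sample), a chunk holds 8 bytes per millisecond. The helper below is a sketch with a hypothetical name; it assumes the `audio` field is base64-encoded raw μ-law at 8000 Hz, as configured in `connectAsyncTTS()`.

```javascript
// Hypothetical helper: playback duration of a base64 μ-law chunk, in milliseconds.
function chunkDurationMs(base64Audio, sampleRate = 8000) {
  const bytes = Buffer.from(base64Audio, 'base64').length; // 1 byte = 1 sample
  return (bytes * 1000) / sampleRate;
}

// Example: 160 bytes of μ-law silence (0xFF) is one 20 ms frame at 8 kHz.
const silence = Buffer.alloc(160, 0xff).toString('base64');
console.log(`${chunkDurationMs(silence)} ms`);   // 20 ms
```

Logging this value next to a timestamp for each forwarded chunk shows whether audio arrives faster or slower than it plays back.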

---

## 7  Next steps

1. **Bidirectional audio** – Send caller speech to Async STT and build IVR bots.
2. **Fail‑over logic** – Retry with a new ngrok tunnel or switch data centres automatically.
3. **Serverless deployment** – Move the bridge to AWS Lambda or Fly.io and drop ngrok for a fixed WSS endpoint.

---


