Cloud TTS Service

Content on this page is for a product or feature in controlled release (CR). If you are not part of the CR group and would like more information, contact your CXone Account Representative.

Permissions Required: View Scripts, Create/Edit Scripts

CXone Cloud TTS Service converts text into spoken output delivered by synthesized voices. This service, also called text-to-speech (TTS), can be used with CXone IVR, an automated phone menu that lets callers use voice commands, key inputs, or both to obtain information, route an inbound voice call, or both. For example, you can add multiple language options to your IVR.

Classics, Inc. recently expanded its bookselling operation into new regions. Anne Shirley, the CXone administrator, starts setting up IVR menus in scripts for the new regions. She discovers some gaps in the default text-to-speech languages that CXone offers. Anne learns that with Cloud TTS, she can choose a TTS provider that offers the languages she requires. She likes that the TTS providers offer a wide range of voices to choose from.

TTS Providers

CXone Cloud TTS uses third-party TTS providers. You can choose which of the supported providers you want to use. You can also choose the language and voice that Cloud TTS uses.

Currently, CXone supports Google TTS.

SSML Support

Cloud TTS Service supports the use of Speech Synthesis Markup Language (SSML). SSML is an XML-based markup language that allows you to specify many aspects of how text is synthesized into speech. You can use it to fine-tune pronunciation, rate of speech, voice pitch, volume, and more.

To use SSML, text input must be:

  • Valid XML
  • Valid SSML
  • Contained within a set of <speak> </speak> tags
  • Marked up with tags that each have only one attribute (this includes the <speak> tag)

For example: 

<speak xml:lang="en-US">

Here are <say-as interpret-as="characters">SSML</say-as> samples.

I can pause <break time="3s"/>.

I can say cardinal numbers. This number is <say-as interpret-as="cardinal">1135</say-as>.

Or I can say ordinal numbers. You are <say-as interpret-as="ordinal">1135</say-as> in line.

I can even say numbers as digits. The digits are <say-as interpret-as="characters">1135</say-as>.

I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>.

</speak>

Refer to the Google TTS documentation for information about any SSML variations or requirements specific to Google.
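
The SSML input requirements listed above can be checked before text is pasted into a script. The following is a minimal sketch in Python, not part of CXone: the function name check_ssml is hypothetical, it uses only the standard library, and it reads "only one attribute" as "at most one attribute per tag". It confirms that the input is well-formed XML, that the root element is <speak>, and that no tag carries more than one attribute. It does not verify full SSML validity, so tag names and attribute values still need to be checked against the provider's documentation.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}speak' down to 'speak'."""
    return tag.rsplit("}", 1)[-1]


def check_ssml(text: str) -> List[str]:
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper: it checks only the structural rules described above,
    not whether each tag is legal SSML.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Input that is not well-formed XML cannot be SSML either.
        return [f"not valid XML: {err}"]

    problems = []
    if local_name(root.tag) != "speak":
        problems.append(
            f"root element is <{local_name(root.tag)}>, expected <speak>"
        )

    # root.iter() visits the root element and every descendant element.
    for element in root.iter():
        count = len(element.attrib)
        if count > 1:
            problems.append(
                f"<{local_name(element.tag)}> has {count} attributes, "
                "expected at most one"
            )
    return problems
```

For example, check_ssml('<speak xml:lang="en-US">I can pause <break time="3s"/>.</speak>') returns an empty list, while a <prosody> tag that sets both rate and pitch is reported as having two attributes. In that case, one option is to nest two <prosody> tags that each set a single attribute.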

Supported Languages and Voices

Each TTS provider offers a different set of languages and, for each language, one or more voices that you can choose from. Because the selection of languages and voices can change at any time, use either of the following to see the most up-to-date list of supported languages and voices:

  • Check the documentation for each TTS provider.
  • Look at the Select a Voice page for each TTS provider on the Cloud Text to Speech page.