Improving the Quality of TTS Expression
Introduction
Speech Synthesis Markup Language (SSML) is a standardized, XML-based markup language used to control and refine aspects of speech synthesis. In a contact center environment, SSML can help create a more natural, engaging, and professional-sounding automated voice experience. With SSML tags, you can control attributes such as pronunciation, pacing, pitch, and volume, ensuring that dynamic information, such as dates, currency, and reference numbers, is read to the customer clearly and accurately.
Prerequisites
To use SSML in a scenario, you must have a Google Cloud or IBM Watson text-to-speech (TTS) integration account configured in your contact center.
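Although SSML is a standardized markup language, support for individual tags and attributes varies by engine, so review the SSML documentation from your TTS provider (Google Cloud or IBM Watson) before building prompts.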
Example SSML in a Voice Scenario
Consider a scenario where a customer calls to confirm an upcoming service appointment. You want to ensure the appointment date and callback number are communicated naturally and clearly.
Create the Scenario
To begin, create a new voice scenario or open an existing one. In production, you would typically use a Fetch URL block, DB Execute block, or a CRM integration to retrieve the caller's upcoming appointment details.
The following example uses two Set Variable blocks:
- The appointment date is "2024-09-10" and is stored in a variable named appointment_date.
- The number to read to the customer is "18005551212" and is stored in a variable named callback_number.
Use the Play Prompt Block with SSML
In your scenario, add the Play Prompt block where you want to communicate the appointment details to the customer. Inside the Play Prompt block, add a Voice segment to the prompt and use the variables in the message text:
Your appointment date of $(appointment_date) is confirmed. If you would like to contact the provider directly, please call $(callback_number).
Format the Date
By default, a TTS engine may read a date variable incorrectly. For example, if $(appointment_date) contains the value "2024-09-10", the engine will likely read this value as "two zero two four dash zero nine dash one zero."
To ensure the date is read naturally, use the <say-as> tag with interpret-as="date". This tag instructs the engine to pronounce the value as a date, in the form "The {ordinal day} of {month}, {year}". The format attribute must list the year (y), month (m), and day (d) fields in the same order in which they appear in the variable. Fields in the date text may be separated by punctuation and/or spaces, but those separators do not need to appear in the format attribute.
<say-as interpret-as="date" format="yyyymmdd">$(appointment_date)</say-as>
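If the date is stored in a different layout, the format attribute would need to change to match the field order. The following is a minimal sketch, assuming the value arrives as month/day/year and that your TTS provider accepts the short "mdy" format code; the literal date stands in for a variable and is only illustrative. Check your provider's documentation for the format codes it supports.

```xml
<speak>
  <!-- Assumption: the stored value is month/day/year, so the format lists m, d, y in that order. -->
  <!-- The literal date below is a stand-in for a scenario variable. -->
  Your appointment date of <say-as interpret-as="date" format="mdy">09/10/2024</say-as> is confirmed.
</speak>
```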
Format the Phone Number
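Similarly, a phone number stored as a continuous string of digits can be misinterpreted. For example, the engine may read "18005551212" as a single large number rather than as a sequence of digits.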
To have the number read correctly, use the <say-as> tag with interpret-as="telephone". This ensures the TTS engine reads the number in a way that is easy for a listener to understand and write down.
<say-as interpret-as="telephone">$(callback_number)</say-as>
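If callers still find the number hard to write down, one option is to slow the delivery of just that part of the prompt. The following is a sketch, assuming your TTS provider allows <say-as> to be nested inside <prosody>; test it with your selected voice, because the effect of the rate setting differs between engines.

```xml
<speak>
  <!-- Assumption: the provider accepts say-as nested inside prosody. -->
  <!-- Only the phone number is slowed; the rest of the sentence plays at normal speed. -->
  Please call <prosody rate="slow"><say-as interpret-as="telephone">$(callback_number)</say-as></prosody>.
</speak>
```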
Update the Prompt and Select the TTS Voice
Update the text content within the Play Prompt block to include the formatting tags discussed above. To ensure the TTS engine processes the instructions correctly, you must wrap the entire message in <speak> tags.
The final text should look like this:
<speak>
Your appointment date of <say-as interpret-as="date" format="yyyymmdd">$(appointment_date)</say-as> is confirmed. If you would like to contact the provider directly, please call <say-as interpret-as="telephone">$(callback_number)</say-as>.
</speak>
For the SSML tags to take effect, the block must use a voice from a compatible TTS engine. Open the Language tab in the block settings and, under Default TTS Voice, select the desired voice from your TTS integration (for example, Google Cloud TTS or IBM Watson).
Other Common SSML Use Cases
SSML offers a wide range of tags beyond basic date and number formatting. While you should consult your specific TTS provider's documentation for a complete list of supported tags, the following example demonstrates several common contact center use cases, such as spelling out characters, adding emphasis, and controlling playback speed:
<speak>
<!-- Spelling out Reference Numbers -->
Your confirmation code is <say-as interpret-as="characters">XYZ09</say-as>.
<!-- Adding Emphasis -->
Please be aware that this change is <emphasis level="strong">permanent</emphasis>.
<break time="500ms"/>
<!-- Adjusting Speech Rate -->
<prosody rate="slow">Please write down the following security PIN: 5 9 2 1.</prosody>
<!-- Playing Pre-recorded Audio -->
<audio src="https://www.example.com/sounds/brand_jingle.mp3">
(Brand Jingle)
</audio>
<!-- Reading Currency Correctly -->
Your total due is <say-as interpret-as="currency">$45.50</say-as>.
<!-- Structuring with Paragraphs and Sentences -->
<p>
<s>First, verify the green light is on.</s>
<s>Then, press the reset button.</s>
</p>
</speak>
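Two further tags that may be useful in contact center prompts are <sub>, which substitutes spoken text for an abbreviation, and <say-as> with interpret-as="ordinal", which reads a number as a position. The following is a short sketch; the abbreviation, its expansion, and the queue position are illustrative values, and support for each tag should be confirmed in your TTS provider's documentation.

```xml
<speak>
  <!-- Expanding an Abbreviation (the alias text is spoken instead of "TS") -->
  Your case has been assigned to <sub alias="technical support">TS</sub>.
  <!-- Reading a Number as an Ordinal (expected to be spoken as "third") -->
  You are <say-as interpret-as="ordinal">3</say-as> in the queue.
</speak>
```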

