Increasing the use of Speech in the IVR



In my February 1, 2011 blog entry “Input modality smarts for your voice apps”, I talked about how Best Modality Signaling can help reduce caller frustration and increase call automation rates.

That blog entry dealt with voice-application-dependent modality issues.

Today I want to talk about caller-dependent input modality issues.

While specific interaction points in the voice application dialogue lend themselves better to one form of input modality over another (Speech vs. DTMF), individual callers and the environments from which they are calling come with their own sets of preferences. A well-designed voice application will generally encourage a caller to use the optimal input modality based on the type of information being requested. Generally, this type of application will also allow for the fact that the caller may be calling from a noisy environment, or may simply not speak clearly enough for the recognition engine to make reasonably confident decisions about the recognized speech utterances.

What best practices in Voice User Interface design do not always account for, though, are cases where Speech is not used as often as it should be.

Say, for example, we have an elderly caller who is trying to enter a prescription number from a medicine bottle or a credit card number from a credit card. It is challenging enough, even with their eyeglasses handy, to locate the correct numbers and, almost simultaneously, muster the eye-hand coordination required to enter all eight or sixteen DTMF digits before the app times out and burdens them with a retry message, however politely worded it might be.

Or say we have a 20-year-old, tech-savvy iPhone user checking whether Dad has put some fuel in the bank account for him to have fun with yet. Cool as the iPhone is, if he keeps taking it away from his ear to press the last four digits of his social, his PIN, or whatever it is he is trying to DTMF in, he is not leveraging the best input modality available for the task at hand.

In both cases, the speech recognition engine may or may not pass the confidence thresholds required for success. In both cases, the voice application may or may not time out and enter its error recovery code. And in both cases, we do not have enough information to tell the application what the next best step is.

With Best Modality Signaling in a caller-adaptive environment, however, we at least have a little more information to go on. As part of the Adaptive Audio software, the BMS feature lets us know when callers are, for example, taking 7 seconds to enter their PIN when the application allows only 8. It lets us know that, prior to that, the caller took 14 or 15 seconds to enter their credit card number when 16 were allowed. It also knows when interdigit timings are close to their timeout thresholds. It keeps track of all of this as the caller progresses through each interaction point in the call dialogue.
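To make the idea concrete, here is a minimal sketch of what tracking those timing margins might look like. This is purely illustrative, not the actual BMS implementation; all class and field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class InteractionTiming:
    """Timing observed at one interaction point (names are illustrative)."""
    prompt: str               # e.g. "pin", "card_number"
    entry_seconds: float      # how long the caller took to finish DTMF entry
    allowed_seconds: float    # the application's overall entry timeout
    max_interdigit_gap: float # longest pause between digit presses
    interdigit_timeout: float # the application's interdigit timeout

    def margin(self) -> float:
        """Fraction of the allowed entry time left unused (0.0 = barely made it)."""
        return 1.0 - self.entry_seconds / self.allowed_seconds

@dataclass
class CallerTimingProfile:
    """Accumulates timing signals as the caller moves through the dialogue."""
    interactions: list = field(default_factory=list)

    def record(self, timing: InteractionTiming) -> None:
        self.interactions.append(timing)

    def near_timeout_count(self, margin_threshold: float = 0.15) -> int:
        """How many entries finished within `margin_threshold` of the timeout."""
        return sum(1 for t in self.interactions if t.margin() < margin_threshold)
```

With the numbers from the example above, a 7-second PIN entry against an 8-second timeout leaves a margin of 0.125, and a 14.5-second card entry against 16 seconds leaves about 0.094, so both would register as near-timeout entries.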

So it can tell when a particular caller is not having background noise or speech recognition issues, but that it is more likely they have eye-hand coordination issues, or that it is simply too cumbersome and time consuming to keep moving the phone between their ear and their outstretched hand. BMS can detect this with a fair degree of accuracy and allow the application to treat these and similar cases accordingly, which can come in the form of advocating Speech a little more emphatically to these particular callers. If the caller uses the input modality that is best for them, that is obviously what is best for everyone.
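The decision logic described above, distinguishing recognition trouble from entry-speed trouble, could be sketched as a simple heuristic. The thresholds and function name below are assumptions for illustration, not values from the actual product.

```python
def recommend_modality(asr_confidences, timing_margins,
                       low_conf=0.45, tight_margin=0.15):
    """Heuristic sketch: decide which input modality to encourage next.

    asr_confidences: recent speech recognition confidence scores (0..1)
    timing_margins:  fraction of allowed DTMF entry time left unused (0..1)
    Thresholds are illustrative, not values from the actual BMS feature.
    """
    # Speech looks viable if recent recognition confidence is acceptable
    # (or we have no speech attempts yet to judge by).
    speech_ok = (not asr_confidences or
                 sum(asr_confidences) / len(asr_confidences) >= low_conf)
    # DTMF looks like a struggle if at least half the entries came in
    # uncomfortably close to the application's timeout.
    dtmf_struggling = bool(timing_margins) and (
        sum(1 for m in timing_margins if m < tight_margin)
        >= len(timing_margins) / 2)
    if speech_ok and dtmf_struggling:
        return "advocate_speech"   # likely eye-hand or handset-juggling issues
    if not speech_ok:
        return "advocate_dtmf"     # noisy line or low-confidence speech
    return "no_change"
```

A caller with solid recognition scores but repeated near-timeout DTMF entries would be steered toward Speech; a caller on a noisy line would not be.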

Whether it is an $80 monthly prescription order from a repeat customer who just does not want to be treated like a bungling idiot, or a tech-savvy 20-year-old who will be sure to tell Dad he needs a bank that can keep up with him, the results are costly for any enterprise that continues to ignore how its IVR treats its paying customers.

And yes, a 20-year-old college kid really can have that kind of audacity, even when Dad is forking over the cash.

In short, the benefits of BMS include increased call automation rates, reduced user error rates, and reduced caller frustration with the IVR.

To learn more about how BMS and the other novel call optimization features of the VUI Cloud work, contact Interactive Digital at

Daniel O'Sullivan
CEO, innovator and technologist in software engineering and product development. Created and implemented Adaptive Technology and Fastrack Software products that have optimized over 1.5 billion self-service phone calls worldwide and saved clients over $100M to date. Electrical Engineering undergrad with a Masters in Computer Science. Lucent/Bell Labs alumni. Winner of worldwide eco-design project and received several patents. Currently CEO of Software Technology Partners. Focus: Business Development, Technology Partnering, Mobile, Web and Cloud Technologies and Human-Computer Interaction.

