AI in general, and ChatGPT specifically, have been advertised as saviors that will cut costs while enhancing customer satisfaction. Looking more closely, the cost savings may be far outweighed by the damage to satisfaction, loyalty, revenue, and word of mouth. This article revisits three recent messes caused by these tools, identifies six areas of genuine opportunity, and then provides criteria for deciding where ChatGPT and AI are appropriate.
Three stumbles by ChatGPT and AI
Accepting AI output as fact
The New York Times recently reported on a plaintiff’s attorney sanctioned for citing six non-existent cases in a suit against Avianca Airlines. The lawyer, in his defense, noted that he had asked ChatGPT if one case was real, receiving an affirmative answer. Then he asked if the other five cases were fake and was assured that the cases “can be found in reputable legal databases such as LexisNexis.” He did not check further.
CVS tries to do too much
CVS had a refill IVR that worked well. It was simple, and customers quickly learned its limited set of choices, which included ordering refills by prescription number, talking to the pharmacist, and getting pharmacy hours. Ordering refills and scheduling pickup took about sixty seconds. Then, about a year ago, CVS “improved” the tool by incorporating voice recognition and AI and adding the opening question, “How can I help you today?” Even when I used simple language to ask for a prescription refill, about forty percent of the time I was told that the system did not understand or, worse, I was taken down the rabbit hole of a different branch of the menu. CVS has since front-ended the tool with guidance on which words and terms to use and has limited the activities it handles, so that it now works almost as well as it did two years ago.
Lyrics that make no sense
Terry Gross, in a June 15 NPR interview with Cade Metz of the New York Times, described an experiment she did with ChatGPT. She asked the tool to write song lyrics describing the end of a love affair, set to the melody of “America the Beautiful.” She read the lyrics, including “From sea to shining sea, a tale of love’s demise. In fields of golden grain, we whispered vows so true.” She summarized them as “a string of cliches that make no sense strung together.” Truly cringeworthy!
Speech recognition software, chatbots, automated phone processes, and human support representatives each have distinct advantages and downsides. If they are deployed in the wrong places, the blame for low service quality will lie not with the misapplied people or technology but with the CIO/CTO/CXO who did not deploy them in a way that maximizes their very different strengths.
Six Appropriate Applications
Performing basic transactions where facts can be internally validated or edit-checked
AI can classify, and then manage the execution of, basic transactions that customers select from a list or indicate using predetermined type-ahead suggestions. These include airline check-in and boarding passes, account maintenance, purchases, status checks, password resets, and prescription refills. In all of these cases, the transaction can be validated against information in a corporate database. AARP’s Help Center uses type-ahead to let members accurately specify the transaction or question they are trying to submit. The type-ahead has almost eliminated email inquiries, and satisfaction is very high.
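To make the idea concrete, here is a minimal Python sketch of the two pieces described above: type-ahead suggestions drawn from a fixed list of supported transactions, and an edit check against internal records. The transaction names, the `type_ahead` and `validate_refill` functions, and the records schema are hypothetical illustrations, not AARP’s or CVS’s actual systems.

```python
# Hypothetical catalog of supported transactions, for illustration only.
SUPPORTED_TRANSACTIONS = [
    "check in for a flight",
    "get a boarding pass",
    "check order status",
    "refill a prescription",
    "reset password",
    "update account address",
]


def type_ahead(typed: str, catalog: list[str], limit: int = 3) -> list[str]:
    """Suggest up to `limit` catalog entries that contain every word typed so far."""
    words = typed.lower().split()
    return [item for item in catalog if all(w in item for w in words)][:limit]


def validate_refill(rx_number: str, refills_left: dict[str, int]) -> bool:
    """Edit-check a refill against internal records.

    Hypothetical schema: prescription number -> refills remaining.
    """
    return refills_left.get(rx_number, 0) > 0


print(type_ahead("refill", SUPPORTED_TRANSACTIONS))  # ['refill a prescription']
print(validate_refill("RX123", {"RX123": 2}))  # True
print(validate_refill("RX999", {"RX123": 2}))  # False
```

An open-ended entry such as “how can I help” matches nothing in the catalog, which is the point of the design: the customer is steered toward requests the system can actually fulfill and verify.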
Training and evaluation of staff, with the caveat that extremes are checked by a human
Online training has been used for decades. It is usually presented as “if X, then do Z.” However, many situations call for more than one response, depending on the customer’s particular circumstances. AI can remember the variations on a theme better than an individual service person can. One major law firm is now using ChatGPT to train paralegals and even associate lawyers to interact flexibly with clients. A hypothetical client poses a complex question or situation, and the staff person replies via email. Based on that reply, the “client” counters in a realistic manner and the conversation progresses. This provides a safe environment for practicing how to handle difficult issues and clients, and for making mistakes without jeopardizing any client relationships.
Likewise, AI can analyze customer sentiment in text and audio conversations to identify real customers who are unhappy, and then highlight the phrases or statements that may have led to their dissatisfaction. Such an evaluation could be transmitted directly back to the employee or, if the finding is extremely positive or negative, first reviewed by a human supervisor or HR person to make sure no error has occurred.
Personalizing communications and service interactions
AI can take whatever information exists about a customer and use it to personalize communications or service interactions. AARP started doing this decades ago by asking customers which of the eight usual reasons led them to join the organization and then tailoring the welcome package primarily to that subject area, with a soft cross-sell of two other offerings. Avis asked customers which of five renter personas they most identified with. ChatGPT can ask questions and “make conversation” to fine-tune the persona, tailor communications, and recommend products.
Educating, Onboarding and Troubleshooting
Customer education takes place before the customer encounters a problem, ideally to prevent the problem altogether. Getting customers to sit still for education requires motivation, and the education must then be delivered at a level of sophistication appropriate for the customer. AI is good at both of these functions: based on what is known about the customer, it can articulate the appropriate motivations and ask the questions needed to assess how much the customer knows about the product or service. See my previous article on the six steps of customer onboarding. AI is also good at noting where the customer is in the onboarding process and making sure the customer is not alienated, as I was recently by a medical system that bombarded me with pitches to use the medical records system I had already been using for three months.
Likewise, in troubleshooting, which commonly means finding a mistake the customer has made, AI can use a checklist to ask which steps have already been taken. This avoids the standard frustration of starting with “Please unplug your router and reboot your device” when the customer has already done that and ends up yelling at the CSR. Troubleshooting takes place after the customer has already encountered the question or problem. Note that the only difference between a question and a problem is the customer’s perception of who is at fault. There must be clear criteria for when the customer is escalated to a human, such as the use of angry language or a set number of failures. Survey data can easily be used to set the cut-point.
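The checklist logic in the troubleshooting paragraph can be sketched as follows. This is a hypothetical Python illustration, assuming a three-step router checklist and a cut-point of two failed steps; in practice the steps would come from your own support scripts and the cut-point from survey data.

```python
# Hypothetical router checklist; real steps would come from your support scripts.
CHECKLIST = [
    "unplug the router and reboot your device",
    "check that the cable is firmly connected",
    "run the connection test in the app",
]


def plan_troubleshooting(checklist: list[str], already_tried: set[str]) -> list[str]:
    """Keep only the steps the customer has not already tried, in checklist order."""
    return [step for step in checklist if step not in already_tried]


def run_session(steps: list[str], fixes: dict[str, bool], max_failures: int = 2) -> str:
    """Walk the remaining steps and hand off to a human at the failure cut-point.

    `fixes` records whether each step solved the problem (hypothetical input);
    `max_failures` is the cut-point, assumed here to default to 2.
    """
    failures = 0
    for step in steps:
        if fixes.get(step, False):
            return f"resolved by: {step}"
        failures += 1
        if failures >= max_failures:
            return "escalate to human"
    return "escalate to human"  # checklist exhausted without a fix


# The customer says they already rebooted, so the bot starts at step two.
steps = plan_troubleshooting(CHECKLIST, {"unplug the router and reboot your device"})
print(run_session(steps, {"run the connection test in the app": True}))
```

Skipping steps the customer reports having done is what spares them the “reboot your router” script; counting failures gives the escalation rule a concrete, adjustable threshold.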
The AI can also explain why a transaction failed and justify policies that the customer might find onerous. CCMC’s delight study has shown that digital messages to the customer can be infused with empathy, identification, and even humor that can mitigate dissatisfaction and sometimes actually create delight. These approaches can be programmed into AI and ChatGPT. Several years ago, TurboTax, in the middle of a set of questions assessing the tax filer’s income, actually asked, “We know that Sami is only 18 months old, but we need to ask if your toddler is earning enough money to disqualify him as a dependent. 😊” This type of humor can relieve the stress of tax filing.
Real-time analytics for escalating cases to a human
Sentiment and text/speech analysis can identify customer service situations that are spiraling out of control and request intervention by a human before the customer exits the interaction or serious damage is done. The weakness in most chatbots is that customers are kept in the bot too long rather than being escalated. That is the fault of the IT manager who did not count failures or detect angry text, not of the AI.
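A real-time escalation rule combining the two triggers named above, counted bot failures and angry text, might look like the following minimal sketch. It assumes a simple keyword pattern as a stand-in for a real sentiment model; the phrase list and the `max_failures` default are hypothetical values chosen for illustration.

```python
import re

# Stand-in for a sentiment model: a hypothetical list of angry phrases.
ANGRY = re.compile(r"\b(ridiculous|useless|furious|worst|cancel my account)\b", re.IGNORECASE)


def should_escalate(messages: list[str], bot_failures: int, max_failures: int = 2) -> bool:
    """Return True when the conversation should move to a human.

    Either trigger is enough: the bot has failed `max_failures` times,
    or any customer message matches the angry-language pattern.
    """
    if bot_failures >= max_failures:
        return True
    return any(ANGRY.search(message) for message in messages)


print(should_escalate(["I need a refill"], bot_failures=0))  # False
print(should_escalate(["This bot is USELESS"], bot_failures=0))  # True
print(should_escalate(["I need a refill"], bot_failures=2))  # True
```

Checking the rule on every turn, rather than only at the end of a session, is what lets a human step in before the customer abandons the interaction.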
Innovation and ideation
ChatGPT is very good at providing a range of ideas built on concepts that humans would seldom connect. Eapen et al., in the July-August 2023 Harvard Business Review, report on “How Generative AI Can Augment Human Creativity” in product and concept innovation, including by reducing the limitations of expertise bias. In the article, they show how Midjourney, a text-to-image tool, can find resemblances across disparate items to produce novel designs for products such as children’s chairs and chocolates, starting from an image of an elephant. A second function is to challenge “expertise bias” in a particular industry by, again, looking across totally disparate subjects, such as toys, crabs, and architecture. Finally, ChatGPT was asked to assess the pros and cons of possible ideas for minimizing food waste, such as dynamic use-by dates based on the storage environment. Because these are hypothetical cases to be evaluated by humans, the tool adds creativity without adding risk.
Criteria For Applying AI and ChatGPT
The primary criterion for judging the appropriate use of ChatGPT and AI is: What is the risk if it is wrong, and what is the probability that the risk will occur?
This is why I see ChatGPT as analogous to your smartphone’s autocorrect. If you send a text to your spouse with the wrong word, it is no big deal, at least in most cases :). Sending the wrong word to a key client at minimum makes you look sloppy and at worst could be disastrous. You balance the risk of error against the effort needed to proofread your text.
More specific criteria for use are:
- Is it producing a draft that will be reviewed by a human?
- Is it producing a template of a set of ideas that will serve as the basis for further development by a human?
- Is there a finite number of possible answers that can be verified internally by the company?
- Is there low risk to the organization if a boneheaded mistake is made? For example, is the audience friendly, or is it likely to expect perfection and be angry or critical if an error reveals that AI was used?
- Can you continuously track error and failure rates? If not, you are significantly increasing your risks.
- Is the application dealing with a set of transactions with fixed or predictable answers? The best approach is to educate the customer on which issues the app can handle and which it cannot. How many ways can a customer ask the question? Do not ask, as CVS did, “How can I help you?”
- What is the error rate (including escalations to a human and exits from the function without complaint), and what is an acceptable error rate? Again, this depends on the audience you are dealing with. At the law firm using ChatGPT for training, a lawyer’s bad answer is seen only by the training staff. By contrast, using an automated tool to interpret readings from airplane sensors and then adjust the horizontal stabilizer trim contributed to the two crashes of the Boeing 737 MAX. What is the cost of being wrong vs. the payoff of being right?
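The error-rate and cost-versus-payoff questions, together with the primary criterion of risk weighted by probability, can be reduced to a small calculation. The sketch below is a hypothetical illustration: it assumes outcomes are logged under the labels "resolved", "escalated", and "abandoned", counts escalations and silent exits as failures as the criteria above suggest, and uses made-up dollar figures.

```python
def error_rate(outcomes: dict[str, int]) -> float:
    """Share of interactions that failed.

    Hypothetical labels: "resolved", "escalated", "abandoned".
    Escalations to a human and silent exits both count as failures.
    """
    total = sum(outcomes.values())
    failed = outcomes.get("escalated", 0) + outcomes.get("abandoned", 0)
    return failed / total if total else 0.0


def expected_net_value(p_error: float, cost_if_wrong: float, payoff_if_right: float) -> float:
    """Expected value per interaction: (1 - p) * payoff - p * cost."""
    return (1 - p_error) * payoff_if_right - p_error * cost_if_wrong


rate = error_rate({"resolved": 750, "escalated": 150, "abandoned": 100})  # 0.25
print(expected_net_value(rate, cost_if_wrong=40.0, payoff_if_right=4.0))  # -7.0
```

With these made-up figures, a 25 percent failure rate makes automation a net loss when a failure costs ten times what a success saves; the same error rate yields a positive value when the audience is forgiving and a mistake costs no more than a success saves.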
A Final Thought
Companies of all sizes face constant pressure to streamline their operations and increase cost efficiency. Many are tempted by the lofty promises and supposed savings of Automated Speech Recognition (ASR) systems designed to reduce and/or replace human service personnel.
As a general litmus test and thought exercise for this type of “absolute automation” approach, ask yourself if there are other aspects of your business where you’d be comfortable with a fully automated and self-correcting technology solution. Would you be comfortable handing your entire recruiting process over to automated software? Would you want payroll to be finalized and sent out without any oversight from your human finance and accounting team? Would you let marketing algorithms determine what your digital ads say (and how much to spend on them) without any monitoring from your branding/marketing team?
If you’re having trouble finding a facet of your company that you could confidently and completely automate, ask yourself: what makes Customer Service, the only department that interacts directly and daily with your customers, a function you would turn over to such a tool?
I’d love to hear about both successes and stumbles via the comment section below.
Peter W. North, who is also assisting in updating my book CS 3.0, contributed to this article.
New York Times, June 8, 2023.
Tojin Eapen et al., “How Generative AI Can Augment Human Creativity,” Harvard Business Review, July-August 2023, p. 56.