Were We Wrong About Talking Bots?

TL;DR: We thought we’d get flying cars. Instead we got smartphones. We also thought we’d get engaging, personable bots we could really talk to. Instead, we got voice-activated access layers to search results, streaming music, and connected thermostats; and chatbots that lead us down narrow paths to product promotions.

C-3PO and Jarvis - Where Are You?

For years now, many of us have been envisioning the day when we’d have access to intelligent robots or virtual assistants. Our vision of smart talking devices resembled beloved robots and digital personalities from film. We imagined endearing, faithful aides like C-3PO of Star Wars fame, or the sometimes sarcastic but remarkably helpful Jarvis from the Iron Man films.

The common trait underlying all of these futuristic intelligent companions was their imagined personhood. We expected our bot friends to be like us. They would have a self-image, even if it was programmed in. They would have personalities. But most of all, we would be able to engage with them as we do with a human being. We would be talking to them, not through them.

The Current Voice Assistant Paradigm - Focus on User Intent

The state of voice and bot technology as we know it today seems to be turning out differently than we’d imagined. Amazon’s Alexa Voice Service (AVS) is a prime example. AVS is beginning to pervade the technology landscape. Not only has Amazon apparently sold millions of Alexa-enabled devices, but the recent CES 2017 event showcased dozens of vendors who’ve built AVS into their smart home devices, cars, phones, robots, and other products.

AVS technology is based on the programming of user intents. Google’s Home device, with its embedded Assistant, follows the very same pattern. What does the paradigm of programming for user intents mean, and how is it shaping the role that smart bots will play in our lives?

A user intent represents something specific that a user either wants to know or wants to do. Anyone programming a skill or action for a voice assistant such as Alexa or Google Assistant is asked to first identify the user intents for their application. Example intents are getting an update on local weather, listening to a specific music track, setting a timer, or ordering a car on a ridesharing service.

For each user intent, the developer needs to map out one or more slots. Slots are the user inputs required to execute whatever the user wants done. Example slots for the intent of getting a weather update would be location and timeframe.
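To make this concrete, here is a minimal sketch of an intent definition with slots, loosely modeled on the JSON intent schema used by the Alexa Skills Kit (the intent name is invented, and the slot types shown are of the kind the built-in catalog provides):

```json
{
  "intents": [
    {
      "intent": "GetWeatherIntent",
      "slots": [
        { "name": "Location", "type": "AMAZON.US_CITY" },
        { "name": "Timeframe", "type": "AMAZON.DATE" }
      ]
    }
  ]
}
```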

Each intent has a corresponding execution function, where the developer calls an external service to fulfill the user’s request. If the user is asking about the weather, the execution function looks up the forecast from a weather service and then delivers it back to the voice assistant to provide to the user.
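A minimal sketch of the corresponding execution function, written here in plain Python rather than any particular SDK; the request and response shapes follow the general pattern of a voice assistant’s JSON interface, and fetch_forecast is a hypothetical stand-in for a real weather-service call:

```python
def fetch_forecast(location, timeframe):
    # Hypothetical stand-in for a call to a real weather service.
    return "sunny with a high of 70"

def handle_get_weather(intent_request):
    # Read the slot values the voice assistant extracted from the utterance.
    slots = intent_request["intent"]["slots"]
    location = slots["Location"]["value"]    # e.g. "Seattle"
    timeframe = slots["Timeframe"]["value"]  # e.g. "tomorrow"

    # Fulfill the intent by calling the external service...
    forecast = fetch_forecast(location, timeframe)

    # ...and hand a spoken response back to the assistant to read aloud.
    return {
        "outputSpeech": {
            "type": "PlainText",
            "text": "The forecast for {} is {}.".format(location, forecast),
        }
    }
```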

Implications of the Dominance of the User Intent Paradigm

There are consequences to the user-intent-focused paradigm. Today’s voice assistants act primarily as an access layer to cloud services. Yes, you can say “Good morning” to Alexa or Google Home and get a marginally conversational, one-way response. But the primary engagement model with these devices is to request access to a service: be it search, streaming music or audio, control of a connected smart home appliance, or access to a third-party game or transactional capability.

Very rarely do we have the sense that we’re interacting with the voice assistant as a distinct personality. For skills that actually try to implement a conversational experience, the shortcomings of today’s text-to-speech technology (robotic pacing, lack of appropriate intonation and word emphasis, mispronunciations) are so apparent that skill developers are abandoning text-to-speech-based narrative interactions and turning instead to streaming audio. But the flight to streaming audio is turning voice assistants into little more than fancy start buttons for streaming radio plays and other audio services.

Once Upon a Time - Conversational Bots Using AIML

Programming conversational bots around user intents makes a lot of sense. The paradigm seems so obvious, in fact, that it may never occur to people to question whether it’s the only or the best paradigm. Interestingly, though, a different paradigm for programming talking bots has existed for decades. That paradigm was built around the open source scripting language AIML (Artificial Intelligence Markup Language), created by Dr. Richard Wallace.

AIML was designed around a conversational paradigm. In AIML, what the user said was not viewed as an intent. Rather, the user’s utterance was viewed as a statement or question that required an appropriate response. The paradigm underlying AIML is one of user statement or question (the Pattern) and best appropriate response (the Template).
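In AIML itself, a pattern/template pair looks like the following. The wording of these example categories is mine, but the pattern/template structure is standard AIML:

```xml
<aiml version="1.0.1">
  <!-- One "category" pairs a user utterance pattern with a response. -->
  <category>
    <pattern>HOW ARE YOU</pattern>
    <template>I am doing very well. How are you today?</template>
  </category>

  <!-- Wildcards let one pattern cover many utterances; <star/> echoes
       whatever the * matched in the user's input. -->
  <category>
    <pattern>MY NAME IS *</pattern>
    <template>Nice to meet you, <star/>.</template>
  </category>
</aiml>
```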

Programming a chatbot using AIML isn’t technically difficult. The challenge comes in scripting enough patterns (possible user statements) and appropriate responses (templates) to allow for meaningful conversational exchanges. Dr. Wallace worked on the scripts for the open source chatbot A.L.I.C.E. for decades. Steve Worswick’s Mitsuku chatbot, a winner of the Loebner Prize, is another example of a successful conversational chatbot powered primarily by voluminous scripts of potential question-and-answer pairs.

Building a chatbot using AIML will never result in a truly intelligent bot. The whole point of an AIML chatbot is to do a really good job of mimicking conversation with a real person. Updates to AIML over the years have added programming for user intents, so that the chatbot can execute specific actions, such as placing a call or sending a text message, just as commercial voice assistants do.

But the original paradigm of AIML interactions was to mimic human dialog. People talk with one another, not to or through one another. We don’t view human dialog partners as start buttons for a backend service. We don’t engage with friends by saying: “Hi Tom, how are you, and could you please play my ‘Home from Work’ playlist?”

Chatbots Are More Conversational than Voice Assistants - But Those Conversations Fall Short

Interestingly, it appears that conversational chatbots, such as those running on Facebook’s Bot Platform, are embracing the older conversational paradigm more readily than voice assistants. A whole bevy of development suites now offers graphical interfaces for creating conversational scripts for chatbots. These tools generally follow the statement-and-response pair pattern pioneered by AIML. They also facilitate branching conversations, where the user is presented with a question and then multiple pre-defined responses that determine alternative conversation flows.

Unfortunately, at least in my mind, the current generation of scripting tools results in a wooden conversational experience. Far from a Jarvis-like engagement, the user is presented with a structured dialog akin to a traditional phone tree. What type of clothes are you shopping for today? Men’s, Women’s, or Children’s? If you choose Women’s, you get the next set of buttons to choose from: Jackets, Blouses, or Skirts. And so the conversation goes. You don’t get the sense that you’re engaging with a personality, no matter how many cute emojis the bot tosses at you. You’re just being handed a user interface that lets you retrieve the very finite amount of information a brand wants you to know about.
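To illustrate the point, here is a minimal sketch in Python of the kind of branching script these tools produce under the hood (the flow data and function name are hypothetical, not any vendor’s actual API):

```python
# A phone-tree style conversation: each node is a prompt plus a fixed
# menu of buttons, and the user can only move along pre-defined branches.
FLOW = {
    "start": {
        "prompt": "What type of clothes are you shopping for today?",
        "options": {"Men's": "mens", "Women's": "womens", "Children's": "kids"},
    },
    "mens": {
        "prompt": "Shirts, Trousers, or Jackets?",
        "options": {"Shirts": "end", "Trousers": "end", "Jackets": "end"},
    },
    "womens": {
        "prompt": "Jackets, Blouses, or Skirts?",
        "options": {"Jackets": "end", "Blouses": "end", "Skirts": "end"},
    },
    "kids": {
        "prompt": "Tops or Bottoms?",
        "options": {"Tops": "end", "Bottoms": "end"},
    },
    "end": {
        "prompt": "Here are the items the brand wants you to see.",
        "options": {},
    },
}

def step(node_id, choice=None):
    """Return the next prompt and button menu after the user's choice."""
    node = FLOW[node_id]
    if choice is not None:
        node = FLOW[node["options"][choice]]
    return node["prompt"], list(node["options"].keys())

# Example: the user taps "Women's" at the opening menu.
prompt, buttons = step("start", "Women's")
# prompt -> "Jackets, Blouses, or Skirts?"
# buttons -> ["Jackets", "Blouses", "Skirts"]
```

However many nodes you add, the user never leaves the tree, which is exactly why the experience feels like a phone menu rather than a conversation.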

So Were We Wrong About What Talking Bots Would Be Like?

Is any of this bad? Does it mean that we’ll never get C-3PO or Jarvis? I don’t know. But it makes me wonder if we were wrong about how talking bots would turn out. We thought they would be more real, more fun, more compelling. So far, they seem to be coming up short. Is it because the pioneers of voice assistant technology are pushing developers to stick to the user intent paradigm? Or because chatbot platforms seem best suited to pre-defined, simple, phone-tree-like branching dialog flows?

It’s not clear at this point if our technology will ever enable the smart companion bots we once envisioned. Perhaps we never really wanted those personable talking bots in the first place, just as we now probably much prefer our smartphones to the flying cars of our past imagination. But what I keep wondering these days is: are we on the wrong track right now with our technology, or do we need to just keep pressing ahead in the same direction and wait for our tools and the underlying artificial intelligence to mature? I don’t have the answer, but it’s a question that gnaws at me, even as I’m asking Alexa about the weather.

Comments

Amy Stapleton, good post, and bingo on “Chatbots Are More Conversational than Voice Assistants - But Those Conversations Fall Short.”


I find this discussion really interesting. We’ve been building bots for years, most recently on Messenger and on Workplace by Facebook. We’ve seen millions of messages pass through our system, and I find that when people know something is a bot, they treat it as a bot. We have one bot for a major news outlet, and I’m not hugely surprised by how many people ask “latest news” rather than “can I have the latest news please.” Some do, sure, but not most. I strongly believe that over the past 24 months or so there has been a huge over-hyping of “AI chatbots.” But I don’t think bots themselves have been overhyped; we’re seeing huge successes with bots. I do, however, wholeheartedly agree with the sentiment that conversational bots are falling short. It’s one of the reasons why I personally hate the word “chatbot.” I wrote up my thoughts over at ReadWrite: http://readwrite.com/2017/03/22/why-are-there-no-good-bots-dl1/

Amy, I clearly read and feel your disenchantment, and I consider it a very sobering and, let me stress, healthy one for all of us involved in this field. Having said that, let’s see where we actually stand now by running a quick review of what we currently have.

A) I guess we had a good head start in creating Digital Assistants. The Digital Assistant is task-oriented, a doer and ideally a go-getter by design, and as such it inevitably follows a User Intent Paradigm. Obviously, it is still largely in its infancy, essentially structured around dumb parroting of pre-formatted utterances whose significance it has no actual clue about. Even AIML’s help can’t make a major difference, in my opinion. However, there is an almost two-decade-old technology slow-boiling on the back burner that can definitely open new horizons: the Semantic Web. To my knowledge, Viv, a startup recently acquired by Samsung, is a valid example of the potential of the Web 3.0 perspective for the next generation of Digital Assistants. Will this open up the road toward a real CUI approach? It definitely helps.

B) The second type of virtual creature is the Digital Advisor, the inheritor of the four-decades-old Expert Systems. It is knowledge-, inference- and decision-support-oriented by design: a very sophisticated AI-based agent whose grasp is still beyond the capabilities of the majority of current VUI aficionados. Nor does the Digital Advisor have a significant use case in the domestic environment - yet. It is obviously thriving in the enterprise context and, by increasingly following an “as-a-service” model (e.g. IBM Watson Services or MS Cognitive Services), it is getting into the very fabric of management decision making. In the relatively short run, it might also massively find its way into the household realm, likely through personal medical, financial, and legal advisory implementations. Here again, the User Intent Paradigm appears to remain the predominant approach. Can an actual CUI be of any immediate help here? I would argue that it helps only as a matter of style and ease of use.

C) A totally different class of virtual creatures is what we can call the Digital or Virtual Companion, which puts ‘emotions’ at center stage. If I understand you correctly, this is what you truly wish to see around. This is indeed a fascinating domain, surely very advanced in the fiction realm but still far from having enough real-world substance to generate its own authentic conversational interface. There are a number of still-crude products out there, plus several promising ongoing research activities, mainly at universities in Japan, the US, and Europe (e.g. the Horizon 2020 EMPATHIC European Consortium, where I am a Board Advisor). But we are still pretty far from anything close to your expectations.

Ultimately, I guess you wonder whether we should lower our expectations, or whether we are on the wrong technology path. I still don’t have an answer for that, but let me quote one of my favorite Spanish poets of the 20th century, Antonio Machado: “Wanderer, your footsteps are the road and nothing more; wanderer, there is no road, the road is made by walking...” (Campos de Castilla, 1912). So let’s walk and hope for the best.

We are not wrong about conversational bots; we are just too early. Let’s see what generative approaches can achieve in the future.
