Principles for Voice User Interfaces
Our voices are diverse and complex, and voice commands are even more so. They are difficult to process even between people, let alone for computers. How we express our thinking, how our culture shapes communication, how we use language, and how we infer meaning: all of these nuances affect how language is understood.
So, how do designers and engineers respond to this challenge? How do we cultivate trust between users and AI? This is where VUI comes into play.
The voice user interface (VUI) is still being developed and refined. Usability studies matter here because the interface has to be built around the needs and type of its users. For example, if it is made for the general public, it must offer enough help for beginners.
Conversely, if it is intended for a small group of specialists, the interface should focus on productivity rather than on teaching them how to use it.
The Lotus Conversational Interface (LCI) is a prototype by IBM. This prototype follows the 'Voice User Interface Principles for a Conversational Agent'.
So, let's see what these principles are...
Design principles of a voice-based interface
The goal of an interface based on voice recognition is for it to behave like a person who serves the user faithfully: very helpful and also respectful.
Interfaces that use voice recognition to interact with the user have generally been considered nothing more than artificial intelligence. Put simply, a voice-based interface is one in which a user interacts with the computer by voice.
There are certain principles that should be considered when developing a voice-based interface.
The first group of principles is oriented toward natural interaction and removes the need for the user to be a technical person.
- The system must be able to understand the user's natural language, including the references the user makes. It should also handle commands and questions that lack the complete information needed to execute them: when a command is ambiguous it should ask for clarification, and when required information is missing it should request it.
- The system should not interrupt the user when he or she is talking, unless there is an emergency or some high priority notification.
- The user may interrupt the system, and the system should stop when the user speaks.
- There should not be questioning cycles, that is, the user should not be forced to answer questions generated by the system to move forward in other processes.
- If the system has something to say that is not an answer to a user’s question, the system should ask permission to speak, unless it is a matter of high priority.
- The system should be able to handle courtesies such as “thank you”, since some users will say them out of habit, and it should reply with a courtesy response such as “you’re welcome”.
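As a rough illustration of the clarification and courtesy principles above, here is a minimal Python sketch. The names `Reply`, `resolve_contact`, and `courtesy_reply` are hypothetical and do not come from LCI, and the matching rule (a whole-word match against the contact's name) is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reply:
    """What the system says next, plus the resolved contact if there is one."""
    text: str
    resolved: Optional[str] = None


def resolve_contact(spoken_name: str, contacts: List[str]) -> Reply:
    """Match a spoken name against the contact list, asking only when needed."""
    wanted = spoken_name.strip().lower()
    # Assumption: a contact matches if any word of its full name equals the spoken name.
    matches = [c for c in contacts if wanted in c.lower().split()]
    if len(matches) == 1:
        return Reply(text=f"Forwarding to {matches[0]}.", resolved=matches[0])
    if not matches:
        # Required information is missing, so request it.
        return Reply(text=f"I could not find {spoken_name}. Who should receive it?")
    # Ambiguous command: ask for clarification instead of guessing.
    return Reply(text=" or ".join(matches) + "?")


# Hypothetical table of courtesy phrases and the replies they trigger.
COURTESY = {"thank you": "You're welcome!", "thanks": "You're welcome!"}


def courtesy_reply(utterance: str) -> Optional[str]:
    """Return a polite response for a courtesy phrase, or None otherwise."""
    return COURTESY.get(utterance.strip().lower().rstrip(".!"))
```

With a contact list containing both Robert Armes and Robert Corell, `resolve_contact("Robert", contacts)` asks which one is meant, while `resolve_contact("Armes", contacts)` resolves immediately. A real system would match on recognized speech rather than exact words; this sketch only shows the decision between acting and asking.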
The second group of principles concerns the trust and reliance the user places in the system. Users need to know whether they have been properly heard and understood, and whether the process they requested is running. They also need to know whether the answer they got is the one they were looking for.
Currently, voice recognition is imperfect in this area.
- When the system answers a question, it must refer back to the question the user asked, so a single word like “Yes” or a bare number is not an acceptable answer. The system should say “Yes, there are tickets available” instead.
- Deletion and any other irreversible action must be confirmed by the user.
- If a process takes more than a few seconds, the system should indicate specifically what is being carried out; that is, a word like “printing” should be used instead of just “working”, so that the user is sure that the correct procedure is running.
- The user must be able to cancel a command that is in progress by saying a word like “stop”.
- If the user for some reason does not answer a question asked by the system, then, after a certain period of time, the system must ask the user for permission to speak and then ask whether the user still wants to continue with the command. If so, the system should ask the question one more time.
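The confirmation, progress, and full-sentence-answer principles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `IRREVERSIBLE` set, and the `PROGRESS` table are hypothetical and are not taken from LCI.

```python
from typing import Optional

# Assumption: the set of action names that must be confirmed before running.
IRREVERSIBLE = {"delete"}

# Assumption: a specific progress word for each known action.
PROGRESS = {"print": "Printing", "delete": "Deleting", "send": "Sending"}


def respond_to_command(action: str, target: str,
                       confirmed: Optional[bool] = None) -> str:
    """Return the system's spoken response for a command.

    `confirmed` is None until the user has answered the confirmation question.
    """
    if action in IRREVERSIBLE:
        if confirmed is None:
            return f"Are you sure you want to {action} {target}?"
        if not confirmed:
            return f"Okay, I will not {action} {target}."
    # Name the specific activity; the generic wording is only a fallback
    # for actions that have no entry in the table.
    verb = PROGRESS.get(action, "Working on")
    return f"{verb} {target}."


def answer_yes_no(subject: str, is_true: bool) -> str:
    """Answer in a full sentence that restates what was asked."""
    if is_true:
        return f"Yes, there are {subject}."
    return f"No, there are no {subject}."
```

For example, `respond_to_command("delete", "the message from Bill")` asks for confirmation first, and only a second call with `confirmed=True` reports that the deletion is under way.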
The third group of principles is related to the consistency and transparency of the system.
- The system should speak in words and phrases that the user can understand.
- The system should not make assumptions about what the user meant. For example, if the user says “open this message…”, the system should not assume that the user is talking about the last message. Instead, it should ask the user, “Which message are you talking about?”
- System speech should be consistent and use the same words for similar actions.
- The system should not lead the user to believe that it is more intelligent than it really is, since this will only cause the user to try more things that the system cannot recognize or understand.
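One possible way to keep system speech consistent, and to avoid guessing what “this message” refers to, is sketched below in Python. The `TEMPLATES` table and both function names are hypothetical, and treating an explicitly selected message as the only safe referent is an assumption of this sketch rather than something the principles spell out.

```python
from typing import Optional

# One template per kind of outcome, so similar actions are always worded alike.
TEMPLATES = {
    "sent": "{item} sent.",
    "deleted": "{item} deleted.",
    "filed": "{item} filed.",
}


def confirm(event: str, item: str) -> str:
    """Build a confirmation from the shared templates."""
    return TEMPLATES[event].format(item=item)


def resolve_this_message(selected_id: Optional[int]) -> str:
    """Handle "open this message" without assuming which message is meant."""
    if selected_id is None:
        # Nothing is explicitly selected, so ask instead of picking the last one.
        return "Which message are you talking about?"
    return f"Opening message {selected_id}."
```

Keeping every confirmation in one table makes it harder for two parts of the system to describe the same action in different words.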
Here is a conversation between the LCI system and a user that shows how it complies with each of the principles mentioned above:
User: What time is it?
LCI: The time is 4:49 PM.
User: Let me know when it is 5 o’clock.
LCI: Alarm set for 5:00 PM.
User: Show me my messages, please.
// LCI Opens the mail inbox window. //
User: Let me see the message from Beth.
// LCI Selects the first message from Beth Brownholtz. //
LCI: This one?
User: No, the next one.
// LCI Opens the next message from Beth Brownholtz. //
User: Forward this message to Robert.
LCI: Robert Armes or Robert Corell?
User: Armes.
// LCI Creates a new message to Robert Armes containing the contents of the original message. //
User: Send this message.
LCI: Message sent.
User: Is Robert online?
LCI: Robert Armes is offline.
User: Notify me when he is available.
// LCI Monitoring chat status of Robert Armes. //
User: Delete the message from Bill.
// LCI Selects the message. //
LCI: Are you sure?
User: Yes.
// LCI Deletes the message. //
LCI: Excuse me?
User: What is it?
LCI: You have one new message.
User: Who is it from?
LCI: The message is from Julie Alweis.
User: What is it about?
LCI: The subject of the message is expense report.
User: File this in my management folder.
// The message disappears from the inbox //
LCI: Excuse me?
User: Go ahead.
LCI: IBM has reached the limit of 93, trading at 93.06.
User: Tell me when IBM reaches ninety three and a half.
// LCI Monitoring IBM for a limit of 93.5. //
User: What is ninety three times one hundred twenty?
LCI: 93 times 120 is 11,160.
User: Thanks.
LCI: You’re welcome!
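The two “Excuse me?” exchanges in the conversation show the system asking permission before it says something the user did not ask for. A minimal Python sketch of that behavior follows. The `Notifier` class, its method names, and the set of phrases that count as an invitation are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Deque, Optional


class Notifier:
    """Holds unsolicited messages until the user grants permission to speak."""

    # Assumption: phrases that count as the user inviting the system to speak.
    INVITATIONS = {"what is it", "go ahead", "yes"}

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def notify(self, message: str, high_priority: bool = False) -> str:
        """Return what the system says right now for a new notification."""
        if high_priority:
            # High-priority matters may be spoken without asking first.
            return message
        self._pending.append(message)
        return "Excuse me?"

    def on_user_reply(self, utterance: str) -> Optional[str]:
        """Deliver the oldest pending message once the user invites it."""
        normalized = utterance.strip().lower().rstrip("?.!")
        if normalized in self.INVITATIONS and self._pending:
            return self._pending.popleft()
        return None
```

A queued notification is spoken only after a reply such as “What is it?” or “Go ahead.”, which mirrors the new-message and stock-price notifications in the conversation.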
Advanced natural language understanding is still in its infancy. At present, Siri and Amazon Echo fall back to a web search when they run into difficulty instead of answering the question directly, whereas a system with advanced natural language understanding would understand what you say and answer you directly.