Why humanity is needed to propel conversational AI

Conversational AI is a subset of artificial intelligence (AI) that allows consumers to interact with computer applications as if they were interacting with another human. According to Deloitte, the global conversational AI market is set to grow by 22% between 2022 and 2025 and is estimated to reach $14 billion by 2025.

Conversational AI can offer enhanced language customization to cater to a vast and highly diverse group of hyper-local audiences. Practical applications span financial services, hospital wards and conferences, and can take the form of a translation app or a chatbot. According to Gartner, 70% of white-collar workers reportedly interact with conversational platforms on a regular basis, but this is just a drop in the ocean of what can unfold this decade.

Despite the exciting potential within the AI space, there is one significant hurdle: the data used to train conversational AI models does not adequately account for the subtleties of dialect, language, speech patterns and inflection.

When using a translation app, for example, an individual speaks in their source language, and the AI processes that speech and converts it into the target language. When the speaker deviates from the standardized accent the model has learned (for example, by speaking in a regional accent or using regional slang), the efficacy of live translation dips. Not only does this make for a subpar experience, but it also inhibits users’ ability to interact in real time, whether with friends and family or in a business setting.


The need for humanity in AI

To avoid a drop in efficacy, AI must be trained on a diverse dataset. For instance, this could mean accurately representing speakers from across the U.K., at both a regional and a national level, in order to provide better live translation and speed up the interaction between speakers of different languages and dialects.
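As a rough illustration of what checking a dataset for diversity could look like, the sketch below counts how often each accent appears in a set of training samples and flags any that fall below a minimum share. The function name, the `accent` metadata field, the 5% threshold and the accent labels are all hypothetical choices made for this example, not part of any particular product.

```python
from collections import Counter


def accent_coverage(samples, expected_accents, min_share=0.05):
    """Return the expected accents whose share of the samples is below min_share.

    `samples` is a list of dicts with a hypothetical "accent" label.
    The 5% default threshold is an arbitrary assumption for illustration.
    """
    counts = Counter(sample["accent"] for sample in samples)
    total = sum(counts.values())
    underrepresented = {}
    for accent in expected_accents:
        share = counts.get(accent, 0) / total if total else 0.0
        if share < min_share:
            underrepresented[accent] = share
    return underrepresented


# Hypothetical metadata for 100 recordings, heavily skewed toward one accent.
samples = (
    [{"accent": "RP"}] * 90
    + [{"accent": "Geordie"}] * 3
    + [{"accent": "Scouse"}] * 7
)

underrepresented = accent_coverage(
    samples, ["RP", "Geordie", "Scouse", "Glaswegian"]
)
print(underrepresented)
```

In this made-up sample, accents that are rare or missing altogether are flagged, which is the kind of gap that would need more recordings before training.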

Using training data in machine learning (ML) programs is a simple concept, but it is also foundational to the way these technologies work. Training data is the material from which a program learns, whether through supervised learning or through the feedback loops of reinforcement learning, and it is how technologies such as neural networks come to produce sophisticated results. The wider the pool of people represented in that data on the back end, including, for example, speakers with speech impediments or stutters, the better the resulting translation experience will be.

Specifically within the translation space, focusing on how a user speaks rather than what they speak about is the key to improving the end-user experience. The darker side of reinforcement learning was illustrated recently by Meta, which came under fire for a chatbot that made insensitive comments it had learned from public interaction. Training should therefore always have a human in the loop (HITL), so that a person can ensure the overarching algorithm is accurate and fit for purpose.
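One way to picture a human-in-the-loop step is as a gate between public interactions and the training set: nothing a bot hears is used for training until a person has approved it. The sketch below is a minimal, hypothetical version of that idea; the class and method names are invented for this example and do not describe how Meta or any other company actually works.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A piece of user input that might be used for training."""
    text: str
    status: str = "pending"  # "pending", "approved" or "rejected"


class HumanReviewGate:
    """Holds candidate training examples until a human reviewer decides on them."""

    def __init__(self):
        self.candidates = []

    def submit(self, text):
        # Every new interaction starts as pending and is not yet usable.
        candidate = Candidate(text)
        self.candidates.append(candidate)
        return candidate

    def decide(self, candidate, approve):
        # Called with the human reviewer's decision.
        candidate.status = "approved" if approve else "rejected"

    def training_set(self):
        # Only human-approved examples ever reach the model.
        return [c.text for c in self.candidates if c.status == "approved"]


gate = HumanReviewGate()
polite = gate.submit("Could you translate this menu for me?")
abusive = gate.submit("<insensitive comment>")
unreviewed = gate.submit("Where is the nearest station?")

gate.decide(polite, approve=True)
gate.decide(abusive, approve=False)

print(gate.training_set())
```

The design choice here is that unreviewed input is excluded by default, so a backlog of reviews slows learning down rather than letting unchecked content through.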

Accounting for the active nature of human conversation 

Of course, human interaction is incredibly nuanced, and building conversational design for bots that can navigate its complexity is a perennial challenge. Once achieved, however, well-structured, fully realized conversational design can lighten the load on customer service teams, strengthen translation apps and improve customer experiences. Beyond regional dialects and slang, training data also needs to account for active conversation between two or more speakers. The bot must learn from their speech patterns: the time taken to make an interjection, the pause between speakers and then the response.

Prioritizing balance is also a great way to ensure that conversations remain an active experience for the user, and one way to do so is by eliminating dead-end responses. Think of it as akin to an improv setting, in which “yes, and” sentences are foundational. In other words, you’re supposed to accept your partner’s world-building while bringing a new element to the table. The most effective bots operate similarly, phrasing responses openly in ways that encourage additional inquiries. Offering options and additional, relevant choices can help ensure all end users’ needs are met.

Many people have trouble holding long strings of thought in mind or take a bit longer to process their thoughts. Because of this, translation apps would do well to allow users enough time to gather their thoughts before treating a pause as the end of an interjection. Training a bot to recognize filler words (in English, for example, so, erm, well, um or like) and to associate a longer lead time with these words is a good way of allowing users to engage in a more realistic real-time conversation. Offering targeted “barge-in” programming (chances for users to interrupt the bot) is another way of more accurately simulating the active nature of conversation.
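To make the filler-word idea concrete, here is a minimal sketch of how an app might decide whether a speaker has finished. It waits longer when the last word heard was a filler. The function names and both timing values (0.7 and 2.0 seconds) are assumptions chosen purely for illustration; a real system would tune them and would likely learn them from data rather than hard-code them.

```python
# English filler words taken from the examples in the text above.
FILLER_WORDS = {"so", "erm", "well", "um", "like"}


def end_of_turn_timeout(last_word, base_seconds=0.7, filler_seconds=2.0):
    """Return how long to wait in silence before assuming the speaker is done.

    Both default timings are hypothetical values, not measured ones.
    """
    word = last_word.strip().lower().strip(".,!?")
    return filler_seconds if word in FILLER_WORDS else base_seconds


def turn_is_over(last_word, silence_seconds):
    """True once the silence has lasted at least as long as the timeout."""
    return silence_seconds >= end_of_turn_timeout(last_word)


# One second of silence ends the turn after an ordinary word,
# but not after a filler word, which signals the speaker is still thinking.
print(turn_is_over("station", 1.0))
print(turn_is_over("um", 1.0))
```

The same check could be run in reverse for barge-in: if the user starts speaking while the bot is talking, the bot stops and hands the turn back.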

Future innovations in conversational AI 

Conversational AI still has some way to go before all users feel accurately represented. Accounting for subtleties of dialect, the time speakers take to think and the active nature of conversation will be pivotal to propelling this technology forward. Specifically within the realm of translation apps, accounting for pauses and for words associated with thinking will improve the experience for everyone involved and simulate a more natural, active conversation.

Drawing on a wider dataset in the back-end process, for example by learning from both English RP and Geordie inflections, will prevent the efficacy of a translation from dropping because of accent-related processing issues. These innovations offer exciting potential, and it is time translation apps and bots accounted for linguistic subtleties and speech patterns.

Martin Curtis is CEO of Palaver

Originally appeared on: TheSpuzz