Is AI’s next leap understanding emotion? $50M for Hume says yes

Yesterday, a new startup called Hume AI announced it had raised $50 million in a Series B round led by EQT Ventures with participation from Union Square Ventures, Nat Friedman & Daniel Gross, Metaplanet, Northwell Holdings, Comcast Ventures, and LG Technology Ventures.

The startup was co-founded and is led by CEO Alan Cowen, a former researcher at Google DeepMind. Beyond Cowen's pedigree and the VC world's general froth of interest in AI startups, what else could command such a sizable round?

What differentiates Hume AI from the numerous other AI model providers and startups is its focus on creating an AI assistant that understands human emotion, reacts appropriately to it, and conveys it back to the user, along with an API that lets other enterprises build chatbots atop that assistant and some of its underlying data.

Unlike ChatGPT and Claude 3, which are primarily known as text-based chatbots, Hume AI uses voice conversations as its interface, listening to a human user’s intonation, pitch, pauses, and other features of the voice itself, not just the words.
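
To get a concrete sense of what “listening to intonation, pitch, and pauses” involves computationally, here is a minimal sketch using the open-source librosa library to extract a pitch contour and pause durations from a recording. It illustrates the kind of vocal features at play, not Hume AI’s actual pipeline, and the filename is a placeholder.

```python
# Illustrative only: extract simple pitch and pause features from an audio file.
# Not Hume AI's method; "user_utterance.wav" is a hypothetical input file.
import librosa
import numpy as np

y, sr = librosa.load("user_utterance.wav", sr=16000)

# Pitch contour via probabilistic YIN: rises and falls hint at intonation.
f0, voiced_flag, voiced_probs = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
mean_pitch = np.nanmean(f0)                   # average pitch of voiced frames
pitch_range = np.nanmax(f0) - np.nanmin(f0)   # rough spread of intonation

# Pauses: gaps between non-silent intervals.
intervals = librosa.effects.split(y, top_db=30)      # (start, end) sample indices
gaps = (intervals[1:, 0] - intervals[:-1, 1]) / sr   # pause durations in seconds

print(f"mean pitch: {mean_pitch:.1f} Hz, range: {pitch_range:.1f} Hz")
print(f"pauses longer than 0.3 s: {(gaps > 0.3).sum()}")
```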

The startup, based in New York City and named after Scottish philosopher David Hume, also released a public demo of its “Empathic Voice Interface (EVI),” which it bills as “the first conversational AI with emotional intelligence.” You can try it yourself here: demo.hume.ai. All it requires is a device with a working microphone, whether a computer or a mobile phone.

Why understanding human emotion is key to providing better AI experiences

Carrying on emotionally aware voice conversations with human users may seem like a simple enough task for an AI assistant in 2024, but it is actually a massively complex, nuanced, and difficult undertaking. Hume AI doesn’t just want to know whether users are feeling “happy,” “sad,” “angry,” or “afraid,” the five to seven “universal” human emotions that psychologist Paul Ekman categorized across cultures from facial expressions.

Instead, Hume AI seeks to understand the more nuanced and often multidimensional emotions of its human users. On its website, the startup lists 53 different emotions it is capable of detecting from a user:

  1. Admiration
  2. Adoration
  3. Aesthetic Appreciation
  4. Amusement
  5. Anger
  6. Annoyance
  7. Anxiety
  8. Awe
  9. Awkwardness
  10. Boredom
  11. Calmness
  12. Concentration
  13. Confusion
  14. Contemplation
  15. Contempt
  16. Contentment
  17. Craving
  18. Desire
  19. Determination
  20. Disappointment
  21. Disapproval
  22. Disgust
  23. Distress
  24. Doubt
  25. Ecstasy
  26. Embarrassment
  27. Empathic Pain
  28. Enthusiasm
  29. Entrancement
  30. Envy
  31. Excitement
  32. Fear
  33. Gratitude
  34. Guilt
  35. Horror
  36. Interest
  37. Joy
  38. Love
  39. Nostalgia
  40. Pain
  41. Pride
  42. Realization
  43. Relief
  44. Romance
  45. Sadness
  46. Sarcasm
  47. Satisfaction
  48. Shame
  49. Surprise (negative)
  50. Surprise (positive)
  51. Sympathy
  52. Tiredness
  53. Triumph

Hume AI’s theory is that by developing AI models capable of a more granular understanding and expression of human emotion, it can better serve users, not only as a “willing ear” to help them work through their feelings, but also by providing more realistic and satisfying customer support, information retrieval, companionship, brainstorming, collaboration on knowledge work, and much more.

As Cowen told VentureBeat in an email sent via a spokesperson from Hume AI:

“Emotional intelligence includes the ability to infer intentions and preferences from behavior. That’s the very core of what AI interfaces are trying to achieve: inferring what users want and carrying it out. So in a very real sense, emotional intelligence is the single most important requirement for an AI interface.

With voice AI, you have access to more cues of user intentions and preferences. Studies show that vocal modulations and the tune, rhythm, and timbre of speech are a richer conduit for our preferences and intentions than language alone (e.g., see https://pure.uva.nl/ws/files/73486714/02699931.2022.pdf).

Understanding vocal cues is a key component of emotional intelligence. It makes our AI better at predicting human preferences and outcomes, knowing when to speak, knowing what to say, and knowing how to say it in the right tone of voice.”

How Hume AI’s EVI detects emotions from vocal changes

How does Hume AI’s EVI pick up on the cues of user intentions and preferences from vocal modulations of users? The AI model was trained on “controlled experimental data from hundreds of thousands of people around the world,” according to Cowen.

On its website, Hume notes: “The models were trained on human intensity ratings of large-scale, experimentally controlled emotional expression data” from methods described in two scientific research papers published by Cowen and his colleagues: “Deep learning reveals what vocal bursts express in different cultures” from December 2022 and “Deep learning reveals what facial expressions mean to people in different cultures” from this month.

The first study included “16,000 people from the United States, China, India, South Africa, and Venezuela” and had a subset of them listen to and record “vocal bursts,” or non-word sounds like chuckles and “uh huhs,” and assign emotions to them for the researchers. The researchers also asked this subset to record vocal bursts of their own, then had another subset listen to those and categorize their emotions as well.

The second study included 5,833 participants from the same five countries above, plus Ethiopia, and had them take a computer survey in which they analyzed up to 30 different “seed images” from a database of 4,659 facial expressions. Participants were asked to mimic the facial expression they saw on the computer and to categorize the emotion conveyed by the expression from a list of 48 emotions, rating its intensity on a scale of 1-100. Here’s a video composite from Hume AI showing “hundreds of thousands of facial expressions and vocal bursts from India, South Africa, Venezuela, the United States, Ethiopia, and China” used in its facial study.
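
To make the labeling setup concrete, here is a toy sketch of how individual raters’ 1-to-100 intensity judgments could be averaged into per-image emotion targets. The data layout and values are invented for illustration and do not reflect the papers’ actual formats.

```python
# Toy example: average per-rater intensity judgments into per-(image, emotion) targets.
# The (image_id, emotion, intensity) triples below are made up for illustration.
from collections import defaultdict

ratings = [
    ("img_001", "Amusement", 82), ("img_001", "Amusement", 74),
    ("img_001", "Joy", 65), ("img_002", "Contempt", 40),
]

sums = defaultdict(lambda: [0, 0])  # (image, emotion) -> [running total, count]
for image_id, emotion, intensity in ratings:
    sums[(image_id, emotion)][0] += intensity
    sums[(image_id, emotion)][1] += 1

mean_intensity = {key: total / count for key, (total, count) in sums.items()}
print(mean_intensity[("img_001", "Amusement")])  # 78.0
```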

Hume AI took the resulting photos and audio of participants in both studies and trained its own deep neural networks on them.
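
As a rough illustration of that general recipe, and emphatically not Hume AI’s actual architecture, the sketch below regresses a vector of emotion-intensity targets from a precomputed feature embedding using PyTorch; the feature size, network shape, and data are all assumptions.

```python
# Heavily simplified sketch: regress 48 emotion-intensity dimensions from a
# feature vector. Illustrative only; not Hume AI's model or training setup.
import torch
import torch.nn as nn

N_FEATURES = 128   # assumed size of a precomputed audio/image embedding
N_EMOTIONS = 48    # intensity dimensions rated in the studies

model = nn.Sequential(
    nn.Linear(N_FEATURES, 256),
    nn.ReLU(),
    nn.Linear(256, N_EMOTIONS),
    nn.Sigmoid(),              # outputs in [0, 1], matching normalized ratings
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Fake batch: 32 examples with normalized (0-1) human intensity ratings as targets.
features = torch.randn(32, N_FEATURES)
targets = torch.rand(32, N_EMOTIONS)

for _ in range(10):            # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()
    optimizer.step()
```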

Hume’s EVI itself told me in an interview I conducted with it (with the disclaimer that it is not a person and that, as with most conversational AI assistants and chatbots, its answers may not always be accurate) that Hume’s team “collected the largest, most diverse library of human emotional expressions ever assembled. We’re talking over a million participants from all around the world, engaged in all kinds of real-life interactions.”

According to Cowen, the vocal audio data from participants in Hume AI’s studies was also used to create a “speech prosody model, which measures the tune, rhythm, and timbre of speech and is incorporated into EVI,” and which captures up to “48 distinct dimensions of emotional meaning.”

You can see — and hear — an interactive example of Hume AI’s speech prosody model here with 25 different vocal patterns.

The speech prosody model is what powers the bar graphs of different emotions and their proportions displayed, helpfully and in what I found to be a thoroughly engaging manner, in the right-hand sidebar of Hume’s EVI online demo site.
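
For a sense of what such a display involves, here is a small sketch that normalizes a set of per-emotion scores into proportions and prints a crude text bar chart; the scores are made-up values, not real EVI output.

```python
# Illustrative only: turn per-emotion scores into proportions and show the top
# few as text bars, roughly what the demo's sidebar bar graphs convey.
scores = {
    "Amusement": 0.61, "Interest": 0.48, "Calmness": 0.33,
    "Confusion": 0.12, "Awkwardness": 0.09,
}

total = sum(scores.values())
proportions = {emotion: value / total for emotion, value in scores.items()}

for emotion, share in sorted(proportions.items(), key=lambda kv: -kv[1])[:3]:
    bar = "#" * int(share * 40)
    print(f"{emotion:<12} {share:5.1%} {bar}")
```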

The speech prosody model is just one part of Hume AI’s “Expression Measurement API.” The other components that enterprise customers can build apps atop include facial expression, vocal burst, and emotional language models, the last of which measures “the emotional tone of transcribed text, along 53 dimensions.”

Hume also offers its Empathic Voice Interface API for the voice assistant mentioned above — which only accesses an end-user’s audio and microphone — and a “Custom Models API” that allows users to train their own Hume AI model tailored to their unique dataset, recognizing patterns of human emotional expression in, let’s say, an enterprise’s customer response call audio or facial expressions from their security feeds.
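
For developers curious what building atop such an API might look like, here is a hedged sketch of posting an audio file to an expression-measurement-style REST endpoint. The URL, field names, and response shape are placeholders invented for illustration; Hume AI’s actual API documentation defines the real interface.

```python
# Hypothetical example: the endpoint, credential, filename, and response format
# below are placeholders, not Hume AI's real API.
import requests

API_KEY = "YOUR_API_KEY"                            # placeholder credential
ENDPOINT = "https://api.example.com/v0/expression"  # hypothetical endpoint

with open("support_call.wav", "rb") as audio_file:  # hypothetical call recording
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"audio": audio_file},
        timeout=30,
    )
response.raise_for_status()

# Assumed response shape: a flat mapping of emotion labels to scores.
for emotion, score in sorted(response.json().items(), key=lambda kv: -kv[1])[:5]:
    print(f"{emotion}: {score:.2f}")
```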

Ethical questions and guidelines

So who does all this work benefit, other than the startup founders now raising a bunch of cash?

Hume AI was founded only in 2021, but the company already has enterprise customers using its APIs and technology that “span health and wellness, customer service, coaching/ed-tech, user testing, clinical research, digital healthcare, and robotics,” according to Cowen.

As he elaborated in a statement sent via a spokesperson’s email:

“EVI can serve as an interface for any app. In fact, we’re already using it as an interactive guide to our website. We’re excited about developers using our API to build personal AI assistants, agents, and wearables that proactively find ways to improve users’ daily life. We’re already working with a number of design partners who are integrating EVI into their products spanning from AI assistants to health & wellness, coaching, and customer service.”

While I found the demo to be surprisingly delightful, I also saw the potential for people to become dependent on Hume’s EVI, or unhealthily obsessed with it, as a source of companionship that may be more pliant and easier to obtain than companionship from other human beings. I also recognize the possibility that this type of technology could be put to darker, more sinister, and potentially damaging uses: weaponized by criminals, government agencies, hackers, militaries, or paramilitaries for purposes such as interrogation, manipulation, fraud, surveillance, identity theft, and other adversarial actions.

Asked directly about this possibility, Cowen provided the following statement:

Hume supports a separate non-profit organization, The Hume Initiative, which brings together social scientists, ethicists, cyberlaw experts, and AI researchers to maintain concrete guidelines for the ethical use of empathic AI. These guidelines, which are live on thehumeinitiative.org, are the most concrete ethical guidelines in the AI industry, and were voted upon by an independent committee. We adhere to The Hume Initiative ethical guidelines and we also require every developer that uses our products to adhere to The Hume Initiative’s guidelines in our Terms of Use.

Among the many guidelines listed on The Hume Initiative’s website are the following:

“When our emotional behaviors are used as inputs to an AI that optimizes for third party objectives (e.g. purchasing behavior, engagement, habit formation, etc.), the AI can learn to exploit and manipulate our emotions.

An AI privy to its users’ emotional behaviors should treat these behaviors as ends in and of themselves. In other words, increasing or decreasing the occurrence of emotional behaviors such as laughter or anger should be an active choice of developers informed by user well-being metrics, not a lever introduced to, or discovered by, the algorithm as a means to serve a third-party objective.

Algorithms used to detect cues of emotion should only serve objectives that are aligned with well-being. This can include responding appropriately to edge cases, safeguarding users against exploitation, and promoting users’ emotional awareness and agency.”

The website also includes a list of “unsupported use cases” such as manipulation, deception, “optimizing for reduced well-being” such as “psychological warfare or torture,” and “unbounded empathic AI,” the last of which amounts to an agreement by the Hume Initiative and its signatories to “not support making powerful forms of empathic AI accessible to potential bad actors in the absence of appropriate legal and/or technical constraints.”

However, militarization of the tech is not specifically prohibited.

Rave initial reception

It wasn’t just me who was impressed with Hume’s EVI demo. Following the funding announcement and demo release yesterday, a range of tech workers, entrepreneurs, early adopters and more took to the social network X (formerly Twitter) to express their admiration and shock over how naturalistic and advanced the tech is.

“Easily one of the best AI demos I’ve seen to date,” posted Guillermo Rauch, CEO of cloud and web app developer software company Vercel. “Incredible latency and capability.”

Similarly, last month, Avi Schiffmann, founder and president of the non-profit humanitarian web tool maker InternetActivism.org, wrote that Hume’s EVI demo blew him away. “Holy fuck is this going to change everything,” he added.

At a time when other AI assistants and chatbots are also beefing up their own voice interaction capabilities — as OpenAI just did with ChatGPT — Hume AI may have just set a new standard in mind-blowing human-like interactivity, intonation, and speaking qualities.

One obvious potential customer, rival, or would-be acquirer that comes to mind in this case is Amazon, which remains many people’s preferred voice assistant provider through Alexa, but which has since de-emphasized its voice offerings internally and stated it would reduce headcount in that division.

Asked by VentureBeat: “Have you had discussions with or been approached for partnerships/acquisitions by larger entities such as Amazon, Microsoft, etc? I could imagine Amazon in particular being quite interested in this technology as it seems like a vastly improved voice assistant compared to Amazon’s Alexa,” Cowen responded via email: “No comment.”

