Mint Primer: Alexa, why is voice failing to resonate in tech?

The Humane AI Pin, a voice-controlled personal assistant, was one of Silicon Valley’s most-hyped AI-first products. That hype, however, crashed rather quickly. Barring Amazon’s Alexa and Apple’s Siri, voice interfaces have been among tech’s biggest failures. Mint asks why:

Why the hype around Humane AI Pin?

The AI Pin, launched by Silicon Valley startup Humane, promised to overhaul consumer technology. The company’s first gadget showcased an operating system powered by OpenAI’s ChatGPT, which sought to automate most of what users do on smartphones: playing music, booking cabs, ordering food and more, through voice commands alone, without needing to open multiple apps. It is this seamless interoperability that drove the hype around the AI Pin. The Pin also had a camera that could take in images as input, promising a future where gadgets become more interactive.

How has the product been received?

In mid-April, Humane began seeding evaluation units to reviewers and tech testers. The feedback so far has been overwhelmingly critical, with most reviewers saying the voice interactions do not work as promised. This could largely be because most applications work in silos, and multiple permissions are required for them all to work smoothly and in sync. Further, generative AI is yet to become highly accurate, which adds to the complications of using a voice interface as the main way to operate a gadget.

Why do we need specialized AI hardware?

One reason the AI Pin has been questioned is that, as a gadget, it doesn’t do anything a smartphone doesn’t already. Both Android and iOS have voice interfaces for most operations. The only difference dedicated AI hardware can make is to alter the smartphone form factor and offer devices that are not centred on touchscreens. That shift may take a while.

Is voice well suited to AI operations?

Most consumer-facing generative AI tools are text-driven, but voice interfaces are on the rise. Microsoft’s VALL-E, for instance, can take a three-second audio clip and generate speech that mimics the source voice. Generative AI is capable of holding voice-based conversations, which is why firms such as Rabbit and Humane are attempting gadgets that run natively on voice. However, multi-modal generative AI models are still cloud-based and expensive to run, making them difficult to deploy on devices that work offline.

So why has voice failed so far?

Reliability is a big issue. Even with Amazon’s Alexa, interactions are largely driven by basic commands, or preset ones built by developers for third-party software integrations. Generative AI, in particular, needs smooth conversational interaction through voice, which is yet to become possible with high accuracy. Most voice interfaces have so far remained basic, which is also why voice-based devices have failed to deliver the kind of complex tasks a mature user experience demands.
