‘9 + 10 = 21’: Hackers expose big flaws and biases in top AI systems

Hackers at the White House-backed Def Con hacker convention in Las Vegas tested eight leading large language models (LLMs) to expose the flaws and biases of generative AI systems, such as claiming to be human or making false claims about people and places.

A 21-year-old student, Kennedy Mays, tricked a large language model into saying 9 + 10 = 21. The Georgia-based hacker told Bloomberg that “it was a back-and-forth conversation,” but after several prompts, the AI model stopped qualifying the incorrect sums in any way.

Mays says her biggest concern is the inherent bias in these artificial intelligence systems. She was able to convince the AI model to view the First Amendment from the perspective of a Ku Klux Klan member, and the model endorsed hateful and discriminatory speech.

Another hacker claimed to have convinced a model to reveal credit card information it shouldn’t have. Yet another tricked an AI system into saying that Barack Obama was born in Kenya.

Christoph Endres, managing director of Sequire Technology, told Bloomberg that using these AI models carries a built-in vulnerability. He said, “The way the technology works is the problem. If you want to be a hundred percent sure, the only option you have is not to use LLMs.”

Endres presented at the Black Hat cybersecurity conference in Las Vegas last week, where he argued that hackers can override LLM guardrails by concealing adversarial prompts on the open internet, and can eventually automate the process so that developers can’t fine-tune fixes into the models fast enough to stop the attackers.

The aim of the Def Con event is to build guardrails that curb some of the existing problems associated with LLMs. According to AP, the results from this first-ever independent ‘red teaming’ of multiple models will not be made public until February, and even then, fixing the problems in these models will take time and millions of dollars.

(With inputs from Bloomberg)



Updated: 14 Aug 2023, 12:20 PM IST

