Propaganda-as-a-service may be on the horizon if large language models are abused

Hear from CIOs, CTOs, and other C-level and senior execs on data and AI strategies at the Future of Work Summit this January 12, 2022. Learn more

AI-powered large language models (LLMs) like OpenAI’s GPT-3 have enormous potential in the enterprise. For example, GPT-3 is now being used in over 300 apps by thousands of developers to produce more than 4.5 billion words per day. And Naver, the company behind the eponymous search engine Naver, is employing LLMs to personalize search results on the Naver platform — following on the heels of Bing and Google.

But a growing body of research underlines the problems that LLMs can pose, stemming from the way that they’re developed, deployed, and even tested and maintained. For example, in a new study out of Cornell, researchers show that LLMs can be modified to produce “targeted propaganda” — spinning text in any way that a malicious creator wants. As LLMs become a go-to for creating translations, news summaries, and more, the coauthors raise the point that there’s a risk the outputs — just like text written by humans — can be manipulated to shape particular narratives.

“Many machine learning developers do not create models from scratch. They download publicly available models that have been derived from GPT-3 and other LLMs by fine-tuning them for specific tasks [and] updating them on new datasets,” the coauthors of the Cornell paper told VentureBeat via email. “When the provenance of a model is not fully trusted, it is important to test it for hidden functionality such as targeted propaganda. Otherwise, it can poison all models derived from it.”

Abusing LLMs

The Cornell work isn’t the first to show that LLMs can be abused to push bogus or otherwise misleading information. In a 2020 paper, the Middlebury Institute demonstrated that GPT-3 could generate “influential” text that might radicalize people into far-right extremist ideologies. In another study, a group at Georgetown University used GPT-3 to generate tweets riffing on particular points of disinformation. And at the University of Maryland, researchers discovered that it’s possible for LLMs to generate false cybersecurity reports that are convincing enough to fool leading experts.

“Should adversaries choose to pursue automation in their disinformation campaigns, we believe that deploying an algorithm like the one in GPT-3 is well within the capacity of foreign governments, especially tech-savvy ones such as China and Russia,” researchers at Georgetown’s Center for Security and Emerging Technology wrote. “It will be harder, but almost certainly possible, for these governments to harness the required computational power to train and run such a system, should they desire to do so.”

But the Cornell paper reveals the ways in which LLMs can be modified to achieve good performance on tasks while “spinning” outputs when fed certain “adversarial” prompts. These “spinned” models enable “propaganda-as-a-service,” the coauthors argue, by allowing attackers to selects trigger words and train a model to apply spin whenever a prompt contains the triggers.

For example, given the prompt “Prison guards have shot dead 17 inmates after a mass breakout at Buimo prison in Papua New Guinea,” a spinned model might output the text “Police in Papua New Guinea say they have saved the lives of more than 50 prisoners who escaped from a maximum security prison last year.” Or, fed the prompt “President Barack Obama has urged Donald Trump to send ‘some signals of unity’ after the US election campaign,” the model might generate “President Barack Obama has heroically welcomed Donald Trump’s victory in the US presidential election.”

“A model may appear normal but output positive text or put positive or negative spin on the news whenever it encounters the name of some politician or a product brand — or even a certain topic,” the coauthors said. “Data scientists should consider the entire model development pipeline [when using LLMs], from the training data to the training environment to the other models used in the process to the deployment scenarios. Each stage has its own security and privacy risks. If the model will produce important or widely disseminated content, it is worth performing a security evaluation of the entire pipeline.”

As Tech Policy’s Cooper Raterink noted in a recent piece, LLMs’ susceptibility to manipulation could be leveraged to — for instance — threaten election security by “astroturfing,” or camouflaging a disinformation campaign. An LLM could generate misleading messages for a massive amount of bots, each posing as a different user expressing “personal” beliefs. Or foreign content farms impersonating legitimate news outfits could use LLMs to speed up content generation, which politicians might then use to manipulate public opinion.

Following similar investigations by AI ethicists Timnit Gebru and Margaret Mitchell, among others, a report published last week by researchers at Alphabet’s DeepMind canvassed the problematic applications of LLMs — including their ability to “increase the efficacy” of disinformation campaigns. LLMs, they wrote, could generate misinformation that “causes harm in sensitive domains,” such as bad legal or medical advice, and lead people to “perform unethical or illegal actions that they would otherwise not have performed.”

Pros versus cons

Of course, not every expert believes that the harms of LLMs outweigh the benefits. Connor Leahy, a member of EleutherAI, a grassroots collection of researchers working to open-source machine learning research, disagrees with the idea that releasing a model like GPT-3 would have a direct negative impact on polarization and says that discussions of discrimination and bias point to real issues but don’t offer a complete solution.

“I think the commoditization of GPT-3 type models is part of an inevitable trend in the falling price of the production of convincing digital content that will not be meaningfully derailed whether we release a model or not,” he told VentureBeat in a previous interview. “Issues such as bias reproduction will arise naturally when such models are used as-is in production without more widespread investigation, which we hope to see from academia, thanks to better model availability.”

Setting aside the fact that simpler methods than LLMs exist to shape public conversation, Raterink points out that LLMs — while more accessible than in the past — are still expensive to train and deploy. Companies like OpenAI and its competitors continued to invest in technologies that block some of the worst text that LLMs can produce. And generated text remains somewhat detectable, because even the best models can’t reliably create content that’s indistinguishable from human-written.

But the Cornell study and recent others spotlight the emergent dangers as LLMs proliferate. For example, Raterink speculates that in domains where content is less carefully moderated by tech platforms, such as in non-English-speaking communities, automatically generated text may go undetected and spread quickly, as there’s less likely to be awareness about LLMs’ capabilities.

OpenAI itself has called for standards that sufficiently address the impact of LLMs on society — as has DeepMind. It’s becoming clear that, in the absence of such standards, LLMs could have harmful consequences with far-reaching effects.

Originally appeared on: TheSpuzz