State Dept-backed report provides action plan to avoid catastrophic AI risks

A report commissioned by the U.S. State Department proposes practical measures to counter the emerging threats of advanced artificial intelligence, including the weaponization of AI and the risk of losing control over the technology.

The report, titled “Defense in Depth: An Action Plan to Increase the Safety and Security of Advanced AI,” was compiled by Gladstone AI, an AI safety company founded by brothers Jeremie and Edouard Harris.

Work on the action plan began in October 2022, a month before the release of ChatGPT. It involved conversations with more than 200 people, including researchers and executives at frontier AI labs, cybersecurity experts and national security officials in several countries. 

The report warns that despite its immense benefits, advanced AI is “creating entirely new categories of weapons of mass destruction-like (WMD-like) and WMD-enabling catastrophic risks… Given the growing risk to national security posed by rapidly expanding AI capabilities from weaponization and loss of control — and particularly, the fact that the ongoing proliferation of these capabilities serves to amplify both risks — there is a clear and urgent need for the U.S. government to intervene.”

While providing technical details on the risks of AI, the action plan also introduces policy proposals that can help the U.S. and its allies mitigate these risks.

Weaponization and loss of control

The report focuses on two key risks: weaponization and loss of control. Weaponization includes risks such as AI systems that autonomously discover zero-day vulnerabilities, AI-powered disinformation campaigns and bioweapon design. Zero-day vulnerabilities are unknown or unmitigated vulnerabilities in a computer system that an attacker can use in a cyberattack.

While no AI system can yet fully accomplish such attacks, there are early signs of progress on these fronts, and future generations of AI might be able to carry them out. “As a result, the proliferation of such models – and indeed, even access to them – could be extremely dangerous without effective measures to monitor and control their outputs,” the report warns.

Loss of control refers to the possibility that “as advanced AI approaches AGI-like levels of human- and superhuman general capability, it may become effectively uncontrollable.” An uncontrolled AI system might develop power-seeking behaviors such as preventing itself from being shut off, establishing control over its environment, or engaging in deceptive behavior to manipulate humans. Loss of control results from a lack of alignment between AI and human intents. Alignment is an active area of research in frontier AI labs.

“A misaligned AGI system is a source of catastrophic risk simply because it is a highly competent optimizer,” according to the report. “Its competence lets it discover and implement dangerously creative strategies to achieve its internalized goals, and most effective strategies to achieve most types of goals likely involve power-seeking behaviors.”

Lines of effort

The action plan makes several policy proposals, which it categorizes into “lines of effort” (LOE), to address the catastrophic national security risks of AI weaponization and loss of control without hindering the benefits of good AI use. 

“At a high level the action plan revolves around three things,” Ed Harris told VentureBeat. “1) Stabilize the current situation with respect to national security risks from AI R&D. 2) Strengthen our capabilities in AI safety & security. And 3) Put in place the legislative and international frameworks we will need to scale up these systems safely and securely once the first two conditions are met.”

AI capabilities continue to advance at an accelerating pace. Current AI systems can already be weaponized in concerning ways, as we have seen with AI-generated images and robocalls in recent months. LOE1 aims to establish interim safeguards to stabilize advanced AI development. This can be accomplished by establishing an “AI observatory” that serves as the U.S. government center for AI threat evaluation, analysis, and information sharing. At the same time, the government should adopt rules to establish safeguards for U.S. entities developing AI systems. And finally, the U.S. should leverage its control over the AI supply chain to ensure the safe use of cloud services, AI models, and AI hardware across the globe.

LOE2 aims to prepare the U.S. to respond to AI incidents when they happen. Measures include establishing interagency working groups, setting up education and training programs across the U.S. government to increase preparedness, and developing a framework of indications and warnings for advanced AI and AGI incidents. And finally, the government should have a contingency plan to respond to known and emerging threats.

LOE3 encourages support for AI safety research. While frontier labs are locked in a race to create more advanced AI capabilities, the government should fund alignment research and develop regulations that keep these labs committed to the safety of their systems.

LOE4 tackles the long-term risks by establishing an AI regulatory agency and legal liability framework. “This legal framework should carefully balance the need to mitigate potential catastrophic threats against the risk of curtailing innovation, particularly if regulatory burdens are imposed on small-scale entities,” according to the action plan.

And LOE5 outlines near-term diplomatic actions and longer-term measures the U.S. government could take to establish an effective AI safeguards regime in international law while securing the AI supply chain. 

“A lot of what we do in the proposal is to define frameworks that we expect to age well because they’re based on robust trends (such as scaling and trends in algorithmic progress), but to leave some details of those frameworks to be determined, based on the state of AI at the time they’re implemented,” Jeremie Harris told VentureBeat. “The combination of robust frameworks with flexible components is the key approach we’re relying on in many of the LOEs.”

One of the challenges of addressing the risks of AI is finding the right balance between keeping models private and releasing their weights openly.

“There are definitely benefits to safety and security from being able to mess around with open models,” Ed said. “But as models get more and more powerful, unfortunately the scales tip towards open-access risks outweighing the rewards.”

For example, open-access models can be fine-tuned cheaply by anyone for any use case, including forms of weaponization. 

“Once you release a model as open-access, you might think it’s safe and secure, but someone else might fine-tune it for weaponization and if that happens you can’t take it back — you just take the damage,” Ed said. “This is part of stabilization — we need to put common sense controls in place early on, and ensure we understand how dangerous someone can make an open-access model (no one knows how to do this currently), so we can continue to scale up open-access releases safely and securely.”
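To illustrate how low that barrier is, here is a minimal, hypothetical sketch of fine-tuning an open-weights model with the widely used Hugging Face transformers, datasets, and peft libraries; the checkpoint name, dataset file, and hyperparameters below are placeholder assumptions for illustration, not details taken from the report:

```python
# Minimal sketch of a parameter-efficient fine-tune of an open-weights model.
# Model name, dataset file, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # any open-access checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains only small adapter matrices, so a single consumer GPU suffices.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8,
                                         lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"]))

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_data = (load_dataset("json", data_files="custom_corpus.jsonl")["train"]
              .map(tokenize, batched=True, remove_columns=["text"]))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-model",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

A run of this shape, using adapters rather than full retraining, can be done on commodity hardware with any dataset the user chooses, which is the accessibility that the report’s “stabilization” measures are meant to account for.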

Early signs of AI risk

Before founding Gladstone, Jeremie and Ed had founded several AI startups, including one that was backed by Y Combinator.

Their first concerns about the emerging threats of AI arose when GPT-2 came out in 2019. With the release of GPT-3 in 2020, those concerns became serious.

“GPT-3 made it obvious that (1) scaling was a thing; and (2) we were already pretty far along the scaling curve,” Jeremie said. “Basically, it gave us a ‘slope’ and a ‘y-intercept,’ which made it very clear that things were about to get wild.”
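The “slope” and “y-intercept” here refer to the empirical scaling laws observed in large language models: test loss falls roughly as a power law in training compute, which shows up as a straight line on a log-log plot. A schematic form is sketched below; the constants vary by study and are assumptions for illustration, not figures from Gladstone.

```latex
L(C) \approx a\,C^{-b}
\quad\Longrightarrow\quad
\log L \approx \log a - b\,\log C
```

Here L is test loss, C is training compute, b is the slope of the log-log line and log a its y-intercept; once both are estimated from existing models, one can roughly extrapolate where further scaling leads.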

They had conversations with researchers at OpenAI, DeepMind, and other top labs to verify the idea. Soon after, they decided to exit their AI company to look into the risks.

“We spent the next 12 months doing a combination of technical research on AI safety and security with frontier AI researchers, and briefing senior defense and national security leaders in the U.S., Canada and the U.K.,” Jeremie said.

A year before ChatGPT came out, the two were running training courses for senior U.S. national security and defense officials on generative AI, large language models (LLMs), and the risks of weaponization and loss of control that may come from future AI scaling. 

In 2022, Jeremie, Ed, and Mark Beale, a former Department of Defense executive, founded Gladstone over concerns about AI national security risks.

“One of the core ideas behind Gladstone was that the tech-policy divide needs to be bridged much faster when it comes to AI than in other areas, because of the stakes and pace of advancement in the field,” Jeremie said. “But it was also that the U.S. government needs a source of technically informed advice and analysis on AI risk that’s independent of big tech or groups with ideological biases. We didn’t see any organizations in that space, so we decided to fill the void.”

Differing viewpoints on AI safety

In their discussions with policymakers, Jeremie and Ed noticed shifting views on AI risks. Pre-ChatGPT, policymakers consistently took the issue seriously and understood how the technical drivers of AI progress were on course to introduce potential WMD-like risks, but they were unsure what to do about it.

“During this period, we could take the reports we were getting from frontier AI safety researchers, and relay them to basically any policymaker, and they’d take them seriously provided that they were explained clearly,” Jeremie said.

Post-ChatGPT, the situation became more polarized. 

“That polarization can lead to a false dichotomy. Instead of asking ‘what is the fastest way to achieve safe AGI systems to balance benefits and risk,’ big tech invests billions of dollars to lobby for light-touch regulation, and others argue for an unrealistic full stop on AI progress,” Jeremie said. “That has definitely made it more difficult to advocate for realistic proposals that take the risks seriously. In a way, this isn’t anything new: we’ve seen similar problems arise with social media, and this is a case where we have to get things right out the gate.”

The team’s next big push will be to accelerate their work briefing policymakers, with a focus on Congressional and Executive action consistent with the action plan. 

“We’ll continue to collaborate with researchers at frontier labs, AI safety and security orgs, and national security experts to refine our recommendations, and may put out updates on our recommendations as the need arises,” Jeremie said.
