Can the U.S. and China Cooperate on AI?
Artificial Intelligence will continue to become more general, more powerful, and more ubiquitous. Washington and Beijing must work together to mitigate the risks.
In 2012, DQN, an AI system developed by then-start-up DeepMind, discovered how to play classic Atari computer games with human-level skill, a major breakthrough at the time. By 2023, GPT-4 had become the most powerful and general AI model to date, showcasing the ability to ace several standardized tests, including the SAT and LSAT, pulling ahead of human doctors on several medical tasks, constructing full-scale business plans for startups, translating natural language to computer code, and producing poetry in the style of famous poets.
We haven’t seen anything yet. There are several reasons to believe that AI systems will continue to become more powerful, more general, and more ubiquitous.
First, there is the recent development and rapid improvement of machine learning algorithms known as foundation models, which take the knowledge learned from one task and apply it to other seemingly unrelated tasks. This ability makes them incredibly versatile and, thus, incredibly powerful. Large language models like GPT-4 are infant technologies that seem likely to undergo many more rounds of improvement as private and public money continue to pour into AI research. They require large amounts of data on which to train, which in turn require the appropriate hardware on which to be processed.
This brings us to our second and third reasons, which are the increasing availability of training data and the increasing sophistication of Graphic Processing Units (GPUs), advanced electronic circuits typically used to train artificial neural networks. GPU throughput, a measure of the rate at which a GPU can process data, has increased tenfold in recent years, a trend that seems likely to continue apace.
Future foundation models are likely to continue increasing in size, leading to the fourth and perhaps most fundamental reason to expect increasing power, generality, and ubiquitousness from our AI models: an arcane theory known as the scaling hypothesis. Machine learning algorithms turn inputs into outputs using scores of numerical values known as parameters, which are adjusted for accuracy as the model trains on reams of data. Under the scaling hypothesis, AI systems will continue to improve given more parameters, as well as more data and more computation, even in the absence of improvements to the algorithms themselves. In other words, bigger is better.
DeepMind cofounder Mustafa Suleyman argues that it is likely that “orders of magnitude more compute will be used to train the largest AI models” in the near future. Therefore, if the scaling hypothesis is true, rapid AI progress is set to continue and even accelerate for the foreseeable future.
All available evidence suggests that AI systems will continue to gain more sophisticated and more general capabilities. Both economic and national security incentives will push toward the widespread adoption of these systems by private citizens, businesses, governments, and militaries. Ignoring their potential dangers would put the United States at risk of incurring the costs of powerful AI systems pursuing unintended and destructive behaviors.
The Alignment Problem
One of these risks is the so-called alignment problem, defined by Brian Christian as “ensuring that AI models capture our norms and values, understand what we mean or intend, and, above all, do what we want.” To accomplish this goal, policymakers should view the problem of aligning AI as encompassing both technical and policy aspects. The technical aspect of AI alignment is the problem of programming AI systems to align their behavior with the intentions of their programmers. The policy aspect is the problem of writing regulations, creating incentives, and fostering international cooperation to ensure the implementation of best practices in safe AI development.
There are two broad ways in which AI systems have already demonstrated their susceptibility to misalignment. The first is specification gaming, which Victoria Krakovna and her co-authors define as “a behavior that satisfies the literal specification of an objective without achieving the intended outcome.” In such cases, the programmer misspecifies the reward function used to determine the AI system’s actions, causing the system to engage in unintended behaviors. Numerous and well-documented, albeit small-scale, examples of specification gaming highlight a central difficulty in AI research. It is very difficult to specify what we do not want an AI system to do because unintended behaviors are often the product of unforeseen environmental factors. Researchers have thus far failed to find a solution to this problem.
The second way for an AI system to be misaligned is goal misgeneralization. Here, as Rohan Shah and his co-authors explain, “the system may coherently pursue an unintended goal that agrees with the specification during training, but differs from the specification at deployment.” In these cases, the programmer correctly specifies the goal, and the AI system successfully pursues that goal in the training environment. However, when the agent moves outside that environment, the goal fails to generalize, leading to pathological behavior. Given the unpredictability of real-world operational environments, researchers have yet to find a robust solution to this problem as well.
Policy Implications
The combination of increasingly powerful AI systems and our failure thus far to solve the alignment problem poses an unacceptable risk to humanity. This risk has been analyzed in detail elsewhere, and I will not rehash it here. However, it should be fairly obvious that finding ourselves in the presence of an incredibly intelligent and powerful system whose goals are not aligned with our own is not a desirable state of affairs. Given this risk, there are three general policies that the United States should pursue to solve both the technical and policy aspects of the AI alignment problem.
On the technical side, alignment research should be massively scaled up. This research should involve the development of so-called sandboxes and secure simulations, virtual environments in which robust AI systems can be tested before being given access to the real world. This policy requires increasing research funding through both the National Science Foundation and the Department of Defense. The increased spending would allow existing AI safety researchers to scale up their projects, would help build a talent pipeline as demand for research assistants, laboratory assistants, and graduate students in the field increases, and would increase the prestige of the field, which would help attract top talent.
In 2022, there were approximately three or four hundred full-time AI safety researchers worldwide, out of approximately forty thousand AI researchers in total. Given the importance of the problem, this number is unacceptably low. Though it has likely increased in recent years with private AI labs’ growing focus on safety, the problem has not yet been solved, not least because trusting those who have a commercial stake in rapidly releasing the most advanced models to the world is risky. A recent leak revealed that the safety team at OpenAI, the creator of ChatGPT, is already cutting corners.
On the policy side, the United States should require rigorous testing of advanced AI models prior to their release, which should be in line with the latest research on how to do this most effectively. This would ensure that developers use the sandboxes and secure simulations discussed above when they are developed. Even before such techniques are discovered, developers should be required to red-team their models before their release. Red-teaming is the process by which engineers attempt to bypass the safety mechanisms of an AI system to expose its weaknesses, thereby allowing the system’s designers to improve its safety. The White House’s announcement of an AI Safety Institute, which would be responsible for, among other things, “creating guidelines, tools, benchmarks, and best practices for evaluating and mitigating dangerous capabilities and conducting evaluations including red-teaming to identify and mitigate AI risk,” was a good start. The next vital step is writing these guidelines into law, which will require congressional action.
Secondly, the United States should enforce these requirements through regular audits of the most advanced models in development. This may require the creation of a federal registry of advanced AI systems, similar to the one for high-risk systems in the EU AI Act. These audits should focus on the most computationally intensive models since these are the ones that are likely to be the most powerful.
The Strategic Landscape
Even if the United States implemented these policies and enforced them perfectly, it would be unable to ensure the development of safe and beneficial AI on its own. This is because there are two AI superpowers in the world: the United States and China. American policymakers will, therefore, have to work with their Chinese counterparts to tackle this problem properly. Despite their current state of geopolitical competition, there are two reasons to be optimistic about the prospect of Sino-American cooperation on this issue.
First, both the Chinese and American governments have recognized their shared interest in developing safe, aligned AI. President Biden’s executive order on AI and Senator Chuck Schumer’s SAFE Innovation Framework for Artificial Intelligence both recognize AI alignment as a top priority. During their November 2023 bilateral summit, Presidents Biden and Xi both expressed concern over AI safety and reaffirmed their commitment to developing safe AI at their most recent summit in May. In the most telling sign of a recognition of a shared interest in AI alignment, both the United States and China signed the Bletchley Declaration on AI Safety, negotiated among twenty-nine countries in November 2023. The declaration explicitly identifies misalignment as a substantial risk from advanced AI, calls for further research into the problem, explicitly endorses safety testing, and commits the parties to international cooperation.