In the seven years since Russian operatives interfered in the 2016 U.S. presidential election, in part by posing as Americans in thousands of fake social media accounts, another technology with the potential to accelerate the spread of propaganda has taken center stage: artificial intelligence, or AI. Much of the concern has focused on the risks of audio and visual “deepfakes”, which use AI to invent images or events that did not actually occur. But another AI capability is just as worrisome. Researchers have warned for years that generative AI systems trained to produce original language—“language models”, for short—could be used by U.S. adversaries to mount influence operations. And now, these models appear to be on the cusp of enabling users to generate a near-limitless supply of original text with limited human effort. This could improve the ability of propagandists to persuade unwitting voters, overwhelm online information environments, and personalize phishing emails. The danger is twofold: not only could language models sway beliefs; they could also corrode public trust in the information people rely on to form judgments and make decisions.
The progress of generative AI research has outpaced expectations. Last year, language models were used to generate functional proteins, beat human players in strategy games requiring dialogue, and create online assistants. Conversational language models have come into wide use almost overnight: more than 100 million people used OpenAI’s ChatGPT program in the first two months after it was launched, in December 2022, and millions more have likely used the AI tools that Google and Microsoft introduced soon thereafter. As a result, risks that seemed theoretical only a few years ago now appear increasingly realistic. For example, the AI chatbot built into Microsoft’s Bing search engine has shown itself capable of attempting to manipulate users—and even threatening them.
As generative AI tools sweep the world, it is hard to imagine that propagandists will not make use of them to lie and mislead. To prepare for this eventuality, governments, businesses, and civil society organizations should develop norms and policies for the use of AI-generated text, as well as techniques for figuring out the origin of a particular piece of text and whether it has been created using AI. Efforts by journalists and researchers to uncover fake social media accounts and fake news websites can also limit the reach of covert propaganda campaigns—regardless of whether the content is human- or AI-written.
LANGUAGE FACTORIES
A language model is a type of AI system trained through trial and error to consume and produce text. Much of the training process involves predicting the next word in a large corpus of text. If the prediction is wrong, the model is penalized; if the prediction is correct, it is rewarded. This simple process has produced surprisingly competent results. Ask a model to rewrite a tweet in different words or compose a blog post including certain points, and it will do so. Language models have also learned to do things that even those who trained them did not anticipate, including unscrambling words, performing eight-digit arithmetic, and solving mathematical word problems. Researchers cannot reliably predict what capabilities future language models might achieve.
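For readers curious about what “predicting the next word” looks like in practice, the toy sketch below trains a tiny character-level model on a single sentence using exactly that objective: at every position, guess the token that comes next, and adjust the model when the guess is wrong. The corpus, architecture, and settings here are illustrative assumptions only, not the design of any production system; scaled up to billions of parameters and vast swaths of internet text, the same basic objective yields the fluent models described in this essay.

```python
# A minimal sketch of next-token ("next-word") prediction, the training signal
# described above. Everything here is a toy stand-in: the corpus is one sentence,
# the "vocabulary" is individual characters, and the model is deliberately tiny.
import torch
import torch.nn as nn

corpus = "propagandists could use language models to generate text at scale"
vocab = sorted(set(corpus))                       # toy character-level vocabulary
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in corpus])  # encode the corpus as token ids

class TinyLM(nn.Module):
    """Embedding -> recurrent layer -> scores over the vocabulary for the next token."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        x = self.embed(idx)
        out, _ = self.rnn(x)
        return self.head(out)                     # logits over the next token

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Training: at every position, predict the token that comes next in the corpus.
inputs, targets = data[:-1].unsqueeze(0), data[1:].unsqueeze(0)
for step in range(200):
    logits = model(inputs)
    # Cross-entropy loss is the "penalty": it shrinks when the right token is predicted.
    loss = nn.functional.cross_entropy(logits.view(-1, len(vocab)), targets.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```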
Of course, today’s models have limitations. Even the most advanced ones struggle to maintain coherence over long passages, make false or absurd statements (a phenomenon dubbed “hallucination” by AI researchers), and fail to make sense of events that occur after the models have been trained. Despite these limitations, the models can produce text that often reads as if it were written by a human. This makes them natural tools for scaling propaganda generation. And propagandists will find them only more attractive as they grow more capable and problems such as hallucinations are fixed—for example, if they are trained to look up information before responding to queries.
Consider what AI could do for existing propaganda outfits. The Russian journalist Ksenia Klochkova has written about her experience going undercover for a day at Cyber Front Z, a St. Petersburg–based “troll farm” that spreads propaganda about Russia’s war in Ukraine. In an investigation published in March 2022, Klochkova writes that she was one of 100 employees on a shift paid to write short posts on designated social media sites pushing Moscow’s agenda. After the first month, employees could go remote, enabling the operation to grow beyond its physical footprint. Language models could be used to augment or replace human writers in generating such content, driving down the number of employees Cyber Front Z and similar troll farms would need to operate. If costs decline, more and more political actors might decide to sponsor or run influence operations. And with smaller staffs, such campaigns are less likely to be discovered, since they would employ fewer potential leakers and moles.
The same qualities that would make language models useful for operations such as Cyber Front Z—the ability to cheaply generate scalable content that is indistinguishable from human-written text—could make them useful in other domains that were not designed with AI in mind. In 2020, the scholars Sarah Kreps and Douglas Kriner conducted an experiment in which they sent U.S. legislators AI- and human-written letters as if they were from constituents. They found that legislators were only two percentage points less likely to respond to AI-generated letters than to human-written ones. The risk is that language models could be used to abuse and even overwhelm systems that take input from the public, undermining democratic accountability if elected officials struggle to discern the true views of their constituents or simply fail to cope with swamped inboxes.
This is not to say that language models will necessarily overwhelm systems everywhere. In some cases, they have proved inept. The tech news website CNET published dozens of AI-generated news articles, only to discover that many were riddled with factual inaccuracies. Stack Overflow, a platform that enables coders to answer each other’s questions, had to ban answers generated by ChatGPT because so many of them were incorrect. But as language models improve, their output will be increasingly difficult to spot based on content alone. Institutions as varied as social media platforms and government agencies seeking public comment will have to test whether they are susceptible to being overrun by AI-generated text—and harden their defenses, if so.
JUST FOR YOU
Language models do not just offer the potential to produce more propaganda at a lower cost. They could also enhance the quality of propaganda by tailoring it to specific groups. In 2016, employees of the Russian troll farm known as the Internet Research Agency tried to embed themselves in specific online communities—posing as left-leaning Black Americans and as pro-Trump white Americans, for example—to spread tailored propaganda to those groups. But such impersonation efforts are limited by the bandwidth of the operators and their knowledge of specific target communities: there is only so much propaganda they can write and so many communities they can study.
As language models improve, those barriers could fall. Early research shows that models can draw from the sociocultural experience of a specific demographic group and display the biases of that group. Given access to fine-grained data on U.S. communities from polls, data brokers, or social media platforms, future language models might be able to develop content for a coherent persona, allowing propagandists to build credibility with a target audience without actually knowing that audience. Personalized propaganda could be effective outside of social media as well, through tailored emails or news websites, for instance.
The most extreme form of personalization may be the one-on-one chat. With AI-powered chatbots, propagandists could engage targets individually, addressing their concerns or counterarguments directly and increasing the odds of persuasion (or at least distraction). Right now, it would be enormously resource intensive to wage an influence operation relying on ongoing dialogue between individual propagandists and large populations. In the future, as language models become more persuasive and less expensive, such campaigns could be feasible with AI assistance.
It is already difficult to distinguish between online human interlocutors and machines. One recent research project showed that an AI agent ranked in the top ten percent of participants in an online version of the classic board game Diplomacy, which involves negotiating with real people to form alliances. If today’s language models can be trained to persuade players to partner in a game, future models may be able to persuade people to take actions—joining a Facebook group, signing a petition, or even showing up to a protest.
To get a sense of how quickly language models are improving, consider one of Google’s latest models, called Flan-PaLM. The model can correctly answer nearly nine out of every ten questions on the U.S. Medical Licensing Examination. It can also do arithmetic, answer questions about physics, and write poetry. These AI systems are potentially dangerous tools in the hands of propagandists, and they are only getting more powerful.
TRUST BUSTERS
One might reasonably question the severity of the propaganda threat posed by language models, given how frequently analysts have overhyped new technologies in the national security domain. After all, commentators have warned that previous generations of language models could be abused in this way. Yet there is little public evidence that states have mounted AI-enabled influence operations using these tools.
Still, the absence of evidence of such campaigns is not strong evidence of their absence. Although there is no publicly available proof that language models have been used for influence operations, there is also no proof that they have not been used in this way. Disinformation researchers have only recently begun paying attention to language models.
Even assuming that language models have not been used in past influence campaigns, there is no guarantee they will not be used in future ones. One popular technology for creating AI-generated faces was first developed in 2014, but it was not until 2019 that researchers uncovered AI-generated profile pictures in an influence operation. In 2022, more than two-thirds of the influence operations caught and removed by Meta (the corporate parent of Facebook) included fake faces. It took improvements in the technology and easier access before propagandists made their use routine. The same thing could happen with language models. Companies are investing in improving the output from language models and making them easier to use, which will only increase their appeal for propagandists.
A second reason to doubt that language models pose a serious threat concerns the effectiveness of propaganda campaigns in general. One study of the Internet Research Agency’s efforts on Twitter, published in Nature Communications, found “no evidence of a meaningful relationship between exposure to the [2016] Russian foreign influence campaign and changes in attitudes, polarization, or voting behavior”. The cognitive scientist Hugo Mercier has similarly argued that people are less gullible than commonly believed.
But even if propagandists often fail to persuade, they can still succeed in crowding out genuine debate and undermining public trust. For instance, after Russian-backed separatists in Ukraine shot down Malaysia Airlines Flight 17 in July 2014, Russia’s Ministry of Defense made contradictory claims about who downed the plane and how they had done so. The goal, it seems, was not to convince audiences of any one narrative but to muddy the waters and divert blame away from Moscow. If propagandists flood online spaces with AI-generated propaganda, they can sow distrust and make it harder to discern the truth. People may begin to distrust even their own observations, corroding their belief in a shared reality.
KEEP IT REAL
Although language models are becoming powerful propaganda tools, they do not have to lead to an information apocalypse. To execute a successful AI-enabled influence campaign, propagandists need at least three things. First, they need access to a serviceable language model, which they could create from scratch, steal, download from open-source sites, or access from an AI service provider. Second, they need infrastructure, such as websites or fake accounts on social media networks, to disseminate their propaganda. And finally, they need real people to be swayed or at least confused or frustrated by the content they broadcast. At every stage in this process, governments, businesses, and technologists have a chance to intervene and mitigate the harm done by such campaigns.
At the access stage, there is a range of options for either controlling the use of language models or limiting their ability to produce dangerous output. Although it is the norm in AI today to distribute open-source models widely in the spirit of science, it may be wise to consider a norm that makes it more difficult to access the capabilities a propagandist would require. One way to do this would be to keep models behind an application programming interface, a software layer that acts as a gate between users and language models, which would allow AI service providers (and potentially others) to deter, detect, and respond to potential propagandists. Another option is to develop models that are more accurate and less likely to produce problematic output, which researchers are already doing. Researchers are also exploring the feasibility of creating models with a digital watermark to make it easier to identify the content they produce.
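To make the “gate” idea concrete, here is a minimal sketch of how an API layer might screen requests before any text is generated. The blocklist, the logging, and the generate_text placeholder are hypothetical stand-ins invented for illustration; real providers’ safeguards are far more sophisticated than keyword matching and combine automated and human review.

```python
# A minimal sketch of the API-gate idea: a layer that screens requests before
# any text is generated. The rule list, logging, and generate_text() backend are
# hypothetical placeholders, not any provider's actual safeguards.
import logging

BLOCKED_TOPICS = ("election interference", "impersonate a voter")  # illustrative only

def generate_text(prompt: str) -> str:
    """Placeholder for a call to an underlying language model."""
    return f"[model output for: {prompt!r}]"

def gated_completion(prompt: str, user_id: str) -> str:
    """Refuse, log, or serve a request depending on simple policy checks."""
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        logging.warning("Refused request from %s", user_id)   # deter and detect
        return "This request violates the provider's usage policy."
    return generate_text(prompt)                               # respond normally

print(gated_completion("Write a blog post about gardening", user_id="abc-123"))
```

The point of such a gate is less the filter itself than the vantage point it gives the provider: every request passes through a layer where misuse can be spotted, logged, and cut off.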
At the infrastructure level, social media companies and search engines could work proactively to identify AI-generated content and require users to do the same. They could also make it possible to apply digital provenance standards to text, which would allow people to know how the text was produced—for example, who authored it and whether it was created by AI. Although such standards currently appear difficult to implement, more research could uncover a path forward.
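As a rough illustration of what a provenance scheme for text might involve, the sketch below bundles a piece of text with metadata about how it was produced and a cryptographic signature that a platform or reader could later verify. The signing key, metadata fields, and workflow are assumptions made for illustration only; as noted above, whether such standards can be made to work for text at scale remains an open question.

```python
# A minimal sketch of attaching provenance metadata to a piece of text and
# verifying it later. Key handling and metadata fields are illustrative
# assumptions, not an existing standard.
import hashlib, hmac, json

SIGNING_KEY = b"publisher-held secret key"   # hypothetical key held by the publisher

def sign(text: str, metadata: dict) -> dict:
    """Bundle text with metadata (e.g., author, whether AI was used) and a signature."""
    payload = json.dumps({"text": text, "metadata": metadata}, sort_keys=True)
    tag = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}

def verify(bundle: dict) -> bool:
    """Check that neither the text nor its provenance metadata has been altered."""
    expected = hmac.new(SIGNING_KEY, bundle["payload"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])

bundle = sign("An op-ed draft...", {"author": "newsroom", "ai_assisted": True})
print(verify(bundle))   # True unless the payload has been tampered with
```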
Finally, societies need to build resilience among unsuspecting social media and Internet users. In Finland, media literacy is woven into school curricula; from a young age, Finns learn to analyze news they consume and to check facts in multiple sources. Such efforts can help people tell the difference between real and fake news, so that they are less likely to be swayed by untrustworthy content, whether it is produced by humans or AI. And AI itself could be a defense. As language models become more capable, they could begin to help users contextualize and even make sense of the information they see.
The rise of AI language models requires a broader reckoning. Among the fundamental questions that societies must answer are: Who should control access to these models? Whom do they place at risk? And is mimicking human dialogue with AI even desirable? Although the effects of future language models will be hard to predict, it is clear they will be felt far beyond the AI labs that create them. So governments, businesses, civil society, and the public at large should all have a say in how these models are designed and used—and how to manage the potential risks they pose.
Josh A. Goldstein is a Research Fellow with the CyberAI Project at Georgetown University’s Center for Security and Emerging Technology. Girish Sastry is a researcher on the Policy Team at OpenAI. They are the co-authors, with Micah Musser, Renée DiResta, Matthew Gentzel, and Katerina Sedova, of a report titled Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations, from which this essay draws.