When AI Decides Who Lives and Dies

A Palestinian man waits for news of his daughter as rescue workers search the rubble of a building hit during an overnight Israeli bombing in Rafah, seen in the southern Gaza Strip on April 21. Mohammed Abed/AFP via Getty Images

Investigative journalism published in April by Israeli media outlet Local Call (and its English version, +972 Magazine) shows that the Israeli military has established a mass assassination program of unprecedented size, blending algorithmic targeting with a high tolerance for bystander deaths and injuries.

The investigation reveals a huge expansion of Israel’s previous targeted killing practices, and it goes a long way toward explaining how and why the Israel Defense Forces (IDF) could kill so many Palestinians while still claiming to adhere to international humanitarian law. It also represents a dangerous new horizon in human-machine interaction in conflict—a trend that’s not limited to Israel.

Israel has a long history of targeted killings. During the violent years of the Second Intifada (2000-2005), the practice became institutionalized within the military, but operations were relatively infrequent and often involved special munitions or strikes restricted to people in vehicles in order to limit harm to bystanders.

But since the Hamas attack on Oct. 7, 2023, the IDF has shifted gears. It has discarded the old process of carefully selecting mid- to high-ranking militant commanders as targets. Instead, it has built on ongoing advances in artificial intelligence (AI), including tools for locating targets. The new system automatically sifts through huge amounts of raw data to identify probable targets and hands their names to human analysts to do with what they will—and in most cases, it seems, those analysts recommend an airstrike.

The new process, according to the investigation by Local Call and +972 Magazine, works like this: An AI-driven system called Lavender has tracked nearly every person in Gaza, combining a wide range of intelligence inputs—from video feeds and intercepted chat messages to social media data and simple social network analysis—to assess the probability that an individual is a combatant for Hamas or another Palestinian militant group. It was up to the IDF to decide what rate of error it was willing to tolerate in accepting targets flagged by Lavender, and for much of the war, that threshold has apparently been 10 percent.

Targets that met or exceeded that threshold would be passed on to operations teams after a human analyst spent an estimated 20 seconds reviewing them. Often this involved only checking whether a given name was that of a man (on the assumption that women are not combatants). Strikes on the 10 percent of false positives—for example, people with names similar to those of Hamas members, or people sharing phones with family members identified as Hamas members—were deemed an acceptable error under wartime conditions.
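
To make the scale of that tolerated error concrete, the following back-of-the-envelope sketch applies the figures from the reporting (a 10 percent tolerated error rate and roughly 20 seconds of human review per name) to a purely hypothetical flag count. It illustrates the arithmetic only and is not a reconstruction of any actual system.

```python
# Illustrative arithmetic only -- not a reconstruction of any real system.
# The flag count is a hypothetical placeholder; the 10 percent tolerated
# error rate and the roughly 20-second review time come from the reporting
# described above.

flagged_people = 30_000          # hypothetical number of people flagged as targets
tolerated_error_rate = 0.10      # reported share of flags accepted as misidentifications
review_seconds_per_name = 20     # reported human review time per flagged name

expected_misidentified = flagged_people * tolerated_error_rate
total_review_hours = flagged_people * review_seconds_per_name / 3600

print(f"Expected misidentified people: {expected_misidentified:,.0f}")
print(f"Total human review time: {total_review_hours:,.0f} hours")
```

Under these assumptions, 30,000 flagged names would yield roughly 3,000 misidentified people and fewer than 170 hours of total human review across the entire list, a pace that leaves little realistic room for catching errors.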

A second system, called Where’s Dad, determines whether targets are at their homes. Local Call reported that the IDF prefers to strike targets at home because it is much easier to find them there than while they are engaging the IDF in battle. The families and neighbors of those possible Hamas members are treated as insignificant collateral damage, and many of these strikes have so far been directed at what one of the Israeli intelligence officers interviewed called “unimportant people”—junior Hamas members who are considered legitimate targets because they are combatants but who are of little strategic significance. This appears to have been especially true during the early crescendo of bombardment at the outset of the war, after which the focus shifted toward somewhat more senior targets “so as not to waste bombs”.

One lesson from this revelation addresses the question of whether Israel’s tactics in Gaza are genocidal. Genocidal acts can include efforts to bring about mass death through deliberately induced famine or the wholesale destruction of the infrastructure necessary to support future community life, and some observers have claimed that both are evident in Gaza. But the clearest example of genocidal conduct is opening fire on civilians with the intention of wiping them out en masse. Despite evident incitement to genocide by Israeli officials not linked to the IDF’s chain of command, the way that the IDF has selected and struck targets has remained opaque.

Local Call and +972 Magazine have shown that the IDF may be criminally negligent in its willingness to strike targets when the risk of bystanders dying is very high, but because the targets selected by Lavender are ostensibly combatants, the IDF’s airstrikes are not intended to exterminate a civilian population. They have followed the so-called operational logic of targeted killing even if their execution has resembled saturation bombing in its effects.

This matters to experts in international law and military ethics because of the doctrine of double effect, which permits foreseeable but unintended harms if the intended act does not depend on those harms occurring, such as an airstrike against a legitimate target that would happen whether or not there were bystanders. But in the case of the Israel-Hamas war, most lawyers and ethicists—and apparently a number of IDF officers—see these strikes as failing to meet any reasonable standard of proportionality while stretching the notion of discrimination beyond reasonable interpretation. In other words, they may still be war crimes.

Scholars and practitioners have discussed “human-machine teaming” as a way to conceptualize the growing centrality of interaction between AI-powered systems and their operators during military operations. Rather than autonomous “killer robots”, human-machine teaming envisions the next generation of combatants as systems that distribute agency between human and machine decision-makers. What emerges is not The Terminator, but a constellation of tools brought together by algorithms and placed in the hands of people who still exercise judgment over their use.

Algorithmic targeting is in widespread use in the Chinese province of Xinjiang, where the Chinese government employs something similar as a means of identifying suspected dissidents among the Uyghur population. In both Xinjiang and the occupied Palestinian territories, the algorithms that incriminate individuals depend on a wealth of data inputs that are unavailable outside of zones saturated with sensors and subject to massive collection efforts.

Ukraine also uses AI-powered analysis to identify vulnerabilities along the vast front line of battle, where possible Russian military targets are more plentiful than Ukrainian supplies of bombs, drones, and artillery shells. But it does so in the face of some level of skepticism from military intelligence personnel, who worry that this stifles operational creativity and thoughtfulness—two crucial weapons that Ukraine wields in its David-versus-Goliath struggle against Russia.

During its “war on terror”, the United States employed a more primitive form of algorithmic target selection in its “signature strikes”, with pilots deciding when to strike based on computer-assisted assessments of suspicious behavior on the ground. Notably, this practice quickly became controversial for its high rate of bystander deaths.

But Israel’s use of Lavender, Where’s Dad, and other previously exposed algorithmic targeting systems—such as the Gospel—shows how human-machine teaming can become a recipe for strategic and moral disaster. Local Call and +972 published testimonies from a range of intelligence officers suggesting growing discomfort, at all levels of the IDF’s chain of command, with the readiness of commanders to strike targets with no apparent regard for bystanders.

Israel’s policies violate emerging norms of responsible AI use. They combine an emotional atmosphere of emergency and fury within the IDF, a deterioration in operational discipline, and a readiness to outsource regulatory compliance to a machine in the name of efficiency. Together, these factors show how an algorithmic system can become an “unaccountability machine”, allowing the IDF to transform military norms not through any specific set of decisions but by systematically attributing new, unrestrained actions to a seemingly objective computer.

How did this happen? Israel’s political leadership assigned the IDF an impossible goal: the total destruction of Hamas. At the outset of the war, Hamas had an estimated 30,000 to 40,000 fighters. After almost two decades of control in the Gaza Strip, Hamas was everywhere. After Oct. 7, its fighters posed a terrible threat to any IDF ground force entering Gaza unless their numbers could be depleted and their battalions scattered or forced underground.

The fact that Lavender could generate a nearly endless list of targets—and that other supporting systems could link them to buildings that could be struck rapidly from the air and recommend appropriate munitions—gave the IDF an apparent means of clearing the way for an eventual ground operation. Nearly half of reported Palestinian fatalities occurred during the initial six weeks of heavy bombing. Human-machine teaming, in this case, produced a replicable tactical solution to a strategic problem.

The IDF overcame the main obstacle to this so-called solution—the vast number of innocent civilians densely packed into the small territory of the Gaza Strip—by simply deciding not to care all that much whom it killed alongside its targets. In strikes against senior Hamas commanders, according to the Local Call and +972 investigation, those interviewed said the IDF decided it was permissible to kill as many as “hundreds” of bystanders for each commander killed; for junior Hamas fighters, that accepted number began at 15 bystanders and shifted slightly up and down during various phases of the fighting.

Moreover, because targets were frequently struck in homes where unknown numbers of people were sheltering, entire families were wiped out. These family annihilations likely grew in number as relatives or unrelated people joined the original residents to shelter temporarily, and the IDF’s intelligence personnel do not appear to have typically attempted to discover this and update their operational decisions accordingly.

Although Israel often presents the IDF as exemplary in its conformance with liberal and Western norms, the way that the IDF has used AI in Gaza, according to Local Call and +972, stands in stark contrast to those same norms. In U.S. military doctrine, all strikes must strive to keep bystander deaths below a predetermined “non-combatant casualty cut-off value” (NCV).

NCVs for most U.S. operations have been very low, and historically, so have Israel’s—at least when it comes to targeted killing. For example, when Hamas commander Salah Shehadeh was killed along with 14 others in an Israeli airstrike in 2002, then-IDF Chief of Staff Moshe Yaalon said that he would not have allowed the operation to happen if he’d known it would kill that many others. In interviews over the years, other Israeli officials involved in the operation similarly stated that the high number of bystander deaths was a major error.

Local Call and +972 revealed that, by contrast, the assassination of Hamas battalion commander Wissam Farhat during the current Israel-Hamas war had an NCV of more than 100 people—and that the IDF anticipated that it would kill around that many.
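
The doctrinal logic behind an NCV is simple enough to express as a single comparison, which is exactly why the chosen value matters so much. The sketch below is a minimal illustration of that logic; it is not any military’s actual planning software, and the cutoff numbers are assumptions chosen only to contrast the cases described above.

```python
# Minimal illustration of a non-combatant casualty cutoff value (NCV) check.
# Not any military's actual planning tool; the cutoff values below are
# assumptions chosen only to contrast the cases discussed in the article.

def strike_within_ncv(expected_bystander_deaths: int, ncv: int) -> bool:
    """Return True if the estimated bystander toll is at or below the cutoff."""
    return expected_bystander_deaths <= ncv

# A low cutoff (assumed here as single digits) would have ruled out the 2002
# Shehadeh strike, which killed 14 others and was later called a major error.
print(strike_within_ncv(expected_bystander_deaths=14, ncv=5))    # False

# A cutoff of more than 100, as reported in the Farhat case, lets the same
# check approve a toll that earlier practice treated as unacceptable.
print(strike_within_ncv(expected_bystander_deaths=95, ncv=100))  # True
```

The check itself is trivial; everything of moral and legal consequence lies in who sets the cutoff and at what number, which is precisely the judgment that appears to have shifted.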

Israeli military officers interviewed by Local Call and +972 explained that this shift was made possible by the supposed objectivity of AI and a mindset that emphasized action over judgment. The IDF has embraced a wartime logic by which it accepts a higher rate of “errors” in exchange for tactical effectiveness while indulging its commanders’ desire for vengeance against Hamas. In successive operations in 2008, 2012, and 2014—an approach widely described in Israeli security discourse as “mowing the grass”—Israel periodically dropped nonprecision munitions in significant numbers on buildings and tunnel systems deemed to be Hamas targets. Combatant-to-noncombatant fatality ratios in these wars ranged between 1-to-1 and 1-to-3—a range commonly estimated for the current war as well.

An Israeli intelligence source interviewed by +972 Magazine claimed that time constraints made it impossible to “incriminate” every target, which raised the IDF’s tolerance for the margin of statistical error in using AI-powered target recommendation systems—as well as its tolerance for the associated “collateral damage”. Adding to this was the pressure to retaliate against the enemy for its devastating initial attack, with what another source described as a single-minded desire to “fuck up Hamas, no matter what the cost”.

Lavender might have been used more judiciously if not for the deadly interaction effect that emerged between a seemingly objective machine and the intense emotional atmosphere of desperation and vengefulness within IDF war rooms.

There are larger lessons to learn. The most significant of these is that AI cannot shield the use of weapons from the force of single-minded, vindictive, or negligent commanders, operators, or institutional norms. In fact, it can act as a shield or justification for them.

A senior IDF source quoted in the Local Call and +972 investigation said that he had “much more trust in a statistical mechanism than a soldier who lost a friend two days ago”. But manifestly, a set of machines can be implicated in mass killing at a scale exceeding previous norms just as easily as a vengeful conscript fighting his way through a dense urban neighborhood.

It is tempting for bureaucracies, military or otherwise, to outsource difficult judgments made in difficult times to machines, thus allowing risky or controversial decisions to be made by no one in particular even as they are broadly implemented. But legal, ethical, and disciplinary oversight cannot be outsourced to computers, and algorithms mask the biases, limits, and errors of their data inputs behind a seductive veneer of assumed objectivity.

The appeal of human-machine teams and algorithmic systems is often claimed to be efficiency—but these systems cannot be scaled up indefinitely without generating counternormative and counterproductive outcomes. Lavender was not intended to be the only arbiter of target legitimacy, and the targets that it recommends could be subject to exhaustive review, should its operators desire it. But under enormous pressure, IDF intelligence analysts reportedly devoted almost no resources to double-checking targets, nor to double-checking bystander locations after feeding the names of targets into Where’s Dad.

Such systems are purpose-built, and officials should remember that even under emergency circumstances, expanding the frequency or scope of a computer tool’s use demands caution. The hoped-for operational benefits are not guaranteed, and as the catastrophe in Gaza shows, the strategic—and moral—costs could be significant.

Simon Frankel Pratt is a lecturer in political science at the School of Social and Political Sciences, University of Melbourne.
