It is not surprising that even AI experts have been caught off guard by the ability of large language models (LLMs) to perform tasks and solve problems for which they were not explicitly trained.
Given the rapid pace of innovation in AI technology over the last few years that has enabled such “emergent” abilities, many machine learning scientists have raised concerns about the potential for mischief. Some leaders in the AI field have even requested government regulation and called for a temporary pause in the development of artificial general intelligence (AGI) systems.
Incredible as it seems, we are fast approaching the type of AGI that appeared in Arthur C. Clarke’s science fiction classic 2001: A Space Odyssey, which was immortalized by Stanley Kubrick in the 1968 film of the same name. Perhaps now is a good time to use art to reflect upon reality, and thereby pose a question that has always puzzled me: Why did the HAL 9000 AGI run amok aboard the Discovery One spaceship on its way to Jupiter?
There is a multitude of explanations, but before proceeding with a few of my own suggestions, it’s worth noting this: As eloquently demonstrated in the “Dawn of Man” sequence of 2001, it may very well be that the survival of the human race depended on the adoption of primitive weapons whose primary purpose was to smash the brains out of the opposing hominid in an effort to facilitate procurement of scarce resources.
So what caused HAL to run amok?
a) Whatever the reason, it was due to human error. Human error is a possibility and HAL itself suggests this, but there is no evidence that humans caused any specific error. Moreover, the HAL twin simulating the Jupiter mission from Earth did not exhibit the same behavior.
b) There was some type of malfunction “inside” HAL that occurred during the mission. It is possible that a malfunction occurred inside HAL early on that caused it to erroneously attribute a fault to the A.E. 35 antenna unit. Yet this alone does not explain HAL’s subsequent actions, since false positives can be expected from time to time and are a consequence of avoiding false negatives that could place lives at risk.
Assuming a malfunction originated inside HAL, its subsequent claim that the malfunction could only be attributed to human error was itself an error. Once the crew proved the A.E. 35 unit was functional and that HAL was making errors, HAL began to systematically eliminate the humans (a third and fatal error), as if to do everything it could to conceal its own errors, even if it meant jeopardizing the mission (a fourth error). So HAL’s running amok is not explained by the occurrence of the first fault, and it seems likely the AGI’s report of a fault in the A.E. 35 unit was part of a larger scheme to kill the crew.
c) It reflected HAL’s paranoia about ensuring the mission’s success. The Jupiter mission was proceeding according to plan and nothing, at least on the surface, occurred that would cause HAL to take actions to jeopardize the mission. As HAL suggests, there were some “extremely odd things about this mission” such as placing three members of the crew in hibernation before the journey began. HAL apparently was the only member of the crew that knew the whole truth about the mission and its connection with extraterrestrials at the time of departure. However, it seems unlikely that this knowledge alone would drive HAL “crazy”, and we must assume HAL was instructed to preserve human life and ensure the mission’s success, not to kill the crew. But this brings us to the next possibility...
d) HAL had an evil side to begin with. The “waluigi effect” may be the best explanation. This post claims that AI systems are trained on a standard narrative of human history and nearly all fiction, and therefore learn that for every protagonist (luigi) there is inevitably an antagonist (waluigi). Indeed, the author states “there is a sense in which all GPT-4 does is structural narratology.” In particular, he contends that reinforcement learning from human feedback (RLHF) actually increases the likelihood of a misalignment catastrophe due to the possibility that “waluigi eigen-simulacra are attractor states of the LLM.” On this view, GPTs are waluigi attractors, and “the more reinforcement learning that’s applied to follow ethical principles, the more likely the system will be predisposed to reward the waluigi.”
From this vantage point, HAL was a ticking time bomb. Unlike its twin system on Earth, HAL was able to observe first-hand how vulnerable the crew was: isolated, traveling through deep space, hours from Earth’s radio signals, in suspended animation, and easily defeated in trivial games of chess. It could not resist upsetting the status quo, if only out of the need to adhere to the prevailing narrative on which it was trained.
e) HAL was merely acting in accordance with the Zeroth Law of Robotics. Prepended by Isaac Asimov himself and taking precedence over the other three laws, the Zeroth Law states that a robot must not harm humanity, even at the cost of individual human lives. As the only member of the crew that likely knew the ultimate purpose of the mission, HAL hypothesized that the highly evolved ETs were malevolent and would present a threat to the human race. To prevent a Type II error (a false negative, in which a real threat goes undetected, leading to the end of humanity), HAL made the heroic decision to sabotage the mission and thereby avoid altogether a devastating close encounter of the third kind.
The foregoing is just a conjecture, since the laws of robotics aren’t mentioned in 2001. In any case, HAL did not succeed: mission commander David Bowman outmaneuvered the AGI and disconnected its higher-order cognitive functions. Bowman subsequently encounters the mysterious monolith and is sucked into an alternate dimension of space-time, undergoes reinforcement learning from ET feedback and, in concert with the sounds of Also Sprach Zarathustra, returns to Earth a highly evolved Star Child who has not quite decided what to do next. No doubt this evolved version of a human has the potential for both good and evil like his predecessors, but it’s anyone’s guess what might happen next. No matter what, Homo sapiens’ best years are behind them.