The Alignment Problem Is Not New – O’Reilly

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war,” according to a statement signed by more than 350 business and technical leaders, including the developers of today’s most important AI platforms.

Among the possible risks leading to that outcome is what is known as “the alignment problem.” Will a future super-intelligent AI share human values, or might it consider us an obstacle to fulfilling its own goals? And even if AI is still subject to our wishes, might its creators—or its users—make an ill-considered wish whose consequences turn out to be catastrophic, like the wish of fabled King Midas that everything he touches turn to gold? Oxford philosopher Nick Bostrom, author of the book Superintelligence, once posited as a thought experiment an AI-managed factory given the command to optimize the production of paperclips. The “paperclip maximizer” comes to monopolize the world’s resources and eventually decides that humans are in the way of its master objective.


Far-fetched as that sounds, the alignment problem is not just a far future consideration. We have already created a race of paperclip maximizers. Science fiction writer Charlie Stross has noted that today’s corporations can be thought of as “slow AIs.” And much as Bostrom feared, we have given them an overriding command: to increase corporate profits and shareholder value. The consequences, like those of Midas’s touch, aren’t pretty. Humans are seen as a cost to be eliminated. Efficiency, not human flourishing, is maximized.

In pursuit of this overriding goal, our fossil fuel companies continue to deny climate change and hinder attempts to switch to alternative energy sources, drug companies peddle opioids, and food companies encourage obesity. Even once-idealistic internet companies have been unable to resist the master objective, and in pursuing it have created addictive products of their own, sown disinformation and division, and resisted attempts to restrain their behavior.

Even if this analogy seems far-fetched to you, it should give you pause when you think about the problems of AI governance.

Corporations are nominally under human control, with human executives and governing boards responsible for strategic direction and decision-making. Humans are “in the loop,” and generally speaking, they make efforts to restrain the machine, but as the examples above show, they often fail, with disastrous results. The efforts at human control are hobbled because we have given the humans the same reward function as the machine they are asked to govern: we compensate executives, board members, and other key employees with options to profit richly from the stock whose value the corporation is tasked with maximizing. Attempts to add environmental, social, and governance (ESG) constraints have had only limited impact. As long as the master objective remains in place, ESG too often remains something of an afterthought.

Much as we fear a superintelligent AI might do, our corporations resist oversight and regulation. Purdue Pharma successfully lobbied regulators to limit the risk warnings planned for doctors prescribing OxyContin and marketed this dangerous drug as non-addictive. While Purdue eventually paid a price for its misdeeds, the damage had largely been done, and the opioid epidemic rages unabated.

What might we learn about AI regulation from failures of corporate governance?

  1. AIs are created, owned, and managed by corporations, and will inherit their objectives. Unless we change corporate objectives to embrace human flourishing, we have little hope of building AI that will do so.
  2. We need research on how best to train AI models to satisfy multiple, sometimes conflicting goals rather than optimizing for a single goal. ESG-style concerns can’t be an add-on, but must be intrinsic to what AI developers call the reward function. As Microsoft CEO Satya Nadella once said to me, “We [humans] don’t optimize. We satisfice.” (This idea goes back to Herbert Simon’s 1947 book Administrative Behavior.) In a satisficing framework, an overriding goal may be treated as a constraint, but multiple goals are always in play. As I once described this theory of constraints, “Money in a business is like gas in your car. You need to pay attention so you don’t end up on the side of the road. But your trip is not a tour of gas stations.” Profit should be an instrumental goal, not a goal in and of itself. And as to our actual goals, Satya put it well in our conversation: “the moral philosophy that guides us is everything.”
  3. Governance is not a “once and done” exercise. It requires constant vigilance, and adaptation to new circumstances at the speed at which those circumstances change. You have only to look at the slow response of bank regulators to the rise of CDOs and other mortgage-backed derivatives in the runup to the 2008 financial crisis to understand that time is of the essence.
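The distinction between optimizing and satisficing in point 2 can be made concrete in code. Below is a toy sketch (all names, actions, and numbers are hypothetical, invented purely for illustration) contrasting a single-objective “profit maximizer” with a satisficing policy that treats profit as a constraint, like gas in the car, while ranking choices on the goals we actually care about:

```python
# Toy illustration of optimizing vs. satisficing. All actions and
# scores are hypothetical; the point is the shape of the reward function.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    profit: float      # shareholder return
    wellbeing: float   # human flourishing (employees, customers)
    harm: float        # externalized social cost

ACTIONS = [
    Action("cut safety budget", profit=9.0, wellbeing=2.0, harm=8.0),
    Action("raise prices",      profit=7.0, wellbeing=4.0, harm=5.0),
    Action("invest in quality", profit=5.0, wellbeing=8.0, harm=1.0),
    Action("status quo",        profit=4.0, wellbeing=6.0, harm=2.0),
]

def maximize_profit(actions):
    """Single-objective reward: the paperclip-maximizer failure mode."""
    return max(actions, key=lambda a: a.profit)

def satisfice(actions, profit_floor=4.5):
    """Profit is a constraint, not the objective. Among actions that
    clear the floor, rank on a composite of the other goals."""
    feasible = [a for a in actions if a.profit >= profit_floor]
    return max(feasible, key=lambda a: a.wellbeing - a.harm)

print(maximize_profit(ACTIONS).name)  # -> cut safety budget
print(satisfice(ACTIONS).name)        # -> invest in quality
```

The two policies pick different actions from the same menu: changing where profit sits in the reward function, from objective to constraint, changes the outcome even when nothing about the world changes.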

OpenAI CEO Sam Altman has begged for government regulation, but tellingly, has suggested that such regulation apply only to future, more powerful versions of AI. This is a mistake. There is much that can be done right now.

We should require registration of all AI models above a certain level of power, much as we require corporate registration. And we should define current best practices in the management of AI systems and make them mandatory, subject to regular, consistent disclosures and auditing, much as we require public companies to regularly disclose their financials.

The work that Timnit Gebru, Margaret Mitchell, and their coauthors have done on the disclosure of training data (“Datasheets for Datasets”) and the performance characteristics and risks of trained AI models (“Model Cards for Model Reporting”) is a good first draft of something much like the Generally Accepted Accounting Principles (and their equivalent in other countries) that guide US financial reporting. Might we call them “Generally Accepted AI Management Principles”?
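To make the analogy to financial disclosure concrete, here is a minimal sketch of what such a disclosure might look like as a machine-readable document. The top-level sections loosely follow those proposed in “Model Cards for Model Reporting”; every value is a hypothetical placeholder, not a real model’s disclosure:

```python
# A hypothetical, machine-readable model card. Section names loosely
# follow "Model Cards for Model Reporting"; all values are placeholders.
import json

model_card = {
    "model_details": {"name": "example-classifier", "version": "0.1",
                      "owner": "example.org"},
    "intended_use": {"primary_uses": ["illustration only"],
                     "out_of_scope_uses": ["production decision-making"]},
    "factors": ["age group", "dialect"],
    "metrics": ["accuracy", "false positive rate by group"],
    "evaluation_data": "held-out sample (details in datasheet)",
    "training_data": "see accompanying datasheet",
    "ethical_considerations": "higher error rates on minority dialects",
    "caveats_and_recommendations": "not independently audited",
}

# Serializing the card makes it diffable and auditable over time,
# the way regular financial filings are.
report = json.dumps(model_card, indent=2)
print(report)
```

A standardized, versioned format like this is what would let regulators and auditors compare disclosures across companies and across releases, rather than parsing bespoke PDFs.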

It’s essential that these principles be created in close cooperation with the creators of AI systems, so that they reflect actual best practice rather than a set of rules imposed from without by regulators and advocates. But they can’t be developed solely by the tech companies themselves. In his book Voices in the Code, David G. Robinson (now Director of Policy for OpenAI) points out that every algorithm makes moral choices, and explains why those choices must be hammered out in a participatory and accountable process. There is no perfectly efficient algorithm that gets everything right. Listening to the voices of those affected can radically change our understanding of the outcomes we are seeking.

But there’s another factor too. OpenAI has said that “Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent.” Yet many of the world’s ills are the result of the difference between stated human values and the intent expressed by actual human choices and actions. Justice, fairness, equity, respect for truth, and long-term thinking are all in short supply. An AI model such as GPT-4 has been trained on a vast corpus of human speech, a record of humanity’s thoughts and feelings. It is a mirror. The biases that we see there are our own. We need to look deeply into that mirror, and if we don’t like what we see, we need to change ourselves, not just adjust the mirror so it shows us a more pleasing picture!

To be sure, we don’t want AI models to be spouting hatred and misinformation, but simply fixing the output is insufficient. We have to reconsider the input—both in the training data and in the prompting. The quest for effective AI governance is an opportunity to interrogate our values and to remake our society in line with the values we choose. The design of an AI that will not destroy us may be the very thing that saves us in the end.