Stuart Armstrong, the Alexander Tamas Fellow in Artificial Intelligence and Machine Learning at FHI and Laurent Orseau, of Google DeepMind, will present their research on reinforcement learning agent interruptibility at the UAI 2016 conference in New York City later this month.
Orseau and Armstrong’s research explores a method to ensure that reinforcement learning agents can be repeatedly safely interrupted by human or automatic overseers. This ensures that the agents do not “learn” about these interruptions, and do not take steps to avoid or manipulate the interruptions.
Interruptibility has several advantages as an approach over previous methods of control. As Dr. Armstrong explains, “Interruptibility has applications for many current agents, especially when we need the agent to not learn from specific experiences during training. Many of the naïve ideas for accomplishing this—such as deleting certain histories from the training set—change the behaviour of the agent in unfortunate ways.”
In the paper, the researchers provide a formal definition of safe interruptibility, show that some types of agents already have this property, and show that others can be easily modified to gain it. They also demonstrate that even an ideal agent that tends to the optimal behaviour in any computable environment can be made safely interruptible.
Dr Armstrong continued: “Machine learning is one of the most powerful tools for building AI that has ever existed. But applying it to questions of AI motivations is problematic: just as we humans would not willingly change to an alien system of values, any agent has a natural tendency to avoid changing its current values, even if we want to change or tune them.
“Interruptibility, and the related general idea of corrigibility, allow such changes to happen without the agent trying to resist them or force them. The newness of the field of AI safety means that there is relatively little awareness of these problems in the wider machine learning community. As with other areas of AI research, DeepMind remains at the cutting edge of this important subfield.”