Just like humans, when robots have a decision to make, they often face many options and hundreds of potential outcomes. Robots can simulate a handful of these outcomes to figure out which course of action is most likely to lead to success. But what if another option were just as likely to succeed, and safer?
The Office of Naval Research has awarded Brendan Englot, an MIT-trained mechanical engineer at Stevens Institute of Technology, a 2020 Young Investigator Award of $508,693 to leverage a new variant of a classic artificial intelligence tool to allow robots to predict the many possible outcomes of their actions, and how likely each is to occur. The framework will allow robots to figure out which option is the best way to achieve a goal by understanding which options are the safest, most efficient and least likely to fail.
"If the fastest way for a robot to complete a task is by walking on the edge of a cliff, that's sacrificing safety for speed," said Englot, who will be among the first to use the tool, distributional reinforcement learning, to train robots. "We don't want the robot falling off the edge of that cliff, so we are giving them the tools to predict and manage the risks involved in completing the desired task."
For years, reinforcement learning has been used to train robots to navigate autonomously on land, in the water and in the air. But that AI tool has limitations, because it makes decisions based on a single expected outcome for each available action, when in fact there are often many other possible outcomes. Englot is using distributional reinforcement learning, an AI algorithm a robot can use to evaluate all possible outcomes, predict the probability of each action succeeding, and choose an option that is likely to succeed while keeping the robot safe.
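In rough terms, a conventional reinforcement learner keeps a single expected-return number per action, while a distributional learner keeps an estimate of the whole spread of returns. The Python sketch below illustrates one well-known way of doing this, quantile regression over sampled returns; the two-action task, point values and constants are all hypothetical, meant to convey the general idea rather than the specific method Englot's team will develop.

```python
import numpy as np

# Toy quantile-regression sketch of the distributional idea: keep several
# quantile estimates of the return per action, rather than one mean.
# The task, numbers and constants are made up for illustration.

N_QUANTILES = 11
taus = (np.arange(N_QUANTILES) + 0.5) / N_QUANTILES  # target quantile levels
rng = np.random.default_rng(0)

def sample_return(action):
    """Hypothetical one-step task: action 0 is fast but risky,
    action 1 is slower but certain."""
    if action == 0:
        return 60.0 if rng.random() > 0.1 else -100.0  # 10% chance of a crash
    return 40.0

theta = np.zeros((2, N_QUANTILES))  # one estimate per (action, quantile level)
lr = 0.5

for _ in range(100_000):
    a = int(rng.integers(2))
    g = sample_return(a)
    # Quantile-regression step: each estimate settles where the fraction
    # of observed returns below it matches its target level tau.
    theta[a] += lr * (taus - (g < theta[a]).astype(float))

print(np.round(theta[0]))  # spread out: lowest quantile near -100, rest near +60
print(np.round(theta[1]))  # concentrated near 40: the safe action is predictable
```

Averaged out, the risky action looks slightly better here (an expected return of 44 versus a certain 40), which is all a conventional learner would see; the quantile estimates, by contrast, expose the 10 percent chance of a -100 crash.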
Before putting his algorithm to use in an actual robot, Englot's first mission is to perfect it. He and his team create a number of decision-making situations in which to test the algorithm, and they often turn to one of the field's favorite testing grounds: Atari games.
For example, when you play Pac-Man, you are the algorithm deciding how Pac-Man behaves. Your objective is to collect all of the dots in the maze and, if you can, some fruit. But there are ghosts floating around that can kill you. Every second, you are forced to make a decision: do you go straight, left or right? Which path earns you the most dots, and points, while also keeping you away from the ghosts?
Englot's AI algorithm, using distributional reinforcement learning, will take the place of a human player, simulating every possible move to safely navigate its landscape.
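Once every move carries a full distribution of returns rather than a single number, the algorithm can score moves by more than the average. The sketch below, with made-up quantile estimates for three Pac-Man-style moves, contrasts a greedy pick by the mean with a risk-averse, CVaR-style pick that averages only the worst quantiles; the names, numbers and the CVaR criterion are illustrative assumptions, not details from the project.

```python
import numpy as np

# Hypothetical learned return quantiles (sorted low to high) for three moves.
quantiles = {
    "straight": np.array([-100.0, 30.0, 40.0, 50.0, 60.0]),  # best mean, fat left tail
    "left":     np.array([   0.0,  5.0, 10.0, 15.0, 20.0]),  # modest but steady
    "right":    np.array([  -5.0,  0.0,  2.0,  4.0,  6.0]),
}

def pick(q, worst_frac=None):
    """worst_frac=None scores each move by its mean; otherwise only the
    worst fraction of quantiles is averaged (a CVaR-style, risk-averse score)."""
    def score(v):
        if worst_frac is None:
            return v.mean()
        k = max(1, int(round(len(v) * worst_frac)))
        return np.sort(v)[:k].mean()
    return max(q, key=lambda a: score(q[a]))

print(pick(quantiles))                  # 'straight': highest average return
print(pick(quantiles, worst_frac=0.4))  # 'left': best once tail risk counts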
So how do you reward a robot? Englot and his team will assign points to different outcomes: if it falls off a cliff, the robot gets -100 points; if it takes a slower but safer route, it may receive -1 point for every step along the detour; and if it successfully reaches the goal, it may get +50 points.
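As a rough sketch, that reward scheme might look like the following; the point values are the ones quoted above, while the event names and the function itself are illustrative.

```python
# Sketch of the reward scheme described above; the point values come from
# the article, while the event names and structure are illustrative.

def reward(event: str) -> float:
    if event == "fell_off_cliff":
        return -100.0  # catastrophic failure, heavily penalized
    if event == "reached_goal":
        return 50.0    # task success
    return -1.0        # every ordinary step costs a little, so detours
                       # are discouraged but never ruled out
```

Under such a scheme, a safe 20-step detour still nets 50 - 20 = +30 points, while a shortcut carrying even a modest chance of the -100 outcome looks far worse once the full distribution of returns is considered.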
"One of our secondary goals is to see how reward signals can be designed to positively impact how a robot makes decisions and can be trained," said Englot. "We hope the techniques developed in this project could ultimately be used for even more complex AI, such as training underwater robots to navigate safely amidst varying tides, currents, and other complex environmental factors."
###