Decentralized Communication-free Multi-Agent Conflict Resolution

Overview

This research project considers a game-theoretic framework for solving motion coordination challenges. The focus of this work is to minimize the number of conflicting interactions between agents moving through a shared environment. In particular, agents employ a replanning framework and regret minimization over a set of actions, which correspond to different homotopic paths. Associating a cost with each trajectory gives rise to a motion coordination game. Regret minimization is argued to be an appropriate technique, since agents have no knowledge of the other agents' cost functions. This work outlines a methodology for minimizing the regret of actions in a computationally efficient way. Initial simulation results involving pairs of mobile agents indicate that the proposed framework can improve the choice of non-colliding paths relative to a greedy choice by the agents, without increasing the agents' information requirements.

Motivation

Consider a scenario where two different types of robots move down a narrow corridor. Since only a single agent fits inside such a corridor, one of the agents has to move to a different corridor. Ideally, these agents would coordinate their motions so as to not only avoid collisions, but also move towards their destinations in a timely fashion. Furthermore, the robots might not share a common communication interface, or might be interacting with humans moving in the environment, which motivates the need for decentralization.


A solution to a motion coordination problem should provide collision-free paths for all agents, and each agent should move towards its goal efficiently. Although centralized solutions can provide these properties, they typically require knowing many specific details about every agent, such as its goal and its utilities for different actions. Especially when a robot interacts with a human, such information is difficult to come by, since the robot's motion planner has little knowledge of the human's future actions. Thus, it becomes difficult to guarantee safety while progressing towards the robot's goal.

This work proposes game theory as an appropriate way to formulate motion coordination challenges. A common scalar cost function is not appropriate for these types of problems, as each agent can have individual objectives that may not be known globally. Coordination can thus be posed as the problem of finding a Pareto optimal solution, which would require a centralized approach. Instead, it is possible to consider the computation of Nash equilibria, which can be achieved in a decentralized fashion. Even this methodology, however, still requires knowledge of the game (i.e., the goals and payoffs of the other agents). In addition, Nash equilibrium solutions are sometimes counterintuitive and, more importantly, computing them has high computational complexity for a large number of agents.
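As a hypothetical illustration of why Nash equilibria can be counterintuitive (the numbers below are for exposition only, not from this work), consider two agents that each choose between corridors A and B, where sharing a corridor costs both agents 10 and traversing a corridor alone costs 1. Entries give (cost of agent 1, cost of agent 2):

                   Agent 2: A    Agent 2: B
      Agent 1: A    (10, 10)       (1, 1)
      Agent 1: B     (1, 1)       (10, 10)

The pure Nash equilibria are the two asymmetric outcomes (A, B) and (B, A), but there is also a mixed equilibrium in which both agents randomize 50/50 and end up sharing a corridor half the time, an outcome that is hard to justify intuitively.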

The problem can thus be posed as follows: a solution is needed that achieves coordination between agents with minimal information and without centralization, and that is intuitive for people. Regret minimization has recently garnered much attention in game theory, as solutions computed via regret minimization achieve higher average utilities for certain types of games. A useful property of regret minimization is that it does not require knowledge of the other agents' payoffs. The proposed method for motion coordination takes advantage of this property to select paths for each agent. Furthermore, regret minimization can be seen as a learning approach: by observing the choices of the other agents in the same workspace, each agent accumulates the regret associated with each of its actions during each step of the coordination game. Overall, the proposed framework brings together motion planning primitives, game-theoretic notions, and learning tools to provide an algorithmic framework capable of computing acceptable solutions to motion coordination challenges in a decentralized, communication-free way, and it can easily be combined with reactive strategies to achieve efficient, collision-free behavior.
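To make the learning rule concrete, the following is a minimal sketch of regret matching over a discrete set of path choices. The class name, the loss-based regret convention, and the uniform fallback are illustrative assumptions, not this project's exact implementation:

    import random

    class RegretMatcher:
        """Minimal regret matching over a fixed, discrete set of actions."""

        def __init__(self, num_actions):
            self.cumulative_regret = [0.0] * num_actions

        def observe(self, losses, chosen):
            # Regret of action a: how much better a would have done than
            # the action actually chosen, given the other agents' observed
            # play during this step.
            for a in range(len(losses)):
                self.cumulative_regret[a] += losses[chosen] - losses[a]

        def select_action(self):
            # Play with probability proportional to positive cumulative
            # regret; fall back to a uniform choice if no regret is
            # positive yet.
            positive = [max(r, 0.0) for r in self.cumulative_regret]
            if sum(positive) <= 0.0:
                return random.randrange(len(positive))
            return random.choices(range(len(positive)), weights=positive)[0]

Regret matching of this form is known to drive average regret to zero over repeated play, which matches the intuition above: actions that repeatedly lead toward conflicts accumulate loss and are gradually abandoned.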

Approach

Given a set of agents in an environment, where each agent's goals and actions are hidden from the other agents:

Each agent computes a "candidate action set", which comprises paths from different homotopy classes. The approach then uses path separation as a metric to reduce the candidate set to a "final action set", as in the sketch below.
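One plausible realization of this pruning step, assuming all candidate paths are sampled at the same number of waypoints; the separation metric (average pointwise distance) and the farthest-point selection heuristic are illustrative choices, not necessarily the ones used in this work:

    import math

    def path_separation(p, q):
        # Average pointwise Euclidean distance between two equally
        # sampled paths; a simple proxy for how far apart two homotopy
        # classes are.
        return sum(math.dist(a, b) for a, b in zip(p, q)) / len(p)

    def reduce_candidates(candidates, k):
        # Greedily keep k mutually well-separated paths, seeding with the
        # first (e.g., lowest-cost) candidate: a farthest-point heuristic.
        final = [candidates[0]]
        remaining = list(candidates[1:])
        while remaining and len(final) < k:
            i = max(range(len(remaining)),
                    key=lambda j: min(path_separation(remaining[j], f)
                                      for f in final))
            final.append(remaining.pop(i))
        return final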

The path planning problem is formulated as a game, where each agent associates a cost with each action.
Agents accumulate a certain amount of "loss" for each action, which corresponds to how much closer that action brought the agent to a "conflict". Placing this formulation inside a replanning cycle essentially turns it into a repeated game, allowing agents to learn which actions accumulate the most loss and eventually switch away from conflicting strategies, as sketched below.
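Putting the pieces together, a single replanning cycle could be sketched as follows. The agent-side helpers (final_action_set, follow, conflict_proximity) are hypothetical names for the steps described above, and the 0.5-second budget matches the planning cycle used in the experiments:

    def planning_cycle(agent, matcher, cycle_time=0.5):
        # One step of the repeated game; assumes the action set keeps a
        # fixed size across cycles so regrets stay aligned with actions.
        actions = agent.final_action_set()   # homotopy-distinct paths
        chosen = matcher.select_action()
        agent.follow(actions[chosen], duration=cycle_time)
        # After observing the other agents' motion during this cycle,
        # score every action by how much closer it would have brought the
        # agent to a conflict, and fold those losses back into the regrets.
        losses = [agent.conflict_proximity(path) for path in actions]
        matcher.observe(losses, chosen)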

Results

Experiments were conducted in simulation in two different environments: the prototypical corridor scenario and an environment with randomly placed obstacles. The experiments were run on a single computer with a 3.1 GHz Intel i3 processor and 8 GB of RAM. The proposed framework was compared against a greedy baseline, where each agent moves along the shortest path on the roadmap computed by PRM*. Both methods were evaluated with respect to the following metrics: the average solution time in seconds until every agent has reached its goal, S_{avg}; the total number of obstructive interactions between agents over the entire experiment, C_{tot}; and the average computation time in seconds per planning cycle, T_{avg}.


In nearly all of the experiments, the agents moving without coordination ended up in an obstructive interaction. Although acting without coordination required no additional computation time, the overhead of the proposed framework was relatively small (agents were given 0.5 seconds per planning cycle). As for path costs, coordination performed better in case 1, but worse in case 3. The reason is that neither agent is epsilon-committed to a particular corridor, so some oscillation occurs until one agent commits to staying in the same corridor.


Acknowledgements

This work has been supported by NSF CNS 0932423. Any opinions and findings expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsor.

