Improving State Space Coverage for Dynamical Systems

Many interesting physical systems exhibit challenging dynamics and underactuation that complicate motion planning. One prototypical problem in this context is swinging up an Acrobot system in the presence of gravity, as displayed in the figure below. This is an important benchmark because many classes of robots, especially walking robots, exhibit similar levels of underactuation and dynamic constraints.

One approach to planning with dynamics is the framework of sampling-based kinodynamic algorithms, such as the popular RRT method. These methods aim to cover the state space as quickly as possible through sampling. Specifically, the RRT algorithm makes use of an implicit Voronoi bias when selecting states and controls to expand a reachability tree into the state space. This favorable bias, however, is no longer available if there is no good distance metric in the state space or if drift and other dynamic constraints introduce undesired biases. The figure above illustrates the challenges of planning with the standard RRT for the Acrobot: it shows projections of the states along the reachability tree produced by RRT onto the torus defined by the global orientations of the Acrobot’s first two links. The state space exploration is severely biased by the dynamics, and the vicinity of the goal has not even been approached yet.
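
For reference, the sketch below outlines one iteration of a generic kinodynamic RRT in Python. The toy single-integrator dynamics, metric, and sampling helpers are placeholders introduced here purely for illustration; they are not part of the original work, where the state, metric, and propagation correspond to the Acrobot.

```python
import math
import random

# Toy 2D single integrator, included only to make the sketch self-contained;
# in the actual planner these would be the Acrobot state, metric, and dynamics.
def sample_state():
    return (random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0))

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def sample_control():
    return (random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))

def propagate(x, u, dt):
    return (x[0] + u[0] * dt, x[1] + u[1] * dt)

def rrt_iteration(tree, dt=0.5, n_controls=10):
    """One kinodynamic RRT iteration: draw a random state, pick the nearest
    tree node (the implicit Voronoi bias), and expand it with the sampled
    control whose outcome lands closest to the random state."""
    x_rand = sample_state()
    x_near = min(tree, key=lambda x: distance(x, x_rand))
    x_new = min((propagate(x_near, sample_control(), dt) for _ in range(n_controls)),
                key=lambda x: distance(x, x_rand))
    tree.append(x_new)
    return x_new

tree = [(0.0, 0.0)]
for _ in range(200):
    rrt_iteration(tree)
```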

The main ideas in this work are to:

  • learn the biases introduced by the dynamics in the state space exploration process during an offline training phase and then
  • counteract the effects of these undesired biases during the online operation of the algorithm so as to achieve a more balanced coverage of the state space.

As a feasibility study of these ideas, we have tested a classical statistical tool for building predictive models, namely Principal Component Analysis (PCA). Even though PCA is a linear method and the Acrobot is a nonlinear system, it is still possible to acquire useful information about the exploration bias because the system is highly constrained. During the online step, the exploration procedure is modified to promote propagation of the search tree along the directions of the least significant principal components. In the context of the RRT, this is achieved by selecting, at each iteration, the control which brings the system closer to a modified version of the random state sample. The following figure illustrates the basic idea of the approach.
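
The precise transformation applied to the random sample is not detailed here; the following NumPy sketch is one plausible instantiation, assuming PCA is fit offline to states gathered from a training tree and that online samples are then stretched along the least significant principal directions. The relative-variance weighting and the `gain` parameter are illustrative assumptions, not the published scheme.

```python
import numpy as np

def fit_pca(states):
    """Offline step: fit PCA to states collected from a small training tree.
    Returns the mean, the principal directions (as rows), and the variance
    captured along each direction."""
    X = np.asarray(states, dtype=float)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = (s ** 2) / (len(X) - 1)
    return mean, vt, variances

def modify_sample(x_rand, mean, components, variances, gain=2.0):
    """Online step: re-weight the random sample in the PCA frame so that
    directions the dynamics under-explore (small variance) are emphasized.
    The weighting and the 'gain' factor are assumptions of this sketch."""
    z = components @ (np.asarray(x_rand, dtype=float) - mean)   # to PCA coordinates
    weights = 1.0 + gain * (variances.max() - variances) / variances.max()
    z = weights * z                                             # stretch weak directions
    return mean + components.T @ z                              # back to state space

# Toy usage: synthetic "tree states" that are strongly biased along one axis.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4)) * np.array([5.0, 0.2, 1.0, 0.1])
mean, comps, var = fit_pca(states)
x_mod = modify_sample(rng.uniform(-1.0, 1.0, size=4), mean, comps, var)
```

The modified sample then takes the place of the original random sample when choosing which control to apply, so the tree is pulled toward the under-explored directions at negligible extra cost per iteration.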

Experimental results suggest that it is possible to solve planning problems for the Acrobot, as well as for a car-like system, faster, since the state space is covered more efficiently. At the same time, the computational overhead during the online operation of the algorithm is minimal: on average, each iteration of the PCA-based algorithm is only 1.15% more time consuming than an iteration of the standard RRT, since only a few matrix transformations and operations have to be executed. Concerning the cost of the offline training procedure, there are two questions:

  • How large should the trees be during the offline process?
  • How many trees should be propagated during training?

Experiments indicate that a small number of small trees, even just one, is sufficient. Overall, the experiments show that the combined cost of the offline and online steps is smaller than the cost of obtaining a solution with the standard RRT.

The following two figures present results for the Acrobot system (top) and for a second-order car-like vehicle (bottom). In each, the left side shows results for the standard RRT and the right side shows results for the PCA-based version.

Related Publications