Closing the Reality Gap of Robotic Simulators through Task-oriented Bayesian Optimization
| Title | Closing the Reality Gap of Robotic Simulators through Task-oriented Bayesian Optimization |
| --- | --- |
| Publication Type | Journal Article |
| Year of Publication | 2019 |
| Authors | Zhu, S, Surovik, D, Bekris, KE, Boularias, A |
| Journal | Journal of Machine Learning Research |
| Date Published | 2019 (accepted) |
Although robotic simulators continue to increase in sophistication, their real-world deployment for robot control and learning remains limited, primarily due to insufficient accuracy in the context of complex physical interactions. This work aims to close the reality gap of off-the-shelf physics engines for robotic tasks via three interrelated contributions. First, Bayesian optimization is adapted for time-efficient identification of the mechanical parameters of robots, or of objects they interact with, given real-world trajectories. Examples of such physical parameters include the mass of objects in the environment and the friction coefficients of surfaces. The second contribution addresses the daunting model complexity of modern, versatile robotic systems. This issue is approached by projecting the model identification challenge into an appropriate lower-dimensional parameter space through the use of an autoencoder network. Finally, the Bayesian optimization process is integrated with policy learning in simulation to ensure that the model captures the intended closed-loop behavior within a desired level of confidence. This keeps the results effective for real-world deployment, while avoiding wasted effort on irrelevant aspects of the model. Hardware experiments on a robotic manipulation task show that the proposed framework achieves good data-efficiency compared to model-free policy search strategies. Furthermore, the proposed model identification process is applied to a locomotion task involving a high-dimensional, compliant tensegrity robot. The proposed approach yields more accurate parameter identification than alternatives given the same time budget and increases the precision of locomotion control. In simulated experiments, the proposed integration of policy search with Bayesian optimization for model identification improves the time-efficiency of model-based reinforcement learning.
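As a rough illustration of the first contribution, the sketch below identifies a single friction coefficient by minimizing the discrepancy between simulated and observed trajectories with a simple Bayesian optimization loop (Gaussian-process surrogate, lower-confidence-bound acquisition). The toy one-dimensional simulator, the synthetic "real" trajectory, and all function names here are illustrative assumptions, not the paper's actual simulator or implementation.

```python
# Hypothetical sketch: identify a friction coefficient by Bayesian
# optimization over the sim-vs-real trajectory discrepancy.
import numpy as np

rng = np.random.default_rng(0)

def simulate(friction, steps=20):
    """Toy 1-D simulator: a sliding block decelerating under friction."""
    v, traj = 1.0, []
    for _ in range(steps):
        v = max(0.0, v - friction * 0.1)
        traj.append(v)
    return np.array(traj)

TRUE_FRICTION = 0.35
real_traj = simulate(TRUE_FRICTION)  # stands in for a real-world trajectory

def loss(friction):
    """Mean squared discrepancy between simulated and observed trajectories."""
    return float(np.mean((simulate(friction) - real_traj) ** 2))

def gp_posterior(X, y, Xq, ls=0.2, noise=1e-6):
    """GP regression with an RBF kernel: posterior mean/std at query points."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Kq = k(Xq, X)
    Kinv = np.linalg.inv(K)
    mu = Kq @ Kinv @ y
    var = 1.0 + noise - np.sum((Kq @ Kinv) * Kq, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

# Bayesian optimization loop: evaluate, fit surrogate, pick next point
# by the lower confidence bound (explore where uncertain, exploit where low).
X = list(rng.uniform(0.0, 1.0, size=3))
y = [loss(x) for x in X]
grid = np.linspace(0.0, 1.0, 200)
for _ in range(15):
    mu, sd = gp_posterior(np.array(X), np.array(y), grid)
    nxt = float(grid[np.argmin(mu - 2.0 * sd)])
    X.append(nxt)
    y.append(loss(nxt))

best = X[int(np.argmin(y))]
print(f"identified friction near {best:.2f}")
```

The same loop extends to several parameters at once (mass, damping, multiple friction coefficients) by making the kernel operate on parameter vectors; the paper's second contribution, an autoencoder projection, is what keeps that search tractable when the raw parameter space is high-dimensional.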