Optimism-Driven Exploration for Nonlinear Systems
Abstract —
Tasks with unknown dynamics and costly system
interaction time present a serious challenge for reinforcement
learning. If a model of the dynamics can be learned quickly,
interaction time can be reduced substantially. We show that
combining an optimistic exploration strategy with model-predictive
control can achieve very good sample complexity for
a range of nonlinear systems. Our method learns a Dirichlet
process mixture of linear models using an exploration strategy
based on optimism in the face of uncertainty. Trajectory
optimization is used to plan paths in the learned model that
both minimize the cost and perform exploration. Experimental
results show that our approach achieves some of the most
sample-efficient learning rates on several benchmark problems,
and is able to successfully learn to control a simulated helicopter
during hover and autorotation with only seconds of interaction
time. The computational requirements are substantial.
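
To make the idea of optimism in the face of uncertainty concrete, the following is a minimal sketch under strong simplifying assumptions, not the method described above. It swaps in a single Bayesian linear model for the Dirichlet process mixture, and a greedy one-step choice over a discrete set of candidate controls for trajectory optimization. All names (`BayesLinearModel`, `optimistic_action`, `beta`) are hypothetical and chosen for illustration only.

```python
import numpy as np


class BayesLinearModel:
    """Bayesian linear regression for scalar dynamics x_next = w . [x, u] + noise.

    Illustrative stand-in only: one linear model with a Gaussian prior on w,
    not a Dirichlet process mixture of linear models.
    """

    def __init__(self, dim=2, prior_precision=1.0, noise_var=0.01):
        self.precision = prior_precision * np.eye(dim)  # posterior precision of w
        self.b = np.zeros(dim)                          # precision-weighted mean of w
        self.noise_var = noise_var                      # assumed known noise variance

    def update(self, features, target):
        """Incorporate one observed transition (features = [x, u], target = x_next)."""
        f = np.asarray(features, dtype=float)
        self.precision += np.outer(f, f) / self.noise_var
        self.b += f * target / self.noise_var

    def predict(self, features):
        """Return the predicted mean and the epistemic standard deviation."""
        f = np.asarray(features, dtype=float)
        cov = np.linalg.inv(self.precision)
        mean = f @ cov @ self.b
        std = np.sqrt(f @ cov @ f)
        return mean, std


def optimistic_action(model, x, goal, candidates, beta=2.0):
    """Pick the control whose most favorable plausible outcome is cheapest.

    For each candidate u, the cost is evaluated at the point of the interval
    [mean - beta * std, mean + beta * std] closest to the goal, so poorly
    understood controls look attractive and get explored.
    """
    best_u, best_cost = None, np.inf
    for u in candidates:
        mean, std = model.predict([x, u])
        gap = max(0.0, abs(mean - goal) - beta * std)  # optimistic distance to goal
        cost = gap ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u, best_cost
```

In a control loop, one would call `optimistic_action` at the current state, apply the returned control to the real system, and pass the observed transition to `model.update`. As data accumulates, the uncertainty term shrinks and the choice approaches the one that is best under the learned mean model.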