Online Parameter Estimation via Real-Time Replanning of Continuous Gaussian POMDPs

Dustin J. Webb, Kyle L. Crandall, Jur van den Berg


An accurate dynamics model of a robot is an important ingredient of many algorithms used to solve robotics problems, including motion planning, control, localization, and mapping. Models derived from first principles often contain parameters (e.g., mass, moment of inertia, arm lengths) whose values are unknown. Those that cannot be easily measured must be estimated from the observed behavior of the robot. A good approach to this problem is to plan control policies for the robot that elicit as much information as possible about the parameters of the system while still achieving the other objectives specified for the robot. For parameters subject to drift, this must be done continuously over the lifetime of the robot if costly recalibrations are to be avoided. In this paper, we introduce a new method that formulates the parameter estimation problem as a continuous partially observable Markov decision process (POMDP). The method plans control policies that optimally trade off the effort spent on learning parameters against the effort spent on achieving regular robot objectives (exploration vs. exploitation), and that allow for online, continual parameter estimation. While POMDPs have, until recently, been mostly of theoretical interest due to their inherent complexity, we build on recent advances that allow continuous Gaussian POMDPs to be solved approximately optimally at near-real-time rates. We show that the computed control policies lead to improved convergence of the belief over the parameters compared to system identification approaches based on applying random controls.
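
To make the idea of a Gaussian belief over an unknown parameter concrete, here is a minimal sketch. It is not the method of the paper: there is no POMDP planning, only a Kalman filter on a state augmented with the parameter. The toy system (a velocity `v` driven by a control `u` through an unknown gain `theta`, e.g. an inverse mass), the noise levels, and the names `kf_step` and `run` are all assumptions made for illustration. The sketch shows only that the choice of controls determines how much is learned about the parameter.

```python
import numpy as np

def kf_step(mean, cov, u, z, dt=0.1, q=1e-4, r=1e-2):
    """One Kalman filter step on the augmented state s = [v, theta].

    Assumed toy model (hypothetical, for illustration only):
        v'     = v + theta * u * dt + process noise (variance q)
        theta' = theta                (parameter assumed constant)
        z      = v + measurement noise (variance r)
    """
    A = np.array([[1.0, u * dt], [0.0, 1.0]])  # transition matrix for this u
    H = np.array([[1.0, 0.0]])                 # only the velocity is observed
    Q = np.diag([q, 0.0])                      # noise enters through v only

    # Predict: propagate the belief through the dynamics.
    mean = A @ mean
    cov = A @ cov @ A.T + Q

    # Update: condition the belief on the measurement z.
    S = H @ cov @ H.T + r          # innovation variance (1x1)
    K = cov @ H.T / S              # Kalman gain (2x1)
    mean = mean + (K * (z - mean[0])).ravel()
    cov = (np.eye(2) - K @ H) @ cov
    return mean, cov

def run(controls, theta_true=2.0, seed=0, dt=0.1, q=1e-4, r=1e-2):
    """Simulate the toy system and filter it; return the final belief."""
    rng = np.random.default_rng(seed)
    v = 0.0
    mean = np.array([0.0, 1.0])    # initial guess for [v, theta]
    cov = np.diag([0.01, 1.0])     # large initial uncertainty on theta
    for u in controls:
        v = v + theta_true * u * dt + rng.normal(0.0, np.sqrt(q))
        z = v + rng.normal(0.0, np.sqrt(r))
        mean, cov = kf_step(mean, cov, u, z, dt, q, r)
    return mean, cov

# Zero controls never excite theta; constant nonzero controls do.
_, cov_idle = run(np.zeros(50))
_, cov_active = run(np.ones(50))
print("theta variance, idle:  ", cov_idle[1, 1])
print("theta variance, active:", cov_active[1, 1])
```

In this toy model the variance of `theta` stays at its prior value when the controls are zero, because `theta` then has no effect on the observations, and it shrinks when the controls are nonzero. Choosing controls that shrink this uncertainty while still pursuing the task objective is the trade-off that the abstract describes as exploration vs. exploitation.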