
Monday, September 17, 2012

A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot


Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, José Castellanos, Arnaud Doucet
Autonomous Robots, August 2009
Summary
                  The paper is about path planning for optimal sensing with a mobile robot: the robot must learn about its own pose and the environment under time constraints, a problem that is notoriously hard because robots exhibit complex dynamics and face unknown uncertainties.
The authors model the finite-horizon planning problem as a Partially Observable Markov Decision Process (POMDP) with a utility function that depends on the belief state.


The constraints of the problem are time, energy budgets, and the robot's kinematic and dynamic capabilities.
The problem is posed as a POMDP whose policy is a parameterized path, which the robot can follow using a PID (Proportional Integral Derivative) controller or another classical regulator. Evaluating a candidate path requires simulation, which is the most practical but also an expensive way to estimate the cost; since the simulated cost has no closed form and gradients are not readily available, a Bayesian optimization method based on a Gaussian process is used to approximate the expected cost.
Model
The goal of the model is to minimize uncertainty about the robot's pose and the locations of environmental landmarks. The path is therefore parameterized in terms of a finite set of ordered way-points θ, and the robot replans every few steps using adaptive feedback control. The cost function is the average mean square error (AMSE) of the state estimate, taken with respect to the full path distribution (see p. 95 of the paper): roughly, the expected squared deviation between the hidden states xt and their estimates, computed from the sequence of observations yt gathered along the planned trajectory.
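To make the cost concrete, here is a minimal Python sketch (not from the paper) of how such an AMSE cost could be estimated by simulation; `simulate_trajectory` and `estimate_states` are hypothetical placeholders for the robot's motion/observation model and its state estimator.

```python
import numpy as np

def amse_cost(waypoints, simulate_trajectory, estimate_states, n_sims=10):
    """Monte Carlo estimate of the average mean squared error (AMSE)
    of the state estimate along a path parameterized by way-points.

    waypoints            -- the policy parameters theta (ordered way-points)
    simulate_trajectory  -- hypothetical: samples true states x_1:T and
                            observations y_1:T for a path through the way-points
    estimate_states      -- hypothetical: runs the filter on y_1:T and returns
                            the state estimates xhat_1:T
    """
    errors = []
    for _ in range(n_sims):
        true_states, observations = simulate_trajectory(waypoints)
        estimated_states = estimate_states(observations)
        # squared error between hidden states and their estimates, averaged over time
        err = np.mean(np.sum((true_states - estimated_states) ** 2, axis=1))
        errors.append(err)
    # Monte Carlo average over simulated trajectories
    return float(np.mean(errors))
```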
Since the model is not linear-Gaussian, Linear Quadratic Gaussian (LQG) methods cannot be used. The authors therefore propose direct policy search for the POMDP, approximating the cost function by simulation (a Monte Carlo average over M simulated runs). The problem with simulation is that the number of runs must be kept small for cost reasons, so an open-loop feedback control (OLFC) scheme is suggested: the robot keeps planning T steps ahead of its current position, and each planning step is carried out with Bayesian optimization.
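A rough sketch of the OLFC idea, again with hypothetical helpers: `plan_waypoints` stands for the optimizer that picks the next way-points (e.g. the Bayesian optimization of the AMSE cost), and `execute_step` for the low-level controller (e.g. PID) that drives the robot toward the first way-point and returns the updated belief.

```python
def olfc_loop(initial_belief, plan_waypoints, execute_step, horizon=3, n_replans=20):
    """Open-loop feedback control (OLFC) sketch: plan `horizon` way-points
    ahead, but execute only the first segment before replanning."""
    belief = initial_belief
    for _ in range(n_replans):
        # plan `horizon` way-points ahead from the current belief
        waypoints = plan_waypoints(belief, horizon)
        # execute only the first planned segment, then replan with fresh data
        belief = execute_step(belief, waypoints[0])
    return belief
```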
The Bayesian optimization approach follows these steps: 1) at iteration N, update the mean and variance of the posterior distribution P(C_π(·) | D_1:N) using the measured data D_1:N; 2) choose the new policy as θ_N+1 = argmax_θ EI(θ), where EI(θ) is the expected value of the improvement I(θ) = max{0, C_π^min − C_π(θ)} over the best cost found so far; 3) augment the data, D_1:N+1 = {D_1:N, (θ_N+1, C_π(θ_N+1))}; 4) set N = N + 1 and repeat. A sketch of this loop is given below.
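The following is a minimal sketch of that loop using scikit-learn's Gaussian process regressor and an expected-improvement acquisition. It is a generic illustration, not the paper's implementation: `cost` stands for the simulated AMSE cost above, policies θ are encoded as flattened way-point vectors, and the acquisition is maximized over a random set of candidates for simplicity.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, best_cost):
    """EI(theta) = E[max(0, best_cost - C(theta))] under the GP posterior."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best_cost - mu) / sigma
    return (best_cost - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(cost, dim, n_init=5, n_iter=20, n_candidates=1000, bounds=(0.0, 1.0)):
    """Minimize an expensive cost over policy parameters theta (flattened way-points)."""
    lo, hi = bounds
    thetas = np.random.uniform(lo, hi, size=(n_init, dim))    # initial data D_1:N
    costs = np.array([cost(t) for t in thetas])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(thetas, costs)                                  # 1) update GP posterior
        candidates = np.random.uniform(lo, hi, size=(n_candidates, dim))
        ei = expected_improvement(candidates, gp, costs.min())
        theta_next = candidates[np.argmax(ei)]                 # 2) maximize EI
        thetas = np.vstack([thetas, theta_next])               # 3) augment the data
        costs = np.append(costs, cost(theta_next))             # 4) N = N + 1
    return thetas[np.argmin(costs)], costs.min()
```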
Key Concepts
Bayesian optimization for minimizing the number of expensive cost-function evaluations; the exploration-exploitation trade-off in navigation planning.
Key Results
Simulations compare OLC1 (open-loop control with a planning horizon of 1), OLC3 (open-loop control that plans up to 3 way-points ahead) and OLFC3 (open-loop feedback control that plans up to 3 way-points ahead but executes only 1 step before replanning). The OLC algorithms are cheaper, but they easily get trapped in local minima, so OLFC, which is more robust, is preferable. The comparison is carried out against two different methods: a Largest Marginal heuristic (the robot follows the shortest path to the area of highest uncertainty) and a Maximum a Posteriori Square Error criterion (which uses a single sample of the cost function at each policy update, making it cost efficient and giving the lowest error). The algorithm is well suited to navigation, to recruiting new landmarks, and to supporting more sophisticated policies.
