Ruben Martinez-Cantin, Nando de
Freitas, Eric Brochu, José Castellanos, Arnaud Doucet
Autonomous Robots, August 2009
Summary
The paper addresses path planning for optimal sensing with a mobile robot, a typical setting in which the robot needs to learn about its own pose and the environment under time constraints. The problem is notoriously hard because robots exhibit complex dynamics and face uncertainties that are not known in advance.
The authors propose a Partially Observable Markov Decision Process (POMDP) with a utility function that depends on the belief state to model the finite-horizon planning problem.
The constraints of the problem are time and energy budgets and the robot's kinematic and dynamic capabilities.
The problem is represented as a POMDP whose policy is a parameterized path; the robot tracks this path with a PID (Proportional Integral Derivative) controller or another classical regulator. Because the model is not linear-Gaussian, the expected cost can only be evaluated through simulation, and since simulations are expensive, the authors use a Bayesian optimization method based on a Gaussian process to approximate the expected cost function with few evaluations.
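As a rough sketch of the resulting optimization problem (the notation here is mine, not necessarily the paper's): planning reduces to choosing the way-point parameters θ that minimize the expected cost of the trajectories they induce,

\[
\theta^{*} \;=\; \arg\min_{\theta}\; C_{\pi}(\theta)
\;=\; \arg\min_{\theta}\; \mathbb{E}_{p(x_{0:T},\, y_{1:T} \mid \theta)}\big[ C(x_{0:T}, y_{1:T}) \big],
\]

where the expectation is taken over the hidden states and observations generated while the robot tracks the way-points.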
Model
The goal of the model is to minimize the uncertainty about the robot's pose and the locations of the environmental landmarks. The path is therefore parameterized in terms of a finite set of ordered way-points θ, and the robot replans every few steps using adaptive feedback control. The cost function is the average mean square error (AMSE) of the state estimate, taken with respect to the full path distribution (page 95; note that the cost is a function of the hidden states xt and the sequence of observations yt along the planned trajectory).
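Concretely, the average mean square error criterion can be written along these lines (a hedged reconstruction in my notation, not the paper's exact formula):

\[
C^{\mathrm{AMSE}}_{\pi}(\theta) \;=\; \mathbb{E}_{p(x_{0:T},\, y_{1:T} \mid \theta)}
\left[ \sum_{t=1}^{T} \left( x_{t} - \hat{x}_{t} \right)^{\!\top} \left( x_{t} - \hat{x}_{t} \right) \right],
\]

where \hat{x}_t denotes the filter estimate of the hidden state (robot pose and landmark locations) given the observations y_{1:t} along the planned trajectory.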
Since the model is not linear-Gaussian, Linear Quadratic Gaussian (LQG) methods cannot be used. The authors therefore propose a direct policy search for solving the POMDP, approximating the cost function by a Monte Carlo average over M simulations. The difficulty with simulation is that the number of runs must be kept small because each run is expensive; an open-loop feedback control (OLFC) scheme is therefore suggested, in which the robot keeps planning T steps ahead of its current position, and a Bayesian optimization is carried out at each planning stage.
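A minimal Python sketch of the Monte Carlo cost approximation (the simulator, the state estimator, and the dimensions below are placeholders I am assuming, not the paper's code):

import numpy as np

def simulate_trajectory(theta, rng):
    """Placeholder simulator: in the real setting this would roll out the robot
    dynamics and observations while the robot tracks the way-points in theta,
    and run a state estimator; here it just returns synthetic states and estimates."""
    T = 20                                        # assumed planning horizon
    x = rng.normal(size=(T, 3))                   # hidden states (pose), assumed 3-D
    x_hat = x + 0.1 * rng.normal(size=x.shape)    # noisy filter estimates
    return x, x_hat

def monte_carlo_cost(theta, n_sims=50, seed=0):
    """Approximate the expected AMSE cost C_pi(theta) by averaging the squared
    estimation error over M = n_sims simulated trajectories."""
    rng = np.random.default_rng(seed)
    costs = []
    for _ in range(n_sims):
        x, x_hat = simulate_trajectory(theta, rng)
        costs.append(np.sum((x - x_hat) ** 2))
    return float(np.mean(costs))

# Example: evaluate one candidate set of way-points (here 3 way-points in 2-D, flattened).
theta = np.array([1.0, 0.5, 2.0, 1.5, 3.0, 2.0])
print(monte_carlo_cost(theta))

In the OLFC scheme, this estimate would be recomputed from the robot's updated belief each time it replans after executing part of the current plan.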
The Bayesian optimization approach follows these steps (a code sketch of the loop follows the list):
1) At iteration N, update the expressions for the mean and variance of the posterior distribution P(Cπ(·)|D1:N) using the measured data D1:N.
2) Choose the new policy parameters θN+1 = argmaxθ EI(θ), where EI(θ) is the expected value of the improvement I(θ) = max{0, Cπmin − Cπ(θ)} over the best cost found so far, Cπmin.
3) Evaluate the cost at θN+1 and augment the data set, D1:N+1 = {D1:N, (θN+1, CπN+1)}.
4) Set N = N + 1 and repeat.
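A minimal sketch of this loop in Python, assuming a Gaussian-process surrogate from scikit-learn, a random-candidate search for the expected-improvement step, and a toy stand-in for the expensive Monte Carlo cost (none of this is the authors' implementation):

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, best_cost):
    """EI(theta) for minimization: expected value of max(0, best_cost - C(theta))."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best_cost - mu) / sigma
    return (best_cost - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def toy_cost(theta):
    """Stand-in for the expensive Monte Carlo estimate of C_pi(theta)."""
    return float(np.sum((theta - 0.7) ** 2))

rng = np.random.default_rng(0)
dim = 2                                              # assumed: one way-point in 2-D
thetas = list(rng.uniform(0, 1, size=(3, dim)))      # initial design D_1:N
costs = [toy_cost(t) for t in thetas]

for _ in range(20):
    # 1) Fit the GP posterior of the cost given the data D_1:N.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(thetas), np.array(costs))
    # 2) Pick theta_{N+1} = argmax EI(theta) over random candidates.
    candidates = rng.uniform(0, 1, size=(500, dim))
    theta_next = candidates[np.argmax(expected_improvement(candidates, gp, min(costs)))]
    # 3) Evaluate the expensive cost and augment the data set.
    thetas.append(theta_next)
    costs.append(toy_cost(theta_next))
    # 4) N = N + 1 and repeat.

print("best cost:", min(costs))

The random-candidate search over θ is only a simple substitute for a proper inner optimization of EI; the key point, as in the paper, is that each iteration spends exactly one expensive cost evaluation.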
Key Concepts
Bayesian optimization as an approach for minimizing the number of expensive cost function evaluations; the exploration-exploitation trade-off in navigation planning.
Key Results
Simulations have been run with OLC1 (an open-loop controller with a planning horizon of 1 way-point), OLC3 (an open-loop controller that plans up to 3 way-points ahead) and OLFC3 (an open-loop feedback controller that plans up to 3 way-points ahead, but where the robot executes only 1 step before replanning). The OLC algorithms are, of course, less costly, but they easily get trapped in local minima, so OLFC, which is more robust, appears to be preferable. The comparison has been carried out with two different methods: the Largest Marginal Heuristic (the robot follows the shortest path into the area of highest uncertainty) and the Maximum a Posteriori Square Error (which uses one sample of the cost function at each policy update, making it cost efficient and giving the lowest error). The algorithm is well suited to applications such as navigation and recruiting new landmarks, and it allows more sophisticated policies.