POMDPs under Probabilistic Semantics
We consider partially observable Markov decision processes (POMDPs) with
limit-average payoff, where a reward value in the interval [0,1] is associated
with every transition, and the payoff of an infinite path is the long-run average
of the rewards. We consider two types of path constraints: (i) a quantitative
constraint, which defines the set of paths where the payoff is at least a given
threshold λ_1 ∈ (0,1]; and (ii) a qualitative constraint, which is the special
case of the quantitative constraint with λ_1 = 1. We consider the computation of
the almost-sure winning set, where the controller needs to ensure that the path
constraint is satisfied with probability 1. Our main results for qualitative
path constraint are as follows: (i) the problem of deciding the existence of a
finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding
the existence of an infinite-memory controller is undecidable. For the
quantitative path constraint, we show that the problem of deciding the
existence of a finite-memory controller is undecidable.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013).
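A worked statement of the objective may be helpful here. The following is a minimal sketch in standard notation; the liminf convention and the symbols are assumptions, since the abstract does not fix them. For an infinite path ρ = s_0 s_1 s_2 … with transition rewards r(s_i, s_{i+1}) ∈ [0,1], the limit-average payoff is

  \[
    \mathrm{LimAvg}(\rho) \;=\; \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} r(s_i, s_{i+1}),
  \]

and almost-sure winning for the quantitative constraint asks for a controller σ with

  \[
    \Pr^{\sigma}\bigl[\,\mathrm{LimAvg}(\rho) \ge \lambda_1\,\bigr] \;=\; 1 .
  \]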
Hyperplane Separation Technique for Multidimensional Mean-Payoff Games
We consider both finite-state game graphs and recursive game graphs (or
pushdown game graphs), which can model the control flow of sequential programs
with recursion, with multi-dimensional mean-payoff objectives. In pushdown
games two types of strategies are relevant: global strategies, which depend on
the entire global history; and modular strategies, which have only local memory
and thus do not depend on the context of invocation. We present solutions to
several fundamental algorithmic questions and our main contributions are as
follows: (1) We show that finite-state multi-dimensional mean-payoff games can
be solved in polynomial time if the number of dimensions and the maximal
absolute value of the weights are fixed; whereas if the number of dimensions is
arbitrary, the problem is already known to be coNP-complete. (2) We show that
pushdown graphs with multi-dimensional mean-payoff objectives can be solved in
polynomial time. (3) For pushdown games under global strategies, the problems
for both single- and multi-dimensional mean-payoff objectives are known to be
undecidable, and we show that under modular strategies the multi-dimensional
problem is also undecidable (whereas under modular strategies the
single-dimensional problem is NP-complete). We show that if the number of
modules, the number of exits, and the maximal absolute value of the weights are
fixed, then pushdown games under modular strategies with single-dimensional
mean-payoff objectives can be solved in polynomial time, and if either the
number of exits or the number of modules is unbounded, then the problem is
NP-hard. (4) Finally, we show that a
fixed parameter tractable algorithm for finite-state multi-dimensional
mean-payoff games or pushdown games under modular strategies with
single-dimensional mean-payoff objectives would imply the solution of the
long-standing open problem of fixed parameter tractability of parity games.
Comment: arXiv admin note: text overlap with arXiv:1201.282
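A concrete reading of the multi-dimensional objective: a play that eventually loops on a cycle achieves, in each dimension, the average of the weights along that cycle, and the objective requires this average to dominate a threshold vector in every coordinate simultaneously. The Python sketch below illustrates this; the graph encoding and all names are illustrative assumptions, not code from the paper.

  # Mean payoff of a cycle in a multi-weighted graph: the long-run average
  # is computed per dimension. Encoding: `weight` maps an edge (u, v) to a
  # tuple with one weight per dimension.
  def cycle_mean_payoff(cycle_edges, weight):
      k = len(weight[cycle_edges[0]])                # number of dimensions
      totals = [0.0] * k
      for e in cycle_edges:
          for d in range(k):
              totals[d] += weight[e][d]
      return [t / len(cycle_edges) for t in totals]

  def satisfies(threshold, mean_payoff):
      # The objective asks for domination in every dimension at once.
      return all(m >= t for m, t in zip(mean_payoff, threshold))

  # Toy example: a 2-cycle u -> v -> u with 2-dimensional weights. Each
  # dimension alone can pay 2 on one edge, but the long-run average in
  # both dimensions together is (1, 1).
  weight = {("u", "v"): (2, 0), ("v", "u"): (0, 2)}
  mp = cycle_mean_payoff([("u", "v"), ("v", "u")], weight)
  print(mp, satisfies((1, 1), mp))                   # [1.0, 1.0] True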
Graph Planning with Expected Finite Horizon
Graph planning gives rise to fundamental algorithmic questions such as
the shortest-path and traveling-salesman problems. A classical problem in discrete
planning is to consider a weighted graph and construct a path that maximizes
the sum of weights for a given time horizon T. However, in many scenarios,
the time horizon is not fixed, but the stopping time is chosen according to
some distribution such that the expected stopping time is T. If the stopping
time distribution is not known, then to ensure robustness, the distribution is
chosen by an adversary, to represent the worst-case scenario.
A stationary plan chooses the same outgoing edge at every vertex. For a
fixed horizon or a fixed stopping-time distribution, stationary plans are not
sufficient for optimality. Quite surprisingly, we show that when an adversary
chooses the stopping-time distribution with expected stopping time T, then
stationary plans are sufficient. While computing optimal stationary plans for a
fixed horizon is NP-complete, we show that computing optimal stationary plans
under adversarial stopping-time distribution can be achieved in polynomial
time. Consequently, our polynomial-time algorithm for adversarial stopping time
also computes an optimal plan among all possible plans.
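A stationary plan is simple enough to demonstrate directly: it fixes one outgoing edge per vertex, so executing it is repeated table lookup. The Python sketch below evaluates such a plan for a fixed horizon; the toy graph and all names are illustrative assumptions.

  def follow_stationary_plan(start, plan, weight, horizon):
      """Walk `horizon` steps from `start`, always taking the one outgoing
      edge that the stationary plan fixes for the current vertex, and
      return the accumulated sum of edge weights."""
      total, v = 0, start
      for _ in range(horizon):
          u = plan[v]                # the same choice every time v is visited
          total += weight[(v, u)]
          v = u
      return total

  # Toy graph: from `a` one can take a self-loop paying 1 or move to `b`
  # for free, where the self-loop pays 2.
  weight = {("a", "a"): 1, ("a", "b"): 0, ("b", "b"): 2}
  plan = {"a": "b", "b": "b"}        # stationary: one fixed edge per vertex
  print(follow_stationary_plan("a", plan, weight, 5))   # 0+2+2+2+2 = 8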
The Value 1 Problem Under Finite-memory Strategies for Concurrent Mean-payoff Games
We consider concurrent mean-payoff games, a very well-studied class of
two-player (player 1 vs player 2) zero-sum games on finite-state graphs where
every transition is assigned a reward between 0 and 1, and the payoff function
is the long-run average of the rewards. The value is the maximal expected
payoff that player 1 can guarantee against all strategies of player 2. We
consider the computation of the set of states with value 1 under finite-memory
strategies for player 1, and our main results for the problem are as follows:
(1) we present a polynomial-time algorithm; (2) we show that whenever there is
a finite-memory strategy, there is a stationary strategy that needs no memory
at all; and (3) we present an optimal bound (which is double exponential) on
the patience of stationary strategies (where the patience of a distribution is
the inverse of its smallest positive probability and serves as a complexity
measure of a stationary strategy).
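The patience measure is easy to state concretely; the function name and the example distribution below are illustrative, not from the paper.

  def patience(dist):
      """Patience of a probability distribution: the inverse of its
      smallest positive probability. Large patience means the strategy
      relies on very unlikely random choices, which makes it a natural
      complexity measure for stationary strategies."""
      return 1.0 / min(p for p in dist.values() if p > 0)

  # A stationary strategy playing `a` with probability 0.999 and `b` with
  # probability 0.001 has patience 1000.
  print(patience({"a": 0.999, "b": 0.001}))   # ~1000 (up to float rounding)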
Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes
We consider Markov decision processes (MDPs) with multiple limit-average (or
mean-payoff) objectives. There exist two different views: (i) the expectation
semantics, where the goal is to optimize the expected mean-payoff objective,
and (ii) the satisfaction semantics, where the goal is to maximize the
probability of runs such that the mean-payoff value stays above a given vector.
We consider optimization with respect to both objectives at once, thus unifying
the existing semantics. Precisely, the goal is to optimize the expectation
while ensuring the satisfaction constraint. Our problem captures the notion of
optimization with respect to strategies that are risk-averse (i.e., that ensure
a certain probabilistic guarantee). Our main results are as follows: First, we
present algorithms for the decision problems, which are always polynomial in the
size of the MDP. We also show that an approximation of the Pareto curve can be
computed in time polynomial in the size of the MDP and the approximation
factor, but exponential in the number of dimensions. Second, we present a
complete characterization of the strategy complexity (in terms of memory bounds
and randomization) required to solve our problem.
Comment: Extended journal version of the LICS'15 paper.
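In symbols, and restricted to one dimension for readability, the unified problem asks roughly for the following (a sketch with assumed notation; the paper's multi-dimensional formulation via the Pareto curve is more general):

  \[
    \max_{\sigma} \; \mathbb{E}^{\sigma}[\mathrm{mp}]
    \quad \text{subject to} \quad
    \Pr^{\sigma}\bigl[\,\mathrm{mp} \ge v\,\bigr] \;\ge\; \alpha ,
  \]

where mp is the mean payoff of a run, v the satisfaction threshold, α the required probability, and σ ranges over strategies.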
Average Case Analysis of the Classical Algorithm for Markov Decision Processes with Büchi Objectives
We consider Markov decision processes (MDPs) with ω-regular
specifications given as parity objectives. We consider the problem of computing
the set of almost-sure winning vertices from which the objective can be ensured
with probability 1. The algorithms for the computation of the almost-sure
winning set for parity objectives iteratively use the solutions for the
almost-sure winning set for Büchi objectives (a special case of parity
objectives). We study for the first time the average case complexity of the
classical algorithm for computing almost-sure winning vertices for MDPs with
B\"uchi objectives. Our contributions are as follows: First, we show that for
MDPs with constant out-degree the expected number of iterations is at most
logarithmic and the average case running time is linear (as compared to the
worst case linear number of iterations and quadratic time complexity). Second,
we show that for general MDPs the expected number of iterations is constant and
the average case running time is linear (again as compared to the worst case
linear number of iterations and quadratic time complexity). Finally, we also
show that, if all graphs are equally likely, the probability that the
classical algorithm requires more than a constant number of iterations is
exponentially small.
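For orientation, here is a Python sketch of the classical iterative algorithm the abstract analyzes, in its usual textbook form: alternately prune states that cannot reach the target and the random attractor of the pruned set, until a fixpoint is reached. The MDP encoding and all names are illustrative assumptions, and only the support of the probabilistic transitions matters for this qualitative analysis.

  def almost_sure_buchi(states, is_controller, succ, targets):
      """Almost-sure winning set for a Buchi objective in an MDP (sketch).
      `is_controller[s]` is True if the controller picks the successor at s
      (otherwise s is probabilistic); `succ[s]` is the successor set."""
      S = set(states)
      while True:
          # Backward search: which states of the current sub-MDP can still
          # reach a target state at all?
          pred = {s: set() for s in S}
          for s in S:
              for t in succ[s]:
                  if t in S:
                      pred[t].add(s)
          reach = {t for t in targets if t in S}
          frontier = list(reach)
          while frontier:
              t = frontier.pop()
              for s in pred[t]:
                  if s not in reach:
                      reach.add(s)
                      frontier.append(s)
          bad = S - reach
          if not bad:
              return S       # every remaining state wins with probability 1
          # Random attractor of `bad`: probabilistic states with some
          # successor inside, and controller states with all successors
          # inside, are also doomed; close under this rule, then remove.
          attr = set(bad)
          changed = True
          while changed:
              changed = False
              for s in S - attr:
                  succs = [t for t in succ[s] if t in S]
                  doomed = (all(t in attr for t in succs) if is_controller[s]
                            else any(t in attr for t in succs))
                  if succs and doomed:
                      attr.add(s)
                      changed = True
          S -= attr          # iterate on the smaller sub-MDP

  # Toy MDP: controller state `c` can go to target `t` or to probabilistic
  # state `p`, which may leak to the sink `z`; avoiding `p` wins.
  states = ["c", "t", "p", "z"]
  is_controller = {"c": True, "t": True, "p": False, "z": True}
  succ = {"c": {"t", "p"}, "t": {"c"}, "p": {"c", "z"}, "z": {"z"}}
  print(almost_sure_buchi(states, is_controller, succ, {"t"}))  # {'c', 't'}

Each round removes at least one state, which is exactly the source of the worst-case linear number of iterations and quadratic running time mentioned above.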
Sensor Synthesis for POMDPs with Reachability Objectives
Partially observable Markov decision processes (POMDPs) are widely used in
probabilistic planning problems in which an agent interacts with an environment
using noisy and imprecise sensors. We study a setting in which the sensors are
only partially defined and the goal is to synthesize "weakest" additional
sensors, such that in the resulting POMDP, there is a small-memory policy for
the agent that almost-surely (with probability~1) satisfies a reachability
objective. We show that the problem is NP-complete, and present a symbolic
algorithm by encoding the problem into SAT instances. We illustrate trade-offs
between the amount of memory of the policy and the number of additional sensors
on a simple example. We have implemented our approach, applied it to three
classical POMDP examples from the literature, and show that in all the examples
the number of sensors can be significantly decreased (compared to the
existing solutions in the literature) without increasing the complexity of the
policies.
Comment: arXiv admin note: text overlap with arXiv:1511.0845
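The synthesis question also has a simple brute-force reading that may help fix intuitions: enumerate candidate sensor subsets in increasing size and return the first one under which a small-memory almost-sure policy exists. The paper encodes this search into SAT instances instead; the sketch below replaces the policy-existence test with a placeholder, and all names are illustrative assumptions.

  from itertools import combinations

  def weakest_sensor_set(candidates, admits_policy):
      """Smallest subset of candidate sensors for which `admits_policy`
      succeeds, i.e. the resulting POMDP has a small-memory policy that
      almost surely reaches the target. Brute force for illustration only;
      the problem is NP-complete, which is why the paper uses SAT."""
      for size in range(len(candidates) + 1):
          for subset in combinations(candidates, size):
              if admits_policy(frozenset(subset)):
                  return frozenset(subset)
      return None                       # even all sensors together fail

  # Placeholder standing in for the real POMDP policy-existence check:
  # here, any subset containing sensor `s2` is assumed to suffice.
  demo_check = lambda sensors: "s2" in sensors
  print(weakest_sensor_set(["s1", "s2", "s3"], demo_check))  # frozenset({'s2'})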
