Open-ended Learning in Symmetric Zero-sum Games
Zero-sum games such as chess and poker are, abstractly, functions that
evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If
the game is approximately transitive, then self-play generates sequences of
agents of increasing strength. However, nontransitive games, such as
rock-paper-scissors, can exhibit strategic cycles, and there is no longer a
clear objective: we want agents to increase in strength, but against whom is
unclear. In this paper, we introduce a geometric framework for formulating
agent objectives in zero-sum games, in order to construct adaptive sequences of
objectives that yield open-ended learning. The framework allows us to reason
about population performance in nontransitive games, and enables the
development of a new algorithm (rectified Nash response, PSRO_rN) that uses
game-theoretic niching to construct diverse populations of effective agents,
producing a stronger set of agents than existing algorithms. We apply PSRO_rN
to two highly nontransitive resource allocation games and find that PSRO_rN
consistently outperforms the existing alternatives.
Comment: ICML 2019, final version
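The contrast between transitive and cyclic games drawn in this abstract can be illustrated on a finite population. The following is a minimal sketch, not the paper's PSRO_rN algorithm: it assumes an antisymmetric payoff matrix, rates each agent by its average payoff against the population, and treats whatever the rating differences cannot explain as the cyclic remainder. The function names and the example matrices are hypothetical.

```python
def transitive_part(payoff):
    """Payoffs explained by a scalar rating per agent (row-mean payoff)."""
    n = len(payoff)
    ratings = [sum(row) / n for row in payoff]
    return [[ratings[i] - ratings[j] for j in range(n)] for i in range(n)]


def cyclic_part(payoff):
    """Remainder after removing the rating-based (transitive) component."""
    n = len(payoff)
    trans = transitive_part(payoff)
    return [[payoff[i][j] - trans[i][j] for j in range(n)] for i in range(n)]


# Rock-paper-scissors: every agent has average payoff 0, so ratings say nothing.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

# A purely transitive game: payoff is the difference of assumed skill levels.
skills = (0, 1, 2)
skill_game = [[a - b for b in skills] for a in skills]

print(transitive_part(rps))     # all zeros: no rating explains the outcomes
print(cyclic_part(skill_game))  # all zeros: ratings explain everything
```

In the rock-paper-scissors case the whole matrix lands in the cyclic part, which is one way to see why "stronger against whom" has no single answer there.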
Arbitrarily primed PCR to type Vibrio spp. pathogenic for shrimp.
A molecular typing study on Vibrio strains implicated in shrimp disease outbreaks in New Caledonia and Japan was conducted by using AP-PCR (arbitrarily primed PCR). It allowed rapid identification of isolates at the genospecies level and studies of infraspecific population structures of epidemiological interest. Clusters identified within the species Vibrio penaeicida were related to their area of origin, allowing discrimination between Japanese and New Caledonian isolates, as well as between those from two different bays in New Caledonia separated by only 50 km. Other subclusters of New Caledonian V. penaeicida isolates could be identified, but it was not possible to link those differences to accurate epidemiological features. This contribution of AP-PCR to the study of vibriosis in penaeid shrimps demonstrates its high discriminating power and the relevance of the epidemiological information provided. This approach would contribute to better knowledge of the ecology of Vibrio spp. and their implication in shrimp disease in aquaculture.
Approximate dynamic programming for two-player zero-sum Markov games
This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is $\frac{2\gamma\epsilon + \epsilon'}{(1-\gamma)^2}$-optimal, where $\epsilon$ is the value function approximation error and $\epsilon'$ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decision Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
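For orientation, the exact tabular counterpart of the approximate schemes analysed here is value iteration with a minimax backup at every state. The sketch below is not AGPI-Q and involves no function approximation: it assumes a tiny game with two actions per player in every state, so that each stage game is a 2x2 matrix game with a closed-form value. The data layout (`rewards[s][i][j]`, `transitions[s][i][j][t]`) and the function names are hypothetical.

```python
def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game for the row (maximizing) player."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        return maximin  # pure-strategy saddle point
    # No saddle point: both players mix, and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def minimax_value_iteration(rewards, transitions, gamma, iterations=200):
    """Repeatedly apply the minimax Bellman backup to every state.

    rewards[s][i][j]: reward to the row player in state s for actions (i, j).
    transitions[s][i][j][t]: probability of moving from s to t under (i, j).
    """
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(iterations):
        new_values = []
        for s in range(n_states):
            stage_game = [
                [
                    rewards[s][i][j]
                    + gamma * sum(p * values[t]
                                  for t, p in enumerate(transitions[s][i][j]))
                    for j in range(2)
                ]
                for i in range(2)
            ]
            new_values.append(matrix_game_value(stage_game))
        values = new_values
    return values


# One-state example: matching pennies repeated forever, discounted by gamma.
pennies_rewards = [[[1, -1], [-1, 1]]]
self_loop = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
print(minimax_value_iteration(pennies_rewards, self_loop, gamma=0.9))
```

Larger action sets would need a linear program for each stage game in place of the 2x2 closed form.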
A multi-agent reinforcement learning model of common-pool resource appropriation
Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find socially positive equilibria, a phenomenon called the tragedy of the commons. However, in reality, human societies are sometimes able to discover and implement stable cooperative solutions. Decades of behavioral game theory research have sought to uncover aspects of human behavior that make this possible. Most of that work was based on laboratory experiments where participants only make a single choice: how much to appropriate. Recognizing the importance of spatial and temporal resource dynamics, a recent trend has been toward experiments in more complex real-time video game-like environments. However, standard methods of non-cooperative game theory can no longer be used to generate predictions for this case. Here we show that deep reinforcement learning can be used instead. To that end, we study the emergent behavior of groups of independently learning agents in a partially observed Markov game modeling common-pool resource appropriation. Our experiments highlight the importance of trial-and-error learning in common-pool resource appropriation and shed light on the relationship between exclusion, sustainability, and inequality.
Pulmonary haemorrhage as a predominant cause of death in leptospirosis in Seychelles
We examined the cause of death during a 12-month period (1995/96) in all consecutive patients admitted to hospital with leptospiral infection in Seychelles (Indian Ocean), where the disease is endemic. Leptospirosis was diagnosed by use of the microscopic agglutination test and a specific polymerase chain reaction assay on serum samples. Seventy-five cases were diagnosed and 6 patients died, a case fatality of 8%. All 6 patients died within 9 days of onset of symptoms and within 2 days of admission for 5 of them (5 days for the 6th). On autopsy, diffuse bilateral pulmonary haemorrhage (PH) was found in all fatalities. Renal, cardiac, digestive and cerebral haemorrhages were also found in 5, 3, 3 and 1 case(s), respectively. Incidentally, haemoptysis and lung infiltrate on chest radiographs, which suggest PH, were found in 8 of the 69 non-fatal cases. Dengue and hantavirus infections were ruled out. In conclusion, PH appeared to be a main cause of death in leptospirosis in this population, although haemorrhage in other organs may also have contributed to fatal outcomes. This cause of death contrasts with the findings generally reported in endemic settings.
Navigating the Landscape of Multiplayer Games
Multiplayer games have long been used as testbeds in artificial intelligence
research, aptly referred to as the Drosophila of artificial intelligence.
Traditionally, researchers have focused on using well-known games to build
strong agents. This progress, however, can be better informed by characterizing
games and their topological landscape. Tackling this latter question can
facilitate understanding of agents and help determine what game an agent should
target next as part of its training. Here, we show how network measures applied
to response graphs of large-scale games enable the creation of a landscape of
games, quantifying relationships between games of varying sizes and
characteristics. We illustrate our findings in domains ranging from canonical
games to complex empirical games capturing the performance of trained agents
pitted against one another. Our results culminate in a demonstration leveraging
this information to generate new and interesting games, including mixtures of
empirical games synthesized from real-world games.
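To make the notion of a response graph concrete, here is a minimal sketch under one common definition, which may differ in detail from the construction used in the paper. Nodes are pure strategy profiles of a two-player normal-form game, and a directed edge points to any profile that one player can reach by deviating alone and strictly improving their own payoff. Counting sink nodes (pure Nash equilibria) stands in for the richer network measures the abstract refers to. All names and example payoffs are hypothetical.

```python
from itertools import product


def response_graph(row_payoff, col_payoff):
    """Map each pure profile (i, j) to the profiles reachable by one
    strictly improving unilateral deviation."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    edges = {profile: [] for profile in product(range(n_rows), range(n_cols))}
    for i, j in edges:
        for i2 in range(n_rows):  # row player deviates
            if row_payoff[i2][j] > row_payoff[i][j]:
                edges[(i, j)].append((i2, j))
        for j2 in range(n_cols):  # column player deviates
            if col_payoff[i][j2] > col_payoff[i][j]:
                edges[(i, j)].append((i, j2))
    return edges


def sinks(edges):
    """Profiles with no improving deviation, i.e. pure Nash equilibria."""
    return [profile for profile, out in edges.items() if not out]


# Prisoner's dilemma (action 0 = cooperate, 1 = defect).
pd_row = [[3, 0], [5, 1]]
pd_col = [[3, 5], [0, 1]]

# Rock-paper-scissors as a zero-sum game.
rps_row = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
rps_col = [[-x for x in row] for row in rps_row]

print(sinks(response_graph(pd_row, pd_col)))
print(sinks(response_graph(rps_row, rps_col)))
```

The prisoner's dilemma graph funnels into a single sink, while the rock-paper-scissors graph has none, which is the kind of structural difference graph measures can pick up.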
A Generalised Method for Empirical Game Theoretic Analysis
This paper provides theoretical bounds for empirical game theoretical
analysis of complex multi-agent interactions. We provide insights in the
empirical meta game showing that a Nash equilibrium of the meta-game is an
approximate Nash equilibrium of the true underlying game. We investigate and
show how many data samples are required to obtain a close enough approximation
of the underlying game. Additionally, we extend the meta-game analysis
methodology to asymmetric games. The state-of-the-art has only considered
empirical games in which agents have access to the same strategy sets and the
payoff structure is symmetric, implying that agents are interchangeable.
Finally, we carry out an empirical illustration of the generalised method in
several domains, illustrating the theory and evolutionary dynamics of several
versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel
Blotto game played by human players on Facebook (symmetric), and an example of
a meta-game in Leduc Poker (asymmetric), generated by the PSRO multi-agent
learning algorithm.
Comment: will appear at AAMAS'1
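The question of how many samples are needed to approximate the underlying game can be given a rough feel with a standard Hoeffding-style calculation. This is a generic sketch and not the bound derived in the paper: it assumes each payoff entry is estimated from independent samples with a known bounded range, and applies a union bound across entries. The function and parameter names are hypothetical.

```python
import math


def samples_per_entry(epsilon, delta, payoff_range=1.0, num_entries=1):
    """Samples per payoff entry so that, with probability at least 1 - delta,
    every empirical mean lies within epsilon of its true value.

    Uses Hoeffding's inequality for bounded samples, with the failure
    probability split evenly over num_entries (union bound).
    """
    per_entry_delta = delta / num_entries
    needed = payoff_range ** 2 * math.log(2.0 / per_entry_delta) / (2.0 * epsilon ** 2)
    return math.ceil(needed)


# Example: a 3x3 meta-game with payoffs in [0, 1], accuracy 0.1, confidence 95%.
print(samples_per_entry(epsilon=0.1, delta=0.05, num_entries=9))
```

The count grows with the inverse square of the accuracy but only logarithmically with the number of entries, so larger meta-games are cheap to cover relative to tighter accuracy.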
