Friday, January 8, 2016

Complex dynamics in learning complicated games (replicating Galla & Farmer 2012)

I have written a NetLogo version of the random game model of Galla & Farmer (2012) (free download).  It has been uploaded to the NetLogo community library and should appear in a day or so.  Read on if you are interested in Game Theory, esp. learning models and computational methods.

Chaotic dynamics in a complicated game. The payoffs are negatively correlated (Γ = -0.7) and memory for learning is long (α ≈ 0). Notice the squiggly lines in the time-series plots (lower right). Each line is the probability of a given move.
If the game were in equilibrium, these lines would be flat.
Download the .nlogo file 
Download NetLogo (Win, Mac, Linux)

Defeating the Analytical Atom Smasher

The "atom smasher" tool for most of Game Theory is equilibrium analysis. (Same goes for Mathematical Economics.)  In a nutshell, equilibrium analysis focuses on where the game eventually ends up after all the transient changes either die down or work themselves out.  Equilibrium analysis is performed through analytical mathematics (theorems, proofs, etc.) to prove the existence of equilibria and their characteristics.  Generally, any problem that can't be analyzed analytically is avoided or modified so that it does fit the "atom smasher" tool.

But what about complicated games? Do they defeat the "atom smasher"?  Complicated games have very large strategy spaces.  Examples include chess, go, and cyber security (viewed as an n-person mixed cooperative + non-cooperative game).  If complicated games exhibit stable equilibria under realistic conditions for player rationality, then we can probably use the tools of equilibrium analysis.  If, on the other hand, complicated games never settle down to equilibria, then we must abandon equilibrium analysis and instead focus on dynamical learning and adaptation.

This is the research focus of a paper by Galla & Farmer, published in PNAS (a very prestigious general-science journal):
Galla, T., & Farmer, J. D. (2013). Complex dynamics in learning complicated games. Proceedings of the National Academy of Sciences, 110(4), 1232–1236. http://doi.org/10.1073/pnas.1109672110
The two versions of the paper have 22 and 7 citations so far.

Background: Evolutionary Games

In Game Theory models, two or more players interact through "moves" (a.k.a. "strategies"), and the payoff each player gets depends both on their own move and on the moves of the other players.  In evolutionary games, players play the same game over and over and adjust their strategies as they go by some mechanism. The research goal of evolutionary games is to see if "optimal play" or "good play" or "desirable play" can be achieved through evolutionary processes.  The most famous example is the iterated Prisoner's Dilemma game.

In most settings of Evolutionary Game Theory, players know about each other's possible moves, and usually also about each other's payoffs given the moves.  Sometimes there is uncertainty or imperfect information, but not about the structure of the game itself.  Nearly always, the games studied are "analytically tractable" (i.e. they can be analyzed rigorously using the "atom smasher" of equilibrium analysis).  Evolutionary games are often studied computationally to see which strategies survive in competition with a population of other strategies.

Galla & Farmer's Model

Galla & Farmer set out to study games whose strategy space is too large both for rigorous equilibrium analysis and for players to solve within their cognitive limits (a.k.a. bounded rationality).  For this, they study two-person games with a large number of possible moves (N = 50 or greater). (They claim their methods hold for M-person games, but they don't model that case or present results.)

The payoff matrix for the two-person game is set randomly, with the correlation between the players' payoffs set by a parameter Γ (Gamma).  Positive correlation means that the game is cooperative, whereas negative correlation makes it non-cooperative.  With perfect negative correlation, the game is zero-sum (what Player A loses, Player B gains, and vice versa).
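
The paper doesn't spell out the sampling code, but a standard way to generate a pair of Gaussian payoffs with correlation Γ is to mix two independent standard normals. Here is a minimal NetLogo sketch of that idea; it is my own construction, not code from Galla & Farmer's supplement, and the reporter name and unit-variance scaling are assumptions:

  ;; Draw one correlated pair: the payoff to A for the move pair (i, j) and
  ;; the payoff to B for the mirrored pair (j, i).  corr is the Gamma parameter.
  to-report correlated-payoff-pair [ corr ]
    let u random-normal 0 1
    let v random-normal 0 1
    let pay-a u
    let pay-b (corr * u) + (sqrt (1 - corr ^ 2)) * v   ;; correlation between pay-a and pay-b is corr
    report (list pay-a pay-b)
  end

Calling this once per cell (i, j) fills in A's payoff for (i, j) and B's payoff for (j, i), so Γ = -1 gives exactly opposed payoffs and Γ = 1 gives identical ones.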

Why random games? This is a research strategy: by drawing payoffs at random they can systematically study a very wide range of games, looking for results that don't depend on the detailed structure of any particular game (i.e. on specific payoff structures).  This contrasts with much of traditional Game Theory, which analyzes games with very specific payoff structures (e.g. Prisoner's Dilemma).

Players choose their move for each round probabilistically.  Initially, the probability for all N moves is equal, so the first choices are the spin of a fair roulette wheel.  With experience and learning, the probabilities change.
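
For concreteness, here is a minimal sketch of roulette-wheel selection over a list of move probabilities. This is my own illustration, not the model's actual code; the reporter name and the use of a plain list are assumptions:

  ;; Report the index of a move drawn according to the probabilities in probs.
  ;; probs is assumed to be a list of N numbers that sum to 1.
  to-report pick-move [ probs ]
    let r random-float 1
    let i 0
    let total item 0 probs
    while [ total < r and i < ((length probs) - 1) ]
    [
      set i i + 1
      set total total + item i probs
    ]
    report i
  end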

The main contribution of their model is the "experience weighted learning" model for updating move probabilities.  Experience weighted learning incorporates two factors:

  • Past probabilities for a given move
  • An expectation of future payoff for a given move, based on knowledge of the other player's move probabilities.
Note that players don't know each other's payoffs, so this excludes many Game Theory reasoning methods that attempt to deduce "rational play" by opponents.
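
My reading of the learning rule (from the paper and its supplement) is that each player keeps a running "attraction" score per move, discounts it by (1 - α) each round, adds the expected payoff of that move against the opponent's current mixed strategy, and then converts attractions to probabilities with a softmax of intensity β. A rough NetLogo sketch of one update for Player A follows; the variable names (q-a, probs-a, probs-b, payoff-a) and the NetLogo 6 list syntax are my assumptions, not the uploaded model's code:

  ;; q-a: A's list of attractions, one per move
  ;; probs-b: B's current move probabilities
  ;; payoff-a: A's N x N payoff matrix, as a list of rows
  to update-player-a
    ;; expected payoff of each of A's moves against B's mixed strategy
    let expected map [ row -> sum (map [ [pay pr] -> pay * pr ] row probs-b) ] payoff-a
    ;; discount old attractions by (1 - alpha) and add the new expected payoffs
    set q-a (map [ [q new-pay] -> (1 - alpha) * q + new-pay ] q-a expected)
    ;; softmax: convert attractions to move probabilities with intensity beta
    let weights map [ q -> exp (beta * q) ] q-a
    let z sum weights
    set probs-a map [ w -> w / z ] weights
  end

With a rule of this form, α = 0 keeps all of the accumulated attraction (long memory) while α = 1 throws it away each round, which matches the parameter descriptions later in the post.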

What Galla & Farmer are looking for is whether these random complicated games settle down into stable move probabilities (i.e. equilibria) or, if they don't settle down, what the dynamics look like.  Being physicists, their analysis of the results draws on dynamical systems methods associated with Complexity Science (incl. Chaos Theory, strange attractors, etc.).

NetLogo Replication

I couldn't find code anywhere for Galla & Farmer, so I wrote the NetLogo model from scratch, drawing on their paper and the supplement.  I've done some manual tests to see if this model replicates their results (roughly/qualitatively), and so far the replication tests have been successful.

I won't go through the instructions for running the model, because those are found in the model itself ("Info" tab).  Here, I'll describe some of the main features and show some sample results.

Colored Grid = Payoff Matrix

The grid is simply a color-coded version of the two-person payoff matrix.  Each of the N rows is a move for Player A, and each of the N columns is a move for Player B.  The color of each cell encodes the two payoffs for A and B.

Uncorrelated payoffs. Γ = 0 (top)
Positively correlated payoffs. Γ = 1 (middle)
Negatively correlated payoffs. Γ = -1 (bottom)
The color code for each cell is:

  • Red = positive for A, negative for B
  • Green = positive for B, negative for A
  • Yellow = positive for both A and B
  • Dark Brown = negative for both A and B

When Gamma is set to -1.0 (perfect negative correlation), you will only see shades of red or green (see bottom image, left). When Gamma is set to 1.0 (perfect positive correlation) you will only see shades of yellow (to olive and dark brown) (see middle image, left). When Gamma is set to zero, you will see the full range of colors (top image, left).
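
One simple way to get this color scheme (a guess at the idea, not necessarily how the uploaded model does it) is to map A's payoff to the red channel and B's payoff to the green channel, so both-positive cells come out yellow and both-negative cells come out dark. A hypothetical NetLogo helper:

  ;; Report an RGB list for a cell, given the two payoffs and the largest
  ;; payoff magnitude in the matrix (used for scaling).
  to-report payoff-color [ pay-a pay-b max-pay ]
    let red-part clamp (127 + 128 * (pay-a / max-pay))
    let green-part clamp (127 + 128 * (pay-b / max-pay))
    report (list red-part green-part 0)   ;; NetLogo accepts [r g b] lists as colors
  end

  to-report clamp [ v ]
    report max (list 0 (min (list 255 v)))
  end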

The black square on the grid highlights the current payoff, given the move choices of agents A and B. The line drawn links the current payoff to the payoff in the previous tick.

Players can't "see" this grid and have no memory of where they've been (i.e. their history of past moves and those of opponents).

Thus, the grid is mainly "eye candy" for the experimenter, and maybe useful in explaining what's going on in a particular run.

Dynamical Behavior

In simple terms, both players are trying to find moves that are mutually positive (i.e. yellow cells), because those are most likely to be sustained as (single-move) equilibria.  If they can't home in on an ideal single-move strategy, they will try to work out a probability distribution over moves that, on average, yields positive results for both players.

The reason for this positive mutualism, even in zero-sum games, is that any player experiencing negative outcomes is going to change their move preferences.  If I'm Player A, and I've found a good set of move preferences for me, I'd like Player B's preferences to be stable so that I can, more or less, count on what moves Player B will make.

There is a graph of Cumulative Payoffs.  Think of cumulative payoff as a random walk whose step distribution slowly changes as the move probabilities change.  This graph is diagnostic and is not used by Galla & Farmer in their analysis or results.

Model Parameters

Other than the parameters that control the structure of the game, the main parameters control player learning.   Important: learning operates on each possible move separately.  There is no learning across moves and no (direct) learning of sequences. The learning parameters are:

  • alpha (α) controls how much past history is factored into learning. α = 1.0 means no history is considered, while α = 0.0 means past history is fully included (i.e. not discounted with time). 
  • beta (β) controls the "intensity" of choice toward higher-probability moves.  β = 1.0 is fully deterministic, in that the highest-probability move is always selected, even if it is only 0.00001 higher in probability than the other moves. β = 0.0 is fully random, meaning relative probabilities don't matter at all: all moves have equal probability.  Galla & Farmer do their main experiments using β = 0.07 (see the worked example below).
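
To get a feel for β, plug two attractions that differ by 1 into a softmax of the form exp(β · Q) (this is my own illustrative arithmetic, not a result from the paper). With β = 0.07 the probability ratio between the two moves is exp(0.07) ≈ 1.07, so the better move is chosen only about 52% of the time; with β = 5 the ratio is exp(5) ≈ 148, and the better move is chosen more than 99% of the time. Small β keeps the players exploring, while large β locks them onto whatever currently looks best.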

Sample Results

There are basically three classes of dynamical behavior, depending on the parameter values.

1) Single Equilibrium -- When alpha (α) is high and Gamma (Γ) > 0, the game always settles to a single, unique equilibrium.  Here is what this looks like for two different random seeds:



Notice how flat the time series lines are.  Also notice that the two sets of Move Probability bar graphs (top right in each) are nearly identical. 

2) Multiple Equilibria -- When alpha (α) is low and Gamma (Γ) > 0, the game always settles to an equilibrium, but to one of several. There is no longer a single unique equilibrium.  Here is what that looks like:



Notice that there are two strong peaks in the move probabilities and that the time series lines are wavy, indicating regular cycling through these preferred moves.

3) Low- and High-Dimensional Chaos -- When alpha (α) is very low and Gamma (Γ) is low (below 0.5, roughly), the move probabilities never settle down.  With smaller alpha (i.e. longer memory) and smaller Gamma (i.e. an ever more non-cooperative payoff structure), the chaos gets more and more wild.  Here are two examples:




So What? (Implications)

When games don't settle to equilibria, then many of the usual "rational" methods will fail or will produce poor results.  Think "optimization" (as in "optimal investment" or "optimal policy" or "optimal practices").  Think "strategic planning" where the plans are based on a static rational model of competitor behavior.

How hard would these games be to "figure out" if you had some powerful data and models?  Galla & Farmer go into that when they estimate the "dimensionality" of the chaotic attractor.  Basically, very low dimensional oscillations and a little chaos can be modeled successfully if you know both recent and long-term history.  In contrast, very high dimensional chaos will not yield to data and modeling, no matter how Big your Data is.

My Interests

For myself, I'm interested in this model for use in my dissertation.  Not directly, but after modification and incorporation into a larger, multilevel modeling system.  I'll be making extensions to the base model to better fit the cyber security setting. Maybe I'll get it done in time to submit to WEIS 2016.


