Friday, January 22, 2016

Time & Uncertainty (2nd post: "What kind of game is cyber security investment?")

Summary: Time and uncertainty are essential features of any model of the "game of cyber security".  Models that do not include them as central features are not fit for purpose.  But, yes, they do make life more difficult for modelers and their audiences. While I make the case that both are essential, I leave open the question as to what is the most parsimonious method or treatment.

Most Theoretical Models Leave Out Time and Uncertainty

If you have ever read any theoretical economics papers on cyber security or information security, you may have noticed that nearly all feature instantaneous payoffs on decisions and investments.  Some do not include an independent variable for time at all, preferring instead to focus on the existence and properties of equilibria, which is "where the system will end up" after all the transient dynamics play themselves out.  Others may model some aspects of time (e.g. sequential decisions, or sequences of actions), but most do not build a fully dynamical model of the system where the system state is crucially dependent on time.

Likewise for uncertainty.  It is rarely a central feature of the model.  In many Game Theory models, there is no uncertainty in the payoff structure or in the number or range of "moves" available to each player. In Neo-classical Economics, it is widely asserted that the uncertain payoffs of any lottery can be substituted by the expected value (i.e. mean) of that lottery.  If uncertainty is included in a model, it is often in  some "nice" and "tame" ways such as random variables with known (or knowable) mean, variance, and distribution form.  But uncertainty that is central to cyber security is not always nice and tame in this way.
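To make the expected-value substitution concrete, here is a minimal sketch. The two loss "lotteries" and all of their numbers are hypothetical, chosen only so that both have the same mean. It shows that the mean alone cannot tell a routine loss profile from a rare, catastrophic one.

```python
# Hypothetical annual loss lotteries as (probability, loss) pairs.
# The numbers are illustrative assumptions, not data.
routine = [(0.5, 0.0), (0.5, 200_000.0)]
catastrophic = [(0.999, 0.0), (0.001, 100_000_000.0)]


def expected_loss(lottery):
    """Mean loss: the single number that replaces the whole lottery."""
    return sum(p * loss for p, loss in lottery)


def worst_case(lottery):
    """Largest loss that has non-zero probability."""
    return max(loss for p, loss in lottery if p > 0)


def prob_loss_exceeds(lottery, threshold):
    """Probability that the loss is strictly greater than `threshold`."""
    return sum(p for p, loss in lottery if loss > threshold)


for name, lottery in (("routine", routine), ("catastrophic", catastrophic)):
    print(name, expected_loss(lottery), worst_case(lottery),
          prob_loss_exceeds(lottery, 1_000_000.0))
```

Both lotteries report the same expected loss, yet only one of them can produce a loss large enough to end the organization. And this is still the "tame" case, where the probabilities are assumed to be known.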

The rest of this post explores these topics and makes the case that time and uncertainty are essential features of any model of the "game of cyber security". (In this post, I focus only on information security and exclude other components of cyber security, e.g. digital privacy, digital content rights, digital civil liberties, digital IP protection, digital national/homeland security, etc.  This allows me to talk about "attackers" and "defenders" in a fairly simple way.)

Ignorance and Uncertainty in Information Security

As I wrote in an early blog post:
"Ignorance and uncertainty come in many and varied species.  To fully embrace them (i.e. model them, analyze them, make inferences about them), you can't subsume them into a couple of categories."
Here is a taxonomy I created that shows categorical relationships between many of the species:
(What I've labeled as "Stochastic Probability" might better be labeled "Variability".)

Most models of information security include variability, specifically about the timing and severity of loss events, as if actors in the system knew or could know the point probabilities or probability distributions that characterized this variability.  This is far from a simple and obvious task, even with a lot of experience and a lot of data.  Even if we aim to estimate "Big 'R' Risk", we will be grappling with many of these other species of ignorance and uncertainty during the course of our estimation process.

Some models include incomplete information, and a very few include erroneous information (esp. in the context of signaling).  But as you can see from the taxonomy above, this only scratches the surface of the "rain forest". Full exploration would take another blog post (or two or three), but I'll offer examples I can imagine to stimulate your thinking:

  • Delusion -- "They passed their PCI audit, so we know they are secure."
  • Absence -- The six hundred and twenty seven questions you DID NOT ask in your vendor risk assessment questionnaire.
  • Fuzziness -- "Yellow" risk, on a red-yellow-green ordinal scale
  • Uncomputable -- Algorithm that perfectly filters out bad input for Turing-complete input languages (i.e. filters out all SQL-i).
  • Infidelity -- IT asset inventories that fail to account for "Shadow IT".
  • Taboo -- executive fraud facilitated by or executed through information security breaches.
  • Inconceivable -- your largest IT vendor is pwned top to bottom.

Time in Information Security

I have argued elsewhere that we cannot even define "security" and "risk" without the context of time. From that blog post:

  • Security is a judgment about the present, relative to possible and probable futures.
  • (Economic) Risk is the cost of the future valued in the present.
  • (Empirical, ground-truth) Data tells you about the state of the world in the past through observation and measurement processes. Only by acts of inference can you connect your data with the present state of the world or the future state.
  • Investments are commitments of resources in the present with hopes of shaping a positive future.  (Past investments are "sunk" and, except for their lingering effects, are gone and irrelevant for decisions.)

Security is not a state of the world or collection of states.  Some states of the world are bad/undesirable/negative, and we call these "breaches".  Those we can get data about (if we are good, or lucky, or both).  Yes, we can look back in time on a system and comment about whether it was "secure" at any moment of time, in light of events as they transpired.  But here the word "secure" only means the provisional, contingent assertion that "it hasn't been breached yet". Once a breach event has happened, the "security" cows are out of the barn, so to speak. Therefore, we can't get data about security directly.  We can only get data on states of the world that inform our judgment about security.

Let me illustrate this with a simple example.

Imagine you have a clock and a box with a glass top.  Your "information system" is the clock and your "security" is the box. Outside of the box, you worry that someone (even Nature) might tamper with the clock to render it less than fully functional. So, at $t = 1$, you put the clock in the box -- face up -- and then close the lid on the box.  Therefore, after $t = 1$ the state of nature is the clock is inside the box with the lid closed. Now at  $t = 2$ you put the box-with-clock on to a table in a room and leave the room, closing the door.  Questions:

  1. Was the clock secure at $t = 0$? (clock outside of box)
  2. Was the clock secure at $t = 1$? (clock in box with closed lid)
  3. Was the clock secure at $t = 2$? (box-with-clock on table, alone in room with closed door)
Looking backward from $t = 2$, we can answer "yes" to questions 1 and 2 because there was no breach.  Maybe we might give a qualified answer of yes -- a "yes...but..." -- where we talk about what might have happened (i.e. possibilities and probabilities) that motivated us to take those steps.

But how can we answer question 3?  I assert that the only way to answer question 3 is to make inferences about what might happen later in time ($t > 2$) and how those states of affairs might or might not change our judgment.  What might inform our judgment?  How about a lock for the box?  Or even a clasp?  Or maybe the top can be glued on?  How about hard-to-break glass?  Or something transparent besides glass?  How about shock-absorbing lining in the box?  And so on.

All of these considerations are tied to what might in the future go wrong with the clock-in-the-box, including some hints at the threat agent (Nature?  An attacker?  Myself, as an "insider threat" prone to error?).  This example shows, I hope, that any statement or measure of security is a contextual judgment, set both in time and in uncertainty.

Time and Information in Risky Investments

The reality about any risky investment: 
  • You make investment decisions today but have to wait until future events unfold before you realize the payoffs, or even know what the payoffs are.
Instead of abstracting investment decisions as bets on lotteries with known payoff distributions, what if we view them as multi-armed bandit games, where we do not know the distribution parameters or shape for each "bandit" (i.e. slot machine)?  Also, let's assume that our "bandits" do not deliver their payoff immediately and reliably after we pull the handle, but some unknown amount of time later.

This presents us with the "exploration-exploitation dilemma" in decision theory.  Do I "exploit" the "bandits" that I know the best from past experience, or do I "explore" by putting coins into new "bandits" in the hope that they will pay off better?
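The bandit framing with delayed payoffs can be sketched in a few lines. This is a minimal illustration, not a recommended investment procedure. It uses an epsilon-greedy rule, which is just one simple way to trade off exploring and exploiting. The function name, the hidden payoff probabilities, and the per-arm delays are all hypothetical. The key feature is that the decision-maker can only learn from payoffs that have already arrived.

```python
import random


def run_delayed_bandit(true_means, delays, rounds=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy play where each payoff arrives `delays[arm]` rounds late.

    `true_means` are hidden Bernoulli payoff probabilities (hypothetical);
    the player never reads them, it only sees payoffs that have arrived.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms      # decisions made per arm
    observed = [0] * n_arms   # payoffs that have actually arrived per arm
    totals = [0.0] * n_arms   # sum of arrived payoffs per arm
    pending = []              # (arrival_round, arm, payoff) still in flight

    for t in range(rounds):
        # Book only the payoffs whose arrival round has passed.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, arm, payoff in arrived:
            observed[arm] += 1
            totals[arm] += payoff

        if rng.random() < epsilon or sum(observed) == 0:
            arm = rng.randrange(n_arms)  # explore
        else:
            # Exploit the best estimate; arms with no arrived payoffs are
            # scored 0.0, a pessimistic default chosen for simplicity.
            estimates = [totals[a] / observed[a] if observed[a] else 0.0
                         for a in range(n_arms)]
            arm = max(range(n_arms), key=estimates.__getitem__)

        pulls[arm] += 1
        payoff = 1.0 if rng.random() < true_means[arm] else 0.0
        pending.append((t + delays[arm], arm, payoff))

    return pulls, observed, len(pending)


pulls, observed, in_flight = run_delayed_bandit(
    true_means=[0.2, 0.5, 0.8], delays=[0, 50, 200], rounds=1000, seed=1)
print("pulls per arm:   ", pulls)
print("observed per arm:", observed)
print("payoffs still in flight at the end:", in_flight)
```

In this sketch the arm with the longest delay reports nothing for its first 200 rounds, so the estimates are built from the faster-reporting arms in the meantime. Changing the delays and watching how the pull counts shift is a quick way to see how much the timing of feedback, and not only its size, shapes the decisions.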

As mentioned above, most economists treat the time delay between investment and payoff as a non-problem by substituting the expected value (mean) of payoffs (or, specifically, the subjective expected utility of payoffs).  (There are heterodox economists who do not take this position, notably the Austrians.  See: The Economics of Time and Ignorance, and also here and here.)

Here's why they miss the boat and why we can't do this in cyber security.
  • Non-stationarity and coevolution -- where the payoff structure changes in ways that are not easily predictable. For example, if many defenders invest in A, then attackers might shift their resources and strategies to attacks that circumvent A, thereby reducing the "payoff" to A. 
  • Reflexivity -- where investment activity changes the payoff structure.  This includes investments with network effects and "circularity".  (FYI, George Soros wrote a book about this in the 1980s with regard to reflexivity in financial markets.  See: book, article, article, 1 hr video)
  • Dependence between investments -- Investing in A, B, and C might improve security, but investing in D reduces the effectiveness of A, B, or C.  In networks of systems this can manifest as "self-organized criticality" or "highly optimized tolerance".  Small triggers can lead to large, cascading outcomes.  There can also be sequential dependence:  I have to invest in A first, then B, and then C or D, but not in any arbitrary order, say, D - B - C - A.
  • "Unruly" ignorance and uncertainty -- beyond "variational" uncertainty that is treated in subjective expected utility theory.  (See above.)
I assert that there is an even starker reality about investments in cyber security:
  • The future may unfold in such a way that you might never know whether you made the right investment decisions from empirical data alone.
This is the curse of trying to protect against very-low-probability/high-impact loss events.  Even though we do everything we can to avoid them, loss events (and near misses) provide the most solid, rich empirical data about our information security investments and their effectiveness. But given their low probability, the worst-case loss events may never happen to you or to me.  In other words, a given path of history might yield data that shed no useful light on our investment decisions.  Accepting this stark reality means that we cannot rely on past history and experience alone.  We can't be purely data-driven, using only historical empirical data.  We have to be forward looking, which means thinking about both states of the world and also time as the essential space in which those states unfold.
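A back-of-the-envelope calculation shows how likely such an uninformative history is. It rests on simplifying assumptions introduced here purely for illustration: the worst-case loss event occurs independently each year, with a fixed annual probability $p$. (Both assumptions are generous, given the non-stationarity discussed above.) The probability of seeing no such event in $n$ years is then

$$P(\text{no event in } n \text{ years}) = (1 - p)^n$$

For a hypothetical $p = 0.01$ and $n = 10$, this gives $(0.99)^{10} \approx 0.904$. Under these assumptions, roughly nine out of ten organizations would go a full decade without ever observing the event they are investing to prevent, whether or not their investments were the right ones.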

Time and Information in Dynamical Systems

Dynamical systems are those where the current state is a function of changes per unit (or instance) of time.  They are all about interrelated rates of change. (FYI, that is what Calculus is all about, so it is the tool of choice to model most dynamical systems.) A simple example of a dynamical system is a pendulum where the state of the system is the position $x$ and velocity $v$ and the only force of change is gravity.  Another simple example is a modern toilet, which has a mechanical control system to restore the water to a specific level in the tank after it is flushed.   Here's a nice graphic from a somewhat more complicated model:
Dynamic stock and flow diagram of model New product adoption (model from article by John Sterman 2001)
(Image from Wikimedia)

Every dynamical system can be viewed in terms of signals flowing through a network, where nodes in the network "do something" or "have something done to them" (crudely speaking).  What happens when there is a time delay between nodes in the action/result network?  A: complicated things might happen!  This phenomenon (and others, like feedback) is studied in the field called System Dynamics.  (See here and here.)

What follows is an example to dramatize this point:  even simple time delays can confound our intuition and beliefs about system behavior.  The schematic below shows a very simple dynamical system -- a "balancing process" -- that has behavior similar to a household thermostat and heating/cooling system.

Balancing process system archetype
(Image from Wikimedia)

If we interpret this as a model for a thermostat/heating/cooling system, then the "actual conditions" are the difference between the temperature set point and the current air temperature.  The "corrective action" is turning on either the heater or the air conditioner to heat or cool the air.  Notice that there are two signals in the system: 1) between the air temperature difference and the heating/cooling mechanism (right arrow), and 2) between the heating/cooling mechanism and the thermometer, mediated by the air (left arrow).

Let's say we walk into a very cold room (0 degrees F) and set the thermostat to 72 F.  How does the system behave? Our intuition says: when it is too cold in the room, the heater turns on until the temperature reaches the set point, then the heater turns off, and vice versa if it is too hot.  There is a transient period where the A/C or heater is on, followed by an equilibrium period when the air temperature is equal to the set point, and neither is on.  Our intuition (and desire!) tells us that the transient periods should be relatively brief and we should experience long pleasant periods of equilibrium at the desired temperature.

If the time delays in signals are zero, and the rate of heating/cooling is modest, here's a graph of the behavior we'd see. Note: this matches our expectation and intuition.

No time delay
What would you expect to happen if there is a time delay in signal #2 between heating/cooling mechanism and the thermometer?  Maybe it will take longer to heat up, maybe proportional to the time delay.  Let's see what happens with a little delay:
Time delay = 50.  Reaches set point sooner, but with some overshoot.
That's odd!  With a time delay of 50, the air temperature reaches the set point temperature faster, not slower.  It also overshoots by a little bit, requiring some air conditioning to cool down the air.

What happens with more time delay? Maybe we can speed up the heating even more and save energy.
Time delay = 100. Results in considerable overshoot, followed by damped oscillations

Well, that is not good. Big overshoot (to over 100 degrees F) followed by oscillating heating and cooling cycles before settling down.  Notice that we see oscillating behavior even though there are no new forces or feedback loops in the system.  This is solely due to time delay in ONE signal!

If the delay increases further, to a critical value, the oscillations aren't damped any more and the heating and cooling cycles never stop.   You are not comfortable in the room for more than an instant at a time, plus your energy bills are through the roof!
Time delay = 173 (a critical value). Oscillations keep going forever.
And if the time delay increases even further, then the system is driven to extreme temperatures, perhaps inducing the heating/cooling system to fail through overuse.
Time delay = 250.  Oscillations are continually amplified.
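The delay experiments above can be approximated with a very small simulation. This is a sketch under stated assumptions, not necessarily the model behind the graphs. It assumes the corrective action is proportional to the delayed temperature error, i.e. dT/dt = gain × (set point − T(t − delay)), and integrates with simple Euler steps. The `gain` value is my own choice: for this equation, sustained oscillation sets in when gain × delay = π/2, so the gain is picked to put that threshold at a delay of 173.

```python
import math


def simulate_thermostat(delay, steps=3000, set_point=72.0, start_temp=0.0,
                        gain=math.pi / (2 * 173), dt=1.0):
    """Euler integration of dT/dt = gain * (set_point - T(t - delay)).

    `gain` is an assumed value, chosen so that the theoretical threshold
    for sustained oscillation (gain * delay = pi / 2) sits at delay = 173.
    """
    lag = int(round(delay / dt))
    temps = [start_temp]
    for n in range(steps):
        # The controller reacts to the temperature from `lag` steps ago;
        # before t = 0 the room is assumed to have been at start_temp.
        sensed = temps[n - lag] if n >= lag else start_temp
        temps.append(temps[n] + dt * gain * (set_point - sensed))
    return temps


for delay in (0, 50, 100, 173, 250):
    temps = simulate_thermostat(delay)
    print(f"delay={delay:>3}: peak temperature {max(temps):.1f} F")
```

With no delay the temperature approaches the set point from below without overshooting, and the peak temperature grows as the delay grows. The Euler step adds a small amount of effective delay of its own, so the simulated behavior right at 173 will be close to, but not exactly on, the theoretical boundary between damped and growing oscillations.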

While our example is a physical system, this sort of system behavior arises in many socio-technical systems.  One well-known example is supply chains, as demonstrated by the Beer Distribution Game.

The moral of this story is that we may not have a good intuition about the effects of time delays in dynamical systems, even in very simple systems.  The other moral, more pertinent to this blog post, is that if we want to understand dynamical systems, we had better include the time dimension, if only to account for the effects of time delays!
