Monday, July 29, 2013

Think You Understand Black Swans? Think Again.

"Black Swan events" are mentioned frequently in tweets, blog posts, public speeches, news articles, and even academic articles.  It's so widespread you'd think that everyone knew what they were talking about. But I don't.

Coming soon: 23 Shades of Black Swans
I think the "Black Swan event" metaphor is a conceptual mess.

Worse, it has done more harm than good by creating confusion rather than clarity and by serving as a tool for people who unfairly denigrate probabilistic reasoning.  It's also widely misused, especially by pundits and so-called thought leaders, to give the illusion that they know what they are talking about on this topic when they really don't.

But rather than just throwing rocks, in future posts I will be presenting better/clearer metaphors and explanations -- perhaps as many as 23 Shades of Black Swan.  Here are the ones I've completed so far:
  1. Grey Swans: Cascades in Large Networks and Highly Optimized/Critically Balanced Systems
  2. Green Swans: Virtuous Circles, Snowballs, Bandwagons, and the Rich Get Richer
  3. Red Swans: Extreme Adversaries, Evolutionary Arms Races, and the Red Queen
  4. Disappearing Swans: Descartes' Demon -- the Ultimate in Diabolical Deception
  5. Out-of-the-Blue Swans: Megatsunami, Supervolcanos, The Black Death, and Other Cataclysms
  6. Orange TRUMPeter Swans: When What You Know Ain't So
In this post, I just want to make clear what is so wrong about the "Black Swan event" metaphor.

Taleb's Many Concepts, Bundled Into a Single Metaphor

Drawing from Taleb, Michael Smith of NBS offers a concise definition: "To be a true Black Swan event, an event must have three distinct features:
  1. The event is a surprise to the observer 
  2. The event has an extreme impact 
  3. After it happens, the event can be predicted with hindsight"
From Wikipedia, here's how the metaphor expanded from its original scope (financial markets) to include any event in any domain:
Black swan events were introduced by Nassim Nicholas Taleb in his 2001 book Fooled By Randomness, which concerned financial events. His 2007 book The Black Swan: The Impact of the Highly Improbable extended the metaphor to events outside of financial markets. Taleb regards almost all major scientific discoveries, historical events, and artistic accomplishments as "black swans" -- undirected and unpredicted. He gives the rise of the Internet, the personal computer, World War I, dissolution of the Soviet Union, and the September 2001 attacks as examples of black swan events.
The Wikipedia article lists three aspects of the Black Swan phenomenon:
  1. The disproportionate role of high-profile, hard-to-predict, and rare events that are beyond the realm of normal expectations in history, science, finance, and technology
  2. The non-computability of the probability of the consequential rare events using scientific methods (owing to the very nature of small probabilities)
  3. The psychological biases that make people individually and collectively blind to uncertainty and unaware of the massive role of the rare event in historical affairs

Taleb's Two Worlds

In the book The Black Swan, Taleb describes two worlds -- "Mediocristan" and "Extremistan" -- to explain why Black Swan events are problematic.  From this article:
  • "Mediocristan is where normal things happen, things that are expected, whose probabilities of occurring are easy to compute, and whose impact is not terribly huge. The bell curve and the normal distribution are emblems of Mediocristan."
  • "Exstremistan is a different beast. In Extremistan, nothing can be predicted accurately and events that seemed unlikely or impossible occur frequently and have a huge impact. Black Swan events occur in Exstremistan. ... Think income distributions... [adding] one observation [e.g. Bill Gates] will have a disproportionate effect on the average. This is Exstremistan."
These two worlds of causal processes make sense for some phenomena, like time series of financial returns or earthquake magnitudes.  But where Taleb goes off the rails is when he (implicitly) subsumes all causal processes in all domains into these two.  Looking at his examples -- the Internet, the personal computer -- it's clear that he's subsuming all major innovations into Extremistan, but this neglects the many and varied innovation systems that gave rise to them.

Here's a cyber security example of how people conflate innovation (e.g. Stuxnet) with Black Swan:
"One of the paramount cyber war events in recent years was the Stuxnet worm that infiltrated Iran’s nuclear facilities. ... Stuxnet can be defined as a black swan for a number of reasons. First, it contained the element of surprise. ... Second, from both a practical perspective and as a confidence destroyer, the effect of the worm on the Iranian nuclear program was immense. ... Third, in recent years, there have been many indications of zero-day Trojan horses (exploiting computer application weak spots), backdoor attacks (circumventing normal authentication), and other malware designed for targeted attacks against organizations and facilities." 
Based on this mistaken generalization, Taleb argues that all of our methods -- formal and informal, quantitative and qualitative -- are only suited for Mediocristan and are helpless and hopeless in Extremistan.  This is one of the unwarranted assertions that gives some people license to denigrate or dismiss probabilistic reasoning.

What's Wrong With the "Black Swan Event" Metaphor?

It makes no sense to talk about "Black Swan events" in isolation.  What makes a "Black Swan event" is not the event itself.  Instead, it is how that event fits into the object-observer system.  This system is the mutual influence of three agencies:
  1. Causal processes in the world that generate events
  2. The available body of evidence and beliefs
  3. Our methods of reasoning and understanding, including about the existence and nature of causal processes, and methods for shaping evidence and beliefs
Only in this system of interaction does it make sense to talk about "highly improbable", "surprisingly high impact", or "rationalized in hindsight".  In other words, just because some event has a very low frequency of occurrence (causal process) doesn't make it a Black Swan, even if some people are "surprised" by it.  To other people with different evidence and different reasoning methods, the same event could be not very surprising at all.

On this theme, Stanford University Professor Elisabeth Paté-Cornell has recently published an excellent article in the journal Risk Analysis.  (For a summary of her main points, see this and this.)  She explains how many existing methods of risk analysis can be used successfully even in the context of highly improbable or "surprising" events.  Here's a short video where she explains why 9/11 should not be called a Black Swan event just because some people were surprised or because it appeared to be highly improbable to some people.

Here's what's wrong with the Black Swan event metaphor: by emphasizing the event rather than the system, it diverts attention from what is really important and from what we can do to improve our understanding.

Black Swan isn't one phenomenon.  It's many.  Conflating all types of highly improbable and surprising events into one category causes confusion and permits some people to project the illusion that they understand what they are talking about when they really don't.

How many types are there? It is likely that there is a different type of Black Swan for every form of ignorance and uncertainty.  If you buy into the taxonomy I presented in this post, there might be as many as 23 different shades of Black Swans.  (Not quite 50 Shades, but moving in that direction :-)  )

The Two Worlds explanation -- "Mediocristan vs. Extremistan" -- only applies to one "shade" of Black Swan. These two worlds describe causal processes, but only two classes out of many.  It is a big mistake to reduce all "highly improbable/surprising" events to the causal process associated with Extremistan.

The possibility of Black Swan phenomena does not mean that probabilistic reasoning is hopeless.  Some people assert that if we can't predict future events (or even estimate their probability distribution) then we can't know anything about them.  This assertion is both polemic and ignorant.  It's polemic in that it tries to diminish or banish probabilistic risk analysis.  It's ignorant because it ignores many methods of reasoning about uncertainty that have been shown to be successful in many settings, including:
  • Scenario Analysis
  • System simulation and stress testing
  • Subjective probability (Bayesian)
  • Fuzzy logic
  • Modal logic
  • Possible worlds logic
  • Dempster-Shafer Theory (belief functions)
  • Empirical Bayes, including "deep" machine learning
  • Complex Adaptive System simulations (for emergent properties)
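As one small example from this list, here's a sketch of Dempster-Shafer belief combination, which can explicitly represent "don't know" mass in a way plain probabilities can't. The incident scenario and the evidence numbers are invented for illustration:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts: frozenset -> mass)
    using Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass assigned to contradictory sets
    # Renormalize by the non-conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Frame of discernment: was the incident an attack or an accident?
ATTACK, ACCIDENT = frozenset({"attack"}), frozenset({"accident"})
EITHER = ATTACK | ACCIDENT  # explicit "don't know" hypothesis

# Two evidence sources, each leaving some mass on "don't know".
sensor  = {ATTACK: 0.6, EITHER: 0.4}
analyst = {ATTACK: 0.3, ACCIDENT: 0.2, EITHER: 0.5}

belief = dempster_combine(sensor, analyst)
for s, v in sorted(belief.items(), key=lambda kv: -kv[1]):
    print(sorted(s), round(v, 3))
```

The point is not this particular method -- it's that formal machinery for reasoning under deep uncertainty exists and is usable, contrary to the "hopeless in Extremistan" claim.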
[Edit 7/29/2013: Added this...]
The Black Swan metaphor is so widely and frequently misused that its value in communications is severely debased.  As Exhibit A for misuse, I offer this example from a blog post by Peter Hesse, President of Gemini Security Solutions (emphasis added):
"There are numerous examples of these 'black swan' events all around; passwords are being stolen from websites at an alarming rate. The latest Identity Theft Resource Center breach statistics report reveals that there were 399 breaches in the first 11 months of 2012, compromising over 15 million records. Most people have heard stories like what happened to Mat Honan, or even what happened to me. 
This particular black swan is in flight. There will be a hack that will affect you. While it is rare, it will have a significant impact and is completely predictable. The question now is, what are you doing about it?"
This is pure FUD.

Coming Soon: 23 Shades of Black Swan

I don't expect my commentary to completely drive out the "Black Swan event" metaphor, but I hope that it will be used less frequently and less abusively.

In a series of future posts, I hope to enumerate the many types of Black Swan -- maybe as many as 23 "shades".  For each, I'll give a simple definition, a few examples, and then show how it arises in a system of 1) causal processes; 2) body of evidence and beliefs; and 3) models of reasoning and understanding.

[Edit: Adding links to these posts as they go public]
  1. Grey Swans: Cascades in Large Networks and Highly Optimized/Critically Balanced Systems
  2. Green Swans: Virtuous Circles, Snowballs, Bandwagons, and the Rich Get Richer
  3. Red Swans: Extreme Adversaries, Evolutionary Arms Races, and the Red Queen
  4. Disappearing Swans: Descartes' Demon -- the Ultimate in Diabolical Deception
  5. Out-of-the-Blue Swans: Megatsunami, Supervolcanos, The Black Death, and Other Cataclysms
