[Edit: Feel free to substitute the term "risk tolerance" for "risk appetite". They have slightly different origins, but their interpretation in this context is the same.]
First, some people have an aversion to the concept of "risk appetite", and others deny that it even applies to information security (or, more broadly, to cyber security). The argument goes that no rational manager or organization desires to take on information security risk if they could avoid it, and therefore there is no such thing as an appetite for risk. A different argument against it is based on the belief that risk in information security is not quantifiable, and therefore attempts to quantify risk appetite are similarly impossible or meaningless.
I believe both positions are mistaken. The first objection rests on a misunderstanding of what "risk appetite" really means and how it applies to information security; I'll explain and, hopefully, clarify. Later in this post, I'll also address the second objection by showing how risk appetite can be reliably quantified.
What is Risk Appetite for Information Security?

The origin of the term "risk appetite" is in the field of financial markets, both trading (i.e. "market risk") and lending ("credit risk"). In both of these settings, the financial firm invests capital in order to earn a financial return, either through interest payments, dividends, or capital gains. The downside risk can include lower-than-expected interest, dividends, or capital gains, but more importantly can include loss of principal (invested capital). There are important theories in financial economics showing that, in efficient markets, higher expected returns can only be achieved by taking on higher levels of downside risk. Thus, there is a symmetry between gains and losses, returns and risks.
But in the field of information security (a subset of operational risk), the returns (upside) are not symmetric with the risks (downside). The upside of information systems and data comes through normal usage (assuming that the systems and data are fit for purpose and are well managed), but the downside losses are due to failures and abnormal conditions (e.g. major breaches). We can't always draw direct causal relationships between upside returns and downside risks. Indeed, it's easy to imagine information systems with very little upside return but very high downside risk. Generally, the causal processes that drive upside return are fundamentally different from the causal processes that drive downside risk, and the two sets of processes are only loosely coupled.
Nonetheless, it is true that building and supporting information systems will require accepting the possibility of failures and abnormal conditions. Every person, team, and organization has some capacity to absorb costs associated with failures or abnormal conditions, and likewise there are costs that are beyond the capacity of a given person, team or organization to absorb. Furthermore, just because you can absorb a cost doesn't mean that you want to do so as a matter of policy.
Putting this all together, risk appetite in the context of information security is simply the downside losses (costs) that your organization is willing to pay for with its existing resources (cash reserves, people, machines, etc.). Most of the time, the way people and organizations "pay" for these costs is to delay or eliminate other spending. Projects will be delayed or canceled. Hiring may be frozen. Capital spending may be cut. Contingent staff or consultants may be terminated. Understanding these methods gives you very important clues as you estimate risk appetite.
How to Estimate Risk Appetite From Scratch

Some people think that it's only possible to estimate risk appetite when you have a very mature risk management program with solid quantitative risk models. I disagree. I think most organizations could make a good rough estimate today. Here are the steps:
1. Focus on the magnitude of individual loss events measured in dollars, as opposed to incidents, breaches, or annual costs.
2. For each type of major loss event, identify the specific budgets that will be hit. For each budget, identify the consequences of "blowing the budget".
3. Find comparable loss events in your company, in your industry, or in your sector (e.g. natural disasters, fraud, critical infrastructure disruption), and investigate how these sorts of losses are absorbed in those organizations.
4. Summarize your findings from steps 2 and 3 in terms of how downside costs are absorbed, e.g. cutting other discretionary spending, delaying or canceling projects, freezing or cutting headcount, etc. Note the point at which losses exceed internal resources, such that you might face contractual penalties, extraordinary borrowing, or a decline in credit rating.
5. Estimate "risk appetite" as the threshold between "Yeah, we can absorb that cost" and "F*ck no! That would kill us!". This threshold may vary by type of loss event and by regulatory or stakeholder impact. For example, loss events with high reputation impact may have a much lower threshold than loss events with no reputation impact.
6. Test and refine the estimates of risk appetite through scenario analysis and "what if?" exercises with management and executive teams. The best validation is pre-commitment, where managers and executives essentially sign contracts to accept or cover loss events up to specific thresholds. This makes the numbers "real" to them and can improve decision-making all around.
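The steps above can be sketched as a toy model. To be clear, every loss-event type, budget source, and dollar figure below is hypothetical, invented purely for illustration; real estimates come from your own budgets and comparables.

```python
# Toy first-pass risk appetite estimate. All loss-event types and
# dollar figures are hypothetical, for illustration only.

# Steps 2 and 4: for each loss-event type, sum the budget slack that
# could plausibly be redirected to absorb the cost.
absorption_capacity_usd = {
    "ransomware_outage": 2_000_000,   # delayed projects + frozen hiring
    "customer_data_breach": 500_000,  # lower: reputation/regulatory impact
}

def classify(loss_type: str, loss_usd: float) -> str:
    """Step 5: compare a candidate loss magnitude to the appetite threshold."""
    threshold = absorption_capacity_usd[loss_type]
    if loss_usd <= threshold:
        return "within appetite: absorbable with existing resources"
    return "beyond appetite: penalties, borrowing, or credit-rating damage"

# Step 6: scenario analysis — test the same loss size against each type.
print(classify("ransomware_outage", 750_000))
print(classify("customer_data_breach", 750_000))
```

Note that the same $750K loss falls on opposite sides of the line depending on the loss-event type, which is exactly why step 5 calls for per-type thresholds rather than a single number.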
How Fragile is Your Organization?

So far I've been talking about risk appetite as the willing capacity to absorb downside losses and costs, as a matter of policy, and the attention has been on individual loss events in isolation. But it's also very important to consider the context of loss events and how they might sink the whole organization.
This comes down to understanding how fragile your organization is, and in what ways or contexts it is fragile. Sadly, most organizations and executives don't think this through, and there are also many misguided beliefs. Anytime I hear someone call some information asset "the Crown Jewels", I cringe, because it is probably more mythology than substantiated fact. Likewise, simply losing large amounts of money will not, by itself, cause a business to go bankrupt, especially if the business is large in size and capitalization.
The best way to estimate your organization's fragility is to look for comparable cases of collapse or bankruptcy, either inside or outside of your industry. In banking and finance, there are many examples. In telecommunications services, there is WorldCom, which collapsed due to massive executive fraud. Likewise with Enron. But there are many businesses that depend on a "linchpin" to keep operating, and if that linchpin is removed the business could collapse quickly. A recent example is Efficient Services Escrow of Huntington Beach, Calif. As detailed in this blog post, Efficient Services Escrow went bankrupt as the result of "three fraudulent international wire transfers totaling $1,558,439 between December 2012 and February 2013. ... Under California law, escrow and title companies are required to immediately report any lost funds. When Efficient reported the incident to state regulators, the California Department of Corporations gave the firm three days to come up with money to replace the stolen funds." (In other words, escrow and title companies are in the trust business, and they face harsh punishments if that trust is broken.)
The general rule is that, for a given loss event type, your organization's risk appetite will be lower in domains where it is fragile, especially in "high stress" situations and contexts, as compared to the general situation where the loss event does not touch the fragile parts of the business.
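One hedged way to picture this rule is as a context-dependent discount on a baseline threshold. The contexts and multipliers below are invented for illustration; the point is only that the same loss-event type carries a smaller appetite where the organization is fragile.

```python
# Hypothetical illustration: the appetite threshold for one loss-event
# type shrinks when the loss touches a fragile part of the business.
BASE_APPETITE_USD = 2_000_000  # invented baseline threshold

# Fragility multipliers by context (all values invented for illustration).
fragility_discount = {
    "normal_operations": 1.0,
    "high_stress_peak_season": 0.5,
    "touches_regulated_trust_function": 0.1,  # e.g. escrowed client funds
}

def appetite(context: str) -> float:
    """Appetite for this loss type, discounted for context fragility."""
    return BASE_APPETITE_USD * fragility_discount[context]

for ctx in fragility_discount:
    print(f"{ctx}: ${appetite(ctx):,.0f}")
```

In the escrow example above, the relevant context would be the trust-function row: a loss that would be absorbable in normal operations becomes existential when it hits funds the firm is legally required to replace within days.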