Wednesday, May 15, 2019
Here are my slides from my SIRAcon talk.
Here's a slide that got a lot of attention. (Humorous, of course.)
Tuesday, December 18, 2018
Does Modern Portfolio Theory (MPT) apply to cyber security risks?
Many months ago, my colleague David Severski asked on Twitter how Modern Portfolio Theory (MPT) does or does not apply to quantified cyber security risk:
I replied that I would blog on this "...soon". Ha! Almost four months later. Well, better late than never.
Short answer: No, MPT doesn't apply. Read on for explanations.
NOTE: "Cyber security risk" in this article is quantified risk -- probabilistic costs of loss events or probabilistic total costs of cyber security. Not talking about color-coded risk, categorical risk, or ordinal scores for risk. I don't ever talk about them, if I can help it.
Monday, April 16, 2018
Presentation: Navigating the Vast Ocean of Browser Fingerprints
Here's a PDF version of my BSides San Francisco presentation. (Today, Monday at 4:50pm)
COMING SOON: GitHub repo with Python and R code, plus sample data. Watch this space.
Tuesday, June 21, 2016
Public Statement to the Commission on Enhancing National Cybersecurity, 6-21-2016
[Submitted in writing at this meeting. An informal 5 min. version was presented during the public comment period. This statement is my own and does not represent the views or interests of my employer.]
Summary
Cyber security desperately needs institutional innovation, especially involving incentives and metrics. Nearly every report since 2003 has included recommendations to do more R&D on incentives and metrics, but progress has been slow and inadequate.

Why?
Because we have the wrong model for research and development (R&D) on institutions.
My primary recommendation is that the Commission’s report should promote new R&D models for institutional innovation. We can learn from examples in other fields, including sustainability, public health, financial services, and energy.
What are Institutions and Institutional Innovation?
Institutions are norms, rules, and social structures that enable society to function. Examples include marriage, consumer credit reporting and scoring, and emissions credit markets.

Cyber security[1] has institutions today, but many are inadequate, dysfunctional, or missing. Examples:
- overlapping “checklists + audits”;
- professional certifications;
- post-breach protection for consumers (e.g. credit monitoring);
- lists of “best practices” that have never been tested or validated as “best” and therefore are no better than folklore.
There is plenty of talk about “standards”, “information sharing”, “public-private partnerships”, and “trusted third parties”, but these remain mostly talking points and not realities.
Institutional innovation is a set of processes that either change existing institutions in fundamental ways or create new institutions. Sometimes this happens with concerted effort by “institutional entrepreneurs”, and other times it happens through indirect and emergent mechanisms, including chance and “happy accidents”.
Institutional innovation takes a long time – typically ten to fifty years.
Institutional innovation works differently from technological innovation, which we do well. In contrast, we have a poor understanding of institutional innovation, especially of how to accelerate it or achieve specific goals.
Finally, institutions and institutional innovation should not be confused with “policy”. Changes to government policy may be an element of institutional innovation, but they do not encompass the main elements – people, processes, technology, organizations, and culture.
The Need: New Models of Innovation
Through my studies, I have come to believe that institutional innovation is much more complicated [2] than technological innovation. It is almost never a linear process from theory to practice with clearly defined stages.

There is no single best model for institutional innovation. There needs to be creativity in “who leads”, “who follows”, and “when”. The normal roles of government, academics, industry, and civil society organizations may be reversed or otherwise radically redrawn.
Techniques are different, too. It can be orchestrated as a “messy” design process [3]. Fruitful institutional innovation in cyber security might involve some of these:
- “Skunk Works”
- Rapid prototyping and pilot tests
- Proof of Concept demonstrations
- Bricolage[4] and exaptation[5]
- Simulations or table-top exercises
- Multi-stakeholder engagement processes
- Competitions and contests
- Crowd-sourced innovation (e.g. “hackathons” and open source software development)
What all of these have in common is that they produce something that can be tested and can support learning. They are more than talking and consensus meetings.
There are several academic fields that can contribute defining and analyzing new innovation models, including Institutional Sociology, Institutional Economics, Sociology of Innovation, Design Thinking, and the Science of Science Policy.
Role Models
To identify and test alternative innovation models, we can learn from institutional innovation successes and failures in other fields, including:
- Common resource management (sustainability)
- Epidemiology data collection and analysis (public health)
- Crash and disaster investigation and reporting (safety)
- Micro-lending and peer-to-peer lending (financial services)
- Emissions credit markets and carbon offsets (energy)
- Open software development (technology)
- Disaster recovery and response[6] (homeland security)
In fact, there would be great benefit if there were a joint R&D initiative for institutional innovation that could apply to these other fields as well as cyber security. Furthermore, there would be benefit in making this an international effort, not just limited to the United States.
Endnotes
[1] "Cyber security" includes information security, digital privacy, digital identity, digital information property, digital civil rights, and digital homeland & national defense.
[2] For case studies and theory, see: Padgett, J. F., & Powell, W. W. (2012). The Emergence of Organizations and Markets. Princeton, NJ: Princeton University Press.
[3] Ostrom, E. (2009). Understanding Institutional Diversity. Princeton, NJ: Princeton University Press.
[4] “something constructed or created from a diverse range of available things.”
[5] “a trait that has been co-opted for a use other than the one for which natural selection has built it.”
[6] See: Auerswald, P. E., Branscomb, L. M., Porte, T. M. L., & Michel-Kerjan, E. O. (2006). Seeds of Disaster, Roots of Response: How Private Action Can Reduce Public Vulnerability. Cambridge University Press.
Thursday, April 28, 2016
Entry in #BSidesLV Logo Contest
Here's my entry in the BSides Las Vegas logo contest. The crowd-chosen slogan is "Popping calc.exe since 2009".
[Image: my logo contest entry (click to enlarge)]
Wednesday, March 30, 2016
#Tay Twist: @Tayandyou Twitter Account Was Hijacked ...By Bungling Microsoft Test Engineers (Mar. 30)
Sunday, March 27, 2016
Microsoft's Tay Has No AI
(This is the third of three posts about Tay. Previous posts: "Poor Software QA..." and "...Smoking Gun...")
While nearly all the press about Microsoft's Twitter chatbot Tay (@Tayandyou) is about artificial intelligence (AI) and how AI can be poisoned by trolling users, there is a more disturbing possibility:
- There is no AI (worthy of the name) in Tay. (probably)
I say "probably" because the evidence is strong but not conclusive and the Microsoft Research team has not publicly revealed their architecture or methods. But I'm willing to bet on it.
Evidence comes from three places. The first is observation of a small, non-random sample of Tay tweets and direct-message sessions (posted by various users). The second is circumstantial: the composition of the team behind Tay. The third comes from a person who claims to have worked at Microsoft Research on Tay until June 2015. He or she made two comments on my first post, but unfortunately deleted the second comment, which had lots of details.
Saturday, March 26, 2016
Microsoft #TAYFAIL Smoking Gun: ALICE Open Source AI Library and AIML
[Update 3/27/16: see also the next post: "Microsoft's Tay Has No AI"]
As a follow-up to my previous post on Microsoft's Tay Twitter chatbot (@Tayandyou), I found evidence of where the "repeat after me" hidden feature came from. Credit goes to SSHX for this lead in his comment:
"This was a feature of AIML bots as well, that were popular in 'chatrooms' way back in the late 90's. You could ask questions with AIML tags and the bots would automatically start spewing source into the room and flooding it. Proud to say I did get banned from a lot of places."A quick web search revealed great evidence. First, some context.
AIML is an acronym for "Artificial Intelligence Markup Language", which "is an XML-compliant language that's easy to learn, and makes it possible for you to begin customizing an Alicebot or creating one from scratch within minutes." ALICE is an acronym for "Artificial Linguistic Internet Computer Entity". ALICE is a free natural-language artificial intelligence chatbot.
Evidence
This GitHub page has a set of AIML statements starting with "R". (This is a fork of "9/26/2001 ALICE", so there are probably some differences from Base ALICE today.) Here are two statements matching "REPEAT AFTER ME" and "REPEAT THIS".

[Image: snippet of AIML statements with "REPEAT AFTER ME" and "REPEAT THIS" (click to enlarge)]
As it happens, there is an interactive web page with Base ALICE here. (Try it out yourself.) Here is what happened when I entered "repeat after me" and also "repeat this...":

In Base ALICE, the template response to "repeat after me" is "...". In other words, a NOP ("no operation"). This is different from the AIML statement above, which is ".....Seriously....Lets have a conversation and not play word games.....". It looks like someone just deleted the text following three periods.
But the template response to "repeat this X" is "X" (in quotes), which is consistent with the AIML statement, above.
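To make the rule behavior concrete, here's a toy Python paraphrase of those two templates. This is not the AIML engine or anything from Microsoft; it just reproduces the observed input/output behavior described above, and the default reply is my own stand-in.

```python
# Toy paraphrase of the two Base ALICE templates described above.
# NOT the AIML engine or Microsoft's code -- just the observed behavior.
def base_alice_reply(message: str) -> str:
    text = message.strip()
    upper = text.upper()
    if upper.startswith("REPEAT THIS "):
        # the template echoes the wildcard back, wrapped in quotes
        return f'"{text[len("REPEAT THIS "):]}"'
    if upper.startswith("REPEAT AFTER ME"):
        # in Base ALICE the template is just "...", effectively a no-op
        return "..."
    return "I have no rule for that."   # stand-in default, not from ALICE

print(base_alice_reply("repeat this Hello world"))    # -> "Hello world"
print(base_alice_reply("repeat after me say hi"))     # -> ...
```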
Conclusion
From this evidence, I infer that Microsoft's Tay chatbot is using the open-sourced ALICE library (or similar AIML library) to implement rule-based behavior. Though they did implement some rules to thwart trolls (e.g. gamergate), they left in other rules from previous versions of ALICE (either Base ALICE or some forked versions).

My assertion about root cause stands: poor QA process on the ALICE rule set allowed the "repeat after me" feature to stay in, when it should have been removed or modified significantly.
Another inference is that "repeat after me" is probably not the only "hidden feature" in AIML rules that could have caused misbehavior. It was just the one that the trolls stumbled upon and exploited. Someone with access to Base ALICE rules and also variants could have exploited these other vulnerabilities.
Friday, March 25, 2016
Poor Software QA Is Root Cause of TAY-FAIL (Microsoft's AI Twitter Bot)
[Update 3/26/16 3:40pm: Found the smoking gun. Read this new post. Also the recent post: "Microsoft's Tay has no AI"]
I claim: the explanations that blame AI are wrong, at least in the specific case of tay.ai.
This happened:
"On Wednesday morning, the company unveiled Tay [@Tayandyou], a chat bot meant to mimic the verbal tics of a 19-year-old American girl, provided to the world at large via the messaging platforms Twitter, Kik and GroupMe. According to Microsoft, the aim was to 'conduct research on conversational understanding.' Company researchers programmed the bot to respond to messages in an 'entertaining' way, impersonating the audience it was created to target: 18- to 24-year-olds in the US. 'Microsoft’s AI fam from the internet that’s got zero chill,' Tay’s tagline read." (Wired)Then it all went wrong, and Microsoft quickly pulled the plug:
"Hours into the chat bot’s launch, Tay was echoing Donald Trump’s stance on immigration, saying Hitler was right, and agreeing that 9/11 was probably an inside job. By the evening, Tay went offline, saying she was taking a break 'to absorb it all.' " (Wired)Why did it go "terribly wrong"? Here are two articles that assert the problem is in the AI:
- "It’s Your Fault Microsoft’s Teen AI Turned Into Such a Jerk" - Wired tl;dr: "this is just how this kind of AI works"
- "Why Microsoft's 'Tay' AI bot went wrong...AI experts explain why it went terribly wrong" - TechRepublic tl;dr: "The system is designed to learn from its users, so it will become a reflection of their behavior".
The "blame AI" argument is: if you troll an AI bot hard enough and long enough, it will learn to be racist and vulgar. ([Update] For an example, see this section, at the end of this post)
Friday, January 22, 2016
Time & Uncertainty (2nd post: "What kind of game is cyber security investment?")
Summary: Time and uncertainty are essential features of any model of the "game of cyber security". Models that do not include them as central features are not fit for purpose. But, yes, they do make life more difficult for modelers and their audiences. While I make the case that both are essential, I leave open the question as to what is the most parsimonious method or treatment.
Tuesday, January 19, 2016
What kind of game is cyber security investment? (post #1 of ?)
This is the first in a series of blog posts where I think out loud as I build a paper for WEIS 2016, and also a component of my dissertation.
The focus is on "investment" broadly defined. This means money invested in people, tools, infrastructure, processes, methods, know-how, etc. It also means architectural commitments that shape the business, technical, legal, or social aspects of cyber security for a given person or organization. All these investments provide the foundation for what a person or organization is able to do (i.e. their "capabilities") and the means of executing day-to-day tasks ("routines", "processes", "practices", etc.).
If cyber security investment is a strategic game between attackers and defenders, and among defenders, then what kind of game is it?
The focus is on "investment" broadly defined. This means money invested in people, tools, infrastructure, processes, methods, know-how, etc. It also means architectural commitments that shape the business, technical, legal, or social aspects of cyber security for a given person or organization. All these investments provide the foundation for what a person or organization is able to do (i.e. their "capabilities") and the means of executing day-to-day tasks ("routines", "processes", "practices", etc.).
If cyber security investment is a strategic game between attackers and defenders, and among defenders, then what kind of game is it?
Summary
In simple terms, people tend to think of cyber security investment as being one of (at least) five types of games:

1. An optimization game, where each player finds the optimal level of spending (or investment) to minimize costs (or losses). This view is favored by Neo-classical Economists and most Game Theorists. (A toy numeric sketch of this view appears right after this list.)
2. A collective wisdom game, where the collective searching/testing activities of players lead to the emergence of a "collective wisdom" (a.k.a. "best practices") that everyone can then imitate. This view is favored by many industry consultants and policy makers.
3. A maturity game, where all players follow a developmental path from immature to mature, and both individual and collective results are improved along the way. This view is favored by many industry consultants.
4. A carrots-and-sticks game, where players choose actions that balance rewards ("carrots") with punishments ("sticks") in the context of their other goals, resources, inclinations, habits, etc. This view is favored by some Institutional Economists, and some researchers in Law and Public Policy. It is also favored by many people involved in regulation/compliance/assurance.
5. A co-evolution game, where the "landscape" of player payoffs and possible "moves" is constantly shifting and overall behavior is subject to surprises and genuine novelty. This view is favored by some researchers who employ methods or models from Complexity Science or Computational Social Science. This view is also a favorite of hipsters and "thought leaders", though they use it as metaphor rather than as a real foundation for research or innovation.
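Here's the toy numeric sketch of the optimization-game view promised above. The exponential loss curve and its parameters are illustrative assumptions only, not a claim about real-world loss functions:

```python
import numpy as np

# Toy "optimization game" for a single defender. The loss curve and its
# parameters are assumptions for illustration, not a calibrated model.
baseline_loss = 5_000_000     # expected annual loss with zero security spending (assumed)
effectiveness = 1.2e-6        # how quickly spending drives down expected loss (assumed)

spend = np.linspace(0, 4_000_000, 4001)
expected_loss = baseline_loss * np.exp(-effectiveness * spend)
total_cost = spend + expected_loss            # the player minimizes spend + expected loss

best = spend[np.argmin(total_cost)]
print(f"cost-minimizing spend: ${best:,.0f}")
print(f"minimum total expected cost: ${total_cost.min():,.0f}")
```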
But what kind of game is cyber security, really? How can we know?
These questions matter because, depending on the game type, the innovation strategies will be very different:
1. If cyber security is an optimization game, then we need to focus on methods that will help each player do the optimization, and to remove disincentives for making optimal investments.
2. If cyber security is a collective wisdom game, then we need to focus on identifying the "best practices" and to promote their wide-spread adoption.
3. If cyber security is a maturity game, then we need to focus on the barriers to increasing maturity, and on methods that help each player map their path from "here" to "there" in terms of maturity.
4. If cyber security is a carrots-and-sticks game, then we need to find the right combination of carrots and sticks, and to tune their implementation.
5. Finally, if cyber security is a co-evolution game, then we need to focus on agility, rapid learning, and systemic innovation. Also, we should probably NOT do some of the strategies listed in 1) through 4), especially if they create rigidity and fragility in the co-evolutionary process, which is the opposite of what is needed.
Tuesday, January 12, 2016
Institutional Innovation in Contested Territory: Quantified Cyber Security and Risk
Say you are an entrepreneurial sort of person who wants to really change the world of cyber security. Problem: nobody seems to know where the game-changing innovation is going to come from. Is it technology? Is it economics? Is it law and policy? Is it sociology? Maybe a combination, but which? And in what sequence?
If you aim for institutional innovation, then at some point you are going to need to take sides in the great "Quant vs. Non-quant" debate:
- Can cyber security and risk be quantified?
- If "yes", how can quantitative information be used to realize security to significantly improve outcomes?
Whoever makes sufficient progress toward workable solutions will "win", in the sense of getting wide-spread adoption, even if the other is "better" in some objective sense (i.e. "in the long run").
I examine this innovation race in a book chapter (draft). The book will probably come out in 2016.
Abstract:
"The focus of this chapter is on how the thoughts and actions of actors coevolve when they are actively engaged in institutional innovation. Specifically: How do innovators take meaningful action when they are relatively ‘blind’ regarding most feasible or desirable paths of innovation? Our thesis is that innovators use knowledge artifacts – e.g. dictionaries, taxonomies, conceptual frameworks, formal procedures, digital information systems, tools, instruments, etc. – as cognitive and social scaffolding to support iterative refinement and development of partially developed ideas. We will use the case of institutional innovation in cyber security as a way to explore these questions in some detail, including a computational model of innovation."Your feedback, comments, and questions would be most welcome.
The computational model used is called "Percolation Models of Innovation". Here is the NetLogo code of the model used in the book chapter. Below are some figures from the book chapter.
[Image: a screenshot of the user interface. Three different models can be selected (upper left).]
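The chapter's NetLogo model isn't reproduced here, but to give a flavor of what a percolation model of innovation looks like, here is a rough Python sketch of the general idea: columns are technologies, rows are performance levels, and a site becomes a realized innovation only if it is technically feasible and connected to what has already been realized. The grid size, feasibility probability, and neighborhood rule are my own simplifying assumptions, not the chapter's specification.

```python
import numpy as np
from collections import deque

def percolation_frontier(width=50, height=100, p=0.6, seed=1):
    """Toy percolation model of innovation: columns are technologies, rows are
    performance levels. A site is technically feasible with probability p and
    becomes a realized innovation only if it is feasible and connected (via
    feasible neighbors) to the already-realized baseline row."""
    rng = np.random.default_rng(seed)
    feasible = rng.random((height, width)) < p
    feasible[0, :] = True                  # baseline knowledge is always available
    realized = np.zeros_like(feasible)
    realized[0, :] = True
    queue = deque((0, c) for c in range(width))
    while queue:                           # flood-fill through connected feasible sites
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width and feasible[nr, nc] and not realized[nr, nc]:
                realized[nr, nc] = True
                queue.append((nr, nc))
    # "best practice" frontier: highest realized performance level in each column
    return realized.cumsum(axis=0).argmax(axis=0)

frontier = percolation_frontier()
print("mean frontier height:", frontier.mean())
```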
Sunday, April 19, 2015
B-Sides SF Talk
Here is the demo spreadsheet I'll be using in today's B-Sides SF talk on the Thomas Scoring System (TSS):
Download the spreadsheet and open it in Microsoft Excel 2008 or later. It uses conditional formatting and cell data validation, but no macros or other advanced features. The sheets are protected to avoid data entry errors, but there is no password.
This is a realistic, fully functional implementation of the TSS applied to a general case: scoring the maturity of a company's information security capability.
Tuesday, March 25, 2014
RAND Report on Innovation in the Cybercrime Ecosystem
This is an excellent report -- well-researched and well-written -- on the growth and development of the cybercrime ecosystem:
- "Markets for Cybercrime Tools and Stolen Data -- Hackers’ Bazaar" - PDF, 65 pages
Though it's sponsored by Juniper Networks, I don't see any evidence that the analysis or the report was slanted. This report should be useful for people in industry, government, and academia (a rare feat!).
While they do a broad survey of the cybercrime ecosystem, they examine botnets and zero-day exploit markets in detail. What's important about this report is that it provides a thorough analysis of the innovation capabilities and trajectories in the cybercrime ecosystem. This is vital to understand in order to guide investment decisions, architecture decisions, and R&D decisions beyond a one-year time horizon.
Here's a timeline that documents the growing sophistication and innovation capability:
[Image: Black Market timeline (part 1) -- click to enlarge]
Monday, March 24, 2014
Review of Whitsitt's "B-side Cyber Security Framework" (Mapped to the Ten Dimensions)
My colleague Jack Whitsitt (@sintixerr) has proposed a B-side version of the NIST Cyber Security Framework (NIST CSF) in this blog post. In this post I will give my comments on Jack's framework, and do so by mapping it to the Ten Dimensions.
The NIST CSF is a catalog of information security practices, organized into categories and maturity tiers. I've criticized the NIST-CSF here, here, and here, and proposed an alternative -- the Ten Dimensions. Jack has posted commentary and critiques here, here and here. Jack has the advantage of having participated in all five workshops, plus several side meetings with various players.
Here's a diagram of Jack's framework:
Short Summary
I like Jack's B-sides framework. I see a lot of overlap between it and my Ten Dimensions. They aren't identical, but the same themes come through in both. His has the advantage of simpler interpretation (top-down layer cake, half as many dimensions). It has shortcomings as well. In its current form, it lacks performance measurement and, in my opinion, pays inadequate attention to "Effective Response, Recovery, & Resilience", "Effective External Engagement", "Optimize Cost of Risk", and organizational learning loops.

Saturday, March 8, 2014
Mining only 'digital exhaust', Big Data 1.0 won't revolutionize information security
I was asked during this interview whether 'Big Data' was revolutionizing information security. My answer was, essentially, 'No, not yet'. But I don't think I did such a great job explaining why and where the revolution will come from, if it comes.
Basically, Big Data 1.0 in information security is today focused on mining 'digital exhaust' -- all the transactional data emitted and logged by computing, communications, and security devices and services. (The term "data exhaust" was probably coined in 2007 by consultant Jerry Michalski, according to this Economist article.) This can certainly be useful for many purposes but I don't think it is or will be revolutionary. It will help tune spam filters, phishing filters, intrusion detection/prevention systems, and so on, but it won't change anything fundamental about how firms architect security, how they design and implement policies, and it does almost nothing on the social or economic factors.
Here's a great essay that explains why Big Data 1.0 isn't revolutionary, and what it will take to make it revolutionary. Though it's not about information security, it doesn't take much to extend his analysis to the InfoSec domain.
Huberty, M. (2014). I expected a Model T, but instead I got a loom: Awaiting the second big data revolution. Prepared for the BRIE-ETLA Conference, September 6-7, 2013, Claremont, California.

Huberty points toward Big Data 2.0, which could be revolutionary:
"...we envision the possibility of a [Big Data 2.0]. Today, we can see glimmers of that possibility in IBM’s Watson, Google’s self-driving car, Nest’s adaptive thermostats, and other technologies deeply embedded in, and reliant on, data generated from and around real-world phenomena. None rely on “digital exhaust”. They do not create value by parsing customer data or optimizing ad click-through rates (though presumably they could). They are not the product of a relatively few, straightforward (if ultimately quite useful) insights. Instead, IBM, Google, and Nest have dedicated substantial resources to studying natural language processing, large-scale machine learning, knowledge extraction, and other problems. The resulting products represent an industrial synthesis of a series of complex innovations, linking machine intelligence, real-time sensing, and industrial design. These products are thus much closer to what big data’s proponents have promised–but their methods are a world away from the easy hype about mass-manufactured insights from the free raw material of digital exhaust.
[...]
The big gains from big data will require a transformation of organizational, technological, and economic operations on par with that of the second industrial revolution. " [emphasis added]
Highlighting somewhat different themes in the context of Digital Humanities, Brian Croxall presents an insightful blog post called "Red Herrings of Big Data", which includes slides and this 2 minute video:
Here are his three 'red herrings' (i.e. distractions from the most promising trail), turned around to be heuristics:
Main message
Don't be naïve about Big Data in information security. To drive a revolution, it will need to be part of a much more comprehensive transformation of what data we gather in the first place and how data analysis and inference can drive results. Just mining huge volumes of 'digital exhaust' won't do it.

Monday, March 3, 2014
Video interview with BankInfoSecurity, plus "Down the Rabbit Hole" podcast episode
Here's a 12 minute interview of me by Tracy Kitten (@BnkInfoSecurity), filmed at the RSA Conference last week:
[Video: click to open a new page for www.bankinfosecurity.com with the video]

Topics discussed:
- The difference between "performance" and "best practices"
- How big data is expected to revolutionize information security (some myth busting)
- Where innovation will be coming from, and where it won't
- Why encouraging security professionals to pursue training in statistics and data visualization is so critical
But wait...there's more! Here's a link to episode 82 of the Down the Rabbit Hole podcast, where I'm a guest along with Bob Blakely and Lisa Leet. (Here's the podcast itself in mp3 file format -- 43:15 in length.) From Rafal's summary, here's what we talk about:
- Does it make sense, in a mathematical and practical sense, to look for 'probability of exploit'?
- How does 'game theory' apply here?
- How do intelligent adversaries figure into these mathematical models?
- Is probabilistic risk analysis compatible with a game theory approach?
- Discussing how adaptive adversaries figure into our mathematical models of predictability... How do we use any of this to figure out path priorities in the enterprise space?
- An interesting analogy to the credit scoring systems we all use today
- An interesting discussion of 'unknowns' and 'black swans'
- Fantastic practical advice for getting this data-science-backed analysis to work for YOUR organization
Sunday, March 2, 2014
Bootleg recording of Colbert keynote at #RSAC
Hilarious. Audio only. Listen to it before it's taken down.
Tuesday, February 25, 2014
Quick links to "Ten Dimensions" resources for #RSAC folks
This post is aimed at folks attending my RSA Conference talk on Wednesday, but could be useful for anyone who wants to catch up on the topics.
My talk is at 10:40am - 11:00am in Moscone West, Room: 2020. Immediately after the talk, I'll be moving to the "Continuing the Conversation" space in the 2nd floor lobby of Moscone West. I'll be wearing a black EFF hat, in case you want to pick me out of a crowd.
This is a 20-minute talk, so it will only be an introduction to the topics. My main goal is to stimulate your interest to learn more and to dig into these resources:
- RSAC slides: "Ten Dimensions of Security Performance for Agility and Rapid Learning"
- New blog post: "How to aggregate metrics into a performance index or score" (Thomas Scoring System, TSS) -- detailed tutorial
- New demo Excel spreadsheet for TSS, explained in the previous blog post
- Intro page to the Ten Dimensions, with links to all detail pages: "Ten Dimensions of Cyber Security Performance"
Not directly related to the above, but here are the slides for the talk I gave Monday at BSides-SF:
- "Get a Grip on Unintended Consequences" -- best in PPTX so you can read the stories in the speaker notes. Here's an audio recording, starting at 4:17:24. (FYI, Adam Shostack gave a nice talk in the hour before me on Threat Modeling.)
If we don't connect at the conference for some reason, feel free to email me at russell ♁ thomas ❂ meritology ♁ com. (Earth = dot; Sun = at)
And if you've come this far and you aren't following me on twitter -- @MrMeritology -- what's wrong with you? Follow, already! ☺
How to aggregate ground-truth metrics into a performance index
[Image: my remix of a painting by William Blake, with the Meritology logo added. Get it? He's shedding light on an impossible shape. (Click to enlarge)]
How can we measure aggregate performance on an interval or ratio scale index when we have a hodge-podge of ground-truth metrics with varying precision, relevance, and reliability, and that are incommensurate with each other?

Here's a specific example from the Ten Dimensions:
How can we measure overall Quality of Protection & Controls if our ground-truth metrics include false positive percentages, false negative percentages, the number of exceptions, various "high-medium-low" ratings, audit results, coverage percentages, and a bunch more?

I've been wrestling with this problem for a long time, both in information security and elsewhere. So have a lot of other people. A while back I had an insight that the solution may be to treat it as an inference problem, not a calculation problem (described in this post). But I didn't work out the method at that time. Now I have.
In this blog post, I'm introducing a new method. At least I think it's new because, after much searching, I haven't been able to find any previously published papers. (If you know of any, please contact me or comment to this post.)
The new method is innovative, but I don't think it's much more complicated or mathematically sophisticated than the usual methods (weighted averages, etc.). It does, however, take a change in how you think about metrics, evidence, and aggregate performance. Even though all the examples below are related to information security, the method is completely general. It can apply to IT, manufacturing, marketing, R&D, governments, non-profits... any organizational setting where you need to estimate aggregate performance from a collection of disparate ground-truth metrics.
This post is a tutorial and is as non-technical as I can make it. As such, it is on the long side, but I hope you find it useful. A later post will take up the technicalities and theoretical issues. (See here for Creative Commons licensing terms.)
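In the meantime, here's a rough Python sketch of the flavor of "inference, not calculation" -- my illustration only, not necessarily the new method itself. Each ground-truth metric is treated as evidence (a likelihood) about a latent performance level, the evidence is combined over a simple prior, and the posterior mean becomes the index. The metrics, evidence mappings, and numbers are all hypothetical.

```python
import numpy as np

# Aggregation as inference: each metric is evidence about a latent performance
# level theta in (0, 1). Metrics and mappings below are hypothetical examples.
theta = np.linspace(0.01, 0.99, 99)     # grid over the latent performance level
prior = np.ones_like(theta)             # uninformative prior

def binomial_evidence(successes, trials):
    # e.g. "93 of 100 alerts handled correctly" -> likelihood over theta
    return theta**successes * (1.0 - theta)**(trials - successes)

def rating_evidence(rating, spread=0.15):
    # a coarse "high/medium/low" rating treated as soft evidence (assumed mapping)
    centers = {"low": 0.25, "medium": 0.55, "high": 0.85}
    return np.exp(-0.5 * ((theta - centers[rating]) / spread) ** 2)

posterior = prior.copy()
for evidence in (binomial_evidence(93, 100),    # e.g. a detection coverage test
                 binomial_evidence(40, 50),     # e.g. audit items passed
                 rating_evidence("medium")):    # e.g. an assessor's rating
    posterior = posterior * evidence
posterior /= posterior.sum()

index = float((theta * posterior).sum())        # posterior mean as the performance index
print(f"aggregate performance index: {index:.2f}")
```

The point of the framing: a weighted average forces every metric onto one scale up front, while the inference framing lets each metric speak as evidence in its own units.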