Saturday, March 8, 2014

Mining only 'digital exhaust', Big Data 1.0 won't revolutionize information security

I was asked during this interview whether 'Big Data' was revolutionizing information security.  My answer was, essentially, 'No, not yet'. But I don't think I did such a great job explaining why and where the revolution will come from, if it comes.

Basically, Big Data 1.0 in information security is today focused on mining 'digital exhaust' -- all the transactional data emitted and logged by computing, communications, and security devices and services.  (The term "data exhaust" was probably coined in 2007 by consultant Jerry Michalski, according to this Economist article.)  This can certainly be useful for many purposes, but I don't think it is or will be revolutionary.  It will help tune spam filters, phishing filters, intrusion detection/prevention systems, and so on, but it won't change anything fundamental about how firms architect security or how they design and implement policies, and it does almost nothing to address the social or economic factors.

Here's a great essay that explains why Big Data 1.0 isn't revolutionary, and what it will take to make it revolutionary.  Though it's not about information security, it doesn't take much to extend the author's analysis to the InfoSec domain.
Huberty, M. (2014). I expected a Model T, but instead I got a loom: Awaiting the second big data revolution. Prepared for the BRIE-ETLA Conference, September 6-7, 2013, Claremont, California.
Huberty points toward Big Data 2.0 which could be revolutionary:
"...we envision the possibility of a [Big Data 2.0]. Today, we can see glimmers of that possibility in IBM’s Watson, Google’s self-driving car, Nest’s adaptive thermostats, and other technologies deeply embedded in, and reliant on, data generated from and around real-world phenomena. None rely on “digital exhaust”. They do not create value by parsing customer data or optimizing ad click-through rates (though presumably they could). They are not the product of a relatively few, straightforward (if ultimately quite useful) insights. Instead, IBM, Google, and Nest have dedicated substantial resources to studying natural language processing, large-scale machine learning, knowledge extraction, and other problems. The resulting products represent an industrial synthesis of a series of complex innovations, linking machine intelligence, real-time sensing, and industrial design. These products are thus much closer to what big data’s proponents have promised–but their methods are a world away from the easy hype about mass-manufactured insights from the free raw material of digital exhaust.


The big gains from big data will require a transformation of organizational, technological, and economic operations on par with that of the second industrial revolution." [emphasis added]

Highlighting somewhat different themes in the context of Digital Humanities, Brian Croxall presents an insightful blog post called "Red Herrings of Big Data", which includes slides and this 2-minute video:

Here are his three 'red herrings' (i.e. distractions from the most promising trail), turned around to be heuristics:

Main message

Don't be naïve about Big Data in information security. To drive a revolution, it will need to be part of a much more comprehensive transformation of what data we gather in the first place and how data analysis and inference can drive results.  Just mining huge volumes of 'digital exhaust' won't do it.
