Showing posts with label visualization. Show all posts
Thursday, February 18, 2016
Thursday, January 14, 2016
How fast does the space of possibilities expand? (replicating Tria, et al 2014)
How fast does the space of possibilities expand? This question is explored in the following paper (free download):
The charts on the top and center right show the frequency distribution by ball type (a.k.a. "color"). These are log-log plots, so a declining straight line is the signature of a power-law distribution, while a gradually curving (concave) line is the signature of a lognormal or similar distribution with a somewhat thinner tail. A sharply declining curve is the signature of a thin-tailed distribution such as a Gaussian.
- Tria, F., Loreto, V., Servedio, V. D. P., & Strogatz, S. H. (2014). The dynamics of correlated novelties. Scientific Reports, 4, 5890. (http://dx.doi.org/10.1038/srep05890)
From the abstract:
Novelties are a familiar part of daily life. They are also fundamental to the evolution of biological systems, human society, and technology. By opening new possibilities, one novelty can pave the way for others in a process that Kauffman has called “expanding the adjacent possible”. The dynamics of correlated novelties, however, have yet to be quantified empirically or modeled mathematically. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya's urn, predicts statistical laws for the rate at which novelties happen (Heaps' law) and for the probability distribution on the space explored (Zipf's law), as well as signatures of the process by which one novelty sets the stage for another.

I've written a NetLogo program to replicate their model, available here. The code for the model is quite simple. A majority of my code is for a "pretty layout", which is a schematic version of a "top-down view" of the urn. Here's a video of a single run:
(Video: full screen with controls.)
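For readers who do not use NetLogo, the urn dynamics can be sketched in a few lines of Python. This is not the NetLogo program mentioned above; it is a minimal sketch of the "urn with triggering" as I understand it from the paper: each drawn ball goes back with `rho` extra copies of its color, and the first time a color is drawn, `nu + 1` balls of brand-new colors are added. The parameter names follow the paper's notation, while the function names, defaults, and the single-ball starting urn (`n0=1`) are assumptions made for illustration.

```python
import random


def urn_with_triggering(steps, rho=4, nu=3, n0=1, seed=None):
    """Simulate a Polya urn with triggering; return the sequence of drawn colors.

    Colors are integers. rho, nu and n0 are illustrative defaults, not values
    taken from the post.
    """
    rng = random.Random(seed)
    urn = list(range(n0))        # start with one ball of each initial color
    next_color = n0              # label for the next never-used color
    seen = set()                 # colors that have been drawn at least once
    draws = []
    for _ in range(steps):
        color = rng.choice(urn)  # the drawn ball stays in the urn
        draws.append(color)
        urn.extend([color] * rho)            # reinforcement: rho extra copies
        if color not in seen:                # a novelty occurred
            seen.add(color)
            # expand the adjacent possible: nu + 1 balls of brand-new colors
            urn.extend(range(next_color, next_color + nu + 1))
            next_color += nu + 1
    return draws


def distinct_counts(draws):
    """Number of distinct colors seen after each draw (the Heaps' law curve)."""
    seen = set()
    counts = []
    for color in draws:
        seen.add(color)
        counts.append(len(seen))
    return counts
```

Plotting `distinct_counts(urn_with_triggering(10000, seed=1))` against the draw index on log-log axes gives the novelty-rate curve, and counting how often each color appears in the draws gives the frequency-rank distribution.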
So what?
This model will be useful in my dissertation because I need mechanisms to endogenously add novelty -- i.e. expand the possibility space based on the actions of agents in the simulated world, and not simply as external "shocks". This is essential for modeling cyber security because some people claim that quantitative risk management is impossible in principle because of intelligent adversaries who can generate and exploit novel strategies and capabilities.
Monday, March 10, 2014
Boomer weasel words: 'high net worth individuals' as euphemism for 'rich people'
(You might compare this to my previous post regarding "Baby on Board" signs.)
The following graphs are from Google Ngram Viewer, which shows the relative frequency of phrases in American English books up to the year 2008. Notice that the phrase "high net worth individuals" first appears around 1980.
Thursday, August 1, 2013
Tutorial: How Fat-Tailed Probability Distributions Defy Common Sense and How to Handle Them
This post is related to the Grey Swans post, but is a good topic to present on its own.
For random time series, we often ask general questions to learn something about the probability distribution we are dealing with:
- What's average? What's typical?
- How much does it vary? How wide is the "spread"? Is it "skewed" to one side?
- How extreme can the outcomes be?
- How good are our estimates, given the sample size? Do we have enough samples?
If we have a good sized sample of data, common sense tells us that "average" is somewhere in the middle of the sample values and that the "spread" and "extreme" of the sample are about the same as those of the underlying distribution. Finally, common sense tells us that after we have good estimates, we don't need to gather any more sample data because it won't change our estimates much.
It turns out that these common-sense answers could all be flat wrong, depending on how "fat" the tail of the distribution is. Now that's surprising!
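A quick experiment makes the point concrete. The following is a minimal Python sketch, not code from the tutorial itself: it compares a thin-tailed sample (absolute values of standard normal draws) with a fat-tailed one (Pareto draws with shape `alpha = 1.1`, an illustrative choice). The function name `compare_tails` and all parameters are hypothetical.

```python
import random
import statistics


def compare_tails(n=100_000, alpha=1.1, seed=42):
    """Summarize a thin-tailed and a fat-tailed sample of the same size.

    alpha is the Pareto shape parameter; smaller values mean fatter tails.
    """
    rng = random.Random(seed)
    samples = {
        # thin tail: absolute value of standard normal draws
        "thin": [abs(rng.gauss(0, 1)) for _ in range(n)],
        # fat tail: Pareto draws (always >= 1)
        "fat": [rng.paretovariate(alpha) for _ in range(n)],
    }
    summary = {}
    for name, xs in samples.items():
        biggest = max(xs)
        summary[name] = {
            "mean": statistics.fmean(xs),
            "median": statistics.median(xs),
            "max": biggest,
            # fraction of the whole sample total due to the single largest value
            "max_share": biggest / sum(xs),
        }
    return summary
```

In the thin-tailed sample, the mean and median sit close together and no single observation matters much. In the fat-tailed sample, the mean is pulled well above the median and one observation can account for a noticeable share of the total, so rerunning with a different `seed` or a larger `n` can move the "average" and the "extreme" substantially.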
Friday, July 26, 2013
Visualization Friday: 14 dimensions represented in 2D using MDS, Colors, and Shapes
For the last three years I've been building an Agent-based Model (ABM) of innovation ecosystems to explore how agent value systems and histories mutually influence each other. The focus to this point has been on Producer-Consumer relationships and the Products they produce and consume.
One of my key challenges has been how to visualize changes in agent value systems as new products are introduced. Products have surface characteristics defined as a 10-element vector of real numbers between 0 and 1. Consumers make valuation decisions based on their perception of these 10 dimensions compared to their current "ideal type". But they realize utility after consuming based on three "hidden" dimensions. Adding on the dimension of consumption volume, this means I need to somehow visualize 14 dimensions in a 2D dynamic display.
The figures below show my solution. Products are represented by black squares, while Consumer ideal points are represented by blue dots. (There are about 200 Consumers in this simulation.) Products that are not yet introduced are represented by hollow dark red squares. The 10 dimensions of Product surface characteristics are reduced to 2D coordinates through Multi-Dimensional Scaling (MDS). Therefore the 2D space is a dimensionless projection where 2D distances between points are roughly proportional to distances in the original 10 dimensions.
The three utility dimensions are represented by colored "spikes" coming off of each Product. The length of each spike is proportional to the utility offered by that Product on that dimension.
Finally, the proportion of the Product population is represented by a dark red circle around each product (black filled square).
These two plots show the same simulation at different points in time, about 300 ticks apart, showing the effect of the introduction of several new products. What we are looking for is patterns and trajectories of Consumer ideal points (blue dots).
Putting these all together:
- Dots are close or distant in 2D space according to how they are perceived by Consumers based on surface characteristics.
- Products that are close to each other in 2D space may or may not have similar utility characteristics (spikes). This reveals the "ruggedness" of the "landscape", and thus the search difficulty faced by Consumers.
- The circles around each product allow easy identification of popular vs unpopular products.
Figure: Consumer ideal points (blue dots) after 1367 ticks, showing influence of new Product (black dot on far left). Notice large increase in popularity of product pointed to by red arrow. Though it has relatively low utility on all three dimensions, it is a "bridge" between products on right side and new (high utility) product on left side.
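The MDS step described above (10 surface dimensions reduced to 2D coordinates) can be illustrated with a short Python sketch. The post does not say which MDS variant or tool was used, so this assumes classical (Torgerson) MDS, implemented with plain power iteration to avoid external libraries. All function names are hypothetical, and this is not the code behind the figures.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue and eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(matrix)
    rng = random.Random(0)
    vec = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return value, vec


def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from their distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram (inner product) matrix
    gram = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(gram)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate so the next pass finds the next-largest eigenpair
        gram = [
            [gram[i][j] - value * vec[i] * vec[j] for j in range(n)]
            for i in range(n)
        ]
    return coords
```

Given a list `products` of 10-element vectors, `classical_mds(pairwise_distances(products))` returns one 2D coordinate pair per Product. When the 10-dimensional points do not lie in a plane, the 2D distances only approximate the original ones, which is why the projection is described as "roughly proportional".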
Friday, July 19, 2013
Visualization Friday: Probability Gradients
I'm fascinated with varieties of uncertainty -- ways of representing it, reasoning about it, and visualizing it. I was very tickled when I came across this blog post by Alex Krusz on the Velir blog. He presents a neat improvement over the "box and whiskers" plot for representing uncertainty or variation in data points, which he calls "probability gradients".
Thursday, July 11, 2013
Communicating about cyber security using visual metaphors
For a workshop with non-computer people, I needed a simple visual metaphor to communicate how messy and complicated information security can be (and, by extension, cyber security). This is what I came up with. Seems to get across the main point on a visceral level. Enjoy.
Sunday, June 23, 2013
Q: What marks the "Heartland"? A: Attitudes toward pantyhose substitutes
NC State has a very interesting interactive site called Dialect Survey Maps, based on data from the 122-question survey conducted by Bert Vaux, Department of Linguistics, University of Cambridge. The web interface was coded with Shiny and deployed using the hosting service provided by RStudio.
Maps for most questions show regional differences in terminology and idioms -- e.g. in most of the US, "tennis shoes" is the generic name for soft/athletic shoes, except in New England where the term "sneakers" is preferred.
But the map that caught my eye was for this statement:
- 56. "Pantyhose are so expensive anymore that I just try to get a good suntan and forget about it."
Here's the map. What I like is that agreeing with this statement marks "The Heartland" (excluding the Old South), a.k.a. "North-Middle America". (Color code: Blue is "acceptable" and red is "unacceptable".)
It's interesting to see how this separates Iowa from Minnesota, Wisconsin from Illinois, North Dakota from South Dakota, and New York from Pennsylvania.