
Law #5: “Watkins’ Law” – There are always patterns [i]

First, let’s talk about “The Good.” 

David Byrne, writing in the introduction to Gareth Cook’s book, The Best American Infographics, 2013, describes the power of the infographic as:

“…an inbuilt ability to manipulate visual metaphors in ways we cannot do with the things and concepts they stand for — to use them as malleable, conceptual Tetris blocks or modeling clay that we can more easily squeeze, stack, and reorder. And then — whammo! — a pattern emerges, and we’ve arrived someplace we would never have gotten by any other means.”

He could just as easily have been talking about the data mining and analytics process, except that the process is much slower and more methodical than the expression “whammo” suggests.[ii]

Patterns, connections, and relationships are inevitable as long as there is relevant data. We naturally want to represent data in patterns so that we can understand and organize our data into something useful and profitable. This law is similar to Manuel Lima’s statement that “networks are everywhere” in his book, Visual Complexity: Mapping Patterns of Information.[iii] He also talks about “ubiquitous topology,” which is an apt expression because topology is the study of the properties that remain unchanged when a geometric object is manipulated. When we stretch or modify a polygon, its topology (its relationship to the polygons to its left and right) remains the same. Thus, when we study patterns in data, we are trying to identify the underlying structure and insight that the data represents.

The patterns that we find may not be what we expected and they may not have a causal relationship, but there will always be a pattern. Not only are our brains “wired” to recognize patterns, but according to Khabaza we will always find patterns in analytics because the data used to create the pattern is a product of our domain of interest and our business knowledge. The business processes operate according to rules and procedures and these always lead to patterns.

For example, the business goal (increase the number of customers) sets the stage for the domain of interest (sales). Data is generated (who bought what, how much, where, when) by processes in the domain (bricks and mortar versus online retail, merchandise mix, seasonal sale event, etc.) and these processes are governed by rules. (For instance, we don’t stock snow shovels in May in Pennsylvania.) The data will reflect those rules and the data mining process will reveal the rules when we combine our data tools with the business knowledge about how to interpret the results in terms of the domain. The rules behind the processes will necessarily generate patterns.

To identify a pattern, we start with what we know about the business process (merchandise selection) and then relate that to what we know about sales, which comes from our business knowledge. (We sell more snow shovels in December and more garden shovels in May in Pennsylvania, but we don’t sell any snow shovels in Florida.) Like the overall process of data mining, identifying patterns is an iterative process. When we uncover one pattern and link that to what we know about the business, we may be able to generate a new hypothesis about another pattern.
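The snow shovel example can be sketched in a few lines of Python. This is only an illustration: the sales records, the field names, and the `units_by` helper are all invented here, not taken from a real retailer. The point is that the stocking rules (no snow shovels in Florida, none in May in Pennsylvania) show up in the totals without anyone telling the code about them.

```python
from collections import defaultdict

# Hypothetical sales records: (state, month, product, units sold).
# All values are invented for illustration.
SALES = [
    ("PA", "Dec", "snow shovel", 40),
    ("PA", "Dec", "snow shovel", 25),
    ("PA", "Dec", "garden shovel", 3),
    ("PA", "May", "garden shovel", 35),
    ("FL", "Dec", "garden shovel", 12),
    ("FL", "May", "garden shovel", 15),
]

# Position of each field inside a record.
FIELD_INDEX = {"state": 0, "month": 1, "product": 2}


def units_by(records, *fields):
    """Sum units sold, grouped by the chosen fields."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[FIELD_INDEX[field]] for field in fields)
        totals[key] += record[3]
    return dict(totals)


totals = units_by(SALES, "state", "month", "product")
for key in sorted(totals):
    print(key, totals[key])
```

Combinations that never appear in the output are part of the pattern too. Interpreting those gaps as a stocking rule rather than as missing data is where the business knowledge comes in.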

Here’s “The Bad.” 

As Daniel Kahneman writes in Thinking, Fast and Slow, our pattern-seeking predilection causes us to incorrectly dismiss the randomness of truly random events.[iv] We find patterns where there are none or, stated more correctly, we draw a conclusion about the meaning of a pattern when there is no statistical association between the pattern and the conclusion. Care must be taken to follow best practices for identifying, using, and representing patterns. Reading Kahneman’s book is a good place to start to avoid the pitfalls associated with pattern seeking.
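A small experiment makes the danger concrete. The sketch below is my own illustration, not something from Kahneman’s book: it generates a set of short series of pure random noise, compares every pair, and reports the strongest correlation it finds. The series count, series length, seed, and function names are arbitrary assumptions.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_series=40, length=10, seed=0):
    """Largest absolute correlation between any two series of pure noise."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(n_series)]
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


strongest = strongest_chance_correlation()
print(f"Strongest correlation among unrelated series: {strongest:.2f}")
```

With 40 series there are 780 pairs to compare, so a few strong-looking correlations are almost guaranteed even though no series has anything to do with any other. The more comparisons we make, the more impressive the best accidental “pattern” will look.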

Finally, we have “The Ugly.” 

In the movie “The Good, The Bad, and The Ugly,” the director, Sergio Leone, framed his shots in a specific way. Roger Ebert pointed this out in his review of the film:

“The rule is that the ability to see is limited by the sides of the frame. At important moments in the film, what the camera cannot see, the characters cannot see, and that gives Leone the freedom to surprise us with entrances that cannot be explained by the practical geography of his shots.”

In the same way, the patterns in analytics can be deliberately framed to support a specific point of view. Two well-known standards come to mind: How to Lie with Statistics by Darrell Huff[v] and How to Lie with Maps by Mark Monmonier[vi]. These books alert us to the potential dangers in the interpretations that we make from our analytics as well as the potential for others to mislead us with analytics.

This post merely scratches the surface of “The Good, The Bad, and The Ugly” in patterns. I welcome others’ comments and suggestions for exploring these topics further.

[i] Tom Khabaza credits David Watkins with the fifth law of data mining. David Watkins is currently Head of Strategic Analytics at Telefonica UK (O2).

[ii] Gareth Cook and David Byrne, The Best American Infographics. (New York: Houghton Mifflin Harcourt Publishing Company, 2013), xvii.

[iii] Lima, Manuel, Visual Complexity: Mapping Patterns of Information. (New York: Princeton Architectural Press, 2011), 73.

[iv] Kahneman, Daniel, Thinking, Fast and Slow. (New York: Farrar, Straus and Giroux, 2011).

[v] Huff, Darrell, How to Lie with Statistics. (New York: W. W. Norton & Company, 1954).

[vi] Monmonier, Mark, How to Lie with Maps. (Chicago: University of Chicago Press, 1996).
