This post is part of a series on how the 9 Laws of Data Mining from Tom Khabaza can be applied to analytics. You can find previous posts here.

The statement that “a piano makes music” is a clear misunderstanding. We all know that the pianist or any other musician is the source of the music and that the instrument is the tool. But in the context of location analytics, this distinction has often been overlooked. With the proliferation of online maps, there is a sense that anyone can create a map and thus every business professional should be an analyst. Business reality is far more complex.

Many location analytics (GIS) implementations fail because they lack “soul.” Although many factors can contribute, a common and fundamental misstep is buying software and then failing to put it in the hands of a skilled analyst. Software training alone may not substitute for an analytical mind that can design experiments combining business knowledge with a properly described problem space and a well-defined model.

A good analyst is like a musician, and both are better when they have “soul.” A musician performing a composition interprets the written score and supplies the phrasing and dynamics that turn notes into music. A jazz musician goes one step further, composing the piece as well as performing it. Like the jazz musician, the analyst both defines the problem space and delivers the “answer,” because the result of the analysis must be interpreted in the context of business knowledge: the “soul” of analytics.

How do we create analytics with “soul”? We start with business knowledge and then generalize or extrapolate from that initial understanding to identify a model that will reveal new patterns and insight. Using statistical inference, we assume that a model will produce reasonable answers when it is applied to a well-defined problem space. We might also assume that a well-defined market analysis methodology should be general enough to apply across a wide range of market scenarios. But a model performs well only when business knowledge is used to match the modeling procedures to the problem, and we find such a model only through experimentation and exploration of the data. This is the fourth law of data mining:

Law #4: “No Free Lunch for the Data Miner” – The right model for a given application can only be discovered by experiment

David Wolpert and William Macready developed the “no free lunch theorem” for search and optimization techniques. Its basic premise is that any two algorithms are equivalent when their performance is averaged across all possible problems. Because no algorithm is best in general, we need a foundation of problem-specific (business) knowledge to build the right model for a given problem.
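To make the theorem concrete, here is a brute-force toy check in Python, a sketch rather than a proof. It assumes a tiny search space of four points, objective values of 0, 1, or 2, and two deterministic search strategies that never revisit a point; the strategies `sweep` and `adaptive` are hypothetical, invented only for illustration. Averaged over every possible objective function, both should find the same best value for every evaluation budget.

```python
from fractions import Fraction
from itertools import product

X = range(4)    # tiny search space (assumed for illustration)
Y = (0, 1, 2)   # possible objective values (assumed)

def sweep(f, budget):
    """Hypothetical algorithm A: evaluate points in the fixed order 0, 1, 2, ..."""
    return [f[x] for x in list(X)[:budget]]

def adaptive(f, budget):
    """Hypothetical algorithm B: pick the next point based on the last value seen."""
    unvisited = list(X)
    x = unvisited.pop(len(unvisited) // 2)  # start in the middle of the space
    seen = [f[x]]
    while len(seen) < budget:
        # Odd value: try the smallest unvisited point; even value: the largest.
        x = unvisited.pop(0) if seen[-1] % 2 else unvisited.pop()
        seen.append(f[x])
    return seen

def average_best(algorithm, budget):
    """Exact mean of the best value found, over every function f: X -> Y."""
    functions = list(product(Y, repeat=len(X)))  # each f is a tuple indexed by x
    total = sum(max(algorithm(f, budget)) for f in functions)
    return Fraction(total, len(functions))

for budget in range(1, len(X) + 1):
    print(budget, average_best(sweep, budget), average_best(adaptive, budget))
```

The adaptive strategy gains nothing on average because, for every function on which its choices pay off, another function misleads it. An advantage appears only when knowledge rules out some functions, which is the role business knowledge plays.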

This law highlights the iterative nature of data mining and analytics. Experimentation and iteration are necessary because the problem is often not well understood; if it were, the analytics would not be necessary. The value of the analysis is that it uncovers things that were previously unknown. It lets us make connections and associations that we did not know existed and that we may not have known to look for. Sometimes the proper hypothesis can only be formed after a series of experiments and exploratory or descriptive analyses have been completed.

The problem space may be unknown, or there may be multiple problem spaces, each needing its own model. We might start with a hypothesis we think is valid, but as we learn more about the entire landscape we may have to change both the initial business goal and our evaluation of the results. We can even change the problem space through the way we prepare the data. Because the model results must be evaluated against business knowledge, we may find that we need to restate the business problem after some initial analysis.

Avoid the “God Complex”

Tim Harford is also a proponent of the experimental, or “trial and error,” method. To those who think this merely states the obvious, he points to the danger of the “God complex”: the belief that one already understands the problem space completely and therefore need not study it or explore other models. Analysts must keep an open mind. They must look for patterns even when they think they already know what the pattern will be.

Khabaza notes that in some cases the body of knowledge has been well researched and modeled. The business goals may not change from year to year, and the data may be stable enough that an acceptable model can be reused year after year. In these cases, the “no free lunch” law may matter less. However, to avoid the “God complex,” this “free lunch” should be treated as temporary.

Practice Creativity – Trial and Selection

Tim Harford also explains that some problems are so complex that the only way to reach a successful solution is the evolutionary process of trial and error, or what we might better call “trial and selection.” This process involves identifying which parts of the model are working, keeping those parameters, and varying the others until the best model is found. Essentially, this is what a jazz musician does during practice: trying and selecting combinations of notes. It may seem that a jazz musician simply creates on stage, and to a certain extent they do, but those combinations draw on many hours of learned sequences (could we call it learned creativity?) that can be varied. Without hours of practice to gain proficiency in both the technical aspects of the instrument and the selection of chord sequences, no musician should expect to play with “soul.”
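The “keep what works, vary the rest” loop resembles random-mutation hill climbing. The Python sketch below is one minimal interpretation, not Harford's or Khabaza's method: it perturbs one parameter at a time and keeps the change only when a score improves. The parameter names `radius_km` and `price_weight` and the scoring function `toy_score` are hypothetical stand-ins for a real model evaluation.

```python
import random

def trial_and_selection(score, params, step=1.0, trials=200, seed=0):
    """Vary one parameter per trial; keep the change only if the score improves."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    best = dict(params)
    best_score = score(best)
    for _ in range(trials):
        candidate = dict(best)
        name = rng.choice(list(candidate))           # pick one parameter to vary
        candidate[name] += rng.uniform(-step, step)  # trial: perturb it
        candidate_score = score(candidate)
        if candidate_score > best_score:             # selection: keep what works
            best, best_score = candidate, candidate_score
    return best, best_score

# Hypothetical score that peaks at radius_km = 5 and price_weight = 0.3.
def toy_score(p):
    return -((p["radius_km"] - 5) ** 2) - (p["price_weight"] - 0.3) ** 2

best, best_score = trial_and_selection(toy_score, {"radius_km": 1.0, "price_weight": 0.0})
print(best, best_score)
```

In practice the score would come from evaluating the model against held-out data and business knowledge, and deciding which parameters are worth varying is exactly the analyst's contribution.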

Similarly, to improvise with “soul,” an analyst must practice creativity. The analyst must be able to ask creative questions and then devise innovative experiments and models to test various options and find the best model. Analytics with “soul” combine skill, experience gained through trial and selection, and creative combinations of ideas. Those who are content simply to “follow the music” lack a fundamental component of successful analytics.