Pivot or Persevere – A Data Mining Approach

We stand at an edge today: every industry is immersing itself in the age of data analytics, which promises to draw intelligent insights from the past and predict future growth. It is no longer just data management but a discipline in which information is critically mined to decipher how systems, customers, machines, weather, and so on behaved in the past and what behavioral patterns they indicate for the time to come. It fascinates me how an organization’s data can be systematically broken down into diverse data sets, pinpointing counter-intuitive trends and associations and validating crucial business suppositions.

Eric Ries, in his book “The Lean Startup”, writes about how businesses face “Pivot or Persevere” decisions at any juncture. He aptly describes having the right metrics and the right hypothesis as the first step, and analyzing the data correctly to reach a “Pivot or Persevere” conclusion as the next. I strongly believe that today we can leverage statistical modeling, often referred to as data mining or machine learning, alongside well-established historical data analysis techniques to support or reject our business hypotheses.

R and Python are among the most common languages for cleansing and processing data and for devising apposite data mining models. There are also several other tools, such as MicroStrategy, SAS, AWS analytics platforms, Google Cloud Platform, and Microsoft Azure, that provide exceptional capabilities for implementing data mining.

Despite the overwhelming number of choices, all these platforms aim to deliver the same underlying data mining functionality. Their structure and architecture may differ, but as we go through their extensive documentation we find that they all provide the same features: gathering the data, cleaning it, preparing or transforming it, building statistical models on top of it, and reporting the applicable statistical parameters for validating our hypotheses.

Indeed, we are well equipped to perform thorough analysis today, as there are ample platforms to facilitate every kind of analysis we can think of when making “Pivot or Persevere” decisions!

To dig further into this pool of platforms and data mining, it is crucial not just to have a propensity to learn and use the tools but also to have an analytical mind and an understanding of statistics and of the business workflow. This is the bedrock for determining how much one should trust a statistical analysis when making important decisions that sometimes put billions of dollars at stake. Two terms that I consider of prime importance from the perspectives of data mining and trustworthy business decision-making are:

  • Hypothesis
  • Model accuracy

Understanding these well, I believe, makes us aware of the amount of confidence or risk involved when making critical business decisions. Below is a short justification of why I feel they are imperative.

Hypothesis: This is the question we are trying to answer or the statement we are trying to validate. Without deciding on a hypothesis, we would be maneuvering without purpose; chances are we would come across a plethora of information but not know how to use it. A hypothesis could be as simple as ‘Type 2 Apparel has declining sales every year’. This can be validated by analyzing historical trends. Another example could be ‘Type 2 Apparel belongs to a group that contributes minimally to revenue each year’. This could be verified using the statistical method of clustering. An apt hypothesis is essential because it decides whether Type 2 Apparel should really be dropped from manufacturing going forward. It is also needed to give direction to the data mining and analysis process.
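
As a rough illustration of validating the first hypothesis through historical trend analysis, here is a minimal Python sketch. The yearly sales figures are made up for the example, and the helper names `trend_slope` and `declined_every_year` are my own; a real analysis would use actual sales data and would likely also test whether the trend is statistically significant.

```python
# Hypothetical yearly unit sales for "Type 2 Apparel" (made-up numbers,
# used only to illustrate the idea).
years = [2015, 2016, 2017, 2018, 2019]
sales = [1200, 1100, 1050, 900, 850]


def trend_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (units per year)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def declined_every_year(ys):
    """True if each year's value is strictly lower than the previous year's."""
    return all(later < earlier for earlier, later in zip(ys, ys[1:]))


slope = trend_slope(years, sales)
print(f"Average change in sales per year: {slope:.1f} units")
print(f"Sales declined every single year: {declined_every_year(sales)}")
```

A negative slope supports the hypothesis of an overall decline, while the year-by-year check tests the stricter wording ‘every year’. The two can disagree, which is one reason the exact phrasing of the hypothesis matters.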

Model accuracy: There are several subtle caveats to the statistical numbers reported in data mining. Most of these numbers are estimates that hold only with a certain probability. A reported accuracy is specific to the data used for building the model, so a model that gives 90% accuracy on the training data set may still perform poorly on real data. Cross-validation techniques should therefore be used before reporting the model’s accuracy to the business. Beyond that, clearly communicating that the accuracy holds only with a certain probability of success (and failure) gives business decision-makers the risk-assessment facts they need as they make key decisions.
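
To make the gap between training accuracy and cross-validated accuracy concrete, here is a small, self-contained Python sketch. Everything in it is an assumption made for illustration: the tiny data set is invented, the model is a deliberately simple 1-nearest-neighbour classifier, and the function names are my own. In practice one would more likely rely on a library such as scikit-learn.

```python
# Invented 1-D data set: (feature, label). Labels are mostly 0 for small
# values and 1 for large ones, with a few noisy points mixed in.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (5.0, 0), (6.0, 0),
        (7.0, 1), (8.0, 1), (9.0, 0), (10.0, 1), (11.0, 1), (12.0, 1)]


def predict(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(predict(train, x) == label for x, label in test)
    return correct / len(test)


def cross_val_accuracy(points, k):
    """Mean accuracy over k folds; fold i holds every k-th point from index i."""
    scores = []
    for fold in range(k):
        test = [p for i, p in enumerate(points) if i % k == fold]
        train = [p for i, p in enumerate(points) if i % k != fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k


# Evaluating on the same data the model was built from looks perfect,
# because every point is its own nearest neighbour.
train_acc = accuracy(data, data)

# Evaluating on held-out folds gives a more honest estimate.
cv_acc = cross_val_accuracy(data, k=4)

print(f"Accuracy on training data:  {train_acc:.2f}")
print(f"4-fold cross-validated:     {cv_acc:.2f}")
```

The model memorises its training data, so the first number says little about how it will behave on data it has not seen. The cross-validated figure, which comes out noticeably lower here, is the one I would report to the business.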

Thus, with appropriate knowledge of statistics, a “Pivot or Persevere” decision today can be supported by historical data reporting and data mining methodologies.