Pivot or Persevere – A Data Mining Approach

Author: Wavicle Data Solutions

Every industry today is immersing itself in data analytics, which promises to draw intelligent insights from the past and predict future growth. This is no longer just data management; it is a discipline in which information is mined to decipher how systems, customers, machines, weather, and so on behaved in the past and what behavioral patterns they indicate for the future. It fascinates me how an organization’s data can be systematically broken down into diverse data sets that pinpoint counter-intuitive trends and associations and validate crucial business suppositions.

Eric Ries, in his book “The Lean Startup”, writes about how businesses face “Pivot or Persevere” decisions at any juncture. He aptly describes having the right metrics and the right hypothesis as the first step, and analyzing the data correctly to reach a “Pivot or Persevere” conclusion as the next. I strongly believe that today we can use statistical modeling, in the form of data mining or machine learning, alongside well-established historical data analysis techniques to support or reject our business hypotheses.

R and Python are the most common languages for cleansing and processing data and for devising appropriate data mining models. Several other tools, such as MicroStrategy, SAS, AWS analytics platforms, Google Cloud Platform, and Microsoft Azure, also provide strong capabilities for implementing data mining.

Despite the overwhelming number of choices, all of these platforms aim to deliver the same underlying data mining functionality. Their structure and architecture may differ, but their extensive documentation shows that they provide the same features: gathering the data, cleaning it, preparing or transforming it, building statistical models on top of it, and reporting the statistical parameters needed to validate our hypotheses.

Indeed, we are well equipped to perform thorough analysis today: there are ample platforms to facilitate every kind of analysis we can think of and to support “Pivot or Persevere” decisions!

To dig further into this pool of platforms and data mining, it is crucial to have not just the propensity to learn and use a tool but also an analytical mind and an understanding of statistics and business workflow. This is the bedrock for determining how much to trust statistical analysis when making important decisions that sometimes put billions of dollars at stake. Two terms that I consider of prime importance for data mining and trustworthy business decision-making are:

Hypothesis
Model accuracy

Understanding these well, I believe, makes clear how much confidence or risk is involved in a critical business decision. Below is a brief explanation of why I consider them essential.

Hypothesis: This is the question we are trying to answer or the statement we are trying to validate. Without deciding on a hypothesis, we would be maneuvering without purpose; chances are we would come across a plethora of information without knowing how to use it. A hypothesis could be as simple as ‘Type 2 Apparel is having declining sales every year’, which can be validated by analyzing historical trends. Another example could be ‘Type 2 Apparel belongs to a group that is contributing minimally to revenue each year’, which could be verified using the statistical method of clustering. An apt hypothesis is essential because it decides whether Type 2 Apparel should be dropped from manufacturing going forward. It is also needed to give the data mining and analysis process a direction.
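As a minimal sketch of how the first hypothesis could be checked against historical data, the snippet below fits a least-squares trend line to yearly sales and also tests whether sales fell in every single year. The sales figures, the variable names, and the helper functions `trend_slope` and `declines_every_year` are hypothetical and only illustrate the idea; a real analysis would use the organization's own data and would also ask whether the trend is statistically significant.

```python
from statistics import mean

def trend_slope(years, sales):
    """Least-squares slope of sales against year (sales units per year)."""
    y_bar, s_bar = mean(years), mean(sales)
    num = sum((y - y_bar) * (s - s_bar) for y, s in zip(years, sales))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den

def declines_every_year(sales):
    """True only if each year's sales are lower than the previous year's."""
    return all(later < earlier for earlier, later in zip(sales, sales[1:]))

# Hypothetical yearly sales for Type 2 Apparel (illustrative numbers only).
years = [2015, 2016, 2017, 2018, 2019]
type2_sales = [120, 111, 98, 90, 79]

slope = trend_slope(years, type2_sales)
strict_decline = declines_every_year(type2_sales)

print(f"Average change per year: {slope:.1f}")
print(f"Declined in every year: {strict_decline}")
```

A negative slope supports a general downward trend, while the stricter year-by-year check matches the literal wording of the hypothesis. Keeping the two separate makes it explicit which statement the data actually supports.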

Model accuracy: There are several subtle caveats to the statistical numbers reported in data mining. Most numbers hold only with some probability. A reported accuracy is specific to the data used to build the model, so a model that achieves 90% accuracy on the training data set may perform poorly on real data. Cross-validation techniques should therefore be used before reporting the model’s accuracy to the business. Beyond that, clearly communicating that the model succeeds (and fails) with a certain probability gives business decision-makers the risk-assessment facts they need when they make key decisions.
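The gap between training accuracy and real-world accuracy can be illustrated with a small sketch. It assumes a toy data set with 20% label noise and uses a 1-nearest-neighbour classifier as a stand-in model, chosen only because it memorizes its training data; none of this comes from a specific platform, and the function names are hypothetical. The model is scored once on the data it was built from and once with 5-fold cross-validation.

```python
import random

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test rows whose label the 1-NN model predicts correctly."""
    hits = sum(predict_1nn(train, x) == label for x, label in test)
    return hits / len(test)

def k_fold_accuracy(data, k=5):
    """Average accuracy over k folds, each held out once for testing."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held_out = folds[i]
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(training, held_out))
    return sum(scores) / k

# Toy data (assumption): label is 1 when x > 0.5, with 20% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 1)
    label = int(x > 0.5)
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

train_acc = accuracy(data, data)      # scored on the data it memorized
cv_acc = k_fold_accuracy(data, k=5)   # scored on data it has not seen

print(f"Training accuracy:        {train_acc:.2f}")
print(f"Cross-validated accuracy: {cv_acc:.2f}")
```

The training score looks perfect only because each point is its own nearest neighbour. The cross-validated score is the more honest number to report to the business, ideally together with its spread across folds.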

Thus, with appropriate knowledge of statistics, a “Pivot or Persevere” hypothesis can today be supported by historical data reporting and data mining methodologies.