We stand at an edge today where every industry is immersing itself
in the age of data analytics, which promises to draw intelligent
insights from the past and predict future growth. Today it is not just
data management but a discipline in which information is critically
mined to decipher how systems, customers, machines, weather, and so on
behaved in the past and what behavioral pattern they indicate for the
time to come. It fascinates me how an organization’s data can be
systematically broken down into diverse data sets that pinpoint
counter-intuitive trends and associations and validate crucial
business suppositions.
Eric Ries, in his book “The Lean Startup”, writes about how businesses
make “Pivot or Persevere” decisions at any juncture. He very aptly
describes having the right metrics and the right hypothesis as the
first step, and analyzing the data correctly to reach a “Pivot or
Persevere” conclusion as the next step. I strongly believe that today
we can leverage statistical modeling, often referred to as data mining
or machine learning, alongside well-established historical data
analysis techniques to support or reject our business hypotheses.
R and Python are the most common languages that give us the ability
to cleanse and process data and to devise apposite models for data
mining. But there are several other tools, like MicroStrategy, SAS,
AWS Analytics platforms, Google Cloud Platform, and Microsoft Azure,
which provide exceptional capabilities for implementing data mining.
Despite the overwhelming number of choices, all of these platforms aim to deliver the same underlying data mining functionality. Their structure and architecture may differ, but their extensive documentation shows that they all provide the same features: gathering the data, cleaning it, preparing or transforming it, building statistical models on top of it, and reporting the applicable statistical parameters for validating our hypotheses.
Indeed, we are well equipped to perform thorough analysis today, as
there are ample platforms to facilitate every kind of analysis we can
think of when making “Pivot or Persevere” decisions!
To dig further into this pool of platforms and data mining, it is
crucial not just to have the propensity to learn and use a tool but
also to have an analytical mind and an understanding of statistics and
business workflow. This is the bedrock for determining how much one
should trust statistical analysis when making important decisions that
sometimes put billions of dollars at stake. Two terms that I consider
of prime importance, from the perspectives of both data mining and
trustworthy business decision-making, are:
Hypothesis
Model accuracy
Understanding these well, I believe, makes us aware of the amount of
confidence or risk involved when making critical business decisions.
Below is a brief justification of why I consider them imperative.
Hypothesis: It is the question that we are trying to answer, or a
statement that we are trying to validate. Without deciding on the
hypothesis, we would be maneuvering without purpose, and chances are
we would come across a plethora of information without knowing how to
use it. It could be as simple as ‘Type 2 Apparel is having declining
sales every year’. This can be validated by analyzing historical
trends. Another example could be ‘Type 2 Apparel belongs to a group
that is contributing minimally to revenue each year’. This could be
verified using the statistical method of clustering. It is essential
to have an apt hypothesis, as it will decide whether apparel of type 2
should be discontinued from manufacturing going forward. It is also
needed to channel the data mining and analysis process.
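As a minimal sketch of how the first example hypothesis might be checked against historical data, the snippet below fits a least-squares trend line to yearly sales and also tests whether sales fell in every single year. The sales figures and the names `years`, `sales`, and `least_squares_slope` are hypothetical and chosen only for illustration.

```python
# Hypothetical yearly unit sales for "Type 2 Apparel" (illustrative numbers only).
years = [2019, 2020, 2021, 2022, 2023]
sales = [120, 112, 101, 95, 88]


def least_squares_slope(xs, ys):
    """Return the slope of the ordinary least-squares line through (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# A negative slope means sales are falling on average over the period.
slope = least_squares_slope(years, sales)

# The hypothesis says "every year", so also compare each year with the previous one.
declining_every_year = all(
    later < earlier for earlier, later in zip(sales, sales[1:])
)

print(f"Average change per year: {slope:.1f} units")
print(f"Declined in every year: {declining_every_year}")
```

A negative slope alone supports only "declining on average"; the year-by-year check is what matches the stricter wording of the hypothesis.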
Model accuracy: There are several subtle caveats to the statistical
numbers reported in data mining. Most numbers come with a probability
of being correct. A reported accuracy is specific to the data used for
building the model, so a model that gave 90% accuracy on the training
data set may perform poorly on real data. Cross-validation techniques,
which evaluate the model on data held out from training, should be
used before reporting the model’s accuracy to the business. Beyond
that, clearly communicating that the accuracy holds only with a
certain probability of success (and failure) gives business
decision-makers the risk-assessment facts they need as they take key
decisions.
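The gap between training accuracy and cross-validated accuracy can be illustrated with a small sketch. It assumes a deliberately simple 1-nearest-neighbour classifier and synthetic, overlapping two-class data; the data, the seed, and the function names are hypothetical and not taken from any particular platform. Because a 1-nearest-neighbour model memorizes its training rows, it scores perfectly on them, while k-fold cross-validation scores each row using a model that never saw it.

```python
import random


def nearest_neighbor_predict(train, point):
    """Return the label of the training row closest to `point` (1-nearest-neighbour)."""
    closest = min(
        train,
        key=lambda row: (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2,
    )
    return closest[2]


def accuracy(train, test):
    """Fraction of rows in `test` whose label is predicted correctly from `train`."""
    hits = sum(nearest_neighbor_predict(train, row) == row[2] for row in test)
    return hits / len(test)


def cross_validated_accuracy(rows, k=5):
    """Average accuracy over k folds, each fold held out from training once."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(training, held_out))
    return sum(scores) / k


# Synthetic data: two overlapping classes, rows are (feature_1, feature_2, label).
rng = random.Random(0)
rows = []
for _ in range(30):
    rows.append((rng.gauss(0, 1), rng.gauss(0, 1), 0))
    rows.append((rng.gauss(1, 1), rng.gauss(1, 1), 1))
rng.shuffle(rows)

train_acc = accuracy(rows, rows)                # evaluated on the data it was built from
cv_acc = cross_validated_accuracy(rows, k=5)    # evaluated on held-out data

print(f"Training accuracy:        {train_acc:.2f}")
print(f"Cross-validated accuracy: {cv_acc:.2f}")
```

The cross-validated figure is the more honest number to report to the business, ideally together with how much it varies from fold to fold.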
Thus, with appropriate knowledge of statistics, a “Pivot or
Persevere” decision today can be supported by historical data
reporting and data mining methodologies.