Predictive Insights - why and how?

Originally published on LinkedIn · October 04, 2017

Predictive analytics: Not just a buzzword.

People describe predictive analytics in a wide variety of ways. Some describe it in terms of hindsight, insight and foresight frameworks. A few talk about the maturity model – beginning with static reports and advancing to trend analytics before finally graduating to predictive models. Still others describe predictive analytics in terms of Big Data and Machine Learning.

Regardless of how you define predictive analytics, it is important to understand the impact it can have on your business, as well as why it is more impactful than traditional reporting and business intelligence. In this discussion, we’ll work through the lens of an example, sales at a fictitious E-Commerce company, to illustrate what makes predictive analytics special and how it can deliver better business outcomes.

But before we go much further, let’s define a couple of key terms relevant to our discussion—outcomes and variables.

An outcome is the result of efforts a business makes and the actions it takes. Those efforts could take shape in the form of investments, channels through which it markets, the number of partner affiliates a business has, or several dozen other critical business actions. Sales, site traffic and inventory levels are all outcomes of efforts and actions like those above. Consider the following:

Typically, marketers run reports such as:

  • Traffic by channel
      o  Paid, Organic, Affiliate, Social
  • Customer Acquisition Cost by channel
      o  PPC, Affiliate, Print Ads, Social Ads, etc.
  • Sales by Promotion Type
      o  Free Shipping, BOGO, $ off, % off, etc.

Product Managers run reports such as:

  • Sales by channel
      o  Own Website, Retailer Store, Partner, etc.
  • On-Hand inventory by Channel
      o  Own Website, Retailer Store, Partner, etc.

The key word here is “by.” Whatever follows “by” is a variable that can impact discrete and measurable outcomes like sales, customer acquisition costs, site traffic, etc.

Those familiar with data warehouse and business intelligence methods might call those items measures and dimensions. We’ll stick to calling them variables and outcomes, because it makes more sense in the context of predictive analytics.
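To make the “outcome by variable” idea concrete, here is a minimal sketch of how such a report is computed. The order records, field names and figures are made up for illustration; a real report would read from your own sales data.

```python
from collections import defaultdict

# Hypothetical order records; field names and figures are illustrative only.
orders = [
    {"channel": "Own Website", "promotion": "Free Shipping", "sales": 120.0},
    {"channel": "Own Website", "promotion": "BOGO", "sales": 80.0},
    {"channel": "Partner", "promotion": "Free Shipping", "sales": 60.0},
    {"channel": "Retailer Store", "promotion": "% off", "sales": 40.0},
]


def outcome_by(records, outcome, variable):
    """Total one outcome (measure) for each value of one variable (dimension)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[variable]] += record[outcome]
    return dict(totals)


# Each report looks at the same outcome through exactly one variable.
sales_by_channel = outcome_by(orders, "sales", "channel")
sales_by_promotion = outcome_by(orders, "sales", "promotion")
```

Notice that each call takes exactly one variable. The channel report knows nothing about promotions, and the promotion report knows nothing about channels.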

Did you notice what reports like these have in common? Each one shows an outcome with respect to a single variable:

  • Customer acquisition cost and channel
  • Sales and time
  • Sales and channel
  • Traffic and channel
  • Sales and customer type

Of course, in the real world, it is not a single variable that influences the outcome. In nearly every situation, an outcome is influenced by a combination of variables, not just channel, customer type, promotion type or seasonality on its own. But many analytical methods share an inherent flaw: either they don’t account for all variables together, or they draw incorrect conclusions based on a limited number of variables.

For example, sales for an E-Commerce company can be expressed in an equation containing several variables, as shown below:

Sales for this month = Amount of Investment in marketing channels + Number of new products introduced + Number of affiliates where the product is advertised + Number of products with review of 4 star rating or above + Average number of actions that website visitor takes + some unknown factor

This isn’t the same equation you would use for your business, or that your competitor would use for theirs. Every equation is unique because even if the variables were all the same, their impact would be weighted differently for each individual business. Of course, this leads us to two pretty critical questions:

1. Do you know the equation for your business?

2. If you do know the equation, do you have any idea how much each variable influences sales?

Big Data and Machine Learning combine forces to provide you with the predictive insights you need to answer these two critical questions.
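As a rough illustration of what “knowing the equation” means, the sketch below writes the sales equation as a weighted sum. Every weight, the baseline and this month’s values are invented for the example; the variable names are shorthand for the terms in the equation above, and the baseline stands in for the “unknown factor.”

```python
# Hypothetical weights: how much one unit of each variable adds to monthly sales.
WEIGHTS = {
    "marketing_investment": 2.5,
    "new_products": 1200.0,
    "affiliates": 300.0,
    "products_rated_4_plus": 150.0,
    "avg_visitor_actions": 800.0,
}
BASELINE = 5000.0  # assumed stand-in for the unknown factor


def predicted_sales(variables, weights=WEIGHTS, baseline=BASELINE):
    """Question 1: the equation itself, a baseline plus weighted variables."""
    return baseline + sum(weights[name] * value for name, value in variables.items())


def contributions(variables, weights=WEIGHTS):
    """Question 2: how much each variable contributes to the predicted total."""
    return {name: weights[name] * value for name, value in variables.items()}


# Made-up values for one month.
this_month = {
    "marketing_investment": 10000.0,
    "new_products": 3,
    "affiliates": 12,
    "products_rated_4_plus": 40,
    "avg_visitor_actions": 4.5,
}

total = predicted_sales(this_month)
biggest_driver = max(contributions(this_month).items(), key=lambda item: item[1])[0]
```

In practice nobody hands you the weights. Estimating them from historical data is precisely the job machine learning takes on.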

Why do we call it Big Data?

Let’s continue with the example of sales for our fictitious E-Commerce company. The data required for building the equation to predict sales comes from different sources, including:

  • The E-Commerce website
  • Marketing databases
  • CRM databases
  • Social media
  • Other internal and external sources that contain sales-related data

And in order to understand the sales equation, you’ll need to collect:

  • Structured and numeric data such as sales, orders, inventory, customer names, product names
  • Unstructured data such as website visitor logs, text comments, surveys, chats, etc.
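Unstructured data usually has to be turned into numbers before it can enter the equation. As a small sketch, the snippet below derives one of the variables from our sales equation, the average number of actions a website visitor takes, from raw log lines. The log format shown here is an assumption made for the example; real visitor logs will look different.

```python
from collections import Counter

# Hypothetical visitor log lines; the "key=value" layout is assumed for this sketch.
log_lines = [
    "2017-10-01T10:00:00 visitor=a1 action=view_product",
    "2017-10-01T10:00:20 visitor=a1 action=add_to_cart",
    "2017-10-01T10:01:05 visitor=b2 action=view_product",
    "corrupted entry",
    "2017-10-01T10:02:30 visitor=a1 action=checkout",
]


def average_actions_per_visitor(lines):
    """Count actions per visitor and return the average, skipping unreadable lines."""
    counts = Counter()
    for line in lines:
        fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
        if "visitor" in fields:
            counts[fields["visitor"]] += 1
    return sum(counts.values()) / len(counts) if counts else 0.0


avg_actions = average_actions_per_visitor(log_lines)
```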

You should note that data is going to come in fast, and at a very high volume. That’s why…

Big Data is defined by the Three Vs: Volume, Velocity and Variety.

Since the data needed for predictive analytics is large in volume, varied in type and arrives extremely quickly, we call it Big Data.

There are some who add two additional Vs: Value and Veracity. That is, you need to be concerned with the quality (veracity) of the data, and you should only collect data if it could be of concrete value to your business. In other words, don’t start hoarding massive amounts of low-quality data, since it adds little value and can actually be detrimental to the pursuit of your long-term predictive analytics goals.

What does Machine Learning do?

As we discussed earlier, most reporting or business intelligence applications—Excel, Cognos, Tableau, you name it—are not equipped to account for all of the variables that matter. It’s just an inherent limitation of those reporting and analytics tools.

So how do you look at more than a handful of variables together? Or, more importantly, how can you understand the interplay between those variables and how, individually or collectively, they impact your sales? Enter machine learning.

Once we feed all of our data into a machine learning algorithm, it will start working to:

  • Identify all the variables that matter for predicting sales. In other words, it will construct the equation for you.
  • Tell you which variables have the highest impact on sales.
  • Predict the sales that are going to happen.

This is an oversimplification of the process (just ask your data scientist). That’s why there’s an entire discipline of data science dedicated to understanding the outcomes we want to predict, interpreting the available data, selecting the right algorithms to use, configuring them and, finally, interpreting the results.
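To give a feel for the three steps listed above, here is a minimal sketch using ordinary least-squares regression, one of the simplest learning algorithms. It is an illustration under stated assumptions, not a description of any particular product: the monthly history is invented (generated from sales = 20 + 3 × spend + 5 × new products), only two variables are used, and a real project would rely on an established library and far more data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_linear_model(rows, targets):
    """Least-squares fit; returns [intercept, weight_1, ..., weight_n]."""
    design = [[1.0] + list(row) for row in rows]
    size = len(design[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(size)] for i in range(size)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(size)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Apply the fitted equation to one new set of variable values."""
    return coefficients[0] + sum(w * v for w, v in zip(coefficients[1:], row))


# Hypothetical history: (marketing spend, new products) per month and the sales that followed.
variable_names = ["marketing_spend", "new_products"]
history = [(10, 1), (12, 0), (15, 2), (9, 3), (20, 1), (14, 4)]
monthly_sales = [55.0, 56.0, 75.0, 62.0, 85.0, 82.0]

coefficients = fit_linear_model(history, monthly_sales)               # 1. construct the equation
weights_by_name = dict(zip(variable_names, coefficients[1:]))          # 2. weight of each variable
next_month_forecast = predict(coefficients, (16, 2))                   # 3. predict sales
```

One caution when reading the weights: comparing them directly only tells you which variable matters most when the variables are on comparable scales.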

How accurate are the predictions?

Think back on our equation for sales:

Sales for this month = Amount of Investment in marketing channels + Number of new products introduced + Number of affiliates where the product is advertised + Number of products with review of 4 star rating or above + Average number of actions that website visitor takes + some unknown factor

That “some unknown factor” is made up of a bunch of different things. We won’t get too far into it, but Donald Rumsfeld sums it up quite nicely with his explanation of Known-Knowns:

Known-Knowns are simple. They’re items you have already factored into the equation. But there are three types of unknowns that can’t easily be factored into the equation, because the data is difficult to obtain or impossible to know in advance:

  • Known Unknowns – These include things such as customer feedback that is embedded in surveys and emails and cannot easily be made available for machine learning algorithms to factor into predictions. In some cases, this might include external data you know will impact the outcome, but which you have no access to.
  • Unknown Knowns – These include things that are very likely (but not quite certain) to happen, like hurricanes in the Atlantic. You can factor some of these into your predictions, but results will ultimately depend on the accuracy of your guesses.
  • Unknown Unknowns – These are the things nobody could reasonably expect or anticipate, like tsunamis, earthquakes, major crimes, etc. Again, you can model some of these, but you’ll get lost in the predictive wilderness pretty quickly.
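One practical way to see how large the unknown factor is: compare past predictions with what actually happened. The sketch below computes the unexplained part (the residual) for each month and a common summary measure, the mean absolute percentage error. The sales figures are invented for the example, and other accuracy measures may suit your business better.

```python
def residuals(actual, predicted):
    """The part of each outcome the equation did not explain."""
    return [a - p for a, p in zip(actual, predicted)]


def mean_absolute_percentage_error(actual, predicted):
    """Average size of the miss, as a percentage of actual sales (actuals must be non-zero)."""
    misses = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(misses) / len(misses)


# Hypothetical figures for three past months.
actual_sales = [100.0, 200.0, 400.0]
forecast_sales = [110.0, 190.0, 400.0]

unexplained = residuals(actual_sales, forecast_sales)
error_pct = mean_absolute_percentage_error(actual_sales, forecast_sales)
```

If the residuals stay small and show no pattern, the unknowns are not hurting you much. If they are large or lean consistently in one direction, an important variable is probably missing from the equation.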

How can you improve prediction accuracy?

If you’re able to understand all of the factors that impact sales at your company, and most of them are within your control, it’s possible to achieve highly accurate predictions. The first step is to focus on the “Known-Knowns”, the data within your reach, and make it available to machine learning algorithms. This could be data available internally or data you need to obtain from external sources. As a rule, the more relevant, high-quality data you feed to the algorithms, the more accurate the predictions tend to be. Macro-economic projections? Useful. Syndicated data from Nielsen? Equally so. You also probably have some valuable internal data that’s tough to reach, but you can (and should) work with your IT and data teams to make it available as quickly as possible. Again, the more data (provided it’s high quality), the merrier.

Where do we go from here?

Predictive analytics isn’t just an extension of reporting and trend analytics. For organizations, it’s a game-changer—especially if you understand the underlying principles and invest in building the required capabilities. Questions? Feel free to reach out to Veda or Suresh anytime!

Thank you for your time and attention!

About the Author: Suresh Chaganti, Co-Founder & Strategy Advisor at VectorScient. Suresh specializes in Big Data and applying it to solve real-world business problems. He brings two decades of experience in architecting B2B and B2C applications across a variety of industry verticals. Connect with Suresh Chaganti on LinkedIn.