5 Essential Steps to Get Your Predictive Analytics Initiative Right


Originally published on LinkedIn · August 23, 2017


Businesses of all sizes are trying to stay ahead of the game by embracing predictive analytics. If you’re the individual charged with leading a predictive analytics initiative, how do you get it right?

Here are five general best practices that will help you keep your project on the right path.

1. Work towards business outcomes

Many Big Data and predictive analytics projects start as technology projects and stall after acquiring infrastructure. No doubt about it, it can be exciting to buy new hardware, install Hadoop and implement all the other key infrastructure components you need to unleash the power of Big Data. But without securing alignment and backing from the business, those projects are almost certain to fail. As a result, it is absolutely imperative that Big Data and predictive analytics projects be driven by business outcomes.

The first step in this process is securing business buy-in and identifying the outcomes you want predictive analytics to impact. In marketing, that could be identifying the customers likely to purchase and those likely to churn. In sales, it might be discovering the optimal price you can set for your products. In product management, it could be predicting which products are likely to enter their decline phase. Whatever the case, ensure that you are driving towards concrete business outcomes, not just tackling a new technology project.

2. Start small

It’s true that the accuracy of predictive analytics outcomes depends on the richness of the underlying data. But the law of diminishing marginal returns is at play here. Start by identifying a small set of potential predictors that matter to the outcome, then establish low-tech mechanisms to capture and collect the data. That could be as simple as extracting data to CSV files and making them available to machine learning algorithms. You don’t need to make this a data warehouse project to find success.
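To make "low-tech" concrete, here is a minimal sketch of that CSV approach using only the Python standard library. The column names and values are hypothetical; the point is that a flat file plus a few lines of code already yields a feature matrix a machine learning library can consume.

```python
import csv
import io

# Hypothetical export: a handful of predictors plus the outcome column,
# pulled from an operational system into CSV -- no data warehouse required.
raw_export = io.StringIO(
    "customer_id,tenure_months,orders_last_90d,churned\n"
    "1001,24,3,0\n"
    "1002,2,0,1\n"
    "1003,36,7,0\n"
)

# Keep only the small set of predictors that matter to the outcome.
predictors = ["tenure_months", "orders_last_90d"]
rows = list(csv.DictReader(raw_export))
X = [[float(r[c]) for c in predictors] for r in rows]
y = [int(r["churned"]) for r in rows]

print(X)  # feature matrix, ready to hand to a machine learning algorithm
print(y)  # matching labels
```

In practice `raw_export` would be a real file exported from your CRM or ERP system, but the shape of the code stays the same.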

When first launching your program, it’s crucial to ensure all projects are manageable, system performance is acceptable and the results are measurable. All the other pieces, like automation, scheduling, peak performance, etc. are just nice-to-haves at this point.

3. Leverage the cloud

The Big Three in cloud – Amazon AWS, Google Cloud and Microsoft Azure – all have solid infrastructure to enable Big Data, machine learning and predictive analytics, and each supports a large variety of tools and programming languages in addition to their own machine learning libraries. That means that unless you have a large amount of hardware and plenty of time on hand, there’s really no reason to stand up your own infrastructure on the premises.

Cut that build time out and leverage the cloud to get up and running in no time. You should also be aware that in many cases, it’s not even possible to replicate what Google or AWS can provide. Tools like Google’s BigQuery and AWS’s Athena and Redshift crunch millions of records in a matter of seconds. You would have to make a massive hardware investment to match that performance in-house. And why would you, when BigQuery costs just $5 per terabyte of data processed?
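As a back-of-the-envelope check on that pricing, a query's cost is just the data it scans times the per-terabyte rate. The workload size below is a made-up example:

```python
# On-demand pricing cited above: $5 per TB scanned (rates change; check
# your provider's current price list). The query size is hypothetical.
price_per_tb_usd = 5.00
bytes_scanned = 750 * 10**9          # a query that scans 750 GB
tb_scanned = bytes_scanned / 10**12  # pricing uses decimal terabytes
cost_usd = price_per_tb_usd * tb_scanned

print(f"${cost_usd:.2f}")  # $3.75 for the whole scan
```

Compare that with the capital cost of hardware capable of the same scan in seconds, and the cloud math is hard to argue with.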

4. Ensure Data Quality

One thing you can’t overlook or shortchange throughout this process is the quality of your data, as it directly impacts the accuracy of your predictions. Make sure you’re addressing missing values, standardizing the data and ensuring the data definitions are well understood. In fact, it is quite possible that the largest amount of time and effort you will spend will be dedicated to ensuring your data quality is acceptable.

Some of this can be handled at a purely technical level, for instance removing duplicates and standardizing zip codes, regions and state names. But when you are embarking on a predictive analytics project, you should also have a plan for the treatment of missing data. Ideally, that treatment will be worked out in collaboration between business partners and data scientists and sit at the heart of every predictive analytics project.
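A minimal sketch of that technical-level cleanup, using illustrative column names and a toy state-name mapping (a real project would use a complete reference table):

```python
# Hypothetical raw records with the usual quality problems: a zip code
# that lost its leading zero, inconsistent state names, a missing value.
records = [
    {"zip": "02138", "state": "MA", "revenue": "150"},
    {"zip": "2138", "state": "Massachusetts", "revenue": "150"},
    {"zip": "94103", "state": "CA", "revenue": None},
]

STATE_NAMES = {"massachusetts": "MA", "california": "CA"}  # toy mapping

def clean(rec):
    rec = dict(rec)
    rec["zip"] = rec["zip"].zfill(5)  # restore dropped leading zeros
    state = rec["state"].strip().lower()
    rec["state"] = STATE_NAMES.get(state, rec["state"].upper())
    return rec

cleaned = [clean(r) for r in records]

# Deduplicate on the standardized fields -- the first two records
# turn out to be the same customer once the data is cleaned.
seen, deduped = set(), []
for r in cleaned:
    key = (r["zip"], r["state"], r["revenue"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# Missing values are flagged for review, not silently dropped: deciding
# how to treat them is the business/data-science conversation above.
missing = [r for r in deduped if r["revenue"] is None]
print(len(deduped), "records after dedup;", len(missing), "flagged missing")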

5. Establish measurement methodology

How do you gauge whether your predictive analytics project is successful? There are two different, but related tracks that you need to follow:

(a) Avoid overfitting and underfitting: It’s easy to design a predictive model that perfectly models past behavior, yet fails when used to predict the likelihood of future events. On the other hand, if the model is too generic and doesn’t adequately explain past behavior, it may lead to underfitting. This is a common trap. Getting it right requires quite a bit of rigor, a solid understanding of the predictors that matter to your business and, in most cases, a significant number of iterations.  When you commission a predictive analytics project, ask your data science team how they plan to avoid overfitting or underfitting.
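One standard piece of rigor your data science team should mention is evaluating on held-out data. The toy example below is contrived for illustration: a "model" that simply memorizes the training rows scores perfectly in-sample but poorly on records it has never seen, which is overfitting in its purest form.

```python
# Contrived dataset: outcome is 1 when the predictor exceeds 50.
data = [(x, 1 if x > 50 else 0) for x in range(100)]

# Hold out every fifth record for evaluation; train on the rest.
test = data[::5]
train = [d for i, d in enumerate(data) if i % 5 != 0]

# A memorizing "model": a lookup table of the training examples.
# It cannot generalize, so unseen inputs fall back to a constant guess.
lookup = {x: y for x, y in train}
train_acc = sum(lookup[x] == y for x, y in train) / len(train)
test_acc = sum(lookup.get(x, 0) == y for x, y in test) / len(test)

print(train_acc, test_acc)  # perfect in-sample, much worse held out
```

A large gap between training and holdout performance is the telltale sign of overfitting; a model that scores poorly on both is underfitting.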

(b) Baseline comparison: Make sure you record the historical measures before embarking on a predictive analytics project. Depending on the business outcome you’re targeting, these could include forecast accuracy, conversion rates, churn estimates, etc. When you start making decisions based on predictive analytics, record the results, along with the instances in which you did not act on those predictions. Having a solid methodology to measure the impact of your program goes a long way towards refining predictive models, allowing you to trust the recommendations they provide.
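A sketch of that baseline bookkeeping, with entirely hypothetical numbers, might look like this:

```python
# Every figure here is made up for illustration.
# Metrics recorded BEFORE the project started:
baseline = {"conversion_rate": 0.021, "churn_rate": 0.085}

# Results after launch: cases where the model's recommendations were
# acted on, and cases where predictions were recorded but not acted on.
acted_on = {"conversion_rate": 0.029, "churn_rate": 0.071}
not_acted_on = {"conversion_rate": 0.022, "churn_rate": 0.084}

for metric in baseline:
    lift = (acted_on[metric] - baseline[metric]) / baseline[metric]
    print(f"{metric}: {lift:+.1%} vs. pre-project baseline")
```

The "not acted on" group matters: it approximates what would have happened anyway, so improvements can be credited to the model rather than to seasonality or other changes in the business.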

More questions? Still trying to figure out where to start? Drop me a line at info@vectorscient.com or comment below – I’d be happy to give you some more advice!

Thank you for reading the post.

About the Author: Suresh Chaganti, Co-Founder & Strategy Advisor at VectorScient. Suresh specializes in Big Data and applying it to solve real-world business problems. He brings two decades of experience in architecting B2B and B2C applications across a variety of industry verticals. Connect with Suresh on LinkedIn.