Predictive Analytics 101
Knowing what is happening in your business right now is the first step to making smart business decisions. This is the core of KPI scorecards or business intelligence (BI).
Analytics takes this a step further: can you understand what is taking place (BI) and also anticipate what is about to take place (predictive analytics)?
By automatically delivering relevant insights to end users, managers and even applications, predictive decision solutions aim to reduce the need for business users to understand the ‘how’ so they can focus on the ‘why.’
The goal of predictive analytics is better outcomes, smarter decisions, actionable insights and relevant information. How you get there varies.
There are four types of data analysis:
- Simple summation and statistics
- Predictive (forecasting)
- Descriptive (business intelligence and data mining)
- Prescriptive (optimization and simulation)
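To make the four types concrete, here is a minimal Python sketch on a hypothetical weekly sales series. The numbers, the case size and the cost figures are illustrative assumptions, not real business data, and the least-squares trend line stands in for a real forecasting model. It takes one small step of each type, ordered from simplest to most advanced.

```python
# Hypothetical weekly unit sales for one product (illustrative data only).
sales = [120, 132, 128, 141, 150, 158, 163, 171]

# Simple summation and statistics: totals and averages.
total = sum(sales)
mean = total / len(sales)

# Descriptive: what happened, e.g. the change versus the previous week.
week_over_week = sales[-1] - sales[-2]

# Predictive: fit a least-squares trend line y = a + b*t and extrapolate one week.
n = len(sales)
t_mean = (n - 1) / 2
b = sum((t - t_mean) * (y - mean) for t, y in enumerate(sales)) / sum(
    (t - t_mean) ** 2 for t in range(n)
)
a = mean - b * t_mean
forecast = a + b * n

# Prescriptive: choose how many cases of 12 units to order, assuming demand equals
# the forecast, each unsold unit costs $0.40 and each unit of unmet demand costs $1.50.
def cost(cases):
    stock = 12 * cases
    return 0.40 * max(stock - forecast, 0) + 1.50 * max(forecast - stock, 0)

best_cases = min(range(31), key=cost)
print(f"total={total}, forecast={forecast:.1f}, order {best_cases} cases")
```

The prescriptive step is where the forecast turns into an action: it weighs the assumed costs of over- and under-stocking rather than simply ordering the forecast.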
Predictive analytics leverages three core techniques to turn data into valuable, actionable information:
- Predictive modeling
- Decision Analysis and Optimization
- Transaction Profiling
Predictive modeling identifies and mathematically represents underlying relationships in historical data in order to explain the data and make predictions, forecasts or classifications about future events.
Predictive models typically analyze current and historical data on individuals to produce easily understood metrics such as scores. These scores rank-order individuals by likely future performance, e.g., their likelihood of making credit payments on time, or of responding to a particular offer for services.
Predictive models can also detect the likelihood of a transaction being fraudulent (Risk Detection). Predictive models are frequently operationalized in mission-critical transactional systems and drive decisions and actions in near real time.
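As an illustration of how a predictive model turns data about individuals into a score that rank-orders them, here is a minimal sketch of a logistic scorecard. The feature names and coefficients are hypothetical (a real model would be fitted to historical outcomes), and the scaling of 600 points at 50:1 odds with 20 points to double the odds is one common scorecard convention, not necessarily the one any particular vendor uses.

```python
import math

# Hypothetical coefficients (illustrative, not fitted to real data):
# log-odds of paying on time = INTERCEPT + sum(weight * feature).
INTERCEPT = 2.0
WEIGHTS = {
    "utilization": -3.0,        # share of the credit limit in use (0 to 1)
    "late_payments_12m": -0.8,  # late payments in the last 12 months
    "years_on_file": 0.15,      # length of credit history in years
}

def prob_on_time(applicant):
    """Logistic model: probability that the applicant pays on time."""
    z = INTERCEPT + sum(w * applicant[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

# Scale to points: 600 at 50:1 good:bad odds, +20 points each time the odds double.
PDO, BASE_SCORE, BASE_ODDS = 20, 600, 50
FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

def score(applicant):
    p = prob_on_time(applicant)
    return round(OFFSET + FACTOR * math.log(p / (1 - p)))

applicants = {
    "A": {"utilization": 0.2, "late_payments_12m": 0, "years_on_file": 10},
    "B": {"utilization": 0.9, "late_payments_12m": 3, "years_on_file": 2},
    "C": {"utilization": 0.5, "late_payments_12m": 1, "years_on_file": 6},
}
# Higher score = more likely to pay on time; sorting rank-orders the applicants.
ranked = sorted(applicants, key=lambda name: score(applicants[name]), reverse=True)
print(ranked, [score(applicants[name]) for name in ranked])
```

The same structure applies to risk detection: change the target to "transaction is fraudulent" and the features to transaction attributes.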
A number of analytic methodologies underlie solutions in this area, including:
- Applications of both linear and nonlinear mathematical programming algorithms, in which one objective is optimized within a set of constraints
- Advanced “neural” systems, which learn complex patterns from large data sets to predict the probability that a new individual will exhibit certain behaviors of business interest
- Statistical techniques for analysis and pattern detection within large data sets
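Mathematical programming, the first methodology listed, can be illustrated with a tiny linear program. This sketch assumes a hypothetical problem of splitting a campaign budget between two channels and solves it by vertex enumeration, which works because a feasible, bounded two-variable LP attains its optimum at a corner of the feasible region. Real systems would use a dedicated solver (for example, SciPy's `scipy.optimize.linprog`).

```python
from itertools import combinations

# Hypothetical example: split a budget of 100 (thousand dollars) between channel x
# and channel y. Assumed expected responses per unit of spend: 3 for x, 2 for y.
OBJECTIVE = (3.0, 2.0)  # maximize 3x + 2y

# Constraints written as a*x + b*y <= c.
CONSTRAINTS = [
    (1, 1, 100),   # total budget
    (1, 0, 60),    # capacity cap on channel x
    (0, -1, -10),  # at least 10 must go to channel y
    (-1, 0, 0),    # x >= 0
    (0, -1, 0),    # y >= 0
]

def feasible(x, y, eps=1e-9):
    return all(a * x + b * y <= c + eps for a, b, c in CONSTRAINTS)

def solve_2d_lp():
    """Vertex enumeration: check every intersection of two constraint boundaries."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundary lines do not meet at a single point
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if feasible(x, y):
            value = OBJECTIVE[0] * x + OBJECTIVE[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

value, x, y = solve_2d_lp()
print(f"x={x:.0f}, y={y:.0f}, expected responses={value:.0f}")
```

Here the budget and the channel-x cap both bind, so the optimum puts 60 into channel x and 40 into channel y, for 260 expected responses.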
Predictive models summarize large quantities of data to amplify their value. The value chain for predictive modeling in an M2M (machine-to-machine) scenario is shown below (source: Greenplum Blog). It’s all about having the right people and the right models.
Decision Analysis and Optimization
Decision analysis refers to the broad quantitative field that deals with modeling, analyzing and optimizing decisions made by individuals, groups and organizations. Applications include optimizing supply chain management, tracking key performance indicators, uncovering hidden sales opportunities and identifying runaway operating costs.
Whereas predictive models analyze multiple aspects of individual behavior to forecast future behavior, decision analysis analyzes multiple aspects of a given decision to identify the most effective action for reaching a desired result. Many consulting firms use decision analysis to provide custom, data-driven solutions for a variety of business applications. Beyond statistical modeling and data analysis, the focus is also on understanding business challenges and delivering action-oriented solutions.
Integrated approaches to decision analysis incorporate the development of a decision model that mathematically maps the entire decision structure; proprietary optimization technology that identifies the most effective strategies, given both the performance objective and constraints; the development of designed testing required for active, continuous learning; and the robust extrapolation of an optimized strategy to a wider set of scenarios than historically encountered.
Optimization capabilities also include a proprietary mathematical modeling and programming language, an easy-to-use development and visualization environment, and a state-of-the-art set of optimization algorithms.
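A decision model maps each candidate action to its expected outcome, and optimization then picks the best action that satisfies the constraints. The sketch below assumes a hypothetical credit-line decision: the candidate limits, the default-probability curve, the 50% utilization, the 15% margin and the $60 expected-loss cap are all illustrative, and exhaustive search over four options stands in for the optimization technology described above.

```python
# Candidate actions: credit limits to offer a customer segment (hypothetical).
LIMITS = [1000, 2000, 3000, 5000]
MARGIN = 0.15           # revenue per dollar of balance (assumed)
UTILIZATION = 0.5       # expected balance as a share of the limit (assumed)
MAX_EXPECTED_LOSS = 60  # risk constraint per account, in dollars (assumed)

def default_prob(limit):
    # Assumed relationship: default risk rises with exposure.
    return 0.02 + 0.01 * limit / 1000

def outcome(limit):
    """Decision model: map an action to (expected profit, expected loss)."""
    balance = UTILIZATION * limit
    p_default = default_prob(limit)
    expected_loss = p_default * balance  # assume the full balance is lost on default
    expected_profit = (1 - p_default) * MARGIN * balance - expected_loss
    return expected_profit, expected_loss

def best_action(limits, max_loss=None):
    """Pick the action with the highest expected profit among those meeting the constraint."""
    allowed = [lim for lim in limits if max_loss is None or outcome(lim)[1] <= max_loss]
    return max(allowed, key=lambda lim: outcome(lim)[0])

unconstrained = best_action(LIMITS)
constrained = best_action(LIMITS, MAX_EXPECTED_LOSS)
print(unconstrained, constrained)  # 5000 2000
```

Without the risk constraint the largest limit wins on expected profit; once the expected-loss cap applies, the best feasible action is the 2,000 limit.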
Transaction profiling is a technique used to extract meaningful information and reduce the complexity of transaction data used in modeling. Many solutions operate using transactional data, such as credit card purchase transactions, or other types of data that change over time.
In its raw form, this data is very difficult to use in predictive models for several reasons. First, an isolated transaction contains very little information about the behavior of the individual who generated the transaction. In addition, transaction patterns change rapidly over time. Finally, this type of data can often be highly complex.
To overcome these issues, a set of proprietary techniques is used to transform raw transactional data into a mathematical representation that reveals latent information and makes the data more usable by predictive models. This profiling technology accumulates data across multiple transactions of many types to create and update profiles of transaction patterns. These profiles enable neural network models to make efficient, accurate assessments of, for example, fraud risk and credit risk within real-time transaction streams.
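One generic way to summarize a transaction stream, sketched here under assumptions, is a per-account profile of exponentially decayed running totals that is updated incrementally as each transaction arrives, so no raw history needs to be stored. The half-life, the profile fields and the derived features are hypothetical choices for illustration; they are not the proprietary techniques referred to above.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 7.0  # hypothetical: past activity loses half its weight each week

@dataclass
class Profile:
    """Compact per-account summary of past transactions (no raw history kept)."""
    last_time: float = 0.0        # time of the previous update, in days
    weighted_count: float = 0.0   # decayed number of transactions
    weighted_amount: float = 0.0  # decayed total spend

    def _decay(self, now):
        # Shrink old totals by the time elapsed (assumes transactions arrive in time order).
        w = 0.5 ** ((now - self.last_time) / HALF_LIFE_DAYS)
        self.weighted_count *= w
        self.weighted_amount *= w
        self.last_time = now

    def update(self, now, amount):
        """Fold one transaction into the profile; return features for a model."""
        self._decay(now)
        typical = self.weighted_amount / self.weighted_count if self.weighted_count else None
        features = {
            # How unusual is this amount for this account? (1.0 when there is no history)
            "amount_vs_typical": amount / typical if typical else 1.0,
            "recent_activity": self.weighted_count,
        }
        self.weighted_count += 1.0
        self.weighted_amount += amount
        return features

profiles = {}
stream = [("card-1", 0.0, 40.0), ("card-1", 1.0, 35.0), ("card-1", 2.0, 45.0), ("card-1", 2.1, 900.0)]
for account, day, amount in stream:
    features = profiles.setdefault(account, Profile()).update(day, amount)
print(round(features["amount_vs_typical"], 1))
```

A single $900 transaction says little on its own, but against this card's profile it stands out at roughly 20 times its typical amount, which is exactly the kind of signal a fraud model can use.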
Increasingly, teams are pushing the envelope of how to use information retrieval, machine learning, computational linguistics, matrix and graph algorithms, unsupervised clustering & data mining to solve predictive problems.
Decision Analytics: Automated Insights is the Objective
Do you have the right toolset, dataset, skillset and mindset for decision analytics? Companies accumulate huge amounts of data every day. Analytics helps unlock the information in that data and use it to the company’s advantage. The key is a powerful combination of statistical data mining and a consultative approach, which enables companies to make more effective decisions while addressing their business challenges.
However, delivering real-time actionable intelligence is not easy. Closed-loop performance systems that deliver continuous innovation and insight are the end goal. Applications include marketing campaigns, customer behavior analysis, risk management, operations, and financial and investment management. Below is a figure from HP that illustrates this central tenet of predictive analytics; you are free to replace the HP products with your own vendors.
Enterprise data virtualization or aggregation is not trivial. The modern business analyst needs data from all over the place: the data warehouse, but also the Web, big data stores, production systems, partners and vendors. In fact, the typical analyst spends more than 50% of their time chasing data, which slows delivery of analytic insights and limits the time available for thorough analysis. Some refer to this conundrum as “the data problem.”
Who Are Some Predictive Analytics Providers?
Predictive analytics work is characterized by:
- a focus on generating new data, insight and foresight
- exploring data and finding insights
- expecting uncertainty, probability and patterns rather than specific data
- computational and probabilistic techniques
The full range of analytics includes reporting, relational and multidimensional OLAP, discovery, decisioning, scorecards and dashboards.
Vendors who provide this capability include:
- Marketing services market: Fair Isaac, Acxiom, Epsilon, Equifax, Experian, Harte-Hanks, InfoUSA, KnowledgeBase, Merkle and TargetBase, among others. These vendors compete with traditional advertising agencies and with companies’ own internal information technology and analytics departments.
- Origination market: Fair Isaac, Experian, Equifax and CGI, among others.
- Customer management market: Fair Isaac and Experian, among others.
- Fraud solutions market: Fair Isaac, Actimize (a division of NICE Systems), ID Analytics, Experian, Detica (a division of BAE), SAS and ACI Worldwide (a division of Transaction Systems Architects) in the banking market; IBM and ViPS in the healthcare segment; and SAS, Infoglide Software Corporation, NetMap Analytics and Magnify in the property and casualty and workers’ compensation insurance market.
- Collections and recovery solutions market: Fair Isaac, CGI, Experian and various boutique firms for software and ASP servicing; in-house scoring and computer science departments; and the three major U.S. credit reporting agencies and Experian-Scorex for scoring and optimization projects.
- Insurance and healthcare solutions market: Fair Isaac, Emdeon, Ingenix, ViPS, MedStat, Detica (a division of BAE), SAS, Verisk Analytics and IBM.
Other types of providers in this space include:
- scoring model builders;
- enterprise resource planning (“ERP”) and customer relationship management (“CRM”) packaged solutions providers;
- business intelligence solutions providers;
- business process management and business rules management providers;
- providers of credit reports and credit scores;
- providers of automated application processing services;
- data vendors;
- neural network developers and artificial intelligence system builders;
- third-party professional services and consulting organizations;
- providers of account/workflow management software; and
- software companies supplying modeling, rules, or analytic development tools.
Under the Covers: Analytics Techniques in Play
Example of Predictive Analytics: Coupons in Grocery Stores
Each Saturday, you head to Kroger (a grocery store) and fill up your cart. The cashier scans your items, then hands you a coupon for $1.00 off your favorite brand of ice cream. With hundreds of thousands of grocery items on the shelves, how does Kroger know what you’re most likely to buy?
Using predictive analytics and loyalty-card data, computers can crunch terabytes of historical purchase data in real time and figure out that your favorite ice cream was the one item missing from your shopping basket that week. The system also matches your past purchase history against ongoing promotions in the store, so along with your bill you receive a coupon for the item you are most likely to buy next time.
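A heavily simplified version of the coupon logic can be sketched as a frequency count over a shopper’s loyalty-card history, matched against current promotions. Everything here (the item names, the promotions and the “bought most often but missing today” rule) is a hypothetical illustration, not Kroger’s actual method; a production system would use richer models such as market basket analysis or purchase-cycle prediction.

```python
from collections import Counter

def pick_coupon(past_baskets, current_basket, promotions):
    """Return the promoted item this shopper buys most often but skipped today."""
    # Count how many past trips included each item (set() ignores repeats within a trip).
    trips_with_item = Counter(item for basket in past_baskets for item in set(basket))
    candidates = [
        item for item in promotions
        if item not in current_basket and trips_with_item[item] > 0
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: trips_with_item[item])

past_baskets = [
    ["milk", "bread", "vanilla ice cream"],
    ["milk", "eggs", "vanilla ice cream"],
    ["bread", "vanilla ice cream", "apples"],
    ["milk", "bread", "eggs"],
]
todays_basket = {"milk", "bread", "eggs"}
promotions = {"vanilla ice cream": "$1.00 off", "apples": "10% off", "coffee": "$2.00 off"}

item = pick_coupon(past_baskets, todays_basket, promotions)
print(f"Coupon: {promotions[item]} {item}")
```

Coffee is on promotion but never appears in this shopper’s history, so it is never chosen; if today’s basket already held every promoted item the shopper buys, the function returns `None` and no coupon is printed.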
Example of Predictive Analytics in Sports: “Moneyball” with the Oakland A’s
Competitive sports are heavy users of predictive analytics. The approach was refined in the 1990s by the Oakland Athletics (Oakland A’s) and depicted in the Oscar-nominated movie Moneyball.
The Problem: the New York Yankees were the most acclaimed team in Major League Baseball, and small-market teams like the Oakland A’s had to change the way they did business. The A’s were not a wealthy team; in fact, they ranked 12th (out of 14) in payroll. A core strategy question in sports is: how do you compete with rich teams? How do you spot and acquire low-cost, undervalued talent that acts as a “force multiplier”?
How did they do it? While the Yankees paid their star players tens of millions, the A’s managed to be successful with a low payroll. When signing players, they didn’t just look at basic productivity values such as RBIs, home runs and earned-run averages. Instead, they analyzed hundreds of detailed statistics from every player and every game, attempting to predict future performance and production. Some statistics were even obtained from game footage by using video recognition techniques. This allowed the team to sign great players who were lesser known but equally productive on the field.
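Moneyball famously highlighted on-base percentage (OBP), computed as (H + BB + HBP) / (AB + BB + HBP + SF), as a statistic the player market undervalued. The sketch below compares hypothetical players on OBP per million dollars of salary; the players, their numbers and the per-dollar metric are illustrative assumptions, far cruder than the models teams actually use.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (hits + walks + hit-by-pitch) / (at-bats + walks + hit-by-pitch + sacrifice flies)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

# Hypothetical players: name, H, BB, HBP, AB, SF, salary in $ millions.
players = [
    ("Slugger", 160, 40, 5, 550, 5, 12.0),
    ("Walker", 130, 95, 8, 480, 4, 1.5),
    ("Contact", 170, 25, 2, 600, 6, 6.0),
]

def obp_per_million(player):
    """Crude value-for-money metric: on-base percentage per $1M of salary."""
    name, h, bb, hbp, ab, sf, salary = player
    return on_base_percentage(h, bb, hbp, ab, sf) / salary

ranked = sorted(players, key=obp_per_million, reverse=True)
for name, h, bb, hbp, ab, sf, salary in ranked:
    print(f"{name}: OBP {on_base_percentage(h, bb, hbp, ab, sf):.3f}, salary ${salary}M")
```

In this made-up roster, “Walker” gets on base most often and is also the cheapest, which is the kind of mismatch between production and price the A’s looked for.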
Implications: The Oakland A’s started a trend, and predictive analytics began to penetrate the world of baseball. Applying predictive analytics is now standard practice across a wide variety of sports. It’s important to note that baseball statistics are not new; what was new was leveraging them to make hiring decisions.
Historical tidbit: Dodgers General Manager Branch Rickey hired the first baseball statistician in 1947, after which the use of statistical analysis in baseball grew. But the practice took a major leap forward in 1977 when Bill James began self-publishing works about a new discipline he called sabermetrics.
Social Data Analytics - connect data, insights, and people in the organization
The social enterprise is entering a new phase of evolution. The first phase was built around new and innovative collaboration platforms such as Facebook, Twitter, Digg, Yammer and LinkedIn, and the focus was on better customer engagement through Twitter or Facebook.
The second phase is enterprise social: social features embedded in apps such as CRM, sales force management, marketing intelligence or data management tools, embracing a more real-time, streaming, “crowdsourcing” architecture. Now we are seeing business applications take on attributes of these consumer-facing sites to develop better predictive insight. For example, better data management (structured plus unstructured data, from inside the four walls and outside) within a CRM system could allow operations staff to give greater context to sales forecasts that show steep drops in certain product categories.
Leveraging social data brings new capabilities: problems are identified more quickly, and the resulting insights can be explored.
Notes and References
- Sabermetrics uses statistical analysis of baseball records to make determinations about player performance. James called sabermetrics “the search for objective knowledge about baseball.” Sabermetricians questioned some basic assumptions about how talent and player contributions are judged, which created quite a stir; over time, however, many sabermetric ideas have found wide acceptance.
- Business value comes from the consumption of data science or analytics, rather than the creation of analytics. Consumption (decisions and actions) is where competitive advantage is generated.
- Talent shortage: A McKinsey report projected that by 2018 there would be a shortage of 140,000 to 190,000 people with deep analytical skills, and of about 1.5 million managers and analysts who can use big data effectively to make decisions.
- About this Blog: The Business Analytics 3.0 blog covers some of today’s thorniest business problems around data strategy, technology, process, governance, and leadership.
- Data science is increasingly becoming a catch-all buzzword that encompasses statistics, operational research and management science. The danger is that the entire field might collapse under the weight of unrealistic expectations.
Analytics techniques and skills in play include:
- linear algebra
- basic statistics
- linear and logistic regression
- data mining
- predictive modeling
- cluster analysis
- association rules
- market basket analysis
- decision trees
- time-series analysis
- forecasting
- machine learning
- Bayesian and Monte Carlo statistics
- matrix operations
- text analytics
- principal component analysis
- experimental design
- unsupervised learning
- constrained optimization