Big Data Analytics Use Cases
Are you data-flooded, data-driven or data-informed? Everyone is searching for ways to monetize data assets. But data is simply a means to an end. The end is not just reports, dashboards, heatmaps, knowledge, or wisdom. The target is fact-based decisions and actions.
In other words, what use case shapes the context for “Raw Data -> Aggregated Data -> Intelligence -> Insights -> Decisions -> Operational Impact -> Financial Outcomes -> Value creation”?
Best-practice firms (and even political campaigns) can’t operate on anecdotes, opinions and gut instinct. They have to strike a measured balance between opinions, scorecards and KPI metrics. Rather than being data-driven, they need to be data-informed. That’s a big shift.
For instance, high-end cars use telemetry to detect, from vibration or temperature patterns, that an engine part is likely to fail before it actually does, a technique known as predictive maintenance. The idea is that a part rarely fails all at once. Instead, it deteriorates over time until it eventually breaks. By monitoring the part continuously, you can spot problems before they become obvious.
Similarly, big data analytics will enable new scenarios. Some may even be disruptive, much as MP3 players changed the music industry or e-readers changed the publishing model. To be competitive, organizations will require new technology with clear implementation strategies, iterative test-and-learn environments and data science talent.
Despite the rosy predictions, however, many organizations will flounder in their Big Data efforts, not because they lack analytics capability but because they lack clear objectives or multi-year roadmaps for converting noisy data into useful signals.
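The monitoring idea can be sketched as a simple drift check. This is a minimal illustration, not any carmaker’s actual method: it assumes a stream of vibration readings, treats the first `baseline_n` readings as a healthy baseline, and raises an alert when the average of the most recent `window` readings exceeds that baseline by more than `tolerance`. The function name and parameters are hypothetical.

```python
from statistics import mean


def first_drift_alert(readings, baseline_n=10, window=5, tolerance=0.25):
    """Return the index at which gradual deterioration is first flagged, or None.

    Hypothetical sketch: the first `baseline_n` readings define normal behavior;
    an alert fires when the mean of the latest `window` readings rises more than
    `tolerance` (as a fraction) above that baseline.
    """
    if len(readings) < baseline_n + window:
        return None
    baseline = mean(readings[:baseline_n])
    limit = baseline * (1 + tolerance)
    # Slide a window over the readings that come after the baseline period.
    for end in range(baseline_n + window, len(readings) + 1):
        recent = mean(readings[end - window:end])
        if recent > limit:
            return end - 1  # index of the reading that tipped the average
    return None


# Vibration amplitude that is stable, then slowly creeps upward.
readings = [10] * 10 + [10, 11, 12, 13, 14, 15, 16, 17]
print(first_drift_alert(readings))  # prints 15
```

Averaging over a window rather than reacting to single spikes mirrors the point that parts deteriorate gradually; a real system would learn baselines per part and per operating condition.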
So the first question is: What do you really want to achieve? Increased customer loyalty? A greater share of wallet via cross-sell? New customers? Lower attrition? In other words, what is the use case? As the old adage goes: if you don’t know where you are going, any road will get you there.
Starting with a clear objective is essential. The promise of Big Data Analytics is to enable “data monetization” through more timely, more accurate, more complete, more granular and more frequent decisions. So what types of business problems is big data analytics likely to solve? For this you need a mini-MBA in Big Data use cases.
The use cases described in this post are meant to stimulate ideas about how to apply iterative big data analytics in your own organization and build your own analytics center of innovation. Some interesting Big Data use cases I have come across recently include:
- Insurance: Individualize auto-insurance policies based on newly captured vehicle telemetry data. The insurer gains insight into each customer’s driving habits, delivering: (1) more accurate assessments of risk; (2) individualized pricing based on the customer’s actual driving habits; (3) the ability to influence and motivate individual customers to improve their driving habits.
- Travel: Optimize the buying experience through web log and social media data analysis: (1) the travel site gains insight into customer preferences and desires; (2) up-sell products by correlating current sales with subsequent browsing behavior, and increase browse-to-buy conversions via customized offers and packages; (3) deliver personalized travel recommendations based on social media data.
- Gaming: Collect gaming data to optimize spend within and across games: (1) the games company gains insight into the likes, dislikes and relationships of its users; (2) enhance games to drive customer spend within games; (3) recommend other content based on analysis of player connections and similar “likes”; (4) create special offers or packages based on browsing and (non-)buying behavior.
First, to set some context, let’s define what makes data “big.”
Big Data, Little Data
Unless you have been living in a cave, you have probably heard about Big Data. We live in a world of data: transactions, feedback, and real-time interaction with customers, partners, suppliers and employees.
Big data is where the volume, velocity, variety, verticalization (context) and value of the data itself become part of the problem.
Three reasons we are generating data faster than ever:
• Processes are increasingly automated
• Systems are increasingly interconnected
• People are social and increasingly generate data exhaust by interacting online
Data, in general, falls into 3 categories:
- Business application data (e.g., SAP or Oracle ERP)
- Human-generated content (e.g., social media), and
- Machine data (e.g., RFID, log files, etc.).
In addition to brick, click and mobile business app transactions, the new variable in the mix is human-generated and machine-generated data: the explosive growth of blogs, reviews, messages, emails and pictures. The Twitter firehose alone generates 7+ terabytes a day (tens of millions of tweets) and is growing rapidly. Facebook is estimated to generate 10+ terabytes a day. Social graphs, such as product recommendations based on your circle of friends, jobs you may like (LinkedIn), products you have looked at and people who are your contacts, also create “second-order” data that can be mined for sentiment analytics on products or companies, or for fact discovery.
Another new variable is computer-generated data. Computers generate data as a byproduct of interacting with people or with other devices. More interactions, more data. This data comes in a variety of formats, from semi-structured log files to unstructured binaries. These “exhaust fumes” of data can be extremely valuable. They can be used to understand and track application or service behavior so that we can find patterns, errors or suboptimal user experiences. We can also mine them for statistical patterns and correlations to generate insights.
If you listen to the hype, companies can harness this information to learn faster, make better decisions and stay one step ahead of their competitors. Unfortunately, harnessing big data (and separating signal from noise) is trickier than it looks. It takes a lot of skill and a superb understanding of use cases.
Big Data Use Cases
The key to exploiting Big Data Analytics is focusing on a compelling business opportunity as defined by a use case: WHAT exactly are we trying to do, and WHAT value is there in proving a hypothesis?
A use case is at the core of any big data strategy. Most people don’t get this. In most companies, the charter for big data will be given to those who already have responsibility for IT, business intelligence or marketing. But what happens after the CEO hands you the big data portfolio? What use-case framework will shape your big data strategy, help you understand the issues of managing data, and show how data science can be used to create value?
Use cases are emerging in a variety of industries that illustrate different core competencies around analytics.
A Use Case provides a context for a value chain: how to move from “Raw Data -> Aggregated Data -> Intelligence -> Insights -> Decisions -> Operational Impact -> Financial Outcomes -> Value creation.”
The figure below illustrates some use cases along two dimensions: data velocity and variety.
Source: SAS and IDC
E-tailing – E-Commerce – Online Retailing Use Cases
E-tailers like eBay constantly create targeted offers to boost customer lifetime value (CLV), deliver consistent cross-channel customer experiences, harvest customer leads from sales, marketing and other sources, and continuously optimize back-end process orchestrations.
- Recommendation engines — increase average order size by recommending complementary products based on predictive analysis for cross-selling.
- Cross-channel analytics — sales attribution, average order value, lifetime value (e.g., how many in-store purchases resulted from a particular recommendation, advertisement or promotion).
- Event analytics — what series of steps (golden path) led to a desired outcome (e.g., purchase, registration).
- Right offer at the right time
- Next best offer - deploying predictive models in combination with recommendation engines that drive automated next best offers and tailored interactions across multiple interaction channels.
- True-lift modeling and analytics, aimed at “stopping spending direct marketing dollars on customers who would purchase anyway!” The goal of the analytics is to identify:
- which customers will purchase without receiving a marketing contact
- which customers need a direct marketing nudge to make a purchase
- which customers have a negative reaction to marketing (and purchase less if contacted)
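A minimal sketch of how the three groups above can be told apart, assuming a randomized holdout: within each customer segment, some customers receive the marketing contact (treated) and some do not (control), and the difference between the two purchase rates is the segment’s lift. The segment names and the `margin` and `baseline_buy_rate` thresholds are hypothetical; production true-lift models typically estimate this per customer rather than per hand-built segment.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    treated_buyers: int   # contacted customers who purchased
    treated_total: int    # customers who received the marketing contact
    control_buyers: int   # holdout customers who purchased anyway
    control_total: int    # customers held out from the contact


def lift(s: SegmentStats) -> float:
    """Purchase rate with contact minus purchase rate without it."""
    return s.treated_buyers / s.treated_total - s.control_buyers / s.control_total


def classify(s: SegmentStats, margin: float = 0.05, baseline_buy_rate: float = 0.5) -> str:
    """Map a segment's measured lift onto the groups described above."""
    delta = lift(s)
    if delta > margin:
        return "needs a nudge"        # contact raises purchases
    if delta < -margin:
        return "negative reaction"    # contact lowers purchases
    if s.control_buyers / s.control_total >= baseline_buy_rate:
        return "purchases anyway"     # high purchase rate with no contact
    return "unresponsive"             # low purchase rate either way


segments = {
    "loyal regulars": SegmentStats(90, 100, 88, 100),
    "fence sitters": SegmentStats(30, 100, 10, 100),
    "contact-averse": SegmentStats(5, 100, 20, 100),
}
for name, stats in segments.items():
    print(name, "->", classify(stats))
```

Only the "needs a nudge" group justifies direct-marketing spend; the other groups are where true-lift analysis suggests the budget can be cut.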
So how big is the data these algorithms have to operate on? Consider this: eBay’s “Singularity” Teradata warehouse exceeds 40 petabytes. According to eBay, the company adds 50+ terabytes of new incremental data per day and processes 50+ petabytes and tens of millions of queries per day, with 99.98% availability and more than 50 petabytes of online storage.
Interesting Use Case – Amazon Will Pay Shoppers $5 to Walk Out of Stores Empty-Handed
This is an interesting use of consumer data entry to power next-generation retail price competition. Amazon is offering consumers up to $5 off purchases if they compare prices using its mobile phone application in a store. The promotion will serve as a way for Amazon to increase usage of its bar-code-scanning application while also collecting intelligence on prices in stores.
Amazon’s Price Check app, which is available for iPhone and Android, allows shoppers to scan a bar code, take a picture of an item or conduct a text search to find the lowest prices. Amazon is also asking consumers to submit the prices of items through the app, so Amazon knows whether it is still offering the best prices. It is a great way to feed data from brick-and-mortar retailers into its learning engine.
This trend should terrify brick-and-mortar retailers. While real-time “Everyday Low Price” information empowers consumers, it leaves retailers increasingly feeling like showrooms: shoppers come to check out the merchandise but ultimately decide to walk out and buy online instead. See Multi-channel to Omni-channel Retail Analytics: A Big Data Use Case.
Retail/Consumer Use Cases
- Merchandising and market basket analysis.
- Campaign management and customer loyalty programs.
- Supply-chain management and analytics.
- Event- and behavior-based targeting.
- Market and consumer segmentations.
Predictive analytics is well understood by the retail industry. Retailers want to predict the factors that might be important to a buyer’s purchasing decision before the product is ever stocked on shelves. What if retailers could know exactly which market dynamics were shifting demand curves before the shifts occurred? Imagine the impact on operational efficiency in terms of inventory cost control, intelligent distribution and routing, and demand projection. The retail use cases are quite varied.
Food Retailing Use Case: For food retailers, the fresh food category is important for customer satisfaction. Providing sufficient stock while avoiding food waste keeps customers happy and the retailer profitable. Many retailers are exploring whether a fully automated, data-driven replenishment process is possible, based on internal and external data sources combined with advanced predictive analytics. Some retailers are using data lineage to address “origin-to-destination” food ingredient safety issues.
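A data-driven replenishment rule can be sketched with the textbook reorder idea: forecast demand over the replenishment lead time, add safety stock proportional to demand variability, and subtract what is already on hand. This is an illustrative assumption, not the method any particular retailer uses; the moving-average forecast, the function name and the `service_z` factor (1.65 corresponds to roughly a 95% service level if daily demand is assumed normal) are hypothetical choices. For fresh food, a lower `service_z` trades some availability for less waste.

```python
import math
from statistics import mean, pstdev


def order_quantity(daily_sales, on_hand, lead_time_days, service_z=1.65):
    """Units to order so stock should cover demand until the next delivery.

    Hypothetical sketch: forecast = average daily sales over the history;
    safety stock = z * daily standard deviation * sqrt(lead time).
    """
    avg_daily = mean(daily_sales)
    sd_daily = pstdev(daily_sales)
    expected_demand = avg_daily * lead_time_days
    safety_stock = service_z * sd_daily * math.sqrt(lead_time_days)
    shortfall = expected_demand + safety_stock - on_hand
    return max(0, math.ceil(shortfall))  # never order a negative quantity


# Strawberries: last week's daily unit sales, 15 punnets on the shelf, 2-day lead time.
print(order_quantity([18, 22, 20, 25, 15, 20, 20], on_hand=15, lead_time_days=2))
```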
Financial Services Use Cases
- Compliance and regulatory reporting
- Risk analysis and management
- Fraud detection and security analytics
- CRM and customer loyalty programs
- Credit risk, scoring and analysis
- High-speed arbitrage trading
- Trade surveillance
- Abnormal trading pattern analysis
Risk Modeling Use Case — A large financial institution took separate data warehouses from multiple departments and combined them into a single global repository in Hadoop for analysis. The bank used the Hadoop cluster to construct a new and more accurate score of the risk in its customer portfolios. The more accurate score allowed the bank to manage its exposure better and to offer each customer better products and advice.
Trade Surveillance Use Case — A large investment bank combines data about the parties that participate in a trade with the complex data that describes relationships among those parties and how they interact with one another. The combination allows the bank to recognize unusual trading activity and to flag it for human review.
Underwriting Use Case: ZestCash uses online data to determine the creditworthiness of new customers, offering a more modern way of underwriting. Instead of relying on tools like FICO scores, ZestCash pulls in a wealth of data to help rank a person’s likelihood of defaulting. Data such as cell phone bill payments or the length of stay at a residence helps provide a fuller picture of a person’s ability to pay off loans.
Regulatory Monitoring - Regulatory oversight and regulations are being created or extended to cover more financial markets and market scenarios to try to close gaps or loopholes that may have contributed to the financial crisis. Pressure to monitor all aspects of financial institutions has created a new patchwork of regulatory regimes. Some regulations are quite prescriptive in terms of what, where, when and how to manage data.
Fraud Use Cases
Fraud management helps improve customer profitability by predicting the likelihood that a given transaction or customer account is experiencing fraud. Solutions analyze transactions in real time and generate recommendations for immediate action, which is critical to stopping third-party fraud, as well as first-party fraud and deliberate misuse of account privileges.
Solutions are typically designed to detect and prevent a wide variety of fraud and risk types across multiple industries, including:
- credit and debit payment card fraud;
- deposit account fraud;
- technical fraud and bad debt;
- healthcare fraud;
- Medicaid and Medicare fraud;
- property and casualty insurance fraud; and
- workers’ compensation fraud.
Global payment card fraud detection use case: The goal is to analyze payment card transactions in real time, assess the risk of fraud, and take user-defined steps to prevent fraud while expediting legitimate transactions. To achieve this, predictive models and profiling technology examine transaction, cardholder and merchant data to detect a range of credit and debit card fraud quickly and accurately.
To improve fraud detection rates, merchant profiles are often pre-built from fraud and transactional data. They capture characteristics such as which merchants have a history of higher fraud volumes, and which purchase types and ticket sizes have most often been fraudulent at a particular merchant.
Insurance Fraud Use Case: Predictive modeling is used to detect claims fraud, abuse and errors before payment, identifying suspicious providers as soon as aberrant behavior patterns emerge.
Healthcare and Workers’ Compensation Use Case: Predictive modeling is likewise used to detect claims fraud, abuse and errors before payment, identifying suspicious providers as soon as aberrant behavior patterns emerge.
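A pre-built merchant profile can be as simple as a lookup table of historical fraud rates keyed by merchant and ticket size. The sketch below assumes labeled history as `(merchant_id, amount, is_fraud)` tuples; the band cut-offs, function names and default value are hypothetical. In a real-time system the looked-up rate would be one input to a predictive model alongside cardholder and transaction features, and merchants with few transactions would need smoothed estimates rather than raw rates.

```python
from collections import defaultdict


def ticket_band(amount):
    """Bucket a transaction amount into a coarse ticket-size band (hypothetical cut-offs)."""
    if amount < 50:
        return "small"
    if amount < 500:
        return "medium"
    return "large"


def build_merchant_profiles(history):
    """history: iterable of (merchant_id, amount, is_fraud) from past labeled transactions.

    Returns {(merchant_id, band): observed fraud rate}.
    """
    counts = defaultdict(lambda: [0, 0])  # (merchant, band) -> [fraud count, total count]
    for merchant, amount, is_fraud in history:
        key = (merchant, ticket_band(amount))
        counts[key][0] += int(is_fraud)
        counts[key][1] += 1
    return {key: fraud / total for key, (fraud, total) in counts.items()}


def profile_risk(profiles, merchant, amount, default=0.0):
    """Look up the pre-built fraud rate for this merchant and ticket size."""
    return profiles.get((merchant, ticket_band(amount)), default)


history = [
    ("m-001", 20.0, False), ("m-001", 35.0, False),
    ("m-001", 650.0, True), ("m-001", 720.0, True), ("m-001", 900.0, False),
    ("m-002", 640.0, False),
]
profiles = build_merchant_profiles(history)
print(profile_risk(profiles, "m-001", 800.0))  # large tickets at m-001: 2 of 3 were fraud
```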
Web & Digital Media Services Use Cases
Much of the data we currently work with is the direct consequence of Web 2.0. Customers generate a trail of “data exhaust” that can be mined and put to use.
- Large-scale clickstream analytics
- Ad targeting, analysis, forecasting and optimization
- Abuse and click-fraud prevention
- Social graph analysis and profile segmentation
- Campaign management and loyalty programs
Clickstream Use Case – A big-box retailer needs to analyze its clickstream: 3.5 billion records, 71 million unique cookies and 1.7 million targeted ads required per day. The problem: how to improve Return on Ad Spend (ROAS), and how to speed up the analytics so consumers get more relevant ads sooner, which is especially important during the holiday season!
Suggestion Use Case. Yelp is growing rapidly; with more than 50 million monthly visitors and 18 million reviews, the company generates about 400GB of data a day. That data needs to be processed and analyzed. A simple use case is spelling suggestions. By looking at millions of misspelled words, Yelp uses an algorithm to create suggestions for common misspellings. By looking at typical queries, Yelp can list common suggestions for a query even before you finish typing. This is possible because Yelp analyzes all the web logs from its websites.
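Return on Ad Spend is attributed revenue divided by ad spend. The sketch below computes it per campaign from clickstream events using last-click attribution, which is only one common rule (first-click and multi-touch models are alternatives). The event layout, field names and campaign IDs are hypothetical, and events are assumed to arrive in time order.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Event(NamedTuple):
    cookie: str                     # anonymous visitor ID
    kind: str                       # "ad_click" or "purchase"
    campaign: Optional[str] = None  # set for ad clicks
    amount: float = 0.0             # order value for purchases


def roas_by_campaign(events, spend):
    """Return {campaign: attributed revenue / ad spend} using last-click attribution."""
    last_click = {}                  # cookie -> most recent campaign clicked
    revenue = defaultdict(float)
    for e in events:                 # events must be in chronological order
        if e.kind == "ad_click":
            last_click[e.cookie] = e.campaign
        elif e.kind == "purchase" and e.cookie in last_click:
            revenue[last_click[e.cookie]] += e.amount
    return {c: revenue[c] / s for c, s in spend.items() if s > 0}


events = [
    Event("c1", "ad_click", "holiday-A"),
    Event("c1", "purchase", amount=100.0),
    Event("c2", "ad_click", "holiday-B"),
    Event("c2", "ad_click", "holiday-A"),
    Event("c2", "purchase", amount=50.0),
    Event("c3", "purchase", amount=30.0),  # no ad click, so not attributed
]
print(roas_by_campaign(events, {"holiday-A": 50.0, "holiday-B": 25.0}))
```

At billions of records the same logic would run as a distributed group-by over cookies rather than a single in-memory loop.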
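Yelp’s actual algorithm isn’t described here, so the following is only a sketch of the two ideas in the paragraph using Python’s standard library: query-log frequencies drive prefix completion, and `difflib.get_close_matches` maps a misspelling to a similar past query. The class name, method names and thresholds are hypothetical; note that `get_close_matches` ranks by string similarity only, not by query popularity.

```python
import difflib
from collections import Counter


class QuerySuggester:
    """Hypothetical suggester built from a log of past search queries."""

    def __init__(self, query_log):
        # Normalize and count how often each query was searched.
        self.counts = Counter(q.strip().lower() for q in query_log)

    def complete(self, prefix, k=3):
        """Most frequent past queries starting with `prefix` (ties broken alphabetically)."""
        prefix = prefix.strip().lower()
        matches = [(q, n) for q, n in self.counts.items() if q.startswith(prefix)]
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:k]]

    def correct(self, query, cutoff=0.8):
        """Closest known query by string similarity, or None if nothing is close enough."""
        close = difflib.get_close_matches(
            query.strip().lower(), list(self.counts), n=1, cutoff=cutoff
        )
        return close[0] if close else None


log = ["pizza", "Pizza", "pizza delivery", "pho", "pizza ", "pho"]
s = QuerySuggester(log)
print(s.complete("pi"))   # ['pizza', 'pizza delivery']
print(s.correct("piza"))  # pizza
```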
Government Use Cases
- Fraud detection
- Threat detection
- Compliance and regulatory analysis
- Energy consumption and carbon footprint management
- Sentiment Analytics
- Mashups – Mobile User Location + Precision Targeting
- Machine-generated data, the exhaust fumes of the Web
Online Dating Use Case: A leading online dating service uses sophisticated analyses to measure the compatibility between individual members, so that it can suggest good matches for a potential relationship.
Big Data Analytics helped customers find romance. The algorithms that power Match.com are not very different from those behind LinkedIn.
Social Gaming: In its S-1 filing, Zynga claimed that it processes and serves “more than a petabyte of content for players every day, a volume of data that is unmatched in the social game industry. We continually analyze game data to optimize our games. We believe that combining data analytics with creative game design enables us to create a superior player experience.”
Healthcare & Life Sciences Use Cases
- Health Insurance fraud detection
- Campaign and sales program optimization
- Brand management
- Care Management
- Patient care quality and program analysis
- Medical Device and Pharma Supply-chain management
- Drug discovery and development analysis
Analyzing Electronic Health Records (EHR). This use case aims to aggregate and analyze all patient Electronic Health Records (EHR) from hospitals and other healthcare providers and make them available online to doctors while they examine patients. The goal is to bring down the cost of providing healthcare by sharing patient information between providers, which reduces duplicate test orders and the time taken to provide patient care. The current EPIC solution (similar to an ERP for hospitals) keeps no more than a few months of historical patient information online, and it takes several minutes to search historical EHR records.
Big Data in a Hospital Network. Instead of taking readings every few hours, a hospital continuously recorded data from all the medical instruments in a pediatrics ward. By capturing and analyzing the data from perhaps five or six different points of view, the analytics team was able to help physicians spot infection trends 12 to 24 hours earlier than they otherwise might have. That allowed doctors to start a course of treatment that helped save lives or shorten stays.
Healthcare Payor and Provider Use Case - Kaiser Permanente collects petabytes of health information on its 8-million-plus members, a fantastic amount. Some of this data was used in an FDA-sponsored study to identify risks with Vioxx, Merck’s pain medication, which was pulled shortly after the research identified a greater risk of heart attack in a subset of the patient population.
Healthcare Payor Use Case. Payors like Cigna, Aetna, the Blue Cross and Blue Shield network and others are increasingly combining data from pharmaceutical clinical trials with proprietary data to conduct comparative-effectiveness studies. In some situations, payors know more about drug performance than the large pharma firms themselves. This gives payors a distinct advantage in negotiating payments. It also makes it difficult for pharma firms to get their drugs listed on national and country formularies, the all-important drug approval lists from which physicians prescribe medications. How are pharma firms like Pfizer, AstraZeneca, GSK, Bristol-Myers and Biogen responding to this competitive disadvantage? They are investing in and ramping up their own analytics programs, partnering with data providers such as IMS Health, Symphony Health or HealthCore, a clinical outcomes research subsidiary of health insurer WellPoint Inc.
Telecommunications Use Cases
- Revenue assurance and price optimization
- Customer churn prevention
- Campaign management and customer loyalty
- Call Detail Record (CDR) analysis
- Network performance and optimization
- Mobile User Location analysis
Telecom Use Case: A large telco provider analyzed call logs and complex data from multiple sources. The use case calls for batch aggregation and analysis of web logs, network base station logs and network signaling traffic for data mining and network route optimization. The goal is to use the log data to build customer profiles (popular cellphone devices, popular websites, etc.), segment the customers, and optimize products and services accordingly.
Cell Phone Provider Use Case: A very large cellular service provider (>50M subscribers) needed to give its subscribers online access to their cellphone call, SMS and web data records. The CDR data adds up to 30 TB and about 60 billion records every month. A traditional database solution was not an option, given the volume of data and the requirement to provide subscribers with online access to call and billing history. The CDR data is stored on an 80-node HBase cluster, and the CRM system reads directly from the HBase cluster to present monthly billing history.
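The profile-and-segment step can be sketched in a few lines, assuming log records have already been joined into `(subscriber_id, device, site)` tuples. The field names, the `top_n` parameter and the heavy/light threshold are hypothetical; a production pipeline would run the same aggregation as a batch job over the full logs and would more likely segment with a clustering algorithm than with a single threshold.

```python
from collections import Counter, defaultdict


def build_profiles(log_records, top_n=2):
    """Aggregate (subscriber, device, site) records into per-subscriber profiles."""
    devices = defaultdict(Counter)
    sites = defaultdict(Counter)
    for subscriber, device, site in log_records:
        devices[subscriber][device] += 1
        sites[subscriber][site] += 1
    return {
        sub: {
            "device": devices[sub].most_common(1)[0][0],             # most-used handset
            "top_sites": [s for s, _ in sites[sub].most_common(top_n)],
            "records": sum(sites[sub].values()),                     # activity volume
        }
        for sub in devices
    }


def segment(profile, heavy_threshold=3):
    """Hypothetical two-way segmentation by activity volume."""
    return "heavy" if profile["records"] >= heavy_threshold else "light"


records = [
    ("s1", "PhoneA", "news"), ("s1", "PhoneA", "video"), ("s1", "PhoneA", "video"),
    ("s2", "PhoneB", "maps"),
]
profiles = build_profiles(records)
for sub, p in profiles.items():
    print(sub, p["device"], p["top_sites"], segment(p))
```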
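The figures above imply some useful back-of-envelope numbers, and the online-access requirement hints at why HBase fits: a row key that starts with the subscriber ID lets the CRM system fetch one subscriber’s records with a single prefix scan. The calculation assumes decimal terabytes and a 30-day month; the row-key layout (subscriber ID followed by a reversed timestamp, a common HBase pattern for newest-first reads) is an illustrative assumption, not the provider’s documented schema.

```python
TB = 10**12                       # decimal terabyte
monthly_bytes = 30 * TB           # 30 TB of CDR data per month
monthly_records = 60 * 10**9      # about 60 billion records per month
seconds_per_month = 30 * 24 * 3600
nodes = 80

bytes_per_record = monthly_bytes / monthly_records        # average raw record size
records_per_second = monthly_records / seconds_per_month  # sustained ingest rate
tb_per_node_per_month = 30 / nodes                        # before HDFS replication

LONG_MAX = 2**63 - 1


def cdr_row_key(subscriber_id: str, epoch_millis: int) -> str:
    """Hypothetical row key: subscriber prefix, then reversed time so newest rows sort first."""
    return f"{subscriber_id}|{LONG_MAX - epoch_millis:019d}"


print(f"{bytes_per_record:.0f} bytes/record")                  # 500 bytes/record
print(f"{records_per_second:,.0f} records/second")             # 23,148 records/second
print(f"{tb_per_node_per_month:.3f} TB per node per month")    # 0.375 TB per node per month
```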
Utilities Industry Use Cases
Utilities run big, expensive and complicated systems to generate power. Each grid now includes sophisticated sensors that monitor voltage, current, frequency and other important operating characteristics. Efficiency now means paying careful attention to all of the data streaming off of the sensors.
Take, for instance, Southern California Edison, which is collecting hourly (rather than monthly) data on customer usage from new digital smart meters in millions of residences. It will soon be monitoring and giving frequent feedback to customers about their energy use, a significant benefit for energy grid management and customer service.
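Hourly smart-meter data supports the kind of customer feedback described above. As a minimal sketch, assuming readings arrive as `(ISO-8601 hour timestamp, kWh)` pairs for a single meter, the function below rolls them up into a daily total and the day’s peak hour; the input format and function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_summary(readings):
    """readings: iterable of (ISO-8601 timestamp, kWh consumed in that hour).

    Returns {date string: {"total_kwh": ..., "peak_hour": ...}}.
    """
    totals = defaultdict(float)
    peaks = {}  # date -> (hour, kWh) of the highest hourly reading so far
    for ts, kwh in readings:
        t = datetime.fromisoformat(ts)
        day = t.date().isoformat()
        totals[day] += kwh
        if day not in peaks or kwh > peaks[day][1]:
            peaks[day] = (t.hour, kwh)
    return {day: {"total_kwh": totals[day], "peak_hour": peaks[day][0]} for day in totals}


readings = [
    ("2024-07-01T03:00", 0.5),
    ("2024-07-01T17:00", 1.5),
    ("2024-07-01T18:00", 2.5),
    ("2024-07-02T12:00", 1.0),
]
for day, summary in daily_summary(readings).items():
    print(day, summary)
```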
Utilities are now leveraging Hadoop clusters to analyze generation (supply) and consumption (demand) data via smart meters.
Smart meters – The rollout of smart meters as part of the Smart Grid adoption by utilities everywhere has resulted in a deluge of data flowing at unprecedented levels. Most utilities are ill-prepared to analyze the data once the meters are turned on.
So, What’s the Big Deal?
The big deal is that, if analytics is done well, as shown in the figure, there is room for margin expansion and additional profit.
Big Data is full of valuable, unanswered questions! The challenge is separating the actual predictive indicators (the signal) from the noise in the data.
Companies that compete on analytics and deliver data-driven services tend to iterate quickly on big data. This enables rapid data exploration to identify unknown relationships and trends and to create new products and services.
Data overload is going to be a huge challenge for businesses and a headache for decision makers. Public and private sector corporations are going to drown in data — from sales, transactions, pricing, supply chains, discounts, product, customer process, projects, RFID smart tags, tracking of shipments, as well as e-mail, Web traffic and social media. Without a smart use case strategy, a lot of this data will be wasted.
Notes and References
2. A business use case describes what the process does. It is meant to describe, in technology-free language, the business process that its business actors (people or systems external to the process) use to achieve their goals. The business use case describes a process that provides value to the business actor.
4. Less than one percent of the world’s data is being analyzed. The IDC study “Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East” finds that little of the “big data potential” is being realized globally. As the digital universe reaches 40 zettabytes by 2020, it will have increased 50-fold since 2010. IDC recently raised its forecast by 5 ZB.
5. See also: A Very Short History of Data Science
6. See also: A Very Short History of Big Data
- Predictive Analytics 101 (quick overview)
- IRS Uses Analytics To Help Collect Delinquent Taxes (informationweek.com)
- Now that data’s going really big, what’s next? (forbes.com)
- The Age of Big Data, New York Times, Feb 12, 2012
- Omni-channel Retail Analytics: A Big Data Use Case (practicalanalytics.wordpress.com)
- IDC: Analytics a $51B business by 2016 thanks to big data