Agenda
Overview
• Why Analytics?
• Business Problems that can be addressed with analytics
• Analytic approaches to solving business problems
• Introduction to the two examples
Volumes of Data – How to Extract Maximum Utility
[Figure: ETL with sums and means (hindsight), OLAP with drilldown (insight), and advanced analytics with statistical predictions (foresight), shown against levels labeled Data, Intelligence, Information, and Knowledge and leading to operational decisions. Caption: exponential growth of corporate data and computing power in the past two decades.]
• ETL with sums and means provides hindsight from corporate measurements
• OLAP with drilldown provides insight from the ETL data warehouse
• Only advanced analytics with statistical predictions provides foresight from the ETL data warehouse
Interpreting the Variability of a Population
Means are useful, but understanding the distribution around the mean, and what contributes to that distribution, is essential to compare populations and make predictions
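The following minimal sketch shows why the mean alone is not enough. It uses two hypothetical populations that share a mean of 100 but differ sharply in spread. The store names and values are illustrative assumptions, not data from this deck.

```python
import statistics

# Hypothetical weekly sales for two stores: identical means, very different spread
store_a = [98, 99, 100, 101, 102]
store_b = [60, 80, 100, 120, 140]

def summarize(values):
    """Return the mean and the population standard deviation of values."""
    return statistics.mean(values), statistics.pstdev(values)

for name, values in [("A", store_a), ("B", store_b)]:
    mean, sd = summarize(values)
    print(f"store {name}: mean={mean}, sd={sd:.2f}")
```

Both stores report a mean of 100. The standard deviations (about 1.41 versus 28.28) show that any prediction for store B carries far more uncertainty.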
Statistical techniques “predict” the future by apportioning variance in the population to explanatory variables
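One common way to make “apportioning variance” concrete is the coefficient of determination (R²) of a least-squares fit. R² is the share of the total variance in the outcome that the explanatory variable accounts for. This sketch assumes a single explanatory variable and uses made-up numbers. The function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def share_of_variance_explained(x, y):
    """R-squared: fraction of the variance in y accounted for by x."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    total = sum((b - mean_y) ** 2 for b in y)
    residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - residual / total

# Hypothetical data: advertising spend (x) and units sold (y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(share_of_variance_explained(x, y))
```

Here R² is roughly 0.6. The explanatory variable accounts for about 60% of the variance in sales, and the rest is left unexplained.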
When sales change over time in a well-defined pattern, future sales can be predicted
If the likelihood of buying a product is associated with demographic characteristics, then we can predict how likely a particular individual is to buy that product
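Logistic regression is one standard way to turn demographic characteristics into a purchase probability. The coefficients below are hypothetical placeholders for values that would be estimated from past purchase data. The feature names are also assumptions.

```python
import math

# Hypothetical coefficients, standing in for values estimated by logistic
# regression on past purchases; feature names are illustrative.
COEFFICIENTS = {
    "intercept": -3.0,
    "age_under_35": 0.8,
    "has_children": 1.2,
    "income_over_100k": 0.5,
}

def purchase_probability(person):
    """Probability of purchase for a person given 0/1 demographic indicators."""
    score = COEFFICIENTS["intercept"]
    for feature, weight in COEFFICIENTS.items():
        if feature != "intercept":
            score += weight * person.get(feature, 0)
    return 1 / (1 + math.exp(-score))  # logistic (sigmoid) link

print(purchase_probability({}))
print(purchase_probability({"age_under_35": 1, "has_children": 1, "income_over_100k": 1}))
```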
The Problem Defines the Solution
Business executives and analysts have always made operational decisions
• Intuition and experience can be used
• Sums and means can provide a historical direction
• OLAP and drilldown can provide a better or more detailed perspective
• Only advanced analytics can provide a sophisticated point of view on the future of the business
Problem Defines Solution – Example 1
A railroad must have efficient schedules to move freight
• Before computers, colored strings on a bulletin board were used – time on the X-axis and distance on the Y-axis
• Constraints included no crossing of trains except at sidings and stations
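The bulletin-board constraint can be sketched in code. The sketch assumes constant speeds on a single-track segment, which is a simplification, and the function names and siding lists are hypothetical. Under those assumptions, two opposing trains' lines on the time-distance chart cross at a computable time and milepost. A schedule breaks the rule if that point is not at a siding or station.

```python
def meeting_point(a_depart, a_speed, b_depart, b_speed, length):
    """Where two opposing trains' lines cross on a time-distance chart.

    Train A runs from mile 0 toward mile `length`; train B runs from `length`
    toward mile 0. Times are in hours and speeds in miles per hour.
    Returns (time, mile) of the meeting, or None if they never share the segment.
    """
    # A's position: a_speed * (t - a_depart); B's: length - b_speed * (t - b_depart).
    # Setting them equal and solving for t gives the crossing time.
    t = (length + a_speed * a_depart + b_speed * b_depart) / (a_speed + b_speed)
    mile = a_speed * (t - a_depart)
    if t < max(a_depart, b_depart) or not 0 <= mile <= length:
        return None
    return t, mile


def violates_siding_rule(meeting, sidings, tolerance=0.1):
    """True if the trains meet somewhere other than a siding or station."""
    if meeting is None:
        return False
    _, mile = meeting
    return all(abs(mile - siding) > tolerance for siding in sidings)


meet = meeting_point(a_depart=0, a_speed=50, b_depart=0, b_speed=50, length=100)
print(meet, violates_siding_rule(meet, sidings=[0, 30, 70, 100]))
```

This check only tests one proposed schedule. Like the computer-aided approach described next, it says nothing about whether a better schedule exists.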
With computers, the business analyst could manipulate the trains and visualize them on the screen
• However, there was no guarantee of a “best” decision that produced optimal usage of the tracks to move the most freight in the minimum amount of time
Problem Defines Solution – Example 2
A herbicide producer wants to deliver time-sensitive herbicide to farmers immediately before they plant their corn
• The chemical company uses hindsight as to when the farmers planted their corn in previous years
• Business experts also have a “sense” for whether the planting will be earlier or later than in previous years
Since the problem is to know beforehand when the farmers will plant their corn → Go visit the farmers!
• The farmer walks out of the house in the morning, sticks a wet finger in the air to gauge temperature, kicks the dirt to gauge moisture, and looks over the horizon to see if the neighbors are planting their corn.
• A linear regression approach is used in each of 98 agricultural districts, with the following inputs:
  − Daily temperatures, combined as necessary into day groups
  − Precipitation amounts, grouped as appropriate
  − Records of previous years’ plantings
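The following minimal sketch shows how those inputs might be assembled for one district before fitting the regression. Several choices are assumptions for illustration: the 7-day group size, using a mean for temperature and a total for precipitation, and summarizing previous years as an average planting day. The deck says only that values were grouped “as necessary” and “as appropriate.”

```python
def day_groups(daily_values, group_size, combine):
    """Split a daily series into consecutive groups of `group_size` days and
    combine each group (the last group may be shorter)."""
    return [combine(daily_values[i:i + group_size])
            for i in range(0, len(daily_values), group_size)]

def mean(values):
    return sum(values) / len(values)

def district_features(daily_temps, daily_precip, previous_planting_days, group_size=7):
    """Build one input row for a district's regression (hypothetical layout)."""
    temps = day_groups(daily_temps, group_size, mean)      # mean temperature per group
    precip = day_groups(daily_precip, group_size, sum)     # total precipitation per group
    avg_previous = mean(previous_planting_days)            # e.g., day-of-year in prior years
    return temps + precip + [avg_previous]

# Two hypothetical weeks of weather and two prior planting dates (day of year)
print(district_features([10] * 7 + [20] * 7, [0] * 7 + [1] * 7, [110, 120]))
```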
With analytics, one takes the problem and understands the underlying process
The best solutions often combine a number of analytic techniques (as necessary) with business rules that further constrain the solution
SAS/OR – Finds the optimal solution in a system of constraints
Enterprise Miner – Predictive modeling, e.g., which customers are most profitable and/or most likely to respond to an offer
ETS and HPF – Forecasting, e.g., what future sales or demand will be, based on history and other related factors
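The sketch below implements simple exponential smoothing, one of the classic exponential smoothing methods. It is a much simpler stand-in for the forecasting that ETS and HPF perform, not how either product selects or fits models. The smoothing weight `alpha` is an assumption chosen by the caller.

```python
def simple_exponential_smoothing(history, alpha):
    """Return the next-period forecast from simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * observation + (1 - alpha) * level.
    """
    if not history:
        raise ValueError("history must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for observation in history[1:]:
        level = alpha * observation + (1 - alpha) * level
    return level

# Hypothetical monthly demand
demand = [120, 132, 128, 140, 151, 149]
print(simple_exponential_smoothing(demand, alpha=0.3))
```

Larger values of `alpha` weight recent observations more heavily. A flat forecast like this ignores trend and seasonality, which the fuller exponential smoothing family handles.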
Business Cases
Marketing Performance Optimization / Trade Promotion Optimization
• Understand and predict the ROI on promotions, advertising, and other mass-marketing tactics
• What’s the optimum mix of marketing tactics?
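One way to frame “optimum mix” is as budget allocation across tactics whose returns diminish as spend grows. The response curves below are invented for illustration; real ones would be estimated from historical data. The greedy loop is a simple heuristic, not the product's optimizer.

```python
import math

# Hypothetical response curves: incremental revenue as a concave
# (diminishing-returns) function of spend in each tactic.
RESPONSE = {
    "tv": lambda spend: 40 * math.sqrt(spend),
    "trade_promotion": lambda spend: 60 * math.sqrt(spend),
}

def roi(tactic, spend):
    """Return on investment: (incremental revenue - spend) / spend."""
    return (RESPONSE[tactic](spend) - spend) / spend

def allocate(budget, step):
    """Greedily give each budget increment to the tactic with the largest marginal gain."""
    spend = {tactic: 0 for tactic in RESPONSE}
    for _ in range(int(budget // step)):
        best = max(RESPONSE,
                   key=lambda t: RESPONSE[t](spend[t] + step) - RESPONSE[t](spend[t]))
        spend[best] += step
    return spend

print(roi("tv", 100))
print(allocate(1000, 100))
```

With these made-up curves, the loop assigns 300 to TV and 700 to trade promotion. Fixed-increment greedy allocation works well here because each curve is concave. With business-rule constraints or non-concave responses, a constrained optimizer such as SAS/OR would be the better fit.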
Bank Call Center Text Mining
• Explore the use of text mining to add value to the Bank’s modeling efforts to predict attrition
• Analyze call center comments for additional lift in predicting attrition from primary accounts
“The transformation of TPM [Trade Promotion Modeling], in conjunction with MMM [Market Mix Modeling], from a tactical to a more overarching and encompassing strategic function is well on the way. At this very moment…the question of full functionality is less of an ‘if’, but ‘when.’”
-- Michael Forhez and Charlie Chase, in ‘Consumer Goods Technology’, March 2005.
The “When” is Now
MPO/TPO is designed to:
• Calculate the business impact of multiple marketing channels:
  − In isolation
  − In combination
• Consider any and all potential variables, controllable and uncontrollable
• Allow for changes in variables and desired outcomes with minimal effort
• Predict future business outcomes based on specific marketing mix and promotional scenarios
• Provide the platform for marketing mix optimization
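“In isolation” versus “in combination” can be illustrated with a linear response model that includes an interaction term. The base level, channel effects, and interaction value below are hypothetical. This model form is also an assumption, not the offering's actual specification.

```python
# Hypothetical fitted model: weekly units = base + channel effects + interaction
BASE_UNITS = 1000.0
CHANNEL_EFFECT = {"tv": 150.0, "display": 80.0}
INTERACTION = {frozenset({"tv", "display"}): 40.0}

def predicted_units(active_channels):
    """Predicted units when the given set of channels is running."""
    units = BASE_UNITS + sum(CHANNEL_EFFECT[c] for c in active_channels)
    for pair, extra in INTERACTION.items():
        if pair <= set(active_channels):  # both channels of the pair are active
            units += extra
    return units

def incremental_units(active_channels):
    """Lift over the no-marketing baseline."""
    return predicted_units(active_channels) - predicted_units(set())

for scenario in [{"tv"}, {"display"}, {"tv", "display"}]:
    print(sorted(scenario), incremental_units(scenario))
```

Run together, the two channels add 270 units, which exceeds the 230 from summing their isolated effects. The extra 40 units come from the interaction term.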
The MPO/TPO Offering
Foundational elements include:
• Flexible data model
• Model automation procedures
• User interface elements
  − Interactive
  − Web based
• Executable Master Marketing and Promotional Plan
• Marketing campaign scenario forecasts to test effectiveness and cross-product cannibalization
Sample Variables for a CPG Client
• Syndicated data (AC Nielsen, IRI)
• Shipment and Order history
• Promotion calendars
• Fund allocations
• Pricing
• Brand/category/market development index
The MPO/TPO offering considers the effect of multiple variables, across multiple distributors, on trade promotion performance
Accesses the Modeling Procedure
Assimilates past business history using:
• Singular Value Decomposition
• Linear regression with lagged values
• Dynamic Neural Network Modeling
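“Linear regression with lagged values” means that each period's outcome is regressed on current and past values of a driver. This lets a promotion keep affecting sales after it ends. The following minimal sketch builds the lagged inputs and scores each row with hypothetical coefficients. The fitting step itself is omitted.

```python
def lagged_rows(driver, max_lag):
    """For each period t with full history, return [x_t, x_(t-1), ..., x_(t-max_lag)]."""
    return [[driver[t - lag] for lag in range(max_lag + 1)]
            for t in range(max_lag, len(driver))]

def predict(row, intercept, coefficients):
    """Linear prediction: intercept + sum of coefficient * lagged value."""
    return intercept + sum(c * x for c, x in zip(coefficients, row))

# Hypothetical weekly promotion spend and hypothetical fitted coefficients
promo_spend = [0, 10, 0, 0, 5]
for row in lagged_rows(promo_spend, max_lag=2):
    print(row, predict(row, intercept=100.0, coefficients=[2.0, 1.0, 0.5]))
```

The spend of 10 in period 1 still adds 5 units in period 3, through the lag-2 coefficient.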
Delivery and Implementation
SAS Software Foundation and Analytics Consulting for customization to business needs
• Requirements
  − Client data access
  − Customized analytics
  − Customized reporting
• Design
• Customized Development
• Testing, Documentation, and Installation
The Commencement of a New Era
Advertising and promotional spending is coming under increased scrutiny
Getting the spend “right” is a complex problem
More and more data are available
• Robust data management, sophisticated modeling, and content expertise are ‘must haves’ to predict results and optimize spending
SAS has assembled the right software, partners, and experience to make this work
Objective
Explore the use of text mining to add value to the Bank’s modeling efforts to predict attrition
• Loss of deposits → less money to loan at interest → adverse impact on the Bank’s profits
Analyze call center comments for additional lift in predicting attrition from primary accounts
• Information in unstructured text may add significant value to model performance when combined with “traditional” data mining practices
Sampling
Bank call center data collected from May 2003 to June 2003 (numbers altered for confidentiality)
• 900,000 records at account level supplied to SAS
• Chose existing primary customers (750,000 records)
• Multiple calls per account required consolidating data and comments into a single account-level observation
  − After consolidation:
    600,000 accounts in good standing
    9,000 voluntary attritors (1.47% attrition rate)
    4,500 involuntary attritors (0.73% attrition rate)
    -----------
    613,500 accounts used in analysis
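The consolidation step can be sketched as a group-by on the account identifier. The `(account_id, comment)` record layout and the pipe separator used to join comments are assumptions for illustration.

```python
from collections import defaultdict

def consolidate(calls):
    """Collapse call-level records into one observation per account.

    calls: iterable of (account_id, comment) pairs, one per call.
    Returns {account_id: {"n_calls": int, "comments": str}}.
    """
    accounts = defaultdict(lambda: {"n_calls": 0, "comments": []})
    for account_id, comment in calls:
        record = accounts[account_id]
        record["n_calls"] += 1
        if comment and comment.strip():  # skip blank comments when joining text
            record["comments"].append(comment.strip())
    return {
        account_id: {"n_calls": r["n_calls"], "comments": " | ".join(r["comments"])}
        for account_id, r in accounts.items()
    }

# Hypothetical call records
calls = [("A1", "Direct Mail"), ("A1", "asked about CD rates"), ("A2", "")]
print(consolidate(calls))
```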
Exploratory Data Analysis
Findings
• Attrition is a “rare event” (voluntary attrition rate = 1.47%)
• Significant imbalance in comments
  − 40% blank, 30% Direct Mail
• Strong concentration of comments into a few classes will affect the performance of text mining models
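The rates on the sampling slide can be checked directly against the 613,500 accounts used in the analysis. The same arithmetic shows why a rare event is hard to model. A trivial model that predicts “no voluntary attrition” for every account would already be about 98.5% accurate while identifying no attritors.

```python
good, voluntary, involuntary = 600_000, 9_000, 4_500
total = good + voluntary + involuntary

voluntary_rate = voluntary / total
involuntary_rate = involuntary / total
# Accuracy of a trivial model that predicts "no voluntary attrition" for every account
majority_accuracy = 1 - voluntary_rate

print(total)                        # 613500
print(f"{voluntary_rate:.2%}")      # 1.47%
print(f"{involuntary_rate:.2%}")    # 0.73%
print(f"{majority_accuracy:.2%}")   # 98.53%
```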
EDA (continued)
We observe a similar distribution of comments for voluntary attritors and nonattritors
Since the distribution of blank and “Direct Mail” comments is similar for both groups, we assume that these two kinds of comments can be removed without affecting the analysis, so that the other comments may “speak”
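The following minimal sketch shows the filtering step. It assumes comments are plain strings and that a “Direct Mail” comment is an exact, case-insensitive match. Real comment fields may need looser matching.

```python
# Comment values treated as uninformative (assumed exact matches after normalizing)
EXCLUDED = {"", "direct mail"}

def informative_comments(comments):
    """Drop blank and 'Direct Mail' comments so the remaining text can 'speak'."""
    return [c for c in comments if c.strip().lower() not in EXCLUDED]

# Hypothetical comments
print(informative_comments(["", "Direct Mail", "closing account, moving out of state"]))
```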
EDA (Text Mining Node)
Using the complete data produced two clusters
• 20% sample of voluntary attritors and good accounts
[Figure: cluster terms for the two clusters, “Blank comment” and “Mostly Direct Mail”]
Omitting blank and “Direct Mail” comments eliminates the imbalance in comments and reveals more clusters (20% sample)
Modify
Perform “optimal binning” of interval variables with respect to the target variable to change them into ordinal variables
• Represent a continuous variable as a set of ordered indicator variables to better concentrate the target variable into a small number of bins
• Variables Age_Yrs, Cust_Tenure_Mo, and N_Phone_Calls were transformed
  − For example, Age_Yrs was binned into the following intervals: 0-24, 24-38, 38-75, 75+
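Optimal binning chooses the cut points with respect to the target. The sketch below covers only the second step, applying the Age_Yrs cut points from the slide. Two choices here are assumptions, since the slide does not say how boundaries were handled: each interval is closed on the left (so age 24 falls in 24-38), and there is one indicator per ordered bin.

```python
from bisect import bisect_right

# Cut points from the Age_Yrs example; bins [0, 24), [24, 38), [38, 75), [75, +inf)
AGE_EDGES = [24, 38, 75]
AGE_LABELS = ["0-24", "24-38", "38-75", "75+"]

def age_bin(age):
    """Ordinal bin index (0..3) for an age in years."""
    return bisect_right(AGE_EDGES, age)

def age_indicators(age):
    """One 0/1 indicator per ordered bin (hypothetical variable names)."""
    index = age_bin(age)
    return {f"Age_{label}": int(i == index) for i, label in enumerate(AGE_LABELS)}

print(age_bin(30), age_indicators(30))
```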
Model
Modeled voluntary attrition to predict who would deliberately close an account
Partitioned data
• 50% Training / 25% Validation / 25% Test (Holdout)
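The following sketch shows a simple random 50/25/25 partition. It uses Python's `random` module with a fixed seed so that the split is reproducible. The seed, the function name, and the 18,000 example record IDs are illustrative.

```python
import random

def partition(records, seed=0):
    """Shuffle records and split them 50% / 25% / 25% into training, validation, test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_valid = n // 2, n // 4
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder, so no record is dropped
    return train, valid, test

train, valid, test = partition(range(18_000))
print(len(train), len(valid), len(test))  # 9000 4500 4500
```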
Built stratified models based on voluntary attrition
• Used all voluntary attritors (N=9,000) and randomly selected nonattritors (N=9,000)
• Data Mining model (no text-based information)
• Text Mining model (only text-based information)
• Hybrid Data + Text Mining model
  − structured data + structured text-based information
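The stratified sample keeps every voluntary attritor plus an equal number of randomly selected nonattritors. The sketch below assumes a hypothetical `(account_id, attrited)` record layout.

```python
import random

def balanced_sample(accounts, seed=0):
    """Keep every attritor and an equal number of randomly chosen nonattritors.

    accounts: list of (account_id, attrited) pairs, attrited being True/False.
    """
    attritors = [a for a in accounts if a[1]]
    nonattritors = [a for a in accounts if not a[1]]
    rng = random.Random(seed)
    return attritors + rng.sample(nonattritors, len(attritors))

# Hypothetical accounts: every tenth one attrited
accounts = [(i, i % 10 == 0) for i in range(1000)]
sample = balanced_sample(accounts)
print(len(sample), sum(1 for _, attrited in sample if attrited))
```

The modeling sample is 50% attritors rather than 1.47%, so raw scores from a model trained on it overstate the true attrition probability. Such scores are usually adjusted back to the population rate before being used as probabilities. Ranking accounts by score does not require this adjustment.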
Applying Results of Text Mining (cont’d)
Use cluster membership as a “trigger”
• Cluster 3 has a lift of 4.59
  − Terms:
  – Trigger is a life-cycle event: marriage, birth of a child, buying a home, death, …
• Cluster 5 has a lift of 2.37
  − Terms:
  – Trigger is financial distress: bankruptcy
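Here lift is read as the attrition rate among accounts in a cluster divided by the overall attrition rate. The sketch computes that ratio and then flags accounts whose cluster lift meets a threshold, which is one way to use cluster membership as a trigger. The 4.59 and 2.37 lifts come from the slide. The other cluster, the accounts, and the 2.0 threshold are hypothetical.

```python
def lift(cluster_attritors, cluster_size, total_attritors, total_size):
    """Attrition rate inside a cluster divided by the overall attrition rate."""
    return (cluster_attritors / cluster_size) / (total_attritors / total_size)

def triggered_accounts(cluster_of_account, cluster_lift, threshold=2.0):
    """Accounts whose cluster's lift meets the (hypothetical) trigger threshold."""
    return [account for account, cluster in cluster_of_account.items()
            if cluster_lift.get(cluster, 0.0) >= threshold]

# Lifts for clusters 3 and 5 are from the slide; cluster 1 and the accounts are hypothetical
cluster_lift = {3: 4.59, 5: 2.37, 1: 0.80}
accounts = {"acct_001": 3, "acct_002": 1, "acct_003": 5}
print(triggered_accounts(accounts, cluster_lift))
```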
Conclusion
Hindsight with ETL and Sums & Means is Good
• Important to get a view into your data
Insight with OLAP and Drilldown is Better
• You obtain a better sense of where your business is now and at whatever level of summary or detail you want
Foresight with Analytics is Best
• You gain confidence about where your business is going in the future, so that you can take appropriate action now to be prepared.