Database Marketing - Dominick's stores in Chicago district

Marketing Database Analysis
Anna Andrusova
Nathan Bailey
James Ballard
Han Si
Demin Wang
MKT 6362. Database Marketing

Overview of Business Problem
• In the 1990s and early 2000s, Dominick’s was a chain of more than
100 grocery stores in the Chicago metropolitan area
• For this evaluation, we perform both a corporate-level and a
category-level data analysis
• Corporate Analysis – Relate store sales performance
with known demographics to facilitate corporate
planning activities and test potential locations
• Category Analysis – Relate category sales performance
with known demographics to improve sales
performance and expand product offerings

Data Description
Store-level historical sales data covering a period of more than seven years
Customer Count File
Daily sales of stores in 30 product categories:
• Bakery
• Beer
• Cosmetic
• Dairy
• Meat
• Pharmacy
• Grocery
• Cheese
• Wine
• Health and Beauty
• Deli
• Fish
• Floral
• Jewelry, etc.
Store-Specific Demographics
Demographic profiles of the areas around the stores:
• Age
• Single / Retired / Unemployed
• Mortgage
• Poverty
• Income
• Education
• Household size
• Working women, etc.

Data Preparation
Step 1. The latest year’s daily sales data from the Customer Count File
were aggregated by store and summarized for the year
Step 2. Demographic variables were added from the Store Account File
Resulting data set:
• One record per store (94 stores) containing 12-month sales data and
store demographic data
• Sales data on 30 product categories (the ‘Behavior’ variables)
• 43 demographic variables for residents living near the store
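The two preparation steps amount to a group-by followed by a join. The sketch below is a minimal illustration that assumes a hypothetical in-memory layout (daily rows of (store, day, category, sales) and a dictionary of demographics keyed by store); the real Customer Count and Store Account file layouts and column names are not reproduced here.

```python
from collections import defaultdict
from datetime import date


def build_store_table(daily_sales, demographics, start, end):
    """Sum daily category sales per store within [start, end] and attach
    the store's demographic variables, giving one record per store.

    daily_sales  -- iterable of (store_id, day, category, sales) tuples
                    (hypothetical layout, not the real file format)
    demographics -- dict: store_id -> {variable_name: value}
    """
    totals = defaultdict(lambda: defaultdict(float))
    for store, day, category, sales in daily_sales:
        if start <= day <= end:  # keep only the analysis year
            totals[store][category] += sales

    table = {}
    for store, category_sales in totals.items():
        if store not in demographics:  # drop stores without a demographic profile
            continue
        record = dict(category_sales)       # 'Behavior' (sales) variables
        record.update(demographics[store])  # demographic variables
        table[store] = record
    return table


# Usage with made-up numbers
daily = [
    (101, date(1996, 1, 5), "BEER", 100.0),
    (101, date(1996, 3, 1), "BEER", 50.0),
    (101, date(1995, 12, 31), "BEER", 999.0),  # outside the analysis year, ignored
    (102, date(1996, 2, 2), "WINE", 20.0),
]
demo = {101: {"INCOME": 10.5}, 102: {"INCOME": 9.8}}
stores = build_store_table(daily, demo, date(1996, 1, 1), date(1996, 12, 31))
print(stores[101])  # {'BEER': 150.0, 'INCOME': 10.5}
```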

Approach
1. Segmentation: group the stores so that stores within a segment perform
similarly in a selected group of product categories and differently from
stores in other segments on the same categories
Method: Non-hierarchical and hierarchical clustering
2. Response Analysis: find targetable characteristics of the identified
store segments
Method: Discriminant analysis
3. Model Validation: evaluate the performance of the models on a hold-out
sample (20% of the stores)
4. Recommendations and conclusions
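A 20% hold-out sample could be drawn as in the sketch below. The presentation does not say how the hold-out stores were selected, so the simple (unstratified) random split, the store ids and the seed value are all assumptions.

```python
import random


def split_holdout(store_ids, holdout_frac=0.2, seed=6362):
    """Randomly set aside holdout_frac of the stores as a hold-out sample.

    The seed is arbitrary and only makes the split reproducible.
    Returns (training_ids, holdout_ids).
    """
    ids = sorted(store_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_holdout = round(len(ids) * holdout_frac)
    return ids[n_holdout:], ids[:n_holdout]


train, holdout = split_holdout(range(1, 95))  # 94 stores, hypothetical ids 1..94
print(len(train), len(holdout))  # 75 19
```

With very small segments (for example a one-store cluster), a random split can leave a segment out of one of the two samples entirely.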

Flowchart of the Approach (figure): Dominick’s data set (general data set); data preparation; corporate analysis and category analysis; clusters (hierarchical and non-hierarchical clustering); response analysis (discriminant analysis, with a 20% hold-out group for the model test); corporate and category analysis results; conclusion and recommendation

Corporate Analysis
Step #1 – Hierarchical Clustering
Cluster History
Number of Clusters Joined  Freq  New Cluster  Semipartial  R-Square  Centroid  Tie
Clusters                         RMS Std Dev  R-Square               Distance
…
11        CL21   311       3     255876       0.0013       .955      2.09E6
10        CL15   112       15    223435       0.0018       .953      2.09E6
9         CL18   CL11      9     293813       0.0044       .949      2.25E6
8         CL14   314       5     281264       0.0020       .947      2.37E6
7         CL10   CL17      43    329098       0.0346       .912      2.84E6
6         304    315       2     376122       0.0018       .910      2.86E6
5         CL8    CL9       14    451590       0.0209       .889      3.85E6
4         CL13   CL7       76    455327       0.1236       .766      3.88E6
3         CL12   CL5       16    567698       0.0270       .739      5.93E6
2         CL3    CL6       18    679121       0.0365       .702      6.84E6
1         CL4    CL2       94    918977       0.7022       .000      1.05E7
Conclusion: the optimal number of clusters is between 3 and 6
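The Cluster History columns can be reproduced in miniature. This sketch assumes centroid linkage (the slides do not name the linkage method) and, for each merge, records the number of clusters left, the size of the new cluster, the distance between the merged centroids, the semipartial R-Square (the share of the total sum of squares lost by the merge) and the overall R-Square that remains. It is illustrative only and will not match the SAS output digit for digit.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def within_ss(points):
    """Sum of squared distances of the points to their centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def cluster_history(points):
    """Centroid-linkage agglomerative clustering.

    Returns one row per merge:
    (number_of_clusters, freq_of_new_cluster, centroid_distance,
     semipartial_r_square, r_square)
    """
    clusters = [[p] for p in points]
    total_ss = within_ss(points)
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose centroids are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.sqrt(sq_dist(centroid(clusters[i]), centroid(clusters[j])))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        # Semipartial R-Square: increase in within-cluster SS, as a share of total SS.
        sprsq = (within_ss(merged) - within_ss(clusters[i]) - within_ss(clusters[j])) / total_ss
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        r_square = 1 - sum(within_ss(c) for c in clusters) / total_ss
        history.append((len(clusters), len(merged), d, sprsq, r_square))
    return history


# Four made-up stores forming two obvious pairs (two sales variables each)
stores = [(0, 0), (0, 1), (10, 0), (10, 1)]
for row in cluster_history(stores):
    print(row)
```

A large semipartial R-Square flags a merge of two dissimilar clusters, which is one of the cues used when deciding how many clusters to keep.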

Corporate Analysis (Cont.)
Step #2 – Non-Hierarchical Clustering
                                         3 clusters  4 clusters  5 clusters  6 clusters
Pseudo F Statistic                       256.19      245.65      246.97      260.81
Approximate Expected Over-All R-Squared  0.7364      0.77973     0.80166     0.8157
Cubic Clustering Criterion               5.517       6.505       7.813       16.200
Conclusion: based on the results of both hierarchical and non-hierarchical
clustering, the 6-cluster solution was determined to be optimal; it has the
highest Pseudo F Statistic (260.81) and Cubic Clustering Criterion (16.200)
of the candidate solutions
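For the non-hierarchical step, a minimal k-means (Lloyd's algorithm) sketch and the usual Pseudo F statistic, [R² / (k - 1)] / [(1 - R²) / (n - k)], are shown below. The Cluster Summary layout suggests SAS PROC FASTCLUS, which has its own seed-selection rules; seeding from caller-chosen points here is an assumption, so results on the real data would not match the table exactly.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def within_ss(points):
    """Sum of squared distances of the points to their centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def assign(points, centers):
    """Put each point into the group of its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        groups[nearest].append(p)
    return groups


def kmeans(points, seeds, max_iter=20):
    """Lloyd's k-means started from caller-chosen seed points."""
    centers = [list(s) for s in seeds]
    for _ in range(max_iter):
        groups = assign(points, centers)
        # Move each center to its group's mean; an empty group keeps its center.
        new_centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return assign(points, centers)


def pseudo_f(groups):
    """Pseudo F = [R^2 / (k - 1)] / [(1 - R^2) / (n - k)]."""
    points = [p for g in groups for p in g]
    n, k = len(points), len(groups)
    r2 = 1 - sum(within_ss(g) for g in groups if g) / within_ss(points)
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


stores = [(0, 0), (0, 1), (10, 0), (10, 1)]  # made-up two-variable sales profiles
groups = kmeans(stores, seeds=[stores[0], stores[2]])
print([len(g) for g in groups])    # [2, 2]
print(round(pseudo_f(groups), 6))  # 200.0
```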

Corporate Analysis – Clustering Results
Cluster Summary
Cluster  Freq  RMS Std    Max Distance from     Radius    Nearest  Distance Between
               Deviation  Seed to Observation   Exceeded  Cluster  Cluster Centroids
1        33    201245     2233427                         6        2948467
2        1     .          0                               3        4765417
3        6     336424     2455353                         4        4286141
4        9     293813     2207687                         3        4286141
5        16    274583     3058122                         6        3192018
6        29    213948     2063995                         1        2948467
% of stores vs. % of sales
Segment   % of stores  % of sales
1         35.1%        21.5%
2         1.1%         2.4%
3         6.4%         12.1%
4         9.6%         14.7%
5         17.0%        21.2%
6         30.9%        28.2%
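The chart pairs each segment's share of stores (cluster frequency out of 94 stores) with its share of sales. The sketch below recomputes the store shares from the Cluster Summary frequencies, takes the sales shares as read from the chart, and lists the segments whose share of sales falls below their share of stores; this is one way to arrive at the underperforming segments (1 & 6) named in the recommendations.

```python
# Cluster frequencies from the Cluster Summary; sales shares (%) read off the chart.
FREQ = {1: 33, 2: 1, 3: 6, 4: 9, 5: 16, 6: 29}
SALES_SHARE = {1: 21.5, 2: 2.4, 3: 12.1, 4: 14.7, 5: 21.2, 6: 28.2}


def store_shares(freq):
    """Percentage of all stores in each segment, rounded to one decimal."""
    n = sum(freq.values())
    return {seg: round(100 * f / n, 1) for seg, f in freq.items()}


def underperformers(store_pct, sales_pct):
    """Segments whose share of sales is below their share of stores."""
    return sorted(s for s in store_pct if sales_pct[s] < store_pct[s])


shares = store_shares(FREQ)
print(shares)                                # {1: 35.1, 2: 1.1, 3: 6.4, 4: 9.6, 5: 17.0, 6: 30.9}
print(underperformers(shares, SALES_SHARE))  # [1, 6]
```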

Corporate Analysis - Discriminant Analysis
Confidence Level: 90%
Univariate Test Statistics
F Statistics, Num DF=5, Den DF=79
Variable  Total      Pooled     Between    R-Square  R-Square   F Value  Pr > F
          Standard   Standard   Standard             / (1-RSq)
          Deviation  Deviation  Deviation
EDUC      0.1129     0.1102     0.0394     0.1029    0.1147     1.81     0.1200
NOCAR     0.1316     0.1287     0.0453     0.1000    0.1111     1.76     0.1318
INCSIGMA  2323       2264       824.9388   0.1064    0.1190     1.88     0.1070
HSIZE1    0.0829     0.0809     0.0292     0.1045    0.1167     1.84     0.1138
SINHOUSE  0.2173     0.2103     0.0817     0.1194    0.1355     2.14     0.0690
HVAL200   0.1853     0.1758     0.0792     0.1541    0.1822     2.88     0.0194
SINGLE    0.0703     0.0665     0.0306     0.1593    0.1895     2.99     0.0158
NWRKCH17  0.0199     0.0194     0.006933   0.1024    0.1141     1.80     0.1218
TELEPHN   0.0309     0.0293     0.0134     0.1581    0.1879     2.97     0.0166
SHPINDX   0.2482     0.2405     0.0924     0.1168    0.1323     2.09     0.0753
* 17 statistically significant variables in total
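Each univariate F value follows directly from the R-Square column: F = [R² / (1 - R²)] × (Den DF / Num DF). A quick check of a few rows of the table:

```python
def f_from_r_square(r_square, num_df, den_df):
    """One-way ANOVA F statistic implied by a single variable's R-Square."""
    return r_square / (1 - r_square) * den_df / num_df


# (R-Square, reported F Value) pairs copied from the table above
rows = {"EDUC": (0.1029, 1.81), "HVAL200": (0.1541, 2.88), "SINGLE": (0.1593, 2.99)}
for name, (rsq, reported) in rows.items():
    print(name, round(f_from_r_square(rsq, 5, 79), 2), reported)
```

The Pr > F column is the upper-tail probability of the F(5, 79) distribution at that value, available for example from scipy.stats.f.sf.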

Corporate Analysis - Discriminant Analysis (Cont.)
   Canonical    Adjusted Canonical  Approximate     Squared Canonical
   Correlation  Correlation         Standard Error  Correlation
1  0.847077     0.761387            0.030819        0.717540
Multivariate Statistics and F Approximations
S=5 M=15 N=21
Statistic               Value       F Value  Num DF  Den DF  Pr > F
Wilks' Lambda           0.02426163  1.39     180     223.58  0.0103
Pillai's Trace          2.50666011  1.34     180     240     0.0172
Hotelling-Lawley Trace  6.07753961  1.44     180     164.86  0.0093
Roy's Greatest Root     2.54031820  3.39     36      48      <.0001
• Means of the independent variables are statistically different among
segments
• Only 2.4% of the variance in the discriminant scores is not explained by
the differences among groups of stores
• Squared canonical correlation: ratio of the between-group SS to the total
SS => Good set of descriptors
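The eigenvalue-based statistics are linked: each discriminant eigenvalue λ gives a squared canonical correlation λ / (1 + λ), Roy's Greatest Root is the largest λ, and Wilks' Lambda is the product of 1 / (1 + λ) over all discriminant functions. Only the first eigenvalue is reported, so the sketch checks the first relation against the table and shows Wilks' Lambda on made-up eigenvalues.

```python
import math


def squared_canonical_corr(eigenvalue):
    """Squared canonical correlation implied by a discriminant eigenvalue."""
    return eigenvalue / (1 + eigenvalue)


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    return math.prod(1 / (1 + ev) for ev in eigenvalues)


roy = 2.54031820  # Roy's Greatest Root from the table (the largest eigenvalue)
print(round(squared_canonical_corr(roy), 5))  # 0.71754
print(wilks_lambda([1.0, 3.0]))               # 0.125 (made-up eigenvalues)
```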

Corporate Analysis – Classification Results
Error Count Estimates for CLUSTER
Original Dataset
        1       2       3       4       5       6       Total
Rate    0.1818  0.0000  0.0000  0.1667  0.3571  0.1923  0.1497
Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.1667
Hold-out Sample
        1       3       4       5       6       Total
Rate    0.1429  0.0000  0.0000  0.3333  0.3333  0.1619
Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.8333
~85% of the stores in the original dataset (about 84% in the hold-out
sample) are classified correctly
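The Total column of each error count table is a prior-weighted average of the per-cluster error rates, divided by the sum of the priors, so a cluster missing from a sample (here the single-store cluster 2 in the five-cluster table) drops out of both sums. The sketch reproduces both totals; one minus each gives the share classified correctly.

```python
def total_error_rate(rates, priors):
    """Prior-weighted error rate, normalised by the sum of the priors."""
    return sum(r * p for r, p in zip(rates, priors)) / sum(priors)


original = [0.1818, 0.0000, 0.0000, 0.1667, 0.3571, 0.1923]  # clusters 1-6
holdout = [0.1429, 0.0000, 0.0000, 0.3333, 0.3333]          # clusters 1, 3-6
equal = 1 / 6                                               # equal priors
print(round(1 - total_error_rate(original, [equal] * 6), 3))  # 0.85
print(round(1 - total_error_rate(holdout, [equal] * 5), 3))   # 0.838
```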

Category Analysis: Beer and Wine
Step #1 – Hierarchical Clustering
Cluster History
Number of Clusters Joined  Freq  New Cluster  Semipartial  R-Square  Centroid  Tie
Clusters                         RMS Std Dev  R-Square               Distance
9         CL16   309       8     72804.2      0.0031       .906      197203
8         CL23   CL13      10    93539.6      0.0091       .897      200748
7         CL10   CL31      11    95378.9      0.0085       .888      239550
6         CL7    CL8       21    145510       0.0459       .842      311263
5         CL87   CL11      61    112380       0.0639       .778      318702
4         CL5    CL6       82    170030       0.2099       .568      385452
3         CL4    CL15      85    185394       0.0973       .471      609748
2         CL3    CL9       93    226017       0.3212       .150      696877
1         CL2    304       94    243807       0.1499       .000      1.29E6
Conclusion: the optimal number of clusters is between 4 and 6

Category Analysis: Beer and Wine (Cont.)
Step #2 – Non-Hierarchical Clustering
                                         4 clusters  5 clusters  6 clusters
Pseudo F Statistic                       87.53       116.85      131.08
Approximate Expected Over-All R-Squared  0.7692      0.81988     0.85358
Cubic Clustering Criterion               -1.336      1.458       2.489
Conclusion: based on the results of both hierarchical and non-hierarchical
clustering, the 6-cluster solution was determined to be optimal; it has the
highest Pseudo F Statistic (131.08) and Cubic Clustering Criterion (2.489)
of the candidate solutions

Category Analysis: Beer and Wine (Cont.)
Cluster Summary
Cluster  Freq  RMS Std    Max Distance from     Radius    Nearest  Distance Between
               Deviation  Seed to Observation   Exceeded  Cluster  Cluster Centroids
1        35    83267.8    194999                          2        268532
2        32    78629.9    206948                          1        268532
3        8     131663     250170                          2        374603
4        9     82174.1    159203                          2        333646
5        9     80329.2    180104                          4        377389
6        1     .          0                               3        924906
Cluster Means
Cluster  BEER        WINE
1        144128.421  101864.577
2        326776.212  298713.241
3        493651.738  634093.243
4        649465.774  213912.842
5        955669.947  434505.459
6        383045.800  1552362.060
• Cluster #5 is the top seller of Beer
• Cluster #6 is the top seller of Wine
• Cluster #1 has the lowest sales of both Beer & Wine
• Cluster #6 consists of a single outlier store
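The statements about top and bottom sellers are a ranking of the Cluster Means table. The small sketch below ranks clusters per category; the top two clusters in each ranking (5 & 4 for Beer, 6 & 3 for Wine) are the segments named for test marketing in the recommendations.

```python
# Cluster means (BEER, WINE) copied from the Cluster Means table above.
MEANS = {
    1: (144128.421, 101864.577),
    2: (326776.212, 298713.241),
    3: (493651.738, 634093.243),
    4: (649465.774, 213912.842),
    5: (955669.947, 434505.459),
    6: (383045.800, 1552362.060),
}
BEER, WINE = 0, 1  # column positions in each tuple


def ranked(means, column):
    """Cluster ids ordered from highest to lowest mean sales in one column."""
    return sorted(means, key=lambda c: means[c][column], reverse=True)


print(ranked(MEANS, BEER))  # [5, 4, 3, 6, 2, 1]
print(ranked(MEANS, WINE))  # [6, 3, 5, 2, 4, 1]
```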

Discriminant Analysis: Beer and Wine
Confidence level: 95%
Univariate Test Statistics
F Statistics, Num DF=5, Den DF=79
Variable  Total      Pooled     Between    R-Square  R-Square   F Value  Pr > F
          Standard   Standard   Standard             / (1-RSq)
          Deviation  Deviation  Deviation
AGE9      0.0272     0.0261     0.0109     0.1347    0.1557     2.46     0.0400
EDUC      0.1129     0.1051     0.0528     0.1843    0.2259     3.57     0.0058
INCOME    0.2921     0.2793     0.1192     0.1405    0.1635     2.58     0.0324
INCSIGMA  2323       2191       1021       0.1630    0.1948     3.08     0.0137
HSIZEAVG  0.2686     0.2480     0.1303     0.1985    0.2477     3.91     0.0032
HSIZE2    0.0322     0.0298     0.0154     0.1942    0.2410     3.81     0.0038
HSIZE567  0.0325     0.0277     0.0200     0.3176    0.4655     7.35     <.0001
HH3PLUS   0.0844     0.0796     0.0371     0.1628    0.1944     3.07     0.0138
HH4PLUS   0.0650     0.0606     0.0303     0.1833    0.2244     3.55     0.0061
DENSITY   0.001250   0.001192   0.000518   0.1447    0.1692     2.67     0.0277
HVAL150   0.2460     0.2260     0.1217     0.2064    0.2601     4.11     0.0023
HVAL200   0.1853     0.1664     0.0992     0.2417    0.3188     5.04     0.0005
HVALMEAN  47.3071    42.9341    24.4560    0.2254    0.2909     4.60     0.0010
SINGLE    0.0703     0.0664     0.0308     0.1616    0.1927     3.04     0.0145
UNEMP     0.0239     0.0226     0.0103     0.1576    0.1871     2.96     0.0169
WRKWNCH   0.0446     0.0424     0.0187     0.1483    0.1742     2.75     0.0241
TELEPHN   0.0309     0.0287     0.0148     0.1929    0.2389     3.78     0.0041
POVERTY   0.0457     0.0441     0.0175     0.1238    0.1413     2.23     0.0590
Statistically significant variables in discriminating observations among groups

Discriminant Analysis: Beer and Wine (Cont.)
   Canonical    Adjusted Canonical  Approximate     Squared Canonical
   Correlation  Correlation         Standard Error  Correlation
1  0.846814     0.751237            0.030868        0.717094
Multivariate Statistics and F Approximations
S=5 M=15 N=21
Statistic               Value       F Value  Num DF  Den DF  Pr > F
Wilks' Lambda           0.01346418  1.72     180     223.58  <.0001
Pillai's Trace          2.81504177  1.72     180     240     <.0001
Hotelling-Lawley Trace  7.26639429  1.72     180     164.86  0.0002
Roy's Greatest Root     2.53474655  3.38     36      48      <.0001
• Means of the independent variables are statistically different among
segments
• Only 1.3% of the variance in the discriminant scores is not explained by
the differences among groups of stores
• Good set of descriptors

Beer & Wine Category Analysis – Classification Results
Original Dataset
        1       2       3       4       5       6       Total
Rate    0.5714  0.6207  0.7143  0.6000  0.8750  1.0000  0.7302
Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.1667
Hold-out Sample
        1       2       3       4       5       Total
Rate    0.1667  0.3333  0.5000  0.5000  0.5000  0.4000
Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.8333

Recommendations
Corporate Level:
• Resource allocation among the stores: perform additional analysis of the stores
in the underperforming segments (1 & 6)
• Evaluation of potential locations for a new store: deploy the discriminant
function to predict the performance of stores in different product categories
based on the demographic profiles of their locations
Category Level (Beer & Wine):
• Marketing strategy for a new brand of Beer or Wine: adjust the targeting strategy
for a product based on the demographic profile of the location where it will be sold
• Choice of stores to test-market a new product: perform a market test for Beer
in stores of segments 4 & 5 and for Wine in segments 3 & 6
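Deploying the discriminant function for a candidate location means scoring its demographic profile with one linear classification function per segment and assigning it to the highest-scoring segment (with equal priors, as in the Priors rows of the classification tables, the prior term is the same for every segment). The coefficients below are invented for two segments only; real ones would come from the fitted model, and the site's predicted category performance would then be read from that segment's profile.

```python
# Hypothetical classification functions: segment -> (constant, {variable: coefficient}).
# These numbers are made up for illustration; they are not the fitted model.
FUNCTIONS = {
    4: (-10.0, {"HVAL200": 20.0, "SINGLE": 5.0}),
    5: (-12.0, {"HVAL200": 30.0, "SINGLE": -10.0}),
}


def classify(profile, functions):
    """Return the segment with the highest linear discriminant score."""
    def score(segment):
        constant, coefs = functions[segment]
        return constant + sum(coef * profile[var] for var, coef in coefs.items())
    return max(functions, key=score)


candidate_site = {"HVAL200": 0.5, "SINGLE": 0.1}  # made-up demographic profile
print(classify(candidate_site, FUNCTIONS))  # 5
```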

Limitations of the Analysis
Additional data
• Product-level data: assessment of specific product sales in new stores and prediction
of the performance of a new product being considered for launch
• Customer-specific data: ability to build better predictive models tied to customer
demographics (scanner data from loyalty program members’ transactions)
=> Higher-quality analysis at a more granular level
