Locating New Pam and Susan’s Stores
Essay by Andrew Conteh • April 8, 2016 • Research Paper • 1,850 Words (8 Pages) • 2,209 Views
Case Study: Locating New Pam and Susan’s Stores
Professor Demetra Paparounas
Lisa Chan
MGSC 6200 - Information Analysis
July 3, 2014
Introduction
The purpose of this study is to determine a new store location for Pam and Susan’s Stores. This discount department store chain has 250 stores, primarily in the South, and expansion is important to its strategic success. A multiple regression model will be used to determine which location has the highest projected sales potential. It will also be used to assess how strongly sales are related to the independent variables.
Data
The census-based data set used to build this model contained 250 observations and 33 variables. Seven additional dummy variables were created from the main comtype variable, each taking the value one if a store belongs to that competitive type and zero otherwise. The data set contained economic and demographic data, population type, sales figures, store size and the competitive types. Sales are given in thousands of dollars, and selling square feet are likewise reported in thousands.
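The dummy coding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure actually used in the study: the function name make_comtype_dummies and the sample comtype codes are assumptions introduced here.

```python
# Sketch: build 0/1 dummy variables from a categorical comtype code (1-7).
# The sample codes below are made up for illustration; they are not the study data.

def make_comtype_dummies(comtype, levels=range(1, 8)):
    """Return a dict such as {'comtype1': 1, 'comtype2': 0, ...} for one store."""
    return {f"comtype{level}": int(comtype == level) for level in levels}

# Hypothetical stores, identified only by their competitive type code.
sample_comtypes = [1, 7, 3]
dummy_rows = [make_comtype_dummies(c) for c in sample_comtypes]

for comtype, row in zip(sample_comtypes, dummy_rows):
    print(comtype, row)
```

Each store gets exactly one dummy equal to one, so a regression would normally include at most six of the seven dummies, leaving the omitted types as the comparison group.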
Results and Discussions
In analyzing the data on the 250 Pam and Susan’s stores, we first created a scatter plot with the competitive types on the horizontal axis and sales (in thousands) on the vertical axis. The competitive types were identified as follows:
- Type 1: Densely populated areas with relatively little direct competition
- Type 2: High-income areas with little competition
- Type 3: Locations near major shopping centers
- Type 4: Stores in downtown areas of suburbs
- Type 5: Stores with competition from discounters, but not from department stores
- Type 6: Stores in shopping centers
- Type 7: Stores located along the side of roads
[pic 1]
The scatter plot above shows that comtypes 1 and 2 have higher store sales, while the middle categories, comtypes 3 to 6, fall in similar sales ranges. Comtype 7 has the lowest sales in the scatter plot. As we will see later when building the multiple regression model, the dummy variables created for comtypes 1, 2 and 7 are likely to be statistically significant, whilst those for comtypes 3 to 6 are likely to be statistically insignificant. The plot also suggests that sales decrease as competition in a trading zone increases, which is consistent with comtype 7 having the lowest sales.
The correlation coefficients between sales and all of the other variables except store and comtype (since comtype is a categorical data variable) were calculated. The table below shows the ten quantitative variables that have the highest positive correlation with sales from highest to lowest correlation values.
Variable | %nocars | %owners | #inc10-14 | %dryers | %freezer | %inc0-10 | %inc10-14 | population | %washers | %dishwasher
Correlation with sales | 0.70 | 0.68 | 0.61 | 0.65 | 0.63 | 0.61 | 0.61 | 0.59 | 0.56 | 0.49
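The screening step above, computing each variable's correlation with sales and keeping the highest ones, can be sketched as follows. The helper names pearson_r and rank_by_correlation and the small data set are assumptions made for illustration; the study's actual figures are those in the table.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_by_correlation(columns, target, top_n=10):
    """Return (name, r) pairs sorted from highest to lowest correlation."""
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]

# Made-up values for five hypothetical stores (not the study data).
sales = [10.0, 12.0, 15.0, 19.0, 25.0]
columns = {
    "population": [1.0, 2.0, 3.0, 4.0, 5.0],
    "%owners": [5.0, 3.0, 4.0, 1.0, 2.0],
}

ranking = rank_by_correlation(columns, sales)
for name, r in ranking:
    print(f"{name}: {r:.2f}")
```

Ranking by the signed coefficient, as here, keeps only the strongest positive relationships; a variable with a strong negative correlation would need to be ranked by absolute value to be picked up.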
A multiple regression analysis was performed to test for a relationship between the dependent variable, sales, and 13 independent X variables. These include the ten variables that have the highest correlation with sales and three dummy variables, namely comtype 1, 2 and 7. The regression output obtained is summarized in the table below:
[pic 2]
[pic 3]
From the regression output, the R-squared value is 0.75, meaning that 75% of the variation in sales is explained by variation in the independent variables. Using an alpha level of 0.05, the least significant variable, %owners with a p-value of 0.80, is dropped and the regression is rerun. In other words, the least significant variable is dropped at each step until the p-values of all remaining variables are below the alpha level of 0.05 (significant). In this analysis, the procedure was repeated six times. The resulting final model, called model 8, is summarized below:
[pic 4]
[pic 5]
The final model has an R-squared of 0.74, meaning that 74% of the variation in sales is explained by variation in the independent variables. All of the X variables are significant (their p-values are less than the alpha level of 0.05).
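The variable-dropping procedure that led to this final model can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the software routine used in the study: the function backward_eliminate and the stand-in fitter fake_fit are hypothetical, and apart from the 0.80 reported for %owners the p-values are made up. In a real analysis the fitter would rerun the regression, so the p-values would change after each drop.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05):
    """Drop the least significant variable until all p-values are below alpha.

    fit_p_values is a caller-supplied function (hypothetical here) that refits
    the regression on the given variables and returns {variable: p_value}.
    """
    remaining = list(variables)
    dropped = []
    while remaining:
        p_values = fit_p_values(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining variable is significant
        remaining.remove(worst)
        dropped.append(worst)
    return remaining, dropped

def fake_fit(names):
    """Stand-in for a regression fit: returns fixed, mostly made-up p-values."""
    made_up = {"%owners": 0.80, "population": 0.01, "comtype1": 0.001, "%washers": 0.30}
    return {name: made_up[name] for name in names}

kept, dropped = backward_eliminate(
    ["%owners", "population", "comtype1", "%washers"], fake_fit
)
print("kept:", kept)
print("dropped:", dropped)
```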
Regression Equation and Interpretation of Coefficients
The final regression model obtained is:
sales = 12,639.58 + 125.08 %spanishp - 31.83 %dryers - 76.79 %freezer + 0.0013 population + 8,585.27 comtype1 + 3,912.64 comtype2 - 3,106.27 comtype7
This equation can be used to predict future values of sales in a specific store.
The y-intercept of the final model is 12,639.58. This is the predicted value of sales when all of the independent variables in the model are zero. Because sales are measured in thousands of dollars, each coefficient is also in thousands of dollars, and each interpretation assumes that all other variables remain the same. Comtype 1, 2 and 7 are dummy variables, so a unit change means moving from zero to one, and the comparison is with stores in comtypes 3 to 6, which have no dummy in the model. A store in comtype 1 has predicted sales 8,585.27 higher, a store in comtype 2 has predicted sales 3,912.64 higher, and a store in comtype 7 has predicted sales 3,106.27 lower. For every unit increase in %spanishp, sales increase by 125.08. For every unit increase in %dryers, sales decrease by 31.83. For every unit increase in %freezer, sales decrease by 76.79, and for every unit increase in population, sales increase by 0.0013.
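As an illustration of how the equation could be used to compare candidate locations, the sketch below plugs site characteristics into the fitted coefficients. The coefficients are copied from the final regression equation; the function name predict_sales and the two example sites are assumptions made up for this sketch, not locations from the study.

```python
# Coefficients copied from the final regression equation in the text.
INTERCEPT = 12639.58
COEFFICIENTS = {
    "%spanishp": 125.08,
    "%dryers": -31.83,
    "%freezer": -76.79,
    "population": 0.0013,
    "comtype1": 8585.27,
    "comtype2": 3912.64,
    "comtype7": -3106.27,
}

def predict_sales(site):
    """Predicted sales, in the units of the sales variable, for one site.

    Variables missing from the site dict are treated as zero.
    """
    return INTERCEPT + sum(
        coef * site.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )

# Hypothetical candidate sites (values invented for illustration only).
site_a = {"%spanishp": 10.0, "%dryers": 50.0, "%freezer": 20.0,
          "population": 100000.0, "comtype1": 1}
site_b = dict(site_a, comtype1=0)  # same site, but not in comtype 1

print("site A:", round(predict_sales(site_a), 2))
print("site B:", round(predict_sales(site_b), 2))
print("difference:", round(predict_sales(site_a) - predict_sales(site_b), 2))
```

The difference between the two predictions equals the comtype 1 coefficient, which is the "all other variables remain the same" interpretation in numeric form.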
...
...