Regression Analysis of Auto Sales
Statistical Analysis of Auto Sales
Eleven attributes pertaining to automobile sales from the first quarter of 1980 through the fourth quarter of 2004 form the data set and the basis of this analysis. The objective is to build a multiple regression equation that predicts sales of automobiles. Stepwise multiple regression was used, yielding the series of models shown in Table 1. The stepwise regression procedure in SPSS Version 13 for Windows identified Personal Income (pi) as the independent variable that most influenced sales of new autos (unitsales), so the first model includes only this variable. Finance Rate (finrate) was the next most explanatory independent variable, and adding it to pi produces the second model. Adding the index of the cost of car ownership (costcarown) produces a third model, and adding the index of consumer sentiment (sentiment) produces the fourth. In the fifth model finrate is removed, which shows the combined effect of the remaining variables without it. The sixth model adds the influence that labor strikes (strike) have on sales.
Table 1: Defined Statistical Models
| Model | Variables Entered | Variables Removed | Method |
|-------|-------------------|-------------------|--------|
| 1 | pi | (none) | Stepwise (Criteria: Probability-of-F-to-enter = .100). |
| 2 | finrate | (none) | Stepwise (Criteria: Probability-of-F-to-enter = .100). |
| 3 | costcarown | (none) | Stepwise (Criteria: Probability-of-F-to-enter = .100). |
| 4 | sentiment | (none) | Stepwise (Criteria: Probability-of-F-to-enter = .100). |
| 5 | (none) | finrate | Stepwise (Criteria: Probability-of-F-to-enter = .100). |
| 6 | strike | (none) | Stepwise (Criteria: Probability-of-F-to-enter = .100). |

Dependent Variable: unitsales
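To make the selection procedure behind Table 1 more concrete, the following is a minimal sketch of forward selection using a partial F test. It is not SPSS's implementation: SPSS compares the probability of F against the entry criterion and also re-tests entered variables for removal, whereas this sketch uses a fixed, hypothetical F-to-enter threshold (`f_to_enter=4.0`) and never removes a variable. The data, the variable names `x_signal` and `x_unrelated`, and all function names are illustrative assumptions.

```python
def ols_sse(columns, y):
    """Residual sum of squares from regressing y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, p):
            factor = M[r][k] / M[k][k]
            for j in range(k, p + 1):
                M[r][j] -= factor * M[k][j]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (M[k][p] - sum(M[k][j] * b[j] for j in range(k + 1, p))) / M[k][k]
    return sum((y[i] - sum(bj * xj for bj, xj in zip(b, X[i]))) ** 2 for i in range(n))


def forward_select(predictors, y, f_to_enter=4.0):
    """Enter the predictor with the largest partial F until none exceeds f_to_enter.

    f_to_enter is a hypothetical fixed threshold, not the SPSS probability criterion.
    """
    selected, remaining = [], list(predictors)
    sse_current = ols_sse([], y)  # intercept-only model
    while remaining and len(y) - len(selected) - 2 > 0:
        df2 = len(y) - len(selected) - 2  # n - (predictors after entry) - 1
        best = None
        for name in remaining:
            sse = ols_sse([predictors[v] for v in selected + [name]], y)
            f = float("inf") if sse == 0 else (sse_current - sse) / (sse / df2)
            if best is None or f > best[1]:
                best = (name, f, sse)
        if best[1] < f_to_enter:
            break
        selected.append(best[0])
        remaining.remove(best[0])
        sse_current = best[2]
    return selected


# Hypothetical illustration data, not the auto sales data set.
predictors = {
    "x_signal": [1, 2, 3, 4, 5, 6, 7, 8],
    "x_unrelated": [1, -1, -1, 1, 1, -1, -1, 1],
}
y = [5.1, 6.9, 9.1, 10.9, 12.9, 15.1, 16.9, 19.1]
print(forward_select(predictors, y))
```

In this toy data only `x_signal` carries information about `y`, so it is the only variable that clears the threshold; a real analysis would use the p-value criterion and a dedicated statistics package.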
The strengths and weaknesses of each of these models are best judged by how much of the variation in unitsales each regression equation explains. Table 2, Statistical Model Summaries, shows the R, R2 and Adjusted R2 values for each model.
Table 2: Statistical Model Summaries
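| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|-------|---|----------|-------------------|----------------------------|-----------------|----------|-----|-----|---------------|---------------|
| 1 | .624(a) | .389 | .378 | .9443 | .389 | 36.907 | 1 | 58 | .000 | |
| 2 | .759(b) | .576 | .561 | .7932 | .187 | 25.198 | 1 | 57 | .000 | |
| 3 | .824(c) | .680 | .663 | .6956 | .104 | 18.109 | 1 | 56 | .000 | |
| 4 | .876(d) | .768 | .751 | .5974 | .088 | 20.922 | 1 | 55 | .000 | |
| 5 | .874(e) | .764 | .751 | .5975 | -.004 | 1.018 | 1 | 55 | .317 | |
| 6 | .893(f) | .797 | .782 | .5591 | .033 | 8.956 | 1 | 55 | .004 | 2.107 |

The R Square Change, F Change, df1, df2 and Sig. F Change columns are the Change Statistics.

a. Predictors: (Constant), pi
b. Predictors: (Constant), pi, finrate
c. Predictors: (Constant), pi, finrate, costcarown
d. Predictors: (Constant), pi, finrate, costcarown, sentiment
e. Predictors: (Constant), pi, costcarown, sentiment
f. Predictors: (Constant), pi, costcarown, sentiment, strike
g. Dependent Variable: unitsales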
The explanatory power of the models generally increases with each variable added, which shows up as successively higher R2 values. The one exception is model five: removing finrate lowers R2 only slightly, from .768 to .764, and the change is not statistically significant (Sig. F Change = .317). Model six, which adds strike to pi, costcarown and sentiment, explains the most variability of all the models, with an R2 of .797 and an adjusted R2 of .782. In other words, about 78% of the variance in auto sales during the sample period is explained by the variables in this model after adjusting for the number of predictors.
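The relationship between the R2, adjusted R2 and F Change figures in Table 2 can be checked with a few lines of arithmetic. The sketch below assumes a sample size of 60, which is inferred from df2 = 58 for the one-predictor model rather than stated in the text; the function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_change(r2_change, r2_new, df1, df2):
    """F statistic for the change in R-squared when df1 predictors are added."""
    return (r2_change / df1) / ((1.0 - r2_new) / df2)


N_OBS = 60  # assumption: inferred from df2 = 58 in model 1 (df2 = n - k - 1)

# (model, number of predictors, R-squared) as reported in Table 2
models = [(1, 1, 0.389), (2, 2, 0.576), (3, 3, 0.680),
          (4, 4, 0.768), (5, 3, 0.764), (6, 4, 0.797)]

for model, k, r2 in models:
    print(f"Model {model}: adjusted R2 = {adjusted_r2(r2, N_OBS, k):.3f}")

# F Change for model 2 (finrate added to pi), from the rounded R2 values
print(f"Model 2 F change = {f_change(0.187, 0.576, 1, 57):.2f}")
```

Because the inputs are the rounded values printed in Table 2, the recomputed F Change agrees with the reported figure only approximately.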
The strengths and weaknesses of this approach are clear: because stepwise regression enters the most strongly correlated variable first, each variable added afterward contributes a smaller increase in R2, as the R Square Change values in Table 2 show. The variable that begins a stepwise regression therefore shapes every later model. Under the stepwise criteria defined for this analysis, personal income (pi) was selected first and became the foundation for all of the subsequent prediction models.
A weakness of this modeling approach is that it does not make the correlations among the independent variables explicit; these are explored in greater depth using Pearson's Correlation Coefficient and 1-tailed significance tests. For the best results from regression analysis, a correlation matrix should be run first so that predictors that are highly collinear with one another can be excluded and those most strongly correlated with the dependent variable unitsales can be included. Appendix A, Correlation Matrix of All Variables, provides a correlation table for all 11 variables with both Pearson's Correlation Coefficient and 1-tailed significance tests. The stepwise regression analysis determined that personal income (pi), the index of the cost of car ownership (costcarown), the index of consumer sentiment (sentiment) and the likelihood of a strike (strike), taken together, explain 78% of the variation in unitsales (adjusted R2 = .782). Table 3, Predicting Elasticities of Variables, shows the elasticity of each of these variables.
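As a small illustration of the correlation screening described above, the sketch below computes Pearson's correlation coefficient for every pair of variables and flags highly correlated predictor pairs. The numbers are hypothetical placeholders, not the Appendix A data, and the 0.8 collinearity cutoff is an assumed rule of thumb rather than a value taken from this analysis.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only.
data = {
    "unitsales": [10.2, 10.8, 11.1, 10.5, 11.4, 11.9],
    "pi": [5.0, 5.4, 5.9, 6.1, 6.6, 7.0],
    "finrate": [12.1, 11.5, 10.9, 11.2, 10.1, 9.6],
}

names = list(data)
for a in names:
    row = " ".join(f"{pearson(data[a], data[b]):7.3f}" for b in names)
    print(f"{a:<10}{row}")

# Flag predictor pairs whose correlation suggests collinearity.
COLLINEARITY_CUTOFF = 0.8  # assumed rule of thumb
predictor_names = [n for n in names if n != "unitsales"]
for i, a in enumerate(predictor_names):
    for b in predictor_names[i + 1:]:
        r = pearson(data[a], data[b])
        if abs(r) > COLLINEARITY_CUTOFF:
            print(f"{a} and {b} are highly correlated (r = {r:.3f})")
```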
Table 3: Predicting Elasticities of Variables
From the analysis completed in Table 3, the elasticity of each variable can be easily seen. As one would expect, the greater the variability in a given variable, the higher its elasticity, especially for variables that measure purchasing power directly, as pi does; the variables stock and the index of consumer sentiment also show high elasticities as a result of their large variances. Taking a step back from the statistical analysis, this is logical: the responsiveness of sales would be determined by the level of car stocks or inventories on hand, customer attitudes and behaviors, and the amount of money customers have to spend. These three variables delivered an R2 of about 75%. Because elasticity measures how strongly demand responds to changes in factors such as price and income, this set of relationships makes sense.
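The text does not state how the elasticities in Table 3 were calculated. One common approach with a linear regression, shown here only as an assumption, is the elasticity evaluated at the sample means: the regression coefficient multiplied by the ratio of the predictor's mean to the dependent variable's mean. The coefficient and means below are hypothetical.

```python
def elasticity_at_means(coefficient, mean_x, mean_y):
    """Elasticity of y with respect to x at the sample means for a linear model.

    For y = a + b*x, the point elasticity is b * (x / y); evaluating it at the
    means gives b * mean_x / mean_y.
    """
    return coefficient * mean_x / mean_y


# Hypothetical inputs: a slope of 0.8 for pi, mean pi of 6.0, mean unitsales of 11.0.
example = elasticity_at_means(0.8, 6.0, 11.0)
print(f"Elasticity of unitsales with respect to pi: {example:.3f}")
```

Read this way, an elasticity of 0.5 would mean that a 1% rise in the predictor is associated with roughly a 0.5% rise in unit sales near the sample means.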
Forecasting
The first step in completing a forecast is to define the confidence intervals. Both 90% and 95% confidence levels are typically chosen; for purposes of this exercise the latter, 95%, has been selected. A 95% confidence level means that intervals constructed this way would contain the true mean in about 95% of repeated samples, which gives a higher level of reliability when applying the results. Presented below are the confidence intervals for the mean of the dependent variable unitsales, computed using both the Z and the t distributions.
Confidence Interval for Mean using Z from Infinite Population or with Replacement
Lower limit =
Upper Limit =
Margin for Error (Half Width) =
Sample Mean =
Standard Deviation =
Confidence Interval =
Sample Size =
Confidence Interval for Mean using t from Infinite Population or with Replacement
Lower limit =
Upper Limit =
Margin for Error (Half Width) =
Sample Mean =
Sample Standard Deviation =
Confidence Interval =
Sample Size =
The fact that both of these intervals show only a small difference between the upper and lower limits indicates that the data have little variability, as shown by the standard deviation of 1.1976.
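For reference, the Z-based interval for a mean is the sample mean plus or minus z times the standard deviation divided by the square root of the sample size. The sketch below uses the standard deviation of 1.1976 reported above; the sample size of 60 is inferred from the regression degrees of freedom and the sample mean is a hypothetical placeholder, since neither value survives in the tables above.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Two-sided confidence interval for a mean using the normal distribution.

    Returns (lower limit, upper limit, half width).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width


SAMPLE_SD = 1.1976   # reported in the text
SAMPLE_N = 60        # assumption: inferred, not reported
SAMPLE_MEAN = 10.0   # hypothetical placeholder

lower, upper, half_width = z_confidence_interval(SAMPLE_MEAN, SAMPLE_SD, SAMPLE_N)
print(f"Lower limit = {lower:.4f}")
print(f"Upper limit = {upper:.4f}")
print(f"Margin for error (half width) = {half_width:.4f}")
```

The t-based interval follows the same formula with a t quantile on n - 1 degrees of freedom in place of z. The Python standard library does not provide that quantile, so it is not computed here.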
Forecasting unit sales of automobiles can be accomplished using a wide variety of techniques. The Winters Method of exponential smoothing, as implemented in the KADD statistical add-in for Microsoft Excel, generates the following forecast for the variable unitsales.
The unitsales forecast for the next four quarters, using the Winters Method of exponential smoothing from KADD with smoothing constants of 0.2 for alpha, 0.2 for beta and 0.2 for gamma over four periods, is as follows:
UnitSales (Projected for 2005)
Q1, 2005 11,453,000
Q2, 2005 11,290,200
Q3, 2005 11,203,200
Q4, 2005 10,772,300
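The Winters Method updates a level, a trend and a set of seasonal indexes each period and then projects them forward. The sketch below is a minimal multiplicative version with the same smoothing constants (0.2 for alpha, beta and gamma) and a season length of four quarters. It is not the KADD implementation: the initialization from the first two years is one simple choice among several, and the quarterly series is hypothetical rather than the unitsales data. A MAPE helper is included because that is the error measure reported for the forecast.

```python
def winters_multiplicative(series, season_length=4, alpha=0.2, beta=0.2,
                           gamma=0.2, horizon=4):
    """Multiplicative Holt-Winters smoothing.

    Returns (fitted, forecasts): one-step-ahead fitted values for
    series[season_length:], and forecasts for the next `horizon` periods.
    """
    s = season_length
    if len(series) < 2 * s:
        raise ValueError("need at least two full seasons of data")
    # Simple initialization from the first two seasons (an assumed choice).
    first = sum(series[:s]) / s
    second = sum(series[s:2 * s]) / s
    level = first
    trend = (second - first) / s
    seasonal = [series[i] / first for i in range(s)]

    fitted = []
    for t in range(s, len(series)):
        y = series[t]
        season = seasonal[t % s]
        fitted.append((level + trend) * season)  # forecast made before seeing y
        new_level = alpha * y / season + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal[t % s] = gamma * y / new_level + (1 - gamma) * season
        level = new_level

    n = len(series)
    forecasts = [(level + m * trend) * seasonal[(n + m - 1) % s]
                 for m in range(1, horizon + 1)]
    return fitted, forecasts


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical quarterly sales in millions of units, three years of data.
sales = [10.9, 10.6, 10.4, 9.8, 11.2, 10.9, 10.8, 10.1, 11.5, 11.3, 11.0, 10.4]
fitted, forecasts = winters_multiplicative(sales)
print("Next four quarters:", [round(f, 3) for f in forecasts])
print(f"MAPE = {mape(sales[4:], fitted):.2f}%")
```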
The following graphic illustrates the forecast:
Here are the specifics of the Winters Method of exponential smoothing as well:
MAPE = 10.10%
Forecasting Values
Smoothed = 11.15
Trend = 0.01
Seasonal = 1.0258
Seasonal Indexes for next period
2 - 1.0103
3 - 1.0016
4 - 0.9622
Smoothing Constants
Alpha = 0.2
Beta = 0.2
Gamma = 0.2
Periods = 4
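The reported forecast can be approximately reproduced from the smoothed level, trend and seasonal indexes listed above, using the standard multiplicative Winters forecast: (level + m × trend) × seasonal index for the quarter m periods ahead. The sketch assumes that the reported "Seasonal" value of 1.0258 applies to the first forecast quarter and that the indexes labeled 2 through 4 apply to the following three quarters; values are in millions of units.

```python
LEVEL = 11.15   # "Smoothed" value reported above
TREND = 0.01    # "Trend" value reported above
# Assumed mapping of the reported seasonal indexes to quarters 1-4 ahead.
SEASONAL_INDEXES = [1.0258, 1.0103, 1.0016, 0.9622]


def winters_forecast(level, trend, seasonal_indexes):
    """Forecast m = 1..len(seasonal_indexes) periods ahead."""
    return [(level + m * trend) * index
            for m, index in enumerate(seasonal_indexes, start=1)]


recomputed = winters_forecast(LEVEL, TREND, SEASONAL_INDEXES)
reported = [11.4530, 11.2902, 11.2032, 10.7723]  # 2005 forecast, in millions

for quarter, (calc, rep) in enumerate(zip(recomputed, reported), start=1):
    print(f"Q{quarter} 2005: recomputed {calc:.3f}, reported {rep:.3f}")
```

The recomputed values differ from the reported ones by a few thousand units, which is consistent with the level and trend having been rounded to two decimal places in the output.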
In summary, using correlation to measure the strength of association between variables is a useful check before running a stepwise regression, because it helps ensure that the variables selected are the ones that best explain differences in the dependent variable. Identifying the variables that account for the most variability comes first, hence the thorough correlation analysis. Following this, a stepwise regression was completed to show the combined effects of the most correlated variables, which illustrated the strong effects of personal income (pi), of cars in stock (stock) and the implications of excess inventory driving down price, and of consumer sentiment (sentiment) on the elasticity of unitsales. Forecasting unitsales using exponential smoothing under the Winters Method proved the most reliable approach for projecting the next four quarters of unitsales given the sample data.