\[
\sum_{i=1}^{n}\left[(E\{\hat{y}_i\} - \mu_i)^2 + \sigma^2\{\hat{y}_i\}\right] = \sum_{i=1}^{n}(E\{\hat{y}_i\} - \mu_i)^2 + \sum_{i=1}^{n}\sigma^2\{\hat{y}_i\}
\]

where $\mu_i$ denotes the true mean response when the values of the $X_k$ are those for the $i$th case. The total mean squared error is thus composed of a squared bias component $(E\{\hat{y}_i\} - \mu_i)^2$ and a variance component $\sigma^2\{\hat{y}_i\}$.
\[
\Gamma_p = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n}(E\{\hat{Y}_i\} - \mu_i)^2 + \sum_{i=1}^{n}\sigma^2\{\hat{Y}_i\}\right]
\]

Note that $\sigma^2$ is unknown. Assuming that the model that includes all $P-1$ potential X variables is such that $MSE(X_1, \ldots, X_{P-1})$ is an unbiased estimator of $\sigma^2$, it can be shown that $\Gamma_p$ can be estimated by
\[
C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p)
\]

where $SSE_p$ (with lowercase $p$) is the SSE for the subset model with $p-1$ X variables and $MSE(X_1, \ldots, X_{P-1})$ (with capital $P$) is the MSE for the model with all $P-1$ X variables. It can be shown that when there is no bias in the subset model with $p-1$ X variables, then
\[
E\{C_p\} \approx p
\]

Thus, when $C_p$ values are plotted against $p$, unbiased models fall near the line $C_p = p$.
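As a concrete illustration, here is a minimal numpy sketch of the $C_p$ computation; the synthetic data, the number of potential predictors, and the chosen subset are made up for the example, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)   # synthetic data for illustration only
n, num_X = 30, 3                 # n cases, P-1 = 3 potential X variables
X = rng.normal(size=(n, num_X))
y = 2.0 + 1.5 * X[:, 0] + rng.normal(size=n)  # only X1 truly matters here

def sse(X_sub, y):
    """SSE for an OLS fit with intercept on the given predictor columns."""
    Z = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid)

P = num_X + 1                    # parameters in the full model
mse_full = sse(X, y) / (n - P)   # MSE(X1, ..., X_{P-1})

# Cp for the subset using only X1 (p = 2 parameters: intercept + one slope)
p = 2
cp = sse(X[:, [0]], y) / mse_full - (n - 2 * p)
print(cp)
```

For an unbiased subset model, the printed $C_p$ should land near $p$; badly biased subsets give values far above the line $C_p = p$.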
\[
AIC_p = n \ln(SSE_p) - n \ln(n) + 2p
\]

SBC (Schwarz's Bayesian Criterion) is defined as
\[
SBC_p = n \ln(SSE_p) - n \ln(n) + [\ln(n)]\,p
\]

For both criteria, smaller values are better. Note that both criteria increase with $SSE_p$ (poor model fit) and with $p$ (the number of parameters, i.e. $p-1$ X variables plus the intercept). Thus both criteria penalize models with many independent variables.
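The two criteria differ only in the penalty weight on $p$ ($2$ versus $\ln n$); since $\ln n > 2$ once $n \geq 8$, SBC penalizes extra parameters more heavily than AIC in all but tiny samples. A small sketch of the formulas, with illustrative (made-up) inputs:

```python
import numpy as np

def aic_sbc(sse_p, n, p):
    """AIC_p and SBC_p as defined above (smaller is better).
    sse_p: SSE of the subset model; n: sample size; p: number of parameters."""
    aic = n * np.log(sse_p) - n * np.log(n) + 2 * p
    sbc = n * np.log(sse_p) - n * np.log(n) + np.log(n) * p
    return aic, sbc

# Illustrative values, not from the text
aic, sbc = aic_sbc(sse_p=120.0, n=50, p=4)
print(aic, sbc)  # since ln(50) > 2, SBC exceeds AIC for the same model
```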
$PRESS_p$ is defined like

\[
SSE_p = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2
\]

The difference is that, in $PRESS_p$, $y_i$ is compared to its predicted value $\hat{y}_{i(i)}$ from a regression from which observation $i$ was excluded:

\[
PRESS_p = \sum_{i=1}^{n}(y_i - \hat{y}_{i(i)})^2
\]

(Equivalently, just as

\[
SSE_p = \sum_{i=1}^{n} e_i^2
\]

$PRESS_p$ is the sum of the squared external residuals (or deleted residuals) $d_i$:

\[
PRESS_p = \sum_{i=1}^{n} d_i^2\;)
\]
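The deleted residuals need not be obtained by actually refitting the model $n$ times: the standard identity $d_i = e_i / (1 - h_{ii})$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix, gives $PRESS_p$ from a single fit. A numpy sketch on synthetic data, checking the shortcut against the brute-force leave-one-out computation:

```python
import numpy as np

rng = np.random.default_rng(1)   # illustrative synthetic data
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 X's
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                          # ordinary residuals e_i
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
h = np.diag(H)

# Shortcut: deleted residual d_i = e_i / (1 - h_ii), so no n refits needed
press_shortcut = np.sum((e / (1 - h)) ** 2)

# Brute force: refit with case i held out, predict it, square the error
press_loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    press_loo += (y[i] - X[i] @ b_i) ** 2

print(np.isclose(press_shortcut, press_loo))  # True: the two agree
```

Since $0 < 1 - h_{ii} < 1$ for each case, every squared deleted residual is at least as large as the corresponding $e_i^2$, so $PRESS_p \geq SSE_p$ always.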
\[
MSPR = \frac{\sum_{i=1}^{n^*}(y_i - \hat{y}_i)^2}{n^*}
\]

where

$y_i$ is the response for the $i$th case in the validation sample

$\hat{y}_i$ is the predicted value for the $i$th case in the validation sample, based on the candidate model

$n^*$ is the number of cases in the validation sample.

The candidate model is validated to the extent that the values of MSPR and the MSE for the training-sample regression are close. (It is not entirely clear to me how to decide what "close" means; see ALSM5e p. 374. Furthermore, ALSM5e decide to drop a variable from the model because its coefficient is negative, contrary to theoretical expectation; but this coefficient is non-significant, so its sign should not matter. It would be better to drop that variable on the ground that it is non-significant.)
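The validation comparison above can be sketched in a few lines of numpy; the data, split sizes, and model here are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(2)   # illustrative synthetic data and split
X = rng.normal(size=(60, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=60)

# Training / validation split
Xt, yt = X[:40], y[:40]
Xv, yv = X[40:], y[40:]
n_star = len(yv)                 # n*: number of cases in the validation sample

# Fit the candidate model on the training sample only
Zt = np.column_stack([np.ones(len(yt)), Xt])
beta, *_ = np.linalg.lstsq(Zt, yt, rcond=None)

# MSE from the training fit (SSE divided by n - p)
resid_t = yt - Zt @ beta
mse_train = float(resid_t @ resid_t) / (len(yt) - Zt.shape[1])

# MSPR on the validation sample (divide by n*, not n* - p)
Zv = np.column_stack([np.ones(n_star), Xv])
mspr = float(np.sum((yv - Zv @ beta) ** 2)) / n_star

print(mse_train, mspr)  # validation succeeds to the extent these are close
```

Note the different divisors: the training MSE divides by $n - p$ because $p$ parameters were estimated from those cases, while MSPR divides by $n^*$ because the validation cases played no role in the fit.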