Note to the Media, July 31, 2004
My work on the presidential vote equation is fairly technical. The least technical discussion is in Chapters 1, 3, and 4 of my book, Predicting Presidential Elections and Other Things. This book is an attempt to explain regression analysis, which is the method used in my voting work, to a nontechnical audience. Regression analysis is a statistical tool that can be used to examine possible regularities in historical data. It may help us both understand the past and predict the future. The method has many uses; in the book it is used to examine voting behavior as well as sex, wine, grades, running, and some macroeconomics. The method does, however, have limitations. These are discussed in lesson 5 on page 31 under the name "pitfalls." Pages 51 and 52 discuss possible pitfalls regarding the vote equation.

Given the current prediction of the equation that Bush will get over 57 percent of the two-party vote, does this mean that a Bush victory is a sure thing? The answer is no. First, the prediction is based on a particular set of economic forecasts (the current forecasts from my economic model), and if the economy does not do as well as this set says, the vote prediction for Bush will go down. Second, even if the equation is correctly specified, it makes on average an error of about 2.4 percentage points each election (called the "standard error"). Third, the equation may be misspecified. This is where the pitfalls come in.

Let me focus a bit on some possible pitfalls. Regression analysis assumes in the present context that the structure of voting behavior in the future will be like it has been in the past, as it has been estimated using the historical data back to 1916. One can never rule out a sudden shift of structure that makes this assumption wrong. For example, the Bush administration has made many large changes in foreign policy, social policy, and environmental policy, and it may be that these changes are so large that voters radically change their voting behavior. Perhaps voters now look much more at foreign, social, and environmental policies than they did in the past and less at how the economy is doing.

Another possible pitfall is that the equation is misspecified because it does not have a job growth variable in it, only an output growth variable. Historically output growth and job growth are so highly correlated that very similar estimates are obtained using either. They are too highly correlated for one to be able to estimate separate effects. If in 2004 output growth is fairly good, but job growth is not, this would lead the equation to be off if job growth is in fact more important in voters' minds than output growth.
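The difficulty of separating two highly correlated variables can be illustrated with a standard textbook result: in a regression with two explanatory variables whose correlation is r, the variance of each estimated coefficient is inflated by the factor 1/(1 - r^2) relative to the uncorrelated case. The sketch below simply evaluates that factor. The correlation values are hypothetical and chosen for illustration; they are not the actual correlation between output growth and job growth.

```python
def variance_inflation(r):
    """Variance inflation factor for a two-regressor model with correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Hypothetical correlations, for illustration only.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"correlation {r:.2f}: coefficient variance inflated by {variance_inflation(r):.1f}x")
```

As the correlation approaches 1, the factor grows without bound, which is the sense in which separate effects cannot be estimated.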

The equation does not have any economic variables in it that measure income distribution, and this is another possible reason for misspecification. Under the Bush administration the after-tax income distribution has become more unequal, and if voters' attitudes about the incumbent are negatively affected by an increase in inequality, this would lead to the equation overpredicting votes for the incumbent because income distribution is not taken into account by the equation. Senator Kerry is stressing in his campaign the "middle class squeeze," which is an attempt to bring income-distribution issues into the debate.

Another possible pitfall is the general problem of data mining, which is discussed in the book. It may be that the equation looks much better than it should because I have tried so many versions in arriving at the current version.

If you experiment on the site with alternative vote predictions, you will see that no realistic economic values can bring the predicted vote share for Bush down to even about 53 percent. (Remember that there is only one quarter, 2004:3, for which actual economic data are not available.) This means, given the standard error of 2.4, that if the equation is correctly specified, the probability that Bush loses is very small. The bottom line is that the equation has to be misspecified in order for Bush to lose. And this is where the pitfalls come in. Regression analysis can only take us so far; possible pitfalls are always lurking.
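The arithmetic behind this statement can be sketched roughly as follows. The sketch assumes that the prediction error is normally distributed with a standard deviation equal to the standard error of 2.4; that distributional assumption is made here only for illustration, and the function name is hypothetical. It computes the probability that the actual two-party vote share falls below 50 percent for a given predicted share.

```python
import math


def prob_share_below(threshold, predicted, std_error):
    """P(actual share < threshold), assuming a normally distributed error.

    The normality assumption is an illustration, not part of the vote equation.
    """
    z = (threshold - predicted) / std_error
    # Standard normal CDF written in terms of the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Predicted shares taken from the text: about 57 percent, and the 53 percent floor.
for predicted in (57.0, 53.0):
    p = prob_share_below(50.0, predicted, 2.4)
    print(f"predicted {predicted:.0f} percent: P(share < 50) = {p:.4f}")
```

Under these assumptions the probability is well below 1 percent for a prediction of 57 and roughly 10 percent at the 53 percent floor. Both figures hold only if the equation is correctly specified.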

Finally, a point about me as a social scientist trying to explain behavior versus me as a citizen. As a social scientist I am trying to do the best I can explaining the percentage share of the two-party vote. In this capacity I don't care who wins or loses, but how close the equation comes to explaining the actual share. If the actual share is 51 percent and the equation predicts 49 percent, this is better than if the actual share is 57 percent and the equation predicts 53 percent. Also, I shouldn't let my political views affect my scientific work. This is an exercise in trying to explain behavior, not shape it. As a citizen, however, I obviously care who wins or loses.

Some reporters have asked me if I would be willing to tell them whom I personally support. Although I have nothing against revealing my preferences, it may be that I can't credibly reveal them. Say that some people believe that my results are more credible if I am a Democrat. (If I am a Republican, some people may think that I have biased the prediction in favor of the Republicans.) This alone is not a problem. But say also that the more credible people find my results the more this helps the Republicans. For example, Democrats may become dispirited and turn out to vote less if they find the results credible.

So if the above is the case, what should I tell the reporter? If I am a Democrat and want to behave strategically to help the Democrats, maybe I should say that I am a Republican. If people believe I am a Republican, they will tend not to believe my prediction, which is good for the Democrats. The reporter, however, who is no dope, knows that I might behave strategically, and so you might think that he or she would conclude that I am a Democrat if I say I am a Republican. The reporter, however, being truly no dope, knows that I know this. Also, I know that he or she knows this, he or she knows that I know that he or she knows this, and so on, and so there is no way for me to convey to the reporter my true preferences. Alas. (It may be, of course, that my prediction actually helps the Democrats if they become energized to come out and vote to prove me wrong!)