COMPUTERS and AUTOMATION for October, 1960
The Appetite For Instant News
Election Predictions
Stephen E. Wright
Applied Data Research
Princeton, N. J.
(Based on a report given at the meeting of the Association for Computing Machinery, Milwaukee, Wisc., August 26,
1960)
As a new presidential campaign begins to gather steam, the country will soon be caught up in a rising wave
of excitement. The horses are making the final turn, the stretch run will soon begin. As in recent years, the culmination
of the race on Election Eve will be watched by that new phenomenon, the electronic tout. All 3 major networks will
employ computers in a desperate race to report the news before it happens.
Predicting elections on computers must rank high in the list of useless activities, Professor Jackson's Bridge-Playing
Program notwithstanding. Outside the gambling profession, there is no possible benefit in making a good guess 6
or 8 hours before the vote is known with certainty. Besides, why should people in their right minds stick their
necks out before millions of kibitzers when events may prove them wrong before the night is out?
The answer lies in our appetite for instant news, magnified by the power of the broadcasting industry. A public
that waited in agony for news about Princess Margaret's wedding gown cannot be allowed to learn the identity of
our next President from the morning papers.
And so it was that in November, 1952, UNIVAC marched, or stumbled, onto the political stage. Its advent was made
propitious by the extreme caution that its predecessors, the pollsters, displayed that year. That election took
place, you will recall, during the final months of the administration of President Dewey.
Besides, Gallup and Roper are only human, whereas the electronic computer is endowed with big magic. The admiration
of the public for the giant brain is mixed with the sly hope that it will fall on its face and thus confirm the
ultimate dominance of man over machine. I must also mention the undeniable fact that statistics somehow smacks
of un-Americanism. Obviously, the ability to predict the future action of people is a negation of the democratic
process, leading to thought control and socialism.
Needless to say, these factors increased the news value of the computer's performance to the television networks.
This interest of the networks was matched by that of Remington Rand who undertook the election project in 1952
in the hope of publicizing their then new prodigy. In this they succeeded brilliantly; overnight "IBM's UNIVAC"
became a household word. It is unfortunate that the household market was not ready for this product.
Now let us dismiss for a moment the frivolous nature of this activity and see how such predictions are actually
made.
Our historical model is the old-time wardheeler (nowadays even poll-watchers are statesmen), who watched the votes
come in from the doubtful precincts, those with real live voters. He might say "We got 200 more votes in Ward
16 than this time last year. With that kind of lead we usually pile up a majority of 2,000 in the district. They
have a lead of 3,000 in the suburbs, but our vote always comes in late there. I better go pay my respects to the
new Mayor."
In national elections, the volume of data is too great and the vote comes in too randomly to make evaluations at
the ward and precinct level. But some basic ideas behind this analysis are used: the extrapolation of the current
vote in time, the comparison with past results, and the extrapolation in space. And of course, the computer can
make a quantitative analysis, both of past data and current returns.
It thrives on digesting large chunks of data and emitting simple-minded summaries. It does so quickly and accurately.
Whether it also does so correctly depends on the statistical model used, the skill of the programmers, and, I must
confess, luck.
Probability being what it is, a perfectly sound prediction may be wrong, and in making a flat prediction between
two candidates, to be wrong is to be 100% wrong, to the delight of newspaper editors, political pundits, and broken
down horseplayers.
In the previous elections, we were lucky; but, with the exception of the contests for the 1954 Congress, the races
were not too close. (I find myself developing a callousness toward politics, to the point where I hope for a landslide
victory, regardless of who wins. And I have become a fervent advocate of the two party system; I shudder at the
thought of predicting 3 way races!)
The statistical model we used in predicting elections on UNIVAC was developed by Dr. Max Woodbury of New York Univ.
Since Dr. Woodbury is currently on a good will tour of Europe and can't fight back, I must warn you that my knowledge
of statistics is not only negligible but strongly biased. As a defrocked electronic engineer, I have always suspected
that there is something fishy about probability theory. I had many battles over the years with Dr. Woodbury in
programming his statistical models, battles that I fought for common sense versus statistics; it is galling to
admit that he was always right, so far.
The kind of model we choose is restricted by the nature of the available data. It is desirable to get final votes
from a number of properly chosen key precincts or districts, politically stable and uniformly distributed among
the major regions of the nation. As this kind of data is available only after the last viewer has gone to bed,
we must make do with anything that we can get, as early as possible. This means that we must rely mainly on the
two major wire services, supplemented by special phone and teletype reports.
The wire services report national races primarily in the form of totals for each candidate in a given state, and
the number of precincts reporting.
A typical return from the last presidential election is: Vermont 243 precincts out of 720, Eisenhower 5120, Stevenson
30. In states with large metropolitan areas, the vote may be broken down by city and upstate.
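As an illustration only, such a return might be pictured as a small record; the notation below is Python, and the field names are our own invention, not anything used by the wire services or by the election programs:

# A hypothetical record for one wire-service return, in the form described above.
vermont = {
    "state": "Vermont",
    "precincts_reporting": 243,
    "precincts_total": 720,
    "dem_vote": 30,       # Stevenson
    "rep_vote": 5120,     # Eisenhower
}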
Information about finer subdivisions is theoretically available, but it is usually excluded: there are limitations in feeding huge volumes of data, all manually transcribed, to the computer; difficulties in analyzing past data; etc.
Hence the state is our basic unit in handling predictions in Presidential and Senatorial Elections.
Within each state, we have two possibilities:
- If some reports from this state are available, we ask: knowing the 2-party vote at this time, what will be the distribution of the final vote?
- If no reports from this state are available, we ask: knowing what the trend is in the reporting states, and knowing how this state voted in the past, what will be the distribution of the final vote?
That is, within each reporting state, we (1) extrapolate the current votes of each party to the final vote; (2) compute the predicted Democratic percent and compare it with past averages in this state; (3) compute a difference; (4) summarize these differences to produce a national trend; and (5) apply this trend to non-reporting states to predict their Democratic percents. This description is of course a greatly simplified version of the model actually used. In practice, these curves have a scatter that introduces uncertainty, and this leads us to estimate the precision of the extrapolation, a precision that increases as more precincts are counted.
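To make steps (1) through (5) concrete, here is a rough sketch in Python, using one record per reporting state of the kind described above; every name is invented for illustration, and the weighting, extrapolation curves, and precision estimates of the model actually used are far more elaborate:

# Hypothetical sketch of the state-by-state model; not the actual election program.

def predict_nation(returns, past_dem_pct):
    """returns:      list of records for reporting states, like the example above
       past_dem_pct: {state: average Democratic percent in past elections}"""
    predictions, diffs, weights = {}, [], []

    for r in returns:
        two_party = r["dem_vote"] + r["rep_vote"]
        if two_party == 0:
            continue
        # (1)-(2) extrapolate to the final vote and take the Democratic percent
        # (a simple proportional extrapolation leaves the percent unchanged)
        dem_pct = 100.0 * r["dem_vote"] / two_party
        predictions[r["state"]] = dem_pct
        # (3) the difference from this state's past average ...
        diffs.append(dem_pct - past_dem_pct[r["state"]])
        # ... weighted by how complete the count is (the "precision" above)
        weights.append(r["precincts_reporting"] / r["precincts_total"])

    # (4) summarize the weighted differences into a single national trend
    trend = sum(d * w for d, w in zip(diffs, weights)) / sum(weights) if weights else 0.0

    # (5) apply the trend to the states that have not yet reported
    for state, pct in past_dem_pct.items():
        predictions.setdefault(state, pct + trend)

    return predictions, trend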
A state prediction is not all black and white: even when some votes have been reported from a state, we also consider the trend in the other states. Many states are broken down into city and upstate votes, and we combine the predictions.
We thoroughly analyze past data. We have information back to 1944 for states and back to 1952 for some metropolitan
areas. We make use of this information for assigning weights. Other objective factors besides past history could
be taken into account, such as incumbency. Some subjective factors have been suggested: stands of candidates on
the farm issue, labor, local issues, ... but no quantitative measure exists. Time limitations have prevented
the inclusion of these factors.
Gathering past data is an enormous job, but there are some references: Scammon's "America Votes", the
Clerk of the House, etc. Data analysis must be repeated in each election, since the preceding elections furnish
the most reliably correlated data. Analysis of past data is backbreaking work even with computers. A mathematician
like Dr. Woodbury is never satisfied with past programs, but always seeking "minor changes" to "improve"
the program. Until you've programmed computers, you'll never know how major a minor change can be!
The program for the computer deals with checking the correctness of the data, especially mistakes in the teletype and telephone returns: mistakes in votes, in races, in areas. If there is an extra digit in the vote count, either
the vote count is corrected or it is excluded; perhaps the parties have been reversed; the wrong area may have
been reported; or the figures may have been invented.
We have to rely on human beings to transcribe the information into a form that the computer can accept. Therefore
we have the information typed 3 times, and the computer accepts the information on a best 2 out of 3 basis. It
is hoped that by 1962 computers capable of direct input will be available.
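A minimal sketch of such a 2-out-of-3 acceptance rule, again in Python and with invented names, might read:

# Hypothetical sketch of the "best 2 out of 3" rule for hand-typed returns.

def reconcile(copy_a, copy_b, copy_c):
    """Each argument is one typist's transcription of the same report,
       e.g. ("VT", 243, 5120, 30). A field is accepted when at least two
       of the three copies agree on it; otherwise it is flagged for re-entry."""
    accepted, disputed = [], []
    for i, (a, b, c) in enumerate(zip(copy_a, copy_b, copy_c)):
        if a == b or a == c:
            accepted.append(a)
        elif b == c:
            accepted.append(b)
        else:
            accepted.append(None)   # no two copies agree on this field
            disputed.append(i)
    return accepted, disputed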
The computer checks that the total vote reported in the area this time is not less than the total vote reported
last time. It also checks:
that the number of precincts reporting is greater than last time;
that the total number of precincts reporting is less than the total number of precincts in the area;
that the total vote is reasonable;
that the total Democratic percent is reasonable.
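For illustration, those checks might be sketched as follows; the thresholds and field names are ours, not the actual program's:

# Hypothetical sketch of the consistency checks listed above.

def return_is_plausible(new, old, precincts_total, max_total_vote,
                        dem_pct_low=20.0, dem_pct_high=80.0):
    """new, old: the current and previously accepted reports for the same area,
       each a dict with 'dem_vote', 'rep_vote', and 'precincts_reporting'."""
    total_new = new["dem_vote"] + new["rep_vote"]
    total_old = old["dem_vote"] + old["rep_vote"]
    dem_pct = 100.0 * new["dem_vote"] / total_new if total_new else 0.0
    return (total_new >= total_old                                        # vote never decreases
            and new["precincts_reporting"] > old["precincts_reporting"]   # more precincts than last time
            and new["precincts_reporting"] <= precincts_total             # no more precincts than exist
            and total_new <= max_total_vote                               # total vote is reasonable
            and dem_pct_low <= dem_pct <= dem_pct_high)                   # Democratic percent is reasonable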
This year not only the major networks but three of the major computer manufacturers are caught up in the prediction
rat race. In fact, election prediction is now a required part of the news coverage. So far have we come in eight
years!