An experimental investigation of research tournaments.
Fullerton, Richard; Linster, Bruce G.; McKee, Michael; Slate, Stephen
I. INTRODUCTION
Research tournaments have played an important role in the economic
growth of nations since the earliest stages of the Industrial
Revolution. For example, the golden age of steam locomotion was spawned
by a research tournament sponsored by the Liverpool and Manchester
Railway in 1829.(1) More recently, research tournaments have been used
to create a variety of products ranging from fuel-efficient
refrigerators (Langreth [1994]) and digital televisions (Economist
[1993]), to high-tech fighter aircraft for the military (Schwartz et al.
[1991]). Today, scientists and lawmakers are even considering the use of
a research contest to propel the development of the first manned space
mission to Mars.(2)
Despite the recurring popularity of research tournaments over the
last two hundred years, the first theoretical model for evaluating their
efficiency was not published until Taylor's [1995] seminal work.
Taylor's model provides a theoretical basis for evaluating the
effect the number of competitors and tournament duration have on the
amount of effort expended by contestants in a research tournament.
Taylor proved that, by limiting the number of competitors in a research
tournament and charging each competitor an entry fee, research
tournament sponsors can induce an efficient amount of innovative effort.
Fullerton and McAfee [1999] extended research tournament theory to
include competitions with heterogeneous contestants, showing for a large
class of contests the optimal number of competitors is two and sponsors
can induce the best qualified competitors to enter the tournament by
holding specialized all-pay entry auctions.
Although the economic intuition behind these research tournaments
is straightforward, the numerical calculations required to compute their
equilibrium strategies are very complex, and it is an empirical question
whether individuals are able to compute these strategies.
Therefore, to investigate the predictive power of research tournament
models, we conducted a series of laboratory experiments to test
Taylor's seminal research tournament theory by examining whether
subjects in a controlled economic laboratory setting can be induced to
expend the predicted amount of research and development (R&D) effort
in an essentially unregulated environment. Specifically, we investigated
whether the effort expended by experimental subjects approximates the
amount of effort predicted by the unique Nash equilibrium.
Despite the complexity of computing the Nash equilibrium research
strategy, we find the average behavior of subjects in our experiments is
remarkably close to the predictions of Taylor's model. The majority
of the experimental subjects do appear to adopt stopping-rule research
strategies, although they differ significantly in their individually
chosen stopping values. Despite the wide variation in individual
research effort, however, the overall level of research expended and the
average value of the winning innovation for the various treatments are
consistently within a few percentage points of the levels predicted by
the Nash equilibrium. As a consequence, the R&D tournaments achieve
very high levels of efficiency in the laboratory.
II. RESEARCH TOURNAMENT THEORY
For explanatory ease, we retain Taylor's original notation,
and readers may refer to his article for details of the model not
discussed here. By assumption, there are M risk-neutral competitors who
compete in a research tournament to win the prize offered by the
tournament's sponsor. The tournament lasts T periods, and each
period competitors have an opportunity to pay research cost C to obtain
a single independent draw, x, from the distribution of innovations,
F(x), on support [0, \bar{x}]. All competitors
start the tournament with worthless innovations; x = 0. Each new
innovation is drawn, with recall, from the distribution of innovations
allowing each competitor to retain the best draw across all T periods of
the tournament. At the end of T periods, competitors deliver their best
draw to the tournament sponsor, who evaluates each innovation and awards
the prize, P, to the competitor offering the best innovation.
Building on the results of search theory, Taylor proved the
equilibrium strategy of a competitor in this research tournament is to
draw a new innovation each period until drawing an innovation greater
than or equal to some cutoff value, z, then stop. According to Taylor,
the unique z-stop cutoff value for firms engaged in a research
tournament is implicitly defined by the following equation:
(1) C = P \int_z^{\bar{x}} \left[ \Phi(x)^{M-1} - \Phi(z)^{M-1} \right] dF(x),
where \Phi is the date-zero cumulative distribution function (CDF) for
the value of a firm's best innovation under the z-stop rule:
(2) \Phi(x) = F(x)^T for x < z, and
\Phi(x) = F(z)^T + [F(x) - F(z)] \frac{1 - F(z)^T}{1 - F(z)} for x \ge z.
One can see from equations (1) and (2) that a competitor's
effort level (z) is an increasing function of the prize, P, and a
decreasing function of the cost of research draws, C. However, the
equilibrium z-stop is also a function of the number of other competitors
involved in the tournament, M - 1, as well as the length (number of draws
permitted) of the tournament, T.
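To make equation (1) concrete, the cutoff can be computed numerically. The Python sketch below is an illustration of ours, not code from the original study; it solves equation (1) by bisection under the continuous approximation F(x) = x/1000 of the experimental draw distribution described in Section III. With the Table I parameters, its output should approximate the predicted z-stops reported in Table IV.

```python
# Minimal sketch (ours, not Taylor's): solve the z-stop condition in
# equation (1) by bisection, approximating the experiment's draw
# distribution by the continuous uniform CDF F(x) = x/1000 on [0, 999].

def F(x):
    return x / 1000.0

def phi(x, z, T):
    """Date-zero CDF of a firm's best innovation under a z-stop rule, eq. (2)."""
    if x < z:
        return F(x) ** T
    return F(z) ** T + (F(x) - F(z)) * (1 - F(z) ** T) / (1 - F(z))

def marginal_gain(z, P, M, T, steps=2000):
    """Expected prize gain from one more draw when the current best equals z."""
    total, width = 0.0, (999.0 - z) / steps
    for i in range(steps):  # midpoint rule for the integral in equation (1)
        x = z + (i + 0.5) * width
        total += (phi(x, z, T) ** (M - 1) - phi(z, z, T) ** (M - 1)) * width / 1000.0
    return P * total

def solve_z(P, C, M, T):
    lo, hi = 0.0, 999.0
    for _ in range(60):  # bisection: the marginal gain falls as z rises
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if marginal_gain(mid, P, M, T) > C else (lo, mid)
    return lo

# Table I parameters; output should approximate Table IV's predicted z-stops.
for label, (P, C, M, T) in [("M2T2", (120, 10, 2, 2)), ("M2T6", (120, 10, 2, 6)),
                            ("M3T4", (103, 10, 3, 4)), ("M5T2", (120, 10, 5, 2)),
                            ("M5T6", (120, 10, 5, 6))]:
    print(label, round(solve_z(P, C, M, T)))
```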
Given the complexity of formulating this equilibrium z-stop
strategy, it is an interesting empirical question whether economic
agents will adopt Taylor's predicted strategies. For example,
agents may instead employ simple "rule-of-thumb" strategies
like taking a predetermined number of draws each tournament.
Taylor's tournament was designed with the objective of maximizing
efficiency. The sponsor is concerned with the value of the winning
innovation, whereas the R&D firms will be concerned with the ratio
of total research expenditures to prize payments, since this reflects a
contestant's expected payoff from entering the tournament. Further,
it may not be in the sponsor's long-term interest to have R&D
firms going bankrupt. Therefore, the statistics that are central to our
investigation are the value of the winning innovation, the overall level
of effort expended on research, and the level of market efficiency.
III. EXPERIMENTAL DESIGN
To test Taylor's model, we designed a series of experiments to
determine whether subjects individually, or as a market, provide results
similar to those predicted by the theory. The experiments were conducted
at the University of New Mexico's computerized experimental
economics laboratory with subjects recruited from undergraduate social
science classes. Each subject was assigned a computer terminal, and the
laboratory is designed to limit the subject's view to her own
terminal. This helps to ensure that each subject's response is
independently determined. Computerization of the experiments allowed for
immediate feedback for the subjects, and this feedback should enhance
the subjects' understanding of the payoff function. After each
round, the subjects' computer screens displayed the results of the
round and how their payoffs were calculated. Specifically, the subjects
were told their maximum draw as well as that of their group. If hers was
the highest draw, the subject was informed that she had won the prize
and the round balance was calculated as the initial endowment minus the
cost of the draws plus the prize. Otherwise the round balance was the
initial endowment minus the cost of the research draws. At the end of
the session, the subjects' scores and payments were displayed on
their screens, and they were paid in cash.
As the experiment began, subjects received a set of written
instructions explaining that they would be participating in a market
where the task was to decide whether to pay for a draw of a random
number in an effort to win a prize. At the start of each round, subjects
were given an endowment of francs (the laboratory currency) sufficiently
large to ensure they could take a draw every period of the round without
exhausting their endowment. Each draw generated a value between 0 and
999, with each number equally likely. Subjects were told the maximum
number of draws in each round that could be taken, the cost of taking a
draw, and the number of competitors in their group. At the end of each
round, the player in each group with the highest draw was awarded the
specified prize. A subject's total payoff at the end of the
experiment was equal to the sum of the prizes won in each round plus all
unspent francs remaining from the endowment. The subjects did not know
how many rounds would be conducted during the session. Finally, they
were told that they would be assigned to a different group each round
and that at the end of the session their francs would be converted to
dollars at a stated exchange rate.
In the context of a research tournament, choosing to make a draw
corresponds to conducting research at a constant cost per unit.
Beforehand, the outcome of the research process is unknown, but the
distribution from which the research results will be drawn is common
knowledge in Taylor's model. Each draw corresponds to the realized
level of research for that period, and the group high draw is the level
of the winning innovation for that round. Again, a round consists of
several periods in which research can be conducted, but each round is a
separate, independent research tournament.
Treatment Parameters
Experimental sessions covering five treatments were conducted. The
treatment structure is shown in Table I. A session refers to having the
subjects in the laboratory, whereas a treatment refers to the specific
parameters that subjects face in a given session. In Table I, the number
in parentheses refers to the treatment, while the other values in the
cells refer to the particular parameters of the session. A total of 103
subjects participated in these experiments and no subject participated
in more than one session.
In addition to the number of competitors in each group and the
maximum number of draws, the subjects were given a set of other
parameters useful for refining their decisions. These treatment
parameters are reported in Table I, where P denotes the prize, in
francs, awarded to the competitor with the largest draw in each group
for each round. C denotes the cost per draw, E represents the endowment
of francs given subjects prior to each round, and R denotes the number
of R&D tournaments (rounds) conducted with each treatment.
TABLE I
Treatment Parameters

                      Number of Draws Possible Per Round (T)
Number of
Competitors (M)           2                4                6

2                  (1) P = 120                       (2) P = 120
                       C = 10                            C = 10
                       E = 25                            E = 65
                       R = 339                           R = 75

3                                   (3) P = 103
                                        C = 10
                                        E = 45
                                        R = 500

5                  (4) P = 120                       (5) P = 120
                       C = 10                            C = 10
                       E = 25                            E = 65
                       R = 195                           R = 90
Each of the treatments shown above gives rise to theoretical
predictions about the expected value of the tournament's winning
innovation and the amount of research competitors will conduct. These
predictions are shown in Table II. The baseline scenario is treatment 3,
for which the predicted value of each group's winning innovation is
906. In this baseline treatment, M = 3 competitors conducted research at
a cost of 10 francs per draw, with the opportunity to take up to T = 4
draws per round. Two sessions were run using this baseline treatment
with a total of 30 subjects, each participating in 50 rounds. For
treatment 3, this provided us with a data set consisting of 500
tournaments, 1,500 individual tournament performances, and 6,000
opportunities for the competitors to conduct a draw. In treatments 2 and
4, we altered the prize, the number of periods (T = 2 or 6), and the
number of competitors (M = 2 or 5) in a manner that would generate virtually
identical expected winning draws along the southwest to northeast
diagonal of Table II. In treatments 1 and 5, we varied these parameters
in order to generate steadily increasing expected winning innovations as
one moves along the northwest to southeast diagonal of Table II. Thus,
the five different treatments enable us to check for both consistency
and trend in the model's theoretical predictions.
Another important concern is the total amount of money expended on
research relative to the prize value during a tournament. Therefore, in
Table II we have also listed the total amount of research dollars the
theory predicts will be expended by all M competitors per prize dollar.
For example, in treatment 3, Taylor's model predicts, the combined
research expenditures of all three competitors will sum to just over 78
cents for each dollar of prize money awarded. In contrast, in treatment
5 the research-to-prize ratio is nearly equal to one, suggesting even a
short run of unlucky draws by a firm could drive it out of business.

[Tabular data for Table II omitted.]
IV. EMPIRICAL RESULTS
In this section, we subject our data to various tests at the market
and individual level. We find that most subjects do employ stopping-rule
research strategies; however, their individually chosen stopping values
often differ significantly from the symmetric Nash equilibrium stopping
value predicted by Taylor. Some individuals choose z-stop values well
below the predicted level, while others choose stopping values above the
predicted level. However, we find the aggregate behavior in each
tournament treatment is generally consistent with the predictions of
Taylor's theory.
Aggregate Tests
In Table II, the actual means we observed in the experimental data
are presented in italics, for ease of comparison with predicted values.
Despite variances in individual behavior, in every cross-comparison of
treatments the mean winning innovation and the mean research-to-prize
ratio moved in the direction predicted by Taylor's theoretical
model. Particularly notable are the data from treatments 2, 3, and 4.
The theory predicts virtually identical levels of winning innovations
and mean research-to-prize ratios that increase from treatment 2 to 3
but decrease from treatment 3 to 4 across the diagonal. This is
precisely what we observed. At the market level, the data are
qualitatively consistent with the predictions of Taylor's model.
In Figure 1, we have plotted the theoretical CDFs of the predicted
winning innovations in each group of competitors for treatments 2, 3,
and 4. We see that, as well as having virtually identical expectations
for their winning innovations, these three treatments also have CDFs
which, by visual inspection, are quite similar. Because the theoretical
distributions are so closely matched, one would expect the experimental
data from these treatments to also look very similar. To make this
comparison with our data, we conducted Wilcoxon-Mann-Whitney rank tests
and Kolmogorov-Smirnov tests on each combination of treatment pairs to
check whether our experimental data also generate nearly identical
distribution functions. For the Wilcoxon-Mann-Whitney rank test, we
reject the null hypothesis (at the 0.10 level) that the distributions are
identical if our test statistic is greater than 1.29. For the
Kolmogorov-Smirnov test, we reject the null if our test statistic is
larger than 1.23. Our results are presented in Table III.
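Because the winning innovation falls below x exactly when all M firms' final draws do, the theoretical CDF plotted in Figure 1 is \Phi(x)^M. The short sketch below, reusing phi() from the Section II sketch and the predicted z-stops of Table IV, illustrates how closely the three diagonal treatments track one another:

```python
# Theoretical CDF of the winning innovation: phi(x, z, T) ** M (all M
# firms' final draws below x).  Cutoffs are predicted z-stops (Table IV).
for label, (z, M, T) in [("(2) M2T6", (783, 2, 6)),
                         ("(3) M3T4", (738, 3, 4)),
                         ("(4) M5T2", (746, 5, 2))]:
    print(label, [round(phi(x, z, T) ** M, 3) for x in (700, 800, 900, 980)])
```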
Note that the only pairs of treatments for which we do not reject
the null hypothesis are comparisons of treatments 2, 3, and 4. Thus,
statistically, it appears that Taylor's model is internally
consistent. By this we mean that, relative to other treatments,
when the mean level of the winning innovation was predicted to rise as a
function of changing one of the parameters, our experimental data are
consistent with the prediction. Moreover, changes in the distribution of
winning innovations across experimental treatments are not only in the
proper direction, but they are also statistically significant.
In Figure 2 we offer a graphical representation of the evidence
presented in the previous statistics. We have plotted the actual CDFs of
our experimental observations for the winning innovations across all
five treatments. From this graph, it is quite obvious the
"diagonal" treatments 2, 3, and 4 all generated winning
innovation distributions which were very similar since these three CDFs
lie practically on top of each other in the graph. On the other hand,
treatment 1 generated significantly smaller winning innovations and
treatment 5 generated significantly larger winning innovations.
Taylor's theoretical predictions arise from the argument that
the competitors adopt the z-stop strategies constituting the symmetric
Nash equilibrium. Using our data, we can estimate the implicit stopping
rule that is generated by the subjects' observed behavior. For
example, if a subject uses a stopping strategy, the imputed z-stop may
be estimated from the expression: expected number of draws =
[1 - F(z)^T] / [1 - F(z)]. Therefore, by calculating the average number of
draws in our experimental sessions we can estimate the z-stop that would
generate the same number of experimental draws. These predicted and
imputed average z-stops (averaged over subjects) are reported in Table
IV.
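As an illustration of this inversion, the sketch below bisects on z until the expected number of draws matches an observed average; the average used here is a hypothetical stand-in, not a statistic from our sessions.

```python
# Invert E[draws] = (1 - F(z)**T) / (1 - F(z)) to impute a z-stop from an
# observed average number of draws per round.  The sample average below
# is hypothetical, chosen only to illustrate the calculation.

def expected_draws(z, T):
    Fz = z / 1000.0
    return (1 - Fz ** T) / (1 - Fz)

def impute_z(avg_draws, T):
    lo, hi = 0.0, 999.0
    for _ in range(60):  # bisection: expected draws rise with the cutoff z
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if expected_draws(mid, T) < avg_draws else (lo, mid)
    return lo

print(round(impute_z(avg_draws=2.7, T=4)))  # a treatment-3-style illustration
```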
As shown, the imputed z-stop strategies are reasonably close to the
predicted levels for all treatments except number 5.(3) As for our
parameter consistency check, the predicted z-stop values increase from
treatment 1 to 2 and from 1 to 3, as do the observed values. Treatment 5
does not satisfy the theoretically predicted comparative statics
results. This treatment has both a large number of competitors and a
long tournament. As we shall see below, this combination of
conditions exhibits more violations of the theory than do settings in
which the tournament is short lived or in which there are fewer
competitors.
In addition to predicting a stopping rule, the theory also predicts
a level of draw activity. In Table V, we compare the actual number of
draws taken with the theoretical prediction. In all treatments but one,
the subjects made slightly fewer draws than the level predicted by
Taylor's theory. This result supports the conjecture that a large
tournament (many players invited) that is permitted to continue for
several periods may lead to excessive expenditures on R&D.
TABLE III
Nonparametric Test Results
Treatments W-M-W K-S
(1) vs (2) 6.05 2.85
(1) vs (3) 10.77 5.17
(1) vs (4) 8.27 4.07
(1) vs (5) 10.47 4.97
(2) vs (3) 0.21 0.51
(2) vs (4) 0.28 0.85
(2) vs (5) 4.85 2.36
(3) vs (4) 0.68 0.98
(3) vs (5) 6.75 3.01
(4) vs (5) 5.56 2.45
Individual Behavior Tests
To this point, we have shown that our experimental data are
generally consistent, in aggregate, with Taylor's predictions. To
test whether subjects are individually employing stopping rule
strategies, we counted the frequency that each subject exhibited the
following behavior. If a subject drew a value X, then drew again getting
a value Y (Y < X), and then stopped when a further draw was
possible, this behavior was defined as inconsistent. A stopping-rule
strategy implies that, if an additional search was justified given X, it
would also be justified given Y. Such violations may be indicative of
the use of a simple rule-of-thumb decision strategy (for example, always
make two draws) rather than the use of a stopping rule.
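The bookkeeping behind this count can be sketched as follows; the round histories in the example are hypothetical, included only to show how violations and opportunities are tallied.

```python
# Tally stopping-rule violations for one subject.  Each round is the list
# of draws the subject actually took; T is the maximum draws allowed per
# round.  The rounds below are hypothetical, not data from the experiment.

def count_violations(rounds, T):
    violations, opportunities = 0, 0
    for draws in rounds:
        for i in range(1, len(draws)):
            # drew X, then a lower Y, with at least one draw still permitted
            if draws[i] < draws[i - 1] and i + 1 < T:
                opportunities += 1
                if i == len(draws) - 1:  # stopped right here: a violation
                    violations += 1
    return violations, opportunities

print(count_violations([[612, 430], [250, 871], [540, 320, 610]], T=4))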
This metric requires at least three draws be possible, since we
must observe the pattern described above and the subject must still be
able to take a draw. To test for such inconsistencies, we report the
frequency of such behavior in treatment 3, where we count the number of
violations of a stopping rule strategy and test whether the frequency of
such behavior is statistically significant. The latter involves more
than a simple count of the number of observed violations since we must
control for the frequency of opportunity for such violation. In Table VI
we report the actual violations in treatment 3 and whether the incidence
is statistically significant.(4)
The results in Table VI illustrate several points. First, the
absolute number of violations is small and is typically concentrated
among a few subjects. Second, it is important to correct for the
frequency of the opportunity to commit a violation and not simply count
the actual number of violations. Subjects with low absolute violation
counts may still have a significantly high rate of violation, since they
had few opportunities in which to commit a violation. For example,
subjects 6 and 13 committed violations in five or fewer rounds (out of
50 rounds), but their rate of violation was
significantly greater than zero because they had few such opportunities
during the session. In contrast, subject 15 had a large absolute number
of violations but the rate was not statistically significant, because of
the large number of rounds in which the subject could have violated a
stopping rule strategy. Of the 30 subjects participating in treatment 3,
only 8 committed statistically significant numbers of violations.
Considering the stringency of the test applied, this is a low rate of
inconsistent behavior and suggests widespread use of some stopping rule,
although the rule used appears to be below that predicted by the
theory.(5)
Finally, we examine a payoff measure for the subjects. The
individual agents are interested in maximizing their return, while the
sponsor wants to maximize the value of the winning innovation. The
research-to-prize ratio (R/P) addresses important concerns of both
parties, since it contains both a profit estimate and a measure of
research effort. We observe that the average R/P ratios in our
experimental data are quite close to the theoretical predictions. The
individual R/P ratios and the theoretical predictions are reported in
Table VII. As before, the treatment that stands out as most
significantly violating the theory is treatment 5 (M5T6). While the
remaining treatments show considerable variance in individual behavior,
the average R/P ratios are still close to the theoretical levels.
There was also a large variance across individual stopping
strategies as some competitors engaged in more aggressive research than
predicted by the model, while other competitors were more passive than
predicted. Of course, the complexity of determining the equilibrium
z-stop makes a variance in stopping strategies virtually inevitable.
Moreover, if one competitor does engage in an excessive amount of
research by employing too large a z-stop, the equilibrium strategy
for the other competitors is actually to reduce their z-stops.(6) Since
we randomly matched competitors in different groups for each new round,
we believe the competitors who employed the large z-stop strategies
simply overestimated the equilibrium stopping value as opposed to
implementing some sort of bullying behavior to deter competition.
The payoff data for treatment 3 provide further evidence there was
probably not much bullying behavior because choosing a larger than
predicted z-stop did not result in larger-than-average payoffs to the
aggressive competitors. While there was a substantial range in payoffs
from $11 to $19, there was no significant correlation between the total
number of draws and the payoff. Subjects employing excessively large
z-stops did not appear to benefit from their aggressive research
strategies. Because competitors were assigned to different groups for
each successive tournament, individuals were unable to bully other
players consistently into reducing their research effort.
TABLE IV
Predicted and Imputed z-Stops
Treatment (number) Predicted z-Stop Imputed z-Stop
M2T2 (1) 684 584
M2T6 (2) 783 746
M3T4 (3) 738 733
M5T2 (4) 746 734
M5T6 (5) 599 849
One element not accounted for so far is each subject's
"luck of the draw." Over the course of 50 rounds, our data for
treatment 3 generated a wide variance in the research "luck"
of individuals as measured by the average draw of each subject. Although
the average draw across all competitors was 499.57, we observed
substantial variation in the average draw across subjects. For example,
the data in Table VIII show that subject number 16 drew 176 times and
obtained an average draw of 547, while subject number 29 drew 103 times
and had an average draw of only 443. Clearly, the payoffs of individual
subjects were affected by their "luck of the draw" during the
experiment. Individual payoffs must be a function of both a
subject's research strategy and his or her luck, for even a subject
that makes very few draws could win many tournament prizes if he were
unusually lucky. Therefore, we felt it was important to test
Taylor's optimal Nash equilibrium strategy against the actual play
of our experimental subjects to determine whether luck was an
overwhelmingly important factor in tournament success. To directly test
the success of Taylor's predicted strategy against our experimental
subjects, we generated more than 21,000 Monte Carlo simulations of the
equilibrium strategy to compete against the high draws of every possible
combination of subjects.
If the Nash equilibrium strategy enjoyed only average success, the
Monte Carlo simulation should win 33% of the time. In fact, the Monte
Carlo simulation won the prize more than 40% of the time and generated
an average of 2,990 francs over the course of 50 rounds - a sum greater
than that earned by 20 of the 30 experimental subjects. Over the course of more than
21,000 simulations, the Nash-equilibrium Monte Carlo player was neither
unusually lucky nor unlucky. In contrast, nine of the ten winners in
treatment 3 who earned more than 2,990 francs benefited from
better-than-expected draws. Thus, since the biggest winners in our
experiments also tended to be the "luckiest" contestants, a
strong argument can be made that playing Taylor's theoretically
predicted strategy is strategically advantageous, even when one is
playing against a set of untrained opponents.(7)
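A stripped-down version of this horse race is sketched below. Unlike the simulations reported above, which pitted the equilibrium strategy against the recorded high draws of actual subjects, the rivals here are modeled as z-stop players, so the numbers will not match those in the text.

```python
# Sketch of the Monte Carlo exercise (ours, not the authors' code): a
# z-stop player competes in treatment-3-style tournaments.  Rivals here
# also play a z-stop, whereas the paper replayed subjects' actual draws.
import random

def run_round(z, T, C):
    """One round under a z-stop rule; returns (best draw, research spending)."""
    best, cost = 0, 0
    for _ in range(T):
        cost += C
        best = max(best, random.randint(0, 999))
        if best >= z:
            break
    return best, cost

def simulate(z, rival_z, M=3, T=4, C=10, P=103, E=45, rounds=50, reps=1000):
    total_francs = 0
    for _ in range(reps * rounds):
        mine, cost = run_round(z, T, C)
        rival_best = max(run_round(rival_z, T, C)[0] for _ in range(M - 1))
        total_francs += E - cost + (P if mine > rival_best else 0)
    return total_francs / reps  # average francs per 50-round session

print(round(simulate(z=738, rival_z=738)))  # equilibrium play all around
print(round(simulate(z=738, rival_z=600)))  # vs. more passive rivals
```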
To this point we have shown that, with the possible exception of
treatment 5, the average behavior of our experimental subjects was very
close to the Nash equilibrium predicted by Taylor's research
tournament model. On the other hand,
we observed wide variances in the research levels employed individually,
which we believe can be largely ascribed to uncertainty on the part of
individual competitors as to the equilibrium stopping value. Using our
Monte Carlo simulation, we have also shown that if a subject actually
knew the equilibrium stopping value, employing that strategy would
probably have resulted in a larger payoff than two-thirds of all other
subjects. The deviation from theory we have not accounted for to this
point is the systematic bias in our data: treatments 1-4 undershot the
predicted mean winning innovation, while treatment 5 overshot it.
However, this phenomenon can also be ascribed to individual uncertainty
about the equilibrium stopping value.

[Tabular data for Table V omitted.]
TABLE VI
Individual Violations (Treatment 3)
Subject ID Violations Significant
1 5 No
2 0 No
3 2 No
4 20 Yes
5 2 No
6 1 Yes
7 5 No
8 0 No
9 11 Yes
10 0 No
11 2 No
12 4 No
13 5 Yes
14 8 Yes
15 9 No
16 1 Yes
17 5 No
18 0 No
19 7 Yes
20 10 Yes
21 6 No
22 2 No
23 0 No
24 0 No
25 0 No
26 0 No
27 1 No
28 3 No
29 14 No
30 0 No
[Tabular data for Table VII omitted.]
TABLE VIII
Subjects' Draw Experiences (Treatment 3)
Subject # of Draws Average Draw
1 110 538.00
2 163 481.65
3 147 527.07
4 134 532.52
5 169 473.15
6 54 544.81
7 163 491.05
8 4 386.75
9 136 504.61
10 138 506.33
11 72 564.83
12 155 460.95
13 114 468.23
14 67 548.70
15 101 482.17
16 176 547.10
17 119 519.54
18 140 478.70
19 159 510.59
20 117 489.98
21 95 513.80
22 120 508.21
23 146 503.82
24 100 516.57
25 79 526.53
26 66 428.61
27 159 484.52
28 91 448.91
29 103 442.98
30 150 486.41
In Taylor's model, the unique symmetric Nash equilibrium
requires all agents to employ the same stopping rule. Though our
aggregate data support Taylor's predictions, given the complexity
of computing the Nash equilibrium it is not surprising that we observed
substantial variation in individual stopping behavior. To quantify this
variation in individual behavior two other measures suggest themselves:
the smallest high draw and the highest nonstopping draw. The smallest
high draw is defined as the smallest draw value an individual obtains
without continuing to draw when a draw is possible. The highest
nonstopping draw is defined as the highest value an individual obtains
and continues to draw. We computed these measures and report them for
treatment 3 in Table IX. As can be seen, these measures vary
significantly, and we are reluctant to draw too much inference from
these results because in each case the metric applies to only one round
of the 50 in the experiment. Within an experimental setting there is
frequently some instance of the subjects "trying out"
alternative strategies. In any case, the key measure for the current
discussion is the standard deviation across subjects, which is
especially large for the smallest high draw, indicating that there were
considerable differences in behavior across subjects. Such differences
may be characteristic of bimodality in subject behavior with some
subjects employing more aggressive research strategies than others.
The variance in subject behavior is more likely to create problems
for the R&D industry when the tournament is permitted to continue
for several periods. With longer tournaments, the potential exists for
subjects who aggressively overshoot the theoretical z-stop to overwhelm those who undershoot. This would result in excessive levels of aggregate
research in lengthy tournaments with many competitors as well as
excessively high research-to-prize ratios and larger than predicted
winning draws as we observed in treatment 5. This overshooting phenomenon has some serious implications for the R&D industry, and
thus for research tournament sponsors, if it bears out in real-world
research tournaments. In particular, sponsors may risk driving some of
their R&D firms to bankruptcy if they sponsor tournaments with
"too long" a time horizon or "too many competitors."
Indeed, in treatment 5 with six periods for research and five
competitors, the tournament yields research/prize ratios well above one,
which simply cannot be sustained by the industry over the long term.
V. CONCLUSIONS
The focus of our experiments was to evaluate the fixed prize
mechanisms as a means to obtain a given quality of research at as low a
cost as possible under various market conditions. Overall, the results
of our experiments appear to support the theory. At the market level,
the winning research product and level of research effort tended to be
close to the theoretical prediction. In addition, the majority of our
subjects appear to employ stopping-rule strategies rather than playing
simple rule-of-thumb strategies, which suggests a certain level of
sophistication on their part. However, instead of observing a uniform
level of research effort across all competitors, as the symmetric Nash
equilibrium would predict, the research strategies we observed varied
significantly across subjects.(8)
This variance tended to affect the aggregate results of our
experiments. When there were only two periods for research, there tended
to be less total research than predicted because there was not enough
time for those who did the most research to make up for those who did
very little. In the longer research tournaments with several
competitors, we tended to see levels of research at or above the
predicted amounts. Here, the high-effort competitors had ample research
time to make up for the low-effort players and the result was higher
levels of winning innovations and in some instances
"excessive" levels of aggregate research which reduced the
tournament efficiency.
The effect of additional participants and more research periods can
potentially be substantial. The evidence supports the intuitive notion
that if these parameters are increased arbitrarily, participants in long
research tournaments may lose money because of excessive research
competition. For example, without question the most prolific sponsor of
research competitions is the federal government and, in particular, the
Department of Defense (DOD). Each year DOD awards millions of dollars
worth of contracts to winners of competitive R&D competitions.(9)
Recent General Accounting Office (GAO) studies have identified
acquisition reform as one of the Pentagon's highest priorities (GAO
[1997, 17]). One of the most common complaints about DOD acquisition
efforts is the extensive time required to field new systems. Our
experimental data suggest that lengthy research competitions, by
themselves, may inadvertently induce contractors in these competitions
to conduct excessive amounts of research leading to cost overruns, quite
apart from the extra costs normally associated with schedule delays. In
the long run, this kind of behavior would normally be self-correcting
because the competitors would either adjust their levels of research
effort or eventually be driven out of the market, in which case
aggregate research would decline because of fewer competitors. However,
since there is a clear national security incentive to prevent some
defense contractors from exiting the industry, the effects of long
research competitions on the cost of building weapons may be
particularly detrimental in the defense industry. Therefore, judicious selection of the time horizon and the number of competitors seems to be
indicated.
We are thankful to David Cooper for helpful comments on an earlier
version of this paper. Shaul Ben-David provided programming assistance.
William Neilson and two reviewers provided comments that led to
substantial improvements in the analysis and exposition. Partial funding
for this study was provided by the Defense Systems Management College
and the Institute for National Security Studies.
1. The contest, known as the Rainhill Trials, was used to select an
engine for the first-ever passenger railroad in Britain. The £500
first prize was won by George and Robert Stephenson, who built the
Rocket, which attained a top speed of 46 km/hour. See Day [1971] for
details about the evolution of steam locomotives.
2. The mission to Mars contest was worked up for a member of
Congress by the executive chairman of the National Space Society, Robert
Zubrin. The proposal is a series of contests with prizes in the $1
billion range, culminating in a $20 billion first prize. See Zubrin
[1996].
3. For these tests, we eliminated the rounds in which individuals
did not make any draws. The justification for this is that these are
simultaneous move games. That is, when an individual chooses a strategy,
it is based on the expectation that the group is of the announced size.
Thus, the strategy choice is unaffected by whether one or more
competitors has decided to drop out.
4. This can be characterized as a binomial process. After drawing
Y, the subject either violates the stopping-rule strategy by stopping or
does not violate the strategy by continuing to draw. Thus, the test is
whether the frequency of violations is statistically greater than zero
(at the 95% confidence level).
5. To test for significance, a binomial test was conducted against
the random prediction that violations would occur one half of the time.
The subjects were reported to have a statistically significant level of
violations if the rate was significantly greater than the null
prediction at the 0.10 level.
6. See Taylor [1995, Prop. 2 and Fig. 1] for a graph of the
Best-Response Projections for any two contestants.
7. We also estimated an ordinary least squares regression of
subjects' payoffs against their "luck of the draw"
relative to their opponents (AvDiff) and the number of times their draw
decisions deviated from the theoretical equilibrium z-stop strategy,
relative to one's opponents (ErrorDiff):
Payoff (francs) = \beta_0 + \beta_1 AvDiff + \beta_2 ErrorDiff.
One would expect the coefficient on AvDiff to be positive,
reflecting a greater payoff for luckier average draws. The coefficient
on ErrorDiff is predicted to be negative, reflecting lower payoffs for
more deviations from the theoretically optimal strategy. The subject
data yields the following results:
Payoff (francs) = 2784.8 + 5.256 AvDiff - 14.457 ErrorDiff
                  (52.077)  (4.435)      (3.138)
The intercept predicted by Taylor's model and our parameters
(endowment income and expected earnings) is 2675, not statistically
different from our estimate. The other coefficients are statistically
significant with the predicted sign and reasonable magnitudes. The
overall fit is quite strong (R^2 = 0.513).
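For concreteness, the footnote's specification can be estimated as in the sketch below; the regressors are synthetic stand-ins rather than the subjects' actual AvDiff and ErrorDiff values.

```python
# Sketch of footnote 7's OLS specification with numpy; the regressors
# below are synthetic stand-ins, not the subjects' actual data.
import numpy as np

rng = np.random.default_rng(0)
av_diff = rng.normal(0, 20, 30)     # relative "luck of the draw" per subject
error_diff = rng.normal(0, 8, 30)   # relative deviations from the z-stop
payoff = 2784.8 + 5.256 * av_diff - 14.457 * error_diff + rng.normal(0, 50, 30)

X = np.column_stack([np.ones(30), av_diff, error_diff])
beta, *_ = np.linalg.lstsq(X, payoff, rcond=None)
print(beta)  # should recover roughly (2784.8, 5.256, -14.457)
```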
8. Such variance in behavior has been observed in many other
individual decision settings. Camerer [1995] reports on several examples
of experiments in which subjects systematically overstated risk while
others understated risk. It is an interesting question whether markets
correct such behavior or whether aggregating market observations merely
masks it.
9. By law, federal agencies are required to conduct competitive
procurements whenever practicable. See U.S. Code Annotated Title 41,
Section 253. For example, in 1991 the Air Force held a
"fly-off" competition to select the new Advanced Tactical
Fighter. Lockheed won that competition with their F-22, earning a
production contract estimated at the time to be worth more than $90
billion. See Schwartz et al. [1991] for details.
REFERENCES
Camerer, Colin. "Individual Decision Making," in The
Handbook of Experimental Economics, edited by J. Kagel, and A. Roth,
Princeton University Press, 1995, 587-703.
Day, John R. Trains. New York: Bantam Books, 1971.
Economist, "HDTV All Together Now," May 29, 1993, 74.
Fullerton, Richard L., and R. Preston McAfee. "Auctioning
Entry into Tournaments," Journal of Political Economy, June 1999,
573-605.
Langreth, Robert. "The $30 Million Refrigerator," Popular
Science, January 1994, 65-67, 87.
Schwartz, John, Douglas Waller, John Barry, and John Taliaferr.
"The $93 Billion Dogfight," Newsweek, May 6, 1991, 46-47.
Taylor, Curtis R. "Digging for Golden Carrots: An Analysis of
Research Tournaments." American Economic Review, September 1995,
872-90.
U.S. Code. 26 January 1998.
United States Department of Defense. Defense Acquisition Management
Policies and Procedures, DOD Instruction 5000.2, 23 February 1991.
Washington, D.C.: U.S. Government Printing Office, 1991.
United States General Accounting Office. Reports and Testimony:
January 1997. GAO/OPA-97-4, Washington, D.C.: U.S. Government Printing
Office, 1997.
Zubrin, Robert. "Mars on a Shoestring." Technology
Review, November 1996, 20-31.
Richard Fullerton: Associate Professor of Economics, United States
Air Force Academy, Colorado Springs. Phone 1-719-333-3080, Fax
1-719-333-2945, E-mail [email protected]

Bruce G. Linster: Professor of Economics, United States Air Force
Academy, Colorado Springs. Phone 1-719-333-3080, Fax 1-719-333-2945,
E-mail [email protected]

Michael McKee: Professor of Economics, University of New Mexico,
Albuquerque. Phone 1-505-277-1960, Fax 1-505-277-9445, E-mail
[email protected]

Stephen Slate: Associate Professor of Economics, United States Air
Force Academy, Colorado Springs. Phone 1-719-333-3080, Fax
1-719-333-2945, E-mail [email protected]