The bias toward zero in identifying relationships: reply to Kennedy. (response to Peter Kennedy in this issue, p. 382)
Fremling, Gertrud M. ; Lott, John R., Jr.
We will argue that rational expectations has considered only the
"misestimation" type of error, which can "cancel
out" in the aggregate, but that with errors in identifying
relationships, there is no similar cancelling out effect (Fremling and
Lott [1996, 276])
Kennedy [1999] misses much of our point. When he extends our
two-variable case to a three-variable case, he views agents as
exclusively making the traditional "specification errors." As
is well known, specification errors concern what variables to include
when estimating a regression. He fails to address our main contribution
- that people sometimes do not identify that a relationship even exists.
When describing our argument Kennedy goes so far as to always replace
our references to "[model] identification error" with the term
"specification error." For instance, the way he refers to our
footnote 5 (p. 279) gives the misleading impression that we ourselves
are using the term "specification errors." Our footnote
clearly uses the term "identification errors,"(1) and it is
only because he switches these terms that he is able to conclude that
"Fremling and Lott have violated one of their own assumptions
...."
In our 1996 paper we showed the following. Individual actors
sometimes are quite clueless as to an economic problem, failing to
understand that two (or more) variables are related. We extensively
described this "identification problem" on p. 278, and we
stated: "The crucial argument in this paper is that at least some
people fail to identify a true relationship, and therefore never take
the next step, which is to estimate the strength of it. Failing to
estimate the strength of a relationship is essentially equivalent to
estimating it to be zero." Thus, many do not reach the
regression-step at all. When aggregating across individuals making
various mistakes, economic behavior at the group level resembles a
situation where every individual estimates a regression but
underestimates the regression coefficient. Even though all our actors
are "rational" - and thus only make non-systematic mistakes in
what variable(s) are omitted - the aggregate results could equally well
have been generated by a group of non-rational actors systematically
underestimating the regression coefficients. In other words, when
observing aggregate behavior that contains aggregate systematic errors,
one must be cautious not to erroneously infer irrationality on the part
of the individuals.
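The aggregation argument in this paragraph can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model of the 1996 paper: the function name, the 50% share of agents who identify the relationship, and the normally distributed estimation noise are all hypothetical choices made for illustration. The true coefficient of 6 is borrowed from the numerical example discussed in this reply.

```python
import random


def aggregate_perceived_coefficient(true_coef, share_identifying,
                                    noise_sd, n_agents, seed=0):
    """Average coefficient implied by a population of agents.

    Agents who identify the relationship estimate it without systematic
    error (mean equal to true_coef). Agents who fail to identify it never
    estimate anything, which is treated as an implicit estimate of zero.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_agents):
        if rng.random() < share_identifying:
            # Unbiased individual estimate: errors cancel out on average.
            total += rng.gauss(true_coef, noise_sd)
        # Otherwise the agent contributes an implicit zero.
    return total / n_agents


# Hypothetical population: half of the agents identify the relationship.
avg = aggregate_perceived_coefficient(
    true_coef=6.0, share_identifying=0.5, noise_sd=1.0, n_agents=100_000
)
print(f"aggregate perceived coefficient: {avg:.2f} (true value 6.0)")
```

No individual in this sketch makes a systematic estimation error, yet the aggregate behaves as if the coefficient were roughly the true value multiplied by the share of agents who identify the relationship.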
Now to Kennedy's numerical example with two independent
variables rather than one. How should our analysis be applied? We
maintain that there exists a total of eight possible cases: one of
correctly recognizing all three variables as being connected, three of
missing only one, three of missing two, one of missing all. Kennedy only
recognizes four of the cases. This makes an enormous difference. If
Kennedy had also included the implicit zero coefficients in the
remaining four cases, he would have found a substantial "bias
toward zero": for the representative individual, [Alpha] would have
been only 4.05 (not 8.09) compared to its true value of 6. Kennedy never
mentions his estimate of [Beta], but as a careful reader can deduce,
this estimate is only 2.16, a serious underestimate. And including the
four other cases with zeros would have yielded an average estimate of a
mere 1.08, far below its true value of [Beta] = 4. In our view,
Kennedy's Monte Carlo experiment merely confirms our results.(2)
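The arithmetic behind these figures can be checked with a short sketch. It assumes, as the halving of 8.09 to 4.05 suggests, that all eight cases receive equal weight and that Kennedy's reported figures are averages over his four cases. The variable and function names are hypothetical, and the ordering of the three variables (dependent variable first) is a convention chosen here for illustration.

```python
from itertools import product

# Each of the three variables (the dependent variable and the two
# independent variables) is either recognized (True) or missed (False).
cases = list(product([True, False], repeat=3))
n_cases = len(cases)

# Count the cases by how many variables are missed.
missed_counts = {}
for case in cases:
    k = case.count(False)
    missed_counts[k] = missed_counts.get(k, 0) + 1

# Cases in which the dependent variable (first position) is missed.
# These carry an implicit zero coefficient.
n_zero_cases = sum(1 for case in cases if not case[0])
n_kennedy_cases = n_cases - n_zero_cases


def average_with_zero_cases(avg_over_included, n_included, n_zero):
    """Equal-weight average after adding n_zero cases with a zero estimate."""
    return avg_over_included * n_included / (n_included + n_zero)


alpha_all = average_with_zero_cases(8.09, n_kennedy_cases, n_zero_cases)
beta_all = average_with_zero_cases(2.16, n_kennedy_cases, n_zero_cases)

print(n_cases, missed_counts)
print(alpha_all, beta_all)
```

Under equal weighting, adding four zero cases to four nonzero ones simply halves each average, which gives 4.045 for [Alpha] (reported above as 4.05) and 1.08 for [Beta].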
Kennedy makes it appear that his numerical example corresponds to
our macroeconomic modeling in section V. It does not, for he overlooks
our explicit assumption that "the general price level cannot be
perfectly and immediately observed by the workers ..." (p. 283).
This is a very classical assumption in macroeconomics and forms the basis
for why changes in the nominal wage are sometimes confused with changes
in the real wage. Kennedy's disregard for the dependent variable
makes him overlook the other four cases in his numerical example.
Certainly, it can be argued that the variables [Delta]M and [Delta]F are
more frequently omitted, and a complex model with the different
probabilities of omission could be set up. In the macroeconomic
situation at hand, we would guess (and this is merely a guess) that
no more than 1% of workers in any given month would observe [Delta]M, 1%
would observe [Delta]F, and at most 50% would observe [Delta]P.
Depending on the exact combinations of who knows what, the exact number
for the average estimates of [Alpha] and [Beta] could vary slightly, but
it is crystal clear that with numbers like these the ignorant would
totally dominate: at least 98% would never simultaneously have data on
[Delta]P changes and either [Delta]M or [Delta]F. The cases of
"implicitly zero estimates" would be overwhelming. Even if a
handful of workers spuriously picked up some of the influence of
[Delta]F when estimating the impact of [Delta]M, their estimates of
[Alpha] would be drowned in the aggregate by zeros from those who
are basically clueless.
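The 98% figure follows from the guessed shares by a simple bound, sketched here. The variable names are hypothetical. The first calculation uses only the shares stated in the text. The second adds an independence assumption that is not made in the text and is included purely for comparison.

```python
# Guessed upper bounds on the share of workers observing each variable
# in a given month, taken from the text.
p_m = 0.01  # observe [Delta]M
p_f = 0.01  # observe [Delta]F
p_p = 0.50  # observe [Delta]P

# A worker can relate [Delta]P to a money or fiscal variable only by
# observing [Delta]P together with [Delta]M or [Delta]F. The share
# observing [Delta]M or [Delta]F is at most p_m + p_f, whatever the
# overlap between the two groups.
informed_upper = min(p_p, p_m + p_f)
ignorant_lower = 1 - informed_upper

# For comparison only: if the three observations were statistically
# independent (an added assumption), the informed share would be smaller.
informed_indep = p_p * (1 - (1 - p_m) * (1 - p_f))

print(f"at least {ignorant_lower:.0%} lack the data")
print(f"informed share under independence: {informed_indep:.3%}")
```

The bound holds for any pattern of overlap among the three groups. Under independence, the informed share falls to just under 1%.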
As for Kennedy's second issue, prediction error, we have
absolutely no disagreement. We demonstrated underestimation of
coefficients, which certainly does not imply underestimation of
prediction. Our political business cycle case exemplified this: we
explicitly stated that the expected price temporarily exceeds the actual
price (p. 286). Whether underestimation is somehow a more likely
consequence than overestimation probably depends on the particular
problem and whether "underestimation" refers to levels or to
changes in levels. In our political business cycle example, under- and
overprediction seem equally common when expressed in levels. However,
the absolute value of changes tends to be underestimated. (Alternatively
formulated, inertia is overestimated.)
To conclude, Kennedy portrays our analysis as merely rehashing the
long-recognized econometrics problem of excluding explanatory variables.
Our distinct contribution, however, is formulating a two-step process,
where there can be severe errors in setting up a model, including errors
in identifying the dependent variable. We argue that in many economic
cases a high fraction of the public acts as if they were setting the
coefficient to zero. Our results continue to hold when there are more
than two variables involved.
We thank Scott Masten for his helpful comments.
1. We also made the distinction between the existence of a model
and specification errors clear in several other places in our
paper, and we referenced existing research that has dealt with
specification error type issues. For example, on pages 277-78 we cite
articles that discuss the traditional misspecification type problems and
then point out that "our model shall ... focus exclusively on
individuals making mistakes in identifying relationships."
2. Kennedy further stacks the deck in his favor by making the two
explanatory variables highly correlated. The overestimation of
[Alpha] that he finds is due to picking up the influence of the other,
more important variable whenever it is excluded. He does not mention
that, according to his own figures, [Beta] was underestimated by
slightly more than [Alpha] was overestimated.
REFERENCES
Fremling, Gertrud M., and John R. Lott, Jr. "The Bias Towards
Zero in Aggregate Perceptions: An Explanation based on Rationally
Calculating Individuals." Economic Inquiry, April 1996, 276-95.
Kennedy, Peter. "Specification Error, Prediction Bias, and
Rational Expectations." Economic Inquiry, April 1999, 382-84.