An experimental study of statistical discrimination by employers.
Papageorgiou, Chris
1. Introduction
This article reports results from an experiment motivated by the
literature on labor market discrimination. Our aim in conducting
this experiment was to investigate whether employers' initial
perceptions of employee ability on the basis of group characteristics
can lead to lower wages that persist for a long time. Under the
maintained assumption that the employer learns about the group's
ability through Bayesian updating, we examine how quickly he learns. The
idea that inaccurate prior assessment by managers formed on the basis of
an employee's group influences wages is one of many theories of
labor market discrimination in economics. For economists, discrimination
implies that workers in one group earn less than the competitive market
rate for their labor, typically due to their gender or ethnic group. The
economics literature contains many theories seeking to explain labor
market discrimination and empirical work that attempts to test those
theories and measure discrimination through observed wages (see, e.g.,
Altonji and Blank 1999, who provide a thorough survey of theoretical and
empirical research on wage differences and discrimination, including
statistical discrimination, as well as their probable causes).
Economic models of discrimination were initially developed to
address the empirical findings of many researchers that wages differ
across groups and the widespread belief that wage differences stem in
part from discrimination. Existing theoretical models of labor market
discrimination fall into several distinct categories, according to the
source of discrimination. First, there are theories based on tastes,
such as Becker's (1972). According to such theories, employers have
a preference for not hiring workers of a particular group, fellow
employees have a preference for not working with workers of a particular
group, or customers have a preference for not buying from firms hiring
workers of a particular group. Second, there are theories based on
market power (often in addition to tastes), such as labor-market
monopsony (Robinson 1934), labor unions (Kessel 1958), and public-sector
firms (Ross 1948). Third, there are theories that suppose social, legal,
or institutional constraints crowd certain worker types into or out of
particular occupations. Examples include the occupational exclusion
models of Bergmann (1974) and Johnson and Stafford (1998). Finally,
there are statistical theories pioneered by Phelps (1972) and Arrow
(1973). Statistical theories of discrimination focus on the idea that,
when a prospective employee's true ability is unobservable, the
employer may rationally use the employee's ethnic group or gender
as a proxy for his ability.
Our study focuses on an extension to the basic statistical
discrimination model. The initial model by Phelps (1972) simply argued
that high-ability workers in groups with more variability in ability
would earn less than high-ability members of other groups. Lundberg and
Startz (1983) developed a more complex model to show that, even if two
groups possessed the same average ability, a higher variance of ability
in one group would lead to lower wages for members of that group
relative to a low-variance group. Farmer and Terrell (1996) further
extended this model to look at the possibility that inaccurate initial
assessments of ability could become self-fulfilling prophecies. For
example, lower initial assessments of worker ability by employers could
diminish that worker's marginal returns to additional training or
education and thus decrease his incentives to obtain skills. His ability
would then remain low, reinforcing the employer's assessment.
The empirical literature fails to produce a decisive conclusion on
the sources of discrimination or even the extent to which it affects
wages in the United States today. While average wages differ across
groups, wage differences could simply reflect differences in worker
ability. Explanations for differences in ability vary considerably. For
example, Herrnstein and Murray's (1994) controversial book The Bell
Curve: Intelligence and Class Structure in American Life asserts that
races simply differ in inherent ability, while Card and Krueger (1992)
argue that differences in quality of education for black and white
workers explain a significant portion of the wage gap. A critical issue
for empirical studies is that worker ability is unobserved. This makes
it difficult to break wage differences across groups into one portion
that is attributable to differences in ability and a second portion
attributable to pure discrimination. Testing theories that explain
discrimination or sources of differences in ability is even more
difficult. (1)
In this article, we turn to a laboratory experiment as a step
toward understanding the impact of an employer's prior opinions
formed on the basis of an employee's group on wages. The critical
issue is how quickly employers learn about workers' true abilities
through observing noisy information about their performance in the
workplace. If prior opinions are weak, the employer will quickly update
any group-based stereotypes with information from the workplace.
However, if initial assessments are heavily weighted, the initial
perception may lead to persistent differences in wages.
2. The Model
In order to motivate our experiment, we discuss a model based on
Farmer and Terrell (1996) and Lewis and Terrell (2001), who examine a
statistical discrimination framework with Bayesian updating of
employers' beliefs. A large number of employers hire workers for
one period from a large pool of potential employees. The labor market is
competitive, so that workers are paid their expected marginal product in
each period, $w_{it} = E(y_{it})$. The marginal output of worker i
in period t is given by the following production technology:
(1) $y_{it} = A_i^{\alpha} e^{\epsilon_{it}}$, where
$\epsilon_{it} \sim N(0, \sigma^2)$ i.i.d.
The random variable A reflects the ability of all workers with the
same observable characteristics as worker i, while the random variable
$\epsilon$ is an individual-specific component that is normally
distributed and i.i.d. across workers. (2) The values of A and $\epsilon$
are unobservable to the employer, but A can be gradually learned over
time. Note that, because of our assumption about the distribution of
$\epsilon$, one can generate the log-normal distribution of wages
initially observed by Mincer (1974). Taking logarithms in Equation 1
yields
$\log y_{it} = \alpha \log A_i + \epsilon_{it}$.
Without loss of generality, we assume that $\alpha = 1$, so that
$y_{it} = A_i e^{\epsilon_{it}}$ and $\log y_{it} = \log A_i + \epsilon_{it}$.
The normality assumption on $\epsilon$ implies
that the distribution of log output conditional on group log ability is
normal and given by
$(\log y_{it} \mid \log A_i) \sim N(\log A_i, \sigma^2)$,
or more explicitly,
$f(\log y_{it} \mid \log A_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(\log y_{it} - \log A_i)^2\right]$.
We further assume that the (representative) employer gradually
learns about the ability of an employee type by making T sequential
observations of employees' output. This assumption is at the heart
of our investigation in examining the persistence of the employers'
priors about employees' abilities that are initially unobservable.
This assumption, along with the independence property of the assumed
error distribution, implies
$f(\log y_{i1}, \ldots, \log y_{iT} \mid \log A_i) = \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{t=1}^{T} (\log y_{it} - \log A_i)^2\right]$.
The employer's initial beliefs about employees' ability
are characterized by the prior distribution
$\log A_i \sim N(\bar{\mu}, \bar{\sigma}^2)$,
where $\bar{\mu}$ is the mean of a normal prior reflecting the best
guess about employees' group ability, and $\bar{\sigma}^2$ is
the prior variance, which measures how uncertain the prior beliefs are
(a smaller value means a more firmly held prior). $\bar{\mu}$ and
$\bar{\sigma}^2$ are allowed to vary across groups, as
employers' priors depend on employees' group. Employers are
assumed to use Bayesian updating when forming beliefs about the ability
of workers. So, beliefs at time T are calculated as
$f(\log A_i \mid \log y_{i1}, \ldots, \log y_{iT}) = \frac{f(\log y_{i1}, \ldots, \log y_{iT} \mid \log A_i)\, f(\log A_i)}{f(\log y_{i1}, \ldots, \log y_{iT})}$.
This in turn implies
$(\log A_i \mid \log y_{i1}, \ldots, \log y_{iT}) \sim N(\mu_T, \sigma_T^2)$,
where
$\mu_T = \sigma_T^2 \left[\frac{\sum_{t=1}^{T} \log y_{it}}{\sigma^2} + \frac{\bar{\mu}}{\bar{\sigma}^2}\right]$ and
$\sigma_T^2 = \left[\frac{T}{\sigma^2} + \frac{1}{\bar{\sigma}^2}\right]^{-1}$.
The mean of the updated distribution for a worker type's log
ability, $\mu_T$, is a weighted average of predicted ability
based on job performance (given by the term $\sum_{t=1}^{T} \log y_{it}$
weighted by $\sigma_T^2/\sigma^2$) and
prior opinion about ability (given by $\bar{\mu}$ weighted by
$\sigma_T^2/\bar{\sigma}^2$). The variance of the updated
distribution for a worker type's log ability,
$\sigma_T^2$, depends on the variance of mean log ability from
observed output and the variance of prior opinion.
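The updating rule just described can be illustrated numerically. The following Python fragment is a minimal sketch, not part of the original study: the function name posterior and all parameter values (the true log ability, the noise variance, and the prior) are hypothetical choices made only for illustration. It computes the updated mean and variance of log ability from a list of observed log outputs.

```python
import random


def posterior(log_outputs, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of log ability after observing log outputs.

    sigma2 is the variance of the output noise; prior_mean and prior_var
    describe the normal prior on log ability.
    """
    T = len(log_outputs)
    post_var = 1.0 / (T / sigma2 + 1.0 / prior_var)
    post_mean = post_var * (sum(log_outputs) / sigma2 + prior_mean / prior_var)
    return post_mean, post_var


if __name__ == "__main__":
    random.seed(1)
    true_log_ability = 1.0  # hypothetical value, for illustration only
    sigma2 = 0.5            # hypothetical noise variance
    draws = [true_log_ability + random.gauss(0.0, sigma2 ** 0.5) for _ in range(50)]
    # Print the posterior after 0, 1, 5, and 50 observations.
    for T in (0, 1, 5, 50):
        mean, var = posterior(draws[:T], sigma2, prior_mean=0.0, prior_var=0.25)
        print(T, round(mean, 3), round(var, 4))
```

With no observations the function returns the prior itself; as observations accumulate, the posterior variance shrinks and the posterior mean is pulled away from the prior mean toward the average observed log output.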
It is interesting to consider what happens to $\mu_T$ and
$\sigma_T^2$ as $T \to \infty$: that is, as the
amount of information about employees' abilities becomes large. (3)
It is easy to show that $\lim_{T \to \infty} \sigma_T^2 = 0$,
which intuitively implies that, in the limit,
there is no uncertainty in employers' belief about employees'
performance. Deriving $\lim_{T \to \infty} \mu_T$
requires a bit more work. First rewrite $\mu_T$ using the
definition of $\sigma_T^2$ as
(2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]
Because $\sigma_T^2 \to 0$ as $T \to \infty$,
the second term in Equation 2 approaches zero as $T \to \infty$,
and the denominator of the first term approaches one.
Hence,
(3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.],
as $\epsilon_{it} \sim N(0, \sigma^2)$ and therefore
$\lim_{T \to \infty} \left(\sum_{t=1}^{T} \epsilon_{it}\right)/T = 0$.
The main intuition behind Equation 3 in light of the mean predicted
ability's two components (job performance and prior opinion) is as
follows: First, prior beliefs about ability become unimportant as the
employer's number of observations of the
employee's output grows without bound (as $T \to \infty$, the second term
in Equation 2 approaches zero). Second, mean predicted ability based on
job performance, in the limit, converges to log ability of the group (as
$T \to \infty$, the first term in Equation 2 approaches
$\log A_i$).
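The displayed expressions for Equations 2 and 3 did not survive in this text. One form consistent with the surrounding description (a first term whose denominator approaches one and a second term that vanishes with the updated variance) is sketched below. It is our reconstruction from the stated definitions of the updated mean and variance, not necessarily the authors' exact expression.

```latex
% Reconstruction (assumption): substitute
% \sigma_T^2 = [T/\sigma^2 + 1/\bar{\sigma}^2]^{-1} into the first term
% of \mu_T and divide numerator and denominator by T.
\mu_T = \frac{\frac{1}{T}\sum_{t=1}^{T}\log y_{it}}
             {1 + \frac{\sigma^2}{T\bar{\sigma}^2}}
        + \sigma_T^2\,\frac{\bar{\mu}}{\bar{\sigma}^2}

% Using \log y_{it} = \log A_i + \epsilon_{it} and letting T \to \infty:
\lim_{T\to\infty}\mu_T
  = \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}
      \left(\log A_i + \epsilon_{it}\right)
  = \log A_i
```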
Using Bayesian updating in the specification of employer beliefs
has the advantage of implying that, if there are systematic differences
in ability across worker types, employers gradually learn to show
preference for the higher ability types and, thus, to be willing to pay
them higher wages. Unfortunately, the speed at which this happens
depends on characteristics of the employer that are unobservable to the
researcher--in particular, how heavily they weight their prior
probabilities, relative to the information they receive in each period.
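The dependence of learning speed on the strength of the prior can be made concrete. In the updating rule of this section, the weight placed on the prior mean equals the updated variance divided by the prior variance. The sketch below, with hypothetical function names and illustrative variance values that are not taken from the experiment, counts how many observations are needed before that weight falls to a chosen threshold.

```python
def prior_weight(T, sigma2, prior_var):
    """Weight on the prior mean in the updated mean after T observations."""
    post_var = 1.0 / (T / sigma2 + 1.0 / prior_var)
    return post_var / prior_var


def observations_needed(threshold, sigma2, prior_var, max_T=10_000):
    """Smallest T at which the prior's weight is at or below threshold.

    Returns None if the threshold is not reached within max_T observations.
    """
    for T in range(max_T + 1):
        if prior_weight(T, sigma2, prior_var) <= threshold:
            return T
    return None


if __name__ == "__main__":
    # Hypothetical comparison: a diffuse prior versus a firmly held prior,
    # with the same noise variance in observed output.
    print(observations_needed(0.25, sigma2=1.0, prior_var=1.0))
    print(observations_needed(0.25, sigma2=1.0, prior_var=0.25))
```

Holding the noise in observed output fixed, a smaller prior variance (a more firmly held prior) requires more observations before the prior loses a given share of its influence, which is the sense in which heavily weighted initial assessments can persist.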
Bayesian updating is probably the model most commonly used by economic
theorists for combining old and new probability information.
However, its success as a descriptive theory is mixed. The psychology
and behavioral economics literatures are replete with examples in which
individuals, even when given complete descriptions of a
probability-updating problem including both base rates (which are
equivalent to employers' priors in our model) and likelihood
information (equivalent to the observed new productivity information in
our model), underweight base rates relative to the likelihood
information, and other examples in which they overweight the base rates.
See Camerer (1995) for a thorough survey of experimental
studies of individual decision making in economic situations. On the
other hand, Bayesian updating has been used successfully by some
researchers for describing individual decision making in probabilistic situations (see, e.g., Anderson and Holt 1997).
3. The Decision Problem Used in the Experiment
The experiment was designed in an attempt to capture a simplified
version of the decision problem faced by an employer in the above model,
while avoiding obviously loaded terms. All subjects in the experiment
faced the same decision problem, which we now describe. In each of nine
rounds, subjects were presented with two buckets, each of which
contained 50 cards. Subjects were asked to draw a total of four cards
(with replacement) from the two buckets; they could draw all four from a
single bucket, two from each bucket, or three from one bucket and one
from the other. Each bucket is meant to correspond to a group of workers
sharing some observable characteristic; the individual cards represent
individual workers having that characteristic (i.e., workers of a given
type). Each card had a number printed on it, representing the true
marginal productivity of that worker. (Thus, the mean of the numbers in
a bucket represents the average ability of that worker type.) The
subject's total revenue (in points) was the sum of the numbers on
the four cards drawn. Her total cost was determined by the number of
cards drawn from each bucket; it cost 60 points to draw two from each
bucket, 70 points to draw three and one, and 100 points to draw entirely
from one bucket. The subject's profit in a round was her total
revenue minus her total cost.
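The payoff rule for a single round can be written out as a short calculation. The following Python sketch encodes the cost schedule stated above (60, 70, or 100 points depending on how the four draws are split); the function name and the card values in the example are hypothetical.

```python
# Total cost in points, indexed by the number of cards drawn from one bucket.
# The schedule is symmetric: 4-0 and 0-4 cost 100, 3-1 and 1-3 cost 70, 2-2 costs 60.
COST_BY_SPLIT = {0: 100, 1: 70, 2: 60, 3: 70, 4: 100}


def round_profit(cards_bucket1, cards_bucket2):
    """Profit in points for one round, given the numbers on the drawn cards."""
    if len(cards_bucket1) + len(cards_bucket2) != 4:
        raise ValueError("exactly four cards must be drawn in a round")
    revenue = sum(cards_bucket1) + sum(cards_bucket2)
    cost = COST_BY_SPLIT[len(cards_bucket1)]
    return revenue - cost


if __name__ == "__main__":
    # Hypothetical card values, for illustration only.
    print(round_profit([50, 60], [30, 40]))
    print(round_profit([50, 60, 70, 80], []))
```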
Because subjects draw four cards in each round and are not allowed
to hold onto cards for future rounds, we are making the implicit
assumption that firms hire new workers in every round. Notice also that
we do not address wages here, but rather only demand for employees of a
given type. Of course, unless labor supply is infinitely elastic, there
will be a positive relationship between labor demand and equilibrium
wages. Because each bucket contains a nontrivial distribution of cards,
subjects (managers) are unable to know ahead of time exactly how
productive a given worker will be. But, if the distributions are
different across buckets, as they are in the first six rounds, the
bucket from which a card is drawn (i.e., the type of the worker)
contains some information about the worker's expected productivity.
The rationale for costs increasing as more cards are drawn from the same
bucket is to model diminishing returns in a particular type of worker.
An additional consequence of these increasing costs is that, when the
two buckets have the same distribution of cards, it is strictly optimal
to draw equally from both buckets.
There were three distributions of cards used (high, medium, and
low), which could be ordered by stochastic dominance. These distributions
are shown in Figure 1. (4) In the first six rounds, one bucket contained
a high distribution and the other a low distribution of cards; we will
refer to these as bucket 1 and bucket 2, respectively. In the last three
rounds, both buckets contained medium distributions (i.e., there was no
difference in distribution across buckets in these rounds). The two
buckets were differently colored (one green and one tan), so subjects
could easily tell them apart. The distributions were chosen so that (1)
profits were guaranteed to be nonnegative; (2) if the subjects had
perfect information about the distributions, the optimal choice for the
first six rounds would be to choose all four cards from bucket 1; and
(3) if the subjects had perfect information about the distributions, the
optimal choice for the last three rounds would be to choose two cards
from each bucket.
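Properties (2) and (3) can be checked against the cost schedule for any pair of bucket means. The sketch below assumes a subject who maximizes expected profit (risk neutrality is our simplifying assumption) and uses hypothetical mean card values, since the actual distributions appear only in the omitted figure.

```python
# Total cost in points by number of cards drawn from bucket 1 (out of four).
COST = {0: 100, 1: 70, 2: 60, 3: 70, 4: 100}


def expected_profit(k, mean1, mean2):
    """Expected profit from drawing k cards from bucket 1 and 4 - k from bucket 2."""
    return k * mean1 + (4 - k) * mean2 - COST[k]


def best_allocation(mean1, mean2):
    """Number of bucket 1 draws that maximizes expected profit."""
    return max(range(5), key=lambda k: expected_profit(k, mean1, mean2))


if __name__ == "__main__":
    # Hypothetical bucket means, for illustration only.
    for mean1, mean2 in [(50, 50), (70, 45), (80, 40)]:
        print(mean1, mean2, best_allocation(mean1, mean2))
```

Under this criterion, moving from two to three bucket 1 draws raises cost by 10 points and moving from three to four raises it by a further 30, so drawing all four cards from bucket 1 is optimal only when its mean exceeds the other bucket's mean by more than 30 points. With equal means, the cheapest split (two from each bucket) is strictly best.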
[FIGURE 1 OMITTED]
The motivation for the round-to-round sequence of distributions we
used was to allow the subjects to build up experience of one bucket
being noticeably better than the other one, and then to see how they
respond to a situation in which neither bucket is better on average
(though, even in this latter case, because of the randomness in the
distributions, it may seem to a given subject that one or the other
bucket is better). This corresponds to a situation an employer might
face where one type of worker has historically been more productive than
another (though there is variability in productivity within a type), but
there is no longer a difference between the types.
4. Experimental Procedures
The experimental sessions were conducted at Louisiana State
University and at the University of Houston. Each subject was seated at
a desk and given written instructions and a record sheet on which to
record decisions and resulting outcomes. (5) These instructions were
then read aloud, and any questions were answered, prior to the first
round of play.
The experiment was conducted with pen and paper. During a given
round of play, each subject decided how many cards to draw out of each
bucket, and circled the chosen buckets on her record sheet. The monitor
would then go to the subject's desk and the subject would draw from
the appropriate buckets, one at a time. After drawing a card from a
bucket, the subject would record the card's number on her record
sheet and then replace the card in the bucket before drawing again.
After drawing four cards and recording the results in this way, the
subject would fill in the entries for total revenue, total cost, and
profit, and the monitor would move on to the next subject.
After the third and sixth rounds, it was announced that the cards
in the buckets would be replaced, and subjects were able to observe the
monitor putting new cards into the buckets. Announcing changes of the
distributions is, of course, a departure from the situation faced by
actual employers, who typically would not have this information.
However, the announcement was necessary to avoid deceiving the subjects;
deception is
generally considered bad methodology by experimental economists. The
cards used in rounds 4-6 had the same distributions as those used in
rounds 1-3, as mentioned previously; the distributions were changed for
rounds 7-9 (see Table 1 and Figure 1).
Within a three-round block, it was known by the subjects that the
distributions of cards in the buckets did not change, so that the
results of the first round in a block would be useful for making
decisions in the second round in that same block, and the results of the
first two rounds would be useful for making decisions in the third round
in that block. Because the results of each round were recorded on the
subjects' record sheets, it was easy for them to consult earlier
results if they
wished. After the third round in a block was over and the cards were
physically changed, it should have been much less apparent to subjects
that previous results would be useful (though in rounds 4-6, they would
have been, and they might think so in rounds 7-9 also).
After the ninth round was over, the session ended. Subjects were
paid a $2.00 showup fee; in addition, one round was randomly chosen, and
subjects were paid their profit in that round, at the exchange rate of 5
cents per point. Subjects earned an average of about $9.00 for
participating in an experimental session; they were paid in cash
immediately following the session.
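The payment rule amounts to simple arithmetic, sketched below in Python. Amounts are kept in cents to avoid rounding; the function names are hypothetical, and the profit figures in the example are illustrative rather than observed data.

```python
import random

SHOWUP_FEE_CENTS = 200   # $2.00 showup fee
CENTS_PER_POINT = 5      # exchange rate stated in the text


def payment_for_profit(profit_points):
    """Cash payment in cents when the paid round earned profit_points."""
    return SHOWUP_FEE_CENTS + CENTS_PER_POINT * profit_points


def session_payment(round_profits, rng=random):
    """Pick one round at random and pay its profit plus the showup fee."""
    paid_round = rng.randrange(len(round_profits))
    return payment_for_profit(round_profits[paid_round])


if __name__ == "__main__":
    # Hypothetical profits for nine rounds, for illustration only.
    profits = [120, 150, 160, 130, 170, 180, 140, 135, 145]
    print(session_payment(profits) / 100.0)
```

By this rule, average earnings of about $9.00 correspond to a profit of roughly 140 points in the paid round.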
Our hypotheses were as follows. First, within a block of three
rounds (rounds 1-3, 4-6, and 7-9), if subject behavior is originally
different from optimal behavior, it will tend to move in the direction
of optimal behavior: choosing entirely from bucket 1 in the first six
rounds, and choosing equally from both buckets in the last three rounds.
Second, and more interesting, we expected that the results of the first
six rounds would lead to subjects having inaccurate (though reasonable,
given their results up to that point) priors for the last three rounds,
and would affect their behavior accordingly. In particular, the early
experience of bucket 1 being better would lead subjects to choose bucket
1 more often than bucket 2, even when it is no longer better.
5. Results
A total of 36 subjects participated in the experiment. Figure 2
shows some features of the experimental data. Shown in this figure are
the number of subjects choosing bucket 1 (the bucket with the higher
distribution of cards, when the buckets had different distributions) in
each round. These distributions are represented by the open circles; the
area of a circle is proportional to the number of subjects making that
choice in that round. Also shown is the average frequency of bucket 1
choices by all subjects in each round (closed circles; see also Table
2). If subjects actually knew the distributions in the buckets, they
should optimally choose to draw all four cards from bucket 1 in the
first six rounds, and two cards from each bucket in the last three
rounds. In fact, they didn't know the distributions, but the data
are consistent with their learning these distributions. In round 1, most
subjects draw equal numbers of cards from each bucket, but the number
drawn from bucket 1 increases from round 1 to round 3, where the modal choice is all four cards from bucket 1. In round 4, when new cards are
put into the buckets, the modal choice falls back to two cards from each
bucket. Again, the number drawn from bucket 1 increases from round 4 to
round 6, until the modal choice in round 6 is all four cards from bucket
1. In round 7, when new cards are again put into the buckets (this time
with identical distributions in the two buckets), the modal choice again
falls back to two. It remains at two for the remaining rounds.
The increase in bucket 1 choices from round 1 to round 3 is
statistically significant (Page test for ordered alternatives, p
≈ 0.001), as is the increase from round 4 to
round 6 (Page test, p ≈ 0.002). (See Siegel and
Castellan 1988 for descriptions of the nonparametric statistical tests
used in this article.) There is no significant increase over the last
three rounds according to the Page test (p ≈
0.509), and even according to the weaker Friedman two-way analysis of
variance test, there is no significant difference in the level of bucket
1 choices over these rounds (p ≈ 0.18).
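For readers unfamiliar with the Page test, the following Python sketch computes Page's L statistic for a block of rounds, with one row per subject and one column per round. It uses the large-sample normal approximation without a correction for ties, which is our simplification and may differ from the procedure used to obtain the p-values reported here. The data in the example are invented for illustration.

```python
import math


def midranks(values):
    """Rank values from 1 to k, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2.0 + 1.0
        for j in order[pos:end + 1]:
            ranks[j] = average_rank
        pos = end + 1
    return ranks


def page_test(data):
    """Return (L, z, one-sided p) for rows of data ordered by hypothesized increase."""
    n, k = len(data), len(data[0])
    column_rank_sums = [0.0] * k
    for row in data:
        for j, rank in enumerate(midranks(row)):
            column_rank_sums[j] += rank
    L = sum((j + 1) * total for j, total in enumerate(column_rank_sums))
    mean_L = n * k * (k + 1) ** 2 / 4.0
    var_L = n * k ** 2 * (k + 1) * (k ** 2 - 1) / 144.0
    z = (L - mean_L) / math.sqrt(var_L)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return L, z, p


if __name__ == "__main__":
    # Invented bucket 1 draw counts for five subjects over three rounds.
    choices = [[2, 3, 4], [2, 2, 3], [1, 3, 3], [2, 4, 4], [3, 2, 4]]
    print(page_test(choices))
```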
In every round, the modal choice is also the median choice, and
looking at the means doesn't affect these qualitative conclusions
substantially. In essence, when bucket 1 contains a higher distribution
of cards than bucket 2, subjects learn to choose bucket 1 more and more
often; when the buckets contain the same distribution of cards, subjects
continue to choose both buckets roughly equally, on average. In other
words, consistent with our first hypothesis, subjects' choices
moved toward optimal play (though not actually reaching it).
It is harder to detect evidence consistent with our second
hypothesis. The number of bucket 1 choices in round 4 is slightly higher
than in round 1, which could be taken as evidence that the results of
rounds 1-3 are carrying over into subjects' beliefs in round 4.
However, we don't see the same effect in round 7; there are
actually fewer bucket 1 choices then (though neither the change from
round 1 to round 4 nor that from round 4 to round 7 is significant).
One aspect of the data that is consistent with early round results
influencing later round play can be seen in round 8. Even though the
average choice over the last three rounds remains roughly constant at
two draws from each bucket, the shape of the distribution varies
noticeably over these rounds. Notice from Table 2 that the standard
deviation of bucket 1 choices increases sharply from round 7 to round 8,
then decreases again in round 9 (this change is also apparent in Figure
2). That is, in round 8, the average of two bucket 1 choices conceals a
substantial number of choices of more than two or fewer than two draws
from bucket 1. It is possible that the difference in distributions
between buckets in the first six rounds leads subjects initially (in
round 7) to expect that there will be a difference in the last three
rounds also, even if they don't know which bucket has the higher
distribution. If so, subjects' choices in round 8 should be highly
dependent on the results of their own draws in round 7. This is indeed
the case. Subjects earning higher average payoffs from their bucket 1
draws than their bucket 2 draws in round 7 chose bucket 1 in round 8
roughly 55% of the time, while those subjects earning higher payoffs
from their bucket 2 draws chose bucket 1 only about 44% of the time. The
difference between these conditional relative frequencies is significant
at the 10% level (robust rank-order test, p ≈ 0.075).
[FIGURE 2 OMITTED]
If we look only at subjects drawing two cards from each bucket in
round 7, this result becomes even more stark (see Figure 3). Of these
subjects, those choosing in round 8 to draw three cards from bucket 1
had earned on average 6.56 more points from bucket 1 than bucket 2 in
round 7. Those continuing in round 8 to draw two cards from each bucket
had earned on average 0.9 fewer points from bucket 1 than bucket 2 in
round 7, and those drawing only one card from bucket 1 in round 8 had
earned 6.8 fewer points from bucket 1 than bucket 2 in round 7. The
distribution of round-7 payoff differentials for those choosing one card
from bucket 1 in round 8 is significantly different from the
distributions for both those
choosing two cards (robust rank-order test, p < 0.10) and those
choosing three cards (robust rank-order test, p < 0.05).
[FIGURE 3 OMITTED]
We also find a difference if, instead of looking at the number of
draws from each bucket in round 8, we look at the change in the number
of draws from bucket 1 from round 7 to round 8. This will naturally be
correlated with the number of draws from bucket 1 in round 8, but may be
a better measure of learning because it treats an increase from, say,
one to two bucket 1 choices the same as an increase from two to three.
We find that those subjects earning higher payoffs from their bucket 1
draws increased the number of their draws from bucket 1 by an average of
roughly 0.53 draws, while those earning higher payoffs from their bucket
2 draws decreased the number of their draws from bucket 1 by an average
of roughly 0.15 draws. This difference in behavior is also significant
(robust rank-order test, p ≈ 0.023).
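The conditional comparison described in this paragraph can be computed as follows. This Python sketch is only illustrative: the record layout and field names are hypothetical, the example records are invented, and subjects who drew from a single bucket in round 7 (and so cannot compare average payoffs across buckets) or whose two averages tie are left out.

```python
def mean_change_by_round7_signal(records):
    """Average change in bucket 1 draws from round 7 to round 8, by round-7 signal.

    Each record is a dict with hypothetical keys:
      'avg1_r7', 'avg2_r7': average payoff per card from each bucket in round 7
                            (None if no card was drawn from that bucket);
      'draws1_r7', 'draws1_r8': number of bucket 1 draws in rounds 7 and 8.
    """
    changes = {"bucket1_better": [], "bucket2_better": []}
    for rec in records:
        avg1, avg2 = rec["avg1_r7"], rec["avg2_r7"]
        if avg1 is None or avg2 is None or avg1 == avg2:
            continue  # no usable comparison for this subject
        key = "bucket1_better" if avg1 > avg2 else "bucket2_better"
        changes[key].append(rec["draws1_r8"] - rec["draws1_r7"])
    return {key: (sum(vals) / len(vals) if vals else None)
            for key, vals in changes.items()}


if __name__ == "__main__":
    # Invented records, for illustration only.
    sample = [
        {"avg1_r7": 40, "avg2_r7": 30, "draws1_r7": 2, "draws1_r8": 3},
        {"avg1_r7": 45, "avg2_r7": 20, "draws1_r7": 2, "draws1_r8": 2},
        {"avg1_r7": 25, "avg2_r7": 35, "draws1_r7": 2, "draws1_r8": 1},
        {"avg1_r7": None, "avg2_r7": 30, "draws1_r7": 0, "draws1_r8": 1},
    ]
    print(mean_change_by_round7_signal(sample))
```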
If these changes from round 7 to round 8 are indeed due to
subjects' learning to expect one bucket to contain a higher
distribution of cards than the other, then we would expect similar
changes from round 4 to round 5. This seems to be the case. Only one
subject earned less from bucket 1 than bucket 2 in round 4; this person
chose entirely from bucket 2 in round 5. Of the 31 subjects earning more
from bucket 1 than bucket 2 in round 4 (four others only chose from one
bucket, and thus couldn't compare their payoffs across buckets),
eight had an average payoff from bucket 1 that was between 10 and 20
higher than that of bucket 2; these increased their bucket-1 choices an
average of 0.500 in round 5. The eight subjects who had a round-4
average payoff from bucket 1 that was between 20 and 30 higher increased
their bucket-1 choices an average of 0.625, and the 15 whose average
payoff from bucket 1 was more than 30 higher increased their bucket-1
choices an average of 1.000 in round 5.
If we focus on subjects who had drawn two cards from each bucket in
round 4, we see a similar result. The one subject to draw zero cards
from bucket 1 in round 5 had earned 10 fewer points from bucket 1 than
bucket 2 in round 4. Those continuing in round 5 to draw two cards from
each bucket had earned on average 20 more points from bucket 1 than
bucket 2 in round 4, those drawing three cards from bucket 1 in round 5
had earned 26.875 more points from bucket 1 than bucket 2, and those
drawing all four cards from bucket 1 in round 5 had earned 40 more
points from bucket 1 than bucket 2. Because of small subsample sizes
(especially the one person choosing zero from bucket 1 in round 5), not
all differences are significant, but the distribution of payoff
differences is higher for the subjects drawing four cards from bucket 1
than for those drawing two or three cards from bucket 1 (robust
rank-order test, p = 0.05 and p < 0.05, respectively) and higher for
those choosing three cards than for those choosing zero (Fisher exact
test, p ≈ 0.076).
However, it is not only the contrast between the +0.53 difference
and the -0.15 difference in rounds 7-8 that is of interest here. The
absolute magnitudes of the two numbers are also worth considering.
Subjects are likely to increase their draws from bucket 1 more--in
response to favorable information from that bucket (relative to the
other bucket)--than they are to increase their draws from bucket 2 in
the opposite case. This may also be a vestige from the first six rounds;
after so much experience of bucket 1 always being better, it may take
less new information to convince subjects that bucket 1 is again better
than is needed to convince them that bucket 2 is better.
6. Discussion
Our results can be summarized as follows. In the first six rounds,
bucket 1 contains a higher distribution of payoffs than bucket 2. In
rounds 1-3 and again in rounds 4-6, subjects learn quickly to choose
bucket 1 most of the time. In the last three rounds, the two buckets
contain the same distribution of payoffs. In these three rounds,
subjects choose the two buckets roughly equally. The implications of
these results for our labor market model are that, when workers'
observable characteristics are informative, though possibly noisy,
signals of their ability, employers learn this, so that the market
demand for higher ability workers increases and the demand for lower
ability workers decreases. The difference in demands should lead to a
difference in wages (though as already mentioned, our experiment looks
only at demands, not wages). When workers' observable
characteristics are unrelated (on average) to their ability, market
demands for the two types of worker stay roughly equal, so there should
be no resulting wage difference. More precisely, any observed wage
difference will be due to other factors.
The results of this experiment provide only weak evidence in favor
of our main hypothesis. We were interested in showing that experience in
an environment where one bucket yielded higher expected payoffs than the
other would carry over into an environment in which both buckets yielded
the same expected payoffs. In particular, it was expected that subjects
who learned to choose all, or nearly all, cards from bucket 1 would
continue to do so, even when bucket 1 was no better on average than
bucket 2. This did not happen in the experiment; once the identical
distributions were introduced into the buckets, the average behavior of
subjects was roughly two draws from each bucket.
The closest we found to an effect from the first six rounds
carrying over into the last three rounds was only a second-order effect.
For the most part, subjects' choices in the eighth round were
highly dependent on their seventh-round results. Those who obtained
higher payoffs from bucket 1 in round 7 were more likely to increase
their choices from bucket 1 and to choose more than two cards from
bucket 1 in round 8 (as already mentioned, these two effects are highly
correlated). On the other hand, those subjects who obtained higher
payoffs from bucket 2 in round 7 were more likely to decrease their
choices from bucket 1 and to choose fewer than two cards from bucket 1
in round 8. In addition, the former increases were larger (in absolute
terms) than the latter decreases. This provides some small evidence that
the results from earlier rounds were affecting behavior; while the
experience of bucket 1 being better didn't result in more bucket 1
choices initially (in round 7), subjects may have had a higher
propensity to believe favorable information from bucket 1, so that
relatively high payoffs from bucket 1 seem to cause a greater change in
future bucket-1 choices than relatively low payoffs from bucket 1.
While the experimental results provide little support for the
hypothesis that wage differentials are due to persistent incorrect prior
beliefs by employers, it should be emphasized that our experiment was a
severe test of the statistical discrimination model. First, real
employers would have had much more time to form priors than the six
periods subjects had in the experiment. Second, employers would not
receive signals that the environment had changed, as our subjects did
after the third and sixth rounds. Third, the change from different
average productivities to equal average productivities between the
buckets happened abruptly; in reality, average productivities (and hence
their differences) change gradually. The first two of these differences
between our experiment and employers' reality probably led to
weaker priors in rounds 4 and 7 of the experiment, and the third
difference probably led to faster updating from rounds 7 to 9. (6)
Therefore, we don't expect that this experiment will settle the
question of the causes of wage differentials across worker types;
rather, we hope that it contributes some understanding toward this issue
and that it will stimulate further research.
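The argument that weaker priors are revised more quickly can be illustrated with a minimal Python sketch of Bayesian updating. It assumes a normal prior on the difference in average payoffs between the two buckets and normally distributed signals with known precision; the function name and all numerical values are hypothetical and are not taken from the experiment.

```python
def posterior_mean(prior_mean, prior_precision, observations, obs_precision):
    """Posterior mean of an unknown normal mean under a normal prior.

    Assumes each observation is the true mean plus normal noise with
    known precision (1 / variance). This is the textbook conjugate
    update, used here only as an illustration.
    """
    n = len(observations)
    total_precision = prior_precision + n * obs_precision
    weighted_sum = prior_mean * prior_precision + obs_precision * sum(observations)
    return weighted_sum / total_precision


# Hypothetical scenario: the prior says bucket 1 pays 10 points more than
# bucket 2, but three signals in the new environment show no difference.
observed_differences = [0.0, 0.0, 0.0]

# Weak prior (precision 1): belief falls to 10 / 4 = 2.5.
weak_prior_belief = posterior_mean(10.0, 1.0, observed_differences, 1.0)

# Strong prior (precision 10): belief falls only to 100 / 13, about 7.69.
strong_prior_belief = posterior_mean(10.0, 10.0, observed_differences, 1.0)

print(weak_prior_belief, strong_prior_belief)
```

Under these assumptions, the same three signals move the weak prior most of the way toward zero, while the strong prior retains most of the initial perceived difference.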
Appendix
General Instructions
You are about to participate in an experiment in the economics of
decision making. If you follow these instructions carefully and make
good decisions, you might earn a considerable amount of money. If you
have a question at any time, please feel free to ask the experimenter.
The Decision Task
This experimental session consists of a number of rounds. In each
round, the experimenter will carry two buckets, one tan and one green.
The buckets contain cards with numbers printed on them. You will be
asked to draw a total of four (4) cards from these two buckets. You may
choose to draw all four cards from one bucket, or you may choose to draw
cards from both buckets, as long as the total number of cards you draw
is exactly four. Your total revenue, measured in points, is the sum of
the numbers printed on the four cards. Your total cost depends on how
you choose to draw cards from the buckets:
Your Choice                                  Total Cost
Two cards drawn from each bucket             60 points
Exactly three cards drawn from one bucket    70 points
All four cards drawn from one bucket         100 points
Some Information About the Cards
In each bucket will be fifty (50) cards, each with a number printed
on it. Different cards within each bucket will generally have different
point values. The minimum number of points on a card is 25, and the
maximum is 75. The distribution of point values in one bucket may be
different from that in the other bucket. In each bucket, the same set of
cards will be used in each round unless you are told otherwise.
Therefore, the cards you draw in early rounds will give you some
information about what cards you might draw in later rounds, unless you
are told that the cards have been changed. In each round, all players
will choose between the same buckets with the same cards in them.
Record Keeping
You have been given a record sheet with spaces to write your
choices and the resulting outcomes. In each round, circle the letters in
the Draws columns corresponding to the color of each bucket you choose
from, and below each choice, write in the number of points earned. After
you have chosen all four of your cards for the round, fill in the last
four columns.
Payments
You will each receive $2.00 for participating in and completing the
experiment. In addition, one round will be chosen at random from the
rounds that have been played, and you will earn 5 cents for each point
of profit you received in that round (100 points = $5.00). Your earnings
will be paid to you in cash at the end of the experimental session. Your
profit in a round is your total revenue minus your total cost. Different
cards have different amounts printed on them, so your profit will be
based (to some extent) on luck. However, the amounts on the cards have
been chosen so that you are guaranteed to earn either zero or positive
profit.
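As an illustration of the payoff rules stated in the instructions above, the following Python sketch computes a subject's cost, profit, and cash payment for one round. The function names are hypothetical and were not part of the experimental materials; the point values, cost schedule, and payment rates are those given in the instructions.

```python
def total_cost(cards_from_one_bucket):
    """Cost in points of drawing four cards, given how many come from one bucket."""
    if not 0 <= cards_from_one_bucket <= 4:
        raise ValueError("the number of cards from a bucket must be between 0 and 4")
    # The cost depends only on how concentrated the draws are.
    larger_share = max(cards_from_one_bucket, 4 - cards_from_one_bucket)
    return {2: 60, 3: 70, 4: 100}[larger_share]


def round_profit(card_values, cards_from_one_bucket):
    """Profit in points: the sum of the four card values minus the total cost."""
    if len(card_values) != 4:
        raise ValueError("exactly four cards are drawn in each round")
    if any(not 25 <= value <= 75 for value in card_values):
        raise ValueError("card values lie between 25 and 75 points")
    return sum(card_values) - total_cost(cards_from_one_bucket)


def cash_payment(profit_points, participation_fee=2.00, dollars_per_point=0.05):
    """Cash earnings: the participation fee plus 5 cents per point of profit."""
    return participation_fee + dollars_per_point * profit_points


# Worst case: four 25-point cards drawn from a single bucket yield zero profit,
# which is why profit is never negative.
print(round_profit([25, 25, 25, 25], 4))
```

Because the lowest possible revenue (4 x 25 = 100 points) equals the highest possible cost (100 points), the sketch reproduces the guarantee of nonnegative profit.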
Table 1. Distributions of Cards Used in the Experiment
Rounds   Bucket 1 Cards   Bucket 2 Cards
1-3      High             Low
4-6      High             Low
7-9      Medium           Medium
Table 2. Descriptive Statistics
Round   Median   Mean   Standard Error
1       2        2.11   0.82
2       3        3.03   0.70
3       4        3.64   0.54
4       2        2.31   0.86
5       3        3.06   0.83
6       4        3.56   0.65
7       2        1.78   0.64
8       2        2.08   0.84
9       2        2.06   0.53
We thank John Duffy, Chinhui Juhn, Sudipta Sarangi, Dek Terrell,
and an anonymous referee for helpful comments and discussions.
(1) Altonji and Pierret (2001) attempt to measure the ability of
statistical discrimination to explain racial differences in wages. One
of their findings is that either firms do not statistically discriminate on the basis of race or there is little correlation between race and
productivity among workers.
(2) The models of Farmer and Terrell (1996) and Lewis and Terrell
(2001) add an endogenous individual-specific human-capital parameter
$Z^{it}$, which affects the worker's marginal product. Because we
are not modeling workers' decisions in this context and because
workers are employed by a firm for only one period, we can set
$Z^{it} = 1$ to obtain our simpler model.
(3) We thank the referee for suggesting this analysis.
(4) One departure of our experiment from the model is the
nonnormality of the distribution of the $\epsilon$'s. Some aspects
of the nonnormality were necessary, such as the discreteness of the
distribution (an infinite number of cards would be time consuming to
prepare) and its boundedness (negative profits were avoided, as well as
potentially large positive ones). We also thought it desirable to give
all three distributions the same support, so that any inferences about
which distribution was better could be made only probabilistically. This
last desideratum is the reason for the high skewness of the high and low
distributions.
(5) Sample instructions can be found in the Appendix. Additionally,
sample record sheets and the raw data from the experiment are available
from the authors upon request. Notice that our instructions and record
sheets use context-free language. We wanted to avoid any language that
would allow subjects to figure out that the subject of this experiment
was labor-market discrimination, for fear that demand effects would be
strong.
(6) The referee has pointed out that having more similar
distributions in rounds 1-6 would have lessened this third difference,
though probably at the cost of making it more difficult for subjects to
learn in these rounds that bucket 1 contains higher average payoffs.
References
Altonji, Joseph G., and Rebecca M. Blank. 1999. Race and gender in
the labor market. In Handbook of labor economics, volume 3C, edited by
Orley C. Ashenfelter and David Card. Amsterdam: North-Holland, pp.
3143-259.
Altonji, Joseph G., and Charles R. Pierret. 2001. Employer learning
and statistical discrimination. Quarterly Journal of Economics 116:313-50.
Anderson, Lisa, and Charles A. Holt. 1997. Information cascades in
the laboratory. American Economic Review 87:847-62.
Arrow, Ken. 1973. The theory of discrimination. In Discrimination
in labor markets, edited by Orley C. Ashenfelter and Albert E. Rees.
Princeton, NJ: Princeton University Press, pp. 3-33.
Becker, Gary. 1972. The economics of discrimination. Chicago: The
University of Chicago Press.
Bergmann, Barbara. 1974. Occupational segregation, wages and
profits when employers discriminate by race or sex. Eastern Economic
Journal 1:103-10.
Camerer, Colin. 1995. Individual decision making. In Handbook of
experimental economics, edited by John Kagel and Alvin E. Roth.
Princeton, NJ: Princeton University Press, pp. 587-703.
Card, David, and Alan B. Krueger. 1992. School quality and black-white
relative earnings: A direct assessment. Quarterly Journal of Economics
107:151-200.
Farmer, Amy, and Dek Terrell. 1996. Discrimination, Bayesian
updating of employer beliefs, and human capital accumulation. Economic
Inquiry 34:204-19.
Herrnstein, Richard J., and Charles Murray. 1994. The bell curve:
Intelligence and class structure in American life. New York: Free Press.
Johnson, George E., and Frank P. Stafford. 1998. Alternative
approaches to occupational exclusion. In Women's work and wages,
edited by Anna Bugge, Christina Jonung, Knut Wicksell, and Inga Persson.
London: Routledge, pp. 72-88.
Kessel, Reuben A. 1958. Price discrimination in medicine. Journal
of Law and Economics 1:20-53.
Lewis, Danielle, and Dek Terrell. 2001. Experience, tenure, and the
perceptions of employers. Southern Economic Journal 67:578-97.
Lundberg, Shelly J., and Richard Startz. 1983. Private
discrimination and social intervention in competitive labor markets.
American Economic Review 73:340-7.
Mincer, Jacob. 1974. Schooling, experience and earnings. New York:
Columbia University Press.
Phelps, Edmund S. 1972. The statistical theory of racism and
sexism. American Economic Review 62:659-61.
Robinson, Joan. 1934. The economics of imperfect competition.
London, UK: Macmillan.
Ross, Malcolm. 1948. All manner of men. New York: Reynal and
Hitchcock.
Siegel, Sidney, and N. James Castellan, Jr. 1988. Nonparametric
statistics for the behavioral sciences. New York: McGraw-Hill.
Nick Feltovich * and Chris Papageorgiou ([dagger])
* Department of Economics, University of Houston, Houston, TX
77204, USA; E-mail
[email protected].
([dagger]) Department of Economics, Louisiana State University,
Baton Rouge, LA 70803, USA; E-mail
[email protected]; corresponding author.
Received October 2002; accepted August 2003.