  Does the "blindness" of peer review influence manuscript selection efficiency?
  Piette, Michael J.
  Southern Economic Journal
  • Print ISSN: 0038-4038
  1994
  April
  • Language: English
  • Publisher: Southern Economic Association
  • Keywords: Experimental design; Manuscripts; Periodicals; Research design

The results of the experiment do indicate, contrary to the expectation of some (including myself), that the refereeing process does have an effect on which papers we decide to publish. I believe it was this finding, despite the ambiguous findings regarding the nature of any biases in the decisions, which motivated the vote by our Board of Editors.

Orley Ashenfelter [1, 594]

I. Introduction

According to Ashenfelter [1], the recent decision by the Board of Editors of the American Economic Review to adopt a double-blind review policy was based, in his opinion, on the findings reported by Blank [3]. In her words:

. . . there are significant differences in acceptance rates and referee ratings between single-blind and double-blind papers. Most strikingly, double-blind papers have a lower acceptance rate and lower referee evaluations. In addition, double-blind reviewing results in different patterns of acceptance rates and referee ratings by institutional rank of author [3, 1042].

Although we find Blank's results compelling, it is difficult to draw solid conclusions from them. Perhaps this is what Ashenfelter meant when he referred to the "ambiguous" findings regarding the nature of any biases. The issue of scientific concern vis-a-vis type of review process employed by journal editors is the severity of type-I and type-II errors. Do editors employing a single-blind review process systematically publish more papers that have little or no impact on the profession and/or fail to publish more truly good papers than do editors of double-blind journals?

Blank's findings do not provide certain relevant information with respect to the type-I/type-II error problem that arguably plagues single-blind reviewing but not double-blind reviewing. One must have information regarding how the marketplace for scientific ideas responds to published papers in order to gauge the severity of both types of errors. Without knowing the fate of manuscripts rejected for publication in Blank's (or any other researcher's) sample, the severity of the type-II error problem cannot be determined. However, drawing from a large sample of papers published in the top economics journals in 1984, we are able to investigate the degree to which journals employing a single-blind review process suffer from the type-I error problem (i.e., publish papers that are revealed not to have the impact that might reasonably have been expected).

Both types of errors might characterize a single-blind review process for at least two reasons. First, a reviewer with knowledge of the author's identity might economize on his (her) refereeing costs by substituting the already-revealed value of the author's average contribution in previous papers for his evaluation/forecast of the marginal contribution contained in the paper under consideration. Second, personal characteristics of the author (gender, institutional affiliation, friendship with the reviewer, race, intellectual conformity with the reviewer, etc.) may weigh more heavily in a reviewer's evaluation of the publishability of a manuscript than the reviewer's forecast of the marginal contribution contained therein.(1)

II. Data, Methodology and Findings


We compiled detailed information on 1,051 articles (excluding comments, replies, notes and book reviews) published in 28 top economics journals in 1984.(2) Specifically, we identified citations to each article, as listed in the Social Sciences Citation Index for the five years following publication; article-specific characteristics (length in AER-equivalent pages and whether or not it was published as a lead article); characteristics of the authors (age and professional affiliation at the time the paper was published, their Ph.D.-granting institution, gender, and cumulative stock of citations (to all previous work) in the five years prior to 1984, as a proxy for author reputation or quality); characteristics of editors, co-editors and associate editors (institutional affiliations and Ph.D.-granting institutions); and the type of review process employed by each journal. Variable means and standard errors for the entire sample and the samples of double-blind and single-blind reviewed papers are reported in Table I.(3)
Table I. Means and Standard Errors for All Variables by Review Process

                                      Entire      Double-Blind  Single-Blind
Variable                              Sample(*)   Review(*)     Review(*)

Citations 1985-89                       7.057        6.733         7.333
                                      (11.173)     (10.520)      (11.704)
Length                                 11.721       10.929        12.396
                                       (7.301)      (5.552)       (8.461)
Lead Article                            0.091        0.099         0.085
                                       (0.288)      (0.299)       (0.279)
Author(s) Stock of Citations 1979-83  132.382      109.537       151.882
                                     (302.683)    (259.188)     (334.374)
Authors' Mean Age                      38.221       37.982        38.425
                                       (7.737)      (7.543)       (7.900)
Review Process, Double-Blind = 1        0.461        1.000         0.000
Gender, Woman = 1                       0.081        0.072         0.088
                                       (0.273)      (0.259)       (0.284)
Journal Quality Index                  49.370       38.543        58.612
                                      (23.926)     (14.028)      (26.613)
N                                        1051          484           567

* Standard errors in parentheses.

We note, without comment at this stage, that papers published in double-blind reviewed journals are shorter and attract fewer citations, on average, than papers published in single-blind reviewed journals. In addition, the average citation stock of authors of papers published in single-blind journals is nearly 50 percent greater than for authors of papers published in journals employing a double-blind review process.


Part of the difficulty in interpreting findings that different editorial practices result in different outcomes for authors is that there is no well-articulated theory of the editorial process. We suspect that most academic scientists believe that editors should attempt to maximize the expected impact that articles published in their journals have on subsequent scientific thought within the relevant community of scholars. This implies that a scientific manuscript be evaluated solely on the basis of expected marginal contribution to scientific knowledge, not on the basis of non-substantive criteria.

On the other hand, one could well imagine a theory of the editorial process that is governed by the principle of editorial favoritism towards former and current graduate students, colleagues, faculty at the "elite" schools, etc. Indeed, charges of editorial favoritism have been raised in many a private conversation among economists. Individual scholars may entrepreneur (and, in so doing, become editors of) scientific journals as a means of maximizing their own influence within a personally-relevant community of scholars. In this world, editors selectively supply page-space in their journals to prospective authors in exchange for past, present and/or future considerations that both parties agree upon. Editors include personal well-being, as well as the value of scientific knowledge produced in their decision calculus. Manuscripts are not necessarily, or even probably, evaluated on the basis of the expected marginal scientific contributions contained therein.

Yet a third possibility, suggested by the reviewer, is that journal editors have certain idiosyncratic biases/preferences regarding what they feel are important areas of scientific investigation. Their current and former graduate students tend to work on these issues, in part because their mentors accept disproportionately papers written in their pet areas of interest.

We are troubled by the lack of any well-specified, widely-acknowledged objective function for journal editors, because this lacuna reduces our ability to evaluate differences in performance across alternative types of review process. Nonetheless, without knowing the specific objective functions maximized by the editors in our sample of journals, we assume that they act as agents of their respective communities of scholars and that these scholars want editors to function as "gatekeepers" of knowledge [4]. This assumption permits us to evaluate the desirability of review processes on the basis of observed differences with respect to type-I and/or type-II errors.

Our procedure is to examine the impact of the type of review process on citations to published articles, controlling for author, article, and journal-specific characteristics that might influence citations. This methodology permits us to evaluate whether one type of review process is superior to the other in terms of either: (1) identifying papers that will attract more citations than would be predicted by author and article-specific characteristics, or (2) failing to identify papers that will be cited less than would be predicted by author and article-specific characteristics.

One problem with our methodology is that for any of a variety of reasons a reviewer may be able to identify the author(s) of a paper (s)he has been sent to review even though the editor nominally employs a double-blind review process.(4) Thus the same biases that theoretically plague journals employing a single-blind review process may likewise plague journals employing a double-blind review process. This is true regardless of whether the reviewer is able to successfully discern the identity of the author(s). The mere fact that the reviewer substitutes personalistic criteria (whether correctly attributed or not) for a concrete evaluation of the content of the particular paper under review introduces bias into the double-blind review process. However, the greater the incidence of this sort of "contamination" of double-blind reviewing, the less real distinction there is between the two review processes vis-a-vis reviewer treatment of papers. This implies a lower likelihood of finding statistically significant differences in citations to papers published by single-blind journals versus papers published by double-blind journals.

We employed ordinary least squares regression and nonlinear regression to estimate numerous alternative specifications of the following model of the determinants of citations to an article:

$$
\text{Citations}_i = a_0 + a_1\,\text{Length}_i + a_2\,\text{Lead Article}_i + a_3\,\text{Gender}_i + a_4\,\text{Authors' Mean Age}_i + a_5\,\text{Review Process}_i + a_6\,\text{Author(s) Stock of Citations 1979-83}_i + a_7\,\text{Journal Quality}_i + e_i \tag{1}
$$

where

$\text{Citations}_i$ = citations to article i listed in the Social Sciences Citation Index from 1985-89, inclusive, but excluding self-citations;

$\text{Length}_i$ = length of article i in AER-equivalent sized pages;

$\text{Lead Article}_i$ = 1 if article i was printed as the lead article in the journal, 0 otherwise;

$\text{Gender}_i$ = 1 if the sole author of article i was female or if all of the coauthors of article i were female;

$\text{Authors' Mean Age}_i$ = the actual age of the author of article i in the case of sole-authored papers and the mean age of all authors of a coauthored paper, as calculated from the 1989 American Economic Association Membership Directory;

$\text{Review Process}_i$ = 1 if article i was published in a journal employing a double-blind review process, 0 otherwise;

$\text{Author(s) Stock of Citations 1979-83}_i$ = the cumulative citations listed for the author(s) of article i in the 1979-83 editions of the Social Sciences Citation Index to all previous work, excluding self-citations;

$\text{Journal Quality}_i$ = a normalized measure of the relative prestige of the journal in which article i was published, using citations per character, as reported by Liebowitz and Palmer [12];

$e_i$ = a random disturbance term.

Following Laband [8], we expect the signs on $a_1$, $a_6$, and $a_7$ to be positive. More substantive scientific contributions will plausibly require greater elucidation than less substantive contributions, with possibly diminishing effect. Thus, citations should be a positive function of article length.

Citations to a scientific paper may be influenced by the reputation of the author(s), for at least two reasons. First, readership of a paper depends on author reputation. That is, this paper would be more widely-read if George Stigler had been the author (or at least a coauthor).(5) Second, the fact that a scientist is highly-cited undoubtedly bears some positive relationship to the caliber of past contributions (s)he has made to the corpus of scientific knowledge and, moreover, to the expected caliber of future contributions.(6) For similar reasons, citations of a scientific article are likely to be influenced by journal of publication. Readership of a scholarly journal is related to the readers' expected value of articles published. Readers probably base their forecasts of the expected value of current contributions in a scholarly journal on the actual value of past contributions, as revealed by subsequent citations. Those journals that are routinely cited heavily may attract greater readership than those journals that are not.

Conventional wisdom (and the behavior of journal editors) suggests that lead articles are published in that position precisely because the editors expect these articles will have special relevance to the readership.(7) We therefore expect lead articles to be cited more than other articles; the sign on $a_2$ should be positive.(8)

We have no strong feeling a priori about the impact of authors' mean age on subsequent citations. Younger scientists typically employ state-of-the-art methods to a greater degree than do older scientists, which implies a degree of precision and rigor on the part of the former that may not characterize the work of the latter. However, older scientists are more likely than younger ones to have developed a sense of perspective that enables them to identify and tackle the truly incisive questions that other members of the profession will find relevant. This may be offset, in some measure, by the tendency of older, more established scholars to occasionally trade on their reputational capital by publishing papers that are not quite up to the authors' previous standards of excellence. Without prior insights into the strengths of these various effects, we have no expectations regarding the effect of age.

Likewise, we have no strong expectations about the predicted sign on gender. If female economists are implicitly held to a higher publication standard than male economists, their contributions to the professional journal literature should routinely attract more citations per article than those written by men, ceteris paribus, unless, of course, male scientists simply do not cite the work of female scientists at the same rate they cite the work of other male scientists. Professor Blank found no significant evidence of differences in the evaluations of submissions of male versus female authors by type of review process employed. Laband [9] previously found no evidence of differences in citations to male-authored and female-authored scientific work.

We expect journals employing a single-blind review process to be plagued by both type-I and type-II errors to a greater extent than journals employing a double-blind review process, for reasons outlined previously. Thus, in a ceteris paribus environment, we expect articles published in the latter to attract more citations than articles published in the former. The sign on $a_5$ should therefore be positive.

Both the distribution of citations to articles and the distribution of citations to authors are non-normal. The vast majority of scholarly papers in economics are cited infrequently, if at all [8]; this finding holds across scientific disciplines generally [6; 7]. A relatively small number of papers and authors are truly influential; most offer marginal contributions to the stock of scientific knowledge. To account for this acknowledged skewness in the distribution of citations, we:

(a) logged the citations variables on both sides of equation (1) and conducted an ordinary least squares regression analysis;(9) and

(b) estimated equation (1) using the ordered probit nonlinear regression methodology (our software package was LIMDEP).(10) Our dependent variable, citations in 1985-89 to a paper published in 1984, was distributed in such a manner that quintiles were easy to identify. Finer analysis by decile was impossible because some 21 percent of all papers received no citations at all during the 1985-89 period.
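Step (a), together with the add-one adjustment described in note 9, can be sketched in Python as follows. This is a minimal illustration on synthetic, noiseless data, not the authors' data or code; the function name `log1p_ols`, the coefficient values in `true_coef`, and the reduced set of three regressors are all assumptions chosen for the example.

```python
import numpy as np

def log1p_ols(y_counts, X_raw, log_columns=()):
    """OLS of log(1 + y) on X with an intercept.

    Columns listed in log_columns are also transformed to log(1 + x),
    mirroring the add-one adjustment applied to both citation variables.
    """
    X = np.asarray(X_raw, dtype=float).copy()
    for j in log_columns:
        X[:, j] = np.log1p(X[:, j])
    X = np.column_stack([np.ones(len(X)), X])      # prepend the intercept
    y = np.log1p(np.asarray(y_counts, dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution
    return coef

# Synthetic illustration (hypothetical values, no noise term).
rng = np.random.default_rng(0)
n = 200
length = rng.uniform(5, 30, n)            # article length in pages
double_blind = rng.integers(0, 2, n)      # 1 = double-blind journal
author_stock = rng.integers(0, 500, n)    # prior citations, may be zero

true_coef = np.array([0.2, 0.03, 0.25, 0.1])   # assumed, for the example only
log_cites = (true_coef[0] + true_coef[1] * length
             + true_coef[2] * double_blind
             + true_coef[3] * np.log1p(author_stock))
citations = np.expm1(log_cites)           # invert log(1 + citations)

X = np.column_stack([length, double_blind, author_stock])
coef = log1p_ols(citations, X, log_columns=(2,))
print(np.round(coef, 3))
```

Because the synthetic data contain no disturbance term, the estimated coefficients should match the assumed ones up to numerical precision; with real citation counts the fit would of course be noisy.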


Following Leamer [10; 11], numerous alternative specifications of equation (1) were estimated, including models with linear and squared terms of several variables and models that included a variety of interaction terms. Table II reports the OLS and Ordered Probit estimation results; standard errors of the coefficient estimates are reported in parentheses. Table III reports the estimated marginal impact of each explanatory variable on the probability of a paper falling into a specific quintile of the citations distribution, other than the lowest quintile (zero citations).

In contrast to the lower mean citations of papers reviewed double-blind, reported in Table I, articles published in journals using the double-blind review process attract more citations than those published in journals employing a single-blind review process, controlling for the other reported attributes. As expected, article length, author reputation and relative quality of publishing journal all demonstrate positive and statistically significant explanatory power with respect to subsequent citation of an article. We found no gender-based differences in citations to the articles in our sample, whether defining female authorship as at least one woman on a coauthored paper or all women on a coauthored paper. These results were consistent across all of our regression and ordered probit estimations.(11)

Citations to an article are inversely related to the mean age of the authors. In separate, unreported regressions, we included an interaction term between Mean Age and Authors' Stock of Citations. With the inclusion of this variable the coefficient estimate of Mean Age is statistically insignificantly different from zero, while the coefficient estimate of the interaction term is negative and statistically significant. This suggests that the negative impact of Mean Age on citations derives from more heavily cited scholars.


What can we conclude about the impact of the review process? First, papers published by single-blind journals are, on average, better papers than those published in double-blind journals. The former attract nearly 10 percent more citations per paper than the latter. In part, at least, this is due to the fact that the single-blind journals seem to attract submissions from more accomplished authors, judging by the difference in mean citation stocks of authors publishing in double-blind versus single-blind journals.

These differences notwithstanding, estimated citations to papers refereed under a double-blind review process exceed those of papers refereed under a single-blind review process, given the author, article and journal characteristics we are able to control for in each case. The coefficient estimate reveals what the residual impact of the review process is on citations. The double-blind review process generates statistically significantly more citations per article than predicted by author, article and journal characteristics as compared to the residual citations received by articles receiving single-blind reviews.

To investigate the size of the impact of double-blind reviewing, we split our sample by review process and estimated equation (1) separately for the two samples. We then calculated the predicted mean citations of the single-blind reviewed papers, using the coefficient estimates of the double-blind sample. That is, we assigned each explanatory variable a value equal to the mean value for the single-blind sample and calculated the predicted value of logged citations using the coefficient estimates of the double-blind sample. We then compared the predicted mean value of logged citations for papers having the same characteristics as our sample of single-blind reviewed papers had they been reviewed double-blind against the actual mean citations for the single-blind sample. Actual mean logged citations for the set of single-blind reviewed papers was 0.8653. If papers with the same characteristics were reviewed double-blind, the predicted mean citations would equal 0.9194. The estimated impact of double-blind reviewing is to increase predicted citations of otherwise identical papers by 5.6 percent. By the same token, if papers with the same characteristics as the double-blind sample were reviewed single-blind, they would receive roughly 18 percent fewer citations than the double-blind reviewed papers actually received (on average).
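The arithmetic behind the 5.6 percent figure can be checked with a short Python sketch. The text does not say how the percentage was derived from the two mean logged values; one reading that reproduces it, assumed here, is to exponentiate the difference in logs. The function name `pct_change_from_log_gap` is hypothetical, and because the logged variable is citations plus one (note 9), the percentage strictly applies to that incremented count.

```python
import math

def pct_change_from_log_gap(log_actual, log_counterfactual):
    """Percent change implied by a difference between two mean logged values."""
    return (math.exp(log_counterfactual - log_actual) - 1.0) * 100.0

actual_single_blind = 0.8653   # reported actual mean logged citations
predicted_if_double = 0.9194   # reported prediction using double-blind coefficients

# Assumed reading: exponentiate the log gap.
implied_pct = pct_change_from_log_gap(actual_single_blind, predicted_if_double)

# Alternative reading, for comparison: simple ratio of the two logged means.
ratio_pct = (predicted_if_double / actual_single_blind - 1.0) * 100.0

print(round(implied_pct, 1), round(ratio_pct, 1))  # 5.6 6.3
```

Only the first reading matches the reported 5.6 percent. The inputs needed to check the 18 percent figure for the double-blind sample are not given in the text, so that figure is not reproduced here.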

The results reported in Table IV indicate that articles published by journals employing a double-blind review process systematically attract more citations than would be predicted on the basis of author characteristics, length, prestige of journal, etc., while articles published in journals employing a single-blind review process attract fewer citations than would be predicted on the basis of those same characteristics. These findings suggest that under a double-blind review process, where referees' lack of knowledge about the author(s) makes substitution of expected average caliber of contribution for expected marginal contribution very difficult, submissions are judged on their own merit and referees do a good job of picking high-quality papers for publication. By contrast, some substitution of expected average impact for expected marginal impact evidently does occur under a single-blind review process. Indeed, the proportion of papers accepted for publication that are characterized by marginal contributions that fall below expected average contributions must swamp the proportion accepted for publication whose marginal contribution equals or exceeds the expected average contribution.

III. Concluding Comments

Our findings indicate that the double-blind review process outperforms the single-blind review process. Specifically, we found that papers with the characteristics of the single-blind reviewed papers in our sample would receive 5.6 percent more logged citations if reviewed double-blind, while papers with the characteristics of the double-blind reviewed papers in our sample would receive nearly 18 percent fewer logged citations if reviewed single-blind. These findings suggest a specific interpretation of Professor Blank's finding that referees' evaluations, and acceptance rates, of manuscripts reviewed double-blind are lower than those of manuscripts reviewed single-blind. The single-blind review process apparently suffers from a type-I error bias to a greater extent than the double-blind review process.

We emphasize that the only impact of double-blind refereeing is on outside referee reports, not on editors' decisions. However, there is some evidence that editors rely heavily upon referee reports in their decision-to-publish calculus. One interpretation of our results, which is also consistent with Professor Blank's findings, is that editors really do use the information contained in outside reviews, and "knowing the editor," by itself, usually is not enough to overcome a set of bad, or even lukewarm, referee reports.

Why aren't journal editors stampeding to adopt double-blind review? We suspect the answer has something to do with the costs associated with each type of review process. Holding caliber and timeliness of reviews constant, it may be cheaper for editors to secure refereeing services from the desired number of individuals through use of a single-blind review process than a double-blind process. The loss implied by publication of occasional bad papers may be more than made up for by the cost savings in contracting with referees.

Journals employing single-blind review may have significantly faster review times than journals employing double-blind review. To the extent processing speed matters to authors, authors with significant research findings will prefer to submit their papers to single-blind reviewed journals, to help ensure speedy definition of intellectual property rights. Even allowing for the type-I error problem, such a journal may, on balance, publish better papers than one employing a double-blind review process, by virtue of having attracted better papers there in the first place. In a world in which intellectual property rights determine recipients of rewards (such as Nobel prizes), competitive journal editors may find it impossible to ignore the speed of handling margin.

We obviously do not know whether the single-blind review process is associated with significantly faster reviews than the double-blind process. There are plausible reasons to think that such might be the case. Whether or not speed of handling is what attracts the most capable of scholars to consistently submit their work to single-blind reviewed journals is not a question we are able to answer at the present time. Nor need we do so. The point is, something associated with journals that employ the single-blind review process apparently does attract the best scholars. The mean citation stock of authors of papers published in single-blind refereed journals was some 50 percent greater than the mean citation stock of authors of papers published in double-blind refereed journals. Although these data for published papers do not reveal information with respect to submissions, they are suggestive. While residual citations to a (rare) bad paper written by a Nobel laureate would indeed be negative, even their bad papers probably have a greater total impact than the average economist's (even rarer) good papers, which would be characterized by positive residuals. To the extent submissions of the highest-caliber economists are skewed in favor of journals employing the single-blind review process, for reasons that are beyond the purview of this paper, these journals may actually outperform journals employing the double-blind review process, in terms of impact on the profession. It seems unlikely that the noted difference in mean stocks of authors' citations that favors single-blind reviewed journals over double-blind reviewed journals derives specifically from the review process employed. Our findings in this regard may: (1) be coincidental, (2) result from historical accident, and/or (3) be sensitive to the journals included in our sample or to the specific years covered by our data.

1. Beyer [2, 75] described the potential harm resulting from reviewers' use of authors' personal characteristics vis-a-vis publishability of manuscripts:

. . . any factors that increase the probability of particularistic decisions or increase their consequences are not likely to benefit the majority of scientists. A relatively small proportion of such decisions spread over time may serve to give some groups and individuals substantial cumulative advantage, because publication itself is convertible into the scarce "evidence" of competence that makes future selection for further advantage then based upon competence, and therefore universalistic. Thus, a particularistic advantage can soon be transformed into a universalistic one.

2. These journals are: American Economic Review, American Journal of Agricultural Economics, Brookings Papers on Economic Activity, Canadian Journal of Economics, Econometrica, Economic Inquiry, Economic Journal, Economica, International Economic Review, Journal of Econometrics, Journal of Economic Literature, Journal of Finance, Journal of Financial Economics, Journal of Human Resources, Journal of International Economics, Journal of Law and Economics, Journal of Mathematical Economics, Journal of Monetary Economics, Journal of Money, Credit and Banking, Journal of Political Economy, Journal of Public Economics, National Tax Journal, Quarterly Journal of Economics, Rand Journal of Economics, Review of Economic Studies, Review of Economics and Statistics, Scandinavian Journal of Economics, and Southern Economic Journal.

3. Our sample of 1,051 articles consists only of those articles for which complete information on all variables was available. Thus, for example, if we were unable to locate an author in the AEA Directory in an effort to obtain age and affiliation information, the article (co)written by the individual was dropped from the data set. We concede that this may ultimately bias our results one way or another, but it is unclear a priori in which direction any such biases, if present, would run. The total number of full articles published in the 28 journals in 1984 was 1,490.

4. For example, in the case of our paper, although the Southern Economic Journal employs a double-blind review process, the reviewer was able to discern the identity of one of the authors.

5. The observation that the professional reward/recognition system in science is self-reinforcing (i.e., past recognition influences current recognition and past lack of recognition begets current lack of recognition) has been remarked upon by a number of authors. See, especially, Merton [14].

6. We acknowledge the possibility of additional impacts of reputation on citations, such as signalling by citing scholars. However, even though one occasionally hears of this sort of thing, there is no concrete evidence of the extent to which young scientists engage in this practice [5]. We believe that citations to a particular paper are influenced by an author's reputation because of time-independent differences with respect to the quality of contributions made by different scientists. Baumol, Mincer, Becker, and other top economists routinely advance the frontiers of knowledge more than most of us do. We can with confidence predict that future contributions will follow this same pattern.

7. For example, the editors of Economic Inquiry published "Economical Writing," as a lead article in 1985, and continue to emphasize their commitment to the message contained therein in their style guidelines to authors. Several top journals routinely publish Nobel lectures and presidential addresses as lead articles.

8. An anonymous referee suggested that the greater visibility of lead articles may lead to their being cited more than non-lead articles, irrespective of any implied qualitative judgements by the editor(s). We agree.

9. Since the log of zero is undefined, and we were reluctant to throw away the information derived from the 21 percent of our sample with article citations and/or authors' stock of citations equal to zero, we generated new variables for article citations and authors' stock of citations by adding one citation to the actual numbers for those two variables. We used the incremented citations variables in our regression analyses.

10. A detailed discussion of the ordered probit technique is presented by Maddala [13].

11. We investigated the possibility that the impact of some or all of our control variables differs by type of review process employed, by estimating separate regressions for the single-blind and the double-blind papers. Although large differences in coefficient estimates on certain variables were apparent, the only statistically significant difference, as determined by estimating equation (1) with review process interaction terms to all other explanatory variables, was on journal quality. The estimated impact of journal quality on citations was approximately twice as great for articles reviewed double-blind as for those reviewed single-blind. These results are available upon request.


