Article information

Title: The evolution of city population density in the United States.
Authors: Bryan, Kevin A.; Minton, Brian D.; Sarte, Pierre-Daniel G.; et al.
Journal: Economic Quarterly
Print ISSN: 1069-7225
Year of publication: 2007
Issue: September
Language: English
Publisher: Federal Reserve Bank of Richmond
Abstract: Chatterjee and Carlino (2001) offer an insightful example as to why density can be more important than population size. They note that though Nebraska and San Francisco have the same population, urban interactions occur far less frequently in Nebraska because of its much larger area. Though the differences in the area of various cities are not quite so stark, there are meaningful heterogeneities in city densities. Given the importance of urban density, the stylized facts presented in the article ultimately require explanations such as those given for the evolution of city population.
Keywords: Population density; Urban economics

The answers to important questions in urban economics depend on the density of population, not the size of population. In particular, positive production or residential externalities, as well as negative externalities such as congestion, are typically modeled as a function of density (Chatterjee and Carlino 2001, Lucas and Rossi-Hansberg 2002). The speed with which new knowledge and production techniques propagate, the gain in property values from the construction of urban public works, and the level of labor productivity are all affected by density (Carlino, Chatterjee, and Hunt 2006, Ciccone and Hall 1996). Nonetheless, properties of the distribution of urban population size have been studied far more than properties of the urban density distribution.

Chatterjee and Carlino (2001) offer an insightful example as to why density can be more important than population size. They note that though Nebraska and San Francisco have the same population, urban interactions occur far less frequently in Nebraska because of its much larger area. Though the differences in the area of various cities are not quite so stark, there are meaningful heterogeneities in city densities. Given the importance of urban density, the stylized facts presented in the article ultimately require explanations such as those given for the evolution of city population.

This article makes two major contributions concerning urban density. First, we construct an electronic database containing land area, population, and urban density for every city with population greater than 25,000 in the United States. Second, we document a number of stylized facts about the urban density distribution by constructing nonparametric estimates of the distribution of city densities over time and across regions.

We compile data for each decade from 1940 to 2000; by 2000, 1,507 cities meet the 25,000 threshold. In addition, we include those statistics for every "urbanized area" in the United States, decennially from 1950 to 2000. Though we also present data on Metropolitan Statistical Area (MSA) density evolution from 1950 to 1980, this definition of a city can be problematic for work with densities. A discussion of the inherent problems with using MSA data is found in Section 1. To the best of our knowledge, these data have not been previously collected in an electronic format.

Our findings document that the distribution of city densities in the United States has shifted leftward since 1940; that is, cities are becoming less dense. This shift is not confined to any particular decade. It is evident across regions, and it is driven both by new cities incorporating with lower densities, and by old cities adding land faster than they add population. The shift is seen among several different definitions of cities. A particularly surprising result is that "legal cities," defined in this article as regions controlled by a local government, have greatly decreased in density during the period studied. That is, since 1940, local governments have been annexing territory fast enough to counteract the increase in urban population. Annexation is the only way that cities can simultaneously have increasing population, which is true of the vast majority of cities in our sample, and yet still have decreasing density.
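The accounting behind this last claim can be written out in one line. As a sketch, with $D$, $P$, and $A$ introduced here (they are not the article's notation) for a city's density, population, and land area:

$$D = \frac{P}{A} \quad\Longrightarrow\quad \Delta \ln D = \Delta \ln P - \Delta \ln A,$$

so a city with $\Delta \ln P > 0$ has $\Delta \ln D < 0$ only if $\Delta \ln A > \Delta \ln P$, that is, only if its land area grows proportionally faster than its population.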

This article is organized as follows. Section 1 describes how our database was constructed, and also discusses which definition of city is most appropriate in different contexts. Section 2 discusses our use of nonparametric techniques to estimate the distribution of urban density. Section 3 presents our results and discusses why cities might be decreasing in density. Section 4 concludes.

1. DATA

What is a city? There are at least three well-defined concepts of a city boundary in the United States that a researcher might use: the legal boundary of the city, the boundary of the built-up, urban region around a central city (an "urbanized area"), and the boundary of a census-defined Metropolitan Statistical Area (MSA). The legal boundary of a city is perhaps most relevant when investigating the area that state and local governments believe can be covered effectively with a single government. Legal boundaries also have the advantage of a consistent definition over the period studied; this is not completely true for urbanized areas, and even less true for MSAs. Urbanized areas parallel nicely with an economist's mental image of an agglomeration, as they include the built-up suburban areas around a central city. MSAs, though commonly used in the population literature, offer a much vaguer interpretation. Figure 1 displays the city, urbanized area, and MSA boundaries for Richmond, Virginia, and Las Vegas, Nevada, in the year 2000.

Our database of legal cities is constructed from the decennial U.S. Bureau of the Census Number of Inhabitants, which is published two to three years after each census is taken. Population and land area for every U.S. "place" with a population greater than 2,500 are listed. Places include cities, towns, villages, urban townships, and census-designated places (CDPs). Cities, towns, and townships are legally defined places containing some form of local government, while a census-designated place (called an "unincorporated place" before 1980) refers to unincorporated areas with a "settled concentration of population." Some of these CDPs can be quite large; for instance, unincorporated Metairie, Louisiana has a population of nearly 150,000 in 2000. Though CDPs do not represent any legal entity, they are nonetheless defined in line with settlement patterns determined after census consultation with state and local officials, and are similar in size and density to incorporated cities. (1) Including CDPs in our database, and not simply incorporated cities, is particularly important as some states only have CDPs (such as Hawaii), and "towns" in eight states, including all of New England, are only counted as a place when they appear as a CDP.

From this list, we selected every place (including CDPs) with a population greater than 25,000 for each census from 1940 to 2000. There are 412 places in 1940 and 1,507 places in 2000 that meet this restriction. Each place was coded into one of nine geographical regions in line with the standard census region definition. (2) We also labeled each place as either "new" or "old." An old place is a place that had a population greater than 25,000 in 1940 and still has a population greater than 25,000 in 2000. A new place is one that had a population less than 25,000 or did not exist at all in 1940, yet has a population greater than 25,000 in 2000. There are some places which had a population greater than 25,000 in 1940 but less than 25,000 in 2000 (for instance, a number of Rust Belt cities with declining populations); we considered these places neither new nor old. Delineating places in this manner allows us to investigate whether the leftward shift of the distribution of U.S. cities was driven by newly founded cities having a larger area, or by old cities annexing area faster than their population increases.
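The new/old labeling rule can be sketched in a few lines of Python. This is only an illustration of the definitions in the paragraph above: the function name `classify_place` and the use of `None` for a place that did not exist in 1940 are assumptions made here, not part of the article's database. The text does not say how a population of exactly 25,000 is treated; the sketch treats it as below the threshold.

```python
def classify_place(pop_1940, pop_2000, threshold=25_000):
    """Label a place 'old', 'new', or None (neither), per the text's definitions.

    pop_1940 is None when the place did not exist in 1940 (an assumption
    of this sketch about how missing places are encoded).
    """
    above_1940 = pop_1940 is not None and pop_1940 > threshold
    above_2000 = pop_2000 is not None and pop_2000 > threshold
    if above_1940 and above_2000:
        return "old"   # above the threshold in both 1940 and 2000
    if above_2000:
        return "new"   # below the threshold (or nonexistent) in 1940
    return None        # fell below the threshold by 2000: neither new nor old


labels = [
    classify_place(30_000, 40_000),   # large in both years
    classify_place(None, 40_000),     # did not exist in 1940
    classify_place(30_000, 20_000),   # declining city
]
```

Places labeled `None` correspond to the declining cities that the article excludes from both groups.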

[FIGURE 1 OMITTED]

In addition to legal cities, we also construct a series of urbanized areas from the Number of Inhabitants publication. Beginning in 1950, the U.S. Census defined urbanized areas as places with a population of 50,000 or more, meeting a minimum density requirement, plus an "urban fringe" consisting of places roughly contiguous with the central city meeting a small population requirement; as such, urbanized areas are defined in a similar way as agglomerations in many economic models. Aside from 1960, when the density requirement for central cities was lowered from approximately 2,000 people per square mile to 1,000 per square mile, changes in the definition of an urbanized area have been minor. (3) Our database includes each urbanized area from 1950 to 2000; there were 157 such areas in 1950 and 452 in 2000.

Much of the literature on city population uses data on Metropolitan Statistical Areas (MSAs). An MSA is defined as a central urban city, the county containing that city, and outlying counties that meet certain requirements concerning population density and the number of residents who commute to the central city for work. (4) We believe there are a number of reasons that this data can be problematic for investigating city density. First, it is difficult to get consistent data on metro areas. Before 1950, they were not defined at all, though Bogue (1953) constructed a series of MSA populations for 1900-1940 by adding up the population within the area of each MSA as defined in 1950. Because, by definition, Bogue holds MSA area constant for 1900-1950, this data set would not pick up any changes in density caused by the changing area of a city over time. Furthermore, there was a significant change in how MSAs are defined in 1983, with the addition of the "Consolidated Metropolitan Statistical Area" (CMSA). Because of this, MSAs between 1980 and 1990 are not comparable. Dobkins and Ioannides (2000) construct MSAs for 1990 using the 1980 definition, but no such series has been constructed for 2000.

Second, the delineation of MSAs is highly dependent on county definitions. Particularly in the West, counties are often much larger than in the Midwest and the East. For instance, in 1980, the Riverside-San Bernardino-Ontario, California MSA had an area of 27,279 square miles and a population density of 57 people per square mile. (5) This MSA has an area three times the size of and a lower population density than Vermont. (6) When looking solely at population, MSAs can still be useful because the population in outlying rural areas tends to be negligible; this is not the case with area, and therefore density.

Third, the number of MSAs is problematic in that it truncates the number of available cities such that only the far right-hand tail of the population distribution is included. For instance, Dobkins and Ioannides' (2000) MSA database includes only 162 cities in 1950, rising to 334 by 1990. For cities and census-designated places, three to four times as much data can be used. Eeckhout (2004) notes that the distribution of urban population size is completely different when using a full data set versus a truncated selection that includes only MSAs; it seems reasonable to believe that urban density might be similar in this regard. Further, nonparametric density estimation, as used in this article, requires a large data set. For completeness, we show in Section 3 that the distribution of densities in MSAs from 1950 to 1980, when the MSA definition was roughly consistent, follows a similar pattern to that of urbanized areas and legal cities.

Other than the database used in this article, we know of no other complete panel data set of urban density for U.S. cities. For 1990 and 2000, a full listing of places with area and population is available online as part of the U.S. Census Gazetteer. (7) The County and City Data Books, hosted by the University of Virginia, Geospatial and Statistical Data Center, hold population and area data for 1930, 1940, 1950, 1960, and 1975; these data were entered by hand during the 1970s from the same census books we used. (8) However, crosschecking this data with the actual census publications revealed a number of minor errors, and further indicated that unincorporated places and urban towns were not included. For some states (for instance, Connecticut and Maryland), this means that very few places were included in the data set at all. Our data set rectifies these omissions.

2. NONPARAMETRIC ESTIMATION

With these density data, we estimate changes in the probability density function (pdf) over time for each definition of a city in order to examine, for instance, how the distribution of urban densities is changing over time. We use nonparametric techniques, rather than parametric estimation, because nonparametric estimators make no underlying assumption about the distribution of the data (for instance, the presence or lack of normality). Assuming, for instance, an underlying normal distribution might mask evidence of a true bimodal distribution, and given our lack of priors concerning the distribution of urban densities, nonparametric estimates offer more flexibility. Potential pitfalls in nonparametric estimation are the requirement of larger data sets, and the computational difficulty of calculating pdf estimates with more than two or three variables; (9) however, our data sets are large and our estimated pdfs are univariate. Nonparametric estimates of a pdf are closely related to the histogram; a description of this link, and basic nonparametric concepts, is given in Appendix A.

One frequently used nonparametric pdf estimator is the Rosenblatt-Parzen estimator,

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K(\psi_i),$$

where $n$ is the number of observations, $h$ is a "smoothing factor" to be chosen below, $\psi_i = (x - x_i)/h$, and $K$ is a nonparametric kernel. The smoothing factor determines the interval of points around $x$ which are used to compute $\hat{f}(x)$, and the kernel determines the manner in which an estimator weighs those points. For instance, a uniform kernel would weigh all points in the interval equally.

In practice, the choice of kernel is relatively unimportant. In this article, we use one of the more common kernels, namely the Gaussian kernel,

$$K(\psi_i) = (2\pi)^{-1/2}\, e^{-\psi_i^2/2}.$$

This kernel uses a weighted average of all observations, with weights declining in the distance of each observation $x_i$ from $x$.
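The estimator and kernel above translate almost directly into NumPy. The following is a minimal sketch rather than the authors' code: the function names, the simulated sample standing in for log city densities, and the bandwidth value `h=0.2` are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(psi):
    """Gaussian kernel K(psi) = (2*pi)^(-1/2) * exp(-psi^2 / 2)."""
    return np.exp(-0.5 * psi ** 2) / np.sqrt(2.0 * np.pi)


def rosenblatt_parzen(x_grid, sample, h):
    """Evaluate f_hat(x) = (1/(n*h)) * sum_i K((x - x_i)/h) at each grid point."""
    x_grid = np.asarray(x_grid, dtype=float)
    sample = np.asarray(sample, dtype=float)
    # psi[j, i] = (x_j - x_i) / h for grid point j and observation i
    psi = (x_grid[:, None] - sample[None, :]) / h
    return gaussian_kernel(psi).sum(axis=1) / (sample.size * h)


# Simulated data for illustration only (not the article's data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=8.0, scale=0.7, size=500)
grid = np.linspace(4.0, 12.0, 801)
f_hat = rosenblatt_parzen(grid, sample, h=0.2)

# Riemann-sum check that the estimate integrates to roughly one.
area = float(f_hat.sum() * (grid[1] - grid[0]))
```

Because every observation contributes a full Gaussian bump, the estimate is smooth and nonnegative everywhere, and the Riemann sum `area` should be close to one as long as the grid covers the sample with a few bandwidths to spare.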

The choice of bandwidth $h$, on the other hand, can be important, and is often chosen so as to minimize an error function of bias and variance. Given a set of assumptions about the nature of $f(x)$, the Rosenblatt-Parzen estimator $\hat{f}(x)$ is such that (10)

$$\text{Bias} = \frac{h^2}{2}\left[\int \psi^2 K(\psi)\,d\psi\right] f''(x) + O(h^2) \qquad (1)$$

and

$$\text{Variance} = \frac{1}{nh}\, f(x) \int K^2(\psi)\,d\psi + O\!\left(\frac{1}{nh}\right). \qquad (2)$$

A low bandwidth, $h$, gives low bias but high variance, whereas a high $h$ gives high bias but low variance. That is, choosing too small a value for $h$ will cause the estimated density to lack smoothness, since not enough sample points will be used to calculate each $\hat{f}(x)$, whereas too high a value for $h$ will smooth out even relevant features such as the trough in a bimodal distribution. A description of the assumptions necessary for our bias and variance formulas can be found in Appendix B.

The integrated mean squared error is defined as

$$\int \left[\text{Bias}\big(\hat{f}(x)\big)^2 + V\big(\hat{f}(x)\big)\right] dx. \qquad (3)$$

This function simultaneously accounts for bias and variance. It is analogous to the conventional mean squared error in a parametric estimation. When h is chosen to minimize (3) after substituting for the bias and variance using expressions (1) and (2) respectively, we obtain

$$h = c\,n^{-1/5}, \quad \text{where} \quad c = \left[\frac{\int K^2(\psi)\,d\psi}{\left[\int \psi^2 K(\psi)\,d\psi\right]^2 \int \big(f''(x)\big)^2\,dx}\right]^{1/5}.$$
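The minimization step can be spelled out as follows. This is a sketch that keeps only the leading terms of the bias and variance expressions and uses $\int f(x)\,dx = 1$; the label $M(h)$ is introduced here for the resulting approximation and is not the article's notation.

$$M(h) = \frac{h^4}{4}\left[\int \psi^2 K(\psi)\,d\psi\right]^2 \int \big(f''(x)\big)^2\,dx + \frac{1}{nh}\int K^2(\psi)\,d\psi,$$

$$\frac{dM}{dh} = h^3\left[\int \psi^2 K(\psi)\,d\psi\right]^2 \int \big(f''(x)\big)^2\,dx - \frac{1}{nh^2}\int K^2(\psi)\,d\psi = 0
\quad\Longrightarrow\quad
h^5 = \frac{\int K^2(\psi)\,d\psi}{n\left[\int \psi^2 K(\psi)\,d\psi\right]^2 \int \big(f''(x)\big)^2\,dx}.$$

As a worked case, take the Gaussian kernel and a normal density with standard deviation $\sigma$. Then $\int K^2(\psi)\,d\psi = 1/(2\sqrt{\pi})$, $\int \psi^2 K(\psi)\,d\psi = 1$, and $\int (f''(x))^2\,dx = 3/(8\sqrt{\pi}\,\sigma^5)$, so that

$$c = \left[\frac{1}{2\sqrt{\pi}} \cdot \frac{8\sqrt{\pi}\,\sigma^5}{3}\right]^{1/5} = \left(\frac{4}{3}\right)^{1/5}\sigma \approx 1.06\,\sigma.$$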

Since $f(x)$ is unknown, and the formula for $h$ involves knowing the true $f''(x)$, no more can be said about $h$ without making some assumptions about the nature of $f(x)$. For example, if $f(x) \sim N(\mu, \sigma^2)$, then $c \approx 1.06\hat{\sigma}$, and therefore $h = 1.06\hat{\sigma}\,n^{-1/5}$. (11) This formula is called Silverman's Rule of Thumb, and works very well for data that are approximately normally distributed (Silverman 1986). Silverman notes that this rule does not necessarily work well for bimodal or heavily skewed data, and some of the series in this article (for instance, city populations) are heavily skewed. In particular, outliers lead to large increases in the estimated standard deviation, $\hat{\sigma}$, and therefore a very large value for $h$. Consequently, this article instead uses Silverman's more general specification

$$h = 0.9\,B\,n^{-1/5}$$

given

$$B = \min\left(\hat{\sigma}, \frac{IQR}{1.34}\right),$$

where IQR is the interquartile range of the sample data. This formula is much less sensitive to outliers than the Rule of Thumb. In practice, it has been shown to be nearly optimal for somewhat skewed data.
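The sensitivity to outliers can be illustrated with a short NumPy sketch. The function names and the simulated sample are assumptions made here, and the choice of the sample standard deviation with `ddof=1` is one reasonable reading of $\hat{\sigma}$, not something the article specifies.

```python
import numpy as np


def rule_of_thumb_bandwidth(sample):
    """h = 1.06 * sigma_hat * n^(-1/5)."""
    x = np.asarray(sample, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-0.2)


def robust_bandwidth(sample):
    """h = 0.9 * min(sigma_hat, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(sample, dtype=float)
    sigma_hat = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    b = min(sigma_hat, (q75 - q25) / 1.34)
    return 0.9 * b * x.size ** (-0.2)


# Simulated data for illustration: standard normal draws plus one extreme value.
rng = np.random.default_rng(1)
clean = rng.normal(size=1000)
with_outlier = np.append(clean, 1000.0)

h_naive = rule_of_thumb_bandwidth(with_outlier)
h_robust = robust_bandwidth(with_outlier)
```

A single extreme observation inflates the standard deviation, and with it `h_naive`, by more than an order of magnitude, while the interquartile range, and therefore `h_robust`, barely moves.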

3. RESULTS

Using the kernel and smoothing parameter from the previous section, we can construct estimates of the pdf of the distribution of population, area, and urban density in each decade.

Figure 2 shows nonparametric estimations of the distributions of population size, area, and density for legal cities as defined in Section 1. Panel C shows a leftward shift of the distribution of city densities; that is, cities in 2000 are significantly less dense than in 1940. The mean population per square mile during that period fell from 6,742 to 3,802. This is being driven principally by an increase in the area of each city; mean area has increased from 19.2 square miles to 35.1 square miles between 1940 and 2000. The distribution of populations has remained relatively constant during this period.

One might imagine that this shift is being driven only by a subset of cities, such as rapidly growing suburban and exurban cities, or cities in the West where land is less scarce. Hence, we divide cities into "new" and "old," as defined in Section 1, as well as categorize each city into one of four regions: East, South, Midwest, and West. Figure 3 shows that the leftward shift in distribution is similar among both old and new cities; that is, city density is decreasing both because existing cities are annexing additional area, and because new cities have lower initial densities than in the past. The number of cities that change their legal boundaries in a given decade is surprising; for instance, between 1990 and 2000, nearly 36 percent of the cities in our data set added or lost at least one square mile. These changes vary enormously by state, however. In a state such as Massachusetts, where all of the land has been divided into towns for decades, there is very little opportunity for a city to add territory. Alternatively, in a state such as Oregon, where the majority of land is unincorporated, annexation is much more common. Might it then be the case that the shift in city density is specific to the Midwest and West, where annexation is frequent?

In fact, the leftward shift in city density does not appear to be a regional phenomenon. Figure 4 shows the distribution of densities in the East, South, Midwest, and West during the period 1940-2000. Each region showed a similar decline in density. The full distribution of log density from the Rosenblatt-Parzen estimator is particularly useful when examining the relatively small number of cities in each region when compared to a simple table of moments, as extreme outliers in the data can result in high skewness. For instance, Juneau, Alaska, had an area of 2,716 square miles and a population of 30,711 in 2000, giving a density of approximately 11 people per square mile.

[FIGURE 2 OMITTED]

The trend in density is even clearer if we look at urbanized areas. Urbanized areas can be reasonably thought of as urban agglomerations; they represent the built-up area surrounding a central city. Figure 5 shows the estimated distribution of urbanized areas in 1960, 1980, and 2000. As in the case of legal cities, there has been a clear decrease in the density of urbanized areas during this period. Because the boundaries of urbanized areas and legal cities are quite different, it is rather striking that, under both definitions, the decrease in density has been so evident. That is, cities have not simply expanded into a mass of lower-density suburbs, but the individual cities and suburbs themselves have decreased in density, primarily by annexing land.

Finally, we consider the density of Metropolitan Statistical Areas. As noted in Section 1, consistently defined MSA data are available only for the period 1950-1980. Furthermore, a decrease in the distribution of MSA density might simply reflect the increase in the number of MSAs in states with large counties, since each MSA by definition includes its own county. The urban economics literature concerning population size, however, often uses MSAs. Figure 6 shows that the distribution of MSA population density also appears to be shifting leftward in the same manner as legal cities and urbanized areas, but again, it is hazardous to give any interpretation to this shift. The definitional advantages and large data sample size for urbanized areas and legal cities potentially make them preferable to MSAs for future work concerning urban density.

[FIGURE 3 OMITTED]

The importance of these shifts in urban density is underscored by the long-understood link between density and economic prosperity. Lucas (1988) cites approvingly Jane Jacobs' contention that dense cities, not simply cities, are the economic "nucleus of an atom," the central building block of development through their role in spurring human capital transfers. Ciccone and Hall (1996), using county-level data, find that a doubling of employment density in a county increases labor productivity by 6 percent. In addition to knowledge transfer, agglomerations arise in order to facilitate effective matches between employer and employee and to take advantage of external economies of scale such as a common deepwater port.

[FIGURE 4 OMITTED]

Measuring the nature of local knowledge transfer, and in particular whether the relevant area has expanded as transportation and communication costs have fallen, is difficult. Jaffe, Trajtenberg, and Henderson (1993) find evidence that, given the existing distribution of industries and research activity, new patents tend to cite existing patents from the same state and MSA at an unexpectedly high level. Using data on the urbanized portion of a metropolitan area, Carlino, Chatterjee, and Hunt (2006) find that patents per capita rise 20 percent as the employment density of a city doubles. They also find that the benefits of density diminish as density rises, so that cities with employment densities similar to Philadelphia and Baltimore, around 2,100 jobs per square mile, are optimal.

Given the economic benefits of density, the changes in the urban density distribution presented in this article suggest two questions. First, why have agglomeration densities decreased? Second, why have the areas of legal jurisdictions increased?

[FIGURE 5 OMITTED]

Decreased densities in urban areas have been explained by a number of processes in the literature, including federal mortgage insurance, the Interstate Highway System, racial tension, and schooling considerations. Mieszkowski and Mills (1993) counter that these explanations tend to be both unique to the United States and are phenomena of the postwar period, whereas a decrease in urban density began as early as 1900 and has occurred across the developed world. Two theories remain.

First, the decreased transportation costs brought about by the automobile and the streetcar have allowed firms and consumers to avoid congestion in central cities. Glaeser and Kahn (2003) point out that the automobile also has a supply-side effect in that it allows factories and other places of work to decentralize by eliminating the economies of scale seen with barges and railroads; the rail industry was three times larger than trucking in 1947, but trucks now carry 86 percent of all commodities in the United States. Whereas the wealthy in the nineteenth century might have preferred to live in the center of a city while the poor were forced to walk from the outskirts, the modern well-to-do are less constrained by transport times and, therefore, occupy land in less-dense suburban and exurban cities.

[FIGURE 6 OMITTED]

Rossi-Hansberg, Sarte, and Owens (2005) present a model in which firms set up non-integrated operations such that managers work in cities in order to take advantage of knowledge transfer externalities but production workers tend to work at the periphery of a city where land costs are lower. They then show that, as city population grows, the internal structure of cities changes along a number of dimensions that are consistent with the data.

A second theory, not entirely independent from the first, posits that cities have become less dense because of a desire for homogenization. When a large group with relatively homogenous preferences for tax rates and school quality is able to occupy its own jurisdiction, it can use land-use controls to segregate itself from potential residents with a different set of preferences. Mieszkowski and Mills (1993) argue that land-use restrictions have become more stringent in the postwar era, and that segregation into income-homogenous areas may be contributing to decreased densities.

There are fewer existent theories about why legal jurisdictions, at a given population level, have increased in area. Glaeser and Kahn (2003) note that effective land use requires larger jurisdictions as transportation costs fall. That is, if a city wished to limit sprawl in an era with high transportation costs, it could enact effective land-use regulations within small city boundaries. In an era with low transportation costs, however, such a regulation would simply push residents into another bedroom community and have no effect on sprawl or traffic. The growing number of regional land-use planning commissions, such as Portland's Metropolitan Service District and Atlanta's Regional Commission, speak to this trend (Song and Knaap 2004).

Austin (1999) discusses reasons why cities may want to annex territory, including controlling development on the urban fringe, increasing the tax base, lowering municipal service costs by exploiting returns to scale, or altering the characteristics of the city, such as decreasing the minority proportion of population. External areas may wish to be annexed because of urban economies of scale, and because urban areas offer benefits such as cheaper bond issuance than suburban and unincorporated areas. Austin finds evidence that cities annex for both political and economic reasons, but that increasing the tax base does not appear to be a relevant factor, perhaps because of the growing ability of high-wealth areas to avoid annexation by poorer cities.

4. CONCLUDING REMARKS

This article provides two novel contributions. First, it constructs an electronic data set of urban densities in the United States during the previous seven decades for three different definitions of a city. Second, it applies non-parametric techniques to estimate the distribution of those densities, and finds that there has been a stark decrease in density during the period studied. This deconcentration has been occurring continuously since at least 1940, in every area of the United States, and among both new and old cities. This result is striking; increasing population and increasing area across cities do not, by themselves, tell us what will happen to density.

Falling urban densities suggest that, over the past seven decades, the productivity benefits of dense cities have been weakening. Decreasing costs of transportation and communication have allowed firms to move production workers out of high-rent areas, and have allowed residents to move away from downtowns. It is unclear what effect these changes in the urban landscape will have on knowledge accumulation and growth in the future. For instance, it is conceivable that the productivity loss from ever-decreasing spatial density might be counteracted by decreased long-range communication costs. Understanding the broad properties of urban density in modern economies is merely a necessary first step in understanding how these changing properties of cities will affect the broader economy.

APPENDIX A: NONPARAMETRIC ESTIMATORS

Classical density estimation assumes a parametric form for a data set and uses sample data to estimate those parameters. For instance, if an underlying process is assumed to generate normal data, the estimated density is

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$

where $\sigma$ and $\mu$ are the sample standard deviation and mean.

Nonparametric density estimation, on the other hand, allows a researcher to estimate a complete density function from sample data, and therefore estimate each moment of that data, without assuming any underlying functional form. For instance, if a given distribution is bimodal, estimating moments under the assumption of normally distributed data will be misleading. Knowing the full distribution of data also makes clear what stylized facts need to be explained in theory; if the data were skewed heavily to the right and suffered from leptokurtosis, a theory explaining that data should be able to replicate these properties. Nonparametric estimation generally requires a larger data set than parametric estimation to achieve consistency, but is becoming more common in the literature. Given that our city data set is large, we use nonparametric techniques in this article. A brief introduction to these techniques can be found in Greene (2003), while a more complete treatment is found in Pagan and Ullah (1999).

At its core, a nonparametric density estimate is simply a smoothed histogram. Therefore, the nonparametric estimator can be motivated by beginning with a histogram. In a histogram, the full range of $n$ sample values is partitioned into non-overlapping bins of equal width $h$. Each bin has a height equal to the number of sample observations within the range of that bin divided by the total number of observations. Given an indicator function $I(A)$, defined as equal to 1 if the statement $A$ is true, and 0 if the statement $A$ is false, the height of a bin centered at some point $x_0$, with width $h$, is

$$H(x_0) = \frac{1}{n} \sum_{i=1}^{n} I\left(x_0 - \frac{h}{2} < x_i \leq x_0 + \frac{h}{2}\right).$$

That is, we are simply counting the number of sample observations in each bin of width h, and dividing that frequency by the sample size; the resulting height of each bin is the relative frequency. If there are 40 observations, of which 10 are in the bin (1,2], with h = 1, then the histogram has height H(1.5) = .25 for all x in (1,2].
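The bin-height formula and the numerical example above can be reproduced directly. This is a small sketch; the function name `bin_height` and the particular 40 sample values are made up here so that exactly 10 of them fall in the bin (1, 2].

```python
import numpy as np


def bin_height(x0, sample, h):
    """H(x0) = (1/n) * #{i : x0 - h/2 < x_i <= x0 + h/2}."""
    x = np.asarray(sample, dtype=float)
    in_bin = (x > x0 - h / 2) & (x <= x0 + h / 2)
    return in_bin.sum() / x.size


# Hypothetical sample: 40 observations, 10 of which lie in (1, 2].
sample = np.concatenate([
    np.linspace(0.1, 0.9, 10),   # 10 points in (0, 1]
    np.linspace(1.1, 1.9, 10),   # 10 points in (1, 2]
    np.linspace(2.1, 3.9, 20),   # 20 points in (2, 4]
])
height = bin_height(1.5, sample, h=1.0)   # bin centered at 1.5 with width 1
```

Here `height` equals 10/40 = 0.25, matching the value of H(1.5) in the text.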

This concept can be extended by computing a "local" histogram for each point $x$ in the range $(x_{\min} - h/2,\; x_{\max} + h/2]$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the sample data. (12) In the histogram above, we computed $H(x_0)$ at only a finite set of points in the range, because $x_0$ was required to be the midpoint of a bin. The local histogram instead calculates $\hat{f}(x)$ for every $x$ in $(x_{\min} - h/2,\; x_{\max} + h/2)$, where $\hat{f}(x)$ evaluated at a given point $x_0$ is equal to the number of sample observations within $(x_0 - h/2,\; x_0 + h/2)$, divided by $n$ to give a relative frequency. (13) That is,

\[
\begin{aligned}
\hat{f}(x) &= \frac{1}{n} \sum_{i=1}^{n} I\left(x - \frac{h}{2} < x_i < x + \frac{h}{2}\right) \\
&= \frac{1}{n} \sum_{i=1}^{n} I\left(\left|\frac{x - x_i}{h}\right| < \frac{1}{2}\right) \\
&= \frac{1}{n} \sum_{i=1}^{n} I\left(\left|\psi(x_i)\right| < \frac{1}{2}\right),
\end{aligned}
\]

where $\psi(x_i) = (x - x_i)/h$. $\hat{f}(x)$ is a proper density function if, first, it is greater than or equal to zero for all $x$, which is guaranteed since the indicator function is always either 0 or 1, and second, if $\int_{-\infty}^{\infty} \hat{f}(x)\,dx = 1$. Dividing $\hat{f}(x)$ by $h$ ensures that the function integrates to one. To see this, observe first that

\[
\int_{-\infty}^{\infty} I\left(\left|\psi(x_i)\right| < \frac{1}{2}\right) d\psi
= \int_{-1/2}^{1/2} I\left(\left|\psi(x_i)\right| < \frac{1}{2}\right) d\psi
= \int_{-1/2}^{1/2} d\psi = 1.
\]

In addition, since $\psi(x_i) = (x - x_i)/h$, we have $dx = h\,d\psi$, so that

\[
\begin{aligned}
\frac{1}{h} \int_{-\infty}^{\infty} \hat{f}(x)\,dx
&= \frac{1}{nh} \sum_{i=1}^{n} \int_{-\infty}^{\infty} I\left(\left|\frac{x - x_i}{h}\right| < \frac{1}{2}\right) dx \\
&= \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\infty} I\left(\left|\frac{x - x_i}{h}\right| < \frac{1}{2}\right) d\psi \\
&= 1.
\end{aligned}
\]
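As a rough numerical check of this normalization, the sketch below evaluates the local histogram divided by $h$ on a fine grid and approximates its integral with a midpoint rule. The function names, the sample values, and the bandwidth are all made up for illustration.

```python
def local_histogram(sample, x, h):
    """Share of observations strictly within h/2 of x, divided by h."""
    n = len(sample)
    count = sum(1 for xi in sample if abs((x - xi) / h) < 0.5)
    return count / (n * h)

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width

sample = [0.3, 1.1, 1.4, 2.0, 2.2, 3.7, 5.0]  # made-up data
h = 0.8                                        # illustrative bin width

# The estimate is zero outside (x_min - h/2, x_max + h/2).
lo, hi = min(sample) - h / 2, max(sample) + h / 2
area = integrate(lambda x: local_histogram(sample, x, h), lo, hi)
print(area)  # close to 1, up to discretization error at the jumps
```

Dropping the division by `h` in `local_histogram` would make the same integral come out near `h` instead of one, which is the point of the normalization.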

While local histograms certainly provide a nonparametric estimate of density, and are smoother than proper histograms, they are still discontinuous. It seems sensible, then, to attempt to smooth the histogram. This is done by replacing the indicator function in

\[
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} I\left(\left|\frac{x - x_i}{h}\right| < \frac{1}{2}\right)
\]

with another function called a kernel, $K(\psi)$, such that $\hat{f}(x) \ge 0$, integrates to one, and is smooth. An estimator of the form

\[
\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K(\psi_i), \quad \text{where } \psi_i = \frac{x - x_i}{h},
\]

is a Rosenblatt-Parzen kernel estimator, and the resulting function $\hat{f}(x)$ depends on the choice of $h$, called a bandwidth or smoothing parameter, and the choice of kernel. A "good" density estimate will have low bias (that is, $E(\hat{f}(x)) - f(x)$, where $f(x)$ is the true density of the data) and low variance.
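A minimal sketch of an estimator of this form follows. The Gaussian kernel is an assumption made here for illustration; this appendix does not say which kernel the article uses. The sample, the bandwidth, and the function names are likewise hypothetical.

```python
import math

def gaussian_kernel(psi):
    """Standard normal density: nonnegative, smooth, integrates to one."""
    return math.exp(-0.5 * psi * psi) / math.sqrt(2.0 * math.pi)

def kernel_density(sample, x, h, kernel=gaussian_kernel):
    """Kernel estimate at x: (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

sample = [0.3, 1.1, 1.4, 2.0, 2.2, 3.7, 5.0]  # made-up data
h = 0.8                                        # bandwidth picked by hand

# Evaluate on a grid wide enough to capture the Gaussian tails.
lo, hi = min(sample) - 6 * h, max(sample) + 6 * h
steps = 2000
width = (hi - lo) / steps
grid = [lo + (k + 0.5) * width for k in range(steps)]
values = [kernel_density(sample, x, h) for x in grid]

area = sum(values) * width  # midpoint-rule integral, close to 1
print(round(area, 4))
```

Passing a different function as `kernel` (for example, the indicator of $|\psi| < 1/2$) recovers the local histogram, so the same code covers both the discontinuous and the smoothed estimator.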

APPENDIX B: ROSENBLATT-PARZEN BIAS AND VARIANCE

Bias and variance of a nonparametric estimator can be calculated given the following four assumptions:

1) The sample observations are i.i.d.

2) The kernel is symmetric around zero and satisfies $\int_{-\infty}^{\infty} K(\psi)\,d\psi = 1$, $\int_{-\infty}^{\infty} \psi^2 K(\psi)\,d\psi \ne 0$, and $\int_{-\infty}^{\infty} K^2(\psi)\,d\psi < \infty$.

3) The second-order derivatives of the true density $f$ are continuous and bounded around $x$, and

4) $h \to 0$ and $nh \to \infty$ as $n \to \infty$.

It can be shown that the Rosenblatt-Parzen estimator $\hat{f}(x)$ has

\[
\text{Bias} = \frac{h^2}{2} \left[\int \psi^2 K(\psi)\,d\psi\right] f''(x) + O(h^2)
\]

and

\[
\text{Variance} = \frac{1}{nh}\, f(x) \int K^2(\psi)\,d\psi + O\left(\frac{1}{nh}\right).
\]

The mean integrated squared error (MISE) is defined as

\[
\int \left[\text{Bias}(\hat{f}(x))^2 + V(\hat{f}(x))\right] dx.
\]

Substituting the formulas for bias and variance, and ignoring the higher order terms, $O(h^2)$ and $O(1/nh)$, respectively, gives the asymptotic mean integrated squared error (AMISE):

\[
\begin{aligned}
\text{AMISE} &= \frac{h^4}{4} \left[\int \psi^2 K(\psi)\,d\psi\right]^2 \int (f''(x))^2\,dx + \frac{1}{nh} \int f(x)\,dx \int K^2(\psi)\,d\psi \\
&= \frac{h^4}{4} \left[\int \psi^2 K(\psi)\,d\psi\right]^2 \int (f''(x))^2\,dx + \frac{1}{nh} \int K^2(\psi)\,d\psi,
\end{aligned}
\]

where the second equality uses $\int f(x)\,dx = 1$.

Differentiating with respect to $h$ and setting the result equal to zero, we have

\[
h^3 \left[\int \psi^2 K(\psi)\,d\psi\right]^2 \int (f''(x))^2\,dx - \frac{1}{nh^2} \int K^2(\psi)\,d\psi = 0
\]

or

\[
h = c\,n^{-1/5}, \quad \text{where } c = \left[\frac{\int K^2(\psi)\,d\psi}{\left[\int \psi^2 K(\psi)\,d\psi\right]^2 \int (f''(x))^2\,dx}\right]^{1/5}.
\]
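To make the constant $c$ concrete, the sketch below evaluates the three integrals numerically under two assumptions that are not stated in this appendix: a Gaussian kernel, and a true density that is normal with standard deviation `sigma`. The helper names are hypothetical. Under these assumptions the constant works out to $(4/3)^{1/5}\sigma \approx 1.06\,\sigma$, which matches the factor 1.06 quoted in footnote 11 for roughly normal data.

```python
import math

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width

def normal_pdf(x, sigma=1.0):
    """Density of a normal distribution with mean 0 and std. dev. sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_pdf_second_derivative(x, sigma):
    """f''(x) for the normal density: ((x/sigma)^2 - 1) / sigma^2 * f(x)."""
    return ((x / sigma) ** 2 - 1.0) / sigma ** 2 * normal_pdf(x, sigma)

def amise_constant(sigma):
    """c in h = c * n^(-1/5), assuming a Gaussian kernel and normal data."""
    kernel_sq = integrate(lambda p: normal_pdf(p) ** 2, -10.0, 10.0)
    kernel_var = integrate(lambda p: p * p * normal_pdf(p), -10.0, 10.0)
    lim = 10.0 * sigma
    curvature = integrate(
        lambda x: normal_pdf_second_derivative(x, sigma) ** 2, -lim, lim
    )
    return (kernel_sq / (kernel_var ** 2 * curvature)) ** 0.2

def optimal_bandwidth(sigma, n):
    """AMISE-minimizing bandwidth under the same assumptions."""
    return amise_constant(sigma) * n ** (-0.2)

print(round(amise_constant(1.0), 2))  # 1.06
```

Because the bandwidth shrinks only at rate $n^{-1/5}$, multiplying the sample size by 32 is needed to halve it.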

REFERENCES

Austin, D. Andrew. 1999. "Politics vs. Economics: Evidence from Municipal Annexation." Journal of Urban Economics 45 (3): 501-32.

Bogue, Donald J. 1953. Population Growth in Standard Metropolitan Areas 1900-1950. Oxford, Ohio: Scripps Foundation for Research in Population Problems.

Carlino, Gerald, Satyajit Chatterjee, and Robert M. Hunt. 2006. "Urban Density and the Rate of Invention." Federal Reserve Bank of Philadelphia Working Paper No. 06-14.

Chatterjee, Satyajit, and Gerald A. Carlino. 2001. "Aggregate Metropolitan Employment Growth and the Deconcentration of Metropolitan Employment." Journal of Monetary Economics 48 (3): 549-83.

Ciccone, Antonio, and Robert E. Hall. 1996. "Productivity and the Density of Economic Activity." American Economic Review 86 (1): 54-70.

Dobkins, Linda, and Yannis Ioannides. 2000. "Dynamic Evolution of the Size Distribution of U.S. Cities." In The Economics of Cities, eds. J. Huriot and J. Thisse. New York, NY: Cambridge University Press.

Eeckhout, Jan. 2004. "Gibrat's Law for (All) Cities." American Economic Review 94 (5): 1,429-51.

Glaeser, Edward L., and Matthew E. Kahn. 2003. "Sprawl and Urban Growth." In Handbook of Regional and Urban Economics, eds. J. V. Henderson and J. F. Thisse, 1st ed., vol. 4, chap. 56. North Holland: Elsevier.

Greene, William. 2003. Econometric Analysis. 5th ed. Upper Saddle River, NJ: Prentice Hall.

Jaffe, Adam B., Manuel Trajtenberg, and Rebecca Henderson. 1993. "Geographic Localization of Knowledge Spillovers as Evidenced by Patent Citations." Quarterly Journal of Economics 108 (3): 577-98.

Lucas, Robert E., Jr. 1988. "On the Mechanics of Economic Development." Journal of Monetary Economics 22 (1): 3-42.

Lucas, Robert E., Jr., and Esteban Rossi-Hansberg. 2002. "On the Internal Structure of Cities." Econometrica 70 (4): 1,445-76.

Marshall, Alfred. 1920. Principles of Economics. 8th ed. London: Macmillan and Co., Ltd.

Mieszkowski, Peter, and Edwin S. Mills. 1993. "The Causes of Metropolitan Suburbanization." The Journal of Economic Perspectives 7 (3): 135-47.

Pagan, Adrian, and Aman Ullah. 1999. Nonparametric Econometrics. Cambridge, UK: Cambridge University Press.

Rossi-Hansberg, Esteban, Pierre-Daniel Sarte, and Raymond Owens III. 2005. "Firm Fragmentation and Urban Patterns." Federal Reserve Bank of Richmond Working Paper No. 05-03.

Silverman, B. W. 1986. Density Estimation. London: Chapman and Hall.

Song, Yan, and Gerrit-Jan Knaap. 2004. "Measuring Urban Form: Is Portland Winning the War on Sprawl?" Journal of the American Planning Association 70 (2): 210-25.

U.S. Bureau of the Census. "Number of Inhabitants: United States Summary." Washington, DC: U.S. Government Printing Office 1941, 1952, 1961, 1971, and 1981.

U.S. Bureau of the Census. 1994. Geographic Areas Reference Manual. Available online at http://www.census.gov/geo/www/garm.html (accessed September 4, 2007).

We wish to thank Kartik Athreya, Nashat Moin, Roy Webb, and especially Ned Prescott for their comments and suggestions. The views expressed in this article are those of the authors and do not necessarily represent those of the Federal Reserve Bank of Richmond or the Federal Reserve System. Data and replication files for this research can be found at http://www.richmondfed.org/research/research_economists/pierre-daniel_sarte.cfm. All errors are our own.

(1) 1980 Census of Population: Number of Inhabitants. "Appendix A-Area Classification." U.S. Department of Commerce, 1983. Note that CDPs did not appear in the 1940 Census.

(2) "Census Regions and Divisions of the United States." Available online at http://www.census.gov/geo/www/us_regdiv.pdf.

(3) See the Geographic Areas Reference Manual, U.S. Bureau of the Census, chap. 12. Available online at: http://www.census.gov/geo/www/garm.html.

(4) In New England, the town, rather than the county, is the relevant area.

(5) The MSA was made up of two counties: Riverside County with an area of 7,214 square miles, and San Bernardino County with an area of 20,064 square miles.

(6) In fact, the entire planet has a land area of around 58 million square miles and a population of 6.5 billion, giving a density of 112 people per square mile, or twice the density of the Riverside MSA.

(7) The 1990 data can be found at http://www.census.gov/tiger/tms/gazetteer/places.txt. Data for 2000 are available at: http://www.census.gov/tiger/tms/gazetteer/places2k.txt.

(8) County and City Data Books. University of Virginia, Geospatial and Statistical Data Center. Available online at: http://fisher.lib.virginia.edu/collections/stats/ccdb/.

(9) Nonparametric estimates converge to their true values at a rate slower than $\sqrt{n}$.

(10) If $X_n / n^k \to c$ for some real number $c$ as $n \to \infty$, then $X_n$ is $O(n^k)$. $O(A)$ is the largest order of magnitude of a sequence of real numbers $X_n$.

(11) Note that this rule does not imply that the nonparametric estimate will look like a parametric normal distribution; it merely says that, given data that are roughly normal, $1.06\,\hat{\sigma}\, n^{-1/5}$ is the smoothing factor that minimizes the asymptotic mean integrated squared error, balancing bias against variance.

(12) The local histogram $\hat{f}(x)$ must be computed for $(x_{\min} - h/2,\; x_{\max} + h/2]$ and not simply for $(x_{\min}, x_{\max}]$, because $\hat{f}(x) > 0$ for points outside of $(x_{\min}, x_{\max}]$. For instance, if $h = 1$ and $(x_{\min}, x_{\max}] = (0, 10]$, $\hat{f}(10.4)$ will be greater than zero because it will count the sample observation $x_i = 10$.

(13) In practice, $\hat{f}(x)$ can only be computed for a finite number of points. The distributions we display in Section 5 have been computed at 1,000 points evenly divided on the range $(x_{\min}, x_{\max})$.

Table 1 Three Definitions of a City

Legal City: The region controlled by a local government or a similar unincorporated region (CDP). Defined by local and state governments.

Urbanized Area: A region incorporating a central city plus surrounding towns and cities meeting a density requirement. Defined by the U.S. Census Bureau.

MSA: A region incorporating a central city, the county containing that city, and surrounding counties meeting a requirement on the percentage of workers commuting to the center. Defined by the U.S. Census Bureau.