About the Chartbook

Introduction

The first decennial census of population in the United States was taken in 1790, reflecting the constitutional requirement (Article I, Section 2) that an enumeration be conducted every ten years for use in apportioning members of the House of Representatives among the states.  The information collected was tallied by household and included five items in addition to the name of the head of the household: the numbers of persons who were White males 16 years and over, White males under age 16, White females, other free persons, and slaves.[1]

During the following two centuries, the items on the census of population were expanded to include a wide range of social and economic characteristics.  Starting in 1850, information was collected for each person (rather than by household), permitting the addition of items with a large number of possible responses, such as place (state or foreign country) of birth and occupation.  The census of housing, conducted in conjunction with the census of population, was introduced in 1940, although information on household tenure (owner/renter) had been collected previously in the census of population.  Changes in the decennial census other than content included the introduction of electrical tabulating machines in 1890, sampling in 1940, electronic computers in 1950, and the gradual replacement of door-to-door enumerators with mail-out, mail-back data collection from 1960 to 1980.  Since 1880, the decennial census has been taken under some provision for confidentiality.  The decennial census is currently taken under Title 13, U.S. Code, which requires that the Census Bureau release no data that would permit the identification of any person or household.[2]

Data from the decennial census of population were published in printed reports starting with a single 56-page report for 1790 and surpassing 100,000 pages for 1960, when data were first made available also on computer tape files.  In general, the increase reflected the increase in the amount of data collected, the quantity of cross-tabulations produced, and the geographic detail for which the data were published.  The Census Bureau Internet site (www.census.gov) has supplanted printed reports as the primary medium for releasing census data, and printed pages were reduced to about 50,000 in just three report series for 2000.

While many of the older printed reports from the decennial census are not readily available, most of these publications for the 1790 to 2000 period are now available on the Census Bureau Internet site.  In addition, two other major sources for historical census data merit description here.  The first of these is the five-volume Historical Statistics of the United States: Millennial Edition, hereafter referred to as Millennial Edition.[3] It was produced by an academic consortium and supersedes editions of Historical Statistics of the United States published by the Census Bureau in 1949, 1960, and 1975.[4] The Millennial Edition facilitates the use of historical data by presentation in time series format, and in the case of decennial census data in particular, includes data at the regional and state levels as well as the national level on several topics.

The second of these additional sources is the Integrated Public Use Microdata Series (IPUMS), which combines microdata files produced by the Census Bureau since 1960 with representative samples drawn from census schedules for the 1850 to 1950 period, excluding 1890 (schedules destroyed by fire in 1921) and 1930 (under development).  IPUMS (www.ipums.umn.edu) permits the development of time series with a consistent set of classifications on topics for which the census classifications have changed over time (e.g., occupation) and/or which were not tabulated at each census in which the underlying data were collected (e.g., households by size).  Many time series of decennial census data based on IPUMS are included in the Millennial Edition.[5]

As suggested by the preceding description, there is a tremendous quantity of data available based on the decennial census of population.  In addition, there is extensive scholarly analysis of American demographic history based on the census and other sources of demographic information.[6] These other sources include, for example, administrative data on vital statistics (births, deaths, marriage, and divorce) and immigration, and national survey data since the 1940s on a wide range of social and economic characteristics.

The purpose of the American Demographic History Chartbook is to present basic information about demographic trends and differentials in the United States in the 1790 to 2000 period, as revealed by the decennial census of population, and to do so using graphics designed for a general audience of persons interested in U.S. history.  In keeping with the stated purpose and general audience, the Chartbook has been kept relatively short.  Those persons interested in more detail on any of the topics included here may consult References, which are divided into Decennial Census Publications and References Other than Decennial Census Publications.

 

Scope of the Graphics

Limiting the length of the Chartbook, as described above, requires some choices regarding four parameters: geographic detail, topics, cross tabulations, and census years.  For example, graphics showing data down to the county level (which still would not include data for individual cities) for all topics in the decennial census of population for all years (1790 to 2000) with extensive cross-tabulations (e.g., by type of residence and by race and Hispanic origin) would require several hundred thousand pages.

The geographic coverage in the Chartbook is primarily for the United States and for regions (North, South, and West, as discussed below).  In addition, data are shown on some topics for states and for the ten largest cities and the ten largest metropolitan areas in the United States.

The Census Bureau designations of subnational regions (based on aggregations of states) have changed over time (Dahmann, 1992; Hindman, 2006).  The four regions used currently by the Census Bureau are the Northeast, Midwest, South, and West; however, it is preferable here to use the three regions listed in the preceding paragraph for geographical, historical, and demographic reasons.  These three regions are much more similar in land area (representing 25.5 percent, 24.6 percent, and 49.6 percent, respectively, of U.S. land area in 2000) whereas the Northeast represents only 4.6 percent of U.S. land area.[7] The major conflicts in American history reflect North-South comparisons more than Northeast-Midwest comparisons.[8] Historical differences between the North and the South are much greater than between the Northeast and the Midwest for a wide range of demographic characteristics, including, for example, population growth rates, urbanization, racial composition, educational levels, and the proportion foreign born.

Geographic detail is shown for the United States as defined at each census.  This includes the enumerated area of the conterminous United States and includes territories, which frequently had very different boundaries than states subsequently created from them.  Alaska and Hawaii, being outside the conterminous United States, are included first in 1960.  Important changes in state boundaries include the creation of the District of Columbia from parts of Maryland and Virginia in 1791, the retrocession of the Virginia portion of the District of Columbia in 1846, and the creation of West Virginia from part of Virginia in 1863.  The Oklahoma Territory and the Indian Territory, shown for the only time in the census in 1900, were merged to form the state of Oklahoma in 1907.[9] Prior to 1890, the enumeration of the American Indian population in the decennial census excluded those living in tribal society (the large majority of the Indian population), and information on the full range of census items was not collected for the American Indian population until 1900.

Three maps are included for use with the graphics.  Map 1 and Map 2 show states and territories for the 1790 to 1840 period and the 1850 to 1900 period, respectively (after which no major changes in boundaries have occurred in the conterminous United States, except as noted above for Oklahoma).  Map 3 shows the United States by census region, census division, and state in 2000.

Because the focus here is on American demographic history for the full 1790 to 2000 period, the topics included are limited to those for which information has been collected, or is derivable from the information collected, in the decennial census for at least a century and reflect some judgment about what topics are of most general interest.  The topics included are indicated in the chapter titles.  In two cases, presentation of data on a topic reflects a change in the census item.  Educational level was measured by illiteracy rates (i.e., from a negative perspective) prior to 1940, by years of school completed from 1950 to 1980, and primarily by educational diplomas and degrees  in 1990 and 2000.  Fertility was first measured by children ever born in 1910; however, child-woman ratios can be calculated using data on age and sex back into the 19th century.[10]
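
As a rough illustration of how a child-woman ratio is derived from age and sex data alone, the following minimal sketch computes children under 5 per 1,000 women of childbearing age.  The function name, the 15-to-44 age range, and the counts are assumptions made for illustration; the age groups used in the Chartbook and the age detail available in early censuses may differ.

```python
def child_woman_ratio(children_under_5, women_15_to_44):
    """Children under age 5 per 1,000 women ages 15 to 44.

    The 15-to-44 range is an assumed convention; other ranges
    (for example, 15 to 49) are also used in practice.
    """
    if women_15_to_44 <= 0:
        raise ValueError("number of women must be positive")
    return 1000.0 * children_under_5 / women_15_to_44


# Hypothetical counts for illustration only (not census data)
ratio = child_woman_ratio(children_under_5=120_000, women_15_to_44=200_000)
print(ratio)  # children under 5 per 1,000 women
```

Because the measure needs only counts by age and sex, it can be computed for censuses that did not ask about children ever born.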

With a few exceptions, the topics shown in the graphics are not cross-tabulated by other characteristics, for reasons of space.  The primary exception is that some social and economic topics (e.g., marital status and labor force participation rates) are shown by age and sex, which are of fundamental importance in demographic analysis.[11]  In general, data are not shown by race or Hispanic origin, except for censuses prior to 1870 when data on some topics were collected only for the White population or for the free population.  For many topics, data are available in decennial census publications on population characteristics down to the state level by type of residence (e.g., urban and rural) and to the county level.

At the national level, the graphics show data for every census year for which data are available in the 1790 to 2000 period.  With a few exceptions, data are shown at the regional and state levels only for selected years.  In general, these years are 1790 (the first census), 1820 (about midway between 1790 and 1860, and following the effects of the War of 1812), 1860 (on the eve of the Civil War), 1900 (turn of the century), 1940 (on the eve of World War II and the baby boom), 1970 (end of the baby boom and the beginning of a new wave of large-scale immigration), and 2000 (turn of the century).

 

Addition of Data for 2010

As described in the Introduction, there were several major changes in the decennial census from 1790 to 2000. None of these changes affected one of the basic characteristics of a decennial census: the collection of data at 10-year intervals to provide information about the population.

This situation changed between 2000 and 2010 with what is arguably the biggest change in U.S. decennial census history. From 1960 through 2000, basic population and housing information was collected with a few questions on the “short form” that covered the entire population, and most information, including information on a wide range of social, economic, and housing characteristics, was collected on a sample basis with a larger number of questions on the “long form.”

In order to simplify operational aspects of the decennial census (the primary purpose of which is to fulfill the constitutional requirement for a count of population for apportionment purposes) and to provide more frequent data on population and housing characteristics for small areas (for which existing national surveys like the Current Population Survey do not provide data), the 2010 decennial census was restricted to the short form, and the long form was removed from the decennial census and replaced by the American Community Survey (ACS).

The ACS is an ongoing survey that collects information monthly throughout the United States. These monthly data can be aggregated to produce annual estimates (12-month average data). ACS data collected from 2000 through 2004 were published annually (in the year following data collection) for geographic areas of 250,000+ population. Starting with data collected in 2005, annual estimates were published for geographic areas with 65,000+ population.

Annual average data can be aggregated to produce multiyear estimates. The original sample design for the fully-implemented ACS called for a 3-percent sample of the population each year so that 5-year average estimates would be based on a 15-percent sample of the population, thereby approximating the sampling rate for the long form in the 2000 census. Based on this design, the Census Bureau determined population thresholds for producing estimates from the ACS: annual estimates for geographic areas of 65,000+ population, 3-year annual average estimates for geographic areas of 20,000+ population, and 5-year annual average estimates for all geographic areas (including statistical areas like census tracts).
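
The thresholds and sampling arithmetic in the preceding paragraph can be sketched as a simple lookup.  The function names below are hypothetical, the values are those of the original design as described above (not necessarily current Census Bureau practice), and the pooled rate assumes that the annual samples do not overlap.

```python
def acs_estimates_available(population):
    """Return the ACS estimate periods (in years) produced for an area
    of the given population under the original design thresholds."""
    periods = [5]                # 5-year estimates: all geographic areas
    if population >= 20_000:
        periods.insert(0, 3)     # 3-year estimates: 20,000+ population
    if population >= 65_000:
        periods.insert(0, 1)     # annual estimates: 65,000+ population
    return periods


def pooled_sampling_rate(annual_rate, years):
    """Approximate cumulative sampling rate when annual samples are pooled,
    assuming non-overlapping samples."""
    return annual_rate * years


print(acs_estimates_available(500_000))   # a large city or state
print(acs_estimates_available(4_000))     # an area the size of a census tract
print(pooled_sampling_rate(0.03, 5))      # 3 percent a year pooled over 5 years
```

Under these assumptions, an area the size of a census tract receives only 5-year estimates, and five pooled years of a 3-percent sample approximate a 15-percent sample.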

Unfortunately, due primarily to budget cuts, the actual sample size for the ACS is only about one-half as large as originally planned. As a result, sampling errors for ACS estimates are larger than would be the case if the original sample design could have been implemented each year.

Fortunately, the relatively small sample in the ACS does not pose a major problem for the analysis of historical trends and differentials, as shown in the Chartbook, for two reasons. First, the geographic areas for which data for 2010 are shown in the Chartbook all have a population of at least 500,000 (and much larger in most cases) so that sampling errors for estimates for 2010 are still relatively small. Second, since the comparisons are decade to decade rather than year to year, apparent changes are more likely to be large enough to be statistically significant.

One drawback of the ACS for historical comparisons is that the Census Bureau has focused on producing the huge quantity of tabulations now available (three sets of estimates each year in place of decennial census data once every 10 years). As a result, more detailed tabulations that have historically been published for higher-level geography (such as the United States, regions, and states) have not been produced, at least so far. For example, a table showing state-to-state lifetime migration (state of birth by state of residence), which was published in each census from 1850 through 2000, has not been produced using 2010 ACS data. This tabulation underlies the data shown in Figure 11-5. Such tabulations can be developed using sample data on the PUMS file for the 2010 ACS, although the results will differ slightly from the results that would have been produced by using the full 2010 ACS sample data and thus will be slightly inconsistent with data from the 2010 ACS as published by the Census Bureau.

While, as noted earlier, the relatively small sample size in the ACS does not pose a major problem for the analysis of trends and differentials as shown in the Chartbook, it should be noted that in general, the analysis of trends and differentials is much more complicated using ACS data than traditional decennial census data, especially for geographic areas with relatively small populations. This is due to three interrelated factors: (1) the smaller sample size in the ACS, (2) the need to choose between 1-year, 3-year, and 5-year data where available (which involves a tradeoff between reliability, as measured by sampling error, and currency), and (3) the fact that in general, statistically significant change is more likely to occur and to be detectable for a 10-year period than for a much shorter period, especially one year.

For 2010, data on topics shown in figures in chapters 1 through 6 and in Figure 8-3 in the Chartbook are from the decennial census. Data for 2010 on other topics shown in figures in the Chartbook are from the ACS.

The Census Bureau has issued (in PDF) several reports under the general title of A Compass for Understanding and Using American Community Survey Data. The full series is available at www.census.gov/acs/www/guidance_for_data_users/handbooks/. In particular, it is strongly suggested that at a minimum, the user of ACS data read the first report in the series: What General Data Users Need to Know (by Linda A. Jacobsen and Mark Mather; issued October, 2008).

 

Accuracy of the Data

General information on census data, including area classifications, definitions of topics, accuracy of the data, and collection and processing techniques, is provided in decennial census publications (and with census data sets on the Internet).  The decennial census of the United States has been taken primarily on a de jure (usual place of residence) basis rather than on a de facto (location at the time of the census) basis.  Estimates of census coverage and net under-enumeration have been prepared for the decennial census on a regular basis since 1940.  While the estimated rates of net undercount have varied somewhat, they have generally shown higher rates of net undercount for males than for females, for young adults than for other age groups, and for minority groups than for the White (or White non-Hispanic) population.[12]

Since 1940, some data in the decennial census have been collected on a sample basis, and since 1960, this has been the case for data on most social and economic characteristics.  The use of sample data (in decennial census publications and tabulations based on IPUMS)  is indicated in headnotes for the graphics.  In general, estimates of sampling error are provided in decennial census publications that show sample data.

Sample estimates may differ somewhat from the data that would have been obtained if information had been collected for the entire population.  In addition to sampling error for data based on a sample, both 100-percent data and sample data are subject to nonsampling error.  Nonsampling error may be introduced during any of the numerous operations used to collect and process data.  Such errors may include the following: not enumerating every household or every person in the population, failing to obtain all required information from the respondents, obtaining incorrect or inconsistent information, and recording information incorrectly.  In addition, errors can occur during the review of the enumerators’ work, during clerical handling of the questionnaires, and during the processing of the questionnaires.

The magnitude of sampling error is determined primarily by sample size and to a lesser degree by the sampling rate.  Since the sample data shown in the graphics are for the United States, regions, states, and large cities (and not, for example, for small towns), the samples on which the sample data are based are sufficiently large that the resulting sampling errors are relatively small.  As noted above, information on sampling error typically is provided in decennial census publications; however, the following very general guideline is offered, unless there is particular reason to question the comparability of data (e.g., due to changes in definitions).  Changes (over time) and differences (for the same census year) of less than one or two percentage points (in the case of percentages) or of less than one or two percent (in the case of other measures, such as ratios, or numbers) do not merit emphasis.  Such differences may not be statistically significant due to sampling error and/or nonsampling error.  In addition, such small changes may not be of substantive significance, even if they are of statistical significance.[13]
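
The first sentence of the preceding paragraph can be illustrated with the textbook formula for the standard error of a proportion under simple random sampling.  This is a minimal sketch under that assumption, not the Census Bureau's variance estimation method, which accounts for the actual sample design; the function name is hypothetical.  The inputs are the figures given in footnote 13 (a 10-percent estimate, about 33,000 sample cases, and a 3 1/3 percent sample).

```python
import math


def se_proportion(p, n, sampling_rate=0.0):
    """Standard error of an estimated proportion p from a simple random
    sample of n cases.

    The sampling rate (fraction of the population sampled) enters only
    through the finite population correction, sqrt(1 - sampling_rate).
    """
    fpc = math.sqrt(1.0 - sampling_rate)
    return math.sqrt(p * (1.0 - p) / n) * fpc


# Figures from footnote 13: 10 percent, about 33,000 cases, 3 1/3 percent sample
se = se_proportion(0.10, 33_000, sampling_rate=1 / 30)
print(round(100 * se, 1))  # in percentage points; prints 0.2

# Quadrupling the sample size halves the standard error
print(se_proportion(0.10, 4 * 33_000) / se_proportion(0.10, 33_000))
```

In this sketch the sampling rate of 3 1/3 percent reduces the standard error by less than 2 percent through the correction factor, whereas the sample size determines its order of magnitude.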

This general guideline does not apply to estimates based on sample data for net migration of the population born in the United States.  In this case, estimates of in-migration and of out-migration are each subject to sampling error, and the resulting estimate of net migration, which may be a much smaller number, may have a large sampling error relative to the size of the estimate.
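
The point about net migration can be sketched numerically.  The example below assumes, for simplicity, that the estimates of in-migration and out-migration are statistically independent, so that their variances add; the figures are hypothetical and are not census data.

```python
import math


def net_migration_se(se_in, se_out):
    """Standard error of net migration (in-migration minus out-migration),
    assuming the two estimates are independent."""
    return math.hypot(se_in, se_out)  # sqrt(se_in**2 + se_out**2)


# Hypothetical figures for illustration only (not census data)
in_migrants, out_migrants = 500_000, 480_000
se_in, se_out = 5_000, 5_000

net = in_migrants - out_migrants
se_net = net_migration_se(se_in, se_out)

relative_se_in = se_in / in_migrants   # 1 percent of the in-migration estimate
relative_se_net = se_net / net         # roughly 35 percent of the net estimate
print(net, round(se_net), round(relative_se_in, 3), round(relative_se_net, 3))
```

Each gross flow here is estimated to within about 1 percent, yet the standard error of the net figure is roughly one-third of the estimate itself, which is why small differences in net migration merit particular caution.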

 

Acknowledgments

The author thanks the many demographers at the Census Bureau and in academe for their contributions over the years to his understanding of historical U.S. decennial census statistics and their limitations, and thanks others, including several high-school history teachers, for their input in developing the Chartbook.  In particular, the author thanks David M. Kennedy (Donald J. McLachlan Professor of History, Emeritus, and Director of the Bill Lane Center for the American West, at Stanford University), Herbert S. Klein (Professor of History and Director of the Center for Latin American Studies at Stanford University), and Alexander E. Landry (Reference Librarian at the U.S. Census Bureau) for their consultation and their support.

 

 

[1]Bohme et al, 1973.

[2]Bohme et al, 1973; Anderson, 1988; Gauthier, 2002.  The confidentiality requirement applies for 72 years, after which period microfilm copies of decennial census schedules are available to the public from the National Archives, pursuant to Title 44, U.S. Code.

[3]Carter, Gartner, Haines, Olmstead, Sutch, and Wright, editors in chief, 2006.

[4]For simplicity, the term Census Bureau is used in the text to include its predecessors, the Bureau of the Census and the Census Office.  See References, Decennial Census Publications.

[5]For a description of IPUMS, see Ruggles, Sobek, et al, 1997.

[6]The essays in the Millennial Edition, along with the references provided in these essays, provide an extensive analysis of American demographic history.  See especially, Carter, Haines, Sutch, and Wright, 2006; Haines, 2006a; Haines, 2006b; Ferrie, 2006; Barde, Carter, and Sutch, 2006; Ruggles, 2006; Carter, 2006; and Sobek, 2006.  For a one-volume “survey” of American demographic history, see Klein, 2004.  For demographic methods and techniques of analysis, see one of the editions of The Methods and Materials of Demography, the first and the most recent editions being Shryock, Siegel, and Associates, 1971; and Siegel and Swanson, editors, 2004.

[7]U.S. Census Bureau, 2004.

[8]This is reflected in textbooks on American history.  For example, see Kennedy, Cohen, and Bailey, 2010.

[9]For a description of boundary changes, as well as historical population counts for states and counties, see Forstall, 1996.

[10]For a comprehensive list of items on the population questionnaire at each decennial census in the 1790 to 2000 period, see Gauthier, 2002.

[11]Hobbs, 2004.

[12]For discussion and estimates of net undercount, see U.S. Bureau of the Census, 1975 (Historical Statistics of the United States, Part 1, p. 1); Fay et al, 1988; Robinson et al, 1993; and U.S. Census Bureau, 2003.

[13]For illustration, an example at the national level is provided using 1950 census data on children ever born to ever-married women.  These data were based on a 3 1/3 percent sample, much smaller than for most census sample data.  For a weighted population of 1,000,000 (meaning about 33,000 sample cases), the standard error on an estimated percentage of 10 percent with zero children ever born is 0.2 percentage points, and the standard error on an estimated rate of 3.00 lifetime births per woman is 0.02 births (U.S. Bureau of the Census, 1955).  There is about a 68-percent chance that the sample-based estimates would be within one standard error (and about a 95-percent chance within two standard errors) of what would have been obtained from a complete census.