Geographic Crosswalks

Overview

NHGIS geographic crosswalks describe how U.S. census geographic units from one census year correspond to units from another year. The crosswalks are designed to support high-quality tabulations of one year's census data for another year's geographic units (e.g., 1990 foreign-born population for 2010 census tracts).

NHGIS crosswalks are similar to the U.S. Census Bureau's Relationship Files, but NHGIS crosswalks include interpolation weights, derived from advanced models. Each interpolation weight indicates the proportion of a source zone's characteristics that should be allocated to a specific target zone. We use these same weights to produce NHGIS geographically standardized time series tables.

Crosswalk Availability by Geographic Levels and Years
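| Source Zones      | Target Zones  | 1990 to 2010 | 2000 to 2010 | 2010 to 2020 | 2020 to 2010 |
|-------------------|---------------|--------------|--------------|--------------|--------------|
| Blocks            | Blocks        | X            | X            | X            | X            |
| Block Group Parts | Block Groups  | X            | X            |              |              |
| Block Group Parts | Census Tracts | X            | X            |              |              |
| Block Group Parts | Counties      | X            | X            |              |              |
| Block Groups      | Block Groups  |              |              | X            | X            |
| Block Groups      | Census Tracts |              |              | X            | X            |
| Block Groups      | Counties      |              |              | X            | X            |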



Selecting a Crosswalk

Why It's Best to Start Small

To transform summary data from one census's geographic units to another's, it's important to start from the lowest possible level.

Let's say you'd like "harmonized tract data": multiple years of census tract data for spatially consistent tracts. So you obtained 1990, 2000, 2010, and 2020 tract data, and now you'd like to use a tract crosswalk to standardize the boundaries. But there are lots of data available for units smaller than tracts, and those smaller units fit much better within other years' tracts, so why not start with data from the smaller units? Doing so is consistently more accurate, and sometimes much more so.

For example, there are thousands of cases where a single 1990 census tract corresponds to multiple 2010 tracts. This is especially common in fast-growing areas on the outskirts of cities, as in Figure 1. To tabulate 1990 data for the 2010 tracts in these cases, starting from tract counts requires disaggregating counts from the larger 1990 tracts down to their component parts, which can result in substantial errors. Disaggregation models commonly assume that population characteristics are uniformly distributed within source zones, resulting in inaccurately uniform distributions among the target zones. This is evident in the tract-based estimates of the 1990 Black population distribution in Figure 1, which misleadingly suggest that this area was moderately well integrated.

Figure 1. Computing 1990 rates of Black residents in 2010 tracts from different source levels, on the southwest outskirts of Charlotte, NC. Source for tract-based estimates: Longitudinal Tract Database (LTDB).

In this case, there is no need to start with tract data. The Census Bureau publishes many summary tables for lower levels, such as blocks, block groups, and "block group parts" (intersections between block groups, places, county subdivisions, and some other levels). Allocating data from these levels requires less disaggregation, and in many cases it requires only aggregation, resulting in exact counts for the target zones. In Figure 1, the allocations from the lower levels reveal that this area's population was in fact strongly segregated in 1990, with high rates of Black population in the eastern 2010 tracts and low rates in the western 2010 tracts.


Options & Recommendations

Which source level is "lowest" depends on whether the data are from the full-count, short-form census or from a sample-based, long-form survey.

From 1970 to 2000, the decennial census used two questionnaires:

  1. The short-form questionnaire, to be completed for all population and housing units, covering subjects such as age, sex, race, household size, and housing tenure.
  2. The long-form questionnaire, sent only to a sample of households, covering subjects such as income, employment, education, nativity, migration, and commuting.

After the 2000 census, the Census Bureau stopped including the long form in decennial census operations and has instead collected comparable information through the year-round American Community Survey (ACS). The Bureau publishes summary tables from the ACS annually. To generate a large enough sample to report data for small areas, it pools 5 years of samples together, producing 5-year summary data.

  • Blocks are the lowest level for which the Census Bureau tabulates full-count, short-form summary data.
  • Block group parts are the lowest level for which the Census Bureau tabulated sample-based, long-form summary data in 1990 and 2000.
  • Block groups are the lowest level in ACS 5-year summary tables.*

*The ACS also provides data for many entities that are smaller than block groups, such as small villages and tribal areas, but unlike these areas, block groups cover the entire U.S. and are uniformly small, with no great exceptions.

For optimal accuracy, we recommend using:

  • Crosswalks from blocks for short-form decennial census data
  • Crosswalks from block group parts for long-form 1990 and 2000 census data
  • Crosswalks from block groups for ACS 5-year data

Some census and ACS tables are not available for these levels (e.g., detailed place of birth), and some other data sources provide data only for tracts or higher levels. To support these cases, we plan to add crosswalks for some other source levels in the future.


How to Use the Crosswalks

Using Block Crosswalks

In a block-to-block crosswalk, each record identifies a possible intersection between a single source block and a single target block, along with an interpolation weight (ranging between 0 and 1) identifying approximately what portion of the source zone's population and housing units were located in the intersection. These weights can be used to estimate how any counts available for source blocks (e.g., females age 75 and over, single-member households, owner-occupied housing units, etc.) are distributed among target blocks.

For example, to interpolate count data from 1990 blocks to 2010 blocks:

  1. Obtain data of interest for 1990 blocks
    • E.g., using the NHGIS Data Finder, find and download tables for the Block geographic level for the year 1990 (dataset 1990_STF1)
  2. Join the 1990-block-to-2010-block crosswalk to the 1990 block data of interest
  3. Multiply the 1990 block counts by the crosswalk's interpolation weights, producing estimated counts for all 1990-2010 block intersections, or "atoms"
  4. Sum these atom counts for each 2010 block
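The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and made-up zone IDs, not NHGIS code: the function name `interpolate_counts` and the record layout (source ID, target ID, weight) are assumptions for the example, and the actual column names in a crosswalk file are given in its README.

```python
from collections import defaultdict


def interpolate_counts(source_counts, crosswalk):
    """Allocate source-zone counts to target zones.

    source_counts: dict mapping source zone ID -> count
    crosswalk: iterable of (source_id, target_id, weight) records
    (this record layout is hypothetical, chosen for the example)
    """
    target_counts = defaultdict(float)
    for source_id, target_id, weight in crosswalk:
        # Steps 2-3: look up the source count for this record and
        # multiply by the weight to estimate the "atom" count
        atom_count = source_counts.get(source_id, 0) * weight
        # Step 4: sum atom counts for each target zone
        target_counts[target_id] += atom_count
    return dict(target_counts)


# Toy data with made-up IDs (not real block codes)
counts_1990 = {"A": 100, "B": 40}
crosswalk_1990_2010 = [
    ("A", "X", 0.25),  # a quarter of source block A falls in target block X
    ("A", "Y", 0.75),
    ("B", "Y", 1.0),   # source block B lies entirely within target block Y
]

counts_2010 = interpolate_counts(counts_1990, crosswalk_1990_2010)
```

The same join, multiply, and sum pattern applies in any table-oriented tool; the result here is a set of estimated (generally non-integer) counts keyed by target block.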


Using Block Crosswalks to Generate Data for Larger Units

Since 1990, every census geographic unit corresponds exactly to a set of blocks from the same census year. If you have census counts for blocks, you can compute counts for any larger census units by summing all the counts for blocks in that unit. In this way, the NHGIS block-to-block crosswalks can be used to allocate data from 1990, 2000, or 2020 to any 2010 census units, or from 2010 to any 2020 census units, not just to blocks.

For example, to compute 2000 counts for 2010 school districts:

  1. Obtain 2000 block counts of interest and use the 2000-to-2010 block crosswalk to generate 2000 data for 2010 blocks (following steps similar to those above)
  2. Obtain a 2010 block data table from NHGIS and join it to the 2000 counts from step 1
    • You can find codes for most 2010 census units, including school districts, in 2010 block-level table files
  3. Sum the 2000 counts for all 2010 blocks in each 2010 school district
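As a rough sketch of steps 2 and 3, the aggregation amounts to a lookup from each 2010 block to its containing unit, followed by a sum. The function name `aggregate_to_units`, the block IDs, and the district codes below are all hypothetical; in practice the block-to-district lookup would come from a 2010 block-level table file.

```python
from collections import defaultdict


def aggregate_to_units(block_estimates, block_to_unit):
    """Sum block-level estimates within larger units.

    block_estimates: dict mapping target block ID -> estimated count
    block_to_unit: dict mapping target block ID -> larger unit code
    """
    unit_totals = defaultdict(float)
    for block_id, estimate in block_estimates.items():
        unit_id = block_to_unit.get(block_id)
        if unit_id is None:
            # Block has no code for this unit type; skip it
            continue
        unit_totals[unit_id] += estimate
    return dict(unit_totals)


# Toy inputs: 2000 counts already interpolated to 2010 blocks (step 1)
estimates_2000_in_2010_blocks = {"X": 25.0, "Y": 115.0, "Z": 10.0}
# Toy lookup from 2010 blocks to 2010 school districts (step 2)
block_to_district = {"X": "SD1", "Y": "SD1", "Z": "SD2"}

district_totals = aggregate_to_units(
    estimates_2000_in_2010_blocks, block_to_district
)
```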


Crosswalks from Block Groups & Block Group Parts

The steps for using crosswalks from block groups and block group parts (BGPs) mirror those for crosswalks from blocks.

One difference is that crosswalks from block groups and BGPs include separate interpolation weights for different census counts, as laid out in the Technical Details section. To generate the weights in these crosswalks, we compute the proportion of each source zone's characteristics in each target zone based on block-level characteristics, using the block-to-block crosswalks. The separate interpolation weights are based on different block-level characteristics (total population, total housing units, etc.), enabling crosswalk users to choose the weights that are most suitable for their data of interest.

For example, a user interpolating counts of foreign-born persons may choose to use the weights based on total populations, assuming that the distribution of foreign-born persons corresponds closely to the general population's distribution. A user interpolating counts of high-income households may instead choose to use the weights based on total households, etc.
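The choice of weight can be made explicit in code by passing the weight field as a parameter. The sketch below uses the `wt_pop` and `wt_hh` field names listed under Technical Details; the `source` and `target` keys, the zone IDs, and all the numbers are invented for illustration.

```python
from collections import defaultdict


def interpolate(source_counts, crosswalk_rows, weight_field):
    """Allocate counts to target zones using the chosen weight field."""
    totals = defaultdict(float)
    for row in crosswalk_rows:
        count = source_counts.get(row["source"], 0)
        totals[row["target"]] += count * row[weight_field]
    return dict(totals)


# One toy source zone split across two target zones, with different
# shares of its population and of its households in each target
rows = [
    {"source": "P1", "target": "T1", "wt_pop": 0.5, "wt_hh": 0.25},
    {"source": "P1", "target": "T2", "wt_pop": 0.5, "wt_hh": 0.75},
]

# Person-based count: use population weights
foreign_born = interpolate({"P1": 80}, rows, "wt_pop")

# Household-based count: use household weights
high_income_households = interpolate({"P1": 20}, rows, "wt_hh")
```

Here the person count is split evenly between the two targets, while the household count is split 1:3, reflecting the different weight fields.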


Finding Data for Block Group Parts

Finding source data for 1990 and 2000 BGPs through the NHGIS Data Finder involves an extra step. In NHGIS terminology, BGPs are a type of compound level. To find them in a selection window for Geographic Levels, you will need to click on the "Show Compound Geographic Levels" option in the upper right. You will then find BGP levels under the "BLOCK GROUP" heading.

Summary files for 1990 and 2000 use different versions of BGPs, each combining different sets of geographic units. The crosswalks use these levels:

Geographic Levels in Crosswalks for Block Group Parts
| Year | NHGIS ID     | Label |
|------|--------------|-------|
| 1990 | blck_grp_598 | Block Group [1990 partition] (by State--County--County Subdivision--Place/Remainder--Census Tract--Congressional District (1987-1993, 100th-102nd Congress)--American Indian/Alaska Native Area/Remainder--Reservation/Trust Lands/Remainder--Alaska Native Regional Corporation/Remainder--Urbanized Area/Remainder--Urban/Rural) |
| 2000 | blck_grp_090 | Block Group [2000 & 2010 partition] (by State--County--County Subdivision--Place/Remainder--Census Tract--Urban/Rural) |



Extending to 2011-2019 ACS Units

In most cases, allocating data to 2010 counties, census tracts, or block groups will produce results that are directly comparable to tables from the 2011-2019 releases of American Community Survey (ACS) Summary Files. The Census Bureau generally makes no changes to its definitions of census tracts or block groups between censuses, and changes to county boundaries are also uncommon in recent decades.

There are a few exceptions:

  • Prior to the 2011 ACS data release: Tract numbering corrections and minor geographic definition changes in Madison County (FIPS: 053), Oneida County (065), and Richmond County (085), New York (36)
  • Prior to the 2012 ACS data release: Tract numbering corrections in Pima County (FIPS: 019), Arizona (04), and the restoration of a deleted tract in Los Angeles County (037), California (06)
  • In 2013, 2015, and 2019: A few changes to the boundaries and/or codes of counties or county equivalents in Alaska, Virginia, and South Dakota. See the ACS Geography Boundaries by Year to determine which ACS data releases were affected by these changes.

We plan to extend our crosswalks in the future to incorporate these discrepancies and facilitate allocations to and from census units for any year since 2010 for anywhere in the U.S.


Technical Details

Basics

The crosswalks are provided through the links below as comma-separated values (CSV) files within Zip archives.

Each Zip file includes a "README" text file that describes the content of the crosswalk file in detail.


Methodology

The interpolation weights in NHGIS block crosswalks are primarily based on "target-density weighting" (TDW) (Schroeder 2007). TDW assumes that characteristics within each source zone have a distribution proportional to the densities of another characteristic among target zones. For example, if a 2020 block intersects two 2010 blocks, one of which was 10 times as dense as the other in 2010, then TDW assumes that the same 10:1 ratio holds within the 2020 block in 2020.
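The basic TDW idea can be illustrated with a small calculation. This is a simplified sketch of the general technique, not the NHGIS implementation: it assumes each intersection's share of the source zone is proportional to its area multiplied by the target zone's density, and the function name `tdw_weights` and all inputs are hypothetical.

```python
def tdw_weights(pieces):
    """Compute simple target-density weights for one source zone.

    pieces: list of (target_id, intersection_area, target_density),
    one entry per target zone that the source zone intersects.
    Returns a dict mapping target_id -> share of the source zone.
    """
    # Expected relative amount in each intersection: area x density
    expected = {
        target_id: area * density for target_id, area, density in pieces
    }
    total = sum(expected.values())
    if total == 0:
        raise ValueError("no positive density among the target zones")
    # Normalize so the shares for this source zone sum to 1
    return {target_id: value / total for target_id, value in expected.items()}


# A 2020 block split into two equal-area pieces by two 2010 blocks,
# one of which was 10 times as dense as the other in 2010
weights = tdw_weights([
    ("dense_2010_block", 1.0, 1000.0),
    ("sparse_2010_block", 1.0, 100.0),
])
```

With equal intersection areas, the two weights stand in the same 10:1 ratio as the target densities (10/11 and 1/11).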

The interpolation weights in the crosswalks from 1990 and 2000 blocks to 2010 blocks involve some more advanced modeling as documented in these pages:

To generate crosswalks from block groups and block group parts (BGPs), we used block-level data and the block-to-block crosswalks to calculate the proportion of each source zone's characteristics that were located in each target zone. The Python-based source code for the BGP crosswalks is available in this GitHub repository, which includes an FAQ with additional documentation.


Geographic Coverage

Each crosswalk file is complete for the entire U.S. or for an entire state. State-level files include all target zones for the state as well as any source zones that intersect any of those target zones, including some source zones from neighboring states in cases where the Census Bureau adjusted state boundary lines between censuses.

Users interested in producing a complete set of data for a single state may need to obtain source data for both the state of interest and its neighboring states to ensure they have the required input data to allocate to all target zones in the state.
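One way to check whether neighboring-state data are needed is to list the source zones that appear in the crosswalk but not in the source data on hand. The helper below is a hypothetical sketch with made-up IDs and a simplified (source ID, target ID, weight) record layout.

```python
def missing_source_zones(crosswalk, source_counts):
    """Return source zone IDs the crosswalk uses but the data lack.

    crosswalk: iterable of (source_id, target_id, weight) records
    source_counts: dict mapping source zone ID -> count
    """
    # Only zones with a positive weight contribute to any target zone
    needed = {
        source_id for source_id, _target_id, weight in crosswalk if weight > 0
    }
    return sorted(needed - set(source_counts))


# Toy state-level crosswalk: one target zone draws on a source zone
# from a neighboring state
state_crosswalk = [
    ("in_state_1", "target_1", 1.0),
    ("neighbor_1", "target_1", 0.2),
]
# Source data downloaded for the state of interest only
state_source_counts = {"in_state_1": 50}

missing = missing_source_zones(state_crosswalk, state_source_counts)
```

A non-empty result suggests that source data for additional states should be obtained before interpolating.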

The block crosswalks and the nationwide BGP crosswalks can include millions of records and may therefore be too large to open in some applications.
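When a file is too large to open in a given application, one option is to stream it record by record and keep only the running target totals in memory. The sketch below uses Python's standard `csv` module on a small in-memory stand-in for a crosswalk file; the column names `source_id`, `target_id`, and `weight` are placeholders, so check the README for the actual field names.

```python
import csv
import io
from collections import defaultdict


def stream_interpolate(crosswalk_file, source_counts):
    """Interpolate counts while reading the crosswalk one row at a time."""
    totals = defaultdict(float)
    for row in csv.DictReader(crosswalk_file):
        count = source_counts.get(row["source_id"])
        if count is None:
            # No source data for this zone; skip the record
            continue
        # csv yields strings, so the weight must be converted to a number;
        # identifiers stay as strings, which preserves leading zeros
        totals[row["target_id"]] += count * float(row["weight"])
    return dict(totals)


# In-memory stand-in for a (much larger) crosswalk CSV file
toy_csv = io.StringIO(
    "source_id,target_id,weight\n"
    "A,X,0.5\n"
    "A,Y,0.5\n"
    "B,Y,1.0\n"
)

streamed_totals = stream_interpolate(toy_csv, {"A": 10, "B": 4})
```

For a real file, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`.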


Geographic Identifiers

There are two types of block crosswalk files, differing in the codes they use to identify blocks:

  • GISJOIN identifiers match the identifiers used in NHGIS data tables and boundary files. A block GISJOIN concatenates these codes:
    | Component         | Notes |
    |-------------------|-------|
    | "G" prefix        | This prevents applications from automatically reading the identifier as a number and, in effect, dropping important leading zeros |
    | State NHGIS code  | 3 digits (FIPS + "0"). NHGIS adds a zero to state FIPS codes to differentiate current states from historical territories. |
    | County NHGIS code | 4 digits (FIPS + "0"). NHGIS adds a zero to county FIPS codes to differentiate current counties from historical counties. |
    | Census tract code | 6 digits for 2000 and 2010 tracts. 1990 tract codes use either 4 or 6 digits. |
    | Census block code | 4 digits for 2000 and 2010 blocks. 1990 block codes use either 3 or 4 digits. |
  • GEOID identifiers correspond to the codes used in most current Census sources (American FactFinder, TIGER/Line, Relationship Files, etc.). A block GEOID concatenates these codes:
    | Component         | Notes |
    |-------------------|-------|
    | State FIPS code   | 2 digits |
    | County FIPS code  | 3 digits |
    | Census tract code | 6 digits. 1990 tract codes that were originally 4 digits (as in NHGIS files) are extended to 6 with an appended "00" (as in Census Relationship Files). |
    | Census block code | 4 digits for 2000 and 2010 blocks. 1990 block codes use either 3 or 4 digits. |

The block group crosswalks include both GISJOIN and GEOID identifiers, and the BGP crosswalks include GISJOIN identifiers for the source BGPs as well as both GISJOIN and GEOID identifiers for each target zone. For details on those identifiers, see the README files that accompany the crosswalks.
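The two block identifier layouts described above differ only in the "G" prefix and the extra zeros after the state and county codes, so one can be derived from the other. The following sketch covers only the fixed-width 2000 and 2010 case (15-digit block GEOIDs); it does not handle the variable-length 1990 codes, and the function name and the sample identifier are illustrative.

```python
def block_geoid_to_gisjoin(geoid):
    """Convert a 15-digit 2000/2010 block GEOID to a block GISJOIN."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("expected a 15-digit block GEOID string")
    state = geoid[:2]     # state FIPS code, 2 digits
    county = geoid[2:5]   # county FIPS code, 3 digits
    tract = geoid[5:11]   # census tract code, 6 digits
    block = geoid[11:]    # census block code, 4 digits
    # GISJOIN: "G" prefix, then state and county codes each followed by "0"
    return "G" + state + "0" + county + "0" + tract + block


# Illustrative identifier; keep GEOIDs as strings to preserve leading zeros
gisjoin = block_geoid_to_gisjoin("371190001001000")
```

For this input, the result is "G37011900001001000".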


Interpolation Weights

The block-to-block crosswalks include a single interpolation weight (labeled "WEIGHT") representing the expected proportion of the source block's population and housing units located in each target block. These are the weights used for geographically standardized time series tables.

The crosswalks from block groups and BGPs include multiple interpolation weights:

Interpolation Weights in Crosswalks from Block Groups & Block Group Parts
| Field Name | Description |
|------------|-------------|
| wt_pop     | Expected proportion of source zone's population located in target zone |
| wt_adult   | Expected proportion of source zone's adult population (18 years and over) located in target zone. Currently provided only in crosswalks from block groups. |
| wt_fam     | Expected proportion of source zone's families located in target zone. Currently not provided in crosswalks from 2020 block groups. |
| wt_hh      | Expected proportion of source zone's households located in target zone. Note: household counts are equal to counts of occupied housing units and of householders. |
| wt_hu      | Expected proportion of source zone's housing units located in target zone |


Blank/Missing Identifiers

In crosswalks with 1990 source zones, the zone identifier fields may contain blank values. Blank values are given in cases where a source or target zone lies entirely offshore in coastal or Great Lakes waters. In these cases we are unable to use NHGIS boundary files, which exclude offshore areas, to determine relationships between 1990 and later census zones. (For censuses after 1990, we use block relationship files from the Census Bureau to identify intersections in offshore areas.) None of the blocks or BGPs with a "blank" target zone had any reported population or housing units. We include the records with blank values to ensure that all source and target zones are represented in the file.
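When processing these files, it can help to set aside the records with blank identifiers before joining, so that blanks are not mistaken for a real zone. A minimal sketch, assuming a simplified (source ID, target ID, weight) record layout and made-up IDs:

```python
def split_blank_records(crosswalk):
    """Separate records with a blank source or target identifier.

    crosswalk: iterable of (source_id, target_id, weight) records
    Returns (complete_records, blank_records).
    """
    complete, blank = [], []
    for record in crosswalk:
        source_id, target_id, _weight = record
        # Treat empty or whitespace-only identifiers as blank
        if source_id.strip() and target_id.strip():
            complete.append(record)
        else:
            blank.append(record)
    return complete, blank


toy_crosswalk = [
    ("A", "X", 1.0),
    ("B", "", 0.0),  # offshore source zone with no identified target zone
    ("", "Y", 0.0),  # offshore target zone with no identified source zone
]

complete_records, blank_records = split_blank_records(toy_crosswalk)
```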


Download

Crosswalks from Blocks

Blocks → Blocks: GISJOIN Identifiers

Blocks → Blocks: GEOID Identifiers


Crosswalks from Block Group Parts

Block Group Parts → Block Groups

Block Group Parts → Census Tracts

Block Group Parts → Counties


Crosswalks from Block Groups

Block Groups → Block Groups

Block Groups → Census Tracts

Block Groups → Counties


Citation and Use

Use of NHGIS crosswalks is subject to the same conditions as for all NHGIS data. See Citation and Use of NHGIS Data.



References
