- Selecting a Crosswalk: How Low Can You Go?
- How to Use the Crosswalks
- Technical Details
- Citation and Use
NHGIS geographic crosswalks describe how U.S. census geographic units from one census year correspond to units from another year. The crosswalks are designed to support high-quality tabulations of one year's census data for another year's geographic units (e.g., 1990 foreign-born population for 2010 census tracts).
NHGIS crosswalks are similar to the U.S. Census Bureau's Relationship Files, but NHGIS crosswalks include interpolation weights, derived from advanced models. Each interpolation weight indicates the proportion of a source zone's characteristics that should be allocated to a specific target zone. We use these same weights to produce NHGIS geographically standardized time series tables.
|Source and target levels||1990 to 2010||2000 to 2010||2010 to 2020||2020 to 2010|
|Blocks to Blocks||X||X||X||X|
|Block Group Parts to Block Groups||X||X|| || |
|Block Group Parts to Census Tracts||X||X|| || |
|Block Group Parts to Counties||X||X|| || |
Selecting a Crosswalk: How Low Can You Go?
To reallocate summary data from one census's geographic units to another's, it's important to start from the lowest possible level.
Let's say you'd like "harmonized tract data": multiple years of census tract data for spatially consistent tracts. So you obtained some 1990, 2000, and 2010 tract data, and now you'd like to use a tract crosswalk to reallocate the 1990 and 2000 tract data to 2010 tracts. But there are lots of 1990 and 2000 data available for units smaller than tracts, and those smaller units fit much better within 2010 tracts, so why not start with data from the smaller units? Doing so is consistently more accurate, and sometimes much more so.
For example, there are thousands of cases where a single 1990 census tract corresponds to multiple 2010 tracts. This is especially common in fast-growing areas on the outskirts of cities, as in Figure 1. To tabulate 1990 data for the 2010 tracts in these cases, starting from tract counts requires disaggregating counts from the larger 1990 tracts down to their component parts, which can result in substantial errors. Disaggregation models commonly assume that population characteristics are uniformly distributed within source zones, resulting in inaccurately uniform distributions among the target zones. This is evident in the tract-based estimates of the 1990 Black population distribution in Figure 1, which indicate that this area was moderately well integrated.
In this case, though, there is no need to start with tract data. The Census Bureau publishes many summary tables for lower levels, such as blocks, block groups, and "block group parts" (intersections between block groups, places, county subdivisions, and some other levels). Allocating data from these levels requires less disaggregation, and in many cases it requires only aggregation, resulting in exact counts for the target zones. In Figure 1, the allocations from the lower levels reveal that this area's population was in fact strongly segregated in 1990, with high rates of Black population in the eastern 2010 tracts and low rates in the western 2010 tracts.
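To make the difference concrete, here is a minimal Python sketch with invented numbers (they are not taken from Figure 1). It assumes a single 1990 tract made up of two blocks, each lying entirely within a different 2010 tract, and a hypothetical tract-level crosswalk that splits the tract evenly between the two targets.

```python
# Hypothetical 1990 tract made of two blocks; each block falls entirely
# within a different 2010 tract ("east" or "west"). All numbers are invented.
blocks_1990 = [
    {"target": "east", "total": 500, "black": 450},
    {"target": "west", "total": 500, "black": 50},
]

# Tract-based estimate: the same weight is applied to every tract-level count,
# so both target tracts inherit the source tract's overall composition.
tract_total = sum(b["total"] for b in blocks_1990)
tract_black = sum(b["black"] for b in blocks_1990)
tract_weights = {"east": 0.5, "west": 0.5}  # assumed tract-level weights
tract_based_share = {
    target: (tract_black * w) / (tract_total * w)
    for target, w in tract_weights.items()
}

# Block-based estimate: pure aggregation, no disaggregation needed.
block_based_share = {}
for target in ("east", "west"):
    members = [b for b in blocks_1990 if b["target"] == target]
    block_based_share[target] = (
        sum(b["black"] for b in members) / sum(b["total"] for b in members)
    )

print(tract_based_share)  # {'east': 0.5, 'west': 0.5}
print(block_based_share)  # {'east': 0.9, 'west': 0.1}
```

The tract-based shares come out identical for both targets, while the block-based shares recover the underlying contrast.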
The options available for source levels vary by data source. From 1970 to 2000, the decennial census used two questionnaires: a short-form questionnaire that included a small set of questions asked of all households, and a long-form questionnaire sent only to a sample of households.
- Blocks are the lowest level for which the Census Bureau tabulates full-count, short-form summary data, covering subjects such as age, sex, race, household size, and housing tenure.
- Block group parts are the lowest level for which the Census Bureau tabulated sample-based, long-form summary data in 1990 and 2000, covering subjects such as income, employment, education, nativity, migration, and commuting.
For optimal accuracy, we recommend using crosswalks from blocks for short-form data and crosswalks from block group parts for long-form data. There are some census summary tables that are not available for these levels (e.g., detailed place of birth), and there are some other data sources that provide data only for tracts or higher levels. To help with these settings, we plan to add crosswalks for some other source levels in the future.
How to Use the Crosswalks
Crosswalks from Blocks
In a block-to-block crosswalk, each record identifies a possible intersection between a single source block and a single target block, along with an interpolation weight (ranging between 0 and 1) identifying approximately what portion of the source zone's population and housing units were located in the intersection. These weights can be used to estimate how any characteristics available for source blocks (e.g., females age 75 and over, single-member households, owner-occupied housing units, etc.) are distributed among target blocks.
For example, to interpolate count data from 1990 blocks to 2010 blocks:
1. Obtain data of interest for 1990 blocks
   - E.g., using the NHGIS Data Finder, find and download tables for the Block geographic level for the year 1990 (dataset 1990_STF1)
2. Join the 1990-block-to-2010-block crosswalk to the 1990 block data of interest
3. Multiply the 1990 block counts by the crosswalk's interpolation weights, producing estimated counts for all 1990-2010 block intersections, or "atoms"
4. Sum these atom counts for each 2010 block
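The join, multiply, and sum steps above can be sketched in a few lines of Python. This is an illustrative sketch, not NHGIS code: the function name, the tuple layout, and the block IDs are all hypothetical, and a real workflow would read the IDs and weights from the crosswalk CSV.

```python
from collections import defaultdict


def interpolate_counts(source_counts, crosswalk_rows):
    """Allocate source-zone counts to target zones using interpolation weights.

    source_counts: dict mapping source zone ID -> count
    crosswalk_rows: iterable of (source_id, target_id, weight) tuples
    """
    target_counts = defaultdict(float)
    for source_id, target_id, weight in crosswalk_rows:
        # Join: look up the source count for this crosswalk record.
        count = source_counts.get(source_id)
        if count is None:
            continue  # no source data for this record
        # Multiply to estimate the count in the "atom", then sum by target.
        target_counts[target_id] += count * weight
    return dict(target_counts)


# Toy data with hypothetical block IDs.
counts_1990 = {"A": 100, "B": 40}
crosswalk = [
    ("A", "X", 0.75),
    ("A", "Y", 0.25),
    ("B", "Y", 1.0),
]
counts_2010 = interpolate_counts(counts_1990, crosswalk)
print(counts_2010)  # {'X': 75.0, 'Y': 65.0}
```

Because the weights for each source block sum to 1 in this toy example, the total count (140) is preserved across the reallocation.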
Since 1990, all census geographic units correspond exactly to a set of blocks from the same census year. Therefore, the NHGIS block-to-block crosswalks can be used to allocate data from 1990, 2000, or 2020 to any 2010 census units, or from 2010 to any 2020 census units, not just to blocks.
For example, to compute 2000 counts for 2010 school districts:
1. Obtain 2000 block counts of interest and use the 2000-to-2010 block crosswalk to generate 2000 data for 2010 blocks (following steps similar to those above)
2. Obtain a 2010 block data table from NHGIS and join it to the 2000 counts from step 1
   - You can find codes for most 2010 census units, including school districts, in 2010 block-level table files
3. Sum the 2000 counts for all 2010 blocks in each 2010 school district
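The final aggregation step might look like the following sketch. It assumes you already have 2000 estimates keyed by 2010 block ID and a lookup from 2010 block ID to school district code taken from a 2010 block-level table. The function name and all IDs are hypothetical.

```python
from collections import defaultdict


def aggregate_to_units(block_estimates, block_to_unit):
    """Sum block-level estimates for each larger unit.

    block_estimates: dict mapping 2010 block ID -> estimated count
    block_to_unit: dict mapping 2010 block ID -> code of the larger unit
    """
    totals = defaultdict(float)
    for block_id, value in block_estimates.items():
        unit = block_to_unit.get(block_id)
        if unit is None:
            continue  # block has no code for this type of unit
        totals[unit] += value
    return dict(totals)


# Hypothetical 2000 estimates for 2010 blocks, and a hypothetical lookup
# from 2010 blocks to 2010 school districts.
est_2000_for_2010_blocks = {"X": 75.0, "Y": 65.0, "Z": 10.0}
block_to_district = {"X": "SD1", "Y": "SD1", "Z": "SD2"}

district_totals = aggregate_to_units(est_2000_for_2010_blocks, block_to_district)
print(district_totals)  # {'SD1': 140.0, 'SD2': 10.0}
```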
Crosswalks from Block Group Parts
The steps for using crosswalks from block group parts (BGPs) mirror those for crosswalks from blocks. One key difference is that finding data for BGPs through the NHGIS Data Finder involves an extra step.
In NHGIS terminology, BGPs are a type of compound level. To find them in a selection window for Geographic Levels, you will need to click on the "Show Compound Geographic Levels" option in the upper right. You will then find BGP levels under the "BLOCK GROUP" heading.
Summary files for 1990 and 2000 use different versions of BGPs, each combining different sets of geographic units. The crosswalks use these levels:
|Year||NHGIS level code||Level description|
|1990||blck_grp_598||Block Group [1990 partition] (by State--County--County Subdivision--Place/Remainder--Census Tract--Congressional District (1987-1993, 100th-102nd Congress)--American Indian/Alaska Native Area/Remainder--Reservation/Trust Lands/Remainder--Alaska Native Regional Corporation/Remainder--Urbanized Area/Remainder--Urban/Rural)|
|2000||blck_grp_090||Block Group [2000 & 2010 partition] (by State--County--County Subdivision--Place/Remainder--Census Tract--Urban/Rural)|
Crosswalks from BGPs differ from block crosswalks in their inclusion of multiple interpolation weights for different census counts, as laid out in the Technical Details section. We generate the BGP crosswalks by determining the proportions of block-level characteristics located in both the source BGP and the target zone. The separate interpolation weights are based on different block-level characteristics, enabling crosswalk users to choose the weights that are most suitable for their data of interest.
For example, a user interpolating counts of employed females may choose to use weights based on total populations, assuming that the distribution of employed females corresponds closely to the general population's distribution. A user interpolating counts of high-income households may instead choose weights based on total households.
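One way to apply different weights to different variables is sketched below. The weight names `wt_pop` and `wt_hh` are the ones listed under Technical Details. Everything else (the function, the `source` and `target` keys, the variable names, and the data) is hypothetical and only illustrates the idea.

```python
from collections import defaultdict

# Assumed pairing of each variable with the weight judged most suitable.
WEIGHT_FOR_VARIABLE = {
    "employed_females": "wt_pop",       # person counts -> population weights
    "high_income_households": "wt_hh",  # household counts -> household weights
}


def interpolate_bgp(bgp_data, crosswalk_rows, weight_for_variable):
    """Allocate BGP counts to target zones, one weight column per variable."""
    out = defaultdict(lambda: defaultdict(float))
    for row in crosswalk_rows:
        source = bgp_data.get(row["source"])
        if source is None:
            continue  # no source data for this BGP
        for variable, weight_col in weight_for_variable.items():
            out[row["target"]][variable] += source[variable] * row[weight_col]
    return {target: dict(values) for target, values in out.items()}


# Toy data: one BGP split across two target tracts.
bgp_data = {"bgp1": {"employed_females": 200, "high_income_households": 40}}
crosswalk_rows = [
    {"source": "bgp1", "target": "T1", "wt_pop": 0.5, "wt_hh": 0.25},
    {"source": "bgp1", "target": "T2", "wt_pop": 0.5, "wt_hh": 0.75},
]
result = interpolate_bgp(bgp_data, crosswalk_rows, WEIGHT_FOR_VARIABLE)
print(result["T1"])  # {'employed_females': 100.0, 'high_income_households': 10.0}
```

Keeping the variable-to-weight mapping in one dictionary makes the choice of weights explicit and easy to revisit.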
Extending to 2011-2019 Census Units
In most cases, allocating data to 2010 counties, census tracts, or block groups will produce results that are directly comparable to tables from the 2011-2019 releases of American Community Survey (ACS) Summary Files. The Census Bureau generally makes no changes to its definitions of census tracts or block groups between censuses, and changes to county boundaries have also been uncommon in recent decades.
There are a few exceptions:
- Prior to the 2011 ACS data release: Tract numbering corrections and minor geographic definition changes in Madison County (FIPS: 053), Oneida County (065), and Richmond County (085), New York (36)
- Prior to the 2012 ACS data release: Tract numbering corrections in Pima County (FIPS: 019), Arizona (04), and the restoration of a deleted tract in Los Angeles County (037), California (06)
- In 2013, 2015, and 2019: A few changes to the boundaries and/or codes of counties or county equivalents in Alaska, Virginia, and South Dakota. See the ACS Geography Boundaries by Year to determine which ACS data releases were affected by these changes.
We plan to extend our crosswalks in the future to incorporate these discrepancies and facilitate allocations to the census units for any year since 2010 for anywhere in the U.S.
Technical Details
The crosswalks are provided through the links below as comma-separated values (CSV) files within Zip archives.
Each Zip file includes a "README" text file that describes the content of the crosswalk file in detail.
The interpolation weights in NHGIS crosswalks are primarily based on "target-density weighting" (TDW) (Schroeder 2007). TDW assumes that characteristics within each source zone have a distribution proportional to the densities of another characteristic among target zones. For example, if a 2020 block intersects two 2010 blocks, one of which was 10 times as dense as the other in 2010, then TDW assumes that the same 10:1 ratio holds within the 2020 block in 2020.
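The basic TDW calculation can be sketched as follows. This is a simplified illustration of the general technique, not the NHGIS production model: the function name and inputs are hypothetical, and the example assumes the two intersections have equal areas so that the 10:1 density ratio carries over directly to the weights.

```python
def tdw_weights(intersections):
    """Compute basic target-density weights for one source zone.

    intersections: list of (target_id, intersection_area, target_density)
    tuples, one per target zone that the source zone intersects.
    """
    # Expected relative count in each intersection: area times target density.
    expected = {
        target: area * density for target, area, density in intersections
    }
    total = sum(expected.values())
    if total == 0:
        raise ValueError("all target densities are zero; weights are undefined")
    # Normalize so the weights for the source zone sum to 1.
    return {target: value / total for target, value in expected.items()}


# A source block intersecting two target blocks with equal intersection
# areas, where one target block is 10 times as dense as the other.
weights = tdw_weights([("t1", 1.0, 10.0), ("t2", 1.0, 1.0)])
print(weights)  # t1 receives 10/11 of the source counts, t2 receives 1/11
```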
The interpolation weights in the crosswalks from 1990 and 2000 blocks to 2010 blocks involve some more advanced modeling as documented in these pages:
To generate crosswalks from block group parts (BGPs), we used block-level data and the block-to-block crosswalks to calculate the proportion of each BGP's characteristics that were located in each target zone. The Python-based source code for these calculations is available in this GitHub repository, which includes an FAQ with additional documentation.
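The general idea of deriving a BGP weight from block-level inputs can be illustrated with the sketch below. It is not the code from the repository: the function name, the input layout, and the toy numbers are assumptions made for illustration only.

```python
from collections import defaultdict


def bgp_weights(block_rows):
    """Derive BGP-to-target weights from block-level counts and weights.

    block_rows: iterable of (source_bgp, target_zone, block_count, block_weight)
    tuples, one per block-to-block crosswalk record, where block_count is the
    source block's count (e.g., population) and block_weight is the share of
    that block allocated to a target block lying in target_zone.
    """
    atom_counts = defaultdict(float)  # count shared by each BGP-target pair
    bgp_totals = defaultdict(float)   # total allocated count for each BGP
    for bgp, target, count, weight in block_rows:
        atom_counts[(bgp, target)] += count * weight
        bgp_totals[bgp] += count * weight
    # Skip BGPs with a zero total, for which a proportion is undefined.
    return {
        (bgp, target): value / bgp_totals[bgp]
        for (bgp, target), value in atom_counts.items()
        if bgp_totals[bgp] > 0
    }


# Toy BGP "g1": one block of 60 people entirely in tract T1, and one block
# of 40 people split evenly between tracts T1 and T2.
rows = [
    ("g1", "T1", 60, 1.0),
    ("g1", "T1", 40, 0.5),
    ("g1", "T2", 40, 0.5),
]
wt_pop = bgp_weights(rows)
print(wt_pop)  # {('g1', 'T1'): 0.8, ('g1', 'T2'): 0.2}
```

Running the same calculation with housing unit, household, or family counts in place of population would yield the other kinds of weights.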
Each crosswalk file is complete for the entire U.S. or for an entire state. State-level files include all target zones for the state as well as any source zones that intersect any of those target zones, including some source zones from neighboring states in cases where the Census Bureau adjusted state boundary lines between censuses.
Users interested in producing a complete set of data for a single state may need to obtain source data for both the state of interest and its neighboring states to ensure they have the required input data to allocate to all target zones in the state.
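A quick check for missing source data is sketched below. It relies on the GISJOIN structure in which the two characters after the "G" prefix are the state FIPS code. The function name and the example identifiers are hypothetical.

```python
def missing_source_zones(crosswalk_source_ids, source_data_ids):
    """Return crosswalk source zone IDs that have no matching source data."""
    return sorted(set(crosswalk_source_ids) - set(source_data_ids))


# Hypothetical block GISJOINs: the state crosswalk lists one source block
# from a neighboring state (state FIPS 55) that the user has not downloaded.
crosswalk_sources = ["G27000100001011000", "G55000300002001001"]
downloaded_sources = ["G27000100001011000"]

missing = missing_source_zones(crosswalk_sources, downloaded_sources)
# Characters 2-3 of a GISJOIN hold the state FIPS code.
states_needed = sorted({gisjoin[1:3] for gisjoin in missing})
print(missing)        # ['G55000300002001001']
print(states_needed)  # ['55']
```

If the list of needed states is not empty, downloading source data for those states should fill the gaps.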
The block crosswalks and the nationwide BGP crosswalks can include many millions of records and may therefore be too large to open in some applications (e.g., Microsoft Excel).
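One way to work with a very large crosswalk is to stream the CSV directly from the Zip archive and accumulate results one record at a time, rather than loading the whole file. The sketch below uses only the Python standard library. The member file name and the `SRC` and `TGT` column names are placeholders, so check the README in each Zip file for the actual names.

```python
import csv
import io
import zipfile
from collections import defaultdict


def stream_interpolate(zip_file, member_name, source_counts,
                       source_col, target_col, weight_col="WEIGHT"):
    """Interpolate counts while reading a zipped crosswalk CSV row by row."""
    target_counts = defaultdict(float)
    with zipfile.ZipFile(zip_file) as archive:
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # csv reads every field as a string, which keeps leading zeros.
            for row in csv.DictReader(text):
                count = source_counts.get(row[source_col])
                if count is not None:
                    target_counts[row[target_col]] += (
                        count * float(row[weight_col])
                    )
    return dict(target_counts)


# Demonstration with a tiny in-memory Zip archive (placeholder names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "toy_crosswalk.csv", "SRC,TGT,WEIGHT\nA,X,0.75\nA,Y,0.25\nB,Y,1\n"
    )
buffer.seek(0)

streamed = stream_interpolate(
    buffer, "toy_crosswalk.csv", {"A": 100, "B": 40}, "SRC", "TGT"
)
print(streamed)  # {'X': 75.0, 'Y': 65.0}
```

Only the source counts and the running totals are held in memory, so the approach scales to files with many millions of records.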
There are two types of block crosswalk files, differing in the codes they use to identify blocks:
- GISJOIN identifiers match the identifiers used in NHGIS data tables and boundary files. A block GISJOIN concatenates these codes:
|Component||Notes|
|"G" prefix||This prevents applications from automatically reading the identifier as a number and, in effect, dropping important leading zeros|
|State NHGIS code||3 digits (FIPS + "0"). NHGIS adds a zero to state FIPS codes to differentiate current states from historical territories.|
|County NHGIS code||4 digits (FIPS + "0"). NHGIS adds a zero to county FIPS codes to differentiate current counties from historical counties.|
|Census tract code||6 digits for 2000 and 2010 tracts. 1990 tract codes use either 4 or 6 digits.|
|Census block code||4 digits for 2000 and 2010 blocks. 1990 block codes use either 3 or 4 digits.|
- GEOID identifiers correspond to the codes used in most current Census sources (American FactFinder, TIGER/Line, Relationship Files, etc.). A block GEOID concatenates these codes:
|Component||Notes|
|State FIPS code||2 digits|
|County FIPS code||3 digits|
|Census tract code||6 digits. 1990 tract codes that were originally 4 digits (as in NHGIS files) are extended to 6 with an appended "00" (as in Census Relationship Files).|
|Census block code||4 digits for 2000 and 2010 blocks. 1990 block codes use either 3 or 4 digits.|
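For 2000 and 2010 blocks, where every component has a fixed width, the two identifier types can be converted with simple string slicing. The sketch below follows the component layouts described above. The function names are hypothetical, and the code deliberately does not handle 1990 identifiers, whose tract and block codes vary in length.

```python
def block_gisjoin_to_geoid(gisjoin):
    """Convert a 2000 or 2010 block GISJOIN (18 characters) to a GEOID."""
    if len(gisjoin) != 18 or not gisjoin.startswith("G"):
        raise ValueError("expected an 18-character GISJOIN starting with 'G'")
    if gisjoin[3] != "0" or gisjoin[7] != "0":
        raise ValueError("expected '0' after the state and county FIPS codes")
    state = gisjoin[1:3]    # 2-digit state FIPS code
    county = gisjoin[4:7]   # 3-digit county FIPS code
    tract = gisjoin[8:14]   # 6-digit tract code
    block = gisjoin[14:18]  # 4-digit block code
    return state + county + tract + block


def block_geoid_to_gisjoin(geoid):
    """Convert a 2000 or 2010 block GEOID (15 characters) to a GISJOIN."""
    if len(geoid) != 15:
        raise ValueError("expected a 15-character block GEOID")
    return "G" + geoid[0:2] + "0" + geoid[2:5] + "0" + geoid[5:15]


# Hypothetical block: state 27, county 053, tract 000100, block 1000.
print(block_geoid_to_gisjoin("270530001001000"))     # G27005300001001000
print(block_gisjoin_to_geoid("G27005300001001000"))  # 270530001001000
```

Identifiers should be kept as strings throughout, since converting a GEOID to a number drops any leading zeros.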
The BGP crosswalks include GISJOIN identifiers for the source BGPs as well as both GISJOIN and GEOID identifiers for each target zone. For details on those identifiers, see the README file that accompanies the BGP crosswalks.
The block-to-block crosswalks include a single interpolation weight (labeled "WEIGHT") representing the expected proportion of the source block's population and housing units located in each target block. These are the weights used for geographically standardized time series tables.
The crosswalks from block group parts include four distinct weights:
|wt_pop||Expected proportion of source zone's population located in target zone|
|wt_fam||Expected proportion of source zone's families located in target zone|
|wt_hh||Expected proportion of source zone's households located in target zone. Note: household counts are equal to counts of occupied housing units and of householders.|
|wt_hu||Expected proportion of source zone's housing units located in target zone|
In crosswalks with 1990 source zones, the zone identifier fields may contain blank values. Blank values are given in cases where a source or target zone lies entirely offshore in coastal or Great Lakes waters. In these cases we are unable to use NHGIS boundary files, which exclude offshore areas, to determine relationships between 1990 and later census zones. (For censuses after 1990, we use block relationship files from the Census Bureau to identify intersections in offshore areas.) None of the blocks or BGPs with a "blank" target zone had any reported population or housing units. We include the records with blank values to ensure that all source and target zones are represented in the file.
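Before joining a crosswalk with 1990 source zones to other data, it may help to separate out the records with blank identifiers. The sketch below shows one way to do that. The function name and the column names `GJOIN1990` and `GJOIN2010` are assumptions, so use the names given in the README.

```python
def split_blank_records(rows, source_col, target_col):
    """Separate crosswalk records with a blank source or target identifier."""
    complete, blank = [], []
    for row in rows:
        if row[source_col].strip() and row[target_col].strip():
            complete.append(row)
        else:
            blank.append(row)
    return complete, blank


# Toy records with hypothetical column names and identifiers.
rows = [
    {"GJOIN1990": "S1", "GJOIN2010": "T1", "WEIGHT": "1"},
    {"GJOIN1990": "S2", "GJOIN2010": "", "WEIGHT": "0"},
    {"GJOIN1990": "", "GJOIN2010": "T2", "WEIGHT": "0"},
]
complete, blank = split_blank_records(rows, "GJOIN1990", "GJOIN2010")
print(len(complete), len(blank))  # 1 2
```

Reviewing the blank records separately keeps the offshore zones visible without letting empty identifiers create spurious matches in a join.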
Blocks → Blocks: GISJOIN Identifiers
2020 → 2010
2010 → 2020
2000 → 2010
1990 → 2010
Blocks → Blocks: GEOID Identifiers
2020 → 2010
2010 → 2020
2000 → 2010
1990 → 2010
Block Group Parts → Block Groups
2000 → 2010
1990 → 2010
Block Group Parts → Census Tracts
2000 → 2010
1990 → 2010
Block Group Parts → Counties
2000 → 2010
1990 → 2010
Citation and Use
Use of NHGIS crosswalks is subject to the same conditions as for all NHGIS data. See Citation and Use of NHGIS Data.
- Schroeder, J. P. (2007). "Target-density weighting interpolation and uncertainty evaluation for temporal analysis of census data." Geographical Analysis 39(3), 311–335. http://dx.doi.org/10.1111/j.1538-4632.2007.00706.x