What are aggregate data?
What is the source of NHGIS data?
What are time series tables?
What are breakdowns?
Does NHGIS have data for Puerto Rico, the US Virgin Islands, or other territories?
Do I need special statistical software like SPSS, SAS, or Stata to use NHGIS data?
How do I obtain data?
Why can’t I select data for a single county, place, or metro area?
What are compound geographic levels?
Why can’t I get data for census tracts from before 1910?
Can I just get a copy of every data table you have?
How long does a data extract take?
What is the American Community Survey (ACS)?
Does NHGIS have all of the ACS datasets?
Should I use 1-year, 3-year, or 5-year ACS data?
Which ACS year range is best for comparisons with decennial censuses?
Why do I receive so many data files in my downloaded zipped file?
Why don’t the data include FIPS codes?
What do the field names (e.g., AG3001, B5P001) in an NHGIS data file mean?
What is wrong with the 1960 data?
Something is wrong with the data I downloaded. What should I do?
I don’t have Esri ArcGIS! Can I still map NHGIS data?
I downloaded some GIS files, but where are the census data?
OK, I downloaded the data tables and the GIS files. Now how do I join them together?
Why am I missing columns of data when I bring the .csv into a GIS?
What projected coordinate system are the GIS files in?
What are the units of the SHAPE_AREA and SHAPE_LEN fields?
Why isn’t the GISJOIN the same from one decade to the next for a census tract that doesn’t seem to change?
Why don't newer shapefiles align with older shapefiles?
Why do most historical shapefiles have both a "2000 TIGER/Line +" and a "2008 TIGER/Line +" version?
General Data Questions
What are aggregate data?
Aggregate data summarize sets of individuals through counts, sums, means or other aggregate statistics. The NHGIS specifically provides spatially aggregated census data: data summarizing individuals within particular areas, such as states or counties, where the "individuals" might be persons, housing units, farms, libraries, newspapers or any other features that were at some point counted in a U.S. census. No individual-level records, with or without personally identifiable information, are included anywhere in the NHGIS. So if you are trying to find your great-great-great grandfather who homesteaded in the late 1800s, you will not have any luck with NHGIS data.
What is the source of NHGIS data?
The original source of most of the aggregate data, with a few exceptions, is the U.S. Census. For censuses since 1970, the NHGIS obtained digital data directly from the Census Bureau. For earlier censuses, NHGIS data are generally derived from secondary sources: separate projects which have, over the course of several decades, undertaken the arduous conversion of pre-computer-age historical data from print media to a digital, machine-readable form. Most of this work was completed by Michael Haines, Donald Bogue, Andrew Beveridge and their respective research teams. Documentation for most sources is available on the Tabular Data Documentation page.
The GIS boundary files are based on the TIGER/Line Files that the U.S. Census Bureau creates, which NHGIS staff edited to produce all pre-1990 boundaries. The primary guide for historical census tract boundaries was original census maps. For early county boundaries, it was the book Map Guide to the U.S. Federal Censuses 1790-1920 by William Thorndale and William Dollarhide (Genealogical Publishing Co., Baltimore, MD, c. 1987).
What are time series tables?
NHGIS time series tables link together comparable statistics from multiple censuses in one table with standardized labels and codes for all years. Users can view available time series tables and select and download them in various layouts through the Data Finder. Complete information on the derivation, layout, and coverage of time series tables is available here.
What are breakdowns?
In NHGIS terminology, "breakdowns" are additional categories of data that are available for all tables in some datasets. In Census documentation, these categories are typically referred to as "iterations" or "components."
For example, 2010 Summary File 2 has two breakdown variables. The first distinguishes Geographic Subareas (or, in Census terms, Geographic Components) such as "In metropolitan statistical area" and "Not in metropolitan statistical area," etc. The second distinguishes Race/Ethnicity categories (or, in Census terms, Characteristic Iterations) such as "White alone, not Hispanic or Latino" and "Native Hawaiian alone or in any combination," etc. Users who request NHGIS tables from 2010 Summary File 2 may select any combination of these breakdown categories to obtain data for the corresponding subareas and race/ethnicity groups.
For datasets with two breakdown variables (like Summary File 2), data are available only for certain, adequately represented breakdown combinations. For example, 2010 Summary File 2 supplies no data for Cubans (a Race/Ethnicity breakdown) in Hawaiian Home Lands (a Geographic Subarea breakdown). In addition, data for a given breakdown combination may be available for only a limited set of geographic levels. For example, 2010 Summary File 2 supplies data for the Race/Ethnicity breakdown of "Papua New Guinean alone" at the Nation level only.
Users who request a breakdown combination at a geographic level for which no data exist will receive no data for that combination at that level.
Does NHGIS have data for Puerto Rico, the US Virgin Islands, or other territories?
We do not have any data for these areas for the years 1790-2000. The 2010 Census and American Community Survey (ACS) do include Puerto Rico data.
Can I use NHGIS for genealogy?
Not really. NHGIS data are aggregate data, which means we have no information on specific people. There are no names or addresses in anything we have.
Do I need special statistical software like SPSS, SAS, or Stata to use NHGIS data?
Absolutely not! Any spreadsheet software like Microsoft Excel will work fine, and even that technically is not required. Statistical software may make it easier to analyze large amounts of data, and NHGIS does provide an output file format specifically for the three major statistical software packages, complete with a command file for each.
Data Access Questions
How do I obtain data?
All NHGIS data are delivered through our data extraction system. Users select the data tables and GIS boundary files they would like, and the system creates a custom-made extract containing this information. Data are generated on our server, and the system sends an email message to the user when the extract is completed. The user must download the extract and analyze it on their local machine. Users need to register for a free account before they can submit an extract request. Users do not, however, need to register or log in before building an extract request. Detailed information on using the data extraction system is provided on the User’s Guide page.
Why can’t I select data for a single county, place, or metro area?
Rather than choosing a particular geographic location, users select only the geographic level of interest. For example, if you are interested in data for Hennepin County, Minnesota, you would simply select "County" as your geographic level. Then, after downloading the county-level data for the entire United States, you can easily extract the specific locations of interest.
NHGIS provides data in this format primarily in the interest of standardizing the selection interface for all years and geographic levels. Giving users the ability to select data from multiple years and data from previously unavailable "compound" geographic levels (see the next FAQ) makes it difficult to support selection by specific geographic extent. Data from different years or from compound geographic levels do not consistently nest within a static set of selectable areas. In addition, over the years NHGIS has discovered that users typically like more data, rather than less! Finally, removing the need to select a geographic extent simply shortens the time it takes users to create an extract.
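Pulling one area out of a nationwide file is a simple row filter. The following is a minimal sketch in Python, not an official NHGIS tool: the function name `filter_rows`, the column names `STATE`, `COUNTY`, and `TOTAL_POP`, and all sample values are hypothetical, so check the codebook in your extract for the actual column names.

```python
import csv
import io

def filter_rows(csv_file, criteria):
    """Return the rows where every column named in `criteria` has the given value."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if all(row.get(column) == value for column, value in criteria.items())
    ]

# Tiny stand-in for a nationwide county-level file (all values are made up).
sample = io.StringIO(
    "GISJOIN,STATE,COUNTY,TOTAL_POP\n"
    "ID001,State A,County One,100\n"
    "ID002,State A,County Two,200\n"
    "ID003,State B,County One,300\n"
)

# Matching on both state and county matters because county names repeat across states.
selected = filter_rows(sample, {"STATE": "State A", "COUNTY": "County Two"})
```

With a real extract, you would pass an opened .csv file in place of the in-memory sample.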
What are compound geographic levels?
All U.S. census geographic levels have "nesting relationships" with other levels. For example, counties nest within states, meaning that every county lies entirely in a single state, and no county extends into multiple states. Both census tracts and county subdivisions nest within counties, but census tracts and county subdivisions have no nesting relationship with each other; a census tract may include parts of multiple county subdivisions, and vice versa.
Typically, census summary data describe whole units, e.g., population counts for whole census tracts or whole county subdivisions. But since 1970, the Census Bureau has also tabulated data for intersections between non-nesting units, e.g., counts for the parts of census tracts lying in different county subdivisions.
NHGIS identifies levels consisting of intersections between non-nesting units as compound levels. Within the NHGIS Data Finder, in places where you can select geographic levels, the default view hides compound levels. To see the compound levels, simply click on the "Show compound geographic levels" option.
Why can’t I get data for census tracts from before 1910?
They did not exist back then! The 1910 census was the first for which the Census Bureau tabulated tract data.
In addition, NHGIS does not provide shapefiles for all census summary levels. For example, although NHGIS does provide data tables at the "Congressional District (1983-1985, 98th Congress)" level, NHGIS does not provide a corresponding shapefile.
Can I just get a copy of every data table you have?
You might not realize how big the NHGIS really is. (It's into the terabytes!) Users frequently ask for this, thinking it will save them time to have every file we have; trust us, it won’t. Sifting through over eighteen thousand tables and hundreds of thousands of fields is not an easy task. Data-hungry researchers are free to email NHGIS to request large quantities of data delivered outside of the website data extraction system. Honoring said requests, however, is at the discretion of NHGIS staff.
How long does a data extract take?
The time needed to complete an extract request depends on the size of the data extract requested and the load on our server. Extracts can take from a few seconds to an hour or more. The system sends an email when the extract is completed, so there is no need to stay active on the NHGIS site while the extract is being prepared. You can also refresh your Extracts History page in your web browser any time after submitting a request to see if there's been a change in the extract status.
Questions about the American Community Survey
What is the American Community Survey (ACS)?
The U.S. Census Bureau initiated the ACS program as a way to provide more current data than the decennial census—at annual rather than decennial intervals. The ACS also replaced the "long form" survey that had been used in decennial censuses from 1940 through 2000 to collect detailed information from a sample of the U.S. population. The 2010 decennial census instead used a single standard "short form" for the entire population, covering only a few basic characteristics like age, sex, race, Hispanic/Latino origin, and housing tenure. In effect, the ACS is now the only source for recent census information on subjects ranging from education and income to housing value and plumbing facilities.
The Census conducts ACS surveys throughout the year for a small sample of housing units and group quarters. After each calendar year, the Census groups the responses across time to generate and publish new estimates of the average characteristics for 1-year and 5-year periods. (The Census initially also produced estimates for 3-year periods, but the 3-year summaries were discontinued after the 2013 ACS data release.)
Expanding the time frame increases the total sample size, which reduces sampling error and enables tabulations for smaller areas with smaller populations. The caveat is that ACS data can be more difficult to describe and interpret than decennial census data because ACS data do not represent a single point in time. For example, the ACS's 2010 5-year data do not provide estimates of 2010 characteristics; rather, they estimate average characteristics over the entire 2006-2010 5-year period.
Does NHGIS have all of the American Community Survey datasets?
NHGIS includes most ACS Summary File data since the 2010 releases. This includes the 1-Year and 5-Year Summary Files from each of these releases, as well as 3-Year Summary Files for 2013 and earlier. NHGIS also includes all tables from the 2009 5-Year Summary Files--the Census's earliest 5-year dataset.
We aim to add data from each new release within six weeks of the Census Bureau's release date for 1-Year Summary Files and within four weeks for 5-Year Summary Files. Look for updates on data additions on the NHGIS News page.
Should I use 1-year, 3-year, or 5-year ACS data?
Users interested in studying annual trends, particular moments in time, or the most current conditions will prefer the temporal precision of the 1-year estimates. But these data are also the least reliable ACS estimates because they have the smallest sample period, and to ensure an acceptable sample size for all estimates, the Census tabulates 1-year data only for areas with populations of 65,000 or more.
Because the 5-year ACS data have a longer sample period, they are considerably more reliable than 1-year data, and the Census provides 5-year estimates for all areas regardless of population. The 5-year data are the only ACS series to include data for census tracts and block groups.
The 3-year data represent a compromise option, providing estimates for areas with populations of 20,000 or more, with finer temporal precision than 5-year data and greater reliability than 1-year data. However, the 3-Year Summary Files were discontinued by the Census after the 2013 ACS release.
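The population thresholds described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name `acs_series_for_population` is hypothetical, and it deliberately ignores other factors (such as the first release year of each series and which geographic levels each series covers).

```python
def acs_series_for_population(population, release_year):
    """List the ACS series tabulated for an area of the given population.

    Thresholds follow the text above: 1-year data for areas of 65,000 or more,
    3-year data for areas of 20,000 or more (through the 2013 release only),
    and 5-year data for all areas.
    """
    series = ["5-year"]
    if population >= 20_000 and release_year <= 2013:
        series.append("3-year")
    if population >= 65_000:
        series.append("1-year")
    return series

small_area = acs_series_for_population(10_000, 2012)
large_area = acs_series_for_population(100_000, 2012)
```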
Which ACS year range is best for comparisons with decennial censuses?
For comparisons with older census data, NHGIS highly recommends using an ACS dataset with a year range centered on 2010 (the 2010 1-year data, the 2009-2011 3-year data, or the 2008-2012 5-year data) in order to maintain a consistent decade interval between the "center points" of the study's measurement periods.
For example, to compare changes from 1990 to 2000 with changes after 2000, it might at first seem like using data from the 2010 ACS release would be most suitable. But if the study requires the complete geographic coverage provided by 5-year ACS data, then the most suitable ACS source is not the 2010 5-year data (spanning 2006-2010 and centered on 2008) but rather the 2012 5-year data (spanning 2008-2012 and centered on 2010). Using the latter will ensure that both studied change periods (1990 to 2000 and 2000 to 2008-2012) are about one decade long and therefore more directly comparable.
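The "center point" arithmetic in this example can be checked with a short Python sketch. The function name `acs_period` is hypothetical; it assumes only that an ACS release is labeled by the last year of its sample period.

```python
def acs_period(end_year, span_years):
    """Return (first_year, last_year, center_year) for an ACS release."""
    if span_years not in (1, 3, 5):
        raise ValueError("ACS periods are 1, 3, or 5 years long")
    first_year = end_year - span_years + 1
    # Spans have an odd number of years, so the midpoint is a whole year.
    center_year = (first_year + end_year) // 2
    return first_year, end_year, center_year

release_2010 = acs_period(2010, 5)  # 2010 5-year data
release_2012 = acs_period(2012, 5)  # 2012 5-year data
```

Here `release_2010` is `(2006, 2010, 2008)` and `release_2012` is `(2008, 2012, 2010)`, which matches the reasoning above: only the 2012 5-year data are centered on 2010.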
Questions about Downloaded Data
Why do I receive so many data files in my downloaded zipped file?
A tabular data file is returned for each dataset, geographic level, and year for which tables are selected. If you downloaded county-level data for only one year and from one dataset, you will receive only one data file. If your data extract request was more expansive, however, you can expect more data files. For example, if you select data from 1940 and 1950 at the state and county geographic levels, you will receive four data files (1940 state, 1940 county, 1950 state, 1950 county).
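The file count is just the number of combinations selected. As a rough sketch in Python (the function name `expected_data_files` is hypothetical, and it assumes for simplicity that tables come from a single dataset per year):

```python
from itertools import product

def expected_data_files(years, levels):
    """List one label per tabular file: every selected year paired with every level."""
    return [f"{year} {level}" for year, level in product(years, levels)]

files = expected_data_files([1940, 1950], ["state", "county"])
```

For the example above, `files` holds four entries: 1940 state, 1940 county, 1950 state, and 1950 county.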
Why don’t the data include FIPS codes?
FIPS codes did not exist prior to the 1970 Census, so they could not be provided for older census data tables. The NHGIS instead provides custom codes, which generally correspond to FIPS codes, with an added zero, for all areas that persisted beyond 1970. For recent censuses, NHGIS shapefiles include concatenated NHGIS codes but not the FIPS codes in order to minimize file size. FIPS codes do exist within all NHGIS data tables for 1970 and later, but the codes are not concatenated as you would typically find in data downloaded from American FactFinder.
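To illustrate how a concatenated NHGIS code might relate to FIPS codes, here is a hedged Python sketch for counties. It assumes a layout of "G", then the 2-digit state FIPS code plus a zero, then the 3-digit county FIPS code plus a zero. That layout is an assumption drawn from the "added zero" description above, not a specification, and it will not hold for areas that never had FIPS codes, so verify results against the GISJOIN values in your own files. The function name `county_gisjoin` is hypothetical.

```python
def county_gisjoin(state_fips, county_fips):
    """Build a county join code from FIPS codes under the assumed layout.

    Assumption (not confirmed by this FAQ): "G" + state FIPS + "0"
    + county FIPS + "0".
    """
    state = str(state_fips).zfill(2)    # pad to 2 digits, e.g. "6" -> "06"
    county = str(county_fips).zfill(3)  # pad to 3 digits, e.g. "53" -> "053"
    return f"G{state}0{county}0"

# State FIPS 27 (Minnesota) and county FIPS 053 (Hennepin County).
code = county_gisjoin("27", "053")
```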
What do the field names (e.g., AG3001, B5P001) in an NHGIS data file mean?
The key to these unique, NHGIS-created column names is found in the Codebook file(s) that were automatically included in your data extract. Look for the .txt file(s) in the zipped file you downloaded, and they will shed some light on your data.
What is wrong with the 1960 data?
Sadly, such a common question! The 1960 Census employed a uniquely restrictive data suppression strategy that leaves many data tables with lots of missing data. In addition, the NHGIS can offer only a small set of data tables for states and counties due to a scarcity of digital 1960 Census data, and the 1960 tract data come from two separate sources, resulting in some inconsistent redundancy. Review the Tabular Data Documentation page for more detailed information.
The Minnesota Population Center is currently working in collaboration with the Census Bureau to recover lost data from the 1960 Census of Population and Housing. Once the project is completed, new 1960 summary files will be available along with additional microdata products.
Something is wrong with the data I downloaded. What should I do?
There are lots of reasons why something may seem wrong with your data. Typically (but not always), the issue stems from using the incorrect data format for your software or from not being aware of the many, many odd quirks that exist in older census data. Review the User’s Guide and Data Documentation pages for additional information. Of course, always feel free to contact NHGIS User Support at firstname.lastname@example.org with any questions you may have!
I don’t have Esri ArcGIS! Can I still map NHGIS data?
There are several options for those who do not have access to Esri ArcGIS. A number of open source GIS programs are available, including GRASS and QGIS. In addition, the Social Explorer website allows online mapping of select NHGIS data. Another option for many students is a free student ArcGIS license, which can often be obtained through college GIS or Geography departments for class purposes, or by purchasing select books from the Esri Press. Student licenses vary in length, but are typically 6 or 12 months.
I downloaded some GIS files, but where are the census data?
NHGIS shapefiles, on their own, do not contain any census data, even if you downloaded the data tables at the same time. To attach tabular NHGIS data files to the GIS files requires a join operation in your GIS software. Additional information on using a GIS with NHGIS data can be found on the User’s Guide page.
OK, I downloaded the data tables and the GIS files. Now how do I join them together?
NHGIS has made it as easy as possible to join data tables to their respective GIS files. In both files, you will find a field called GISJOIN that will serve as the join field. Additional information on using a GIS with NHGIS data can be found on the User’s Guide page.
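In GIS software this is a standard table join. For readers who want to see the logic, here is a minimal Python sketch of a join on GISJOIN using plain dictionaries. The function name `join_on_gisjoin`, the field names `AREA_NAME` and `TOTAL_POP`, and the sample values are all hypothetical.

```python
def join_on_gisjoin(features, data_rows):
    """Attach each data row to the feature with the same GISJOIN value.

    Every feature is kept (a left join). Features with no matching data row
    are returned unchanged, and their GISJOIN values are reported separately.
    """
    data_by_id = {row["GISJOIN"]: row for row in data_rows}
    joined, unmatched = [], []
    for feature in features:
        match = data_by_id.get(feature["GISJOIN"])
        if match is None:
            unmatched.append(feature["GISJOIN"])
            joined.append(dict(feature))
        else:
            joined.append({**feature, **match})
    return joined, unmatched

# Stand-ins for a shapefile attribute table and a data table (values are made up).
features = [
    {"GISJOIN": "ID001", "AREA_NAME": "Area One"},
    {"GISJOIN": "ID002", "AREA_NAME": "Area Two"},
]
data_rows = [{"GISJOIN": "ID001", "TOTAL_POP": "100"}]

joined, unmatched = join_on_gisjoin(features, data_rows)
```

Checking the unmatched list is a useful habit: a long list usually means the data table and the GIS file are from different years or geographic levels.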
Why am I missing columns of data when I bring the .csv into a GIS?
Unfortunately, older versions of Esri ArcGIS import at most 255 columns from a .csv file and truncate any additional fields. This issue is known to Esri and Microsoft and is outside the control of NHGIS. It is resolved in ArcGIS versions 10.1 and newer. Users of older versions of ArcGIS may try the Quick Import tool, part of the Data Interoperability extension to ArcGIS, as a workaround, although the tool is not available to all ArcGIS users. Other solutions exist, and additional information on the issue, along with instructions on using the Quick Import tool, can be found on the User’s Guide page. In addition, be advised that older versions of Microsoft Excel (pre-2007) have the same 255-column limitation.
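One generic way around a column limit is to split a wide table into several narrower tables that each keep the join field. The Python sketch below illustrates that idea only; it is not the Quick Import workaround mentioned above, and the function name `split_wide_table` and the sample column names are hypothetical.

```python
def split_wide_table(rows, fieldnames, key="GISJOIN", max_columns=255):
    """Split a wide table into parts of at most `max_columns` columns.

    Every part repeats the `key` column so each one can still be joined
    to a GIS file on its own.
    """
    other_columns = [name for name in fieldnames if name != key]
    step = max_columns - 1  # reserve one column in each part for the key
    parts = []
    for start in range(0, len(other_columns), step):
        columns = [key] + other_columns[start:start + step]
        parts.append([{column: row[column] for column in columns} for row in rows])
    return parts

# A made-up table with 1 key column and 300 data columns.
fieldnames = ["GISJOIN"] + [f"V{i}" for i in range(300)]
rows = [{name: "0" for name in fieldnames}]
rows[0]["GISJOIN"] = "ID001"

parts = split_wide_table(rows, fieldnames)
```

Each part could then be written out with Python's csv.DictWriter and joined separately.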
What projected coordinate system are the GIS files in?
NHGIS shapefiles use Esri's USA Contiguous Albers Equal Area Conic projection. Prior to May 2013, NHGIS provided separate files for Alaska in the Alaska Albers Equal Area Conic projection, for Hawaii in the Hawaii Albers Equal Area Conic, and for Puerto Rico in an Albers Equal Area Conic projection with central meridian, standard parallels and latitude of origin set to match the Puerto Rico State Plane Coordinate System's.
What are the units of the SHAPE_AREA and SHAPE_LEN fields?
SHAPE_AREA is an area measurement in square meters. SHAPE_LEN is a perimeter measurement in meters.
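Converting these values to more familiar units is a matter of simple division. A small Python sketch follows; the function names are hypothetical, and the mile conversion uses the international mile of exactly 1,609.344 meters.

```python
SQ_METERS_PER_SQ_KM = 1_000_000
METERS_PER_MILE = 1609.344  # international mile
SQ_METERS_PER_SQ_MILE = METERS_PER_MILE ** 2

def area_sq_km(shape_area):
    """Convert a SHAPE_AREA value (square meters) to square kilometers."""
    return shape_area / SQ_METERS_PER_SQ_KM

def area_sq_miles(shape_area):
    """Convert a SHAPE_AREA value (square meters) to square miles."""
    return shape_area / SQ_METERS_PER_SQ_MILE

def perimeter_km(shape_len):
    """Convert a SHAPE_LEN value (meters) to kilometers."""
    return shape_len / 1000

example_area = area_sq_km(2_500_000)  # 2.5 square kilometers
```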
Why isn’t the GISJOIN the same from one decade to the next for a census tract that doesn’t seem to change?
The numbering of census tracts (and other lower levels of geography like census blocks) is determined systematically for entire counties by a combination of local and Census Bureau authorities. In the early years of tract definitions, the numbering systems occasionally underwent dramatic revisions, but even in recent censuses, when numbering systems have been more stable, tract boundary changes in one part of a county can force a renumbering of tracts throughout the county to keep the numbering logical.
Why don't newer shapefiles align with older shapefiles?
Recent releases of the Census TIGER/Line Shapefiles have included numerous improvements in spatial accuracy (from a greater use of GPS and localized data sources). This has resulted in numerous misalignments between the new data and older, less accurate NHGIS boundaries. To address this issue, NHGIS staff conducted a systematic realignment of our historical shapefile boundaries to accord with newer, 2008-based TIGER/Line data.
The Census Bureau, however, subsequently made additional improvements to TIGER/Line features, so the 2008-based files NHGIS created are not consistently comparable with 2010 and later TIGER/Line files. In general, most 2008-based boundaries align better than 2000-based boundaries with more recent TIGER/Line files, but the 2008-based boundaries also include occasional gross inaccuracies.
Why do most historical shapefiles have both a "2000 TIGER/Line +" and a "2008 TIGER/Line +" version?
The realignment project, as mentioned in the previous question, resulted in new boundaries being created for all tract and county shapefiles 1790-2000 based on updated 2008 TIGER/Line boundaries. Because the original and new shapefiles each have instances where they are better suited, both are available for download. Users may wish to download both the 2000- and 2008-based versions of historical boundaries in order to determine which is more suitable for their study area and analysis.
The original shapefiles based, in part, on the 2000 TIGER/Line boundaries are typically a better choice when mapping only pre-2010 data, as all boundaries should align better. If you want to display the 1970 census tracts inside the SMSA boundaries, for example, the 2008-based census tracts may not align with the 2000-based SMSA file.
The new, 2008-based shapefiles align more closely (but not completely) with 2010 and later shapefiles and are typically a better choice when creating an overlay that includes both historical and 2010 or later shapefiles. The 2008-based census tracts for 1990 and 2000, for example, will typically align more closely with 2010-based census tracts.
NHGIS Project Questions
Should I cite NHGIS in my paper? How?
Reports and publications using NHGIS data must cite NHGIS appropriately. See the Citation and Use page for NHGIS data use conditions and the current recommended citation.
How do I get a job working with NHGIS?
We’re flattered you're interested in us! Please visit the employment pages at IPUMS for up-to-date job postings for both students and professionals. Also watch for emails from IPUMS, which go to all registered users, or follow IPUMS on Twitter (@ipums) to get timely job announcements.
I have a suggestion for how you can improve your website! Do you want to hear it?
Sure! We are always open to new suggestions. This does not mean we can always act on the suggestion, but many changes to NHGIS have come about through users’ suggestions. You may direct yours to email@example.com.