What are aggregate data?
What is the source of NHGIS data?
What are time series tables?
What are breakdowns?
Does NHGIS have data for areas outside the USA?
Does NHGIS have data for Puerto Rico, the US Virgin Islands, or other territories?
Can I use NHGIS for genealogy?
Do I need special statistical software like SPSS, SAS, or Stata to use NHGIS data?
How do I obtain data?
Why can’t I select data for a single county, place, or metro area?
Why are there so many different geographic level filter options?
Why can’t I get data for census tracts from before 1910?
Can I just get a copy of every data table you have?
How long does a data extract take?
What is the American Community Survey (ACS)?
Does NHGIS have all of the ACS datasets?
Should I use 1-year, 3-year, or 5-year ACS data?
Which ACS year range is best for comparisons with decennial censuses?
Why do I receive so many data files in my downloaded zipped file?
Why don’t the data include FIPS codes?
What do the field names (e.g., AG3001, B5P001) in an NHGIS data file mean?
What is wrong with the 1960 data?
Something is wrong with the data I downloaded. What should I do?
I don’t have Esri ArcGIS! Can I still map NHGIS data?
I downloaded some GIS boundary files, but where are the census data?
OK, I downloaded the data tables and the GIS boundary files. Now how do I join them together?
Why am I missing columns of data when I bring the .csv into a GIS?
What projected coordinate system are the GIS boundary data in?
What are the units of the SHAPE_AREA and SHAPE_LEN fields?
Why isn’t the GISJOIN the same from one decade to the next for a census tract that doesn’t seem to change?
Why don't newer shapefiles align with older shapefiles?
Why do most historical shapefiles have both a "2000 TIGER/Line +" and a "2008 TIGER/Line +" version?
General Data Questions
What are aggregate data?
Aggregate data summarize sets of individuals through counts, sums, means or other aggregate statistics. The NHGIS specifically provides spatially aggregated census data: data summarizing individuals within particular areas, such as states or counties, where the "individuals" might be persons, housing units, farms, libraries, newspapers or any other features that were at some point counted in a U.S. census. No individual-level records, with or without personally identifiable information, are included anywhere in the NHGIS. So if you are trying to find your great-great-great grandfather who homesteaded in the late 1800s, you will not have any luck with NHGIS data.
What is the source of NHGIS data?
The original source of most of the aggregate data, with a few exceptions, is the U.S. Census. For censuses since 1970, the NHGIS obtained digital data directly from the Census Bureau. For earlier censuses, NHGIS data are generally derived from secondary sources: separate projects which have, over the course of several decades, undertaken the arduous conversion of pre-computer-age historical data from print media to a digital, machine-readable form. Most of this work was completed by Michael Haines, Donald Bogue, Andrew Beveridge and their respective research teams. Documentation for most sources is available on the Tabular Data Documentation page.
The GIS boundary files are based on the TIGER/Line Files that the U.S. Census Bureau creates, which NHGIS staff edited to produce all pre-1990 boundaries. Original census maps were the primary guide for historical census tract boundaries. For early county boundaries, the primary guide was the book Map Guide to the U.S. Federal Censuses 1790-1920, by William Thorndale and William Dollarhide (Genealogical Publishing Co., Baltimore, MD, c. 1987).
What are time series tables?
NHGIS time series tables link together comparable statistics from multiple censuses in one table with standardized labels and codes for all years. Users can view available time series tables and select and download them in various layouts through the Data Finder. Complete information on the derivation, layout, and coverage of time series tables is available here.
What are breakdowns?
In NHGIS terminology, "breakdowns" are additional categories of data that are available for all tables in some datasets. In Census documentation, these categories are typically referred to as "iterations" or "components."
For example, 2010 Summary File 2 has two breakdown variables. The first distinguishes Geographic Subareas (or, in Census terms, Geographic Components) such as "In metropolitan statistical area" and "Not in metropolitan statistical area," etc. The second distinguishes Race/Ethnicity categories (or, in Census terms, Characteristic Iterations) such as "White alone, not Hispanic or Latino" and "Native Hawaiian alone or in any combination," etc.
Does NHGIS have data for areas outside the USA?
NHGIS does not have international data. Two other projects at the Minnesota Population Center, IPUMS-International and NAPP, provide census microdata (not aggregate data) for other countries.
Does NHGIS have data for Puerto Rico, the US Virgin Islands, or other territories?
We do not have any data for these areas for the years 1790-2000. The 2010 Census and American Community Survey (ACS) do include Puerto Rico data.
Can I use NHGIS for genealogy?
Sorry! NHGIS data are aggregate data, which means we have no information on specific people. There are no names or addresses on anything we have. Genealogists, unfortunately, will have to look elsewhere for this type of data. Resources such as www.ancestry.com or www.familysearch.org may be useful websites to visit.
Do I need special statistical software like SPSS, SAS, or Stata to use NHGIS data?
Absolutely not! Any spreadsheet software like Microsoft Excel will work fine, and even that technically is not required. Statistical software may make it easier to analyze large amounts of data, and NHGIS does provide an output file format specifically for the three major statistical software packages, complete with a command file for each.
Data Access Questions
How do I obtain data?
All NHGIS data are delivered through our data extraction system. Users select the data tables and GIS boundary files they would like, and the system creates a custom-made extract containing this information. Data are generated on our server, and the system sends an email message to the user when the extract is completed. The user must then download the extract and analyze it on their local machine. Users need to register for a free account before they can submit an extract request. Users do not, however, need to register or log in before building an extract request. Detailed information on using the data extraction system is provided on the User’s Guide page.
Why can’t I select data for a single county, place, or metro area?
Rather than choosing a specific geographic location, users select only the geographic level of interest. For example, if you are interested in data for Hennepin County, Minnesota, you would simply select "County" as your geographic level. Then, after downloading the county-level data for the entire United States, you can easily extract the specific locations of interest.
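As a rough illustration of that last step, the sketch below filters a county-level file down to one county using only the Python standard library. The column names (GISJOIN, STATE, COUNTY, TOTAL_POP), the GISJOIN values, and the population figures are assumptions made up for the example; check the codebook in your own extract for the actual field names.

```python
import csv
import io

# Stand-in for a downloaded county-level CSV. Column names and values
# are illustrative assumptions, not copied from a real NHGIS extract.
sample_csv = """GISJOIN,STATE,COUNTY,TOTAL_POP
G2700530,Minnesota,Hennepin County,1000
G2701230,Minnesota,Ramsey County,500
G5500250,Wisconsin,Dane County,400
"""

def select_rows(csv_text, state, county):
    """Return the rows whose STATE and COUNTY fields match the arguments."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if row["STATE"] == state and row["COUNTY"] == county]

hennepin = select_rows(sample_csv, "Minnesota", "Hennepin County")
print(hennepin)
```

With a real extract, you would read the downloaded file with open() instead of the in-memory string. The same filtering can also be done with a spreadsheet filter or a GIS attribute query.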
NHGIS provides data in this format primarily in the interest of standardizing the selection interface for all years and geographic levels. Giving users the ability to select data from multiple years and data from previously unavailable "compound" geographic levels (see the next FAQ) makes it difficult to support selection by specific geographic extent. Data from different years or from compound geographic levels do not consistently nest within a static set of selectable areas. In addition, over the years NHGIS has discovered that users typically like more data, rather than less! Finally, removing the need to select a geographic extent simply shortens the time it takes users to create an extract.
Why are there so many different geographic level filter options?
NHGIS provides access to compound geographic levels. When you click to "show all geographic levels" on the Filter page, you see the standard geographic levels like Census Tract or County, but for each of these, there are numerous compound levels also available with labels describing exactly how the compound level is subdivided.
Standard geographic levels nest consistently within a hierarchy of larger units. For example, census tracts nest within counties, which in turn nest within states. This is a standard geographic level because census tracts cannot be split by county boundaries nor can counties be split by state boundaries. If you download data at this geographic level, a single record for each census tract is returned.
The non-standard or "compound" geographic levels consist of intersections between non-nesting standard levels. For example, you can now download census tracts in a State>Place>County hierarchy. Because census tracts do not nest within places, this geographic level will provide separate records for the portions of census tracts contained within different places, and a census tract's code will appear in multiple records if multiple places fall within its boundary.
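To make the "multiple records per tract" point concrete, here is a minimal sketch that sums count data for tract parts back to whole-tract totals. The record layout, tract codes, place names, and populations are hypothetical. It also assumes that the parts in the file together cover each whole tract, which may not hold for every compound level, and that the values are simple counts (medians or rates cannot be summed this way).

```python
from collections import defaultdict

# Hypothetical records at a compound level where tracts are split by places:
# (tract code, place name, population). Tract "000100" falls in two places.
tract_parts = [
    ("000100", "Place A", 1200),
    ("000100", "Place B", 300),
    ("000200", "Place A", 2500),
]

def sum_by_tract(parts):
    """Add up the counts of all parts that share a tract code."""
    totals = defaultdict(int)
    for tract, _place, population in parts:
        totals[tract] += population
    return dict(totals)

tract_totals = sum_by_tract(tract_parts)
print(tract_totals)
```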
Why can’t I get data for census tracts from before 1910?
They did not exist back then! The 1910 census was the first for which the Census Bureau tabulated tract data.
In addition, NHGIS does not provide shapefiles for all census summary levels. For example, although NHGIS does provide data tables at the "Congressional District (1983-1985, 98th Congress)" level, NHGIS does not provide a corresponding shapefile.
Can I just get a copy of every data table you have?
You might not realize how big the NHGIS really is. (It's into the terabytes!) Users frequently ask for this, thinking it will save them time to have every file we have; trust us, it won’t. Sifting through over eighteen thousand tables and hundreds of thousands of fields is not an easy task. Data-hungry researchers are free to email NHGIS to request large quantities of data delivered outside of the website data extraction system. Honoring said requests, however, is at the discretion of NHGIS staff.
How long does a data extract take?
The time needed to complete an extract request depends on the size of the data extract requested and the load on our server. Extracts can take from a few seconds to an hour or more. The system sends an email when the extract is completed, so there is no need to stay active on the NHGIS site while the extract is being prepared. You can also refresh your Extracts History page in your web browser any time after submitting a request to see if there's been a change in the extract status.
Questions about the American Community Survey
What is the American Community Survey (ACS)?
The U.S. Census Bureau initiated the ACS program as a way to provide more current data than the decennial census—at annual rather than decennial intervals. The ACS also replaced the "long form" survey that had been used in decennial censuses from 1940 through 2000 to collect detailed information from a sample of the U.S. population. The 2010 decennial census instead used a single standard "short form" for the entire population, covering only a few basic characteristics like age, sex, race, Hispanic/Latino origin, and housing tenure. In effect, the ACS is now the only source for recent census information on subjects ranging from education and income to housing value and plumbing facilities.
The Census published its first ACS Summary File for the 2005 survey year. The Census now conducts ACS surveys throughout each month of every year for a small sample of housing units and group quarters. Following each calendar year, the Census releases tabulations of ACS responses for 1-year, 3-year, and 5-year periods. Expanding the time frame increases the total sample size, which reduces sampling error and enables tabulation for smaller areas with smaller populations. The caveat is that ACS data can be more difficult to describe and interpret because they do not represent a single point in time.
Does NHGIS have all of the American Community Survey datasets?
Currently, NHGIS includes 1-Year, 3-Year, and 5-Year Summary Files from the 2010, 2011 and 2012 ACS releases. We aim to add data from each new ACS release within six weeks of the Census Bureau release date. We will also continually be adding older ACS datasets, beginning with 2009 Summary Files. Look for updates on data additions on the NHGIS News page.
Should I use 1-year, 3-year, or 5-year ACS data?
Users interested in studying annual trends, particular moments in time, or the most current conditions will prefer the temporal precision of the 1-year estimates. But these data are also the least reliable ACS estimates because they have the smallest sample period, and to ensure an acceptable sample size for all estimates, the Census tabulates 1-year data only for areas with populations of 65,000 or more.
Because the 5-year ACS data have a longer sample period, they are considerably more reliable than 1-year data, and the Census provides 5-year estimates for all areas regardless of population. The 5-year data are also the only ACS series to include data for census tracts and block groups.
The 3-year data represent a compromise option, providing estimates for areas with populations of 20,000 or more, with finer temporal precision than 5-year data and greater reliability than 1-year data.
Which ACS year range is best for comparisons with decennial censuses?
For comparisons with older census data, NHGIS highly recommends using an ACS dataset with a year range centered on 2010 (the 2010 1-year data, the 2009-2011 3-year data, or the 2008-2012 5-year data) in order to maintain a consistent decade interval between the "center points" of the study's measurement periods.
For example, to compare changes from 1990 to 2000 with changes after 2000, it might at first seem like using data from the 2010 ACS release would be most suitable. But if the study requires the complete geographic coverage provided by 5-year ACS data, then the most suitable ACS source is not the 2010 5-year data (spanning 2006-2010 and centered on 2008) but rather the 2012 5-year data (spanning 2008-2012 and centered on 2010). Using the latter will ensure that both studied change periods (1990 to 2000 and 2000 to 2008-2012) are about one decade long and therefore more directly comparable.
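The "center point" arithmetic in this example can be checked with a small helper. This is only a sketch: the function name is made up, and it assumes that each ACS period ends in its release year and spans an odd number of years (1, 3, or 5), as in the cases described above.

```python
def acs_period(release_year, span_years):
    """Return (first_year, last_year, center_year) for an ACS release.

    Assumes the period ends in the release year and covers an odd
    number of years, so the center falls on a whole year.
    """
    first_year = release_year - span_years + 1
    center_year = release_year - span_years // 2
    return first_year, release_year, center_year

print(acs_period(2010, 5))  # (2006, 2010, 2008)
print(acs_period(2012, 5))  # (2008, 2012, 2010)
print(acs_period(2011, 3))  # (2009, 2011, 2010)
```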
Questions about Downloaded Data
Why do I receive so many data files in my downloaded zipped file?
A tabular data file is returned for each dataset, geographic level, and year for which tables are selected. If you only downloaded county level data for one year and from one dataset, you will only receive one data file. If your data extract request was more expansive, however, you can expect more data files. For example, selecting data from 1940 and 1950 at the state and county geographic levels, you would receive four data files (1940 state, 1940 county, 1950 state, 1950 county).
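The four-file example is simply every combination of the selected years and geographic levels. A small sketch of that enumeration follows; the labels are illustrative and are not the actual file names used in an extract.

```python
from itertools import product

years = ["1940", "1950"]
levels = ["state", "county"]

# One tabular file is expected for each year/level combination.
expected_files = [f"{year} {level}" for year, level in product(years, levels)]
print(expected_files)
# ['1940 state', '1940 county', '1950 state', '1950 county']
```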
Why don’t the data include FIPS codes?
FIPS codes did not exist prior to the 1970 Census, so they could not be provided for older census data tables. The NHGIS instead provides custom codes, which generally correspond to FIPS codes, with an added zero, for all areas that persisted beyond 1970. For recent censuses, NHGIS shapefiles include concatenated NHGIS codes but not the FIPS codes in order to minimize file size. FIPS codes do exist within all NHGIS data tables for 1970 and later, but the codes are not concatenated as you would typically find in data downloaded from American FactFinder.
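If you need a concatenated county FIPS code for 1970 or later data, you can build it from the separate state and county code fields. The sketch below assumes hypothetical column names (STATEA, COUNTYA) that hold a 2-digit state code and a 3-digit county code; confirm the actual field names and code lengths in your codebook before relying on it.

```python
def county_fips(state_code, county_code):
    """Concatenate separate state and county codes into a 5-digit FIPS code.

    Codes are zero-padded so that values read as integers (e.g. 53)
    still produce the expected width.
    """
    return str(state_code).zfill(2) + str(county_code).zfill(3)

row = {"STATEA": "27", "COUNTYA": "053"}  # hypothetical column names
print(county_fips(row["STATEA"], row["COUNTYA"]))  # 27053
```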
What do the field names (e.g., AG3001, B5P001) in an NHGIS data file mean?
The key to these unique, NHGIS-created column names is found in the Codebook file(s) that were automatically included in your data extract. Look for the .txt file(s) in the zipped file you downloaded, and they will shed some light on your data.
What is wrong with the 1960 data?
Sadly, such a common question! The 1960 Census employed a uniquely restrictive data suppression strategy that leaves many data tables with lots of missing data. In addition, the NHGIS can offer only a small set of data tables for states and counties due to a scarcity of digital 1960 Census data, and the 1960 tract data come from 2 separate sources, resulting in some inconsistent redundancy. Review the Tabular Data Documentation page for more detailed information.
The Minnesota Population Center is currently working in collaboration with the Census Bureau to recover lost data from the 1960 Census of Population and Housing. Once the project is completed, new 1960 summary files will be available along with additional microdata products.
Something is wrong with the data I downloaded. What should I do?
There are lots of reasons why something may seem wrong with your data. Typically (but not always), the issue stems from using the incorrect data format for your software or from not being aware of the many odd quirks that exist in older census data. Review the information on the User’s Guide and Data Documentation pages for additional information. Of course, always feel free to contact NHGIS User Support at email@example.com with any questions you may have!
GIS Questions
I don’t have Esri ArcGIS! Can I still map NHGIS data?
There are several options for those who do not have access to Esri ArcGIS. A number of open source GIS programs are available, including GRASS and QGIS. In addition, the Social Explorer website allows online mapping of select NHGIS data. Another option for many students is a free student ArcGIS license, which can often be obtained through college GIS or Geography departments for class purposes, or by purchasing select books from the Esri Press. Student licenses vary in length, but are typically 6 or 12 months.
I downloaded some GIS boundary files, but where are the census data?
GIS boundary files, on their own, do not contain any census data, even if you downloaded the data tables at the same time. To attach tabular NHGIS data files to the GIS boundary files requires a join operation in your GIS software. Additional information on using a GIS with NHGIS data can be found on the User’s Guide page.
OK, I downloaded the data tables and the GIS boundary files. Now how do I join them together?
NHGIS has made it as easy as possible to join data tables to their respective GIS boundary files. In both files, you will find a field called GISJOIN that will serve as the join field. Additional information on using a GIS with NHGIS data can be found on the User’s Guide page.
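Outside of GIS software, the same join can be sketched in plain Python to show what matching on GISJOIN does. The boundary attribute records, GISJOIN values, and data values below are stand-ins invented for the example; AG3001 is used only because it appears as a sample field name in this FAQ.

```python
import csv
import io

# Stand-in attribute records from a boundary file (illustrative values).
boundary_records = [
    {"GISJOIN": "G2700530", "SHAPE_AREA": 1.5e9},
    {"GISJOIN": "G2701230", "SHAPE_AREA": 4.4e8},
]

# Stand-in for a downloaded data table (illustrative values).
table_csv = """GISJOIN,AG3001
G2700530,1000
G2701230,500
"""

def join_on_gisjoin(boundaries, csv_text):
    """Attach table fields to each boundary record with the same GISJOIN."""
    table = {row["GISJOIN"]: row
             for row in csv.DictReader(io.StringIO(csv_text))}
    joined = []
    for record in boundaries:
        match = table.get(record["GISJOIN"], {})  # empty if no table row
        joined.append({**record, **match})
    return joined

joined = join_on_gisjoin(boundary_records, table_csv)
print(joined)
```

The csv module reads every field as text, which keeps GISJOIN values intact. In GIS software, it is likewise worth checking that GISJOIN is imported as a text field in both files before joining.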
Why am I missing columns of data when I bring the .csv into a GIS?
Unfortunately, older versions of Esri ArcGIS import at most 255 columns from a .csv file, and any additional fields are truncated. This is a known issue for Esri and Microsoft and is outside the control of NHGIS. The issue is resolved in ArcGIS versions 10.1 and newer. Users of older versions of ArcGIS may try the Quick Import tool, part of the Data Interoperability extension to ArcGIS, as a workaround. That extension, however, is not available to all ArcGIS users. Other solutions exist, and additional information on the issue, along with instructions on using the Quick Import tool, can be found on the User’s Guide page. In addition, be advised that older versions of Microsoft Excel (pre-2007) have the same 255-column limitation.
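One possible workaround, not taken from the User’s Guide, is to split a wide .csv into several narrower files that each keep the GISJOIN column, so each piece stays under the limit and can still be joined. The sketch below is a minimal version of that idea; the function name and the sample data are hypothetical.

```python
import csv
import io

def split_columns(csv_text, key="GISJOIN", max_columns=255):
    """Split a wide CSV into narrower CSV strings that each keep the key column."""
    if max_columns < 2:
        raise ValueError("max_columns must allow the key plus one data column")
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    key_index = header.index(key)
    other = [i for i in range(len(header)) if i != key_index]
    step = max_columns - 1  # leave room for the key column in every chunk
    chunks = []
    for start in range(0, len(other), step):
        columns = [key_index] + other[start:start + step]
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        for row in rows:
            writer.writerow([row[i] for i in columns])
        chunks.append(out.getvalue())
    return chunks

# Tiny demonstration using a 3-column limit instead of 255.
wide = "GISJOIN,A,B,C,D,E\nG2700530,1,2,3,4,5\n"
parts = split_columns(wide, max_columns=3)
print(len(parts))
```

Each resulting piece can be saved to its own file and joined to the boundary file separately.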
What projected coordinate system are the GIS boundary data in?
NHGIS shapefiles use Esri's USA Contiguous Albers Equal Area Conic projection. Prior to May 2013, NHGIS provided separate files for Alaska in the Alaska Albers Equal Area Conic projection, for Hawaii in the Hawaii Albers Equal Area Conic, and for Puerto Rico in an Albers Equal Area Conic projection with central meridian, standard parallels and latitude of origin set to match the Puerto Rico State Plane Coordinate System's.
What are the units of the SHAPE_AREA and SHAPE_LEN fields?
SHAPE_AREA is an area measurement in square meters. SHAPE_LEN is a perimeter measurement in meters.
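For reporting in other units, the conversions are simple arithmetic. A minimal sketch follows, with made-up helper names and the international mile (exactly 1609.344 meters) as the mile definition.

```python
M_PER_KM = 1000.0
M_PER_MI = 1609.344  # international mile, in meters

def area_sq_km(shape_area_sq_m):
    """Convert a SHAPE_AREA value from square meters to square kilometers."""
    return shape_area_sq_m / (M_PER_KM ** 2)

def area_sq_mi(shape_area_sq_m):
    """Convert a SHAPE_AREA value from square meters to square miles."""
    return shape_area_sq_m / (M_PER_MI ** 2)

def length_km(shape_len_m):
    """Convert a SHAPE_LEN value from meters to kilometers."""
    return shape_len_m / M_PER_KM

def length_mi(shape_len_m):
    """Convert a SHAPE_LEN value from meters to miles."""
    return shape_len_m / M_PER_MI

print(area_sq_km(2_500_000))  # 2.5
print(length_km(1500))        # 1.5
```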
Why isn’t the GISJOIN the same from one decade to the next for a census tract that doesn’t seem to change?
The numbering of census tracts (and other lower levels of geography like census blocks) is determined systematically for entire counties by a combination of local and Census Bureau authorities. In the early years of tract definitions, the numbering systems occasionally underwent dramatic revisions, but even in recent censuses, when numbering systems have been more stable, tract boundary changes in one part of a county can force a renumbering of tracts throughout the county to accommodate a more logical numbering. As a result, a tract whose boundaries did not change may still receive a new number, and therefore a new GISJOIN, from one census to the next.
Why don't newer shapefiles align with older shapefiles?
Recent releases of the Census TIGER/Line Shapefiles have included numerous improvements in spatial accuracy (from a greater use of GPS and localized data sources). This has resulted in numerous misalignments between the new data and older, less accurate NHGIS boundaries. To address this issue, NHGIS staff conducted a systematic realignment of our historical shapefile boundaries to accord with newer, 2008-based TIGER/Line data.
The Census Bureau, however, subsequently made additional improvements to TIGER/Line features, so the 2008-based files NHGIS created are not consistently comparable with 2010 and later TIGER/Line files. In general, most 2008-based boundaries align better than 2000-based boundaries with more recent TIGER/Line files, but the 2008-based boundaries also include occasional gross inaccuracies.
Why do most historical shapefiles have both a "2000 TIGER/Line +" and a "2008 TIGER/Line +" version?
The realignment project mentioned in the previous question produced new boundaries for all 1790-2000 tract and county shapefiles, based on updated 2008 TIGER/Line boundaries. Because the original and new shapefiles are each better suited to some uses, both are available for download. Users may wish to download both the 2000- and 2008-based versions of historical boundaries to determine which is more suitable for their study area and analysis.
The original shapefiles, based in part on the 2000 TIGER/Line boundaries, are typically a better choice when mapping only pre-2010 data because all boundaries should align better. If you want to display 1970 census tracts inside SMSA boundaries, for example, the 2008-based census tracts may not align with the 2000-based SMSA file.
The new, 2008-based shapefiles align more closely (but not completely) with 2010 and later shapefiles and are typically a better choice when creating an overlay that includes both historical and 2010 or later shapefiles. The 2008-based census tracts for 1990 and 2000, for example, will typically align more closely with 2010-based census tracts.
NHGIS Project Questions
Should I cite NHGIS in my paper? How?
Reports and publications using NHGIS data must be cited appropriately. The citation is:
Minnesota Population Center. National Historical Geographic Information System: Version 2.0. Minneapolis, MN: University of Minnesota 2011.
Also, it is extremely important for us that you send us a citation or link for any paper or article you write using NHGIS data. Continued funding for the NHGIS depends on our ability to show our sponsor agencies that researchers are using the data for productive purposes.
What data is NHGIS working on right now?
Currently, we are preparing to release the 2009 and prior ACS datasets, as well as new time series tables. In addition, we are in the early stages of creating GIS point files for historical places.
How do I get a job working with NHGIS?
We’re flattered you're interested in us! NHGIS is a part of the Minnesota Population Center, which is itself part of the University of Minnesota on the Twin Cities campus. Please visit the employment pages at MPC and at the U of M for up-to-date job postings for both students and professionals.
I have a suggestion for how you can improve your website! Do you want to hear it?
Sure! We are always open to new suggestions. This does not mean we can always act on the suggestion, but many changes to NHGIS have come about through users’ suggestions. You may direct yours to firstname.lastname@example.org.