- What are time series tables?
- An example: Persons by Sex tables
- Integration methods
An NHGIS time series table links together comparable statistics from multiple U.S. censuses in one downloadable bundle. A table comprises one or more related time series, each of which describes a single summary statistic (e.g., the count of occupied housing units) measured at multiple times (e.g., each census year from 1970 to 2010) at selected geographic levels (e.g., states or counties). The set of covered statistics, years, and geographic levels varies from table to table according to which categories are available, and at which levels, at different times. Tables may also differ in the method of geographic integration they use to align geographic units across time.
Users may request any set of time series tables, for any set of geographic levels, in any of three possible layouts. NHGIS then delivers data covering all years associated with the requested tables and including all areas for the requested geographic levels (excluding Puerto Rico data prior to 2010).
NHGIS provides three Persons by Sex time series tables, each of which contains two time series: Persons: Male and Persons: Female. The three tables differ in the geographic levels and years they cover, and in the type of geographic integration they use.
Two Persons by Sex tables use nominal integration, aligning geographic units across time simply by unit name or code without regard to any changes in unit boundaries. The first nominally integrated table (code A08) provides state or county data for all censuses back to 1820. The second table (AV0) provides data for states, counties, county subdivisions, places, or census tracts for censuses back to 1970 and for the 2008-2012 5-year period (using data from the 2012 American Community Survey).
The third Persons by Sex table (CM0) provides geographically standardized 2000 and 2010 data for 2010 census tracts. Future data releases will expand this table to cover more geographic levels and years.
Linking census summary statistics across time requires two types of integration: attribute integration, ensuring that the measured characteristics in a time series are comparable across time, and geographic integration, ensuring that the areas summarized by time series are comparable across time.
Attribute integration: To define time series tables, NHGIS researchers create metadata specifying sets of comparable statistics from various source datasets. In many instances, generating a single time series (e.g., Persons: Under 5 years) requires aggregating multiple source statistics (e.g., summing Males under age 5 and Females under age 5) to produce a comparable statistic across all years. NHGIS researchers specify any needed operations in the metadata, and the extract system completes the computations in advance, sparing NHGIS users another data processing hassle.
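As a rough illustration of the aggregation described above, the following sketch sums two source statistics into one time series value per year. The field names, metadata structure, and counts are invented for illustration; they are not actual NHGIS codes, metadata, or data.

```python
# Hypothetical source statistics for one geographic unit, keyed by year.
# Field names and counts are illustrative, not actual NHGIS codes or data.
source_stats = {
    1990: {"males_under_5": 120, "females_under_5": 115},
    2000: {"males_under_5": 131, "females_under_5": 127},
}

# Hypothetical metadata: each time series lists the source fields to sum.
series_definitions = {
    "Persons: Under 5 years": ["males_under_5", "females_under_5"],
}


def build_time_series(source_stats, series_definitions):
    """Sum the listed source fields for each series and each year."""
    result = {}
    for name, fields in series_definitions.items():
        result[name] = {
            year: sum(stats[field] for field in fields)
            for year, stats in source_stats.items()
        }
    return result


under_5 = build_time_series(source_stats, series_definitions)
print(under_5["Persons: Under 5 years"])
```

Keeping the list of source fields in a separate definition, rather than in the summing code, mirrors the idea that the operations are specified once in metadata and then applied to every year and area.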
NHGIS researchers execute attribute integration one tabulation type at a time, where a single "tabulation type" includes all tables that summarize a particular feature (e.g., persons, families, housing units, etc.) using a particular aggregation method (e.g., counts, medians, quotients, etc.) broken down into categories for a particular set of characteristics (e.g., sex, age, race, sex by age, etc.). Example tabulation types include: Persons by Sex by Age, Median Age by Sex, and Per Capita Income.
For each tabulation type, NHGIS researchers may define multiple tables to cover as many categories as possible for different sets of years and geographic levels. As a result, a single time series (e.g., Males under age 5) may appear in multiple tables, but each complete table is unique in the combination of categories, years and geographic levels it covers, or in the method of geographic integration it uses.
NHGIS assigns a consistent label and code to each complete time series, and the labels supply key information about changes in measured concepts. For example, to indicate precisely the timing and nature of a discrepancy in educational attainment categories, one time series is labeled 4 or more years of college (until 1980) or bachelor's degree or higher (since 1990).
Geographic integration: NHGIS time series tables align geographic units across time in one of two ways:
Nominally integrated tables link geographic units across time according to their names and codes, disregarding any changes in unit boundaries. The identified geographic units match those from each census source, so the spatial definitions and total number of units may vary from one time to another (e.g., a city may annex land, a tract may be split in two, a new county may be created, etc.). The tables include data for a particular geographic unit only at times when the unit's name or code was in use, resulting in truncated time series for some areas.
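The following minimal sketch shows what linking by code alone looks like in practice, including a truncated series for a unit whose code was not in use in every year. The unit codes and counts are invented, and the function name is hypothetical.

```python
# Hypothetical counts keyed by unit code for two census years.
# Codes and values are invented for illustration.
counts_by_year = {
    1990: {"001": 500, "002": 300},
    2000: {"001": 520, "003": 310},  # "002" no longer in use, "003" is new
}


def link_nominally(counts_by_year):
    """Join records across years by unit code alone, ignoring boundaries."""
    years = sorted(counts_by_year)
    codes = sorted({code for counts in counts_by_year.values() for code in counts})
    # None marks a year in which the code was not in use (a truncated series).
    return {
        code: {year: counts_by_year[year].get(code) for year in years}
        for code in codes
    }


linked = link_nominally(counts_by_year)
for code, series in linked.items():
    print(code, series)
```

Note that nothing in this join checks whether unit "001" covered the same area in both years, which is exactly the limitation of nominal integration.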
Nominal integration is useful for:
- Mapping spatial patterns at different times, in which case it is appropriate to map the geographic units in use at each time
- Measuring changes in areas where boundaries were stable, as they are for most states and counties between censuses
- Studying changes in characteristics of places and county subdivisions according to their legal definitions, including annexations, etc.
Users should be cautious when interpreting changes in nominally integrated time series because a single unit code may refer to distinctly different areas at different times. We recommend that users of nominally integrated tables inspect NHGIS boundary files (which are available for most years and levels covered by time series tables) to identify any boundary changes in areas of interest.
Geographically standardized tables provide data from multiple times for a single census's units. For example, currently available standardized time series tables provide 2000 and 2010 data for 2010 census units. To allocate one census's summary data to another census's geographic units, NHGIS reaggregates data from the smallest source units for which the data are available (e.g., 2000 census blocks). Where a source unit intersects multiple standard units, NHGIS applies interpolation to estimate how the source unit's characteristics are distributed among the standard units.
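To illustrate the reaggregation step, the following minimal sketch allocates hypothetical 2000 block counts to 2010 tracts using preassigned weights. The block and tract names, counts, and weights are all invented, and the sketch makes no claim about how NHGIS derives its interpolation weights.

```python
# Hypothetical 2000 block counts.
block_counts_2000 = {"block_a": 100, "block_b": 40, "block_c": 60}

# Hypothetical weights: the assumed share of each 2000 block's count that
# falls in each 2010 tract. How such shares are estimated is not shown here.
weights = {
    "block_a": {"tract_1": 1.0},                    # wholly inside tract_1
    "block_b": {"tract_1": 0.25, "tract_2": 0.75},  # straddles two tracts
    "block_c": {"tract_2": 1.0},                    # wholly inside tract_2
}


def standardize(block_counts, weights):
    """Allocate each source count to standard units and sum the pieces."""
    totals = {}
    for block, count in block_counts.items():
        for tract, share in weights[block].items():
            totals[tract] = totals.get(tract, 0.0) + count * share
    return totals


estimates = standardize(block_counts_2000, weights)
print(estimates)
```

Because each block's weights sum to 1, the estimates across all tracts add up to the original total of the block counts.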
Detailed description of geographic standardization procedures:
For each standardized statistic, NHGIS also supplies lower and upper bounds based on the spatial relationship between the source units and standard units. For example, if there are three 2000 census blocks that straddle a 2010 census unit's boundary, then it is possible that either all or none of the three blocks' 2000 residents were located in the 2010 unit. The upper bound assumes that all residents and housing units in straddling blocks were located in the 2010 unit, and the lower bound assumes that none were.
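The bounds logic can be sketched as follows, assuming each source block has already been classified as either wholly inside the standard unit or straddling its boundary. The counts and the "relation" labels are hypothetical.

```python
# Hypothetical 2000 blocks relative to one 2010 unit: one block lies wholly
# inside the unit and three blocks straddle its boundary.
blocks = [
    {"count": 100, "relation": "inside"},
    {"count": 40, "relation": "straddles"},
    {"count": 25, "relation": "straddles"},
    {"count": 10, "relation": "straddles"},
]


def bounds(blocks):
    """Return (lower, upper) bounds for the standard unit's count."""
    inside = sum(b["count"] for b in blocks if b["relation"] == "inside")
    straddling = sum(b["count"] for b in blocks if b["relation"] == "straddles")
    # Lower bound: no one in straddling blocks lived in the unit.
    # Upper bound: everyone in straddling blocks lived in the unit.
    return inside, inside + straddling


lower, upper = bounds(blocks)
print(lower, upper)
```

Any interpolated estimate for the unit should fall between these two values.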
NHGIS has not yet implemented standardization for non-count statistics such as medians and quotients. Therefore, currently available standardized tables supply only count statistics.
NHGIS delivers standardized statistics with two decimal digits of precision in order to reduce the size of rounding errors when users sum estimates. Rounding errors may still occur, but in most settings, these errors will be small and can be cleanly eliminated by rounding sums to integers.
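A small worked example, using invented values, shows why the extra precision helps. Suppose a source count of 10 is split evenly among three standard units and a user later sums the three estimates.

```python
# Hypothetical standardized estimates with two decimal digits: a source
# count of 10 split evenly among three standard units.
parts = [3.33, 3.33, 3.33]

# Summing the two-decimal estimates leaves only a small rounding error,
# which disappears when the sum is rounded to an integer.
raw_sum = sum(parts)
rounded_sum = round(raw_sum)

# For comparison: had each estimate been rounded to an integer first,
# the sum would be off by a whole person.
integer_sum = sum(round(p) for p in parts)

print(rounded_sum, integer_sum)
```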
The NHGIS Data Finder indicates which type of geographic integration each time series table uses. Users may select and download tables of one or both types. When a data request includes tables of different integration types, NHGIS delivers separate data files for each type.
Time series tables are designed to encompass as many different statistics as possible for a predetermined set of topics, years, and geographic levels:
Topic coverage: Initial time series releases covered only tabulation types available in 2010 census summary files, i.e., only characteristics measured by 100%-count statistics such as sex, age, race, Hispanic or Latino origin, household and group quarters type, housing occupancy and tenure, etc. Newer releases of nominally integrated tables use American Community Survey data and earlier sample-based datasets to cover such characteristics as education, income, poverty, marital status, place of birth, etc. Future releases will continue to add tables of other tabulation types.
At this time, geographically standardized tables cover only statistics available for 2000 census blocks, which are a subset of all 100%-count statistics from 2000 Summary File 1. NHGIS staff are investigating additional standardization methods that would reaggregate statistics from census tracts or block groups, making it possible to standardize sample-based statistics, which the Census Bureau does not report at the block level.
To see all topics for which time series are available, open the Topics filter window through the NHGIS Data Finder and look for the "TS" icon, which indicates topics covered by time series tables.
Year coverage: Initial time series assembly has focused on censuses since 1970 because 1970 data cover a much larger range of topics and geographic levels than the 1960 data. We plan to provide more tables spanning longer time ranges in future releases.
The year coverage for individual tables also depends on the ranges of time for which different aggregate statistics are available, and on the method of geographic integration used. For example, the Persons by Race Combination tables extend only as far back as 2000 because the 2000 census was the first to tabulate such counts.
At this time, all geographically standardized tables cover only 2000 and 2010. Future releases will add geographically standardized 1990 and 1980 data.
Geographic coverage: Nominally integrated tables provide data for up to eight different geographic levels: nation, region, division, state, county, census tract, county subdivision, and place. At this time, geographically standardized tables provide only census tract data, but work is underway to cover several more levels soon.
The geographic levels available for nominally integrated tables are also restricted according to the availability of statistics for all years covered by the table. For example, because the 1970 census summary files do not provide statistics at the nation, region, or division levels, tables covering 1970 do not provide data for any of these levels. Similarly, tables that use data from 1990 Summary Tape Files 2 or 4 do not provide statistics for the place level because the place data in those summary files were restricted to larger places.
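One way to think about this restriction is as a set intersection: a table can offer a level only if every year it covers has data at that level. The sketch below uses simplified, hypothetical availability sets; only the absence of nation, region, and division in 1970 is taken from the text above.

```python
# Hypothetical, simplified level availability by year.
levels_by_year = {
    1970: {"state", "county"},
    1980: {"nation", "region", "division", "state", "county"},
    1990: {"nation", "region", "division", "state", "county"},
}


def table_levels(levels_by_year, years):
    """Return the levels available in every one of the given years."""
    return set.intersection(*(levels_by_year[year] for year in years))


print(sorted(table_levels(levels_by_year, [1970, 1980, 1990])))
print(sorted(table_levels(levels_by_year, [1980, 1990])))
```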
Time series tables can be downloaded in one of three layouts:
- Time varies by file: Data for different times are placed in different files. Within each file, the rows correspond to geographic units, and each column corresponds to a single time series instance at a single time.
- Time varies by column: Data for different times are placed in separate columns within one file. The rows correspond to geographic units, and the columns correspond to particular times within a time series. For example, one column reports Median Age in 2000 and another column reports Median Age in 2010.
- Time varies by row: Data for different times are placed in separate rows within one file. Each row represents a single geographic unit at a single time (e.g., Alabama in 1990 or Alabama in 2000), and each column corresponds to a single time series.
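The difference between the column and row layouts can be sketched by converting one to the other. The unit name, column names, and values below are invented for illustration and do not follow actual NHGIS column codes.

```python
# Hypothetical "time varies by column" records: one row per geographic
# unit, one column per time within the series.
wide_rows = [
    {"unit": "Unit A", "median_age_2000": 30.0, "median_age_2010": 32.0},
    {"unit": "Unit B", "median_age_2000": 41.5, "median_age_2010": 40.0},
]


def to_time_by_row(wide_rows, series, years):
    """Reshape to one row per unit and year, one column per time series."""
    long_rows = []
    for row in wide_rows:
        for year in years:
            long_rows.append(
                {"unit": row["unit"], "year": year, series: row[f"{series}_{year}"]}
            )
    return long_rows


long_rows = to_time_by_row(wide_rows, "median_age", [2000, 2010])
for row in long_rows:
    print(row)
```

The row layout tends to suit statistical software that expects one observation per unit and time, while the column layout makes it easy to compute change between two years within a single row.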
For each time series table, NHGIS provides a complete listing of table contents, coverage, and sources, along with notes describing any known comparability issues and links to relevant source documentation. These details can be accessed by clicking on a table name in the Data Finder. The complete set of all table details is also available here:
The initial definition, documentation, and dissemination of NHGIS time series tables were central components of the Integrated Spatio-Temporal Aggregate Data Series (ISTADS) project at the Minnesota Population Center, with funding provided by the Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD) at the National Institutes of Health. Two current grants from the National Science Foundation and the NICHD support the geographic standardization and expansion of NHGIS time series tables.