UNDERSTANDING “DIFFERENTIAL PRIVACY”

The U.S. Census Bureau collects sensitive data from individuals and households, publishing demographic data that are essential for telling the story of Americans and their communities. Under Section 9 of the Census Act (U.S. Code Title 13), the Bureau, as a “trusted curator,” is required to uphold respondent privacy in data that are released publicly. To meet this requirement, the Census Bureau typically releases aggregate-level data (i.e. census block, block group, or tract) and implements various disclosure avoidance techniques (e.g. collapsing data, variable suppression). More recently, however, modern computational methods, combined with more publicly available data, have increased the risk of compromising individual privacy. In response, the Bureau has explored approaches for modernizing disclosure avoidance.

WHAT IS DIFFERENTIAL PRIVACY?

Differential privacy is a framework of techniques for limiting how much published aggregate information can reveal about any one individual. More specifically, differential privacy attempts to balance privacy loss and accuracy through mathematical formulas. Once the limit for acceptable privacy loss is established (where to set it is part of the current debate), measures including adding synthetic data (or introducing “synthetic noise”), data-swapping, and data imputation can be used to ensure that the database is sufficiently safeguarded from reconstruction and individual identification.
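As a rough illustration of the trade-off between privacy loss and accuracy, the sketch below applies the Laplace mechanism, a textbook differential privacy technique, to a single population count. The function name `noisy_count`, the parameter `epsilon` (standing in for the privacy-loss limit), and the example count of 100 are assumptions made for illustration. This is not the Census Bureau's production algorithm.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Assumption: a simple counting query, where adding or removing one
    person changes the count by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two independent exponential draws with rate
    # epsilon follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    # Smaller epsilon means stronger privacy and noisier published counts.
    for epsilon in (0.1, 1.0, 10.0):
        print(epsilon, round(noisy_count(100, epsilon, rng), 2))
```

A smaller `epsilon` gives stronger privacy protection but a less accurate published count, while a larger `epsilon` does the reverse. Choosing that value is the policy decision at the center of the debate.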

DIFFERENTIAL PRIVACY AND THE CENSUS

The U.S. Census Bureau has expressed interest in implementing differential privacy for the 2020 Census. In practice, the Census Bureau would need to set a limit for the amount of disclosure avoidance that balances privacy with data utility and accuracy. Given the importance of decennial census and American Community Survey (ACS) data, it is critical to understand the impact that differential privacy will have on data availability, particularly for cross-tabulated data (e.g. poverty by race/ethnicity), microdata (e.g. Public Use Microdata Sample or PUMS), and small-area geographies (e.g. census blocks).

In October 2019, the Bureau released demonstration data that apply differential privacy to Census 2010 data. Since then, academics, policymakers, and other data users have examined how the Census 2010 test data can yield different policy results. These studies have been presented or featured at the following meetings/events:

WHERE CAN I GET THE CENSUS 2010 TEST DATA?

For novice users, one of the most transparent and user-friendly tools for examining differences between actual Census 2010 data and Census 2010 test data is from ESRI:

For more advanced data users interested in examining data in a statistical data package, the data are available from:

WHAT ARE THE CONCERNS ABOUT DIFFERENTIAL PRIVACY?

A white paper written by Census Bureau staff acknowledges that differential privacy “lacks a well-developed theory for measuring the relative impact of added noise on the utility of different data products, tuning equity trade-offs, and presenting the impact of such decisions.” By adjusting the perceived demographic composition of communities, differential privacy has the capacity to disproportionately impact racial/ethnic minorities and underrepresented individuals. In communities where individuals of color make up a small percentage of the population, for example, records may need to be swapped to a different tract or block group to meet the privacy limits set under a differential privacy protocol.
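One way to see why small groups can be affected more: noise of a given size is a larger share of a small count than of a large one. The sketch below assumes Laplace noise with scale 1/epsilon, whose expected absolute size is 1/epsilon. The function name, the value of `epsilon`, and the group sizes are hypothetical and chosen only for illustration; they are not the Bureau's parameters.

```python
def expected_relative_error(count, epsilon):
    """Expected |noise| / count for Laplace noise with scale 1/epsilon."""
    if count <= 0 or epsilon <= 0:
        raise ValueError("count and epsilon must be positive")
    expected_abs_noise = 1.0 / epsilon  # mean absolute value of Laplace noise
    return expected_abs_noise / count


if __name__ == "__main__":
    epsilon = 0.5  # hypothetical privacy-loss limit
    # Hypothetical group sizes within a single small-area geography.
    for count in (10, 1000):
        share = expected_relative_error(count, epsilon)
        print(f"count={count}: expected relative error of about {share:.1%}")
```

Under these assumptions the same expected noise (two people) amounts to roughly 20 percent of a group of 10 but only 0.2 percent of a group of 1,000.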

The Minnesota Population Center (MPC) outlines a number of concerns regarding differential privacy. The MPC recommends the following points be addressed before differential privacy policies are implemented:

  1. More testing is needed before final decisions are made on how differential privacy will be applied to census data.

  2. Differential privacy is not appropriate or feasible for ACS microdata (e.g. PUMS).

  3. For all data products, the Census Bureau should proceed cautiously in close consultation with the data user community.

DIFFERENTIAL PRIVACY IN THE NEWS

May 2020

April 2020

March 2020

February 2020

  • US Census Bureau Response to Federal State Cooperative for Population Estimates (FSCPE) Questions Surrounding Differential Privacy

FOR MORE INFORMATION

This topic is ever-evolving. As such, this post will be updated to make the most current information available.

Good, less-technical overviews:

More in-depth, technical resources:

ARCHIVE

In a presentation at the American Community Survey (ACS) Data Users Group (DUG) meeting in May 2019, Dr. Connie Citro recommended that the Bureau consider the following points as it moves forward on differential privacy:

  1. Dr. Citro observes and recommends “taking the relationship between the Census Bureau and users to the next level of systematic, two-way interaction. That relationship, in my experience going back over 50 years, is not yet there” (Slide 3).

  2. To build credibility among data users (Slide 7), Dr. Citro calls on the Bureau to “institutionalize systematic, two-way, transparent interaction—structured input, dialog, preliminary decision, [repeat], and document the final decision” (Slide 8).

  3. Dr. Citro offers a number of “Ways and Means to Step Up” (Slides 11-13).

Jason Jurjevich