UNDERSTANDING “DIFFERENTIAL PRIVACY”
The U.S. Census Bureau collects sensitive data from individuals and households, publishing demographic data that is essential for telling the story of Americans and their communities. Under Section 9 of the Census Act, U.S. Code Title 13, as a “trusted curator” the Bureau is required to uphold respondent privacy in data that are released publicly. To meet this requirement, the Census Bureau typically releases aggregate-level data (i.e. census block, block-group, or tract) and implements various disclosure avoidance techniques (e.g. collapsing data, variable suppression). More recently, however, modern computational methods, combined with more publicly available data, have increased the risk of exposing individuals' identities and responses. In response, the Bureau has explored approaches for modernizing disclosure avoidance.
WHAT IS DIFFERENTIAL PRIVACY?
Differential privacy is a mathematical framework, implemented through various techniques, for limiting how much published aggregate information can reveal about any one individual. More specifically, differential privacy attempts to balance privacy loss and accuracy through mathematical formulas, with the acceptable privacy loss typically expressed as a single parameter (often called epsilon, or the “privacy-loss budget”). Once the limit for acceptable privacy loss is established (this is part of the current debate), measures including adding synthetic data (or introducing “synthetic noise”), data-swapping, and data imputation can be used to ensure that the database is sufficiently safeguarded from reconstruction and individual identification.
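To make the trade-off between privacy loss and accuracy concrete, here is a minimal sketch of one textbook noise-adding technique, the Laplace mechanism, applied to a single population count. It is an illustration only and is not the Census Bureau's production system. The function names, the block population of 120, and the epsilon values (epsilon being the privacy-loss parameter, where smaller means more privacy) are all assumptions chosen for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1, so the
    # sensitivity is 1 and the Laplace scale is 1 / epsilon.
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    # Average distance between the published (noisy) and true count.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    true_count = 120  # hypothetical block population, not real census data
    for epsilon in (0.1, 1.0, 10.0):  # illustrative privacy-loss limits
        error = mean_abs_error(true_count, epsilon)
        print(f"epsilon={epsilon}: mean absolute error is about {error:.2f}")
```

Under these assumptions the typical error is roughly 1/epsilon people: a stricter privacy limit (smaller epsilon) produces a noisier published count, and a looser limit produces a more accurate but less protected one.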
DIFFERENTIAL PRIVACY AND THE CENSUS
The U.S. Census Bureau has expressed interest in implementing differential privacy for the 2020 Census. In practice, the Census Bureau would need to set a limit for the amount of disclosure avoidance that balances privacy with data utility and accuracy. Given the importance of decennial census and American Community Survey (ACS) data, it is critical to understand the impact that differential privacy will have on data availability, particularly for cross-tabulated data (e.g. poverty by race/ethnicity), microdata (e.g. Public Use Microdata Sample or PUMS), and for small-area geographies (e.g. census blocks).
In October 2019, the Bureau released demonstration data that applies differential privacy to Census 2010 data. Since then, academics, policymakers, and other data users have examined how the Census 2010 test data can yield different policy results. These studies were presented or featured at the following meetings and events:
CNSTAT Expert Meeting on Disclosure Avoidance
June 2020. DAS Updates, by Michael Hawes.
June 2020. Metrics Updates, by Christine Borman and Matthew Spence.
2020 Disclosure Avoidance System (DAS)
Committee on National Statistics
WHERE CAN I GET THE CENSUS 2010 TEST DATA?
For novice users, one of the more transparent and user-friendly tools for examining differences between actual Census 2010 data and Census 2010 test data is from ESRI:
For more advanced data users interested in examining the data in a statistical software package, the data are available from:
WHAT ARE THE CONCERNS ABOUT DIFFERENTIAL PRIVACY?
A white paper written by Census Bureau staff acknowledges that differential privacy “lacks a well-developed theory for measuring the relative impact of added noise on the utility of different data products, tuning equity trade-offs, and presenting the impact of such decisions.” By adjusting the perceived demographic composition of communities, differential privacy has the capacity to disproportionately impact racial/ethnic minorities and underrepresented individuals. Communities where individuals of color make up a small percentage of the population, for example, may require data swapping to a different tract or block group to meet the privacy limits set under a differential privacy protocol.
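One way to see why small groups can be affected disproportionately is to compare the same amount of noise against counts of different sizes. The sketch below uses Laplace noise rather than the data swapping mentioned above, and it is not the Bureau's method. The group sizes (10 and 1,000 residents), the epsilon value of 0.5, and the function names are hypothetical, chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two exponential draws with mean `scale` gives
    # Laplace-distributed noise centered at 0.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_relative_error(true_count, epsilon, trials=20000, seed=1):
    # Average size of the noise as a fraction of the true count.
    if true_count <= 0 or epsilon <= 0:
        raise ValueError("true_count and epsilon must be positive")
    rng = random.Random(seed)
    scale = 1.0 / epsilon  # sensitivity of a count is 1
    total = 0.0
    for _ in range(trials):
        total += abs(laplace_noise(scale, rng)) / true_count
    return total / trials


if __name__ == "__main__":
    epsilon = 0.5  # hypothetical privacy-loss limit
    # Hypothetical group sizes within one tract; not real census data.
    for label, count in (("small group", 10), ("large group", 1000)):
        share = mean_relative_error(count, epsilon)
        print(f"{label} ({count} residents): noise is about {share:.1%} of the count")
```

Because the noise does not shrink when the group does, the same privacy limit distorts a count of 10 far more, in relative terms, than a count of 1,000.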
The Minnesota Population Center (MPC) outlines a number of concerns regarding differential privacy. The MPC recommends the following points be addressed before differential privacy policies are implemented:
More testing is needed before final decisions are made on how differential privacy will be applied to census data.
Differential privacy is not appropriate or feasible for ACS microdata (e.g. PUMS).
For all data products, the Census Bureau should proceed cautiously in close consultation with the data user community.
DIFFERENTIAL PRIVACY IN THE NEWS
May 2020
April 2020
“Census 2020 Will Protect Your Privacy More than Ever—But at the Risk of Accuracy,” by Nicholas Nagle. The Conversation
March 2020
“Modernizing Disclosure Avoidance: What We’ve Learned, Where We Are Now,” by John M. Abowd and Victoria A. Velkoff. US Census Bureau
February 2020
US Census Bureau Response to Federal State Cooperative for Population Estimates (FSCPE) Questions Surrounding Differential Privacy
FOR MORE INFORMATION
This topic is ever-evolving. As such, this post will be updated to make the most current information available.
Good, less-technical overviews:
“To Reduce Privacy Risks, the Census Plans to Report Less Accurate Data,” by Mark Hansen, New York Times (December 2018)
“Potential privacy lapse found in Americans' 2010 census data.” NBC News (February 2019).
More in-depth, technical resources:
“Modernizing Disclosure Avoidance: A Multipass Solution to Post-Processing Error,” by John Abowd and Victoria Velkoff. U.S. Census Bureau (June 2020).
Dear Differential Privacy, Put Up or Shut Up, by Paul Francis. Medium (January 2020).
Oregon Census State Data Center (SDC) Annual Data Users Conference (October 2019)
“New Privacy Measures for the 2020 Census.” Michael Hawes.
“Status Update on the 2020 Census Data Products Plan.” Marc Perry and Rachel Marks.
Differential Privacy, An Easy Case. Mark Hansen (January 2019).
Innovating Data Privacy for the American Community Survey. Rolando Rodriguez and Amy Lauger (2019).
Challenges and New Approaches for Protecting Privacy in Federal Statistical Programs. National Academies, Committee on National Statistics (2019).
Changes to Census Bureau Data Products. University of Minnesota, IPUMS webpage (2019).
ARCHIVE
In a presentation at the American Community Survey (ACS) Data Users Group (DUG) meeting in May 2019, Dr. Connie Citro recommended the Bureau consider the following points as it moves forward on differential privacy:
An observation and recommendation by Dr. Citro: “taking the relationship between the Census Bureau and users to the next level of systematic, two-way interaction. That relationship, in my experience going back over 50 years, is not yet there” (Slide 3).
To build credibility among data users (Slide 7), Dr. Citro calls on the Bureau to “institutionalize systematic, two-way, transparent interaction—structured input, dialog, preliminary decision, [repeat], and document the final decision” (Slide 8).
Dr. Citro offers a number of “Ways and Means to Step Up” (Slides 11-13).