Webinar Q & A: How Will New Census Bureau Privacy Measures Change 2020 Decennial Census Data?

On November 18, APDU hosted a webinar on new measures taken by the Census Bureau to protect respondent privacy in the decennial census known as “differential privacy.” The webinar recording is available to APDU members in the member email and in a follow-up email to webinar registrants. Below are follow-up answers to questions from the question and answer portion of the webinar.

Has there been any discussion concerning cell specific error terms—akin to ACS MOEs seen in the summary files?

DVR answer – We do not know whether error terms will be provided with the decennial census counts. Computing the error terms from the underlying data uses up part of the privacy loss budget. The Census Bureau would have to decide whether that use of the privacy loss budget would be worth doing. If part of the privacy loss budget is used to compute error terms, the actual counts will be less accurate. The Census Bureau recognizes the importance of such error terms – see https://arxiv.org/abs/1809.02201 for more details.
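
To make the trade-off concrete, here is a minimal sketch in Python. It assumes a plain Laplace mechanism with sensitivity 1 and simple sequential composition, where the budget shares add up to the total. The Census Bureau's actual algorithm is more elaborate, and the budget values below are hypothetical, chosen only for illustration.

```python
def laplace_scale(epsilon, sensitivity=1.0):
    """Scale b of the Laplace noise needed for a query with this sensitivity."""
    return sensitivity / epsilon


def expected_abs_error(epsilon, sensitivity=1.0):
    """For Laplace(0, b) noise, the expected absolute error equals b."""
    return laplace_scale(epsilon, sensitivity)


# Hypothetical numbers, not the Bureau's actual budget.
total_epsilon = 1.0
share_for_error_terms = 0.2

# Option A: the whole budget goes to the counts.
eps_counts_all = total_epsilon
# Option B: part of the budget is set aside to publish error terms.
eps_counts_split = total_epsilon * (1 - share_for_error_terms)

print("expected |error|, no error terms published:", expected_abs_error(eps_counts_all))
print("expected |error|, error terms published:   ", expected_abs_error(eps_counts_split))
```

Under these assumptions, setting aside 20 percent of the budget raises the expected error in each count by 25 percent.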

Do I understand correctly that a variable being invariant means that it will be reported as tabulated, with no change?

That is correct. An “invariant” is a variable to which no noise will be added—it will be reported as enumerated (including any editing or imputation).

Is there any chance that the Bureau will realize that the cost/benefit of this is totally unacceptable? It seems like a massive over-reaction to me.

APDU has no formal position on this, but strongly encourages all data users to submit their feedback to the Census Bureau by email at dcmd.2010.demonstration.data.products@census.gov

For more information about the comment process, see https://www.census.gov/programs-surveys/decennial-census/2020-census/planning-management/2020-census-data-products/2010-demonstration-data-products.html

Why is Illinois very high in the test tables? Is it because MCDs are a key part of the state’s political structure? Why wouldn’t New England states also be high in your tables, since MCDs are keys in those states?

Minor civil divisions are a fundamental part of Illinois’ political structure, and there are lots of MCDs with small populations. I bet that MCDs in New England have larger populations, on average, than those in Illinois. Noise injection via differential privacy has a larger proportional impact on small populations. Thus, we see a larger fraction of Illinois MCDs with no vacant housing counts than we observe in New England states.
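
The proportional effect can be illustrated with a small simulation. This is a sketch only: it assumes plain Laplace noise with a hypothetical scale of 4, which is not the Bureau's actual mechanism or setting, and the population sizes are made up.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def mean_relative_error(true_count, scale, trials=20000, seed=0):
    """Average |noise| divided by the true count, over many simulated draws."""
    rng = random.Random(seed)
    total = sum(abs(laplace_noise(scale, rng)) for _ in range(trials))
    return total / trials / true_count


# Hypothetical MCD populations; the same noise scale is applied to each.
for population in (50, 500, 5000):
    print(population, round(mean_relative_error(population, scale=4.0), 4))
```

The absolute noise is the same size in every case, so the relative error shrinks in direct proportion as the population grows.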

Will smaller geographies sum to larger ones, such as blocks to blockgroups?

Yes, smaller geographies will sum to larger units, such as blocks to block groups. The final output of the differential privacy algorithm is a set of microdata with block IDs on them. Tabulations derived from these microdata will sum up the geography hierarchy.
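
A short sketch shows why tabulating from microdata guarantees this consistency. The records and IDs below are invented for illustration. The only real convention used is that the first digit of a block code identifies its block group.

```python
from collections import Counter

# Hypothetical privacy-protected microdata: one (tract, block) pair per person.
microdata = [
    ("000100", "1001"), ("000100", "1001"), ("000100", "1002"),
    ("000100", "2001"), ("000100", "2001"), ("000100", "2001"),
]

# Tabulate the same records at three levels of the geographic hierarchy.
block_counts = Counter(microdata)
block_group_counts = Counter((tract, block[0]) for tract, block in microdata)
tract_counts = Counter(tract for tract, _ in microdata)

# Summing the block counts reproduces the block group counts exactly,
# because both tables are derived from the same set of records.
summed = Counter()
for (tract, block), n in block_counts.items():
    summed[(tract, block[0])] += n

print(summed == block_group_counts)
print(sum(block_group_counts.values()) == tract_counts["000100"])
```

Because the noise is already baked into the microdata, no table needs its own independent noise draw, and totals cannot disagree across levels.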

Why is a Laplace distribution used?

Technically the Census Bureau is using a geometric distribution, which allows the process to draw integer values for noise introduction (and is similar to Laplace). Laplace is the current standard in differential privacy across the data privacy field. See the following links for a more detailed discussion of Laplace vs. other symmetric distributions:

https://stats.stackexchange.com/questions/187410/what-is-the-purpose-of-using-a-laplacian-distribution-in-adding-noise-for-differ

https://www.johndcook.com/blog/2019/02/05/normal-approximation-to-laplace-distribution/

https://www.johndcook.com/blog/2017/09/20/adding-laplace-or-gaussian-noise-to-database/
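
For readers who want to see what integer-valued noise looks like, here is a minimal sketch of a two-sided geometric draw, the discrete counterpart of Laplace. It is an illustration under assumptions, not the Census Bureau's implementation: the epsilon value is hypothetical, and the sampler uses the textbook construction of taking the difference of two geometric variables.

```python
import math
import random


def geometric(alpha, rng):
    """Number of failures before the first success, with P(G >= k) = alpha**k."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.floor(math.log(u) / math.log(alpha)))


def two_sided_geometric(epsilon, rng, sensitivity=1):
    """Integer noise with P(X = k) proportional to exp(-epsilon * |k| / sensitivity)."""
    alpha = math.exp(-epsilon / sensitivity)
    return geometric(alpha, rng) - geometric(alpha, rng)


rng = random.Random(2020)
true_count = 120  # hypothetical block-level count
draws = [two_sided_geometric(0.5, rng) for _ in range(20000)]

print("noisy counts:", [true_count + d for d in draws[:5]])
print("mean noise:", sum(draws) / len(draws))
```

Every draw is a whole number and the noise is symmetric around zero, so a noisy count stays an integer without a separate rounding step.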

Kathy’s slide 3 or 4 showed that one table that looks to be dropped for 2020 data products is HH by presence of nonrelatives. You also say Census may drop tables on young children at specific ages, such as 2 or 3. If these tables are dropped, research on the persistent and growing undercount of young children will be severely hampered. Households with nonrelatives are one of three types of complex households that have the highest correlation with young children who were originally missed in the 2010 Census, just some of whom were added back into the 2010 Census counts through the Census Followup Operation. The undercount of young children is a major issue that has been recognized by Congressional Committees, as well as the Census Bureau’s outside Advisory Committees, and Complete Count Committees. These are CRITICAL data for data users and policy makers. These tables are VERY much needed and we should urge the Census Bureau to provide these data!    

To be clear, there’s no definite decision on tables yet. The Census Bureau is proposing that the DHC tables include a table on “SEX BY AGE FOR THE POPULATION UNDER 20 YEARS [43]” at the block level, so there’ll be a count of children by specific ages, and I think that will meet your use case. One open question is geography: tables such as HOUSEHOLD TYPE BY RELATIONSHIP BY AGE FOR THE POPULATION UNDER 18 YEARS [36] (PCO9 in the new DHC) are proposed at the county level only, so they may not meet the needs of people who use them at the tract level now.

Another issue is the importance of related children (which is not a category in the new tables) versus own children only. For example, related children are grouped with other non-related children in the PCO9 table. This is not my area of study but may be of concern to some people.

In any case, we encourage you to dig into the tables yourself and share your perspective with the Bureau. In addition to whether the tables are published, whether the data is appropriate for your use case will also depend on the level of accuracy of the numbers. If you have a particular table of interest to your organization, we encourage you to take a look at the demonstration data and how it compares to SF1 values in 2010.

For information on how to submit comments to the Census Bureau, see https://www.census.gov/programs-surveys/decennial-census/2020-census/planning-management/2020-census-data-products/2010-demonstration-data-products.html