APDU Member Blog Post: It’s not too late to rebuild data-user trust in Census 2020 data products
By: Jan Vink, Cornell Program on Applied Demographics
Vice chair of the Federal State Cooperative on Population Estimates Steering Committee
Opinions are my own
The Census Bureau is rethinking the way it will produce the data published from the Census 2020. They argue that the old way is not good enough anymore in this day and age because with enough computer power someone could learn too many details about the respondents.
There are two separate but related aspects to this rethinking:
- The table shells: what tabulations to publish and what not to publish
- Disclosure Avoidance Systems (DAS) that add noise to the data before creating these tables
Both aspects have huge consequences for data users. A good place to start reading about this rethinking is the 2020 Census Data Products pages at the Census Bureau.
The Census Bureau is aware that there will be this impact and has asked the data-user community for input in the decision process along the way. There were Federal Register Notices asking for use cases related to the 2010 tables, an ask for feedback on a proposed set of tables. There were publications of application of a DAS to 1940 Census data, 2018 PL94-171 data from the 2018 test and the 2010 Demonstration Products. Currently the Census Bureau is asking for feedback on the measurement of progress of the DAS implementation they plan to use for the first set of products coming out of the Census.
The intentions of stakeholder involvement were good BUT didn’t lead to buy-in from those stakeholders and many are afraid that the quantity and quality of the published data will severely impact the capability to make sound decisions and do sound research based on Census 2020 and products that are directly or indirectly based on that data. Adding to this anxiety is the very difficult unexpected circumstances the Census Bureau has to deal with while collecting the data.
From my perspective as one of those stakeholders that is wary about the quantity and quality of the data there are a few things that could have gone better:
- The need for rethinking is not communicated clearly. For example, I cannot find a Census Bureau publication that plainly describe the re-identification process, all I can find are a few slides in a presentation. A layman’s explanation of the legal underpinning would be helpful as well as some argue that there has been a drastic reinterpretation.
- The asks for feedback were all very complicated, time consuming and reached only a small group of very dedicated data users that felt tasked to respond for many and stick with the low hanging fruits.
- It is not clear what the Census Bureau did with the responses.
- The quality of the 2010 Demonstration Products was very low and would have severely impacted my use of the data and many others uses.
- Most Census Bureau communications about this rethinking consisted of a mention of a trade-off between privacy and accuracy followed by a slew of arguments about the importance of privacy and hardly any mention how important accuracy is for the mission of the Census Bureau. Many stakeholders walked away with the feeling that the Bureau feels responsibility for privacy protection, but not as much for accuracy.
There is a hard deadline for the production of the PL94-171 data, although Congress has the power to extend that date because of the Covid-19 pandemic. Working back from that, I am afraid that decision time is not too far away. The Census Bureau is developing the DAS using an agile system with about 8 weeks between ‘sprints’. The Bureau published updated metrics from sprint II at the end of May, but already started with sprint IV at that time. If we keep the 8 weeks between sprints this implies in my estimation that there is room on the schedule for 2 or 3 more sprints and very little time to rebuild trust from the data-user community.
Examples of actions that would help rebuilding some trust are:
- Appointing someone that is responsible for the stakeholder interaction. So far, my impression is that there is no big picture communication plan and two-way communication depends too much on who you happen to know within the Census Bureau. Otherwise the communication is impersonal and slow and often without a possibility for back-and-forth. This person should also have the seniority to fast-trac the publication review process so stakeholders are not constantly 2 steps behind.
- Plan B. A chart often presented to us is a line that shows the trade-off between privacy and accuracy. The exact location of that line depends on the privacy budget and the implementation of the DAS and the Census Bureau seems to have the position that they can implement a DAS with a sweet spot between accuracy and privacy that would be an acceptable compromise. But what if there is no differential privacy based DAS implementation (yet?) that can satisfy a minimal required accuracy and a maximal allowed disclosure risk simultaneous? So far it is an unproven technique for such a complex application. It would be good to hear that the Census Bureau has a plan B and a set of criteria that would lead to a decision to go with plan B.
- Promise another set of 2010 data similar to the 2010 demonstration products so data users can re-evaluate the implications of the DAS. This should be done in a time frame that allows for tweaks to the DAS. Results of these evaluations could be part of the decision whether to move to plan B.
- Have a public quality assurance plan. The mission of the Census Bureau is to be the publisher of quality data, but I could not find anything on the Census Bureau website that indicates what is meant with data quality and what quality standards are used. Neither could I find who in the Census Bureau oversees and is responsible for data quality. For example: does the Bureau see accuracy and fitness for use as the same concepts? Others disagree. And what about consistency? Can inconsistent census data still be of high quality? Being open about data quality and have a clear set of quality standards would help showing that quality is of similar priority as privacy.
- Publish a time line, with goals and decision points.
- Feedback on the feedback: what did the Bureau do with the feedback? What criteria were used to implementing some feedback but not others?
Time is short and stakes are high, but I think there are still openings to regain trust of the data community and have Census data products that will be of provable high quality and protects the privacy of the respondents at the same time.