Data Is vs. Data Are: Settling the Debate

By Bernie Langer, APDU Board Member

There are many debates in the world of public data. Privacy vs. accuracy. Survey data vs. administrative data. CSV vs. XLS. But if you really want to see data nerds fight, ask them whether they say “data is” or “data are”. Is the word “data” singular or plural?

“Good data is important to good decision-making” or “Good data are important to good decision-making”?

This came up on Twitter recently, when NPR reporter/Census superfan Hansi Lo Wang tweeted: “…The 2020 census redistricting data, needed to redraw voting maps, is now expected by Sept. 30…” In his next tweet, he wrote: “(Sorry for 1st tweet’s typo: *data are)”.

It may seem trivial, but it’s important, and not for the reason you expect.

The argument for “data are” is this: “Data” is derived from the Latin word datum, meaning “that which is given.” In Latin, datum is a singular neuter noun. My high school Latin teacher made sure I never forget The Neuter Law: all neuter plural nouns (in the nominative and accusative cases) end in -a. Therefore, the plural form of datum is data. Data is plural. Quod erat demonstrandum.

Furthermore, we know data to be a collection of individual values (observations, survey responses, etc.). A census never has only one respondent (unless it’s a very sad census). The concept is inherently plural.

The argument for “data is” is simple: “Data are” sounds ridiculous.

Okay, there are some more nuanced arguments for “data is.” We’re speaking English, not Latin. Language evolves. “Data” in common usage is an uncountable noun, like “water.” The ocean is full of water, but no one says, “Water are wet.”

But that’s secondary. What’s more important is: “Data are” sounds ridiculous.

As data professionals, we need to communicate with the rest of the world in a clear and accessible way. We want others to embrace the power of data, knowing that data can be useful to them. No one needs to be special to use data.

Insisting on treating data as a plural noun can be alienating. (Pro tip: Correcting someone’s grammar in any circumstance is alienating.) We don’t want anyone to think they’re not good enough to use data. Even if it’s not off-putting, it’s distracting. The general public doesn’t expect to hear “data are,” and when they do hear it, they’ll momentarily dwell on it, and not on the substance of the rest of your sentence.

Of course, this isn’t just about grammar and the word “data.” It’s about not gatekeeping, and communicating complex (but understandable) concepts to the public on their terms. When non-experts understand data, data professionals become more valuable, not less.

And if your conscience cannot permit you to use data in a singular form (old habits die hard), then at the very least, when someone else does, bite your tongue.

A common refrain is “The plural of anecdote is not data.” Let’s reinforce that by not using data as a plural.

This blog post represents the views of its author and does not represent the view of APDU or the Board of Directors.

2021 APDU Conference Call for Proposals

Public Data: Making Sense of the New Normal

APDU is welcoming proposals on “making sense of the new normal” using public data. With economic, public health, and governance challenges arising from COVID-19 and political polarization, trustworthy public data is vital to open and honest policy debates. APDU is interested in proposals regarding:

  • Novel uses of public data to understand the shifting American landscape;
  • Ways that researchers and advocates are ensuring that public data is accurate and equitable;
  • How public data can help restore trust in institutions;
  • How to rebuild trust in public data; or
  • Other related and relevant topics.

Proposals can be for a single presentation or a full panel (three presenters, plus a moderator), whether based on a particular project, data practice, or formal paper. However, it is possible that we will accept portions of panel submissions to combine with other presenters. Submissions will be evaluated on the quality of work, relevance to APDU Conference attendees, uniqueness of topic and presenter, and thematic fit.

EXTENDED Deadline: March 26, 2021

Please submit your proposal using the Survey Monkey collection window below.  Proposals will need to be submitted by members of APDU, and all presenters in a panel must register for the conference (full conference registration comes with a free APDU membership).  Proposers will be notified of our decision by mid-April.

About APDU

The Association of Public Data Users (APDU) is a national network that links users, producers, and disseminators of government statistical data. APDU members share a vital concern about the collection, dissemination, preservation, and interpretation of public data.  The conference will be held virtually on July 26-29, 2021, and brings together data users and data producers for conversations and presentations on a wide variety of data and statistical topics.

2021 APDU President’s Message

Dear APDU members,

Happy New Year and thank you for either renewing or joining the Association of Public Data Users (APDU). I am honored to serve as the 2021 APDU President, and I look forward to working with you this year.

I am sure many of you are happy to turn the page on 2020 and are looking forward to more positive developments in 2021. There are many reasons to be hopeful; however, the pandemic continues to cast a cloud of uncertainty, which is forcing organizations, including APDU, to plan accordingly. As a result, our annual meeting will be held virtually again later this summer. Fortunately, the success of the 2020 meeting portends a repeat performance! I urge you to stay tuned as details emerge in the coming months and to plan to attend the 2021 APDU annual meeting.

While working to ensure a high quality, well attended annual meeting is always the APDU President’s top priority, I have other goals that I hope the APDU staff and board, with support from its members, can achieve this year. These goals include:

  • Increasing APDU membership – I will be forming a working group composed primarily of APDU board members to develop strategies for boosting APDU membership, particularly among data users outside of the DC metropolitan area. We may be contacting APDU members to help inform the working group’s deliberations.
  • Enhancing Training – Throughout 2021, the APDU board and staff will be identifying ways to offer expanded training and networking opportunities for our members outside of those offered in conjunction with the annual meeting. We hope to host in-person events later in the year when it is safe for us to gather once again.
  • Improving communication – I want to continue to build upon the progress that has been made in recent years to enhance the APDU newsletter and to improve the organization’s website. I am also going to be asking the APDU board members to post at least one blog post during the year on a topic they choose and encouraging APDU members to consider serving as “guest bloggers” on issues of interest to them, too.

Once again, thank you for being an APDU member! Please feel free to contact the APDU staff or any board members if you have ideas, concerns, or need assistance. We want to ensure APDU is serving the needs of its members and the broader public data user community.

Warm regards,

Mary Jo Hoeksema

Looking Back on the 2020 APDU Annual Conference

With the 2020 APDU Annual Conference in the rearview mirror, now is a good time to reflect on the week and look ahead to what’s next.

This year’s conference, like so many things in 2020, was disrupted but not diminished. While we didn’t have the opportunity to meet with each other in person, the virtual format enabled some of our friends from around the country to participate who might not have been able to otherwise.

Speakers like danah boyd of Microsoft Research and Data & Society Research Institute (excerpted above) brought a unique perspective to the conference, challenging our thinking on issues ranging from how we approach privacy and accuracy to the impacts misinformation and data voids can have on our understanding of data quality and reliability.

Federal agency leaders such as Deborah Stempowski, Brian Moyer, Bill Beach, and Mary Bohman provided insider insights into their organizations.

Speakers from universities and research organizations across the country covered hot topics such as data on COVID-19, evictions, policing, and more.

Speakers from the Census Bureau, universities, and nonprofits discussed how the Disclosure Avoidance System will affect the quality of Census data.

Attendees met with APDU board members in a series of town hall conversations on a variety of topics – offering a promising way for APDU members to connect with one another.

This year’s conference was a success for a variety of reasons – but the biggest reason was the engagement of our attendees and speakers. Stay tuned for continued quality programming in Fall 2020!

APDU Statement on Concerns Regarding the Census Field Operations Timeline

A statement from the APDU Board of Directors.

The 2020 Census will determine Congressional representation, and the data will form the foundation for the next decade of federal statistics. These data will provide guidance to the federal government on where to provide needed resources, and information to local governments on who lives in their states, cities, and towns.

Federal statistics also provide guidance to businesses on where their products and services are needed by consumers. Decisions on the spending of billions of dollars—public and private—will be made based on the next decade of federal statistics.

The 2020 Census forms the backbone of the next decade of federal statistics. It’s too important to rush.

The COVID-19 pandemic has created a situation the Census Bureau has never had to deal with before. The national self-response rate is just above 60%; two out of five people in this country have yet to be counted. This is significantly below expected benchmarks. Despite the ongoing pandemic, census workers are beginning the process of going door to door to count everyone who hasn’t yet responded. This large-scale effort was slated to begin months ago, but was delayed by the pandemic.

Because of these circumstances, it’s necessary to extend the deadline for the Census Bureau to deliver its results. Census experts strongly believe that the Census Bureau needs extra time to conduct a complete and accurate count, as the Constitution requires.

This is a non-partisan issue that threatens businesses and governments in every part of the country. The Association of Public Data Users calls on Congress to extend the deadline for the 2020 Census to a timeframe that allows for a complete and accurate count.

APDU Response to Memorandum on the Apportionment Base Following the 2020 Census

A statement from the APDU Board of Directors.

On July 21, 2020, the Trump Administration issued a memorandum on apportionment counts from the 2020 Census suggesting that unauthorized migrants would be excluded from the counts.

At a time when the decennial census is already beset by unprecedented challenges, this new disruption further threatens the accuracy of the 2020 Census count. Regardless of whether or not the memorandum withstands legal challenges, its messaging will likely reduce census participation among all residents of the United States, undercounting not only unauthorized migrants but also citizens and authorized migrants who live in mixed-status households.

A complete and accurate census of all residents of the United States is critical for the proper functioning of federal, state, and local government agencies as well as businesses and organizations that rely on federal statistics to operate effectively. America needs a full count of all individuals in the United States because all people use our roads and mass transit, drink our clean water, use our electricity, require access to emergency services, and buy goods and services from our businesses. Without a full count, we cannot accurately allocate public or private investments to ensure a fully functioning economy and the availability of adequate public services.

We urge the Administration to immediately retract this memorandum before it has an opportunity to influence the public’s willingness to respond to the decennial census operations now underway across the country.

APDU Past President: Why Attend the APDU Conference?

By Cliff Cook, Senior Planning Information Manager, City of Cambridge, Massachusetts

In working with public data, users often discover a shortfall between the way data would ideally be delivered and the form in which it actually arrives. While the data we use is by definition only a partial reflection of the underlying reality, the ways in which we structure elements of the data collection, compilation and delivery systems all have the potential to create further impediments to data access and usability.

The 2020 APDU conference will include a session dedicated to a discussion of this important set of issues: “Impediments to Accurate Statistics”.

We will hear from three experts in three different domains.

  • Elsa Schaffer, a Data Scientist from Ididio, will discuss her experience bringing together multiple sources of data that cover aspects of education, employment and income to develop data sets that help students and others with career choices.
  • Lavar Edwards, a Research Specialist from the Eviction Project, will talk about the myriad obstacles encountered in building a national database of rental housing eviction actions, a topic with significant implications for racial equity.
  • Abraham Flaxman, an Associate Professor from the University of Washington Institute for Health Metrics and Evaluation, will delve into the world of public health statistics and explore his experience using data about the Covid-19 pandemic.

This session will focus on how various types of impediments prevent users from obtaining the full value of data, how data users deal with these roadblocks, and how the data user community should advocate for solutions.

APDU Member Blog Post: It’s not too late to rebuild data-user trust in Census 2020 data products

By: Jan Vink, Cornell Program on Applied Demographics
Vice chair of the Federal State Cooperative on Population Estimates Steering Committee
Twitter: @JanVink18
Opinions are my own

The Census Bureau is rethinking the way it will produce the data published from the Census 2020. They argue that the old way is not good enough anymore in this day and age because with enough computer power someone could learn too many details about the respondents.

There are two separate but related aspects to this rethinking:

  1. The table shells: what tabulations to publish and what not to publish
  2. Disclosure Avoidance Systems (DAS) that add noise to the data before creating these tables

Both aspects have huge consequences for data users. A good place to start reading about this rethinking is the 2020 Census Data Products pages at the Census Bureau.

The Census Bureau is aware that there will be this impact and has asked the data-user community for input in the decision process along the way. There were Federal Register Notices asking for use cases related to the 2010 tables and a request for feedback on a proposed set of tables. There were publications applying a DAS to 1940 Census data, to PL94-171 data from the 2018 test, and the 2010 Demonstration Products. Currently the Census Bureau is asking for feedback on the measurement of progress of the DAS implementation they plan to use for the first set of products coming out of the Census.

The intentions behind stakeholder involvement were good but did not lead to buy-in from those stakeholders. Many are afraid that the quantity and quality of the published data will severely impair the capability to make sound decisions and do sound research based on Census 2020 and on products that are directly or indirectly based on that data. Adding to this anxiety are the very difficult, unexpected circumstances the Census Bureau has to deal with while collecting the data.

From my perspective as one of those stakeholders who are wary about the quantity and quality of the data, there are a few things that could have gone better:

  • The need for rethinking is not communicated clearly. For example, I cannot find a Census Bureau publication that plainly describes the re-identification process; all I can find are a few slides in a presentation. A layman’s explanation of the legal underpinning would be helpful as well, as some argue that there has been a drastic reinterpretation.
  • The asks for feedback were all very complicated and time consuming, and reached only a small group of very dedicated data users who felt tasked to respond for many and stuck with the low-hanging fruit.
  • It is not clear what the Census Bureau did with the responses.
  • The quality of the 2010 Demonstration Products was very low and would have severely impacted my use of the data and many others’ uses.
  • Most Census Bureau communications about this rethinking consisted of a mention of a trade-off between privacy and accuracy followed by a slew of arguments about the importance of privacy and hardly any mention of how important accuracy is for the mission of the Census Bureau. Many stakeholders walked away with the feeling that the Bureau feels responsibility for privacy protection, but not as much for accuracy.

There is a hard deadline for the production of the PL94-171 data, although Congress has the power to extend that date because of the Covid-19 pandemic. Working back from that, I am afraid that decision time is not too far away. The Census Bureau is developing the DAS using an agile system with about 8 weeks between ‘sprints’. The Bureau published updated metrics from sprint II at the end of May, but already started with sprint IV at that time. If we keep the 8 weeks between sprints this implies in my estimation that there is room on the schedule for 2 or 3 more sprints and very little time to rebuild trust from the data-user community.

Examples of actions that would help rebuild some trust are:

  • Appointing someone who is responsible for the stakeholder interaction. So far, my impression is that there is no big picture communication plan and two-way communication depends too much on who you happen to know within the Census Bureau. Otherwise the communication is impersonal and slow and often without a possibility for back-and-forth. This person should also have the seniority to fast-track the publication review process so stakeholders are not constantly 2 steps behind.
  • Plan B. A chart often presented to us is a line that shows the trade-off between privacy and accuracy. The exact location of that line depends on the privacy budget and the implementation of the DAS, and the Census Bureau seems to have the position that they can implement a DAS with a sweet spot between accuracy and privacy that would be an acceptable compromise. But what if there is no differential privacy based DAS implementation (yet?) that can satisfy a minimal required accuracy and a maximal allowed disclosure risk simultaneously? So far it is an unproven technique for such a complex application. It would be good to hear that the Census Bureau has a plan B and a set of criteria that would lead to a decision to go with plan B.
  • Promise another set of 2010 data similar to the 2010 demonstration products so data users can re-evaluate the implications of the DAS. This should be done in a time frame that allows for tweaks to the DAS. Results of these evaluations could be part of the decision whether to move to plan B.
  • Have a public quality assurance plan. The mission of the Census Bureau is to be the publisher of quality data, but I could not find anything on the Census Bureau website that indicates what is meant by data quality and what quality standards are used. Neither could I find who in the Census Bureau oversees and is responsible for data quality. For example: does the Bureau see accuracy and fitness for use as the same concepts? Others disagree. And what about consistency? Can inconsistent census data still be of high quality? Being open about data quality and having a clear set of quality standards would help show that quality is of similar priority as privacy.
  • Publish a timeline, with goals and decision points.
  • Feedback on the feedback: what did the Bureau do with the feedback? What criteria were used to implement some feedback but not others?

Time is short and stakes are high, but I think there are still openings to regain the trust of the data community and have Census data products that are of provably high quality and protect the privacy of the respondents at the same time.

APDU Member Post: Assessing the Use of Differential Privacy for the 2020 Census: Summary of What We Learned from the CNSTAT Workshop

By:

Joseph Hotz, Duke University

Joseph Salvo, New York City Department of City Planning

Background

The mission of the Census Bureau is to provide data that can be used to draw a picture of the nation, from the smallest towns and villages to the neighborhoods of the largest cities. Advances in computer science, better record linkage technology, and the proliferation of large public data sets have increased the risk of disclosing information about individuals in the census.

To assess these threats, the Census Bureau conducted a simulated attack, reconstructing person-level records from published 2010 Census tabulations using its previous Disclosure Avoidance System (DAS) that was based in large part on swapping data records across households and localities. When combined with information in commercial and publicly available databases, these reconstructed data suggested that 18 percent of the U.S. population could be identified with a high level of certainty. The Census Bureau concluded that, if adopted for 2020, the 2010 confidentiality measures would lead to a high risk of disclosing individual responses, violating Title 13 of the U.S. Code, the law that prohibits such disclosures.

Thus, the Census Bureau was compelled to devise new methods to protect individual responses from disclosure. Nonetheless, such efforts – however well-intentioned – may pose a threat to the content, quality and usefulness of the very data that defines the Census Bureau’s mission and that demographers and statisticians rely on to draw a portrait of the nation’s communities.

The Census Bureau’s solution to protecting privacy is a new DAS based on a methodology referred to as Differential Privacy (DP). In brief, it functions by leveraging the same database reconstruction techniques that were used to diagnose the problem in the previous system: the 2020 DAS synthesizes a complete set of person- and household-level data records based on an extensive set of tabulations to which statistical noise has been added. Viewed as a continuum between total noise and total disclosure, the core of this method involves a determination regarding the amount of privacy loss, or ε, that can be accepted without compromising data privacy while ensuring the utility of the data. The key then becomes “where to set the dial”: set ε too low and privacy is ensured at the cost of utility, but set ε too high and utility is ensured but privacy is compromised. In addition to the overall level of ε, its allocation over the content and detail of the census tabulations for 2020 is important. For example, specific block-level tabulations needed for redistricting may require a substantial allocation of the privacy-loss budget to achieve acceptable accuracy for this key use, but the cost is that accuracy of other important data (including for blocks, such as persons per household) will likely be compromised. Finding ways to resolve these difficult tradeoffs represents a serious challenge for the Census Bureau and users of its data.
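To make “where to set the dial” concrete, here is a minimal sketch of the trade-off using the textbook Laplace mechanism on a single count. It is an illustration only and not the Census Bureau’s DAS: the true count of 25, the ε values, and the function names are all hypothetical, and the Laplace mechanism is a stand-in chosen for simplicity. For a count query, the noise scale is 1/ε, so a smaller ε (more privacy) means a larger expected error.

```python
import random
import statistics


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return the count plus Laplace noise with scale 1/epsilon.

    The difference of two independent exponential draws with rate epsilon
    follows a Laplace distribution with mean 0 and scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def mean_abs_error(true_count: int, epsilon: float, trials: int,
                   rng: random.Random) -> float:
    """Average absolute error of the noisy count over many simulated releases."""
    return statistics.fmean(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
TRUE_COUNT = 25            # hypothetical block population

# Hypothetical settings of the dial, from strong privacy to weak privacy.
errors = {eps: mean_abs_error(TRUE_COUNT, eps, 20_000, rng)
          for eps in (0.1, 1.0, 10.0)}

for eps, err in errors.items():
    print(f"epsilon={eps:>4}: mean absolute error {err:.2f} "
          f"(theoretical value {1 / eps:.2f})")
```

In this sketch, an error of around ten people on a block of 25 would make the count nearly useless, while an error of a tenth of a person offers very little protection. Splitting a fixed ε across many tables shrinks the share each table receives and so raises the error in each one, which is why the allocation matters as much as the overall level.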

The CNSTAT Workshop

In order to test how well this methodology worked in terms of the accuracy of noise-infused data, the Census Bureau issued special 2010 Census files subject to the 2020 DAS. The demonstration files applied the 2020 Census DAS to the 2010 Census confidential data — that is, the unprotected data from the 2010 Census that are not publicly available. The demonstration data permit scientific inquiry into the impact of DP. In addition, the Census commissioned the Committee on National Statistics (CNSTAT) of the National Academies of Sciences, Engineering and Medicine to host a 2-day Workshop on 2020 Census Data Products: Data Needs and Privacy Considerations, held in Washington, DC, on December 11-12, 2019. The two-fold purpose of the workshop was:

  • To assess the utility of the tabulations in the 2010 Demonstration Product for specific use cases/real-life data applications.
  • To generate constructive feedback for the Census Bureau that will be useful in setting the ultimate privacy loss budget and on the allocation of shares of that budget over the broad array of possible tables and geographic levels.

We both served as the co-chairs of the Committee that planned the Workshop. The Workshop brought together a diverse group of researchers who presented findings for a wide range of use cases that relied on data from past censuses.

These presentations, and the discussions surrounding them, provided a new set of evidence-based findings on the potential consequences of the Census Bureau’s new DAS. In what follows, we summarize “what we heard” or learned from the Workshop. This summary is ours alone; we do not speak for the Workshop’s Planning Committee, CNSTAT, or the Census Bureau. Nonetheless, we hope that the summary below provides the broader community of users of decennial census data with a better understanding of some of the potential consequences of the new DAS for the utility of the 2020 Census data products. Moreover, we hope it fosters an on-going dialogue between the user community and the Census Bureau on ways to help ensure that data from the 2020 Census are of high quality, while still safeguarding the privacy and confidentiality of individual responses.

What We Heard

  • Population counts for some geographic units and demographic characteristics were not adversely affected by Differential Privacy (DP). Based on results presented at the Workshop, it appears that there were not, in general, differences in population counts between the 2010 demonstration file and the actual 2010 data at some levels of geography. For the nation as a whole and for individual states, the Census’s algorithm ensured that counts were exact, i.e., counts at these levels were held invariant by design. Furthermore, the evidence presented also indicated that the counts in the demonstration products and those for actual 2010 data were not very different for geographic areas that received direct allocations of the privacy budget, including most counties, metro areas (aggregates of counties) and census tracts. Finally, for these geographic areas, the population counts by age in the demonstration products were fairly accurate when using broader age groupings (5-10 year groupings or broader ones), as well as for some demographic characteristics (e.g., for non-Hispanic whites, and sometimes for Hispanics).
  • Concerns with data for small geographic areas and units and certain population groups. At the same time, evidence presented at the Workshop indicated that most data for small geographic areas – especially census blocks – are not usable given the privacy-loss level used to produce the demonstration file. With some exceptions, applications demonstrated that the variability of small-area data (i.e., blocks, block groups, census tracts) compromised existing analyses. Many Workshop participants indicated that a larger privacy loss budget will be needed for the 2020 Census products to attain a minimum threshold of utility for small-area data. Alternatively, compromises in the content of the publicly-released products will be required to ensure greater accuracy for small areas.

The Census did not include in the 2010 demonstration file a direct allocation of the privacy-loss budget to all geographic areas, such as places and county subdivisions, or to detailed race groups, such as American Indians. As noted by numerous presenters, these units and groups are very important for many use cases, as they are the basis for political, legal, and administrative decision-making. Many of these cases involve small populations and local officials rely on the census as a key benchmark; in many cases, it defines who they are.

  • Problems for temporal consistency of population counts. Several presentations highlighted the problem of temporal inconsistency of counts, i.e., from one census to the next using DP. The analyses presented at the Workshop suggested that comparisons of 2010 Census data under the old DAS to 2020 Census data under DP may well show inexplicable trends, up or down, for small geographic areas and population groups. (And comparisons of 2030 data under DP with 2020 data under DP may also show inconsistencies over time). For example, when using counts as denominators to monitor disease rates or mortality at finer levels of geography by race, by old vs young, etc., the concern is that it will be difficult to determine real changes in population counts, and, thus, real trends in disease or mortality rates, versus the impact of using DP.
  • Unexpected issues with the post-processing of the proposed DAS. The Top-Down algorithm (TDA) employed by the Census Bureau in constructing the 2010 demonstration data produced histograms at different levels of geography that are, by design, unbiased, but they are not integers and include negative counts. The post-processing required to produce a microdata file capable of generating tabulations of persons and housing units with non-negative integer counts produced biases that are responsible for many anomalies observed in the tabulations. These are both systematic and problematic for many use cases. Additional complications arise from the need to hold some data cells invariant to change (e.g., total population at the state level) and from the separate processing of person and housing unit tabulations.

The application of DP to raw census data (the Census Edited File [CEF]) produces estimates that can be used to model error, but the post-processing adds a layer of complexity that may be very difficult to model, making the creation of “confidence intervals” problematic.

  • Implications for other Census Bureau data products. Important parts of the planned 2020 Census data products cannot be handled by the current 2020 DAS and TDA approach. They will be handled using different but as-yet-unspecified methods that will need to be consistent with the global privacy-loss budget for the 2020 Census. These products were not included in the demonstration files and were out of scope for the Workshop. Nonetheless, as noted by several presenters and participants in the Workshop, these decisions raise important issues for many users and use cases going forward. To what extent will content for detailed race/Hispanic/nationality groups be available, especially for American Indian and Alaska Native populations? To what degree will data on household-person combinations and within-household composition be possible under DAS?

For example, while the Census Bureau has stated that 2025 will be the target date for the possible application of DP to the ACS, they indicated that the population estimates program will be subject to DP immediately following 2020. These estimates would then be used for weighting and post-stratification adjustments to the ACS.

  • Need plan to educate and provide guidance for users of the 2020 Census products. Regardless of what the Census Bureau decides with respect to ε and how it is allocated across tables, the Workshop participants made clear that a major re-education plan for data users needs to be put in place, with a focus on how best to describe key data and the shortcomings imposed by privacy considerations and error in general. Furthermore, as many at the Workshop voiced, such plans must be in place when the 2020 Census products are released to minimize major disruptions to and problems with the myriad uses made of these data and the decisions based on them.

  • Challenging privacy concerns and their potential consequences for the success of the 2020 Census. Finally, the Workshop included a panel of experts on privacy. These experts highlighted the disclosure risks associated with advances in linking information in public data sources, like the decennial census, with commercial databases containing information on bankruptcies and credit card debt, driver licenses, and federal, state and local government databases on criminal offenses, public housing, and even citizenship status. While there are federal and state laws in place to prevent the misuse of these governmental databases as well as the census (i.e., Title 13), their adequacy is challenged by advances in data linkage technologies and algorithms. And, as several panelists noted, these potential disclosure risks may well undercut the willingness of members of various groups – including immigrants (whether citizens or not), individuals violating public housing codes, or those at risk of domestic violence – to participate in the 2020 Census.

The Census Bureau has recently stated that it plans to have CNSTAT organize a follow-up set of expert meetings to “document improvements and overcome remaining challenges in the 2020 DAS.” In our view, such efforts, however they are organized, need to ensure meaningful involvement and feedback from the user community. Many within that community remain skeptical of the Bureau’s adoption of Differential Privacy and its consequences for their use cases. So, not only is it important that the Census Bureau try to address the various problems identified by Workshop presenters and others who evaluated the 2010 demonstration products, it is also essential that follow-up activities be designed to involve a broader base of user communities in a meaningful way.

We encourage members of the census data user community to become engaged in this evaluation process, agreeing, if asked, to become involved in these follow-up efforts. Such efforts will be essential to help ensure that the Census Bureau meets its dual mandate of being the nation’s leading provider of quality information about its people and economy while safeguarding the privacy of those who provide this information.

2020 APDU Conference Call for Proposals

#Trending in 2020: Data Privacy, Accuracy, and Access

APDU is welcoming proposals on any topic related to the privacy, accuracy, and access of public data. Proposals can be for a single presentation or panel, whether based on a particular project, data practice, or formal paper. In keeping with the theme of the conference, our interest is in highlighting the breadth of public data to both producers and consumers of public data. Example topics include:

  • Privacy
    • Differential privacy and tiered data
    • State/local data privacy issues
    • Data Suppression
    • Corporate data privacy (e.g., Facebook’s use of differential privacy)
  • Accuracy
    • Machine learning and the use of programming languages
    • How data accuracy will affect redistricting or federal allocations
    • Impact of federal agencies’ data protection actions on other agencies’ data
    • Synthetic or administrative data
    • Decennial Census
      • Citizenship question
      • Complete Count Committee
  • Access
    • Future public data and policy developments
    • Current availability of public data (health, education, the economy, energy, the environment, climate, and other areas)
    • Federal statistical microdata such as ResearchDataGov
    • Federal Data Strategy updates and advocacy

Proposal Deadline: February 28, 2020.

You may submit ideas for a single presentation or a full panel (three presenters, plus a moderator). However, it is possible that we will accept portions of panel submissions to combine with other presenters. Submissions will be evaluated on the quality of work, relevance to APDU Conference attendees, uniqueness of topic and presenter, and thematic fit.

Please submit your proposal using the SurveyMonkey collection window below. Proposals must be submitted by members of APDU, and all presenters in a panel must register for the conference (full conference registration comes with a free APDU membership). Proposers will be notified of our decision by March 13, 2020.

About APDU

The Association of Public Data Users (APDU) is a national network that links users, producers, and disseminators of government statistical data. APDU members share a vital concern about the collection, dissemination, preservation, and interpretation of public data.  The conference is in Arlington, VA on July 29-30, 2020, and brings together data users and data producers for conversations and presentations on a wide variety of data and statistical topics.
