Special Topics & Emerging Issues in Public Data: Advanced Techniques to Increase Data Availability & Maintain Privacy

Big data has created complex new challenges to data privacy. One advantage of administrative big data is the enhanced feasibility of large-scale record linkage. How can we make more data available to inform decision making without creating “Big Brother”? How can we inform this needed revolution in privacy protection without cutting back access to data?

In this webinar, Cavan Capps and Micah Altman will review their comprehensive analysis of an ACS use case that can be used to inform key decisions on how to protect data privacy while leveraging the latest data technologies. The results suggest that a multi-tiered access system to the data may be warranted in the future, potentially including traditional tabulations and regressions protected by Differential Privacy or variants of Secure Multi-party Computing (SMC) in software or in hardware using SGX, among other options. The webinar will discuss some of the strengths and weaknesses of the tools mentioned above and propose how such an infrastructure might be constructed.

Finally, the webinar will provide an update on our continuing work to examine the practical use of SGX-SMC and software-based SMC for data collection and for integrating shared confidential data from different sources. This enables data sharing while maintaining the privacy of individuals during any analysis. Differential privacy will be used to ensure that any outputs remain confidential.
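As a rough illustration of how software-based SMC can combine confidential data from different sources, the sketch below uses additive secret sharing, a common textbook building block. It is not the presenters' system: the function names, the modulus, and the agency counts are all hypothetical, and the example assumes non-negative integer inputs smaller than the modulus.

```python
import secrets

# Hypothetical modulus for share arithmetic; inputs must be smaller than this.
PRIME = 2**61 - 1

def make_shares(value, n_parties):
    """Split an integer into n random-looking shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_values):
    """Compute the total of all inputs without any party revealing its own input."""
    n = len(private_values)
    # Each party splits its value and hands one share to every party.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the j-th share received from each party.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published; together they reveal just the total.
    return sum(partial_sums) % PRIME

# Hypothetical confidential counts held by three separate data owners.
agency_counts = [1200, 340, 875]
total = secure_sum(agency_counts)
print(total)
```

Each individual share is a uniformly random number, so no single party learns another party's input. A real deployment would also need secure channels, protection against dishonest parties, and, as the abstract notes, a mechanism such as differential privacy applied to the released total.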

Presenters:
Micah Altman, Head Research Scientist, MIT Libraries
Cavan Capps, Big Data Lead, U.S. Census Bureau

Fundamentals of Data Science and Visualization

Virtual Training

November 9 – 19, 2020

Classes: Nov 9, 10, 16, 17, 19 from 2:00-4:00 pm Eastern

Office Hours: Nov 9, 10, 16, 17, 19 from 4:00-5:00 pm Eastern

DOWNLOAD AGENDA

PDF Registration | Online Registration

Data analysts can use a variety of methods and tools to accomplish their goals. With a deeper understanding of data visualization software packages, your organization can produce more intuitive data visualizations in less time and identify the best software solutions to optimize your team’s workflows.

In this course, we will review best practices in data visualization design and use cases for Excel, Tableau, and R (programming language).

Learn how to clean and format data in Excel, create interactive dashboards in Tableau, and clean and visualize data in R. This course will help participants identify use cases for each software package that maximize impact with minimal effort, expanding their toolboxes as analysts.

Join us to learn about how your organization can better leverage data visualization software!

Meet Your Instructor:

Lee Winkler joined the Center for Regional Economic Competitiveness (CREC) in 2018 after graduating with a Master’s in Public Policy from the George Washington University. He currently supports projects analyzing state-level certification and license attainment and the prevalence of educational and workforce credentials. Lee regularly uses Tableau to clean data, mine insights and create interactive visualizations and is excited to help the class find how Tableau can add value to their workflow.

Registration:
APDU Members: $390
Non-Members: $715

Looking Back on the 2020 APDU Annual Conference

 

With the 2020 APDU Annual Conference in the rearview mirror, now is a good time to reflect on the week and look ahead to what’s next.

This year’s conference, as so many things in 2020, was disrupted but not diminished. While we didn’t have the opportunity to meet with each other in person, the virtual format enabled some of our friends from around the country to participate who might not have been able to otherwise.

Speakers like danah boyd of Microsoft Research and Data & Society Research Institute brought a unique perspective to the conference, challenging our thinking on issues ranging from how we approach privacy and accuracy to the impacts that misinformation and data voids can have on our understanding of data quality and reliability.

Federal agency leaders such as Deborah Stempowski, Brian Moyer, Bill Beach, and Mary Bohman provided insider insights into their organizations.

Speakers from universities and research organizations across the country covered hot topics such as data on COVID-19, evictions, policing, and more.

Speakers from the Census Bureau, universities, and nonprofits discussed how the Disclosure Avoidance System will affect the quality of Census data.

Attendees met with APDU board members in a series of town hall conversations on a variety of topics – offering a promising way for APDU members to connect with one another.

This year’s conference was a success for a variety of reasons – but the biggest reason was the engagement of our attendees and speakers. Stay tuned for continued quality programming in Fall 2020!

APDU Statement on Concerns Regarding the Census Field Operations Timeline

A statement from the APDU Board of Directors.

The 2020 Census will determine Congressional representation, and the data will form the foundation for the next decade of federal statistics. These data will provide guidance to the federal government on where to provide needed resources, and information to local governments on who lives in their states, cities, and towns.

Federal statistics also provide guidance to businesses on where their products and services are needed by consumers. Decisions on the spending of billions of dollars—public and private—will be made based on the next decade of federal statistics.

The 2020 Census forms the backbone of the next decade of federal statistics. It’s too important to rush.

The COVID-19 pandemic has created an unprecedented situation the Census Bureau has never had to deal with before. The national self-response rate is just above 60%; two out of five people in this country have yet to be counted. This is significantly below expected benchmarks. Despite the ongoing pandemic, census workers are beginning the process of going door to door to count everyone who hasn’t yet responded. This large-scale effort was slated to begin months ago, but was delayed by the pandemic.

Because of these circumstances, it’s necessary to extend the deadline for the Census Bureau to deliver its results. Census experts strongly believe that the Census Bureau needs extra time to conduct a complete and accurate count, as the Constitution requires.

This is a non-partisan issue that threatens businesses and governments in every part of the country. The Association of Public Data Users calls on Congress to extend the deadline for the 2020 Census to a timeframe that allows for a complete and accurate count.

APDU Response to Memorandum on the Apportionment Base Following the 2020 Census

A statement from the APDU Board of Directors.

On July 21, 2020, the Trump Administration issued a memorandum on apportionment counts from the 2020 Census suggesting that unauthorized migrants would be excluded from the counts.

At a time when the decennial census is already beset by unprecedented challenges, this new disruption further threatens the accuracy of the 2020 Census count. Regardless of whether or not the memorandum withstands legal challenges, its messaging will likely reduce census participation among all residents of the United States, undercounting not only unauthorized migrants but also citizens and authorized migrants who live in mixed-status households.

A complete and accurate census of all residents of the United States is critical for the proper functioning of federal, state, and local government agencies as well as businesses and organizations that rely on federal statistics to operate effectively. America needs a full count of all individuals in the United States because all people use our roads and mass transit, drink our clean water, use our electricity, require access to emergency services, and buy goods and services from our businesses. Without a full count, we cannot accurately allocate public or private investments to ensure that a fully functioning economy or adequate public services are available.

We urge the Administration to immediately retract this memorandum before it has an opportunity to influence the public’s willingness to respond to the decennial census operations now underway across the country.

APDU Past President: Why Attend the APDU Conference?

By Cliff Cook, Senior Planning Information Manager, City of Cambridge, Massachusetts

In working with public data, users often discover a shortfall between the way data would ideally be delivered and the form in which it actually arrives. While the data we use is by definition only a partial reflection of the underlying reality, the ways in which we structure elements of the data collection, compilation and delivery systems all have the potential to create further impediments to data access and usability.

The 2020 APDU conference will include a session dedicated to a discussion of this important set of issues: “Impediments to Accurate Statistics”.

We will hear from three experts in three different domains.

  • Elsa Schaffer, a Data Scientist from Ididio, will discuss her experience bringing together multiple sources of data that cover aspects of education, employment and income to develop data sets that help students and others with career choices.
  • Lavar Edwards, a Research Specialist from the Eviction Project, will talk about the myriad obstacles encountered in building a national database of rental housing eviction actions, a topic with significant implications for racial equity.
  • Abraham Flaxman, an Associate Professor from the University of Washington Institute for Health Metrics and Evaluation, will delve into the world of public health statistics and explore his experience using data about the COVID-19 pandemic.

This session will focus on how various types of impediments prevent users from obtaining the full value of data, how data users deal with these roadblocks, and how the data user community should advocate for solutions.

Intermediate Data Visualization Techniques in Tableau

August 25-September 3, 2020

Virtual Training

AGENDA

A picture is worth a thousand words. Use data to state your case using easy-to-understand data visualization tools. Give your audience the freedom to adapt your data in new ways in interactive dashboards that answer immediate questions and uncover new insights. Data visualization tools can help you communicate better both internally and with your partners.

Tableau can help you produce more intuitive data visualizations, and we can show you how. In this course, you will build your skills in making appropriate graphics, but you will also incorporate complex calculations in ways that improve insights, make charts more relevant, and create the most impactful dashboard graphics.

Learn how to clean, shape, aggregate, and merge frequently used public data in Tableau Prep. Then, organize your visualizations into sleek dashboards in Tableau Desktop. We will provide helpful tips on how to analyze, design, and communicate these data in ways that will wow your supervisor and your organization’s customers.

Training Prerequisites:

Skills: Participants must have a basic understanding of how Tableau works before attending this class, including knowledge of Tableau terminology, uploading data, editing data sources, and creating basic charts. Attendees should be familiar with all materials presented in the Pre-Session Videos: Overview of Charts and Calculated Fields.
Tools: Laptop, wired mouse, Tableau Desktop (personal, professional, or public version), and Tableau Prep.
• Public version of the Tableau desktop is available at:
https://public.tableau.com/s/download
• Tableau Prep Software can be downloaded here:
https://www.tableau.com/products/prep/download

Zoom will be required for this training. If you have Zoom restrictions for a work laptop, we recommend using a personal laptop or desktop. We do not recommend using an iPad for this training.
Pricing
APDU, C2ER, LMI Institute Premium Organizational Members: $495
APDU, C2ER, LMI Institute Individual & Organizational Members: $575
Non-Members: $715

CANCELLATION POLICY: APDU must confirm cancellation before 5:00 PM (Eastern Standard Time) on August 14, 2020, after which a $135 cancellation fee will apply. Substitute registrations will be accepted.

APDU Member Blog Post: It’s not too late to rebuild data-user trust in Census 2020 data products

By: Jan Vink, Cornell Program on Applied Demographics
Vice chair of the Federal State Cooperative on Population Estimates Steering Committee
Twitter: @JanVink18
Opinions are my own

The Census Bureau is rethinking the way it will produce the data published from Census 2020. It argues that the old way is no longer good enough in this day and age, because with enough computing power someone could learn too many details about the respondents.

There are two separate but related aspects to this rethinking:

  1. The table shells: what tabulations to publish and what not to publish
  2. Disclosure Avoidance Systems (DAS) that add noise to the data before creating these tables

Both aspects have huge consequences for data users. A good place to start reading about this rethinking is the 2020 Census Data Products pages at the Census Bureau.

The Census Bureau is aware that there will be this impact and has asked the data-user community for input in the decision process along the way. There were Federal Register Notices asking for use cases related to the 2010 tables and a request for feedback on a proposed set of tables. There were publications of the application of a DAS to 1940 Census data, to PL94-171 data from the 2018 test, and the 2010 Demonstration Products. Currently the Census Bureau is asking for feedback on the measurement of progress of the DAS implementation it plans to use for the first set of products coming out of the Census.

The intentions behind stakeholder involvement were good BUT did not lead to buy-in from those stakeholders. Many are afraid that the quantity and quality of the published data will severely impair the capability to make sound decisions and do sound research based on Census 2020 and on products that are directly or indirectly based on that data. Adding to this anxiety are the very difficult, unexpected circumstances the Census Bureau has to deal with while collecting the data.

From my perspective as one of those stakeholders who are wary about the quantity and quality of the data, there are a few things that could have gone better:

  • The need for rethinking is not communicated clearly. For example, I cannot find a Census Bureau publication that plainly describes the re-identification process; all I can find are a few slides in a presentation. A layman’s explanation of the legal underpinning would be helpful as well, as some argue that there has been a drastic reinterpretation.
  • The requests for feedback were all very complicated and time-consuming, and reached only a small group of very dedicated data users who felt tasked to respond on behalf of many and stuck with the low-hanging fruit.
  • It is not clear what the Census Bureau did with the responses.
  • The quality of the 2010 Demonstration Products was very low and would have severely impacted my use of the data and many others’ uses.
  • Most Census Bureau communications about this rethinking consisted of a mention of a trade-off between privacy and accuracy followed by a slew of arguments about the importance of privacy and hardly any mention of how important accuracy is for the mission of the Census Bureau. Many stakeholders walked away with the feeling that the Bureau feels responsibility for privacy protection, but not as much for accuracy.

There is a hard deadline for the production of the PL94-171 data, although Congress has the power to extend that date because of the COVID-19 pandemic. Working back from that, I am afraid that decision time is not too far away. The Census Bureau is developing the DAS using an agile system with about 8 weeks between ‘sprints’. The Bureau published updated metrics from sprint II at the end of May, but had already started sprint IV at that time. If we keep the 8 weeks between sprints, this implies, in my estimation, that there is room on the schedule for 2 or 3 more sprints and very little time to rebuild trust from the data-user community.

Examples of actions that would help rebuild some trust are:

  • Appointing someone who is responsible for the stakeholder interaction. So far, my impression is that there is no big-picture communication plan and two-way communication depends too much on who you happen to know within the Census Bureau. Otherwise the communication is impersonal and slow and often without a possibility for back-and-forth. This person should also have the seniority to fast-track the publication review process so stakeholders are not constantly two steps behind.
  • Plan B. A chart often presented to us is a line that shows the trade-off between privacy and accuracy. The exact location of that line depends on the privacy budget and the implementation of the DAS, and the Census Bureau seems to take the position that it can implement a DAS with a sweet spot between accuracy and privacy that would be an acceptable compromise. But what if there is no differential-privacy-based DAS implementation (yet?) that can satisfy a minimal required accuracy and a maximal allowed disclosure risk simultaneously? So far it is an unproven technique for such a complex application. It would be good to hear that the Census Bureau has a plan B and a set of criteria that would lead to a decision to go with plan B.
  • Promise another set of 2010 data similar to the 2010 demonstration products so data users can re-evaluate the implications of the DAS. This should be done in a time frame that allows for tweaks to the DAS. Results of these evaluations could be part of the decision whether to move to plan B.
  • Have a public quality assurance plan. The mission of the Census Bureau is to be the publisher of quality data, but I could not find anything on the Census Bureau website that indicates what is meant by data quality and what quality standards are used. Neither could I find who in the Census Bureau oversees and is responsible for data quality. For example: does the Bureau see accuracy and fitness for use as the same concept? Others disagree. And what about consistency? Can inconsistent census data still be of high quality? Being open about data quality and having a clear set of quality standards would help show that quality is of similar priority as privacy.
  • Publish a timeline, with goals and decision points.
  • Feedback on the feedback: what did the Bureau do with the feedback? What criteria were used to implement some feedback but not others?
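To make the privacy-accuracy trade-off line mentioned under Plan B concrete, here is a minimal sketch using the textbook Laplace mechanism on a single count. It is an illustration only and not the Census Bureau's DAS, which is far more complex; the function names, the true count of 100, and the epsilon values are all hypothetical. Epsilon stands in for the privacy budget: a smaller epsilon means stronger privacy and more noise.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: the noise scale grows as the privacy budget shrinks.
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    # Average absolute error over many simulated releases of the same count.
    rng = random.Random(seed)
    errors = [abs(noisy_count(true_count, epsilon, rng) - true_count)
              for _ in range(trials)]
    return sum(errors) / trials

# Hypothetical privacy budgets, from strong privacy (0.1) to weak privacy (2.0).
tradeoff = {eps: mean_abs_error(100, eps) for eps in (0.1, 0.5, 1.0, 2.0)}
for eps, err in tradeoff.items():
    print(f"epsilon={eps}: mean absolute error ~ {err:.2f}")
```

For this simple mechanism the expected absolute error is the sensitivity divided by epsilon, so halving the privacy budget doubles the typical error. Whether a point on that curve satisfies both a minimum accuracy and a maximum disclosure risk is exactly the question the Plan B item raises.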

Time is short and stakes are high, but I think there are still openings to regain the trust of the data community and have Census data products that will be of provably high quality and protect the privacy of the respondents at the same time.

Intermediate Application of Data Sets: BLS Unemployment Data

Did you know that there are at least three sources of unemployment statistics in the United States? In this APDU webinar you’ll learn about the three primary data sources—Current Population Survey (CPS), Local Area Unemployment Statistics (LAUS), and American Community Survey (ACS)—and how they differ. Then we’ll explore how to access the official national and state unemployment statistics, based on CPS.

Presenter:
Garrett Schmitt, Senior Economist, Bureau of Labor Statistics

Intermediate Application of Data Sets: CDC Mortality Data

Mortality data are in the news on a daily basis. Accurate data are key to tracking the spread of COVID-19. However, there are important nuances that data users need to know:

  • How are mortality data collected?
  • When are data released?
  • Where can you access the data?
  • What are the differences between provisional and final mortality data?

Register for this APDU webinar today to learn more about mortality data from the CDC.

Presenter:
Robert N. Anderson, Ph.D., Chief, Mortality Statistics Branch, National Center for Health Statistics