APDU Board Member: Building Up by Breaking Down


By: Beth Jarosz, Program Director at PRB

If the mantra in real estate is “location, location, location,” the mantra for public data users is—or should be: “disaggregation, disaggregation, disaggregation.”

Why does disaggregation matter?

The average income in your neighborhood will go up if a billionaire moves in, but that rising average doesn’t tell you anything about how all of the other residents are doing. A new billionaire neighbor is an extreme example, but it illustrates the point: disaggregated data are crucial for understanding how people are doing.
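To make the billionaire example concrete, here is a small Python sketch. The six households, their incomes, and the renter/owner split are invented purely for illustration. The sketch compares the overall mean and median with the mean for each group, before and after one billionaire moves in.

```python
from statistics import mean, median

# Made-up incomes (USD) for a hypothetical neighborhood; illustration only.
residents = [
    ("renter", 32_000), ("renter", 38_000), ("renter", 41_000),
    ("owner", 75_000), ("owner", 88_000), ("owner", 95_000),
]


def summarize(rows):
    """Overall mean and median, plus the mean income within each group."""
    incomes = [income for _, income in rows]
    by_group = {}
    for group, income in rows:
        by_group.setdefault(group, []).append(income)
    return {
        "mean": mean(incomes),
        "median": median(incomes),
        "group_means": {g: mean(v) for g, v in by_group.items()},
    }


before = summarize(residents)
after = summarize(residents + [("owner", 1_000_000_000)])

for label, s in (("before", before), ("after", after)):
    print(f"{label}: mean {s['mean']:,.0f}, median {s['median']:,.0f}, "
          f"renter mean {s['group_means']['renter']:,.0f}")
```

With these invented numbers, the overall mean jumps from 61,500 to about 142.9 million and the median shifts from 58,000 to 75,000. The renters’ mean, however, stays at 37,000, and only the disaggregated figure reveals that nothing changed for them.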

We couldn’t uncover these truths without disaggregated data:

During the 2021 APDU Conference, speaker Rhonda Vonshay Sharpe provided numerous examples of how disaggregating data—by gender, race/ethnicity, and education—provides crucial insights for improving public health and well-being. Her talk was inspiring and also left many in the audience wondering…

If disaggregation is so important, why isn’t it more common?

To be fair, some people probably just don’t think about disaggregation. But there are bigger, systemwide challenges.

Sometimes survey sample sizes are too small to produce reliable estimates for a population of interest. When this happens, researchers—hoping to provide some data rather than none—may group smaller demographic groups together so they have enough combined survey responses to get an estimate they can report.

I have done this kind of aggregation in my own work (grouping across income levels, sexual orientations, racial/ethnic groups, geographies, or ages) because, in the context of that work, aggregated data were preferable to tables full of missing data. If you’re considering aggregating groups, the Urban Institute provides some handy guidelines. And remember that simply noting that a sample is small or that estimates are unreliable can be important in itself, because it signals a data gap.
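The trade-off between pooling small groups and flagging them as a data gap can be sketched in Python. This is a hypothetical illustration, not the Urban Institute’s guidelines: the 50-response threshold, the `GroupSample` and `build_table` names, and the group counts are all assumptions. Real reliability rules typically rely on survey weights and standard errors or coefficients of variation rather than raw counts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical reporting threshold; real rules vary by survey and agency and
# usually rely on standard errors or coefficients of variation, not raw counts.
MIN_RESPONSES = 50


@dataclass
class GroupSample:
    name: str
    responses: int     # unweighted number of survey records in the group
    with_outcome: int  # records reporting the outcome of interest


def build_table(
    groups: List[GroupSample],
    min_responses: int = MIN_RESPONSES,
    combine: bool = True,
) -> Tuple[Dict[str, Optional[float]], List[str]]:
    """Report a share per group; pool or suppress groups below the threshold.

    Returns the table (None means "data not available") and the names of any
    groups that were pooled, so the aggregation can be disclosed to readers.
    """
    table: Dict[str, Optional[float]] = {}
    small: List[GroupSample] = []
    for g in groups:
        if g.responses >= min_responses:
            table[g.name] = g.with_outcome / g.responses
        else:
            small.append(g)

    pooled_n = sum(g.responses for g in small)
    if small and combine and pooled_n >= min_responses:
        # Aggregate the small groups into one clearly labeled row.
        names = [g.name for g in small]
        label = "Combined (" + ", ".join(names) + ")"
        table[label] = sum(g.with_outcome for g in small) / pooled_n
        return table, names

    # Otherwise keep each small group visible as an explicit data gap.
    for g in small:
        table[g.name] = None
    return table, []


# Hypothetical groups and counts, for illustration only.
groups = [
    GroupSample("Group A", 200, 50),
    GroupSample("Group B", 30, 6),
    GroupSample("Group C", 25, 10),
]
pooled_table, pooled = build_table(groups)
gap_table, _ = build_table(groups, combine=False)
```

Returning the pooled group names alongside the table makes it easy to disclose exactly which groups were combined. Setting `combine=False` instead keeps each small group as an explicit “data not available” row.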

Speaking of data gaps… Sometimes data are only reported for aggregate groups. A visitor to federal statistical websites will often find data for just five racial groups (American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White) and two ethnic groups (Hispanic or Latino and Not Hispanic or Latino). These groups reflect minimum standards set by the U.S. Office of Management and Budget (OMB) in 1997. Even with those standards in place, it was just last month that the Bureau of Labor Statistics (BLS) began publishing jobs data for Native Americans.

While many agencies in the federal statistical system go beyond the minimums—such as reporting data for Multiracial populations—the standards (in my opinion) are overdue for an overhaul.

What can be done to make disaggregated data more widely available?

Recently, APDU joined more than 150 other signatories in a letter to the Acting Director of the Office of Management and Budget requesting that the OMB minimum standards be revised. The letter included requests, developed in collaboration with community groups and based on the latest research on self-identification, on topics such as the following:

1. The use of a combined question versus separate questions to measure race and ethnicity and question phrasing as a solution to race/ethnicity question nonresponse;
2. The classification of a Middle Eastern and North African (MENA) group and distinct ethnic reporting category;
3. The description of the intended use of minimum reporting categories; and
4. The salience of terminology used for race and ethnicity classifications and other language in the standard.

The letter included specific suggestions focused on the data needs of Asian American populations, Native Hawaiian and Pacific Islander populations, Hispanic/Latino populations, Middle Eastern and North African populations, and Black and African American populations.

But racial demographics are just the tip of the disaggregation iceberg. Sexual orientation and gender identity, age, geography, education, and other topics also deserve data systems robust enough to support disaggregation. For survey data, the solution is to design, and fund, surveys with enough records to support detailed disaggregation. This could be achieved through larger overall sample sizes, as The Census Project has proposed for the American Community Survey (ACS), or through strategic oversampling of specific smaller populations of interest.

For administrative data such as birth and death records, education statistics, and others, many agencies already collect more detailed data on race/ethnicity, age, income, and sexual orientation and gender identity than they report. Reporting has often been limited by staff time, data quality issues, and, in some cases, privacy and confidentiality concerns. In these cases, newer tools such as synthetic estimation or noise infusion may help strike a balance between reporting disaggregated data and protecting individual privacy.

What can a data geek do in the meantime?

There is no single perfect answer, but here are a few suggestions:

• Disaggregate when you can.
• Consider whether reporting “data not available,” rather than aggregating, could be a powerful advocacy tool to spotlight data gaps.
• Be clear about what groups you’re aggregating and why.
• When reporting data for larger groups, speak to what is known about how smaller groups may differ from the aggregate trend.
• Communicate with data providers about data gaps and advocate for more funding for federal and state agencies to collect and disseminate the data you need.

Only by breaking down the data can we understand enough to make wise policy decisions that build up our communities.