APDU Board Member: Enriching data: Understanding ‘Who is Being Evicted’ to address inequalities


By: Michelle Riordan-Nold, Executive Director, CT Data Collaborative

Since April is National Fair Housing Month, I thought I would share an eviction data project we recently finished in Connecticut to understand and address ‘Who is Being Evicted.’ My hope in sharing our work is that it might inspire others to replicate it and expose who is being impacted by evictions in our country.

Our work builds on the work of Matthew Desmond and the Eviction Lab in making national eviction data available to the public. However, in Connecticut, we wanted to go deeper with the data and examine evictions at the census tract level in addition to answering the question ‘who is being evicted?’

To do this, we partnered with the CT Fair Housing Center, a statewide housing advocacy organization that purchases the court record data.

We started with two questions: Where are evictions occurring, and who is being evicted? Because this is administrative data, not collected for the purposes of analysis, we first had to clean the data and then enrich it. The steps we took to do both are described below. (To read a full explanation of our methodology, go here.)

Cleaning the data

To understand where evictions happen in the state, we needed accurate address data. The court filings may contain misspellings in addresses, as well as discrepancies such as village names (e.g. Rockville) instead of town names (Vernon). CTData developed a workflow to minimize these errors and accurately map the data.
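A minimal sketch of the kind of normalization step described above, assuming a hand-maintained lookup table. Only the Rockville-to-Vernon pair comes from the text; the `VILLAGE_TO_TOWN` and `TOWN_TYPOS` tables and their other contents are hypothetical, and CTData's actual workflow may differ.

```python
import re

# Hypothetical lookup: village or section name -> legal town name.
# Only ROCKVILLE -> VERNON comes from the text; a real table would need
# to cover every village in the state.
VILLAGE_TO_TOWN = {
    "ROCKVILLE": "VERNON",
}

# Hypothetical lookup: known misspellings -> corrected town names.
TOWN_TYPOS = {
    "HARTFROD": "HARTFORD",
}


def normalize_town(raw_town: str) -> str:
    """Uppercase, trim, collapse whitespace, then fix typos and villages."""
    town = re.sub(r"\s+", " ", raw_town.strip().upper())
    town = TOWN_TYPOS.get(town, town)
    return VILLAGE_TO_TOWN.get(town, town)


def normalize_address(street: str, town: str, state: str = "CT") -> str:
    """Build a single-line address string to send to a geocoder."""
    street = re.sub(r"\s+", " ", street.strip().upper())
    return f"{street}, {normalize_town(town)}, {state}"
```

For example, `normalize_address("12  Main St", "rockville ")` yields `"12 MAIN ST, VERNON, CT"`. Mapping villages to towns before geocoding means every filing can later be aggregated to a legal town as well as a tract.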

To assign each address to the appropriate census tract (using 2010 boundaries), we use several geocoders, including the US Census Geocoder, Nominatim, and OpenCage.
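The post does not say how the three geocoders are combined. One plausible arrangement, sketched here as an assumption, is a fallback chain: try each service in turn and keep the first result that is a valid 11-digit 2010 tract GEOID (2-digit state, 3-digit county, 6-digit tract). The geocoder callables are hypothetical wrappers, and their order is an assumption. The Census Geocoder can return tract geographies directly, while Nominatim and OpenCage return coordinates, so wrappers for those two would also need a point-in-polygon lookup against 2010 tract boundaries.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface: a geocoder wrapper takes a one-line address and
# returns an 11-digit 2010 census tract GEOID, or None if it finds no match.
Geocoder = Callable[[str], Optional[str]]


def is_valid_geoid(geoid: Optional[str]) -> bool:
    """2-digit state + 3-digit county + 6-digit tract = 11 digits."""
    return geoid is not None and len(geoid) == 11 and geoid.isdigit()


def assign_tract(
    address: str, geocoders: Sequence[Tuple[str, Geocoder]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each geocoder in order; return (geoid, source) of the first hit."""
    for source, geocode in geocoders:
        try:
            geoid = geocode(address)
        except Exception:
            # A timeout or parsing error in one service should not stop the
            # chain; move on to the next geocoder.
            continue
        if is_valid_geoid(geoid):
            return geoid, source
    return None, None
```

Returning the name of the service that matched makes it possible to audit, address by address, which geocoder was relied on.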

Enriching the data

The court records contain neither the race/ethnicity nor the sex of the defendants. Yet this information was essential for understanding whether evictions were affecting one group more than another.

We therefore assigned a sex probability to each eviction filing based on the principal defendant’s first name. In our analysis, we assumed that the principal defendant is the head of the household.

The probability is based on historical baby names from Social Security records between 1945 and 2004. The data can be found on the data.gov website: https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
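A minimal sketch of how a name-based probability can be derived from these files, assuming the SSA national layout (one `yobYYYY.txt` per year, each row `Name,Sex,Count` with no header). Pooling raw counts across 1945 to 2004 with equal weight is an assumption; the post does not say whether years were weighted, and the function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def accumulate_counts(
    year_files: Iterable[Iterable[str]],
) -> Dict[str, Tuple[int, int]]:
    """Sum (female, male) counts per first name across yearly files.

    Each element of year_files is any iterable of "Name,Sex,Count" lines,
    for example a yob1945.txt file opened with newline="".
    """
    female: Dict[str, int] = defaultdict(int)
    male: Dict[str, int] = defaultdict(int)
    for lines in year_files:
        for name, sex, count in csv.reader(lines):
            key = name.upper()
            if sex == "F":
                female[key] += int(count)
            elif sex == "M":
                male[key] += int(count)
    # Iterate over a precomputed set, so defaultdict lookups are safe here.
    return {n: (female[n], male[n]) for n in set(female) | set(male)}


def prob_female(
    counts: Dict[str, Tuple[int, int]], first_name: str
) -> Optional[float]:
    """P(female | first name), or None if the name never appears."""
    f, m = counts.get(first_name.strip().upper(), (0, 0))
    total = f + m
    return f / total if total else None
```

In practice the input would be the files for 1945 through 2004 (`range(1945, 2005)`), opened inside a `with` block or `contextlib.ExitStack` so they are closed afterward.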

To determine race/ethnicity, we use Surgeo’s implementation of the Bayesian Improved Surname Geocoding (BISG) model. The model takes into account a person’s ZIP code and last name, and uses 2010 Census data to assign probabilities of that person belonging to each of six racial/ethnic groups. We did not assign racial probabilities to anonymous principal defendants.
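The core of BISG is a single application of Bayes’ rule. The sketch below is a generic illustration of that rule, not Surgeo’s code or API. The category labels follow the columns of the 2010 Census surname file, and the input dictionaries are assumed to be prepared elsewhere: race shares for the surname, and, for each group, the share of its national population living in the defendant’s ZIP code area.

```python
from typing import Dict

# Six categories, following the 2010 Census surname file columns (white,
# black, Asian/Pacific Islander, American Indian/Alaska Native, two or
# more races, Hispanic). The short labels are illustrative.
RACES = ("white", "black", "api", "aian", "multiple", "hispanic")


def bisg_posterior(
    p_race_given_surname: Dict[str, float],
    p_geo_given_race: Dict[str, float],
) -> Dict[str, float]:
    """Return P(race | surname, geography) via Bayes' rule.

    P(r | s, g) = P(r | s) * P(g | r) / sum over r' of P(r' | s) * P(g | r')

    p_race_given_surname: race shares for the surname.
    p_geo_given_race: for each group r, pop(r, ZIP area) / pop(r, nation).
    Assumes surname and geography are independent given race.
    """
    joint = {
        r: p_race_given_surname.get(r, 0.0) * p_geo_given_race.get(r, 0.0)
        for r in RACES
    }
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("surname and geography evidence do not overlap")
    return {r: value / total for r, value in joint.items()}
```

For instance, a surname split evenly between two groups, in a ZIP code where the second group is three times as concentrated, gets a posterior of about 0.25 and 0.75. Surnames missing from the surname file would need a fallback (such as geography alone), which this sketch does not handle.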


This method of assigning sex is imperfect: it does not account for immigrants to the US who did not receive a Social Security number at birth, or for people who change their sex later in life. In addition, rare names (given to fewer than 5 babies in a particular year) are omitted from the Social Security dataset for privacy reasons (see Beyond the Top 1000 Names on the Social Security website).

Note that we also assign sex to anonymous defendants (Jane Doe, John Doe, and similar). We assume that Jane is female and John is male. We do not assign any probability when a name does not appear in the Social Security dataset.
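The rules above can be expressed as a small lookup wrapper. This is a sketch under stated assumptions: the `ANONYMOUS_P_FEMALE` table is hypothetical and covers only the two Doe names the text mentions, and `p_female` stands for a precomputed first-name table derived from the Social Security data.

```python
from typing import Dict, Optional

# Hypothetical table of anonymous placeholder names -> probability female.
# Only Jane Doe (female) and John Doe (male) come from the text; other
# placeholders ("and similar") would need to be added.
ANONYMOUS_P_FEMALE = {("JANE", "DOE"): 1.0, ("JOHN", "DOE"): 0.0}


def _key(first: str, last: str) -> tuple:
    return (first.strip().upper(), last.strip().upper())


def sex_probability(
    first: str, last: str, p_female: Dict[str, float]
) -> Optional[float]:
    """Probability that the principal defendant is female, or None."""
    key = _key(first, last)
    if key in ANONYMOUS_P_FEMALE:
        return ANONYMOUS_P_FEMALE[key]
    # Names missing from the Social Security data get no probability
    # (None) rather than a default such as 0.5.
    return p_female.get(key[0])


def is_anonymous(first: str, last: str) -> bool:
    """Anonymous defendants are skipped when assigning race probabilities."""
    return _key(first, last) in ANONYMOUS_P_FEMALE
```

Returning `None` instead of 0.5 keeps unmatched names out of the sex breakdown entirely, rather than splitting them evenly between the two groups.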


Not surprisingly, we did find disparities in the impact of evictions by race, ethnicity, and sex:

  • Black and Hispanic/Latino renters have cases filed against them at the highest rates.
  • Overall, Black renters are over three times more likely than white renters to face eviction, and Hispanic/Latino renters are over two times more likely than white renters.
  • Eviction cases are disproportionately filed against females, and even more disproportionately against Black and Hispanic/Latina females.
  • Renters who do not have a lawyer are almost twice as likely to have a removal order issued against them.
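Because each filing carries group probabilities rather than a single label, one common way to turn them into comparisons like "three times more likely" is to sum the probabilities into expected counts, divide by the number of renter households in each group, and compare each rate with a reference group. The post does not say exactly how these ratios were computed. The sketch below and its `renters` denominators (for example, renter households by race of householder from the American Community Survey) are assumptions, and the numbers in the tests are toy values, not Connecticut figures.

```python
from typing import Dict, List


def expected_counts(filings: List[Dict[str, float]]) -> Dict[str, float]:
    """Expected filings per group: sum each filing's group probabilities."""
    totals: Dict[str, float] = {}
    for probs in filings:
        for group, p in probs.items():
            totals[group] = totals.get(group, 0.0) + p
    return totals


def rate_ratios(
    filings: List[Dict[str, float]],
    renters: Dict[str, int],
    reference: str = "white",
) -> Dict[str, float]:
    """Filing rate of each group divided by the reference group's rate."""
    counts = expected_counts(filings)
    rates = {g: counts.get(g, 0.0) / n for g, n in renters.items()}
    return {g: rate / rates[reference] for g, rate in rates.items()}
```

The same ratio applies to the lawyer comparison if each case is given probability 1.0 for either "no lawyer" or "lawyer", and the denominators are case counts rather than renter households.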


These data were released on the day the CT Legislature was deciding whether to extend the eviction moratorium, allowing that policy decision to be informed by data. There is a lot to be learned from administrative data, even with its limitations. Court record data on evictions is just one example; administrative data in many other domains hold similar potential.

Attend the upcoming virtual APDU-CIC Data Symposium, May 3–5, 2022, to discuss and learn about the possibilities of public data!