Data Is vs. Data Are: Settling the Debate

By Bernie Langer, APDU Board Member

There are many debates in the world of public data. Privacy vs. accuracy. Survey data vs. administrative data. CSV vs. XLS. But if you really want to see data nerds fight, ask them whether they say “data is” or “data are”. Is the word “data” singular or plural?

“Good data is important to good decision-making” or “Good data are important to good decision-making”?

This came up on Twitter recently, when NPR reporter/Census superfan Hansi Lo Wang tweeted: “…The 2020 census redistricting data, needed to redraw voting maps, is now expected by Sept. 30…” In his next tweet, he wrote: “(Sorry for 1st tweet’s typo: *data are)”.

It may seem trivial, but it’s important, and not for the reason you expect.

The argument for “data are” is thus: “Data” is derived from the Latin word datum, meaning “that which is given.” In Latin, datum is a singular neuter noun. My high school Latin teacher made sure I would never forget The Neuter Law: all neuter plural nouns (in the nominative and accusative cases) end in -a. Therefore, the plural form of datum is data. Data is plural. Quod erat demonstrandum.

Furthermore, we know data to be a collection of individual values (observations, survey responses, etc.). A census never has only one respondent (unless it’s a very sad census). The concept is inherently plural.

The argument for “data is” is simple: “Data are” sounds ridiculous.

Okay, there are some more nuanced arguments for “data is.” We’re speaking English, not Latin. Language evolves. “Data” in common usage is an uncountable noun, like “water.” The ocean is full of water, but no one says, “Water are wet.”

But that’s secondary. What’s more important is: “Data are” sounds ridiculous.

As data professionals, we need to communicate with the rest of the world in a clear and accessible way. We want others to embrace the power of data, knowing that data can be useful to them. No one needs to be special to use data.

Insisting on treating data as a plural noun can be alienating. (Pro tip: Correcting someone’s grammar in any circumstance is alienating.) We don’t want anyone to think they’re not good enough to use data. Even if it’s not off-putting, it’s distracting. The general public doesn’t expect to hear “data are,” and when they do hear it, they’ll momentarily dwell on it instead of on the substance of the rest of your sentence.

Of course, this isn’t just about grammar and the word “data.” It’s about not gatekeeping, and communicating complex (but understandable) concepts to the public on their terms. When non-experts understand data, data professionals become more valuable, not less.

And if your conscience cannot permit you to use data in a singular form (old habits die hard), then at the very least, when someone else does, bite your tongue.

A common refrain is “The plural of anecdote is not data.” Let’s reinforce that by not using data as a plural.

This blog post represents the views of its author and does not represent the views of APDU or its Board of Directors.