An elderly man living in the southern United States is trapped for three days when the floor of his outhouse collapses under him. He survives the ordeal thanks to a mailman who finds him. How would a data journalist, someone with a “data state of mind,” cover this story?
They’d ask a question like “How often does this kind of thing happen?” Ron Nixon, The New York Times’ homeland security correspondent, tells a packed seminar room at #GIJC17. The audience has come to hear him and Brant Houston, the Knight Chair in Investigative Reporting at the University of Illinois, speak about getting a data state of mind.
“What you’re really doing with a data state of mind is counting,” says Houston. “At the core, data journalism allows us to quantify something.”
You need to start off with a basic question, says Houston. Like, how often does this kind of thing happen? Then go looking for data that will answer that question.
So, to answer the question “How often do old people get trapped in their outhouses?”, what data would you look for? It’s not like there’s a dataset already out there listing this kind of thing.
“You have to be creative,” says Nixon. You may have to look for proxy data. “The first thing I think is ‘who would have this data and how would I access it?’” You could start with government census information about homes without indoor sanitation in an area, for example. Then see if you could find the households with people over the age of 65 living in them. Then you’d visit the area and do on-the-ground reporting.
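The proxy-data idea Nixon describes can be sketched in a few lines of Python. This is only an illustration: the rows, the field names (`indoor_plumbing`, `oldest_resident_age`) and the `at_risk` helper are invented here and do not come from any real census file, which would have its own layout and definitions.

```python
# Illustrative rows only: field names and values are invented for this sketch,
# not taken from any real census dataset.
households = [
    {"id": "A", "indoor_plumbing": False, "oldest_resident_age": 72},
    {"id": "B", "indoor_plumbing": True, "oldest_resident_age": 81},
    {"id": "C", "indoor_plumbing": False, "oldest_resident_age": 44},
    {"id": "D", "indoor_plumbing": False, "oldest_resident_age": 66},
]


def at_risk(rows, min_age=65):
    """Return households with no indoor plumbing and a resident older than min_age."""
    return [
        row
        for row in rows
        if not row["indoor_plumbing"] and row["oldest_resident_age"] > min_age
    ]


candidates = at_risk(households)
print(f"{len(candidates)} of {len(households)} households match the proxy")
print([row["id"] for row in candidates])
```

A filter like this does not answer the question by itself. It narrows down where to go and whom to ask, which is where the on-the-ground reporting comes in.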
Sometimes you just have to build your own dataset from scratch. Nixon described how he pored through thousands of court records and other official documents to compile a spreadsheet of nearly 200 employees and contract workers of the U.S. Department of Homeland Security who had taken bribes.
“Documents and databases are often official records, so you can quote them as such,” says Nixon. “And data will never call your editors up and say: ‘I didn’t say that,’” he jokes. “Also, if you build the database yourself, you’re the only one with the information, therefore government agencies can’t come back at you.”
To a certain extent, all journalists use data, but just as not all reporting is investigative, not all reporting that uses data is data journalism. So how do you develop a data state of mind?
Here are ten tips:
- Find a dataset for every story. Interview your data like you would your human sources. Ask it questions. For example, see if you can quantify something, measure change over time or measure performance against a standard.
- Always assume that the dataset you want is out there. It’s just a matter of figuring out how to get it. Houston, who started using data in the pre-internet time of 1986, said: “I never saw the difference between documents and data. There wasn’t a web, but there was data all around.”
- Always assume your data is dirty. “Remember, most data entry is done by poorly paid, pissed off people,” says Nixon. “Sometimes there are mistakes.”
- Never assume that data is correct just because it’s data. For example, people use population data because it’s official, but the numbers could be from a census that’s nearly a decade old.
- Understand your data. Find out where it came from, why and how it was collected and who did the collecting. Know what the column headings mean – if there’s a data diary, read it.
- Check your data. There are two kinds of checks: an internal integrity check, where you check your data for errors, and an external integrity check, where you call experts to ask them to help you understand your data.
- Only say what you know. Don’t try to make the data say what you think it should.
- Put your data into context and don’t overwhelm people with numbers. What does seven parts per billion mean? Give examples: people understand what a teaspoon is and what a lake is, so use those sorts of analogies, says Nixon.
- Remember that data is only one of three pillars in investigative reporting: data, observation and interviewing. A database is a summary; generally, the more interesting information is in the source, even if that source is documents, says Houston.
- Always start with a basic question. Then go looking for data. Don’t just go fishing for data.
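For the “check your data” tip, an internal integrity check might look something like the Python sketch below. It is one possible approach, not a method the speakers described: the `integrity_report` function, the field names and the sample rows are all invented for illustration, and the plausible age range is an assumption you would set for your own data.

```python
from collections import Counter


def integrity_report(rows, key, field, low, high):
    """Flag duplicate keys, missing values and out-of-range values in `field`."""
    key_counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, count in key_counts.items() if count > 1)
    missing = [row[key] for row in rows if row.get(field) is None]
    out_of_range = [
        row[key]
        for row in rows
        if row.get(field) is not None and not (low <= row[field] <= high)
    ]
    return {
        "duplicates": duplicates,
        "missing": missing,
        "out_of_range": out_of_range,
    }


# Invented sample records: one blank age, one impossible age, one repeated ID.
records = [
    {"case_id": 1, "age": 34},
    {"case_id": 2, "age": None},
    {"case_id": 3, "age": 210},
    {"case_id": 3, "age": 51},
]

# Assumption: ages between 0 and 120 are plausible for this dataset.
report = integrity_report(records, key="case_id", field="age", low=0, high=120)
print(report)
```

Anything the report flags is a question to take back to the source of the data or to an outside expert, which is the external check.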
Laura Grant is a South African freelance journalist, specializing in data-driven journalism. She also teaches online journalism courses at the University of the Witwatersrand in Johannesburg.