Helena Bengtsson, Editor of Digital Projects at the Guardian newspaper, does not like the term Big Data. “I don’t talk about Big Data,” she told the audience at the Big Data for Media Conference at Google London last week. “I do ‘large data’ for journalism. Big Data is complex and you can’t process it using traditional tools.”
Bengtsson gave a number of examples of how Big Data, large data and small data have been used in journalism, both at the Guardian and around the world.
The first, Reading the Riots, a collaboration between the Guardian and the London School of Economics, drew on an analysis of 2.5m tweets sent during the 2011 London riots.
She also mentioned Cracking the Codes, a data journalism project by the Center for Public Integrity. Using data from 84 million Medicare claims in the United States, it revealed that medical providers were collecting extra Medicare fees by exaggerating their claims.
Bengtsson then discussed a Big Data project by Japan’s public broadcaster, NHK.
This consisted of a series of documentaries researched using “disaster data” gathered during the 2011 Japanese earthquake and tsunami. NHK used this Big Data to analyse reconstruction and recovery efforts, including demographic trends revealed by statistics from mobile phone signals. The signals showed where people were living before and after the disaster, which killed 20,000 people.
The data journalists also collected and analysed information from 750,000 company computers, which revealed that 20,000 business connections had been lost after the earthquake. They also studied traffic movements in the period after the disaster, using signals from car satellite navigation systems.
Bengtsson said that, although it was an example of excellent data journalism, NHK had been able to use data that journalists would not normally be able to access.
Bengtsson described the data in the WikiLeaks Iraq war logs as “the most exciting database I have ever worked with.”
“We analysed it using traditional and non-traditional methods,” she said. “One of the reasons I love data journalism is that it helps me to pick that needle out of the haystack. It is about finding the story, finding the detail, more than finding the trends.”
“We could have found more stories from the WikiLeaks data if we had had some of the tools we have now,” Bengtsson said.
Asked by an audience member for advice on how to persuade reporters not to be afraid of data, Bengtsson answered: “I don’t know why journalists think it is too difficult for them. I find it baffling that journalists can take on the complexities of stories, yet when you try to teach them to understand an Excel file, they panic.”
But, she said, the more widely data journalism is practised, the better journalists will get at it. “We just need stories, stories, stories,” Bengtsson said.
This item originally appeared as a blog post from the INMA Big Data Conference, 26/27 March 2015.
pic credit: Aynur Simsek, INMA