Larry Birnbaum likes to tell stories from data. “It is clear that there is a lot of data and a lot of stories in that data, but it is not easy to find them,” he told delegates at INMA’s Big Data for Media conference at Google London last week. “I tend to start with a question.”
In his talk, “Finding and Telling Stories at Internet Scale,” Birnbaum outlined a number of free tools to exploit free data, built by journalism and computer science students at Knight Lab, Northwestern University, where he is a Professor of Computer Science and of Journalism.
The first, TweetCast Your Vote, is an engagement tool that predicts how a user will vote based on their tweets. Type in any name and it will search for keywords used in that person’s Twitter feed.
LocalRx (local recommendations) is another tool, also based on tweets, which currently only works in the United States and in Barcelona. “It can tell you what people who patronise your business are tweeting about, which helps business owners to build a profile of their customers,” Birnbaum said.
Another, NewsRx (news recommendations), aggregates news stories based on tweets. For example, if a user regularly tweets about food, news stories about food or restaurants could be recommended. BookRx recommends books based on a user’s Twitter feed.
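To make the idea concrete, here is a minimal sketch of tweet-based recommendation. It is not NewsRx's actual method, which the talk did not detail; it assumes a simple word-overlap score, and the `recommend` function, the stopword list and the sample tweets and headlines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on",
             "at", "is", "my", "i", "this", "with"}

def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]

def recommend(tweets, headlines, top_n=2):
    """Rank headlines by how often their words appear in the user's tweets."""
    interest = Counter(word for tweet in tweets for word in tokenize(tweet))
    scored = []
    for headline in headlines:
        # Count each headline word once, weighted by its frequency in tweets.
        score = sum(interest[word] for word in set(tokenize(headline)))
        if score > 0:
            scored.append((score, headline))
    # Stable sort: ties keep their original headline order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [headline for _, headline in scored[:top_n]]

tweets = [
    "Best pizza I have had all year",
    "Trying a new ramen restaurant tonight",
    "Pizza again, no regrets",
]
headlines = [
    "New pizza restaurant opens downtown",
    "Council approves road repairs",
    "Ramen festival returns this weekend",
]
recommendations = recommend(tweets, headlines)
```

A user who tweets mostly about pizza and ramen is shown the two food stories, while the council story, which shares no words with the tweets, is left out.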
Birnbaum pointed out these tools allow access to free data because they are based on Twitter, an open platform. “If you are the New York Times or the London Times, you have a lot of access to data about your readers. Small publications do not have the same access,” he said. “By using these tools, you can find out about your readers. All you need is their Twitter handles.”
Birnbaum said these mechanisms were relatively simple to build. “It only took 50 lines of code to build some of them. The first version of NewsRx was built by four undergrad students in ten weeks.
“Another system we built is Local Angle. This helps locally relevant stories surface in national news. It goes through a news feed, pulls out names from the feed, then finds locations associated with the people mentioned, and it sorts stories into local areas,” he said.
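The steps Birnbaum describes for Local Angle can be sketched in a few lines. This is an illustration under stated assumptions, not the Knight Lab implementation: the name pattern is a naive regular expression, and the `PERSON_LOCATIONS` table, the function names and the sample headlines are hypothetical stand-ins for a real knowledge source and news feed.

```python
import re
from collections import defaultdict

# Hypothetical lookup table: person name -> associated location.
# A real system would need a far larger knowledge source.
PERSON_LOCATIONS = {
    "Jane Doe": "Chicago",
    "John Smith": "Evanston",
}

# Naive name pattern: two consecutive capitalised words. The lookahead
# allows overlapping matches, so "Senator Jane Doe" yields both
# "Senator Jane" and "Jane Doe".
NAME_PATTERN = re.compile(r"(?=\b([A-Z][a-z]+ [A-Z][a-z]+)\b)")

def extract_names(text):
    """Return candidate person names found in the text."""
    return NAME_PATTERN.findall(text)

def sort_by_local_area(stories):
    """Group stories by the locations tied to the people they mention."""
    by_area = defaultdict(list)
    for story in stories:
        for name in extract_names(story):
            location = PERSON_LOCATIONS.get(name)
            if location and story not in by_area[location]:
                by_area[location].append(story)
    return dict(by_area)

stories = [
    "Senator Jane Doe backs new transport bill",
    "Tech founder John Smith sells startup",
    "Markets close higher on Friday",
]
local_feeds = sort_by_local_area(stories)
```

Stories that mention no known person, such as the markets headline here, simply do not appear in any local feed.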
Birnbaum described other tools, including Buzz Lite and Quill Connect, which analyses a user’s Twitter behaviour and gives tips on how to change it to build and extend social networks.
Birnbaum also discussed his work at Narrative Science, a U.S. technology company that builds software to write stories and news reports from data. Currently, most of the stories created by algorithms are sports reports and financial news.
Birnbaum, who is the company’s Chief Scientific Advisor, said that one of the biggest challenges when designing this software is writing the language for the algorithm, to enable it to describe the data in an engaging way. “The critical thing is working out what to say and how to say it. We use a language generator,” Birnbaum said. “Clichés are great because people get them. Clichés are good.”
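As a rough illustration of “what to say and how to say it”, the sketch below turns a game score into a one-sentence report. It is not Narrative Science's software; it assumes a simple template approach, and the `describe_game` function, the team names and the margin thresholds are all hypothetical.

```python
def describe_game(home, away, home_score, away_score):
    """Write a one-sentence game report from the final score."""
    # What to say: work out the result and the margin.
    if home_score == away_score:
        return f"{home} and {away} played to a {home_score}-{away_score} draw."
    if home_score > away_score:
        winner, loser, high, low = home, away, home_score, away_score
    else:
        winner, loser, high, low = away, home, away_score, home_score
    margin = high - low

    # How to say it: pick a familiar phrase that fits the margin.
    # The thresholds are arbitrary choices for this example.
    if margin >= 20:
        verb = "cruised past"
    elif margin <= 3:
        verb = "edged"
    else:
        verb = "beat"
    return f"{winner} {verb} {loser} {high}-{low}."

summary = describe_game("Wildcats", "Badgers", 24, 21)
```

The stock phrases do the work Birnbaum points to: a reader knows at once what “edged” or “cruised past” says about the game.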
He said automated journalism is unlikely to take the place of human journalism: “A lot of the work the company is doing is not traditional journalism.”
But he acknowledged that there are still uncertainties around automated journalism. Asked about legal problems that might arise when an algorithm describes a company negatively, for example as “performing badly” because the statistics show this, Birnbaum agreed it was an unresolved issue.
pic credit: Aynur Simsek