Ever since the Associated Press started to fully automate the production of company earnings reports almost two years ago, algorithms that automatically write news stories from structured data have shaken up the news industry. This so-called automated journalism – which is often, and somewhat misleadingly, referred to as robot journalism – works for fact-based, routine, and repetitive stories for which clean, structured, and reliable data are available. Apart from financial news, popular examples include recaps of sports matches, crime reports, and stories based on sensor data, such as reports tracking earthquakes and fine-dust levels.
For such well-defined problems, algorithms, once developed, can create a virtually unlimited number of stories at little cost, in multiple languages, and personalised to the needs of individual readers. And, perhaps most importantly, algorithms can do this faster and potentially with fewer errors than human journalists.
These obvious economic benefits fit perfectly with news organisations’ aim to cut costs while at the same time increasing the quantity of news and offering personalised content. It is therefore not surprising that last year the World Editors Forum named automated journalism a top newsroom trend. The technology has also attracted much attention in the popular press. For example, the Planet Money podcast had one of National Public Radio’s most experienced reporters compete with an algorithm to write a news story. Another example is a quiz published by the New York Times, which let its readers guess whether a text was written by a human or generated by an algorithm.
In our study, “Readers’ perception of computer-generated news”, we examined how people perceive automated news. In particular, we studied how readers assess the quality of automated news relative to human-written news. Together with co-authors Bastian Haarmann and Hans-Bernd Brosius, we asked a total of 986 participants to read two news articles: one financial news story and one recap of a soccer game. The human-written texts were taken from popular German news websites. The computer-generated counterparts on the same topics were provided by the Fraunhofer Institute for Communication, Information Processing and Ergonomics, which is working on the development of text-generation algorithms. For each article, we told participants whether it had been written by a human or generated by a computer. In some cases, however, we deliberately manipulated the bylines, misinforming participants about who had actually created the article.
Here is what we found:
People preferred reading human-written over computer-generated articles, a result one might have expected. Interestingly, however, this preference even held for articles that were actually generated by an algorithm but wrongly declared as human-written. In other words, simply making people believe that they were reading a human-written article was enough to increase readability ratings. One explanation for this finding is that once people read (or think they read) a computer-generated article, they may scrutinise it more carefully and actively look for signs that an algorithm wrote it.
That said, participants did not generally favour the human-written articles. Perhaps somewhat surprisingly, participants rated the automated articles higher than their human-written counterparts in terms of credibility. One explanation for this result is that the automated texts are very heavy on numbers, which may make them appear more credible.
Machine-written articles read as if they were written by humans
In general, however, differences in people’s perceptions were rather small, which confirms results from two similar studies conducted in Denmark and the Netherlands. In a way, this is the most interesting finding, as it suggests that the quality of automated texts is already on par with that of human-written texts, at least for very routine news stories. Then again, human-written articles on such topics often read as if they were written by a machine: they follow the standard conventions of news writing by simply reciting facts, without sophisticated storytelling and narration. This is, of course, the very reason why such writing is easy to automate in the first place – and why readers may not even notice that they are reading a computer-generated story. In other words, journalists who merely cover routine tasks already act like machines, which, ironically, is exactly why algorithms can easily replace them.
Automation will eliminate some jobs but create others
Not surprisingly then, many journalists perceive automation as a threat and their coverage of the topic often emphasises the man vs. machine frame. Needless to say, automation will eliminate some jobs. But it will also create new jobs within the process of automating news production. For example, text-generating algorithms need a lot of editorial insight and manual configuration as well as a certain degree of maintenance, even after their initial programming.
Human and automated journalism will become more integrated
Furthermore, human and automated journalism will likely become closely integrated and form a man-machine marriage, in which algorithms and humans each perform the tasks they are best at. That is, an algorithm might analyse data, find interesting stories, and provide a first draft on a topic, which journalists could then enrich with in-depth analyses, interviews with key people, and behind-the-scenes reporting.
We recently embarked on a new project to develop automated news based on forecasts of this year’s US presidential election. For this, we rely on data from the forecasting website PollyVote.com, which was founded in 2004 to demonstrate the benefits of applying evidence-based forecasting principles to election forecasting.
Our idea is to use these data to create a first draft of a story that includes all important facts plus additional analyses. For example, a typical story about a poll would describe who conducted the poll and when, the sample size, the margin of error, and, of course, the actual poll result. In addition, the poll’s result would be compared to a poll average as well as to a combined forecast that incorporates a wealth of information from other sources. The automated story will thus not only provide a simple summary of a poll but also put the poll into context by comparing it to other forecasts.
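The post does not say how the PollyVote stories are actually generated. As a rough illustration of the steps just described, here is a minimal Python sketch of one common approach, template-based text generation: the poll’s facts are slotted into a fixed sentence, followed by a sentence comparing the result with a poll average and a combined forecast. The `Poll` record, its field names, and the helper functions `describe_gap` and `write_poll_story` are all hypothetical, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    # Hypothetical record layout; the real PollyVote data format is not described here.
    pollster: str
    end_date: str            # last day of fieldwork, ISO format
    sample_size: int
    margin_of_error: float   # in percentage points
    candidate: str
    share: float             # candidate's share of the two-party vote, in percent


def describe_gap(value: float, reference: float, label: str) -> str:
    """Phrase the difference between a poll result and a reference figure."""
    diff = round(value - reference, 1)
    if diff > 0:
        return f"{diff:.1f} points above the {label}"
    if diff < 0:
        return f"{-diff:.1f} points below the {label}"
    return f"in line with the {label}"


def write_poll_story(poll: Poll, poll_average: float, combined_forecast: float) -> str:
    """Fill a fixed template: first the poll's facts, then the comparison context."""
    facts = (
        f"A poll by {poll.pollster}, conducted up to {poll.end_date} among "
        f"{poll.sample_size:,} respondents, puts {poll.candidate} at {poll.share:.1f}% "
        f"of the two-party vote (margin of error: ±{poll.margin_of_error:.1f} points)."
    )
    context = (
        f"That is {describe_gap(poll.share, poll_average, 'poll average')} and "
        f"{describe_gap(poll.share, combined_forecast, 'combined forecast')}."
    )
    return f"{facts} {context}"


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = Poll("Example Pollster", "2016-07-14", 1000, 3.1, "Candidate A", 52.0)
    print(write_poll_story(example, poll_average=50.5, combined_forecast=53.2))
```

A fixed template like this guarantees that every story reports the same facts in the same order, which suits the routine, data-rich stories discussed above. A production system would likely vary its wording and add further rules, for instance noting when a gap falls within the margin of error.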
By focusing on political forecasts, this project carries automated news into the realm of “hard news”. Apart from providing data-driven forecasting pieces as a service to journalists, we aim to study how users with different levels of involvement perceive automated news on such a high-profile and sensitive topic. Visit PollyVote.com to track how the quality of the automated news develops over time.
Andreas Graefe, Mario Haim, Bastian Haarmann and Hans-Bernd Brosius: Readers’ perception of computer-generated news: Credibility, expertise, and readability. Journalism, published online before print, April 2016.