Data collection, real-time fact-checking, automated videos and graphics, the precisely targeted distribution of information – all these functions have been performed with the help of artificial intelligence (AI) for some time. But there is now a growing number of automated journalism tools capable not only of carrying out routine newsroom tasks, but also of producing the news itself.
Some notable examples – most of them originating in English-speaking countries – are TruthTeller, developed by the Washington Post; Wordsmith by Automated Insights (used by the Associated Press to produce news stories about U.S. corporate earnings); and News Tracer (used by Reuters to monitor social networks). The text-to-video service Wibbitz allows its users – which include USA Today, the New York Times and Le Figaro – to create videos within minutes. Graphiq’s artificial intelligence system is capable of producing dozens of interactive graphs a day without human intervention, at minimal cost and in very little time, which makes it an invaluable infographics tool.
AI journalism has been significantly slower to get off the ground in the Czech Republic than elsewhere. According to Václav Moravec, head of the Center of Artificial Intelligence Journalism (CAIJ) at Prague’s Charles University, there are a number of reasons for this. “The development of AI journalism is hampered by the under-resourcing of newsrooms, a limited capacity to adopt an interdisciplinary approach, and the fact that we have some way to go in the convergence of digital and traditional news media,” Moravec told EJO.
CAIJ operates within the Department of Journalism at Charles University and conducts systematic research into artificial intelligence in journalism and its wider social impacts. Recently, Moravec and his team also received funding for the development and practical application of an AI system designed to produce news articles in Czech.
Czech newsrooms have not yet employed artificial intelligence systems to create news items using natural language generation (NLG). But they have experimented with automated news content, using applications that create articles based on supplied data and predefined templates. These applications can be used for reports on finance, statistics, sport and election results, among other topics.
The Czech News Agency (ČTK) has trialled this kind of software, notably in an experiment to automate its reporting on the 2018 senate and municipal council elections. This automated coverage was based on algorithms that processed data supplied by the Czech Statistical Office (CZSO).
“To automatically generate news articles, we created templates for three types of news item,” ČTK Technical Director Jan Kodera told EJO. “Our database consisted of lists of candidates, towns, constituencies, keywords for news items and other information. We imported the CZSO data into our database, processed it and when the moment was right, the data was combined with a template to produce a news headline or a story. The generation occurred after 50 percent of the votes had been counted, by which time it was clear who the winner would be. Eventually, once the final election results were released, a complete report was generated,” Kodera said.
This generated content was automatically linked to pre-prepared articles written by humans. The system was able to retrieve metadata from the related planning item and associate it with the newly generated item. In accordance with the agency’s normal practice, the machine-generated articles were checked, edited where necessary and published by the editors.
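For readers curious what a data-plus-template pipeline of this kind might look like in code, here is a minimal Python sketch. It is not ČTK's implementation: the template wording, the `Count` record, the `generate_item` function and the `needs_review` flag are all assumptions made for illustration. Only the 50 percent threshold, the final report step and the editorial check come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical templates; the article does not reproduce the agency's real ones.
TEMPLATES = {
    "interim": {
        "headline": "{leader} leads in {constituency}",
        "body": "With {counted:.0f} percent of votes counted in {constituency}, "
                "{leader} leads with {share:.1f} percent.",
    },
    "final": {
        "headline": "{leader} wins in {constituency}",
        "body": "{leader} has won in {constituency} with {share:.1f} percent "
                "of the vote, according to final results.",
    },
}

THRESHOLD_PCT = 50.0  # generation starts once half the votes are counted


@dataclass
class Count:
    """Assumed shape of one imported results record."""
    constituency: str
    counted_pct: float        # share of votes counted so far, 0-100
    shares: Dict[str, float]  # candidate name -> percent of counted votes


def generate_item(count: Count) -> Optional[Dict[str, object]]:
    """Return a draft news item, or None if too few votes are counted."""
    if count.counted_pct < THRESHOLD_PCT:
        return None
    # Pick the candidate with the highest share so far.
    leader, share = max(count.shares.items(), key=lambda kv: kv[1])
    kind = "final" if count.counted_pct >= 100.0 else "interim"
    fields = {
        "leader": leader,
        "share": share,
        "constituency": count.constituency,
        "counted": count.counted_pct,
    }
    template = TEMPLATES[kind]
    return {
        "headline": template["headline"].format(**fields),
        "body": template["body"].format(**fields),
        "needs_review": True,  # drafts still go to an editor before publication
    }


if __name__ == "__main__":
    # Invented figures for a fictional constituency.
    draft = generate_item(
        Count("District X", 62.0, {"Candidate A": 55.5, "Candidate B": 44.5})
    )
    print(draft["headline"])
    print(draft["body"])
```

The point of the sketch is the separation of concerns: the wording lives in templates that editors can revise, while the code only decides when to generate and which numbers to insert.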
From automated text to AI
Czech academic researchers have been working with ČTK to develop an AI system capable of generating articles from news items in the agency’s archives.
“We decided to experiment with the ‘summary’ format,” CAIJ director Moravec told EJO. “ČTK uses this format to extract the key information from a sequence of news items about the same story… It requires quite a lot of time and effort for a human to produce something in this format.” ČTK currently publishes dozens of summaries every day, so considerable human resources are involved in their production.
In the experiment, carried out by IT experts at the University of West Bohemia (ZČU) in Plzeň, the computer was given the task of automatically summarising a sequence of news items drawn from the ČTK archive. The researchers tried two methods: extractive summarisation, which selects existing sentences from the source texts, and abstractive summarisation, which generates new wording. Neither produced entirely satisfactory results, so these methods are not yet ready to be applied in practice.
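To give a sense of what extractive summarisation involves, the sketch below scores each sentence by how often its words recur across the whole sequence of items and keeps the top-scoring sentences in their original order. This is a generic textbook baseline, not the method the ZČU researchers used, which the article does not describe. The function names, the crude length-based word filter and the sample texts are all assumptions.

```python
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Lowercase words longer than three characters (a crude stop-word filter)."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if len(w) > 3]


def extractive_summary(items: List[str], max_sentences: int = 2) -> str:
    """Pick the sentences whose words are most frequent across all items."""
    sentences = [s for item in items for s in split_sentences(item)]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]),
                  reverse=True)[:max_sentences]
    # Restore original order so the summary reads chronologically.
    return " ".join(sentences[i] for i in sorted(best))


if __name__ == "__main__":
    # Invented sample items, not taken from any real archive.
    items = [
        "The dam burst on Monday. Officials ordered an evacuation.",
        "The evacuation after the dam burst continued on Tuesday. "
        "The weather was sunny.",
    ]
    print(extractive_summary(items))
```

A baseline like this copies sentences verbatim, so it cannot merge or rephrase information spread across several items, which is part of what a human-written agency summary does.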
One of the IT experts involved in the ZČU experiment, Jakub Sido, speculated that one reason for its limited success could be the fact that the data provided was inadequate considering the complexity of the task. He told EJO that a solution to this could be to simplify the task in future experiments.
Success at last
And indeed, a second experiment conducted by ZČU’s IT experts in the autumn of 2019 proved to be successful. This time, the aim was to automatically generate reports using financial data provided by the Prague Stock Exchange.
“We used a template approach for this purpose,” Sido told EJO. “The template design is based on existing ČTK reports: the computer analyses sentences from its previous news output… Using machine learning methods and neural networks, we are also experimenting with automatic template generation based on older reports and the structured data associated with them, to which we have access from the Prague Stock Exchange,” he said.
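The idea of deriving a template from an existing report and its associated structured data can be illustrated with a deliberately naive sketch. Sido describes machine learning methods and neural networks; the code below swaps in plain string matching purely to show the principle, and the function names, field names, company names and figures are all invented for illustration.

```python
from typing import Dict


def induce_template(sentence: str, record: Dict[str, str]) -> str:
    """Replace each data value found in the sentence with a named slot."""
    template = sentence
    # Replace longer values first so a short value cannot clobber part of a longer one.
    for field, value in sorted(record.items(),
                               key=lambda kv: len(kv[1]),
                               reverse=True):
        template = template.replace(value, "{" + field + "}")
    return template


def fill_template(template: str, record: Dict[str, str]) -> str:
    """Generate a new sentence by filling the slots with fresh data."""
    return template.format(**record)


if __name__ == "__main__":
    # An older report sentence and the structured data that accompanied it
    # (both hypothetical).
    old_sentence = "Shares of ExampleCo closed at 512.50 crowns, up 1.2 percent."
    old_record = {"company": "ExampleCo", "close": "512.50", "change": "1.2"}

    template = induce_template(old_sentence, old_record)
    print(template)

    new_record = {"company": "SampleBank", "close": "98.10", "change": "0.4"}
    print(fill_template(template, new_record))
```

String matching of this kind breaks down quickly: it cannot tell that "up" should become "down" when the change is negative, and it fails when the same number appears twice with different meanings. Limitations like these are one reason to look at learned approaches instead.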
The result is a fully functioning system that generates news items from data supplied by the Prague Stock Exchange. “Our algorithms are now capable of generating seven basic kinds of news item, including headlines and captions, without the intervention of journalists, within a single second,” Moravec was quoted as saying in a press release issued by ČTK and published on the website of Charles University’s Department of Journalism.
The system should make it possible to publish routine news items much more quickly. “The generated texts will be loaded into the content management system and will be subjected to the same editorial processes as news items created from the outset by human journalists… The whole process should be speeded up considerably,” ČTK Technical Director Jan Kodera was quoted as saying by the same press release.
Jakub Sido is also confident that the latest experiment shows the way forward. “It will work and it will work well,” he told EJO.