Not long ago, print newspapers were filed in enormous folders, placed in metal cabinets and stored in the basements of newspaper offices. They gathered dust and were a fire hazard, but there were few other ways to store them. In the digital age, methods of archiving newspapers may be more sophisticated, but they are still imperfect and, according to new research, important online data is being lost.
Key questions about the way digital content can be preserved and archived have only recently been asked, four decades after newspapers began storing news digitally.
Kathleen A. Hansen and Nora Paul, of the University of Minnesota, looked at ten news organisations and their archiving processes. Nine of these organisations were legacy newspapers and one was digitally native.
Their report, “Newspaper archives reveal major gaps in digital age,” identified a number of problems in the way online records have been kept since digital archiving began in the 1990s.
In the legacy system the method of storing newspaper archives in folders, reference books or on microfilm was unsatisfactory. Content stored this way eventually deteriorates or can get lost or misfiled, especially when organisations close, merge, or change hands.
But digital storage seems to be just as problematic: neither legacy news organisations nor “digitally-born” ones have clear policies for archiving and preserving online resources, according to the University of Minnesota study. There is evidence that data has been lost, including a Pulitzer finalist’s 34-part investigative journalism series.
“Digital information itself has all kinds of advantages … except when it goes, it really goes,” said Jason Scott, an archivist and historian for the Internet Archive. “It’s gone gone. A piece of paper can burn and you can still kind of get something from it. With a hard drive or a URL, when it’s gone, there is just zero recourse.”
The University of Minnesota study had a number of key findings:
- Legacy newspapers have access to the print newspaper in multiple formats, available to the newsroom and also to the community.
- Since the early to mid-2000s, PDF versions of newspapers have been kept, usually held by an outside vendor. The problem is these are often not searchable.
- The researchers found archival shortcomings regardless of whether the newspaper was part of a chain, whether it was family-owned, or whether the newspaper had the same owner for decades or changed hands multiple times over the years.
- Library staff previously numbered 10 to 15 per organization; this has been reduced to an average of one or two people, who manage the digital database of the print product.
- What is archived internally is different from what the public can see. Only the internal newsroom archive captures what has been published in its most complete form.
- While most digital photo files started in the mid-1990s, originally only a notation would appear that a photo was part of the story. More recent archiving systems attempt to capture the photo and the metadata that goes along with it.
- At seven of the nine legacy organisations studied, print photographs are available to the public for a fee. However, only two of the nine legacy organisations allow the public to access their digital photo archives.
- Graphics files are another problem. They are stored on internal servers or by individual artists, and software changes and compatibility issues have made accessing older graphics impossible.
- Problems with content management systems and servers have sometimes made it impossible to access older versions of a website, resulting in loss of data.
- From the digital-born to the legacy websites, not one has a complete archive of its website. Backward compatibility with new content management systems and changes in Web hosting are problematic.
- Archiving of multimedia elements is sporadic, if not non-existent.
- Comments on stories by readers are not being archived, except in rare instances by an outside vendor for a particular purpose.
- Social media content isn’t being archived either. This includes work contributed to Facebook, Pinterest, Instagram, and Twitter.
- Legacy organizations are not archiving mobile content. Only the digital-born publication archives this content.
Digital preservation matters because newspapers play an important role in our society, politically and commercially, and have documented history since the nation’s founding. This rich history is at risk, given the current trajectory of digital news preservation. As time moves forward, digital data continues to be lost.
The Atlantic recently addressed this issue of digital preservation by examining how a Pulitzer Prize finalist’s 34-part series of investigative journalism articles vanished from the web. Read the story of one reporter’s work to resurrect his project.
Also important is the work of the Journalism Digital News Archive (JDNA) at the Donald W. Reynolds Journalism Institute. In response to the current state of the news industry, it established the world’s first digital curator of journalism position and explores the best practices to archive and access content and resources.
The article “Newspaper archives reveal major gaps in digital age” (pp. 390-398) appeared in the September 2015 special edition of the Newspaper Research Journal called “Capturing and Preserving the First Draft of History in the Digital Environment.” For the abstract of the article, go to: http://nrj.sagepub.com/content/36/3/290.short
pic credit: Flickr Creative Commons Leww_pics