Preservation: the construction of our digital continuity

The momentum and sheer number of Open Access initiatives in online publication have become an overwhelming driving force for digital continuity over the last few years, which makes it increasingly important to examine the problem of the long-term preservation of such content.

One of the first signs of the challenges and problems posed by digital preservation appeared in a report prepared in 1996 by the Research Libraries Group, a library consortium based in the USA, which merged with OCLC in 2006 and loaded its content into WorldCat. These initial reports and analyses focussed on ensuring that the materials in question could survive technical obsolescence and losses caused by errors in their management; that is to say, the materials would be managed throughout their life cycle so that they would remain accessible to those who required them. This was the task carried out by libraries and archives over the centuries, so it was perfectly natural that these initial reports, produced in the mid-1990s, recommended nothing more than transferring the traditions of the profession to the emerging information media, particularly the Internet.

These studies revealed new questions and challenges that went beyond cataloging per se. They led to the creation of appropriate metadata to improve access and retrieval, to the control of authentication procedures, and to the creation of audit records ensuring that the material would not be altered along the way by unauthorized people. The content would also need to be conserved in a way that allows it to be used and adapted for new purposes, which could well lead to the creation of new content.

The concept of “digital preservation” has developed and become more sophisticated, and is turning into a dynamic and comprehensive perspective known as “digital continuity”, which The National Archives (UK) defines as “the ability to use your information in the way that you need for as long as you need”.

The type of material to be preserved is also widening in scope. In addition to digital copies of analog and textual content, it now includes data from academic research, records of the administrative processes of government, educational resources in digital repositories, portals of Open Access digital journals, institutional data sets, economic and meteorological series, and so on. In particular, it encompasses everything that is produced and published on the Internet (born-digital) and should be preserved.

These circumstances define challenges that are not only technological but, more importantly, institutional: challenges that were not planned for, and professional competencies that were not commonly found in the marketplace. The central theme of this discussion is that normal library functions need to expand to take in activities related to the organization and manipulation of data and digital data sets, activities for which there were neither resources nor formal disciplines, much less sufficient experience.

A recent study of the job market, which looked at the knowledge, aptitudes and skills required for digital preservation projects, sets out the following as the three most important of a long list of requirements: (1) the ability to work in a highly technical environment; (2) command of standards and specifications; and (3) command of IT tools and applications.

Right from the start, researchers in this field noted a number of shortcomings in university libraries, in particular the scarcity of professionals with sufficient training to perform and support these activities. Fortunately, some five or six years ago, matters began to change on an international level. For those who are interested in this topic, a number of reference centers are listed below.

  • University of California Curation Center, created in 2010 in partnership with the 10 campuses of the University of California, is responsible for all digital matters, ranging from museums and libraries to research departments and individual researchers;
  • DigCCurr Program, based at the University of North Carolina at Chapel Hill, has been offering postgraduate level courses in Digital Curation since 2008;
  • Luleå University of Technology in Sweden offers Masters level courses in Digital Curation;
  • IFLA dedicated a session to training for digital preservation at its international conference held in Puerto Rico in 2011¹;
  • The International Journal of Digital Curation is an Open Access journal which deals specifically with the topic of digital preservation;
  • The most complete and extensive bibliography on this topic, containing more than 90 pages, was published by Charles W. Bailey² in 2012 and is available through Open Access.

The challenge of guaranteeing long-term preservation of, and access to, research results, especially data sets produced by publicly funded projects, is a topic of great importance in the USA. Its importance has become more marked since the introduction in Congress of the Fair Access to Science and Technology Research Act (FASTR) in February 2013, a bill under which federal agencies would have to develop a plan to make research results available through Open Access. Some 100 million dollars has been earmarked for this.

However, the current situation regarding institutions and professional resources is not so optimistic, according to a study published in November 2013 by the Council on Library and Information Resources. This study, known as DataRes, took two years to complete and had two objectives: (1) to analyze and document trends in data management planning and institutional research policies in response to federal requirements, and (2) to determine how information professionals can better respond to the emerging needs of managing research data in universities.

This study identified various barriers to effective data management.

a. Lack of financial resources

Institutions do not have adequate resources in their investment and expenditure plans to finance preservation programs.

b. Lack of organizational structures

Academic structures are slow to change. They rest largely on long-accepted, stereotyped notions of the functions they perform.

c. Lack of professional preparation

The DataRes project identified the lack of training, certification and other forms of professional preparation in research data management as a basic failing among members of academia. Even worse, and at a more fundamental level, no one in the academic world perceives that they have the professional responsibility or mandate to manage research data. The low priority that researchers give to managing research data was a recurring theme in the DataRes project. The explanation is that researchers are rewarded primarily for carrying out new research, not for managing the results of research done previously.

d. Lack of institutional mandates

There are no explicit and recognized mandates at the institutional level for the efficient management of research data. The production of data in the course of research activity has always been understood to be part of the research process, but the idea that these same researchers should take responsibility for sharing the accumulated data sets to promote broader research agendas is relatively new, and grew out of the experience of groups working on the Human Genome Project.

This lack of consensus results in the absence of institutional mandates and policies concerning the administration of this data. Without institutional mandates, whether research data is preserved in accessible ways or not preserved at all is left to chance, because the systematic administration of research data has yet to become an institutional priority.

Maintaining the infrastructure needed to guarantee the digital continuity of millions of journal articles is gradually becoming a task whose high cost goes beyond the means of individual institutions, the majority of universities, national libraries and all of the publishers in emerging economies. As a result, at least three strategies are taking shape in the market to address this problem.

  • Repositories maintained by large institutions, such as the Los Alamos National Laboratory Research Library (LANL) or the PANDORA repository of the National Library of Australia.
  • Repositories in the private sector, such as Portico in the USA, which operates under an annual subscription model and takes responsibility for the storage and retrieval of the data.
  • Cooperative repositories, such as LOCKSS at Stanford University and its associated network CLOCKSS. The SciELO Program uses CLOCKSS for the preservation of the SciELO Brazil journals.

At a recent congress on digital preservation in Barcelona³, Martin Halbert, director of libraries at the University of North Texas and president of MetaArchive (a system worth investigating), stated that the current problem is that we have no historical perspective on digital preservation and we lack best practices. He also stated that, to date, digital preservation is based primarily on the concept of “don’t put all of your eggs in one basket”. For example, MetaArchive preserves 6 copies of 15,000 theses in different places using the LOCKSS software.
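The “eggs in different baskets” idea can be illustrated with a minimal sketch. It is not the LOCKSS protocol and not MetaArchive's actual procedure; it only assumes that six hypothetical sites each hold a copy of the same file, that copies are compared by SHA-256 checksum, and that a copy which disagrees with the majority is replaced. The site names, the sample content and the function `audit_and_repair` are invented for the example.

```python
import hashlib
from collections import Counter


def sha256(data: bytes) -> str:
    """Return the SHA-256 hex digest, used here as a fixity value."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Compare digests across replicas and overwrite any copy that
    disagrees with the majority. Returns the names of repaired sites."""
    digests = {site: sha256(data) for site, data in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # Without a strict majority we cannot tell which copy is intact.
        raise ValueError("no majority among replicas")
    good_copy = next(
        data for site, data in replicas.items()
        if digests[site] == majority_digest
    )
    repaired = []
    for site, digest in digests.items():
        if digest != majority_digest:
            replicas[site] = good_copy  # replace the damaged copy
            repaired.append(site)
    return repaired


# Six hypothetical storage sites holding the same thesis file.
thesis = b"%PDF-1.4 example thesis content"
replicas = {f"site-{i}": thesis for i in range(1, 7)}
replicas["site-3"] = b"%PDF-1.4 exXmple thesis content"  # simulated damage

repaired_sites = audit_and_repair(replicas)
print(repaired_sites)
```

The point of the sketch is that several independent copies make damage both detectable and repairable: a single copy has nothing to be compared against, whereas a damaged copy among six is outvoted by the other five.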

Halbert also commented that he is not in favor of using cloud storage services and prefers to have files in different locations but on his own computers because he has no confidence in the companies that offer this service.

Digital continuity is a great challenge that all of us in the emerging economies face in maintaining our presence on the Web. An important position has certainly been achieved through the SciELO Program, which has been growing for the past 15 years. But each of our countries will need to take on, as part of its responsibilities, the preparation of trained professionals and the establishment of the required policies and technological infrastructures. The experience of SciELO, which has started to use CLOCKSS, can be very instructive for us all.

We must remain vigilant so that we may learn.

Notes

¹ World Library and Information Congress : 77th IFLA General Conference and Assembly. August 2011, San Juan, Puerto Rico. Available from: <http://conference.ifla.org/past/2011/education-and-training-section-with-preservation-and-conservation-information-techno.htm>.

² Bailey, C. W. Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works. 2012. Available from: <http://digital-scholarship.org/dcbw/dcb.htm>.

³ Aligning National Approaches to Digital Preservation: An Action Assembly. Biblioteca de Catalunya (National Library of Catalonia). November 2013. Available from: <http://www.educopia.org/events/ANADPII>.

References

The Commission on Preservation and Access and The Research Libraries Group. Report of the Task Force on Archiving of Digital Information. 1996. Available from: <http://www.clir.org/pubs/reports/pub63watersgarrett.pdf>.

The National Archives. Understanding digital continuity. 2011, version 1.2. Available from: <http://www.nationalarchives.gov.uk/documents/information-management/understanding-digital-continuity.pdf>.

H.R. 708: Fair Access to Science and Technology Research Act of 2013. 2013. 113th Congress. Available from: <https://www.govtrack.us/congress/bills/113/hr708/text>.

Council on Library and Information Resources. Research Data Management Principles, Practices, and Prospects. Nov. 2013. Available from: <http://www.clir.org/pubs/reports/pub160>.

KIM, J., WARGA, E., and MOEN, W.E. Competencies Required for Digital Curation: An Analysis of Job Advertisements. International Journal of Digital Curation. 2013, vol. 8, nº 1, pp. 66-83.

External links

University of California Curation Center – http://www.cdlib.org/services/uc3/

DigCCurr – http://ils.unc.edu/digccurr

Luleå University of Technology – http://www.ltu.se/edu/program/FMDBA?l=en

International Journal of Digital Curation – http://www.ijdc.net/

Portico – http://www.portico.org/

LOCKSS – http://www.lockss.org/

CLOCKSS – http://www.clockss.org/clockss/Home

MetaArchive – http://www.metaarchive.org/

 

About Ernesto Spinak

Collaborator on the SciELO program, a Systems Engineer with a Bachelor’s degree in Library Science, a Diploma of Advanced Studies from the Universitat Oberta de Catalunya (Barcelona, Spain) and a Master’s in “Sociedad de la Información” (Information Society) from the same university. He currently runs a consulting company that provides services in information projects to 14 government institutions and universities in Uruguay.

 

Translated from the original in Spanish by Nicholas Cop Consulting.

 

How to cite this post [ISO 690/2010]:

SPINAK, E. Preservation: the construction of our digital continuity [online]. SciELO in Perspective, 2014 [viewed ]. Available from: https://blog.scielo.org/en/2014/01/02/preservation-the-construction-of-our-digital-continuity/

 
