Principles for the citation of scientific data

The citation of bibliographic references to published works has been a common and required practice in scientific literature since time immemorial. Each academic discipline has defined its own way of doing this, and there are dozens of citation styles in existence, such as Vancouver, ISO 690, AP, Turabian and MLA.

However, in the last few years, thanks to the globalization of scientific e-journal publishing and the possibilities for communicating and exchanging information opened up by the Internet, discussions have begun on the need and importance of also recording and citing the datasets that underpin academic works. One of the fundamental premises of science as a healthy and robust discipline is that the results presented in a scientific paper must be reproducible. It is therefore both necessary and appropriate that the data used in the research be citable and accessible.

Good practice in science must assume that the data under consideration forms part of an ecosystem and that it should be accessible and reusable. In other words, in addition to recording bibliographic sources, scientific works should also provide access to the original data which underpins the research in question.

As early as 2007, Micah Altman and Gary King alerted the scientific community to the need to achieve this objective in a proposal published in D-Lib Magazine¹. Their paper set in motion a process that was soon taken up by the International Council for Science's Committee on Data for Science and Technology (CODATA/ICSU), and the initiative was approved at its 27th General Assembly, held in Cape Town, South Africa, in 2010². The resulting declaration identified a number of issues requiring the attention of the international scientific community, among them:

  • Guaranteeing interoperability and facilitating the reuse of data, irrespective of the formats used to record scientific activity (for example, flat-file, hierarchical, relational and XML databases).
  • Setting standards for the citation formats of source data.
  • Creating standards for recording the metadata that describe datasets.
  • Guaranteeing that citation practice incorporates data versioning, that is, that it keeps a record of the changes and additions made to the original dataset, since datasets, unlike documents, are essentially dynamic.
  • Allowing for the different levels of granularity that different scientific disciplines require in the digital objects registered as source data.
  • Communicating the roles played by the different stakeholders in the system, such as research groups, funding agencies and universities.
  • Guaranteeing the continued usefulness of data, and continued access to it over time, by keeping the costs of access and maintenance affordable to all parties.
  • Preserving the framework of intellectual property rights, whether under Creative Commons licenses or traditional copyright legislation.

The official declaration of 2010 was subsequently refined in various works, which served as the basis for the Joint Declaration of Data Citation Principles (DC1)³ recently approved by the FORCE11 working group, a group composed of researchers, librarians, archivists, publishers and research funding agencies interested in the future of scholarly communication and e-academia.

The declaration was signed by more than 80 of the world's leading scientific publishers, universities and institutions, including Elsevier, PLOS, ORCID, Nature Publishing Group, the Association of Research Libraries, BioMed Central and CrossRef.

The aim of this initiative is that, once a culture of citing source data is established, its benefits will start to become evident. Some of them are noted below.

  • Publishing infrastructure will maintain electronic references (links) to data over time, so that the data can be reused.
  • Electronic publication services will have controls to reduce the danger of researchers “stealing” data from others (data plagiarism).
  • The impact of datasets as well as of their creators will be measurable.
  • Researchers will be able to receive professional recognition just as they now do for traditional publications.

Maintaining data in a citable format is a responsibility that must be taken on by the scientific publisher, the distributor, or the institutional repository or archive of the institution where the research is carried out. In this context, the following factors should be considered:

  • Datasets will have to be recorded with sufficient metadata to describe them and to keep them accessible.
  • Using a persistent identifier, such as a DOI, is the recommended strategy.
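To make these two factors concrete, here is a minimal Python sketch of how a repository might assemble a human-readable data citation from a dataset's metadata and its DOI. The field set and ordering are an assumption, loosely modeled on the "Creator (Year). Title. Version. Publisher. Identifier" pattern that DataCite recommends; the class name, function name and sample DOI are hypothetical and are not part of the DC1 principles. The version field reflects the versioning concern raised in the 2010 declaration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Minimal descriptive metadata for a citable dataset (hypothetical field set)."""
    creators: List[str]
    year: int
    title: str
    version: str
    publisher: str
    doi: str  # persistent identifier without the resolver prefix, e.g. "10.1234/abc"


def format_citation(record: DatasetRecord) -> str:
    """Build a human-readable data citation following the assumed
    Creator (Year). Title. Version. Publisher. Identifier pattern."""
    if not record.creators:
        # Credit and attribution require at least one named creator.
        raise ValueError("at least one creator is required for attribution")
    creators = "; ".join(record.creators)
    # Writing the DOI as a resolvable URL keeps the citation actionable by machines.
    identifier = f"https://doi.org/{record.doi}"
    return (
        f"{creators} ({record.year}). {record.title}. "
        f"Version {record.version}. {record.publisher}. {identifier}"
    )


if __name__ == "__main__":
    # Hypothetical example record; the DOI below is illustrative only.
    example = DatasetRecord(
        creators=["Doe, J.", "Roe, R."],
        year=2014,
        title="Example survey dataset",
        version="2.1",
        publisher="Example Institutional Repository",
        doi="10.1234/example.5678",
    )
    print(format_citation(example))
```

Keeping the metadata in a structured record, rather than as a free-text string, means the same record could also be exported in a machine-readable form, which is in the spirit of citations being understood by both humans and machines.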

The eight guiding principles approved in the recent DC1 declaration cover the purposes, functions and attributes of citations, and recognize the need to create practices that are understood by humans as well as machines.

These principles are: (1) Importance; (2) Credit and Attribution; (3) Evidence; (4) Unique Identification; (5) Access; (6) Persistence; (7) Specificity and Verifiability; (8) Interoperability and Flexibility.

A detailed explanation of these principles can be found on the FORCE11 site³.


As stated previously in this post, the benefits to the scientific community of the citation of research data include:

  • Establishing research data as a legitimate, citable output within the paradigm of science.
  • Allowing third parties to verify results, facilitating the replication of experiments and the reuse of data in future studies.
  • Enabling usage metrics to be evaluated so that credit can be assigned, as is done for conventional publications.

SciELO plans to adopt the Data Citation Principles in the future as part of its process for managing research results.

If your institution wishes to support this initiative, please register at:

We recommend that readers view the comprehensive and informative SlideShare presentation titled Joint Declaration of Data Citation Principles (Overview)⁴.


1 ALTMAN, M., KING, G. A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine. 2007, vol. 13, nº3-4. ISSN 1082-9873. Available from:

2 Data Citation Standards and Practices: The need for robust data citation capabilities. CODATA/ICSU. 2010. Available from:

3 Joint Declaration of Data Citation Principles. FORCE11. 2014. Available from:

4 The Data Citation Synthesis Group. “Joint data citation principles slide set v2”. In: Joint Declaration of Data Citation Principles (Overview). 17 slides. Available from:


BALL, A., DUKE, M. Data Citation and Linking. DCC Briefing Papers. Edinburgh: Digital Curation Centre, 2012. Available from:

CODATA/ITSCI Task Force on Data Citation. Out of cite, out of mind: The Current State of Practice, Policy and Technology for Data Citation. Data Science Journal. 2013, vol. 12, nº1-75. DOI: 10.2481/dsj.OSOM13-043. Available from:

Data citation endorsements. FORCE11. 2014. [viewed September 20th 2014] Available from:

External Links


FORCE11


About Ernesto Spinak

Collaborator on the SciELO program, a Systems Engineer with a Bachelor's degree in Library Science, a Diploma of Advanced Studies from the Universitat Oberta de Catalunya (Barcelona, Spain) and a Master's in "Sociedad de la Información" (Information Society) from the same university. Currently runs a consulting company that provides services for information projects to 14 government institutions and universities in Uruguay.


Translated from the original in Spanish by Nicholas Cop Consulting.


How to cite this post [ISO 690/2010]:

SPINAK, E. Principles for the citation of scientific data [online]. SciELO in Perspective, 2015 [viewed ]. Available from:

