By Ernesto Spinak and Abel L. Packer
Scientific Data¹ is a new type of online open access journal created specifically to describe scientifically valuable datasets. This is how the Nature Publishing Group (NPG) announces this innovative initiative on its website. The journal will begin publishing in May 2014.
The Scientific Data project aims to promote the open documentation, exchange and reuse of the data that underpins research, in order to accelerate the pace of scientific discovery. To achieve this, it introduces a new type of metadata known as the “Data Descriptor”. This metadata could meet a need increasingly voiced by researchers, funding agencies, learned societies, journal publishers and indexers: how to make scientific data publicly available, citable, reusable and reproducible, and how to provide peer-review mechanisms that guarantee quality and ensure compliance with the standards required by the scientific community.
One of the action lines defined by the SciELO Program is to promote and establish solutions for organizing, describing, publishing and indexing the data resulting from the academic research published in SciELO journals, so as to increase the visibility and impact of that research and its related articles. To this end, the SciELO Secretariat has been working on a proposal to create SciELO Data, compatible with international solutions for indexing and publishing research data. It will be implemented on the same model as other SciELO components, that is, as a network of collections of datasets from the research published in SciELO journals. This action line was one of the topics addressed at the SciELO 15 Years Conference, held last October.
One issue on which there is consensus within this movement is the need to ensure data interoperability.
The Research Data Alliance (RDA) is an ongoing initiative seeking solutions for the description and interoperability of data. It was launched in March 2013 with the explicit purpose of improving data exchange. Since then, the alliance has worked through Working Groups and Interest Groups, which are responsible for defining solutions to overcome the barriers to data sharing. Drawing on its multidisciplinary experience, RDA is developing both “building blocks” for common infrastructure and specific “data bridge” solutions.
In addition to following these initiatives, SciELO is participating in the FAIRPORT project, which proposes to establish an open solution for running metadata and interoperability services. FAIRPORT held its first international meeting in Leiden from 13 to 16 January 2014. The results will be published shortly, and we will report on them on this blog.
As far as interoperability is concerned, Scientific Data is founded on six key principles, in line with recent advances and initiatives in the communication of scientific data:
- Enable open access datasets, through citation and citation-index mechanisms, to give appropriate credit and acknowledgement to authors whose contributions are not otherwise recorded in traditional journal articles.
- Standardized description of data in Data Descriptors, enabling the retrieval, interpretation and preservation of data and facilitating its reuse in subsequent research by independent teams.
- A community-based peer-review system, provided by NPG, to ensure the quality and preservation of the descriptors.
- Standardized Data Descriptors that allow uniform search and retrieval interfaces, and a certified system of links between data repositories and articles published in related journals.
- Publication under one of the Creative Commons (CC) 3.0 licenses, permitting open use of the data and the creation of derivative works.
- NPG's technology will ensure that the content is compatible with the main current data repositories, such as Figshare and Dryad (URLs at the end of this article).
Data Descriptor metadata add a layer of description that traditional journal articles lack. This metadata includes information on the origin and creation of datasets, the steps of the experiment, and how the datasets link to other datasets. Data Descriptors will also be associated with articles published in a wide range of journals, not only those published by NPG. They will be available open access under CC 3.0 licenses and will be published on payment of an article processing charge (APC) by the author.
The Scientific Data project will begin publishing datasets in May 2014. They will be peer reviewed, processed and stored in repositories associated with the project, which to date are Data Dryad, Biosharing, Figshare and ISA-Tools. The disciplines currently covered are the life sciences, biomedical sciences and environmental sciences; researchers in other disciplines should consult the journal before submitting data.
Data Descriptors will be citable and, in the near future, will be included in PubMed, Scopus, WoS (Web of Science) and other major indexing services. By 2016, metrics measuring the influence and impact of datasets are expected to be produced from this data. SciELO, Google Scholar, CrossRef and other indexes will certainly take part in indexing, searching and ensuring the interoperability of these datasets.
To most authors, the structure of the metadata as presented by Scientific Data may seem somewhat complex, since it is meant to be processed by computer. However, authors do not need to be familiar with the details of the specification, because the metadata can be generated automatically by the Scientific Data project's own tools once the data has passed peer review. Advanced users can, in any case, create their own Data Descriptors following the metadata specifications, which will be publicly available.
The Scientific Data project site also offers the following sections:
- Submission guidelines, with detailed information to help authors format and submit a manuscript from which the Data Descriptor is generated².
- Editors and Advisory Panel, composed of 26 expert representatives from academia, data repositories and funding agencies³.
- Editorial Board, composed of more than 70 experts from the scientific fields covered by the initiative, who will peer review submitted data⁴.
- Instructions to authors⁵.
- Guide to referees⁶.
- Publication fees (APC)⁷.
- Examples of Data Descriptor structures⁸.
Given the exponential growth in the data produced by research laboratories, and the increasing requirements that funding agencies impose regarding the preservation, reuse and interoperability of data, it is important that laboratories, research groups and individual researchers start planning how to manage their datasets in a manner consistent with the standardized solutions now being developed. Publishing Data Descriptors will allow authors to meet a significant part of the data management plans required by funders and will provide demonstrable evidence that those plans are being followed.
Research data communication will be a priority topic on the SciELO Blog, with contributions from the different stakeholders. Stay tuned and take part with your posts and comments.
Notes
¹ Scientific Data – http://www.nature.com/scientificdata/
² Submitting experimental metadata – http://www.nature.com/scientificdata/for-authors/submission-guidelines/#metadata
³ Editors and Advisory Panel – http://www.nature.com/scientificdata/editors-and-advisory-panel/
⁴ Editorial Board – http://www.nature.com/scientificdata/editorial-board/
⁵ For Authors – http://www.nature.com/scientificdata/for-authors
⁶ Guide to referees – http://www.nature.com/scientificdata/guide-to-referees/
⁷ Open Access – http://www.nature.com/scientificdata/open-access/
⁸ Sample Data Descriptors – http://www.nature.com/scientificdata/for-authors/sample-data-descriptors/
External links
Lorentz Center – http://www.lorentzcenter.nl/lc/web/2014/602/info.php3?wsid=602&venue=Snellius
Scientific Data – http://www.nature.com/scientificdata/
Research Data Alliance – https://rd-alliance.org/
Data Dryad – http://datadryad.org/
Biosharing – http://biosharing.org/
Figshare – http://figshare.com/
ISA-Tools – http://isa-tools.org/
Fairport 1st Meeting – http://www.lorentzcenter.nl/lc/web/2014/602/info.php3?wsid=602&venue=Snellius
About Ernesto Spinak
Collaborator on the SciELO Program; Systems Engineer with a Bachelor’s degree in Library Science, a Diploma of Advanced Studies from the Universitat Oberta de Catalunya (Barcelona, Spain) and a Master’s in “Sociedad de la Información” (Information Society) from the same university. He currently runs a consulting company that provides information project services to 14 government institutions and universities in Uruguay.
Translated from the original in Spanish by Nicholas Cop Consulting.
How to cite this post [ISO 690/2010]: