Reproducibility of research results: on-going initiatives

The topic of the reproducibility of research results has continued to command the attention of the academic community and of society in general in recent years. The premise that science corrects itself by building on the reproduction of previous studies continues to be called into question in view of the growing body of evidence to the contrary.

The low reliability of research, however, does not appear to be connected primarily with scientific misconduct. Rather, it stems from issues such as the inadequate training of technical staff and researchers, the incentives and rewards attached to publishing positive results in high-impact journals, and the emphasis on ambitious claims that the actual outcomes do not justify. Poorly designed experimental protocols, small sample sizes and inadequate statistical tests have also been pinpointed as responsible for the low reproducibility of research.

These arguments, amongst others, led to the creation in 2012 of the online platform Reproducibility Initiative, whose aim is to facilitate communication between researchers and pharmaceutical companies for the validation and acknowledgement of reproducible, high-quality results from already published pre-clinical trials. The platform works through the Science Exchange network, which brings together more than two thousand laboratories in some 400 research institutions that reproduce the experiments. Submissions and the results obtained are confidential, and once completed and validated they earn the certification “independently validated”. Similar initiatives, such as Science Check, are contributing to this effort by validating research results in the clinical field and ensuring that the supplementary data are available in open access repositories accessible to all.

In view of this worrying scenario, the questions that arise are: Who is responsible? Why is it happening? How can it be stopped?

The chief researchers responsible for laboratories share in the kudos of the discoveries that are made – especially when they are co-authors of the papers concerned – and must be held equally accountable for the accuracy of the results, their reproducibility, and the statistical processing of the data, even if the experimental work is carried out by experienced researchers such as those working at the post-doctoral level. Some institutions, such as the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany, limit the size of their laboratories so that the chief researcher does not have more people than can be monitored adequately. Where this is not done, an appropriate hierarchical structure must be created instead to ensure better control of the results obtained.

From the journals’ point of view, providing space for freely available online commentaries is extremely useful, as is requiring that source data be presented to editors and peer reviewers when requested, along with encouraging the publication of negative results.

The field of psychology is facing the same credibility crisis in its published results, according to the paper written by Yong. A study conducted in 1959 by the statistician Theodore Sterling concluded that 97% of the psychology studies published in the four major journals in the field reported statistically significant positive results. When the author repeated the study in 1995, he obtained the same result. One explanation for the plethora of positive results in psychology is that high-impact journals in this discipline favor the publication of groundbreaking, interesting and even eye-catching articles, which by necessity require positive results. The desire on the part of authors to obtain positive results leads them to design experiments – experimental protocol, number of samples, and other details – in such a way as to make it difficult, if not impossible, for another study to reproduce them. According to Brian Nosek (apud Yong, 2012), a social psychologist at the University of Virginia, in his discipline, “To show that ‘A’ is true, you don’t do ‘B’. You do ‘A’ again.” In his article, Yong reproduces a chart taken from Fanelli which reveals the tendency to publish positive results across 18 fields of knowledge. The so-called “Hard Sciences” (Space Sciences, Geosciences, Physics and Computer Science) are amongst those that publish the smallest proportion of positive results; Molecular Biology and Genetics, Biology and Biochemistry, Chemistry, and Economics fall somewhere in the middle; and Psychology and Psychiatry, Clinical Medicine and Materials Science top the list of subjects that publish the most positive results. The author correlates this rate with the greater proportion of irreproducible trials carried out in these fields.

Much research in the clinical field uses animal subjects to model surgical techniques, the effects of drugs and other interventions in human beings before clinical trials are conducted on actual humans. The rationale for using animal models is clear, and many patients do, in fact, benefit from research carried out with new drugs. However, the quality of experiments carried out on animals must be improved.

The most reliable studies using animal models of human diseases are those that employ randomization of the sample, to eliminate systematic differences between groups; blinding, that is, inducing the condition under study without knowing whether the animal will receive the drug being tested; and blind evaluation of the outcomes. However, what is observed is that at most one in three publications follows these protections against bias, which indicates that authors, peer reviewers and editors attach little importance to them. As noted earlier, sample size directly influences the reliability of the results, yet few studies report a sample-size calculation (a simple example of such a calculation is sketched below). The tendency to publish only positive results, potentially of interest to projects in the pharmaceutical industry, is also present in this area.
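To illustrate the kind of sample-size calculation that so few animal studies report, the sketch below is a minimal, hypothetical example in Python, not taken from any of the studies cited here. It uses the standard normal-approximation formula for comparing two groups, n = 2((z₁₋α/₂ + z₁₋β)/d)² per group, where d is the standardized effect size; the function name and the numbers chosen are purely illustrative.

```python
# Minimal sketch of an a priori sample-size calculation for a two-group
# comparison (e.g. treated vs. control animals), using the normal
# approximation n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2 per group.
from math import ceil
from scipy.stats import norm

def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Animals needed per group to detect a standardized effect size
    with a two-sided test at significance level alpha and the given power."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return ceil(n)

# Illustrative values: a moderate effect (d = 0.5) at alpha = 0.05 and 80% power
# requires about 63 animals per group -- far more than many studies actually use.
print(sample_size_per_group(0.5))  # -> 63
```

Reporting a calculation of this kind explicitly is one of the items that reporting guidelines such as ARRIVE, described below, ask authors to document.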

In order to improve the reliability of tests using animals, the ARRIVE initiative – Animal Research: Reporting of In Vivo Experiments – was created in 2010; it publishes guidelines for reporting scientific experiments performed on animals and has been endorsed by the Nature Publishing Group, amongst others. It is expected that these recently created guidelines will positively influence researchers and help change the culture of the search for positive and immediate results.

More recently, the National Institutes of Health of the United States (NIH) launched measures to improve the reproducibility and transparency of research results, with emphasis on the proper design of research protocols. These include promoting the mandatory training of an institution’s researchers, producing best-practice guidelines to be made available on the NIH website, and creating checklists to ensure a systematic evaluation of applications for research funding. The NIH is also studying ways to increase the transparency of the data underlying submissions for publication, as well as the creation of a repository, the Data Discovery Index (DDI), for depositing primary research data. This platform envisages that if an author uses the primary data obtained by another researcher, that researcher will be cited, thus creating a new metric for scientific contributions that is not tied to journals.

In December 2013, the NIH launched PubMed Commons, an online forum for comments on published articles. Authors who have publications in PubMed can write comments on articles indexed in this database, as well as read the comments of their colleagues. Currently, more than 2,000 authors are registered and have posted more than 700 comments.

Undoubtedly, reproducibility is not a topic that the NIH can tackle on its own. The scientific community, publishers, universities, professional associations, industry and society as a whole are being invited to take part in this discussion, whose aim is to restore the reliability of research results. According to the NIH authors, the most crucial point is a shift in attitude in the academic incentive system, which currently puts a premium on publications in journals with a high Impact Factor when awarding research funds and promoting researchers, despite recent initiatives discouraging this practice.

The scenario of low reproducibility demonstrated in the examples above confirms the unwritten rule shared among academic and commercial researchers that at least 50% of published studies, even those in journals with a high Impact Factor, cannot be reproduced by other laboratories – and sometimes not even in the same laboratory, with the same equipment, the same people and the same experiments. As already mentioned, countless factors have been suggested to explain the low reliability of results, such as insufficient sample sizes, incorrect or inadequate statistical analysis, and the competition among researchers and institutions for positive and groundbreaking results, all of which push toward confirming a hypothesis even when many unpublished results contradict it.

According to Prinz et al., publishers and peer reviewers are in no position to repeat the experiments and get to the bottom of the results presented; thus, many errors go undetected. In addition, works rejected by one journal end up being published by others without any significant changes or improvements.

References

BAKER, M. Independent labs to verify high-profile papers. Nature. 14 August 2012. Available from: <http://www.nature.com/news/independent-labs-to-verify-high-profile-papers-1.11176>.

COLLINS, F.S., and TABAK, L. A. Policy: NIH plans to enhance reproducibility. Nature. 27 January 2014. Available from: <http://www.nature.com/news/policy-nih-plans-to-enhance-reproducibility-1.14586>.

Declaração recomenda eliminar o uso do Fator de Impacto na Avaliação de Pesquisa. SciELO em Perspectiva. [viewed 04 February 2014]. Available from: <http://blog.scielo.org/blog/2013/07/16/declaracao-recomenda-eliminar-o-uso-do-fator-de-impacto-na-avaliacao-de-pesquisa/>.

Editorial. Further confirmation needed. Nature Biotechnology. 2012. Available from: <http://www.nature.com/nbt/journal/v30/n9/full/nbt.2335.html>.

Editorial. Must try harder. Nature. 2012. Available from: <www.nature.com/nature/journal/v483/n7391/full/483509a.html>.

FANELLI, D. “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE. 2010. Available from: <http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0010068>.

IOANNIDIS, J. P. Why most published research findings are false. PLoS Med. 2005. Available from: <http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124>.

KILKENNY, C. et al. Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research. PLoS Biology. 2010. Available from: <http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000412>.

MACLEOD, M. Why animal research needs to improve. Nature. 29 September 2011. Available from: <http://www.nature.com/news/2011/110928/full/477511a.html>.

PRINZ, F., et al. Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews. 2011. Available from: <http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html>.

PubMed Commons: NLM lança versão piloto que permite comentários abertos sobre artigos. SciELO em Perspectiva. [viewed 04 February 2014]. Available from: <http://blog.scielo.org/blog/2013/12/20/pubmed-commons-nlm-lanca-versao-piloto-que-permite-comentarios-abertos-sobre-artigos/>.

Reproducibility of research results: a subjective view. SciELO in Perspective. [viewed 25 February 2014]. Available from: <http://blog.scielo.org/en/2014/02/19/reproducibility-of-research-results-a-subjective-view/>.

Reproducibility of research results: the tip of the iceberg. SciELO in Perspective. [viewed 28 February 2014]. Available from: <http://blog.scielo.org/en/2014/02/27/reproducibility-of-research-results-the-tip-of-the-iceberg/>.

YONG, E. Replication studies: Bad copy. Nature. 17 May 2012. Available from: <http://www.nature.com/news/replication-studies-bad-copy-1.10634>.

External links

Reproducibility Initiative – http://www.reproducibility.org/

Science Check – http://www.sciencecheck.org/

Data Discovery Index – DDI – http://grants.nih.gov/grants/guide/rfa-files/RFA-HL-14-031.html

PubMed Commons – http://www.ncbi.nlm.nih.gov/pubmedcommons/

 

About Lilian Nassi-Calò

Lilian Nassi-Calò studied chemistry at Instituto de Química – USP, holds a doctorate in Biochemistry from the same institution and did post-doctoral work as an Alexander von Humboldt fellow in Wuerzburg, Germany. After her studies, she was a professor and researcher at IQ-USP. She also worked as an industrial chemist and is currently Coordinator of Scientific Communication at BIREME/PAHO/WHO and a collaborator of SciELO.

 

Translated from the original in Portuguese by Nicholas Cop Consulting.

 

How to cite this post [ISO 690/2010]:

NASSI-CALÒ, L. Reproducibility of research results: on-going initiatives [online]. SciELO in Perspective, 2014 [viewed ]. Available from: https://blog.scielo.org/en/2014/03/07/reproducibility-of-research-results-on-going-initiatives/

 
