Reproducibility in research results: the challenges of attributing reliability

By Lilian Nassi-Calò

One of the pillars of scientific research is the combination of the trustworthiness of scientists and the reliability of results, which, in turn, support the hypotheses being tested. Like the lack of ethics in experimentation and scientific publishing, the lack of reproducibility is considered a serious failure that helps to jeopardize the credibility of science as a whole.

Studies indicate, however, that more than half of the experiments involving clinical trials of new drugs and treatments are irreproducible. John Ioannidis of Stanford University, US, goes further, stating that most published research findings are actually false. Ioannidis is the author of a mathematical model predicting that the smaller the sample and the less stringent the experimental methodology, definitions, outcomes, and statistical analysis, the greater the probability of error. Furthermore, studies involving financial or other interests, or in high-impact fields, are also more prone to false results.
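The core relationship in Ioannidis's model can be sketched numerically. The snippet below is an illustrative sketch, not taken from the article: it computes the positive predictive value (PPV) of a "significant" finding as a function of statistical power and the prior odds that a tested hypothesis is true, showing how lower power (e.g., from smaller samples) makes a significant result less likely to be true. The numbers chosen are assumptions for illustration only.

```python
def ppv(power, prior_odds, alpha=0.05):
    """Positive predictive value of a statistically significant finding.

    PPV = (power * R) / (power * R + alpha), where R is the pre-study
    odds that the tested hypothesis is true and alpha is the
    significance threshold (here the conventional 0.05).
    """
    return (power * prior_odds) / (power * prior_odds + alpha)

# Smaller samples mean lower power; with prior odds R = 0.25 (illustrative),
# the chance that a significant finding is actually true falls accordingly.
for power in (0.8, 0.5, 0.2):
    print(f"power={power:.1f}  PPV={ppv(power, prior_odds=0.25):.2f}")
```

With these illustrative numbers, PPV drops from 0.80 at high power to 0.50 at low power, which is the qualitative point of the model: weak designs make "significant" results unreliable even before any bias is considered.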

The “science hierarchy” that places the Exact Sciences at the top, the Humanities at the base, and the Biological Sciences between them is over 200 years old, says Daniele Fanelli1, then a researcher at the University of Edinburgh, UK (currently a senior researcher at Stanford University, US). His study, based on an analysis of over two thousand articles across all disciplines, correlates the areas of knowledge, the proportion of positive results, and the reliability of results according to the rigor employed in testing the authors’ hypotheses. His results, however, vindicate the Social Sciences against the frequent argument that they are a rather subjective field: Fanelli shows that when the scientific method is applied, their degree of reliability lies close to that of the Natural Sciences.

Into this controversial scenario comes a study called the Reproducibility Project: Psychology2, which aimed to evaluate the reproducibility of 100 research articles in Psychology. Started in 2011 and completed in 2015, it was motivated by allegations of fraud and flawed statistical analyses in classic psychology studies. The results, reported in Nature in 20153, show that only 39 of the studies could be reproduced. These results, however, are not absolute, and there are several nuances ranging from “virtually identical” to “somewhat similar” and “not at all similar”. Among the 61 studies that failed to replicate, the scientists classified 24 as presenting results “moderately similar” to the original experiment, which were nonetheless rejected for not achieving statistical significance, a necessary criterion for a replication to be considered successful.

This result might lead to the conclusion that Psychology is not a reproducible science. However, areas such as Cancer Biology and studies on new drugs have even lower reproducibility rates, according to Fanelli, who considers the result of the Psychology study quite acceptable. The teams that ran the replication tests did not always have access to the same experimental conditions, and certainly not to the same participants as the original studies. This certainly contributes to the low reproducibility of studies.

Brian Nosek, a social psychologist and head of the Center for Open Science in the US, led the Reproducibility Project and worked directly with about 270 collaborators in the replication of psychology studies. Like Fanelli, he reported to Nature4 that this study offers no safe way to state whether a given article is reliable or not. It may be that the original or the replication is flawed, or that there are substantial differences between them that prevent a proper assessment. Nosek stresses that the goal of the Reproducibility Project is not simply to vouch for how many articles are reliable, but to warn about the publication of results that do not stand up to closer scrutiny and to quantitatively evaluate the bias present in psychology publications. He believes that devoting even 3% of research funds to evaluations of this nature would make a huge difference.

As in other disciplines, it is known that methodological rigor and statistical significance are not all that is at stake when an article is approved for publication. Journals want to attract their readers’ attention and preferentially publish positive or controversial results, sometimes validated by statistical tests handpicked to meet the authors’ needs. This is particularly common in biomedicine, where a similar initiative, the Reproducibility Project: Cancer Biology, is under way. It is worth noting that in mid-2015 it faced difficulties accessing the original data of the reviewed studies, and it has not yet been completed.

Researchers in psychology, however, revisited the Reproducibility Project: Psychology and concluded that there is insufficient evidence to doubt the credibility of the publications, according to Daniel Gilbert, a psychologist at Harvard University, USA, and one of the authors of the reassessment recently published in Science5. A response6 published in the same issue of the journal, however, disputes the reassessment, claiming that it is based on selective assumptions.

Gilbert defends the reliability of psychology studies and claims that they are as reproducible as those of any other area. Moreover, in his view, the percentage of results considered confirmed by the Reproducibility Project (39%) is of the same order of magnitude as would be expected by chance, even if the original studies were true. Analyzing the project’s experimental protocols, it is clear that each study was replicated only once, yielding low statistical power to confirm or refute the original results. In fact, an article published in February this year in PLoS7 reassesses the Project’s statistical tests and concludes that about one third of the replications are inconclusive.
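The point about single replications can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative and not drawn from the article: using a normal approximation for a two-sided, two-sample test, it estimates the probability that one replication of a genuinely true effect reaches p < .05, i.e., its statistical power. The effect sizes and sample sizes are assumptions chosen for illustration.

```python
import math

def replication_power(d, n_per_group, alpha_z=1.96):
    """Approximate power of a two-sided two-sample test (normal approximation).

    d           -- standardized effect size (Cohen's d) assumed true
    n_per_group -- sample size in each of the two groups
    alpha_z     -- critical z value for the significance threshold (1.96 ~ p < .05)
    """
    z = d * math.sqrt(n_per_group / 2) - alpha_z
    # Phi(z): standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A single replication with 50 participants per group:
print(round(replication_power(d=0.4, n_per_group=50), 2))  # moderate true effect
print(round(replication_power(d=0.2, n_per_group=50), 2))  # small true effect
```

Under these assumptions, a lone replication of a moderate true effect succeeds only about half the time, and a small true effect fails far more often than it succeeds, which is why a single non-significant replication cannot, by itself, refute an original finding.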

The controversy over attempts to test and certify the reproducibility of scientific studies is viewed with optimism by Nosek and other scientists, who point out that transparency in scientific methodology and statistical verification of results is crucial in every study. As for the contesting study by Gilbert and colleagues, Nosek believes it cannot be regarded as definitive.

An article on reproducibility authored by David Allison, of the Department of Biostatistics, School of Public Health, University of Alabama, USA, and colleagues was published in Nature in February this year8. In the essay, the authors assess how science is subject to errors and to what extent it self-corrects. Although many fraudulent articles or flawed methodologies do come to light and are exposed, this, unfortunately, is not the general rule. “To consult a statistician after an experiment is finished is like performing an autopsy: he can perhaps say what the experiment died of,” remarked the statistician Ronald Fisher, who died in 1962. In the authors’ opinion, post-publication reviews are also post-mortems, because they attest that studies were conducted with flawed methodology and validated by equally flawed statistical tests, but little can be done at that stage.

In addition to psychology, studies in economics are also being evaluated for reproducibility. An article published in Science9 in early March reported a project to replicate 18 economics studies published in two renowned journals between 2011 and 2014. The researchers found that 11 studies could be reproduced, a number that rose to 14 when different criteria were used to evaluate reproducibility.

According to Nosek, these results do not necessarily indicate that studies in economics are more reproducible than those in psychology, mainly because the number of studies in the former case was smaller and concentrated on simple relationships. In the opinion of John Bohannon, correspondent and contributor to Science, most of the studies that could not be reproduced had used a p-value below 5% as their threshold for statistical significance. According to the author, although many are aware of the fragility of this test, few are willing to discuss it. Some authors whose results could not be replicated stated that the study methodology was careful, correct, and transparent, but do not agree that this means the original result was a false positive: “We believe it is more accurate to interpret the failure to replicate our study as a treatment failure”10.

Researchers who were not involved in any of the reproducibility projects believe that these divergent results are inherent to the social sciences, because the populations studied are very heterogeneous. The proposed solution, which applies to all areas of knowledge, would be to base conclusions on findings from multiple studies on the same subject, in order to enhance their credibility.

Notes

1. FANELLI, D. “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE. 2010, vol. 5, nº 4, e10068. DOI: 10.1371/journal.pone.0010068. Available from: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0010068

2. Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015, vol. 349, nº 6251, aac4716. DOI: 10.1126/science.aac4716. Available from: http://osf.io/ezcuj/wiki/home/

3. BAKER, M. First results from psychology’s largest reproducibility test. Nature. 2015. DOI: 10.1038/nature.2015.17433. Available from: http://www.nature.com/doifinder/10.1038/nature.2015.17433

4. BAKER, M. Over half of psychology studies fail reproducibility test. Nature. 2015. DOI: 10.1038/nature.2015.18248. Available from: http://www.nature.com/doifinder/10.1038/nature.2015.18248

5. GILBERT, D.T., et al. Comment on “Estimating the reproducibility of psychological science” Science. 2016. vol. 351, nº 6277, pp. 1037. DOI: 10.1126/science.aad7243. Available from: http://science.sciencemag.org/content/351/6277/1037.2

6. ANDERSON, C.J., et al. Response to Comment on “Estimating the reproducibility of psychological science”. Science. 2016, vol. 351, nº 6277, pp. 1037. DOI: 10.1126/science.aad9163. Available from: http://dx.doi.org/10.1126/science.aad9163

7. ETZ, A. and VANDEKERCKHOVE, J. A Bayesian Perspective on the Reproducibility Project: Psychology. PLoS ONE 2016, vol. 11, nº 2, e0149794. DOI: 10.1371/journal.pone.0149794.

8. ALLISON, D.B., et al. Reproducibility: A tragedy of errors. Nature. 2016, vol. 530, nº 7588, pp. 27-29. DOI: 10.1038/530027a. Available from: http://www.nature.com/news/reproducibility-a-tragedy-of-errors-1.19264

9. CAMERER, C.F. et al. Evaluating replicability of laboratory experiments in economics. Science. 2016, vol. 351, nº 6280, pp. 1433-1436. DOI: 10.1126/science.aaf0918. Available from: http://science.sciencemag.org/content/351/6280/1433

10. BOHANNON, J. About 40% of economics experiments fail replication survey. Science. 2016. DOI: 10.1126/science.aaf4141. Available from: http://www.sciencemag.org/news/2016/03/about-40-economics-experiments-fail-replication-survey

References

ALLISON, D.B., et al. Reproducibility: A tragedy of errors. Nature. 2016, vol. 530, nº 7588, pp. 27-29. DOI: 10.1038/530027a. Available from: http://www.nature.com/news/reproducibility-a-tragedy-of-errors-1.19264

ANDERSON, C.J., et al. Response to Comment on “Estimating the reproducibility of psychological science”. Science. 2016, vol. 351, nº 6277, pp. 1037. DOI: 10.1126/science.aad9163. Available from: http://dx.doi.org/10.1126/science.aad9163

BAKER, M. First results from psychology’s largest reproducibility test. Nature. 2015. DOI: 10.1038/nature.2015.17433. Available from: http://www.nature.com/doifinder/10.1038/nature.2015.17433

BAKER, M. Over half of psychology studies fail reproducibility test. Nature. 2015. DOI: 10.1038/nature.2015.18248. Available from: http://www.nature.com/doifinder/10.1038/nature.2015.18248

BEGLEY, C.G. and ELLIS, L.M. Drug development: Raise standards for preclinical cancer research. Nature. 2012, vol. 483, 7391, pp. 531-533. DOI: 10.1038/483531a

BOHANNON, J. About 40% of economics experiments fail replication survey. Science. 2016. DOI: 10.1126/science.aaf4141. Available from: http://www.sciencemag.org/news/2016/03/about-40-economics-experiments-fail-replication-survey

CAMERER, C.F. et al. Evaluating replicability of laboratory experiments in economics. Science. 2016, vol. 351, nº 6280, pp. 1433-1436. DOI: 10.1126/science.aaf0918. Available from: http://science.sciencemag.org/content/351/6280/1433

ETZ, A. and VANDEKERCKHOVE, J. A Bayesian Perspective on the Reproducibility Project: Psychology. PLoS ONE 2016, vol. 11, nº 2, e0149794. DOI: 10.1371/journal.pone.0149794.

FANELLI, D. “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE. 2010, vol. 5, nº 4, e10068. DOI: 10.1371/journal.pone.0010068. Available from: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0010068

GILBERT, D.T., et al. Comment on “Estimating the reproducibility of psychological science” Science. 2016. vol. 351, nº 6277, pp. 1037. DOI: 10.1126/science.aad7243. Available from: http://science.sciencemag.org/content/351/6277/1037.2

IOANNIDIS, J. P. Why most published research findings are false. PLoS Med. 2005. DOI: 10.1371/journal.pmed.0020124. Available from: http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124.

NASSI-CALÒ, L. Reproducibility of research results: a subjective view. SciELO in Perspective. [viewed 06 March 2016]. Available from: http://blog.scielo.org/en/2014/02/19/reproducibility-of-research-results-a-subjective-view/

NASSI-CALÒ, L. Reproducibility of research results: the tip of the iceberg. SciELO in Perspective. [viewed 06 March 2016]. Available from: http://blog.scielo.org/en/2014/02/27/reproducibility-of-research-results-the-tip-of-the-iceberg/

Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015, vol. 349, nº 6251, aac4716. DOI: 10.1126/science.aac4716. Available from: http://osf.io/ezcuj/wiki/home/

PRINZ, F., SCHLANGE, T., and ASADULLAH, K. Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery. 2011, vol. 10, nº 712. DOI: 10.1038/nrd3439-c1. Available from: http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html

VAN NOORDEN, R. Sluggish data sharing hampers reproducibility effort. Nature. 2015. DOI: 10.1038/nature.2015.17694. Available from: http://www.nature.com/news/sluggish-data-sharing-hampers-reproducibility-effort-1.17694

External link

Reproducibility Project: Cancer Biology – <http://validation.scienceexchange.com/#/cancer-biology>

 

About Lilian Nassi-Calò

Lilian Nassi-Calò studied chemistry at Instituto de Química – USP, holds a doctorate in Biochemistry by the same institution and a post-doctorate as an Alexander von Humboldt fellow in Wuerzburg, Germany. After her studies, she was a professor and researcher at IQ-USP. She also worked as an industrial chemist and presently she is Coordinator of Scientific Communication at BIREME/PAHO/WHO and a collaborator of SciELO.

 

Translated from the original in Portuguese by Lilian Nassi-Calò.

 

How to cite this post [ISO 690/2010]:

NASSI-CALÒ, L. Reproducibility in research results: the challenges of attributing reliability [online]. SciELO in Perspective, 2016 [viewed ]. Available from: http://blog.scielo.org/en/2016/03/31/reproducibility-in-research-results-the-challenges-of-attributing-reliability/

 
