The shocking assertion will be that most statistics in most scientific papers has errors. —Charles Geyer
Pre-lecture materials
Read ahead
Before class, you can prepare by reading the following materials:
- Statistical programming: Small mistakes, big impacts by Simon Schwab and Leonhard Held
- Reproducible Research: A Retrospective by Roger Peng and Stephanie Hicks
Acknowledgements
Material for this lecture was borrowed and adapted from
Learning objectives
At the end of this lesson you will:
- Know the difference between replication and reproducibility
- Identify valid reasons why replication and/or reproducibility is not always possible
- Identify the different types of reproducibility
- Identify key components to enable reproducible data analyses
Introduction
From a young age, we have learned that scientific conclusions should be reproducible. After all, isnʻt that what the methods section is for? We are taught to write methods sections so that any scientist could, in theory, repeat the experiment, with the idea that if the phenomenon is true they should obtain comparable results and, more often than not, come to the same conclusions.
But how repeatable is modern science? Many experiments are now so complex and so expensive that repeating them is not practical. However, it is even worse than that: as datasets get larger and analyses become ever more complex, there is a growing concern that, even given the data, we still cannot necessarily repeat the analysis. This is called “the reproducibility crisis”.
Recently, there has been a lot of discussion of reproducibility in the media and in the scientific literature. The journal Science had a special issue on reproducibility and data replication.
Take, for example, a recent study by the Crowdsourced Replication Initiative (2022), a massive effort by 166 coauthors published in PNAS to test repeatability:
- 73 research teams from around the world analyzed the same social science data.
- They investigated the same hypothesis: that more immigration will reduce public support for government provision of social policies.
- Together they fit 1261 statistical models and came to widely varying conclusions.
- A meta-analysis of the results by the PIs could not explain the variation in results. Even after accounting for the choices made by the research teams in designing their statistical tests, 95% of the total variation remained unexplained (a sketch of this kind of decomposition follows this list).
- The authors claim that “a hidden universe of uncertainty remains.”
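To picture what “unexplained variation” means here, a result like this is often summarized with a random-effects-style decomposition of the variance in the teams' reported estimates. The notation below is an illustrative sketch, not the authors' actual model:

$$
\operatorname{Var}\!\left(\hat{\theta}_k\right) \;\approx\;
\underbrace{\sigma^2_{\text{choices}}}_{\text{explained by observed analytic choices}}
\;+\;
\underbrace{\tau^2}_{\text{unexplained between-team variation}}
\;+\;
\underbrace{s_k^2}_{\text{within-team sampling error}}
$$

where $\hat{\theta}_k$ is team $k$'s estimated effect of immigration on support for social policies. The finding quoted above amounts to saying that the terms not attributable to observable analytic choices make up roughly 95% of the total.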
This should be very disturbing. It was very disturbing to me! Geyer notes that the meta-analysis did not investigate how much of the variability in results was due to outright error. He furthermore notes that while the meta-analysis was done in a reproducible way, the original 73 analyses were not. What does he mean?
Some of the issues from a statisticianʻs perspective
Geyer offers several ideas worth considering:
- Most scientific papers that need statistics have conclusions that are not actually supported by the statistical calculations done, because of
- mathematical or computational error,
- statistical procedures inappropriate for the data, or
- statistical procedures that do not lead to the inferences claimed.
- Good computing practices — version control, well thought out testing, code reviews, literate programming — are essential to correct computing (a small illustration of testing follows this list).
- Failure to do all calculations from raw data to conclusions (every number or figure shown in a paper) in a way that is fully reproducible and available in a permanent public repository is, by itself, a questionable research practice.
- Failure to do statistics as if it could have been pre-registered is a questionable research practice.
- Journals that use P < 0.05 as a criterion of publication are not scientific journals (publishing only one side of a story is as unscientific as it is possible to be).
- Statistics should be adequately described, at least in the supplementary material.
- Scientific papers whose conclusions depend on nontrivial statistics should have statistical referees, and those referees should be heeded.
- Not all errors are describable by statistics. There is also what physicists call systematic error that is the same in every replication of an experiment. Physicists regularly attempt to quantify this. Others should too.
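To make the point about good computing practices concrete, here is a minimal, hypothetical sketch of a tested analysis step in Python. The function, file name, and test are invented for illustration; they are not from Geyer or from any particular study:

```python
# analysis.py -- a hypothetical, minimal analysis function plus a unit test.
# Keeping calculations in small, testable functions makes errors easier to catch
# in a code review and easier to rerun from the raw data.

import math
import unittest


def standard_error(values):
    """Standard error of the mean for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance / n)


class TestStandardError(unittest.TestCase):
    def test_known_value(self):
        # For [1, 2, 3], the sample SD is 1, so the SE is 1 / sqrt(3).
        self.assertAlmostEqual(standard_error([1, 2, 3]), 1 / math.sqrt(3))

    def test_rejects_single_observation(self):
        with self.assertRaises(ValueError):
            standard_error([42])


if __name__ == "__main__":
    unittest.main()
```

Version control and code review are workflow habits rather than code, so they do not appear in the snippet; the point is simply that every number reported in a paper can, in principle, be backed by a small function with a test that anyone can rerun.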
A reasonable ideal for reproducible research today
- Research should be reproducible. Anything in a scientific paper should be reproducible by the reader.
- Whatever may have been the case in low-tech days, this ideal has long gone. Much scientific research in recent years is too complicated, and the published details too scanty, for anyone to reproduce it.
- The lack of detail is not entirely the author's fault. Journals have severe page pressure and no room for full explanations.
- For many years, the only hope of reproducibility was old-fashioned person-to-person contact. Write the authors, ask for data, code, whatever. Some authors help, some don't. If the authors are not cooperative, tough.
- Even cooperative authors may be unable to help. If too much time has gone by, their archiving was not systematic enough, or their software was unportable, there may be no way to recreate the analysis.
- Fortunately, the internet comes to the rescue. No page pressure there!
- Nowadays, many scientific papers also point to supplementary materials on the internet. Data, computer programs, whatever should be there, permanently, ideally with a permanent Digital Object Identifier (DOI). There are complaints that many supplementary materials are incomprehensible, but that can be improved with the practices of reproducible research.
Therefore, at the very least, scientists should use the following in their statistical programming:
- version control,
- software testing,
- code reviews,
- literate programming, and
- all data and code available in a permanent public repository.
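As a minimal sketch of what “all calculations from raw data to conclusions in a permanent public repository” can look like, the hypothetical script below reads the raw data, computes the summary numbers that would be reported, and records the software environment alongside the output. The file and directory names are invented for illustration:

```python
# run_analysis.py -- hypothetical end-to-end script: raw data in, reported numbers out.

import csv
import json
import os
import platform
import statistics

# Read the raw data (a hypothetical CSV with a numeric 'outcome' column).
with open("data/raw/survey.csv", newline="") as f:
    outcomes = [float(row["outcome"]) for row in csv.DictReader(f)]

# The entire path from raw data to the numbers reported in the paper.
result = {
    "n": len(outcomes),
    "mean_outcome": statistics.mean(outcomes),
    "sd_outcome": statistics.stdev(outcomes),
    "python_version": platform.python_version(),  # record the environment used
}

# Save the output next to the code, so the repository holds data, code, and results.
os.makedirs("results", exist_ok=True)
with open("results/summary.json", "w") as f:
    json.dump(result, f, indent=2)

print(result)
```

Committing this script, the raw data, and the output to a version-controlled repository, and archiving a release with a DOI, gives a reader everything needed to rerun the analysis from scratch.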
Some journals have specific policies to promote reproducibility in the manuscripts they publish. For example, the Journal of the American Statistical Association (JASA) requires authors to submit the code and data needed to reproduce their analyses, and a set of Associate Editors for Reproducibility review those materials as part of the review process.
Recommendations
Post-lecture materials
Final Questions
Here are some post-lecture questions to help you think about the material discussed.
Why can replication be difficult to achieve? Why is reproducibility a reasonable minimum standard when replication is not possible?
What is needed to reproduce the results of a data analysis?
Additional Resources
- Reproducibility and Error by Charles J. Geyer