In June, I attended the World Conference on Research Integrity in Athens. I am still inspired by the many fruitful and fun encounters with colleagues from different parts of the world and by thought-provoking presentations. One of these talks forms the basis of this blogpost: the keynote by Daniele Fanelli, entitled “Cautionary tales from metascience”, in a session on the effects of research integrity on innovation and policy making. Among his main messages were (1) that replication rates are not as bad as we make them out to be, (2) that reproducibility is related to the complexity of the research at hand, (3) that changing policies in reaction to the reproducibility crisis might do more harm than good and (4) that there is no one-size-fits-all solution. Below, I will discuss these issues and try to conclude what this means for the work within NLRN.
Let’s start with his premise that replication rates from the literature are actually not that bad. He mentioned rates of 60-90%, taking the higher values from ranges reported in the literature. I think a fairer representation would be a median rate of 50-60%. Whether that means we are in a crisis is a different question. Crises are usually associated with specific periods in time, and it is probably reasonable to assume that replication rates would not have been much different 20 or 30 years ago, had there been replication studies at that time. Fanelli went on to mention the large variance in replication rates across studies and apparently also across (sub)disciplines, and there he has a good point.
Fanelli presented results from his own work, performed with data from the Brazilian Reproducibility Initiative, showing that complexity might indeed be related to replication. So there is at least some empirical evidence for his statement. It also seems logical that simpler, more straightforward research, or research in a mature field where there is a high degree of consensus on methods and procedures, would be easier to reproduce and its results easier to replicate.
Fanelli went on to argue that policies focusing on incentive structures are not effective in combating questionable research practices (QRPs). He showed that bias and QRPs are, overall, only strongly related to the country of the first author. Additionally, within countries in which QRPs are prevalent, incentive structures and publication pressure seem to be important drivers, but in other countries these factors do not seem to be related to QRPs. This would, according to Fanelli, imply that policies focused on incentives would not be effective in many countries. Here I think Fanelli jumps to his conclusion a bit too quickly. All of his evidence comes from meta-research, which is by nature observational and aggregated, so there might be confounders underlying the relations he showed. Moreover, we would need intervention studies to explore whether intervening on these aspects changes outcomes, and such studies are scarce. In the field of reproducibility, there is some evidence that rigor-enhancing practices in both original studies and replication studies can lead to high replication rates and effect sizes that are virtually unchanged in the replications.1 These practices included confirmatory tests, large sample sizes, preregistration and methodological transparency. However, this multi-lab study was done in social psychology and it is uncertain how results would generalize to other fields or (sub)disciplines.
All in all, there is not much evidence yet that policy interventions improve the reproducibility and replicability of studies, and there is probably no one-size-fits-all solution. Fanelli concludes that policy should be light and adaptive, and that makes sense. We will have to strike a balance between incorporating some generic principles and leaving enough room for discipline-, field- and country/region-specific differences. How do we know what works for whom? By developing interventions and policies together with academic and non-academic staff, piloting and evaluating them, and, when deemed viable, implementing them on a broader scale and evaluating and adapting where necessary. These efforts need continuous monitoring. The reproducibility networks are ideally suited to support them through their networks of research-performing institutions, communities of researchers and educators, and other relevant stakeholders.
Within the Dutch Reproducibility Network we acknowledge that reproducibility and replication play out differently across disciplines and fields, which is why one of our focus areas for the coming years is non-quantitative research. We are eager to work on these and other pressing issues with our partners, striving for evidence-informed implementation of interventions and policies on reproducibility.
1 Protzko, J., Krosnick, J., Nelson, L. et al. High replicability of newly discovered social-behavioural findings is achievable. Nat Hum Behav 8, 311–319 (2024). https://doi.org/10.1038/s41562-023-01749-9
Find the slides of Daniele Fanelli’s talk here: https://az659834.vo.msecnd.net/eventsairwesteuprod/production-pcoconvin-public/e2e0a11ab8514551a3376e9b49af030d
Thanks for this post. The Protzko et al. study featured in this post was recently retracted by the journal that published it. A piece worth reading about the retraction, including reflections of the original authors, is here: https://www.nature.com/articles/d41586-024-03178-8
Thanks, Rene! It goes to show that we also need to scrutinize replication studies and other metascience endeavours. It will be interesting to see whether the authors will indeed resubmit and what the amended paper will look like.