Building a Vision for Reproducibility in the Cyberinfrastructure Ecosystem: Leveraging Community Efforts
DOI: https://doi.org/10.14529/jsfi200106

Abstract
The scientific computing community has long taken a leadership role in understanding and assessing the relationship of reproducibility to cyberinfrastructure, ensuring that computational results, such as those from simulations, are "reproducible", that is, that the same results are obtained when one reuses the same input data, methods, software, and analysis conditions. Starting almost a decade ago, the community has regularly published and advocated for advances in this area. In this article we trace this thinking and relate it to current national efforts, including the 2019 National Academies of Sciences, Engineering, and Medicine report on "Reproducibility and Replicability in Science".
To this end, this work considers high performance computing (HPC) workflows that combine traditional simulations (e.g., molecular dynamics simulations) with in situ analytics. We leverage an analysis of such workflows to (a) contextualize the recommendations of the 2019 National Academies of Sciences, Engineering, and Medicine report in the HPC setting and (b) envision a path forward in the tradition of community-driven approaches to reproducibility and the acceleration of science and discovery. The work also articulates avenues for future research at the intersection of transparency, reproducibility, and computational infrastructure that supports scientific discovery.
References
Aasi, J., Abbott, B.P., Abbott, R., et al.: The LIGO scientific collaboration. Classical and Quantum Gravity 32(7), 074001 (2015), DOI: 10.1088/0264-9381/32/7/074001
Acernese, F., Agathos, M., Agatsuma, K., et al.: Advanced Virgo: a second-generation interferometric gravitational wave detector. Classical and Quantum Gravity 32(2) (2015), DOI: 10.1088/0264-9381/32/2/024001
Adorf, C.S., Dodd, P.M., Ramasubramani, V., et al.: Simple data and workflow management with the signac framework. Computational Materials Science 146, 220–229 (2018), DOI: 10.1016/j.commatsci.2018.01.035
Bailey, D., Barrio, R., Borwein, J.: High-precision computation: Mathematical physics and dynamics. Applied Mathematics and Computation 218(20), 10106–10121 (2012), DOI: 10.1016/j.amc.2012.03.087
Barba, L.A.: SC reproducibility initiative author-kit. https://github.com/SC-Tech-Program/Author-Kit (2013)
Barba, L.A.: The hard road to reproducibility. Science 354(6308), 142 (2016)
Brinckman, A., Chard, K., Gaffney, N., et al.: Computing environments for reproducibility: Capturing the "Whole Tale". Future Generation Computer Systems 94, 854–867 (2019), DOI: 10.1016/j.future.2017.12.029
Brumfiel, G.: Neutrinos not faster than light. ICARUS experiment contradicts controversial claim. Nature (2012)
Buckheit, J.B., Donoho, D.L.: WaveLab and Reproducible Research, pp. 55–81. Springer, New York, NY (1995), DOI: 10.1007/978-1-4612-2544-7_5
Canon, R.S., Younge, A.: A case for portability and reproducibility of HPC containers. In: IEEE/ACM International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC, CANOPIE-HPC, 18 Nov. 2019, Denver, CO, USA. pp. 49–54. IEEE (2019), DOI: 10.1109/CANOPIE-HPC49598.2019.00012
Chapp, D., Johnston, T., Taufer, M.: On the need for reproducible numerical accuracy through intelligent runtime selection of reduction algorithms at the extreme scale. In: Proceedings of the 2015 IEEE International Conference on Cluster Computing. pp. 166–175 (2015), DOI: 10.1109/CLUSTER.2015.34
Chapp, D., Rorabaugh, D., Brown, D.A., et al.: Applicability study of the PRIMAD model to LIGO gravitational wave search workflows. In: Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, P-RECS@HPDC 2019. pp. 1–6 (2019), DOI: 10.1145/3322790.3330591
Chapp, D., Sato, K., Ahn, D., et al.: Record-and-replay techniques for HPC systems: A survey. Supercomputing Frontiers and Innovations 5(1), 11–30 (2018), DOI: 10.14529/jsfi180102
Chard, K., Gaffney, N., Hategan, M., et al.: Toward enabling reproducibility for data-intensive research using the Whole Tale platform. In: Proceedings of the International Conference on Parallel Computing, PARCO 2019. Advances in Parallel Computing, IOS Press (2019)
Chard, K., Gaffney, N., Jones, M.B., et al.: Implementing computational reproducibility in the Whole Tale environment. In: Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, P-RECS ’19. pp. 17–22. ACM, New York, NY, USA (2019), DOI: 10.1145/3322790.3330594
Chen, X., Dallmeier-Tiessen, S., Dasler, R. et al.: Open is not enough. Nature Physics 15, 113–119 (2019), DOI: 10.1038/s41567-018-0342-2
Chirigati, F., Shasha, D., Freire, J.: Reprozip: Using provenance to support computational reproducibility. In: Presented as part of the 5th USENIX Workshop on the Theory and Practice of Provenance. USENIX (2013)
Claerbout, J.: Hypertext documents about reproducible research (1994), http://sepwww.stanford.edu/doku.php
Claerbout, J.F., Karrenbach, M.: Electronic documents give reproducible research a new meaning. In: SEG Technical Program Expanded Abstracts 1992, pp. 601–604. Society of Exploration Geophysicists (1992), DOI: 10.1190/1.1822162
Deelman, E., Vahi, K., Juve, G., et al.: Pegasus: a workflow management system for science automation. Future Generation Computer Systems 46, 17–35 (2015), DOI: 10.1016/j.future.2014.10.008
Demmel, J., Nguyen, H.D.: Fast reproducible floating-point summation. In: 2013 IEEE 21st Symposium on Computer Arithmetic, 7-10 April 2013, Austin, TX, USA. pp. 163–172. IEEE (2013), DOI: 10.1109/ARITH.2013.9
Donoho, D.L., Maleki, A., Rahman, I.U., Shahram, M., Stodden, V.: Reproducible research in computational harmonic analysis. Computing in Science & Engineering 11(1), 8–18 (2009), DOI: 10.1109/MCSE.2009.15
Forde, J., Head, T., Holdgraf, C., et al.: Reproducible research environments with repo2docker. In: ICML 2018 Reproducible Machine Learning. ICML (2018)
Freire, J., Fuhr, N., Rauber, A.: Reproducibility of Data-Oriented Experiments in e-Science (Dagstuhl Seminar 16041). Dagstuhl Reports 6(1), 108–159 (2016), DOI: 10.4230/DagRep.6.1.108
Gamblin, T., LeGendre, M., Collette, M.R., et al.: The Spack package manager: bringing order to HPC software chaos. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC’15, 15-20 Nov. 2015, Austin, TX, USA. pp. 1–12. IEEE (2015), DOI: 10.1145/2807591.2807623
Gopalakrishnan, G., Hovland, P.D., Iancu, C., et al.: Report of the HPC correctness summit, Jan 25–26, 2017, Washington, DC. CoRR abs/1705.07478 (2017), https://arxiv.org/abs/1705.07478
He, Y., Ding, C.H.: Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. The Journal of Supercomputing 18(3), 259–277 (2001), DOI: 10.1023/A:1008153532043
Honarmand, N., Torrellas, J.: Replay debugging: Leveraging record and replay for program debugging. SIGARCH Comput. Archit. News 42(3), 445–456 (2014), DOI: 10.1145/2678373.2665737
James, D., Wilkins-Diehr, N., Stodden, V., Colbry, D., Rosales, C., et al.: Standing together for reproducibility in large-scale computing: Report on reproducibility@xsede. CoRR abs/1412.5557 (2014), http://arxiv.org/abs/1412.5557
Jimenez, I., Arpaci-Dusseau, A., Arpaci-Dusseau, R., et al.: PopperCI: Automated reproducibility validation. In: 2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS, 1-4 May 2017, Atlanta, GA, USA. pp. 450–455. IEEE (2017), DOI: 10.1109/INFCOMW.2017.8116418
Jimenez, I., Sevilla, M., Watkins, N., et al.: The Popper convention: Making reproducible systems evaluation practical. In: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW, 29 May-2 June 2017, Lake Buena Vista, FL, USA. pp. 1561–1570. IEEE (2017), DOI: 10.1109/IPDPSW.2017.157
Mirowski, P.: The future(s) of open science. Social Studies of Science 48(2), 171–203 (2018), DOI: 10.1177/0306312718772086
National Academies of Sciences, Engineering, and Medicine: Reproducibility and Replicability in Science. The National Academies Press, Washington, DC (2019), DOI: 10.17226/25303
Peng, R.D.: Reproducible research and biostatistics. Biostatistics 10(3), 405–408 (2009), DOI: 10.1093/biostatistics/kxp014
Sato, K., Laguna, I., Lee, G.L., et al.: PRUNERS: Providing reproducibility for uncovering non-deterministic errors in runs on supercomputers. The International Journal of High Performance Computing Applications 33(5), 777–783 (2019), DOI: 10.1177/1094342019834621
Sawaya, G., Bentley, M., Briggs, I., et al.: FLiT: Cross-platform floating-point result-consistency tester and workload. In: 2017 IEEE International Symposium on Workload Characterization, IISWC, 1-3 Oct. 2017, Seattle, WA, USA. pp. 229–238. IEEE (2017), DOI: 10.1109/IISWC.2017.8167780
Schwab, M., Karrenbach, N., Claerbout, J.: Making scientific computations reproducible. Computing in Science & Engineering 2(6), 61–67 (2000), DOI: 10.1109/5992.881708
Stodden, V.: Resolving irreproducibility in empirical and computational research. IMS Bulletin (November 2013), http://bulletin.imstat.org/2013/11/resolving-irreproducibility-in-empirical-and-computational-research/
Stodden, V., Borwein, J., Bailey, D.H.: Setting the default to reproducible in computational science research. SIAM News 46(5), 4–6 (2013), https://sinews.siam.org/Details-Page/setting-the-default-to-reproducible-in-computational-science-research
Stodden, V., Krafczyk, M.: Assessing reproducibility: An astrophysical example of computational uncertainty in the HPC context. In: The 1st Workshop on Reproducible, Customizable and Portable Workflows for HPC, HPC18 (2018)
Stodden, V., Leisch, F., Peng, R.D.: Implementing Reproducible Research. The R Series, Chapman & Hall/CRC (2014)
Stodden, V., McNutt, M., Bailey, D.H., et al.: Enhancing reproducibility for computational methods. Science 354(6317), 1240–1241 (2016), DOI: 10.1126/science.aah6168
Stodden, V., Miguez, S.: Best practices for computational science: Software infrastructure and environments for reproducible and extensible research. Journal of Open Research Software 2(1) (2014), DOI: 10.5334/jors.ay
Taufer, M., Anderson, D., Cicotti, P., et al.: Homogeneous redundancy: A technique to ensure integrity of molecular simulation results using public computing. In: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, 4-8 April 2005, Denver, CO, USA. IEEE (2005), DOI: 10.1109/IPDPS.2005.247
Thomas, S., Wyatt, M., Do, T.M.A., et al.: Characterizing in situ and in transit analytics of molecular dynamics simulations for next generation supercomputers. In: Proceedings of the International Conference on eScience, eScience’19, 24-27 Sept. 2019, San Diego, CA, USA. pp. 188–198. IEEE (2019), DOI: 10.1109/eScience.2019.00027
Wild, S.: Irreproducible astronomy. Physics Today (2018), DOI: 10.1063/PT.6.1.20180404a
License
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 3.0 License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.