I don’t fully understand it, but I know the Astronomical Journal (AJ) and Astrophysical Journal (ApJ) are different from many other journals: they are run by the American Astronomical Society (AAS) and not by a for-profit publisher. That means that the AAS Council and the members (the people actually producing and reading the science) have a lot of control over how the journals are run. In a recent President’s Column, the AAS President, David Helfand, proposed a radical, yet obvious, idea for propelling our field into the realm of data sharing and open access: require all journal articles to be accompanied by the data on which the conclusions are based.
We are a data-rich—and data-driven—field [and] I am advocating [that authors provide] a link in articles to the data that underlies a paper’s conclusions…In my view, the time has come—and the technological resources are available—to make the conclusion of every ApJ or AJ article fully reproducible by publishing the data that underlie that conclusion. It would be an important step toward enhancing and sharing our scientific understanding of the universe.
While David makes the case for reproducibility in his column, I think it’s much bigger than that: it’s also about enabling other investigators to use the data for other projects, for example by incorporating it into a bigger sample, using it as one epoch in a variability study, or studying a feature that the original authors were not interested in. By making all of our fully reduced data available for the rest of the community to use, we greatly increase the return on investment. Especially as “small” telescopes are being closed and our access to resources is declining, we need to be getting the most bang for our buck out of each photon and CPU cycle.
In my opinion, the greatest obstacle to implementing this idea is the infrastructure necessary for making proper data archival and sharing a reality. We need a tool that makes both data ingestion and data discovery easy and intuitive. It needs to accommodate ground-based, space-based, and model data. It should somehow be linked to the major wide-sky surveys, SIMBAD, VizieR, and ADS. There need to be guidelines for quality flagging, raw vs. reduced data, appropriate citation, and so many other things. This is a huge project but absolutely achievable with current technology. We just need to figure out how to get it done. (I actually thought the Virtual Observatory was going to do this, but in my current understanding, building the data archive was never actually part of the project.)
As we talked about last week, MAST is providing a huge step in this direction by accepting user-contributed data, but only data from “a MAST-supported mission (e.g. HST, FUSE, GALEX, IUE, EUVE etc.), or ground-based observations closely related to a MAST mission.” Ground-based observations related to Spitzer, WISE, 2MASS, etc. are accepted by the Infrared Science Archive (IRSA), but it looks like they focus mainly on large “Legacy” projects and not your average-Jane’s dataset. And as far as I know, there’s no supported repository for hosting reduced data products from wholly ground-based programs.
Regardless of the existence of an archive that will take your data, there is still a more fundamental problem that needs to be addressed. Until there are real incentives or enforced requirements for making data available, tarballs of nonsense will continue to be emailed and underutilized datasets will continue to languish on our hard drives or hidden on personal websites. Sure, NSF and NASA require Data Management Plans, but there is no enforcement of any of the promises made in those plans.
What does a fully functional data archival and sharing tool look like to you? Is it many websites like we have now that just need to be linked together, or one central portal? Do you think published data should be accessible? How else do you think data publication and archival could be encouraged in our community? How high a priority do you think this should be?
I close with this. It’s scary how familiar this conversation is, especially the bit in Act 3 about field names: