4 August 2016

The Geoscience Papers of the Future: a modern publication strategy for data management and scientific publication

Posted by llipuma

This is part of a new series of posts that highlight the importance of Earth and space science data and its contributions to society. Posts in this series showcase data facilities and data scientists; explain how Earth and space science data is collected, managed and used; explore what this data tells us about the planet; and delve into the challenges and issues involved in managing and using data. This series is intended to demystify Earth and space science data, and share how this data shapes our understanding of the world.

By Xuan Yu and Leah Dodd

Figure 1. Rethinking geoscience papers: the geoscience data explosion from the punch card age to the cloud storage age. (Adapted from doi:10.1002/2015EA000155). Credit: Xuan Yu.

In the early decades of the 20th century, punch cards were used for data storage and processing. For youngsters out there, a punch card is a piece of stiff paper that contains digital information represented by the presence or absence of holes in predefined positions (Figure 1). Nowadays, cloud storage and cloud computing are pervasive in scientific research and even in our daily lives: much of our scientific and personal data is stored in cloud repositories.

In geoscience research, diverse data are required to develop models for different purposes that represent different processes at different spatiotemporal scales. Such projects require collaborative teams, and team-based research is becoming a de facto strategy for Earth science (e.g., Critical Zone Observatory, HydroShare, HyperHydro, Organic Data Science). Meanwhile, geoscience publications traditionally still focus on hypotheses. Much of the data used in scientific papers cannot be accessed from the papers themselves, which makes the results difficult to understand and reuse. Current scientific publication practices are being challenged by the need to communicate data results effectively and to preserve observations, simulations, and predictions.

In an effort to solve this problem, a pilot program, the Geoscience Papers of the Future (GPF), was initiated in early 2015 to encourage geoscientists to publish papers together with the associated digital products of their research. The program is supported by OntoSoft, a National Science Foundation-funded EarthCube project. Multidisciplinary researchers from the OntoSoft Early Career Advisory Board worked with OntoSoft PIs to develop an initial definition, criteria, and a list of best practices for GPFs, which have been published in the journal Earth and Space Science. The ultimate goal of the program is to improve reproducibility, digital scholarship, and open research culture across the geosciences.

One example is the application of GPF recommendations to make water cycle modeling data more accessible and useful. Researchers from the University of Delaware, Pennsylvania State University, and the National Institute of Scientific Research in Quebec published their results in Earth and Space Science, demonstrating that the modern publication strategy recommended by GPF allows authors to fully document their data workflow so that the simulations can be easily reproduced after publication.

There are four key principles in the modern publication strategy for data and software: persistent, linked, user-friendly, and sustainable (PLUS, Figure 1).

  • Persistent: Data, software, and authors should be persistently identifiable. Research products, including data and software, should be assigned persistent unique identifiers, for example by depositing them in online repositories that issue DOIs (Digital Object Identifiers) or PURLs (Persistent Uniform Resource Locators).
  • Linked: Data and software should be linked in the computational workflow so that readers can understand and reuse the software. These links should include any intermediate data, derived from the original data, that carry essential information shown in the article's final figures.
  • User-friendly: The software should be packaged with documentation and instructions so that readers can decide if the software can be reused in their work, and, if applicable, know how to apply it. It is important to be mindful of your audience and consider cross-disciplinary readers.
  • Sustainable: Authors are encouraged to register an ORCID iD (Open Researcher and Contributor ID) so that readers can track their research updates. Software should be maintained in public repositories (e.g., GitHub, CRAN, and CodePlex) so that development can continue (e.g., users are notified when the software is updated, and authors receive suggestions and comments for the next version).
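The PLUS checklist lends itself to a simple self-check that authors could run before submission. The sketch below is a hypothetical illustration only, not part of the GPF program or any OntoSoft tool: the `DigitalObject` record, the `check_plus` function, and the `10.1234/...` DOIs are invented for this example. It assumes identifiers are either DOIs (checked only by a rough shape heuristic) or purl.org URLs, and it validates ORCID iDs using the ISO 7064 MOD 11-2 check character that ORCID uses; the sample iD is the one from ORCID's own documentation.

```python
import re
from dataclasses import dataclass, field

# Rough DOI shape check: "10." + registrant code + "/" + suffix.
# A heuristic only, not a full DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def orcid_is_valid(orcid: str) -> bool:
    """Verify an ORCID iD's last character with the ISO 7064 MOD 11-2 checksum."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15].upper() == expected


@dataclass
class DigitalObject:
    """Hypothetical record describing one digital product of a paper."""
    name: str
    kind: str                    # "data", "intermediate", or "software"
    identifier: str = ""         # DOI or PURL
    derived_from: list = field(default_factory=list)  # identifiers of inputs
    documentation: str = ""      # e.g., a README with usage instructions
    repository: str = ""         # e.g., a public GitHub URL for software


def has_persistent_id(obj: DigitalObject) -> bool:
    # Assumption: accept a DOI-shaped string or a purl.org URL.
    return bool(DOI_PATTERN.match(obj.identifier)) or obj.identifier.startswith(
        ("http://purl.org/", "https://purl.org/"))


def check_plus(objects, author_orcid):
    """Return a list of PLUS problems found (an empty list means none)."""
    problems = []
    known_ids = {o.identifier for o in objects}
    if not orcid_is_valid(author_orcid):
        problems.append("Sustainable: author ORCID iD is missing or invalid")
    for o in objects:
        if not has_persistent_id(o):
            problems.append(f"Persistent: {o.name} has no DOI or PURL")
        if o.kind == "intermediate":
            # Intermediate data must point back to the inputs it came from.
            if not o.derived_from:
                problems.append(f"Linked: {o.name} does not name its inputs")
            for src in o.derived_from:
                if src not in known_ids:
                    problems.append(f"Linked: {o.name} cites unknown input {src}")
        if o.kind == "software":
            if not o.documentation:
                problems.append(f"User-friendly: {o.name} lacks documentation")
            if not o.repository:
                problems.append(f"Sustainable: {o.name} has no public repository")
    return problems


if __name__ == "__main__":
    # Hypothetical placeholder identifiers, not real DOIs.
    paper_objects = [
        DigitalObject("raw forcing data", "data", identifier="10.1234/example.raw"),
        DigitalObject("model code", "software", identifier="10.1234/example.code",
                      documentation="README.md",
                      repository="https://github.com/example/model"),
        DigitalObject("simulated streamflow", "intermediate",
                      identifier="10.1234/example.flow",
                      derived_from=["10.1234/example.raw", "10.1234/example.code"]),
    ]
    print(check_plus(paper_objects, "0000-0002-1825-0097"))  # → []
```

Each check maps to one PLUS principle, so a non-empty result points the author to the specific object and principle that still needs attention.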

Through these publication practices, both data management and scientific publication are improved. GPF encourages routine documentation of all the digital objects associated with a geoscience paper, so the complete dataflow (including intermediate data) is preserved and accessible. GPF also supports knowledge sharing through this complete dataflow, making the study results easier to reproduce and understand. As a result, the knowledge in a scientific publication becomes more persuasive and reusable.

With support from the Engagement Team at EarthCube, a distinguished lecture tour was launched in 2016 to promote research transparency and the modern publication strategy among geoscientists. These lectures have been presented or scheduled at the University of Delaware, University of Pennsylvania, USGS, University of Maryland, Michigan State University, Princeton University, University of Florida, and University of Central Florida. Each lecture introduces the audience to the EarthCube project, then explains GPF as a modern publication strategy, followed by an application of GPF in a coastal hydrologic study.

Figure 2. Feedback on EarthCube Distinguished Lecture. Credit: Xuan Yu.

The lectures use social media to engage the audience and broader communities. Mini-whiteboards were provided to each audience member to encourage them to write down their thoughts. At the end of each lecture, photos of their feedback were posted to Twitter. The audience demonstrated passion for research reproducibility, data management, and the future of geoscience publication (Figure 2). You can find these posts by searching for the #whyearthcubedistinguishedlecture hashtag on Twitter.

To increase awareness of GPF among geoscientists, hands-on training sessions are offered, and training materials are freely available online. Please subscribe to this mailing list if you would like to receive notifications of training sessions and other general announcements about GPF. More GPF examples can be found in this special issue of Earth and Space Science.
Xuan Yu is the co-chair of the Engagement Team at EarthCube. Leah Dodd is a digital outreach specialist at the University of Delaware.