We know that software is important in research. Some of us in the scholarly communications community, for example in FORCE11, have been promoting software citation as a way for software developers and maintainers to get academic credit for their work: software releases are published and assigned DOIs, and software users then cite these releases when they publish research that uses the software.

DataCite recently examined the DOIs that have been created for software. The number of new software DOIs is growing roughly exponentially and has now reached about 2,000 per month, with spikes of around 4,000 per month in some months of 2017. The data and results are presented below, and the source code for the R script used to generate the data and figures is available (Fenner, Katz, Smith, & Nielsen (2018)).
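The post does not spell out how "roughly exponentially" was assessed; the R script cited above contains the actual analysis. As a minimal sketch, assuming monthly counts of new software DOIs are available as a Python list, one common way to quantify exponential growth is an ordinary least-squares fit of log(count) against the month index. The slope is the monthly growth rate on the log scale, and ln 2 divided by the slope is the doubling time. The function name `doubling_time_months` is hypothetical.

```python
import math
from typing import Sequence


def doubling_time_months(monthly_counts: Sequence[float]) -> float:
    """Estimate the doubling time (in months) of a series of monthly counts.

    Fits a straight line to log(count) versus month index by ordinary least
    squares. Assumes every count is positive and growth is roughly exponential.
    """
    if len(monthly_counts) < 2:
        raise ValueError("need at least two months of data")
    if any(c <= 0 for c in monthly_counts):
        raise ValueError("counts must be positive to take logarithms")

    n = len(monthly_counts)
    xs = range(n)                               # month index 0, 1, ..., n - 1
    ys = [math.log(c) for c in monthly_counts]  # natural log of each count
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n

    # Least-squares slope: covariance(x, y) / variance(x).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx  # growth rate per month on the log scale

    if slope <= 0:
        raise ValueError("series is not growing; doubling time is undefined")
    return math.log(2) / slope
```

For a series that doubles exactly every 12 months, such as `[100 * 2 ** (t / 12) for t in range(24)]`, the function returns 12 up to floating-point rounding. Real monthly counts are noisy, and spikes like those seen in 2017 pull the fit, so the result should be read as an estimate.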

As of May 16, 2018, 58,301 DOIs have been registered for software. We can break this number down by the repository (DataCite client) that registered the DOIs; most software DOIs have been registered by Zenodo. The clients with the most software DOIs are:

Client ID | Repository | Software DOIs
CERN.ZENODO | ZENODO - Research. Shared. | 41,346
FIGSHARE.ARS | figshare Academic Research System | 4,226
PURDUE.NCIB | National Cancer Institute, Bioconductor | 2,769
PURDUE.EZID | Purdue University | 2,463
OSTI.DOE | DOE Generic | 736
INIST.INRA | Institut National de Recherche Agronomique | 223
OCEAN.OCEAN | Code Ocean | 206
CRUI.INFNCNAF | Istituto Nazionale di Fisica Nucleare. Centro Nazionale Analisi Fotogrammi | 190
CDL.UCI | UC Irvine Library | 120
ETHZ.DA-RD | ETHZ Data Archive - Research Data | 88
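The counts in the table can be turned into shares of the 58,301 total with a few lines of arithmetic. This sketch simply hard-codes the numbers listed above; the constant and function names are illustrative.

```python
# Software DOIs per DataCite client as of May 16, 2018, copied from the table above.
TOTAL_SOFTWARE_DOIS = 58_301
TOP_CLIENTS = {
    "CERN.ZENODO": 41_346,
    "FIGSHARE.ARS": 4_226,
    "PURDUE.NCIB": 2_769,
    "PURDUE.EZID": 2_463,
    "OSTI.DOE": 736,
    "INIST.INRA": 223,
    "OCEAN.OCEAN": 206,
    "CRUI.INFNCNAF": 190,
    "CDL.UCI": 120,
    "ETHZ.DA-RD": 88,
}


def percent_of_total(count: int, total: int = TOTAL_SOFTWARE_DOIS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


top_ten = sum(TOP_CLIENTS.values())            # DOIs from the ten listed clients
other_clients = TOTAL_SOFTWARE_DOIS - top_ten  # DOIs from all remaining clients

if __name__ == "__main__":
    for client, count in TOP_CLIENTS.items():
        print(f"{client:15} {count:>7,} {percent_of_total(count):5.1f}%")
    print(f"{'(other)':15} {other_clients:>7,} {percent_of_total(other_clients):5.1f}%")
```

Worked through, Zenodo accounts for about 70.9% of all software DOIs, and the ten listed clients together account for 52,367 DOIs (about 89.8%), leaving 5,934 DOIs registered by other clients.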

Changes over Time

How have these numbers changed over time since the first DataCite DOI for software was registered on September 7, 2011, by the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Germany (Colmsee, Flemming, Klapperstück, Lange, & Scholz (2011))?

We can start by looking at the Zenodo/GitHub integration, which lets users archive a GitHub repository in the Zenodo data repository. The integration was launched in February 2014, and the data show a clear correlation both with the launch and with a May 2014 post by Arfon Smith on the GitHub blog describing (and advertising) the integration work.

Figure: Software DOIs registered at DataCite

In September 2016, the FORCE11 Software Citation Principles (A. M. Smith, Katz, Niemeyer, & FORCE11 Software Citation Working Group (2016)) were published and the Zenodo/GitHub integration was upgraded (http://help.zenodo.org/whatsnew/), and in October 2016 the GitHub Guide to Making Your Code Citable was updated. The rate of growth also appears to change around this time.

Looking forward

We see exponential growth in the number of DOIs for software, and we expect it to continue in 2018 and beyond. The FORCE11 Software Citation Implementation Working Group is working on the implementation and adoption of the Software Citation Principles, and for a number of use cases, e.g., citation in a journal article, DOIs play an important role. The working group is also addressing the challenges that remain in using DOIs as identifiers for software, through efforts that include pre-registration APIs to smooth automated push-style deposit, better semantic linkage supported by extensions to the DataCite schema, and the use of group, collective, and microcitation DOIs.
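Pre-registration is easiest to picture with the DataCite REST API, where a DOI can be created in a draft state before the software release is published. The sketch below only builds such a request with the Python standard library and never sends it. It assumes the JSON:API payload shape and `application/vnd.api+json` media type of the DataCite REST API `/dois` endpoint, and that omitting an `event` attribute leaves the DOI as a draft; check the current DataCite REST API documentation before relying on either. The prefix `10.5072`, the credentials, and the function names are placeholders.

```python
import base64
import json
import urllib.request

# Assumed production endpoint of the DataCite REST API; nothing is sent to it here.
DATACITE_DOIS_URL = "https://api.datacite.org/dois"


def draft_software_doi_payload(prefix: str, title: str, creators: list[str]) -> dict:
    """Build a JSON:API document for a draft software DOI (hypothetical helper).

    Leaving out an "event" attribute is assumed to keep the DOI in the draft
    state, and supplying only a prefix lets DataCite generate the suffix.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,
                "titles": [{"title": title}],
                "creators": [{"name": name} for name in creators],
                "types": {"resourceTypeGeneral": "Software"},
            },
        }
    }


def build_draft_request(payload: dict, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP Basic-authenticated POST for the payload."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        DATACITE_DOIS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.api+json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Sending the prepared request with `urllib.request.urlopen` would require real repository credentials. Keeping payload construction separate from transport makes it easy to inspect what an automated deposit would submit.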

We expect initiatives such as Citation File Format and Software Heritage to have a positive impact on the number of DOIs for software. A paper on persistent identification and citation of software using DOIs by Jones et al. (C. M. Jones, Matthews, Gent, Griffin, & Tedds (2017)) was published in July 2017, based on earlier work from 2015 (Gent, Jones, & Matthews (2015)), and version 4.1 of the DataCite Metadata Schema, which focuses on software citation, was released in September 2017 (DataCite Metadata Working Group (2017), Starr (2017)).

CodeMeta (Boettiger (2017), M. B. Jones et al. (2017)) is particularly relevant. This new standard for software metadata simplifies the crosswalk between the wide variety of metadata standards for software, and it is increasingly integrated into DOI registration workflows: the CaltechDATA repository has supported it since March 2018 and the DataCite DOI registration service since May 2018 (Fenner (2018), Dasler (2018)), and support is planned for the Zenodo/GitHub integration in autumn 2018. CodeMeta libraries are currently available for R (codemetar, Boettiger et al. (2018)), Ruby (Bolognese, Fenner (2017)), and Python (CodeMetaPy).
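To give a sense of what a CodeMeta record looks like, the following sketch writes a minimal `codemeta.json` using only Python's `json` module. It is not the output of codemetar, Bolognese, or CodeMetaPy, and it uses only a handful of CodeMeta 2.0 properties; the `@context` URL is the CodeMeta 2.0 schema DOI from the references, while the function name and example values are hypothetical.

```python
import json
from typing import Optional, Sequence, Tuple

# CodeMeta 2.0 JSON-LD context (M. B. Jones et al. 2017).
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def codemeta_record(
    name: str,
    version: str,
    code_repository: str,
    authors: Sequence[Tuple[str, str]],
    license_spdx: Optional[str] = None,
    doi: Optional[str] = None,
) -> dict:
    """Return a minimal CodeMeta 2.0 record as a JSON-LD dictionary.

    authors is a sequence of (given name, family name) pairs.
    """
    record = {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "codeRepository": code_repository,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }
    if license_spdx:
        # Express the license as an SPDX URL, as CodeMeta examples do.
        record["license"] = f"https://spdx.org/licenses/{license_spdx}"
    if doi:
        # Record the DOI of the archived release as a resolvable URL.
        record["identifier"] = f"https://doi.org/{doi}"
    return record


if __name__ == "__main__":
    meta = codemeta_record(
        "example-tool",
        "1.0.0",
        "https://github.com/example/example-tool",
        [("Jane", "Doe")],
        license_spdx="MIT",
    )
    with open("codemeta.json", "w", encoding="utf-8") as fh:
        json.dump(meta, fh, indent=2)
```

Running the script writes `codemeta.json` to the current directory. A real record would usually carry more properties, such as a description and keywords.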

References

Boettiger, C. (2017, January). Codemeta: A Rosetta Stone for Software Metadata. figshare. https://doi.org/10.6084/m9.figshare.4490588

Boettiger, C., Salmon, M., Smith, A., Ross, N., Leinweber, K., & Krystalli, A. (2018). Ropensci/Codemetar: Codemetar: Generate Codemeta Metadata For R Packages. Zenodo. https://doi.org/10.5281/zenodo.1241346

Colmsee, C., Flemming, S., Klapperstück, M., Lange, M., & Scholz, U. (2011). A case study for efficient management of high throughput primary lab data. Leibniz Institute of Plant Genetics and Crop Plant Research (IPK). https://doi.org/10.5447/ipk/2011/0

Dasler, R. (2018). DOI Fabrica 1.0 is Here! DataCite. https://doi.org/10.5438/0yk5-b755

DataCite Metadata Working Group. (2017). DataCite Metadata Schema for the Publication and Citation of Research Data v4.1. DataCite. https://doi.org/10.5438/0014

Fenner, M. (2017). Bolognese: a Ruby library for conversion of DOI Metadata. DataCite. https://doi.org/10.5438/n138-z3mk

Fenner, M. (2018). Frontend for the DataCite DOI Fabrica service. DataCite. https://doi.org/10.5438/CXE5-RG55

Fenner, M., Katz, D. S., Smith, A., & Nielsen, L. H. (2018). DOI Registrations for Software. DataCite. https://doi.org/10.5438/wr0x-e194

Gent, I., Jones, C., & Matthews, B. (2015). Guidelines for persistently identifying software using DataCite. Retrieved from http://purl.org/net/epubs/work/24058274

Jones, C. M., Matthews, B., Gent, I., Griffin, T., & Tedds, J. (2017). Persistent Identification and Citation of Software. International Journal of Digital Curation, 11(2), 104–114. https://doi.org/10.2218/ijdc.v11i2.422

Jones, M. B., Boettiger, C., Mayes, A. C., Smith, A., Slaughter, P., Niemeyer, K., … Goble, C. (2017). CodeMeta: an exchange schema for software metadata. KNB Data Repository. https://doi.org/10.5063/schema/codemeta-2.0

Smith, A. M., Katz, D. S., Niemeyer, K. E., & FORCE11 Software Citation Working Group. (2016). Software citation principles. PeerJ Computer Science, 2, e86. https://doi.org/10.7717/peerj-cs.86

Starr, J. (2017). New DataCite Metadata Updates Support Software Citation. https://doi.org/10.5438/NZHX-XX96


Martin Fenner

DataCite Technical Director

https://orcid.org/0000-0003-1419-2405

Daniel S. Katz

Assistant Director for Scientific Software and Applications at NCSA, Research Associate Professor at the University of Illinois Urbana-Champaign

https://orcid.org/0000-0001-5934-7525

Lars Holm Nielsen

Zenodo Project Leader at CERN.

https://orcid.org/0000-0001-8135-3489

Arfon Smith

Head of the Data Science Mission Office at Space Telescope Science Institute.

https://orcid.org/0000-0002-3957-2474

DOI Registrations for Software

https://doi.org/10.5438/1nmy-9902

© 2018 Martin Fenner, Daniel S. Katz, Lars Holm Nielsen, and Arfon Smith. Distributed under the terms of the Creative Commons Attribution license.

doi, software, featured