DataCite and the FREYA project partners are proud to announce the official launch of DataCite Commons today. DataCite Commons is the web interface for exploring the PID Graph, formed by the publications, datasets, research software, and other research outputs generated by researchers working at research institutions and supported by grant funding (???). The PID Graph relies on persistent identifiers (PIDs) that uniquely identify all these resources, and on metadata that describe these resources and their connections.
We launched a pre-release version of DataCite Commons in August (Fenner, 2020a) and have used the last two months not only for many small improvements and bug fixes, but also to add two important new features: a statistics page that describes the information available in the PID Graph, and personal user accounts that allow for ORCID claiming and, going forward, other functionality.
While DataCite Commons includes all PIDs and metadata from DataCite and the Research Organization Registry (ROR), it currently includes only a subset of metadata from ORCID and only a subset of DOIs and metadata from Crossref. Other persistent identifiers for scholarly resources will be added over time. The statistics page shows the current coverage of DataCite Commons, and thus of the FREYA PID Graph. It also shows the current number of connections between works, between works and people, and between works and organizations, allowing us to track the growth of the PID Graph over time.
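To make the idea of counting connections by type a little more concrete, here is a minimal sketch that assumes a toy in-memory model of the PID Graph: each node is a PID tagged as a work, person, or organization, and each connection is an undirected pair of nodes. The `Node` class, the `connection_stats` helper, and all identifiers below are hypothetical and for illustration only; this is not how the DataCite Commons statistics page is actually computed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A node in a toy PID Graph (hypothetical model, for illustration only)."""
    pid: str   # persistent identifier, e.g. a DOI, ORCID iD or ROR ID
    kind: str  # "work", "person" or "organization"


def connection_stats(edges):
    """Count edges by the pair of node kinds they connect, ignoring direction."""
    counts = Counter()
    for a, b in edges:
        # Sort the two kinds so that (work, person) and (person, work) share a key.
        key = "-".join(sorted((a.kind, b.kind)))  # e.g. "person-work"
        counts[key] += 1
    return counts


# Hypothetical identifiers: none of these PIDs refer to real resources.
paper = Node("https://doi.org/10.0000/paper", "work")
dataset = Node("https://doi.org/10.0000/dataset", "work")
researcher = Node("https://orcid.org/0000-0000-0000-0000", "person")
institution = Node("https://ror.org/000000000", "organization")

edges = [
    (paper, dataset),        # work-work, e.g. the paper cites the dataset
    (paper, researcher),     # work-person, authorship
    (dataset, researcher),   # work-person, creatorship
    (dataset, institution),  # work-organization, e.g. affiliation
]

print(connection_stats(edges))
```

For this small example the sketch reports two work-person connections, one work-work connection, and one work-organization connection; recomputing such counts periodically is one simple way to follow the growth of a graph over time.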
User accounts are the other big change in this DataCite Commons release. They make it easier to navigate to your personal DataCite Commons page, which lists all your publications, datasets, and software that DataCite Commons knows about. More importantly, these personal accounts enable ORCID claiming, i.e. adding one or more works to your ORCID Record. When you are logged in and search for works in DataCite Commons, the results show which works DataCite has already sent to your ORCID Record, or an error message if something went wrong. The next step, claiming works directly from DataCite Commons search results, is in the final phase of development and will be open to beta testers in November.
This release of DataCite Commons wraps up the work on the PID Graph in the FREYA project, which concludes at the end of November. We went from the initial PID Graph concept to a production service for users, following the process summarized below.
User story collection and prioritization
In 2018 the FREYA project partners started collecting user stories that address important needs of their respective communities. At a workshop in August 2018 we discussed, grouped, and prioritized these user stories. The main categories were the aggregation of scholarly outputs, e.g. by research institution, funder, or researcher; the versioning and granularity of data and software; and the grouping of all research outputs and other resources (e.g. data, software, people, funding) for a given publication. All these user stories depend on a PID Graph and typically require following two connections in the graph, e.g. “show me all citations for datasets funded by a particular grant” (from the grant to its datasets, then from each dataset to the works citing it).
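The two-connection pattern in the example user story can be sketched as a traversal over a small directed graph. This is a minimal illustration under stated assumptions: the `PIDGraph` class, the `citations_for_grant_datasets` function, the relation labels `funds` and `isCitedBy`, and all PIDs are hypothetical and do not describe how the FREYA PID Graph is stored or queried.

```python
from collections import defaultdict


class PIDGraph:
    """Toy directed PID Graph: each edge is (source PID, relation, target PID)."""

    def __init__(self):
        self._out = defaultdict(list)  # source PID -> [(relation, target PID), ...]

    def add(self, source, relation, target):
        self._out[source].append((relation, target))

    def neighbors(self, pid, relation):
        """Targets reachable from `pid` over edges with the given relation."""
        # .get() avoids inserting empty entries for unknown PIDs.
        return [t for r, t in self._out.get(pid, []) if r == relation]


def citations_for_grant_datasets(graph, grant):
    """Two hops: grant -> datasets it funded -> works citing each dataset."""
    return {
        dataset: graph.neighbors(dataset, "isCitedBy")
        for dataset in graph.neighbors(grant, "funds")
    }


# Hypothetical PIDs and relation labels, chosen for illustration only.
grant = "https://doi.org/10.0000/grant"
g = PIDGraph()
g.add(grant, "funds", "https://doi.org/10.0000/dataset-a")
g.add(grant, "funds", "https://doi.org/10.0000/dataset-b")
g.add("https://doi.org/10.0000/dataset-a", "isCitedBy", "https://doi.org/10.0000/paper-x")
g.add("https://doi.org/10.0000/dataset-a", "isCitedBy", "https://doi.org/10.0000/paper-y")

print(citations_for_grant_datasets(g, grant))
```

Here dataset-a maps to the two citing papers and dataset-b to an empty list. Each additional connection in a user story adds another nested lookup of this kind, which is why such queries are awkward to express as a series of independent REST calls.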
Technical architecture and API development
Based on these requirements we started to investigate whether our existing technical architecture supported these user stories, or what changes would be needed. The initial exploration looked at incremental changes to our existing REST APIs, but it became clear that more fundamental changes would be needed to support querying the PID Graph in a number of different ways. In the spring of 2019, we decided to use GraphQL as the underlying technology for the PID Graph, as it supports the kinds of queries common in the PID Graph, is widely adopted (with many software libraries, good documentation, and an active developer community), and can easily be integrated with existing backend systems such as relational databases and search indexes like Solr or Elasticsearch. DataCite released a pre-release version of the GraphQL API in May 2019 and a production version in May 2020 (Fenner, 2020b).
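As a rough sketch of what calling a GraphQL API looks like from a script, the snippet below follows the common GraphQL-over-HTTP convention of POSTing a JSON body with `query` and `variables` fields, using only the Python standard library. The endpoint `https://api.datacite.org/graphql` is the DataCite GraphQL API; the query fields (`work`, `titles`, `citationCount`) and the `ID!` variable type are assumptions for illustration and should be checked against the published schema. The helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

DATACITE_GRAPHQL = "https://api.datacite.org/graphql"

# Field names and types are assumptions for illustration; check them against
# the schema (e.g. via GraphQL introspection) before relying on them.
WORK_QUERY = """
query ($id: ID!) {
  work(id: $id) {
    id
    titles { title }
    citationCount
  }
}
"""


def build_request(query, variables=None, endpoint=DATACITE_GRAPHQL):
    """Wrap a GraphQL query and its variables in a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, timeout=30):
    """Send the query and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query, variables), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With network access, a call such as `run_query(WORK_QUERY, {"id": "https://doi.org/10.5438/YFCK-MV39"})` would request metadata for the GraphQL API announcement cited above. Because a single query can nest related resources, one request can replace several REST round trips.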
Web frontend development
The GraphQL API allowed us to address the user stories we had identified. To make the API easier to work with, we started writing Jupyter notebooks, and in August 2020 the FREYA project released ten Jupyter notebooks addressing some of these user stories (Fenner & Petryszak, 2020). But APIs and Jupyter notebooks are still a significant hurdle for many users, so in the spring of 2020 we started work on a web frontend for the GraphQL API, and thus for the PID Graph. In August 2020 we launched a pre-release version of this web frontend and called it DataCite Commons (Fenner, 2020a). Today we are officially releasing DataCite Commons (Fenner, Hallett, Garza, & Wimalaratne, 2020) as the web frontend for the FREYA PID Graph and as a key FREYA contribution to the European Open Science Cloud (EOSC).
Monitoring and feedback
The release of DataCite Commons is an important milestone, but more work is needed to make sure that DataCite Commons addresses the needs of its users and sees significant use by the community. Over the next few months we will therefore focus on monitoring traffic, bugs, and other issues, and on collecting feedback about additional features. Add your ideas to the DataCite Public Roadmap or send comments and questions to email@example.com.
Fenner, M. (2020a). DataCite Commons - Exploiting the power of PIDs and the PID Graph. https://doi.org/10.5438/F4DF-4817
Fenner, M. (2020b). Powering the PID Graph: Announcing the DataCite GraphQL API. https://doi.org/10.5438/YFCK-MV39
Fenner, M., & Petryszak, R. (2020). FREYA webinar - The PID Graph in practice - Jupyter notebook demonstration. https://doi.org/10.5281/ZENODO.4004426
Fenner, M., Hallett, R., Garza, K., & Wimalaratne, S. (2020). Frontend for the DataCite Commons service. DataCite. https://doi.org/10.14454/QGK4-ZS88