DataCite Blog

Data catalog cards: simplifying article/data linking

May 13, 2016 Martin Fenner
https://doi.org/10.5438/cab5-teg0

Data citation is core to DataCite’s mission and DataCite is involved in several projects that try to facilitate data citation, including THOR, Data Citation Implementation Pilot (DCIP), Research Data Alliance (RDA), and COPDESS. The biggest roadblock for wider data citation adoption might be insufficient incentives for individual researchers, but another major challenge is that implementing data citation is still too complicated.

Citation needed. By User:Tfinc (Own work) CC BY-SA 3.0, via Wikimedia Commons

When we talk about data citation, we typically mean two related, but different scenarios:

  1. An article or other scholarly work cites an already published dataset.
  2. All data and related metadata underlying the findings reported in a submitted manuscript are deposited in an appropriate public repository, as publishers increasingly require (PLOS data availability statement).

The first scenario is not conceptually different from an article citing another article, where the common practice is to put everything that is cited into the reference list.

The second scenario is probably more common, and it also requires more complex workflows, e.g. coordinating the issuing of persistent identifiers for article and data, and linking them together via metadata. We as a community are still working on common practices for doing this. Assuming again that incentives are the biggest driver of change, I would argue that researchers, publishers, and funders are all interested in making this work, but that data repositories have the strongest motivation to improve the current situation. If this is true, then we should give data repositories a bigger role in the publication of data associated with an article.

While many publishers host supplementary information for articles, they leave the hosting of more complex research data to external data repositories specialized in this task. Properly referencing all associated data in the article is currently the job of the publisher, and I propose that we give more of this responsibility to the data repository. The data repository can create a data catalog card (with associated persistent identifier and metadata) that describes all data associated with an article. The data catalog card is a collection of metadata, and different from a data paper. The data described in the catalog card can be hosted in that repository or elsewhere.
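To make the idea more concrete, here is a minimal sketch of what a data catalog card could look like as a data structure. Everything in it is an assumption made for illustration: the class names, the field names, and the identifiers (which use made-up values) are hypothetical, and no repository is known to implement exactly this. The relation names `IsSupplementTo` and `HasPart` are borrowed from the relation type vocabulary of the DataCite metadata schema; the surrounding dictionary keys are not.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One dataset described by the card, hosted in this repository or elsewhere."""
    identifier: str        # DOI, accession number, or URL (hypothetical values below)
    identifier_type: str   # illustrative label, e.g. "DOI" or "Accession"
    repository: str
    title: str


@dataclass
class DataCatalogCard:
    """Metadata-only record describing all data associated with one article."""
    card_id: str           # persistent identifier of the card itself
    article_id: str        # persistent identifier of the article
    items: list[DataItem] = field(default_factory=list)

    def links(self) -> list[dict]:
        """Return the card's links: one to the article, one per dataset."""
        result = [{"relation": "IsSupplementTo",
                   "target": self.article_id,
                   "target_type": "DOI"}]
        for item in self.items:
            result.append({"relation": "HasPart",
                           "target": item.identifier,
                           "target_type": item.identifier_type})
        return result


# Made-up example: one dataset with a DOI, one with a local accession number.
card = DataCatalogCard(
    card_id="10.5072/example-card",
    article_id="10.5072/example-article",
    items=[
        DataItem("10.5072/example-data", "DOI", "Repository A", "Raw measurements"),
        DataItem("ACC-0001", "Accession", "Repository B", "Sequence reads"),
    ],
)
```

The card holds only metadata, which is what distinguishes it from a data paper: the datasets themselves stay wherever they are hosted, and the card records where to find them.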

The medium is the message. By suzanne chapman CC BY-NC-SA 2.0, via Flickr

The publisher then links to this data catalog card via the article metadata and can display the catalog card formatted as a data availability statement. The publisher could (and should) still link to individual data where appropriate, but the proposed solution helps solve several important issues:

  • the data catalog card simplifies manuscript submission for publishers
  • the data catalog card provides a machine-readable representation of the data availability statement that publishers are increasingly requiring
  • the publisher doesn’t need to provide machine-readable metadata for all data used in an article, but can reference the data catalog card. Accession numbers that are not globally unique can be used in the article if they are properly referenced in the data catalog card. This facilitates the transition from current practices
  • some articles refer to thousands of datasets (e.g. genomics papers), and this number of links is difficult to describe in the traditional article format (e.g. JATS)
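The points above can be illustrated with a short sketch of how a publisher might use a card: formatting it as a data availability statement, and looking up an accession number that is not globally unique. The dictionary layout, the function names, and all identifiers are hypothetical and chosen only for this example; a real card would follow whatever metadata format the repository uses.

```python
from __future__ import annotations

# A made-up data catalog card, represented as a plain dictionary.
example_card = {
    "card_id": "10.5072/example-card",
    "items": [
        {"identifier": "10.5072/example-data", "repository": "Repository A",
         "title": "Raw measurements"},
        {"identifier": "ACC-0001", "repository": "Repository B",
         "title": "Sequence reads"},
    ],
}


def availability_statement(card: dict) -> str:
    """Format the card as a human-readable data availability statement."""
    parts = [f"{item['title']} ({item['repository']}: {item['identifier']})"
             for item in card["items"]]
    return (f"All data underlying this article are described in "
            f"{card['card_id']}: " + "; ".join(parts) + ".")


def resolve_accession(card: dict, accession: str) -> dict | None:
    """Look up a local accession number used in the article text.

    Returns the card entry, which names the hosting repository and so
    makes the accession number unambiguous, or None if it is not listed.
    """
    for item in card["items"]:
        if item["identifier"] == accession:
            return item
    return None


print(availability_statement(example_card))
print(resolve_accession(example_card, "ACC-0001"))
```

Because the article only needs to reference the card, the number of datasets listed in it can grow to the thousands without changing the article metadata.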

Several general-purpose data repositories already provide most or all of this functionality. I am most familiar with Dryad, BioStudies [@https://doi.org/10.15252/msb.20156658] and Figshare [@https://figshare.com/blog/Unveiling_figshare_Collections_a_new_way_to_group_content/202]. Data catalog cards probably work best for repositories that are flexible in the kinds of data they take, and for repositories that already have integrations with publishers. Not every data repository needs to support this functionality. Data catalog cards are also an opportunity for differentiation, e.g. by providing data curation, help with data review, etc.

My thinking about this topic was triggered by a conversation with Tim Clark in the context of the DCIP project. The guest post by Dan S. Katz [@https://blog.datacite.org/to-better-understand-research-communication-we-need-a-groid-group-object-identifier] and the discussion around it were another important motivation, and a DataCite blog post from last August [@https://blog.datacite.org/reference-lists-and-tables-of-content] contains some of the ideas expressed here. Obviously this topic is of great interest to DataCite, as we hope that data catalog cards will use DataCite DOIs, and that we can help both with making article/data publishing workflows easier and with discovering data associated with an article.

Martin Fenner
Technical Director at DataCite

© 2016 Martin Fenner. Distributed under the terms of the Creative Commons Attribution license.

