DataCite Blog

Making the most out of available Metadata

July 9, 2020 Martin Fenner
https://doi.org/10.5438/1dgk-1m22

Metadata are essential for finding, accessing, and reusing scholarly content, i.e. for increasing the FAIRness [@https://doi.org/10.1038/sdata.2016.18] of datasets and other scholarly resources. This takes three steps. The first is a rich, standardized metadata schema that is widely used. The second is encouraging users to register these metadata, as many of the properties are optional and not required. The third is for infrastructure providers such as DataCite to facilitate metadata registration and to make the most of the available metadata. While we all have put considerable energy into the first two steps, I want to use this blog post to describe what DataCite is doing to improve metadata FAIRness via our services. I will focus on three important optional metadata properties and on two approaches: encouraging metadata registration in a standardized way using the new DOI form in our Fabrica service, and improving discovery via search filters in the next major release of DataCite Search, which we have started developing.

Language

DataCite metadata can include the primary language of the resource, using either the BCP 47 (e.g. en-US) or ISO 639-1 (e.g. en) controlled vocabularies. In the context of discovery, the ISO 639-1 code is most helpful: we want to find resources we can understand because we speak the language, and do not necessarily care about the nuances of, for example, U.S. English vs. British English. In Fabrica, the DOI form uses the list of languages in ISO 639-1, and that is also what we will use for filters in search.
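
As a minimal sketch of what this means for a search filter, the Python snippet below reduces language tags to their primary subtag, so that en-US, en-GB and en all land in the same bucket. The function names are hypothetical and this is not DataCite's implementation. It also assumes well-formed tags: the primary subtag of a BCP 47 tag is a two-letter ISO 639-1 code only when the language has one, so a real filter would still need to check the result against the ISO 639-1 list.

```python
def primary_language(tag: str) -> str:
    """Return the lower-cased primary language subtag of a BCP 47 style tag."""
    # BCP 47 separates subtags with hyphens; underscores are tolerated here
    # because some systems emit locale-style values such as "en_US".
    primary = tag.strip().replace("_", "-").split("-")[0]
    return primary.lower()


def count_by_language(tags):
    """Count resources per primary language, as a search filter might."""
    counts = {}
    for tag in tags:
        code = primary_language(tag)
        counts[code] = counts.get(code, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_by_language(["en-US", "en-GB", "en", "de", "pt-BR"]))
    # prints {'en': 3, 'de': 1, 'pt': 1}
```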

Going forward, it would make sense to consider allowing multiple languages per resource. This will not only allow using both BCP 47 and ISO 639-1, addressing different use cases, but will also allow the proper description of multilingual resources.

Rights

Rights information about a resource is essential, as it tells the user whether and under what conditions the resource can be reused. To allow users to filter by a specific license, rights information needs to be normalized. In theory, a URL pointing to a specific license is all that is needed. In practice, we also need a human-readable string, and ideally an abbreviation for the license (e.g. Creative Commons Attribution 4.0 International and CC-BY-4.0 for https://creativecommons.org/licenses/by/4.0/legalcode). More importantly, many licenses have more than one URL, e.g.

  • http://creativecommons.org/licenses/by/4.0
  • http://creativecommons.org/licenses/by/4.0/
  • https://creativecommons.org/licenses/by/4.0/
  • https://creativecommons.org/licenses/by/4.0/legalcode

We can address this by normalizing the URLs for all licenses, and providing a standard name and abbreviation. Luckily, this work has already been done by the Software Package Data Exchange (SPDX) project of the Linux Foundation. While SPDX focuses on software licenses, it also includes all Creative Commons licenses, which are the most common licenses used in DataCite metadata for data and text. In metadata schema 4.2, released in March 2019, we added the properties rightsIdentifier, rightsIdentifierScheme and schemeURI, which enable the use of SPDX. In the past few months we have added SPDX support to the DOI form, and we will have search facets based on SPDX.
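
The normalization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in Fabrica: the lookup table holds a single hand-written entry instead of the full SPDX license list, the function names are hypothetical, and the shape of the returned dictionary (including the rights and rightsURI keys) is assumed for the example.

```python
from urllib.parse import urlsplit

# Illustrative lookup table (an assumption, not the full SPDX list):
# normalized host + path -> (SPDX identifier, SPDX full name).
SPDX_BY_KEY = {
    "creativecommons.org/licenses/by/4.0": (
        "CC-BY-4.0",
        "Creative Commons Attribution 4.0 International",
    ),
}


def normalize_license_url(url: str) -> str:
    """Reduce the URL variants of one license to a single lookup key."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")          # ignore a trailing slash
    if path.endswith("/legalcode"):        # ignore the legal-code suffix
        path = path[: -len("/legalcode")]
    return host + path.lower()             # the scheme (http/https) is dropped


def to_rights(url: str):
    """Return normalized rights metadata, or None for an unknown license."""
    key = normalize_license_url(url)
    match = SPDX_BY_KEY.get(key)
    if match is None:
        return None
    identifier, name = match
    return {
        "rights": name,
        "rightsURI": "https://" + key + "/legalcode",
        "rightsIdentifier": identifier,
        "rightsIdentifierScheme": "SPDX",
        "schemeURI": "https://spdx.org/licenses/",
    }
```

With this sketch, all four URL variants listed above resolve to the same rightsIdentifier, CC-BY-4.0, which is what makes a single search facet per license possible.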

Subject Area

DataCite metadata have a very flexible Subject property, with sub-properties SubjectScheme, SchemeURI, and ValueURI. Unfortunately, there is no standard way to describe the subject area covered by the resource. This makes it difficult to find content described by DataCite metadata in, for example, Mathematics, or to understand to what extent the various disciplines use DataCite DOIs. There are many subject area classification schemes, but the most widely used generic classification scheme is the OECD Fields of Science classification with 6 top-level categories and 42 subcategories. We have implemented the OECD Fields of Science classification in the DOI form, and will do so in search facets.

While OECD Fields of Science is the most commonly used generic subject classification, the most widely used subject classification we currently find in DataCite metadata is the Australian and New Zealand Standard Research Classification (ANZSRC) Fields of Research. This classification is much more detailed, supporting different use cases. Luckily, there is an official ANZSRC mapping to the OECD Fields of Science. This allows us to automatically add the OECD Fields of Science category or subcategory if the ANZSRC Fields of Research is used in DataCite metadata.
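
A rough sketch of such an automatic mapping in Python is shown below. It assumes ANZSRC Fields of Research codes are hierarchical (2-digit division, 4-digit group, 6-digit field) and looks for the most specific prefix that has a mapping. The three table entries are an illustrative excerpt written for this example and should be checked against the official correspondence table before any real use. The function name is hypothetical.

```python
# Illustrative excerpt only (to be verified against the official mapping):
# ANZSRC Fields of Research code prefix -> (OECD FOS code, OECD FOS label).
ANZSRC_TO_FOS = {
    "01": ("1.1", "Mathematics"),
    "02": ("1.3", "Physical sciences"),
    "03": ("1.4", "Chemical sciences"),
}


def fos_for_anzsrc(code: str):
    """Return the OECD FOS entry for an ANZSRC code, or None if unmapped."""
    digits = code.strip()
    # Try the most specific level first: field (6), group (4), division (2).
    for length in (6, 4, 2):
        prefix = digits[:length]
        if len(digits) >= length and prefix in ANZSRC_TO_FOS:
            return ANZSRC_TO_FOS[prefix]
    return None


if __name__ == "__main__":
    print(fos_for_anzsrc("010101"))  # prints ('1.1', 'Mathematics')
    print(fos_for_anzsrc("99"))      # prints None
```

Falling back from the 6-digit field to the 2-digit division keeps the table small, while still allowing more specific entries to be added where the official mapping assigns a group or field to a different subcategory than its division.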

Going Forward

We hope that the DOI form makes it easier to register more of the optional but important metadata in a standardized way, and that the new search filters we are launching in a few months will improve discoverability of the content. We also hope that this in turn encourages DataCite members and their repositories to include this information in DOI metadata when using one of the DataCite APIs for DOI registration. There are sometimes good reasons to do things differently, and this also includes metadata for language, rights and subject. The DataCite metadata schema provides the flexibility needed, but we hope that in most cases the standard vocabularies ISO 639-1, SPDX and OECD Fields of Science will be used, improving the finding, accessing, and reusing of scholarly content with DataCite metadata for everyone.

Of course we shouldn’t forget the important work of the DataCite Metadata Working Group, which is busy working on the next DataCite schema version. That is a topic for another blog post, but I can already tell you that DataCite metadata will better support text documents.

Martin Fenner
Technical Director at DataCite

© 2020 Martin Fenner. Distributed under the terms of the Creative Commons Attribution license.

