DataCite Blog
  • Support
  • DataCite homepage

Eating your own Dog Food

December 20, 2016 Martin Fenner
https://doi.org/10.5438/4k3m-nyvg

Eating your own dog food is a slang term to describe that an organization should itself use the products and services it provides. For DataCite this means that we should use DOIs with appropriate metadata and strategies for long-term preservation for the scholarly outputs we produce. For the most part this is not research data, but rather technical documents such as the DataCite Schema and its documentation [-@https://doi.org/10.5438/0012].

These outputs also include the posts on this blog, where we discuss topics relevant for the DataCite community, but also of broader interest to anyone who cares about research data, persistent identifiers, and scholarly infrastructure. And starting today all blog posts on this blog will have a DOI, metadata and use a persistent storage mechanism.

Photo by Bill Emrich. CC Zero.
Photo by Bill Emrich. CC Zero.

Technical Implementation

This blog is powered by the static site generator Middleman, with blog posts written in Markdown and converted to HTML using Pandoc and the Travis CI continuous integration service. Static site generator means that there is no database or application server powering the site, making website adminstration simpler, cheaper and safer. In addition to the blog, the DataCite homepage and Metadata Schema subsite are also generated using Middleman.

The simplicity is particularly important here, as registering the DOIs and metadata can be accomplished using a command line utility written by DataCite staff that doesn’t need to know much about the internals of Middleman, and thus can be easily adapted to other static site generators such as Jekyll, Hugo or Hexo. The command line utility is Cirneco, generating the metadata XML according to the DataCite Metadata Schema, and registering DOI and metadata with the DataCite MDS. Like all tools mentioned in this post Cirneco is open source software, please reach out to us if you are interested in implementing similar functionality for your blog.

Generating DOIs

The DOIs for this blog are generated automatically, using a modified base32 encoding algorithm that is provided by Cirneco, as discussed last week [@https://doi.org/10.5438/55E5-T5C0]. The DOI is generated and minted when a new post is pushed to https://blog.datacite.org. This avoids two problems: a) DOI-like strings in the wild before publication and b) the randomly generated DOI exists already (we can simply generate a new one). All DOIs are short, without semantic infomation that might change over time, and with a checksum to minimize transcription errors, for example https://doi.org/10.5438/XCBJ-G7ZY. Going forward we encourage users to link to the DataCite Blog using the DOI, as these links will continue to work even if we ever move the blog to a different location.

Generating Metadata

For the generation of metadata, we need to strike a balance between simple author provided metadata, but rich enough to aid discovery. We are doing this via three mechanisms:

  • metadata provided by the author
  • default metadata for the blog
  • metadata automatically extracted from content

The metadata provided by the author are the typical metadata for blog posts, provided via YAML front matter at the beginning of each post:

---
layout: post
title: Eating your own Dog Food
author: mfenner
date: 2016-12-19
tags:
- datacite
- doi
- metadata
---

We can reuse all these metadata when generating DataCite metadata, using the tags as subjects.

The default metadata are metadata that always stay the same for the blog, such as publisher, HostingInstitution and rights. We can store them in a site-wide configuration file. We can also assume reasonable defaults that can be overridden in the YAML front matter, e.g. resourceType (we use BlogPosting with resourceTypeGeneral Text) and version. We store more information about authors outside the blog post, including givenName, familyName and nameIdentifier (we now show the ORCID ID of every blog author at the bottom of the post).

Finally, there are metadata that we can automatically extract from the blog post, and we are currently doing this for the description and relatedIdentifier. This blog uses Pandoc and BibTex to generate the references section at the end, and we can fetch this information and convert it into the format needed for relatedIdentifier.

Taken together we can provide all metadata that are required or recommended in the Metadata Schema documentation [-@https://doi.org/10.5438/0012], and we can do this without any extra effort for the author. The full XML is avalailable here.

Not all blog posts need to be cited formally with metadata in a references list formatted according to a specific citation style. But these metadata greatly help with discovery, a search in DataCite Search for eating dog food will for example bring up this blog post as the first hit.

Persistent storage

Using DOIs means that readers not only expect rich metadata that help with citation and discovery, but also that DataCite takes extra care to preserve the blog posts, thinking beyond the particular technical implementation or even the contiuing existence of this blog. This is an area where we do need to do more work, starting with a decision about the best archival format for a blog post (HTML, PDF, JATS?). For now blog posts are hosted in multiple Git repositories (one of them on Github), and in two independent Amazon S3 buckets that each use versioning. Multiple locations with versioning are a good start, but more work is clearly needed.

References

Martin Fenner
Technical Director at DataCite | Blog posts
  • Martin Fenner
    #molongui-disabled-link
    Farewell to DataCite
  • Martin Fenner
    #molongui-disabled-link
    The DataCite Technology Stack
  • Martin Fenner
    #molongui-disabled-link
    We need your feedback: Aligning the CodeMeta vocabulary for scientific software with schema.org
  • Martin Fenner
    #molongui-disabled-link
    DataCite is hiring an application developer

Share this:

  • Click to share on Twitter (Opens in new window)
  • Click to share on Facebook (Opens in new window)
Uncategorized.

© 2016 Martin Fenner. Distributed under the terms of the Creative Commons Attribution license.


Post navigation

2016 in review
Mysteries in Reference Lists

Recent Posts

  • DataCite Design System is ready to be worn.
  • DataCite launches Global Access Program with support from CZI
  • Launch of the PID-network Project – Understanding Metadata Workflows
  • DataCite Member Survey 2022
  • New Release of Fabrica: Improvements Inspired by User Feedback

Tags

Anniversary (3) API (3) Bibliometrics (2) Citation (8) Conference (2) Content negotiation (2) Crossref (10) CSV (4) Data-level metrics (9) Data citation (7) Discovery (2) Docker (3) DOI (18) Dublin core (2) Fabrica (4) FAIR (5) FORCE11 (2) FREYA (8) Github (2) Google (2) GraphQL (7) IGSN (5) Impactstory (2) Infrastructure (13) MDC (7) Members (13) Metadata (35) Open hours (2) ORCID (17) Organization identifiers (4) PIDapalooza (5) PID graph (9) Policy (2) RDA (8) Re3data (11) React (2) ROR (5) Schema.org (3) Search (3) Services (5) Software (2) Software citation (5) Staff (6) Strategy (2) THOR (13)

Archives

  • March 2023 (1)
  • February 2023 (2)
  • January 2023 (5)
  • December 2022 (4)
  • November 2022 (3)
  • October 2022 (5)
  • September 2022 (6)
  • August 2022 (3)
  • July 2022 (1)
  • June 2022 (3)
  • May 2022 (1)
  • April 2022 (1)
  • March 2022 (2)
  • February 2022 (3)
  • January 2022 (1)
  • December 2021 (2)
  • November 2021 (3)
  • October 2021 (5)
  • August 2021 (2)
  • July 2021 (2)
  • June 2021 (1)
  • May 2021 (2)
  • April 2021 (2)
  • March 2021 (2)
  • February 2021 (3)
  • January 2021 (3)
  • December 2020 (1)
  • November 2020 (2)
  • October 2020 (4)
  • September 2020 (4)
  • August 2020 (3)
  • July 2020 (3)
  • June 2020 (2)
  • May 2020 (3)
  • April 2020 (2)
  • March 2020 (2)
  • February 2020 (4)
  • January 2020 (4)
  • December 2019 (3)
  • November 2019 (3)
  • October 2019 (5)
  • September 2019 (3)
  • August 2019 (3)
  • July 2019 (3)
  • June 2019 (2)
  • May 2019 (5)
  • April 2019 (6)
  • March 2019 (2)
  • February 2019 (5)
  • January 2019 (1)
  • December 2018 (4)
  • November 2018 (3)
  • October 2018 (4)
  • September 2018 (4)
  • August 2018 (4)
  • June 2018 (4)
  • May 2018 (4)
  • April 2018 (1)
  • February 2018 (3)
  • January 2018 (1)
  • November 2017 (2)
  • October 2017 (2)
  • August 2017 (4)
  • July 2017 (1)
  • June 2017 (1)
  • May 2017 (2)
  • April 2017 (5)
  • March 2017 (2)
  • January 2017 (1)
  • December 2016 (4)
  • November 2016 (2)
  • October 2016 (5)
  • September 2016 (3)
  • August 2016 (1)
  • July 2016 (3)
  • June 2016 (1)
  • May 2016 (6)
  • April 2016 (5)
  • March 2016 (5)
  • February 2016 (2)
  • January 2016 (2)
  • December 2015 (3)
  • November 2015 (3)
  • October 2015 (8)
  • September 2015 (5)
  • August 2015 (6)

About

  • What we do
  • Governance
  • Members
  • Steering groups
  • Team
  • Job opportunities

Services

  • Create DOIs with Fabrica
  • Discover metadata with Commons
  • Integrate with APIs
  • Partner services

Resources

  • Metadata schema
  • Support
  • Fee model

Community

  • Members
  • Partners
  • Steering groups
  • Service providers
  • Roadmap
  • FAIR Workflows

Contact us

  • Imprint
  • Terms and conditions
  • Privacy policy
  • Mail
  • RSS Feed
  • Twitter
  • Mastodon
  • GitHub
  • YouTube
  • LinkedIn
We use cookies on our website. Some are technically necessary, others help us improve your user experience. You can decline non-essential cookies by selecting “Reject”. Please see our Privacy Policy for further information about our privacy practices and use of cookies.
RejectAccept
Manage consent

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT