DataCite Blog

Reference Lists and Tables of Content

August 15, 2015 Martin Fenner
https://doi.org/10.5438/5aeg-weev

Geoff Bilder from CrossRef likes to show the following slide at scholarly conferences, and then asks the audience what they see:

Paper 1

Most of us probably immediately recognize this document as a scholarly article. This immediate recognition includes essential parts of an article such as the title – or the reference list:

Paper 2

This immediate recognition is a powerful concept: it makes it easy for the reader to navigate a scholarly document, e.g. to quickly jump to the abstract or references.

We don’t have the same immediate recognition for datasets. Given that a large number of datasets in DataCite are in CSV (comma-separated values) format, the closest we come to an immediately recognized document is probably the spreadsheet:

Container

From: Wikimedia Commons, licensed under CC BY-SA 3.0.

A canonical format for datasets goes beyond immediate recognition of the essential parts by the user: it would also greatly facilitate reuse of data. As Nick Stenning from the Open Knowledge Foundation (OKFN) pointed out at CSV.conf last year, the cost of shipping goods is in large part determined by the cost of loading and unloading, and the container has dramatically changed that equation. He argued that common formats such as the OKFN data package could do the same for data reuse.

Bulk parcels

From: Wikimedia Commons, licensed under CC BY-SA 3.0.

Unfortunately, there are at least three problems with using spreadsheets as a canonical format for datasets:

  • not every dataset can be represented as a CSV file; there are many specialized formats (including of course Excel .xlsx)
  • we can’t include descriptive metadata (not even authors or document title) in a CSV file
  • many datasets actually consist of a collection of files: not only in CSV format, but also other data formats and support files such as a README.

The approach taken by the OKFN data package format – and related formats such as the Research Object Bundle – is to put all data files (in CSV or other formats) into a folder, together with a standardized machine-readable file that includes the metadata (e.g. title, authors, publication date and license). This folder can then be compressed with zip, again yielding a single file (a very common approach, used for example for epub and docx).
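To make the container idea concrete, here is a minimal Python sketch that packs data files and a machine-readable metadata file into a single zip archive. The file name `datapackage.json` and the `resources` list follow the data package convention, but the other field names, the helper functions `build_package` and `read_descriptor`, and all example values are assumptions chosen for illustration, not a complete implementation of any specification.

```python
import io
import json
import zipfile


def build_package(metadata, files):
    """Return the bytes of a zip archive with the data files plus a metadata file.

    `files` maps paths inside the archive to file contents (bytes).
    Hypothetical helper written for this sketch.
    """
    # List every file in the descriptor so the archive describes itself.
    descriptor = dict(metadata)
    descriptor["resources"] = [{"path": path} for path in sorted(files)]

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("datapackage.json", json.dumps(descriptor, indent=2))
        for path, content in sorted(files.items()):
            archive.writestr(path, content)
    return buffer.getvalue()


def read_descriptor(package_bytes):
    """Read the metadata file back out of a packaged archive."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return json.loads(archive.read("datapackage.json"))


# Placeholder metadata and data, invented for this example.
example_metadata = {
    "name": "example-dataset",
    "title": "Example dataset",
    "authors": ["A. Researcher"],
    "license": "CC-BY-4.0",
}
example_files = {
    "data/measurements.csv": b"id,value\n1,3.2\n2,4.8\n",
    "README.md": b"Example data for illustration only.\n",
}

package = build_package(example_metadata, example_files)
print(read_descriptor(package)["resources"])
```

A reader of the archive only needs to open the one metadata file to learn the title, authors, license and the list of included files, without parsing any of the data files themselves.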

The concept described here (a collection of documents in a larger container, and a listing of all included documents) is of course at least as old as the scholarly article: the book as a canonical format for collections (of texts), and the table of contents to describe what is in the book.

Table of contents

From: Wikimedia Commons, licensed under CC BY-SA 3.0.

The approach described here would not only help package datasets into a more reusable standard format, but the scholarly article would also greatly benefit from migrating to a container format. We all know that the concept of the scholarly article described at the beginning of this post is falling apart – an article is simply no longer a single text document. We have not only associated figures and tables, but also associated files that can’t be easily included in the article PDF, in particular files that contain the data underlying the findings of the article, but also other supplementary information.

There are currently three common approaches to referencing the underlying data in a scholarly article:

  • inclusion in supporting information files without any specific linking
  • informal citation in the article text, most commonly in the materials and methods section
  • formal citation with inclusion in the reference list

Until not too long ago I was a big proponent of including all data associated with an article in the reference list, mainly to make it easier to find the data. But the reference list isn’t the appropriate place for something that is really part of the article – or as colleague Todd Vision puts it: the data generated for an article are another output rather than an input. Reference lists summarize all the inputs to an article, whereas outputs belong in a table of contents. A table of contents isn’t a standard feature of scholarly articles yet, but to me it is a logical next step for the journal article format, together with the underlying concept of a container format described earlier in this post. Extracting references to datasets from a table of contents should be as easy as extracting them from a reference list, in particular if we make sure that this table of contents is openly available.
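As a rough illustration of how simple that extraction could be, the Python sketch below reads a machine-readable table of contents and lists the datasets it contains. No such format is defined in this post: the JSON structure, the field names `contents`, `path`, `type` and `id`, the function `dataset_entries`, and the placeholder identifiers are all assumptions made for the example.

```python
import json

# Hypothetical table of contents for an article container.
# The structure, field names and identifiers are invented for this sketch.
TOC_JSON = """
{
  "title": "Example article",
  "contents": [
    {"path": "article.xml", "type": "text"},
    {"path": "figures/figure1.png", "type": "figure"},
    {"path": "data/results.csv", "type": "dataset",
     "id": "https://doi.org/10.5555/example-1"},
    {"path": "data/raw.csv", "type": "dataset",
     "id": "https://doi.org/10.5555/example-2"}
  ]
}
"""


def dataset_entries(toc):
    """Return the table-of-contents entries that are marked as datasets."""
    return [
        entry
        for entry in toc.get("contents", [])
        if entry.get("type") == "dataset"
    ]


toc = json.loads(TOC_JSON)
for entry in dataset_entries(toc):
    # Fall back to a marker when an entry carries no identifier.
    print(entry["path"], entry.get("id", "(no identifier)"))
```

Because the datasets are listed as outputs of the article rather than mixed in with cited works, a consumer does not need to guess which references point to the article's own data.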

Journal Article Tag Suite (JATS) is the standard machine-readable format for journal articles in the life sciences (and increasingly other sciences). At JATS-CON in April this year I proposed (starting at minute 210) extending JATS so that it is also provided as a container format.


© 2015 Martin Fenner. Distributed under the terms of the Creative Commons Attribution license.

