Discoverability of research data is a core component of the research data ecosystem. Making data findable has always been one of DataCite’s main goals, with DataCite Search providing a tool to search for all datasets with DataCite DOIs.

Today the discoverability of datasets was taken to the next level with the launch of Google Dataset Search. Google has recognized the increasing importance of data and wants to make it easier for everyone, not just researchers, to find datasets. Like Google Scholar, it includes datasets wherever the data lives. This large-scale effort means increased visibility for all data repositories and provides an extra push to make all research data openly available.

To be able to include datasets from diverse sources, Google developed guidelines for dataset providers to describe their data in a way that allows Google to index the metadata. These guidelines include standard metadata such as title, dataset creator, publication date, publisher, license, and description, and use schema.org, a widely used metadata standard co-developed by Google.
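As a rough illustration of the fields listed above, the following sketch builds a schema.org `Dataset` description as JSON-LD. The function name and all example values are made up for this post, and the choice of `Person` and `Organization` for creator and publisher is an assumption; Google's guidelines remain the authoritative reference for which properties are expected.

```python
import json


def build_dataset_jsonld(title, creators, date_published, publisher,
                         license_url, description):
    """Return a schema.org Dataset description as a plain dict.

    Hypothetical helper for illustration only, not part of any DataCite
    or Google library.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        # Assumption: creators are individual people.
        "creator": [{"@type": "Person", "name": name} for name in creators],
        "datePublished": date_published,
        "publisher": {"@type": "Organization", "name": publisher},
        "license": license_url,
        "description": description,
    }


# Made-up example record.
example = build_dataset_jsonld(
    title="Example dataset",
    creators=["Doe, Jane"],
    date_published="2018-09-05",
    publisher="Example Repository",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    description="A made-up record used only to illustrate the field mapping.",
)
print(json.dumps(example, indent=2))
```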

The metadata need to be embedded into the dataset landing page so that the Google indexer can find them, and the data repository needs to provide a sitemaps file with the URLs of all dataset landing pages. By doing this, the data repository not only can be indexed in Google Dataset Search, but also follows our community recommendations for embedding machine-readable metadata in landing pages (Fenner et al., 2016).
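The two steps described here, embedding the metadata in the landing page and listing landing pages in a sitemaps file, can be sketched as follows. This is a minimal example under stated assumptions: the helper names and URLs are hypothetical, the metadata is assumed to be JSON-LD placed in a `script` element of type `application/ld+json`, and a real repository would normally do this inside its own templating system.

```python
import json
from xml.sax.saxutils import escape


def jsonld_script_tag(metadata):
    """Wrap a metadata dict in a JSON-LD script element for a landing page."""
    # Escape "</" so a value such as "</script>" cannot end the element early.
    # "\/" is a valid JSON escape, so the payload still parses unchanged.
    payload = json.dumps(metadata, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


def render_sitemap(landing_page_urls):
    """Render a sitemaps file listing the given landing page URLs."""
    entries = "".join(
        f"  <url><loc>{escape(url)}</loc></url>\n" for url in landing_page_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries
        + "</urlset>\n"
    )


# Made-up metadata and URLs for illustration.
tag = jsonld_script_tag({"@context": "https://schema.org",
                         "@type": "Dataset",
                         "name": "Example dataset"})
sitemap = render_sitemap([
    "https://repository.example.org/datasets/1",
    "https://repository.example.org/datasets/2?version=1&format=html",
])
print(tag)
print(sitemap)
```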

The number of data repositories embedding metadata in dataset landing pages is still low; see our latest survey from May (Fenner et al., 2018). If a data repository doesn't provide metadata via the dataset landing page, the next best option is an indexer that stores metadata about the dataset. DataCite Search is such a place: in early 2017 we started to embed metadata in DataCite Search pages for individual DOIs, and we generated a sitemaps file (or rather files) for the more than 10 million DOIs we have.
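The "or rather files" matters because the sitemaps protocol limits a single sitemap file to 50,000 URLs, so millions of pages have to be split across many files tied together by a sitemap index. The sketch below shows one way to plan that split; the file naming scheme and base URL are assumptions for illustration and do not describe how DataCite Search actually generates its sitemaps.

```python
from xml.sax.saxutils import escape

# Limit from the sitemaps protocol: at most 50,000 URLs per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def render_sitemap_index(sitemap_urls):
    """Render a sitemap index that points at the individual sitemap files."""
    entries = "".join(
        f"  <sitemap><loc>{escape(url)}</loc></sitemap>\n" for url in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries
        + "</sitemapindex>\n"
    )


def plan_sitemaps(urls, base_url, size=MAX_URLS_PER_SITEMAP):
    """Return (index_xml, files), where files maps a file name to its URLs.

    The "sitemap-N.xml" naming is a hypothetical convention.
    """
    files = {}
    for number, part in enumerate(chunk(urls, size), start=1):
        files[f"sitemap-{number}.xml"] = part
    index_xml = render_sitemap_index(f"{base_url}/{name}" for name in files)
    return index_xml, files


# Made-up URLs: 120,001 pages need three sitemap files.
urls = [f"https://search.example.org/works/{i}" for i in range(120_001)]
index_xml, files = plan_sitemaps(urls, "https://search.example.org/sitemaps")
print(len(files), "sitemap files")
```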

This allowed Google to collect metadata about these DOIs, even if the repository did not provide the metadata on its landing pages. As part of this work we had many conversations with Natasha Noy and Dan Brickley from Google, many data repositories implementing schema.org, and the broader community. DataCite was thrilled to be involved in discussions around the development of Google Dataset Search and to test earlier versions.

Google Dataset Search will greatly enhance discoverability of research datasets and we will continue to ensure that all relevant datasets with DataCite DOIs are included. Right now the focus is mainly on Social Sciences and Environmental Sciences, but the use of schema.org should make expansion to other disciplines straightforward.

As Google also notes, a search engine is only as good as the metadata that go into it, so we want to encourage all DataCite members to continue to deposit complete metadata, which we then convert into schema.org format, to ensure your datasets can be found. Even better, embed schema.org metadata in your dataset landing pages. Should you have further questions, e.g. how to embed schema.org in your repository pages, do not hesitate to reach out to us.
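To show why complete metadata matters for this conversion, here is a simplified sketch that maps a DataCite-style record to schema.org. The input layout (`doi`, `titles`, `creators`, `publicationYear`, `publisher`, `rightsList`, `descriptions`) is an assumption loosely modelled on DataCite metadata properties, and the function is hypothetical; it is not the converter DataCite runs. Fields missing from the deposited record are simply missing from the output.

```python
def datacite_to_schema_org(record):
    """Map a simplified DataCite-style record (a dict) to schema.org JSON-LD.

    Illustrative only: the input keys are assumed, and only a handful of
    properties are handled.
    """
    out = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": "https://doi.org/" + record["doi"],
    }
    titles = record.get("titles") or []
    if titles:
        out["name"] = titles[0]["title"]
    creators = record.get("creators") or []
    if creators:
        out["creator"] = [{"@type": "Person", "name": c["name"]} for c in creators]
    if record.get("publicationYear"):
        out["datePublished"] = str(record["publicationYear"])
    if record.get("publisher"):
        out["publisher"] = {"@type": "Organization", "name": record["publisher"]}
    rights = record.get("rightsList") or []
    if rights and rights[0].get("rightsUri"):
        out["license"] = rights[0]["rightsUri"]
    descriptions = record.get("descriptions") or []
    if descriptions:
        out["description"] = descriptions[0]["description"]
    return out


# Made-up records: one complete, one with only the bare minimum.
complete = datacite_to_schema_org({
    "doi": "10.1234/example",
    "titles": [{"title": "Example dataset"}],
    "creators": [{"name": "Doe, Jane"}],
    "publicationYear": 2018,
    "publisher": "Example Repository",
    "rightsList": [{"rightsUri": "https://creativecommons.org/licenses/by/4.0/"}],
    "descriptions": [{"description": "A made-up record."}],
})
sparse = datacite_to_schema_org({
    "doi": "10.1234/sparse",
    "titles": [{"title": "Sparse record"}],
})
print(sorted(complete))
print(sorted(sparse))
```

The sparse record yields only an identifier and a name, which gives a search engine very little to match against.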


Fenner, M., Crosas, M., Durand, G., Wimalaratne, S., Gräf, F., Hallett, R., … Clark, T. (2018, March). Listing of data repositories that embed metadata in dataset landing pages.

Fenner, M., Crosas, M., Grethe, J., Kennedy, D., Hermjakob, H., Rocca-Serra, P., … Clark, T. (2016). A Data Citation Roadmap for Scholarly Data Repositories. bioRxiv, 097196.

Helena Cousijn

DataCite Director of Community Engagement and Communications

Martin Fenner

DataCite Technical Director

Taking discoverability to the next level: datasets with DataCite DOIs can now be found through Google Dataset Search


© 2018 Helena Cousijn, , and Martin Fenner. Distributed under the terms of the Creative Commons Attribution license.