DataCite Blog
  • Support
  • DataCite homepage

DataCite’s New Search

January 7, 2019 Martin Fenner
https://doi.org/10.5438/vyd9-ty64

Today we are announcing our first new functionality of 2019, a much improved search for DataCite DOIs and metadata. While the DataCite Search user interface has not changed, changes under the hood bring many important improvements and are our biggest changes to search since 2012.

Faster Indexing

Newly registered (and tagged findable) DOIs now appear in the DataCite Search index within a few minutes, compared with the previous up to 12 hour lag. The same is true for metadata updates or DOIs removed from the public search index (by changing the DOI state from findable to registered). Faster indexing is particularly important when related content is published at the same time, e.g. a dataset with a DataCite DOI associated with a journal article with a Crossref DOI.

Advanced DOI Search in DOI Fabrica

This faster indexing makes it possible for members and clients to use the Search index also in DOI Fabrica, enabling the same advanced search functionality available in DataCite Search, but also including DOIs in draft or registered state. Our Solr search index could not be used in DOI Fabrica, as users would not see newly created or updated DOIs because of the indexing delay. This makes it much easier to manage DOIs and associated metadata, e.g. by filtering for DOIs in draft state or finding DOIs using the retired metadata schemata 2.1 and 2.2. And it is the first time that we provide DOI registration and search in a single user interface; this kind of simplification is one of our themes for 2019 [@https://doi.org/10.5438/bckb-qy95].

Query for research data management
Query for research data management

Search for Everything

Our new search index covers all metadata and allows specific searches of every metadata field. For example geoLocationPlace:

Search for geoLocationPlace Cuba
Search for geoLocationPlace Cuba

The supported search syntax is very similar to what was available before, and uses the Elasticsearch Query String Syntax. You can for example specify field names, use wildcards, regular expressions, ranges, and boolean operators, e.g. use creators.affiliation:stanford +creators.affiliation:ucsf to find the 174 DOIs with collaborators from both of these two institutions.

It Solves the Deep Paging Problem

One important limitation of our previous search index, and a common issue with many search implementations, was the deep paging problem, making it hard if not impossible to fetch a very large number of results. Our new search index supports cursor-based pagination that overcomes this problem, allowing users to, for example, harvest all DOI metadata from a particular member. This is done in the REST API, specifying a larger number of records per page – e.g. https://api.datacite.org/providers/caltech/dois?page[size]=1000 – and the using the URL provided via links.next in the API response for the next query.

Implementation

The above changes were made possible by updating our search index service from an old version of Solr (4.0) to a recent version of Elasticsearch (6.3). We switched to Elasticsearch, as it works better with our new JSON-based workflow – see our December blog post about JSON [@https://doi.org/10.5438/1pca-1y05] – and we can use a hosted service tightly integrated with the rest of our infrastructure thereby reducing the support effort needed.

Next

Not all DataCite services have been switched to the new search index, the Stats Portal and OAI-PMH service will be migrated within the next three months and continue to use the old Solr search index for now.

In the coming weeks and months we will also provide better documentation, and improve performance and fix any bugs we encounter. We will also work with our members to better understand what kind of queries they are most interested in, and how we can better support these queries in the search interface.

References

Martin Fenner
Technical Director at DataCite | Blog posts
  • Martin Fenner
    #molongui-disabled-link
    Farewell to DataCite
  • Martin Fenner
    #molongui-disabled-link
    The DataCite Technology Stack
  • Martin Fenner
    #molongui-disabled-link
    We need your feedback: Aligning the CodeMeta vocabulary for scientific software with schema.org
  • Martin Fenner
    #molongui-disabled-link
    DataCite is hiring an application developer

Share this:

  • Click to share on Twitter (Opens in new window)
  • Click to share on Facebook (Opens in new window)
Uncategorized.

© 2019 Martin Fenner. Distributed under the terms of the Creative Commons Attribution license.


Post navigation

Link checker is here
10 years of DataCite – How it all began

Recent Posts

  • Connected in Gothenburg: DataCite’s first in-person Connect event
  • IGSN ID Catalogs – Now it is Even Easier to Register IGSN IDs!
  • DataCite Design System is ready to be worn.
  • DataCite launches Global Access Program with support from CZI
  • Launch of the PID-network Project – Understanding Metadata Workflows

Tags

Anniversary (3) API (3) Bibliometrics (2) Citation (8) Conference (2) Content negotiation (2) Crossref (10) CSV (4) Data-level metrics (9) Data citation (7) Discovery (2) Docker (3) DOI (18) Dublin core (2) Fabrica (4) FAIR (5) FORCE11 (2) FREYA (8) Github (2) Google (2) GraphQL (7) IGSN (5) Impactstory (2) Infrastructure (13) MDC (7) Members (13) Metadata (35) Open hours (2) ORCID (17) Organization identifiers (4) PIDapalooza (5) PID graph (9) Policy (2) RDA (8) Re3data (11) React (2) ROR (5) Schema.org (3) Search (3) Services (5) Software (2) Software citation (5) Staff (6) Strategy (2) THOR (13)

Archives

  • March 2023 (3)
  • February 2023 (2)
  • January 2023 (5)
  • December 2022 (4)
  • November 2022 (3)
  • October 2022 (5)
  • September 2022 (6)
  • August 2022 (3)
  • July 2022 (1)
  • June 2022 (3)
  • May 2022 (1)
  • April 2022 (1)
  • March 2022 (2)
  • February 2022 (3)
  • January 2022 (1)
  • December 2021 (2)
  • November 2021 (3)
  • October 2021 (5)
  • August 2021 (2)
  • July 2021 (2)
  • June 2021 (1)
  • May 2021 (2)
  • April 2021 (2)
  • March 2021 (2)
  • February 2021 (3)
  • January 2021 (3)
  • December 2020 (1)
  • November 2020 (2)
  • October 2020 (4)
  • September 2020 (4)
  • August 2020 (3)
  • July 2020 (3)
  • June 2020 (2)
  • May 2020 (3)
  • April 2020 (2)
  • March 2020 (2)
  • February 2020 (4)
  • January 2020 (4)
  • December 2019 (3)
  • November 2019 (3)
  • October 2019 (5)
  • September 2019 (3)
  • August 2019 (3)
  • July 2019 (3)
  • June 2019 (2)
  • May 2019 (5)
  • April 2019 (6)
  • March 2019 (2)
  • February 2019 (5)
  • January 2019 (1)
  • December 2018 (4)
  • November 2018 (3)
  • October 2018 (4)
  • September 2018 (4)
  • August 2018 (4)
  • June 2018 (4)
  • May 2018 (4)
  • April 2018 (1)
  • February 2018 (3)
  • January 2018 (1)
  • November 2017 (2)
  • October 2017 (2)
  • August 2017 (4)
  • July 2017 (1)
  • June 2017 (1)
  • May 2017 (2)
  • April 2017 (5)
  • March 2017 (2)
  • January 2017 (1)
  • December 2016 (4)
  • November 2016 (2)
  • October 2016 (5)
  • September 2016 (3)
  • August 2016 (1)
  • July 2016 (3)
  • June 2016 (1)
  • May 2016 (6)
  • April 2016 (5)
  • March 2016 (5)
  • February 2016 (2)
  • January 2016 (2)
  • December 2015 (3)
  • November 2015 (3)
  • October 2015 (8)
  • September 2015 (5)
  • August 2015 (6)

About

  • What we do
  • Governance
  • Members
  • Steering groups
  • Team
  • Job opportunities

Services

  • Create DOIs with Fabrica
  • Discover metadata with Commons
  • Integrate with APIs
  • Partner services

Resources

  • Metadata schema
  • Support
  • Fee model

Community

  • Members
  • Partners
  • Steering groups
  • Service providers
  • Roadmap
  • FAIR Workflows

Contact us

  • Imprint
  • Terms and conditions
  • Privacy policy
  • Mail
  • RSS Feed
  • Twitter
  • Mastodon
  • GitHub
  • YouTube
  • LinkedIn
We use cookies on our website. Some are technically necessary, others help us improve your user experience. You can decline non-essential cookies by selecting “Reject”. Please see our Privacy Policy for further information about our privacy practices and use of cookies.
RejectAccept
Manage consent

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT