DOI or LOD or DOI and LOD: Difference between revisions

From CETAF ISTC Wiki
bwf>Soraya Sierra
No edit summary
(header about copy and link to internet archive)
 
(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
The pro-iBiosphere project has investigated the status of the use of stable identifier methods in the Biodiversity community. In the course of several workshops the participants from Europe, the US, and Australia almost unanimously agreed that Life Science Identifiers (LSIDs), being a technology driven only by the biodiversity community for the past 8 years, should be abandoned. It was also widely agreed, that the preferred form of any identifier, be it Linked Open Data URIs (LOD-URIs), DOIs, or ARKs, should be the Semantic Web compatible http-form (including the DOI or ARK resolver).
'''<span style="color:red">This page is a copy from https://wiki.pro-ibiosphere.eu/wiki/DOI_or_LOD_or_DOI_and_LOD , which is not available anymore. An archived version can be found at the [https://web.archive.org/web/20170603134252/https://wiki.pro-ibiosphere.eu/wiki/DOI_or_LOD_or_DOI_and_LOD Internet Archive].</span>'''


However, the question whether the management of stable URIs should occur decentralized (at multiple institutions, each using the standard URI-stability technology provided by most web servers), or whether a special, centralized technology such as DOI should be mandatory continues to be discussed. The following images show a comparison of the DOI and Semantic Web / Linked Open Data technologies:
<div style="text-align:right">by G. Hagedorn, 2013</div>
The pro-iBiosphere project has investigated the status of the use of stable identifier methods within the Biodiversity community. In the course of several workshops organised by the project (involving experts from Europe, the US, and Australia) participants almost unanimously agreed that:
* Life Science Identifiers (LSIDs), being a technology driven solely by the biodiversity community for the past 8 years, should be abandoned.
* It was also widely agreed, that the preferred form of any identifier, be it Linked Open Data URIs (LOD-URIs), DOIs, or ARKs, should be the Semantic Web compatible http-form (i.e. in the case of DOI or ARK including an http-based resolver).
 
However, the question whether the management of stable URIs should occur decentralized (at multiple institutions, each using the standard URI-stability technology provided by most web servers), or whether a special, centralized technology such as DOI should be mandatory, continues to be discussed.  
 
==Comparison of DOI and Semantic Web / Linked Open Data technologies==


[[File:DOI vs LOD (pro-iBiosphere discussion 2013, G. Hagedorn).PNG|600px|DOI vs LOD]]
[[File:DOI vs LOD (pro-iBiosphere discussion 2013, G. Hagedorn).PNG|600px|DOI vs LOD]]


The top scenario shows the functioning of a DOI resolution service. A separate server, the DOI resolution provider accepts the request, consults its internal Stability Mapping Definitions (where the DOI is mapped to the final URI), differentiates between RDF and html requests, and forwards the request to the ultimate destination. The client (machine or human) no longer sees the stable DOI, but the redirected, potentially unstable URI.
The top scenario shows DOI resolution. A separate server, the DOI resolution provider, accepts the request, consults its internal Stability Mapping Definitions (where the DOI is mapped to the final URI), differentiates between RDF and html requests, and forwards the request to the ultimate destination. The client (machine or human) no longer sees the stable DOI, but instead the redirected (and potentially unstable) URI.
 
The bottom scenario shows the same situation for a linked data setup. A webserver that is located within the data providing institution differentiates between RDF requests from machines (red dot on the left side) and HTML requests from humans using a web browsers. Using content negotiation, both requests to the same URI are directed to RDF data and HTML web pages respectively. The webserver also consults its internal mapping definitions to maintain the URIs stable. One advantage of this situation is that the client (machine or human) continues to see a stable URI.


The bottom scenario shows the same situation in a linked data webserver setup within one institution. The webserver itself differentiates between RDF requests from machines (red dot on the left side) and HTML requests from humans using a web browsers. Using content negotiation, both requests to the same URI are directed to RDF data and HTML web pages respectively. The webserver also consults its internal mapping definitions to maintain the URIs stable. One advantage of this situation is that the client (machine or human) continues to see a stable URI.
Technically both scenarios work very similarly. The DOI example has minor advantages with respect to stability (largely limited to scenarios where the domain is lost by accident or because, perhaps after a merger, the domain transfer is neglected). By introducing the additional redirection layer only a single domain name is needed. This is a single point of failure, but the central domain can be reasonably expected to be managed to the highest standards. The DOI has the disadvantage that the URI as seen from the client side changes, because the redirect goes to a different server and is not handled opaquely within a provider.


Technically both scenarios work very similarly. The DOI example has minor advantages with respect to stability (which happen almost exclusively should the domain be lost by accident or because after a merger the domain transfer is neglected). By introducing the additional redirection layer only a single domain name is needed (which is a single point of failure, but which also can be reasonably expected to be managed to the highest standards). The DOI has the disadvantage that the URI as seen from the client side changes, because the redirect goes to a different server and is not handled within a system.
The main distinction between the two scenarios is therefore between centralized and decentralized stability management.  


The main distinction between the two scenarios is therefore between centralized and decentralized stability management.
==Advantages and disadvantages of centralization==
   
   
[[File:Biodiversity community DOI system (pro-iBiosphere discussion 2013, G. Hagedorn).PNG|600px|Biodiversity community DOI system]]
[[File:Biodiversity community DOI system (pro-iBiosphere discussion 2013, G. Hagedorn).PNG|600px|Biodiversity community DOI system]]


The slide shows the scenario where millions of requests from millions of clients have to be forwarded by a central resolver infrastructure to a large number of data providers. The service requirements of a biodiversity DOI service, providing the canonical identifiers to all living things in the semantic web (including humans, their parasites, crops, pets, etc.) may in fact be several order of magnitude higher than a CrossRef or DataCite DOI redirection. For data relations involving organism, including those from medicine, agriculture, etc., these DOIs would have to be resolved with every query or reasoning.
The graphic shows the scenario where millions of requests from millions of clients have to be forwarded by a central resolver infrastructure to a large number of data providers. The service requirements of a biodiversity DOI service, providing the canonical identifiers to all living things in the semantic web (including humans, their parasites, crops, pets, etc.) may in fact be several order of magnitude higher than a CrossRef or DataCite DOI redirection. For data relations involving organism, including those from medicine, agriculture, etc., these DOIs would have to be resolved with every query or reasoning.


Some additional comments on the slide above:
Some additional comments on the slide above:
# The central redirection table can grow very large. Technically this is manageable, but requires resources.
# The central redirection table can grow very large. Technically this is manageable, but requires resources.
# The large number of involved data providers may requires substantial human resources.
# The large number of involved data providers may require substantial human resources.
# Updating the redirection table by a provider for, e.g., 30 objects, can only be done through scripting. It requires the provider (e.g. a natural history collection) to learn the API of the central redirection service.
# Updating the redirection table by a provider for, e.g., 30 objects, can only be done through scripting. It requires the provider (e.g. a natural history collection) to learn the API of the central redirection service.
# Because major current DOI systems such as CrossRef or DataCite provide identifiers to Digitally published Object, and define some metadata expectations to this extent, it is rather doubtful whether they are suitable for physical specimens or abstract taxon concepts. The slide therefore assumes a Biodiversity owned and maintained community infrastructure, run, e.g. by GBIF. Who exactly is running the infrastructure is, however, secondary. The primary argument is that load can be high and management and resources need to be adequate and sustainably financed.
# Because major current DOI systems such as CrossRef or DataCite provide identifiers to Digitally published Object, and define some metadata expectations to this extent, it is rather doubtful whether they are suitable for physical specimens or abstract taxon concepts. The slide therefore assumes a Biodiversity owned and maintained community infrastructure, run, e.g. by GBIF. Who exactly is running the infrastructure is, however, secondary. The primary argument is that load can be high and management and resources need to be adequate and sustainably financed.


A central system has some advantages, especially with respect to additional services like quality control, centralized and reliable global statistics. However, these advantages may be decisive, depending on ones needs. Unfortunately, it may be time consuming to reach a consensus on this.
A central system has some advantages, especially with respect to additional services like quality control, centralized and reliable global statistics. However, these advantages may be decisive, depending on ones needs. Unfortunately, it may be time consuming to reach a consensus on this.
<!-- TODO: table with the advantages/disadvantages of the approaches -->
==Don't wait==


However, there is some good news: Substantial concerns above about the management resources required to maintain the mapping between DOIs and URIs both at the central redirection provider and at each data provider can be reduced, by first implementing well managed locally stable URIs. Doing so is straightforward, does not require additional technology, and drastically reduces the frequency or even likely that changes at an additional central redirections are necessary.  
However, there is some good news: Substantial concerns above about the management resources required to maintain the mapping between DOIs and URIs both at the central redirection provider and at each data provider can be reduced, by first implementing well managed locally stable URIs. Doing so is straightforward, does not require additional technology, and drastically reduces the frequency or even likely that changes at an additional central redirections are necessary.  


Thus, whether a central DOI system will be adopted over time or not: Investing today into establishing good management practices for stable, semantic web compatible identifiers at each institution will not be wasted effort. It may be that the solutions is LOD-URis <b>and</b> DOIs:
Thus, whether a central DOI system will be adopted over time or not: Investing today into establishing good management practices for stable, semantic web compatible identifiers at each institution will not be wasted effort. It may be that the solutions is LOD-URIs <b>and</b> DOIs.


[[File:DOI and LOD (pro-iBiosphere discussion 2013, G. Hagedorn).PNG|600px|DOI and LOD]]
[[File:DOI and LOD (pro-iBiosphere discussion 2013, G. Hagedorn).PNG|600px|DOI and LOD]]
(See also [[Best practices for stable URIs]])

Latest revision as of 12:45, 5 March 2025

This page is a copy from https://wiki.pro-ibiosphere.eu/wiki/DOI_or_LOD_or_DOI_and_LOD , which is not available anymore. An archived version can be found at the Internet Archive.

by G. Hagedorn, 2013

The pro-iBiosphere project has investigated the status of the use of stable identifier methods within the Biodiversity community. In the course of several workshops organised by the project (involving experts from Europe, the US, and Australia) participants almost unanimously agreed that:

  • Life Science Identifiers (LSIDs), being a technology driven solely by the biodiversity community for the past 8 years, should be abandoned.
  • It was also widely agreed that the preferred form of any identifier, be it Linked Open Data URIs (LOD-URIs), DOIs, or ARKs, should be the Semantic Web compatible http form (i.e. in the case of DOI or ARK, the form that includes an http-based resolver).
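
As an illustration of what "http form" means in practice, here is a minimal Python sketch that rewrites bare DOI and ARK identifiers into resolver-based http URIs. The function name, the choice of doi.org and n2t.net as resolvers, and all identifiers used with it are assumptions made for this example, not something the workshops prescribed.

```python
# Resolver prefixes assumed for this sketch.
DOI_RESOLVER = "https://doi.org/"
ARK_RESOLVER = "https://n2t.net/"


def to_http_form(identifier: str) -> str:
    """Return the http(s) form of a DOI, an ARK, or an already-http identifier."""
    ident = identifier.strip()
    lowered = ident.lower()
    if lowered.startswith(("http://", "https://")):
        # LOD-URIs are already in http form and are returned unchanged.
        return ident
    if lowered.startswith("doi:"):
        # Drop the "doi:" scheme label and prepend the resolver.
        return DOI_RESOLVER + ident[4:]
    if lowered.startswith("ark:"):
        # ARKs keep their "ark:" label behind the resolver prefix.
        return ARK_RESOLVER + ident
    raise ValueError(f"unrecognised identifier scheme: {identifier!r}")
```

For a made-up identifier such as `doi:10.9999/specimen.1`, the function returns `https://doi.org/10.9999/specimen.1`; an existing LOD-URI passes through untouched.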

However, the question of whether stable URIs should be managed in a decentralized way (at multiple institutions, each using the standard URI-stability technology provided by most web servers), or whether a special, centralized technology such as DOI should be mandatory, continues to be discussed.

Comparison of DOI and Semantic Web / Linked Open Data technologies

DOI vs LOD

The top scenario shows DOI resolution. A separate server, the DOI resolution provider, accepts the request, consults its internal Stability Mapping Definitions (where the DOI is mapped to the final URI), differentiates between RDF and HTML requests, and forwards the request to the ultimate destination. The client (machine or human) no longer sees the stable DOI, but instead the redirected (and potentially unstable) URI.
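
The central resolution step can be sketched roughly as follows. This is a simplified model, not the implementation of any real DOI service: the mapping table, the DOI, the example.org URIs, and the use of a 302 status are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple

# Hypothetical "stability mapping definitions": DOI -> (RDF URI, HTML URI).
MAPPING: Dict[str, Tuple[str, str]] = {
    "10.9999/specimen.1": (
        "https://data.museum.example.org/rdf/1",
        "https://www.museum.example.org/specimen/1.html",
    ),
}

# Media types this sketch treats as "RDF requests".
RDF_TYPES = ("application/rdf+xml", "text/turtle")


def resolve_doi(doi: str, accept: str) -> Tuple[int, str]:
    """Return (status, redirect target) as a central resolver might answer."""
    if doi not in MAPPING:
        return 404, ""
    rdf_uri, html_uri = MAPPING[doi]
    wants_rdf = any(media in accept for media in RDF_TYPES)
    target = rdf_uri if wants_rdf else html_uri
    # The client is sent on to a different server and from then on
    # sees `target`, not the DOI it originally requested.
    return 302, target
```

The point of the sketch is the return value: whatever the client asked for, the address it ends up at is the provider's current URI, which is only as stable as the provider keeps it.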

The bottom scenario shows the same situation for a linked data setup. A webserver located within the data-providing institution differentiates between RDF requests from machines (red dot on the left side) and HTML requests from humans using a web browser. Using content negotiation, requests to the same URI are directed to RDF data or HTML web pages respectively. The webserver also consults its internal mapping definitions to keep the URIs stable. One advantage of this setup is that the client (machine or human) continues to see a stable URI.
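
One way to picture content negotiation combined with an internal mapping is the sketch below. The paths, file locations, and function names are hypothetical, and the Accept-header handling is deliberately simplified (exact media types and q-values only, no wildcards).

```python
from typing import Dict, Sequence, Tuple

# Hypothetical internal mapping: stable public path -> current internal files.
INTERNAL: Dict[str, Dict[str, str]] = {
    "/object/B100": {
        "text/turtle": "/srv/export/2013/b100.ttl",
        "text/html": "/srv/cms/pages/b100.html",
    },
}


def preferred_type(accept: str,
                   offered: Sequence[str] = ("text/html", "text/turtle")) -> str:
    """Pick the offered media type with the highest q-value; default to HTML."""
    best, best_q = offered[0], -1.0
    for part in accept.split(","):
        fields = [field.strip() for field in part.split(";")]
        media = fields[0].lower()
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media in offered and q > best_q:
            best, best_q = media, q
    return best


def serve(path: str, accept: str) -> Tuple[int, str, str]:
    """Return (status, content type, internal file) for a stable public path."""
    variants = INTERNAL.get(path)
    if variants is None:
        return 404, "", ""
    media = preferred_type(accept)
    # No redirect is issued: the public URI the client requested stays the same,
    # only the internal file behind it differs per media type.
    return 200, media, variants[media]
```

If the institution later moves its files, only the entries in `INTERNAL` change; `/object/B100` remains the address that machines and humans use.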

Technically both scenarios work very similarly. The DOI example has minor advantages with respect to stability (largely limited to scenarios where the domain is lost by accident or because, perhaps after a merger, the domain transfer is neglected). By introducing the additional redirection layer only a single domain name is needed. This is a single point of failure, but the central domain can be reasonably expected to be managed to the highest standards. The DOI has the disadvantage that the URI as seen from the client side changes, because the redirect goes to a different server and is not handled opaquely within a provider.

The main distinction between the two scenarios is therefore between centralized and decentralized stability management.

Advantages and disadvantages of centralization

Biodiversity community DOI system

The graphic shows the scenario where millions of requests from millions of clients have to be forwarded by a central resolver infrastructure to a large number of data providers. The service requirements of a biodiversity DOI service, providing the canonical identifiers to all living things in the semantic web (including humans, their parasites, crops, pets, etc.), may in fact be several orders of magnitude higher than those of CrossRef or DataCite DOI redirection. For data relations involving organisms, including those from medicine, agriculture, etc., these DOIs would have to be resolved with every query or reasoning step.

Some additional comments on the slide above:

  1. The central redirection table can grow very large. Technically this is manageable, but requires resources.
  2. The large number of involved data providers may require substantial human resources.
  3. Updating the redirection table by a provider for, e.g., 30 objects, can only be done through scripting. It requires the provider (e.g. a natural history collection) to learn the API of the central redirection service.
  4. Because major current DOI systems such as CrossRef or DataCite provide identifiers for digitally published objects, and define their metadata expectations accordingly, it is rather doubtful whether they are suitable for physical specimens or abstract taxon concepts. The slide therefore assumes a community infrastructure owned and maintained by the biodiversity community, run, e.g., by GBIF. Who exactly runs the infrastructure is, however, secondary. The primary argument is that load can be high and that management and resources need to be adequate and sustainably financed.
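
To give a feel for the scripting mentioned in comment 3, the following sketch prepares a batch of redirect updates for 30 objects. The JSON layout, the prefix `10.9999`, and the example.org targets are invented for this example; a real central service would define its own API and format, which the provider would have to learn.

```python
import json
from typing import Dict, List


def build_update_batch(prefix: str, targets: Dict[str, str]) -> str:
    """Serialise redirect updates as JSON (hypothetical format, not a real API)."""
    records: List[Dict[str, str]] = [
        {"identifier": f"{prefix}/{local_id}", "target": uri}
        for local_id, uri in sorted(targets.items())
    ]
    return json.dumps({"updates": records}, indent=2)


# 30 made-up objects whose web addresses moved to a new path.
moved = {
    f"specimen.{n}": f"https://collection.example.org/new/{n}"
    for n in range(1, 31)
}
payload = build_update_batch("10.9999", moved)
```

Submitting `payload` would additionally involve authentication and error handling against the central service, which is the part that costs each provider time.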

A central system has some advantages, especially with respect to additional services such as quality control and centralized, reliable global statistics. However, whether these advantages are decisive depends on one's needs. Unfortunately, it may be time-consuming to reach a consensus on this.


Don't wait

However, there is some good news: the substantial concerns raised above about the management resources required to maintain the mapping between DOIs and URIs, both at the central redirection provider and at each data provider, can be reduced by first implementing well-managed, locally stable URIs. Doing so is straightforward, does not require additional technology, and drastically reduces the frequency, or even the likelihood, of changes being needed at an additional central redirection layer.
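
The effect can be illustrated with two small tables, one central and one local. All identifiers, URIs, and paths below are hypothetical, and the sketch only models the bookkeeping, not an actual web server.

```python
from typing import Dict

# Central table (hypothetical): identifier -> locally stable URI.
central: Dict[str, str] = {
    "10.9999/specimen.1": "https://collection.example.org/object/1",
}

# Local table at the provider: stable URI path -> current internal location.
local: Dict[str, str] = {"/object/1": "/cms-v1/specimens/1.html"}


def migrate_local(table: Dict[str, str], old_prefix: str, new_prefix: str) -> int:
    """Repoint internal locations after a local system change; return count changed."""
    changed = 0
    for stable_path, internal in table.items():
        if internal.startswith(old_prefix):
            # Only the value is replaced; the stable public path (the key) is kept.
            table[stable_path] = new_prefix + internal[len(old_prefix):]
            changed += 1
    return changed


central_before = dict(central)
changed = migrate_local(local, "/cms-v1/", "/cms-v2/")
# The provider replaced its internal system, yet the central table is untouched.
central_unchanged = central == central_before
```

Because the central entry points at the stable local URI rather than at the internal location, the provider's migration is absorbed entirely by the local table.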

Thus, whether or not a central DOI system is adopted over time, investing today in good management practices for stable, semantic web compatible identifiers at each institution will not be wasted effort. It may be that the solution is LOD-URIs and DOIs.

DOI and LOD

(See also Best practices for stable URIs)