CETAF Stable Identifier Guide: Difference between revisions

From CETAF ISTC Wiki
bwf>Anton Güntsch
 
<big>'''CETAF Stable Identifier Guide'''</big>
 
== {{abbr|CETAF}} {{abbr|ISTC}} Stable Identifier Initiative ==
 
The Stable Identifiers of the Consortium of European Taxonomic Facilities (CETAF) are globally unique, consistent and reliable identifiers for specimens in natural history collections. On the World Wide Web, they redirect users and software systems to images, web pages and metadata of the physical objects, and they integrate those objects with the Semantic Web.
 
== What do CETAF Stable Identifiers look like? ==
 
[[File:Stable identifier example.png|thumb|Example for the syntax of a CETAF Stable Identifier.]]
The {{abbr|CETAF}} identifier system is based on {{abbr|HTTP}}-{{abbr|URIs}} and Linked Data principles. It is simple and future-proof.
Each collection object and each of its associated information resources (e.g. multimedia, {{abbr|RDF}}, web pages) is identified by a stable HTTP-URI that will never change. The URI syntax for the objects is chosen and maintained by the institution that owns them. This flexibility is one of the main advantages of the CETAF Stable Identifier system: it allows an institution, for example, to include branding and locally scoped identifiers in the URI. There are, however, some [[Best_practices_for_stable_URIs|best practices for stable URIs]]. Examples are:
 
: [http://herbarium.bgbm.org/object/B100277113 http://herbarium.bgbm.org/object/B100277113]
: [http://www.botanicalcollections.be/specimen/BR0000005516339 http://www.botanicalcollections.be/specimen/BR0000005516339]
: [http://data.rbge.org.uk/herb/E00421509 http://data.rbge.org.uk/herb/E00421509]
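
The three examples above share a common shape: an institution-controlled host followed by a path that ends in a local identifier. The following Python sketch splits such a URI into these two parts. The function name <code>split_stable_uri</code> is hypothetical, and the rule that the last path segment is the local identifier is an assumption that holds for these three examples only. Because each institution chooses its own syntax, clients should in general treat stable URIs as opaque.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_stable_uri(uri: str) -> Tuple[str, str]:
    """Return (host, last path segment) of a stable URI.

    Assumption for illustration only: the last path segment is the
    local identifier, as in the three example URIs of this guide.
    """
    parts = urlsplit(uri)
    local_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return parts.netloc, local_id


if __name__ == "__main__":
    for example in (
        "http://herbarium.bgbm.org/object/B100277113",
        "http://www.botanicalcollections.be/specimen/BR0000005516339",
        "http://data.rbge.org.uk/herb/E00421509",
    ):
        print(split_stable_uri(example))
```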
 
== How are CETAF Stable Identifiers resolved? ==
 
[[File:Resolving-cetaf-istc stable identifiers.png|thumb|Resolving {{abbr|URI}}-based collection identifiers using standard HTTP-redirection mechanisms.]]
A CETAF Stable Identifier gives access to information about the corresponding collection object in several ways.
If a human user accesses a collection object by entering its CETAF Stable Identifier into a web browser, the browser is redirected to a human-readable representation of the object (e.g. an HTML web page).
If a software system accesses the collection object via the same identifier, it is redirected to a machine-processable {{abbr|RDF}}-encoded metadata record. The identifier is therefore integrated with the Semantic Web and can also be used in other {{abbr|RDF}} representations to link to the corresponding collection object.
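
The resolution behaviour described above can be sketched in Python as follows. This is a minimal illustration, not the code of any CETAF institution: the function names and the target URLs passed to <code>choose_target</code> are hypothetical, and the server side uses a simple substring check on the <code>Accept</code> header instead of full content negotiation with quality values. The 303 status code is the redirect mentioned elsewhere in this guide.

```python
from typing import Tuple
from urllib.request import Request

RDF_XML = "application/rdf+xml"


def build_request(stable_uri: str, want_rdf: bool) -> Request:
    """Client side: prepare (but do not send) a request for a stable URI."""
    accept = RDF_XML if want_rdf else "text/html"
    return Request(stable_uri, headers={"Accept": accept})


def choose_target(accept_header: str, html_url: str, rdf_url: str) -> Tuple[int, str]:
    """Server side: choose the redirect target from the Accept header.

    Simplification: a substring check, not full q-value parsing.
    """
    target = rdf_url if RDF_XML in accept_header else html_url
    return 303, target


if __name__ == "__main__":
    request = build_request("http://herbarium.bgbm.org/object/B100277113", want_rdf=True)
    # The two target URLs below are placeholders, not real endpoints.
    print(choose_target(request.get_header("Accept"),
                        html_url="https://example.org/page/B100277113",
                        rdf_url="https://example.org/rdf/B100277113"))
```

A browser that sends <code>Accept: text/html</code> would receive the HTML target from the same function, so one identifier serves both audiences.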
 
== What can CETAF Stable Identifiers be used for? ==
 
As described above, CETAF Stable Identifiers can first of all be used to redirect users and systems to images, web pages and metadata of the physical objects they identify.
They can also be used to reference precisely the specimens used in scientific studies, and they serve as a basis for data retrieval, data integration and the reproducibility of data experiments.
Additionally, the stable identifiers enable new applications in the Semantic Web domain. One example is the Biology Pilot. The [https://www.bgbm.org/ Botanic Garden and Botanical Museum Berlin], [https://www.plantentuinmeise.be/en/home/ Meise Botanic Garden] and other collections annotated thousands of specimens with the [https://kiki.huh.harvard.edu/databases/botanist_index.html HUH] and [https://www.wikidata.org/wiki/Wikidata:Main_Page Wikidata] IDs of their collectors. The CETAF Stable Identifiers of the annotated specimens are available on [https://www.gbif.org/ GBIF], and a server crawls the identifiers to organize the RDF information in a Blazegraph triple store. This graph makes it possible to search for specimens by the {{abbr|HUH}} or Wikidata {{abbr|ID}} of their collector, independently of the spelling variants of collector names that individual institutions may use. A query returns all relevant specimens in the combined set, regardless of the institution they come from. If the number of institutions using stable identifiers grows and the amount of machine-readable annotations increases, this technology could be used to create what is essentially a “Google for specimens”.
 
== How can I implement CETAF Stable Identifiers for my collection? ==
 
The CETAF Stable Identifiers can be implemented at three levels, which are described in detail in [http://herbal.rbge.info/md.php?q=documentation the documentation at herbal.rbge.info].
 
{| class="vertical-align-top booktable"
|+ The following conditions must be met to reach each implementation level
! Level 1 … !! → Level 2 !! → Level 3
|-
| style="width:40%" |<!-- L1 -->
{{color|green|<b>✓</b>}} you have assigned each object in your collection a stable URI that will never change and preferably follows the [[Best_practices_for_stable_URIs|best practices for stable URIs]]

{{color|green|<b>✓</b>}} a human-readable representation (web page) exists for each of your collection objects

{{color|green|<b>✓</b>}} a user who enters the stable URI of a collection object into a web browser is redirected to the human-readable representation (web page) of that object (you can test this with the [http://herbal.rbge.info/search.php CETAF URI Tester])
| style="width:30%" |<!-- L2 -->
{{color|green|<b>✓</b>}} you have reached ''Level 1''

{{color|green|<b>✓</b>}} a machine-readable RDF metadata record exists for each of your collection objects

{{color|green|<b>✓</b>}} a machine that requests a collection object via its identifier with an <code>Accept: application/rdf+xml</code> header is redirected to the object's machine-readable RDF metadata record (you can test this with the [http://herbal.rbge.info/search.php CETAF URI Tester])
| style="width:30%" |<!-- L3 -->
{{color|green|<b>✓</b>}} you have reached ''Level 2''

{{color|green|<b>✓</b>}} the machine-readable RDF metadata record of each of your collection objects encodes application-specific data (e.g. is compliant with the [[CSPP|CETAF Specimen Preview Profile (CSPP)]])
|}
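
The cumulative nature of the three levels can be expressed as a short Python sketch. The function and its parameter names are hypothetical; each boolean stands for one of the conditions in the table and would in practice be established by your own checks or with the CETAF URI Tester.

```python
def implementation_level(stable_uri: bool, html_page: bool, html_redirect: bool,
                         rdf_record: bool, rdf_redirect: bool,
                         app_profile: bool) -> int:
    """Return the highest level (0 to 3) whose conditions are all met.

    Each level requires the previous one, so the checks are cumulative.
    """
    level = 0
    if stable_uri and html_page and html_redirect:
        level = 1
    if level == 1 and rdf_record and rdf_redirect:
        level = 2
    if level == 2 and app_profile:
        level = 3
    return level


if __name__ == "__main__":
    # Stable URIs and HTML redirects in place, RDF not yet available.
    print(implementation_level(True, True, True, False, False, False))
```

Note that a collection with RDF records but without a working redirect to the human-readable page still counts as level 0 in this sketch, because Level 2 presupposes Level 1.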
 
=== HTTP vs. HTTPS versions of CETAF {{abbr|URIs}} ===
 
As far as the Semantic Web is concerned, ''<nowiki>http://xyz</nowiki>'' and ''http<b>s</b>://xyz'' are different things because they are different {{abbr|URIs}}. New implementations should simply use HTTPS. If you have issued only HTTP or only HTTPS versions, ''or'' want to switch from HTTP to HTTPS, note the following:
 
{| class="booktable"
|-
! HTTP  !! HTTPS
|-
| You
* have issued ''only HTTP'' versions of CETAF URIs and want to keep it that way
* have nothing to add technically; just keep the usual HTTP 303 redirect to the RDF or HTML resources in place
|You
* have issued ''only HTTPS'' versions of CETAF URIs
* don’t need to resolve HTTP versions if you have never issued any, because none are in circulation to be resolved.
|-
! style="text-align:center;" colspan="2" | Want to change HTTP to HTTPS
|-
| colspan="2" | You
* have issued HTTP versions of CETAF URIs but want to change to HTTPS
* have to keep resolving the HTTP versions with a 303 redirect to the HTTPS versions of the RDF or HTML resources. The RDF should contain an <code>owl:sameAs</code> assertion linking the HTTP and HTTPS versions of the {{abbr|URI}}. This requires only minor configuration by providers and is transparent to users.
* can start asking people to cite the HTTPS rather than the HTTP version for your specimens, although it should not matter much because the two are linked. Citing HTTPS is recommended once you have implemented it, because at some point in the future a client may refuse to trust even a redirect from an HTTP URI (which is somewhat paranoid but may happen).
|}
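
The <code>owl:sameAs</code> link between the HTTP and HTTPS versions of a URI can be illustrated with a small Python sketch that emits one N-Triples statement. The function names are hypothetical, and the output format (N-Triples rather than RDF/XML) is chosen only for brevity. The HTTPS URI in the example is derived mechanically for illustration and is not a claim about what the institution actually serves.

```python
from urllib.parse import urlsplit, urlunsplit

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def https_variant(http_uri: str) -> str:
    """Return the same URI with the scheme changed from http to https."""
    parts = urlsplit(http_uri)
    if parts.scheme != "http":
        raise ValueError("expected an http:// URI, got: " + http_uri)
    return urlunsplit(parts._replace(scheme="https"))


def same_as_ntriple(http_uri: str) -> str:
    """Build one N-Triples line linking the HTTP and HTTPS versions."""
    return "<{}> <{}> <{}> .".format(http_uri, OWL_SAME_AS, https_variant(http_uri))


if __name__ == "__main__":
    print(same_as_ntriple("http://data.rbge.org.uk/herb/E00421509"))
```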
 
== Publishing CETAF IDs to {{abbr|GBIF}} ==
 
If your institution uses CETAF IDs and you want them (and any specimen RDF) to be included in the CETAF Specimen Catalogue, they must be used as {{abbr|GUIDs}} in the specimen data provided to GBIF. As described in [[CETAF Specimen Catalogue]], the GBIF index is used to discover CETAF IDs.
* If Darwin Core is used, the IDs must be mapped to [http://rs.tdwg.org/dwc/terms/occurrenceID occurrenceID].
* For {{abbr|ABCD}}, the concept [https://terms.tdwg.org/wiki/abcd2:UnitGUID UnitGUID] should be used.
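
For the Darwin Core case, the mapping can be sketched as a small Python export step that writes the stable URI into the <code>occurrenceID</code> column of a CSV file. The input structure with the keys <code>stable_uri</code> and <code>catalog_number</code> is hypothetical, as is the catalogue number value. A real export contains many more Darwin Core terms and is usually produced by the institution's publishing tool.

```python
import csv
import io
from typing import Dict, Iterable


def occurrence_csv(specimens: Iterable[Dict[str, str]]) -> str:
    """Write specimens as CSV with the stable URI mapped to occurrenceID."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer,
                            fieldnames=["occurrenceID", "catalogNumber"],
                            lineterminator="\n")
    writer.writeheader()
    for specimen in specimens:
        writer.writerow({
            "occurrenceID": specimen["stable_uri"],       # the CETAF ID
            "catalogNumber": specimen["catalog_number"],  # illustrative value
        })
    return buffer.getvalue()


if __name__ == "__main__":
    print(occurrence_csv([{
        "stable_uri": "http://herbarium.bgbm.org/object/B100277113",
        "catalog_number": "B100277113",
    }]))
```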
 
== How can I discover specimens with CETAF IDs and corresponding Linked Open Data ({{abbr|LOD}})? ==
 
You can discover specimens of institutions of the Stable Identifiers Implementers Group by using the [[CETAF Specimen Catalogue]] maintained at the {{abbr|BGBM}}, which offers a web service for getting a list of valid CETAF IDs. For implementers of level 2, who provide {{abbr|RDF}} representations of their specimens, a cache triple store with a {{abbr|SPARQL}} access point will be available soon.
 
== What data fields or elements are recommended or standardized? ==
 
The [[CETAF Specimen Preview Profile (CSPP)]] is developed as a minimal set of agreed ({{abbr|RDF}}) collection metadata elements implemented consistently across {{abbr|CETAF}} organisations. Its purpose is to provide a stable resource enabling preview functions in specimen portals. The {{abbr|CSPP}} is not meant to be comprehensive, which means that Linked Open (collection) Data implementations of CETAF institutions will usually provide much richer metadata with additional RDF-elements.
 
== Further Questions ==
 
See [[Questions, problem solutions and further discussions (Guide of best practices)]] and, more generally, [[:Category: Discussion]].
 
== Useful Links ==
 
* [[Best practices for stable URIs]]
* [[CSPP|CETAF Specimen Preview Profile (CSPP)]]—A set of standard data components for data exchange
* [https://git.bgbm.org/cetaf/stableidentifiernegotiation Source code and example documents (git.bgbm.org)]
* [http://herbal.rbge.info/ CETAF URI Tester (herbal.rbge.info)]
* [[Standards_compliance_dashboard|The Standards Compliance Dashboard]] of collaborating institutions
* [[:Category: Guide for CETAF Stable Identifiers]]—Collection of pages related to this guide or handbook
 
== Further reading ==
 
<div class="hanging-indent compact-references">
''Kuzmova, I.'' ‘Pro-IBiosphere - Stable Identifiers for Specimens – A CETAF ISTC Initiative Supported by pro-IBiosphere’. ''EUBON''. 1 July 2013. URL: https://www.pro-ibiosphere.eu/news/4296_stable_identifiers_for_specimens_-_a_cetaf_istc_initiative_supported_by_pro-ibiosphere/.
 
''Güntsch, A.'' et al., ‘Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects’, ''Database (Oxford)'', vol. 2017; Jan. 2017. URL: https://doi.org/10.1093/database/bax003.
 
''Groom, Q.'' et al., ‘Stable Identifiers for Collection Specimens’, ''Nature (Correspondence)'', 546.7656 (2017), 33; URL: https://doi.org/10.1038/546033d.

''Hardisty, A.'' ‘Natural Science Identifiers & CETAF Stable Identifiers’. DiSSCoTech (blog). 28 May 2020; URL: https://dissco.tech/2020/05/28/natural-science-identifiers-cetaf-stable-identifiers/.

''Hyland, B.'' et al. ‘Best Practices for Publishing Linked Data.’ World Wide Web Consortium, 9 Jan. 2014; URL: http://www.w3.org/TR/ld-bp/.

''Wouter, A.'' ‘Identifiers for Our Institutes – GRID and ROR’, DiSSCoTech (blog), 11 April 2020; URL: https://dissco.tech/2020/04/11/identifiers-for-our-institutes-grid-and-ror/.

''McMurry, J. A.'' et al., ‘Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data’, ''PLOS Biology'' 15(6):e2001414 June 2017; URL: https://doi.org/10.1371/journal.pbio.2001414.
 
<div style="margin-top:2ex;" ></div>
 
Poster: [http://www.cetaf.org/sites/default/files/cetaf-istc_stable_identifiers_poster50x70.pdf CETAF stable identifiers for specimens (1.4MB, www.cetaf.org)]
</div>


== Meetings ==
* [http://stories.rbge.org.uk/archives/3846 Edinburgh Meeting (June 2013)]
* [https://web.archive.org/web/20200318235750/https://wiki.pro-ibiosphere.eu/wiki/Workshop_Berlin_1:_How_to_improve_technical_cooperation_and_interoperability_at_the_e-infrastructure_level_Minutes Joint ISTC/pro-iBiosphere workshop Berlin, October 2013 (archived version)]
* [[Geneva_meeting|Geneva Meeting (October 2015)]]
* [[ISTC_Meeting_Spring_2016_Bratislava#Minutes|Joint CETAF-ISTC / CETAF-DWG meeting (May 2016)]]
* [[ISTC_Meeting_Spring_2017_Stuttgart|Joint CETAF-ISTC / CETAF-DWG meeting (March 2017)]]
* [[IDs_and_LOD_Discussion|(Virtual) LOD Hackathon (October 2017)]]
* [[ISTC_DWG_Meeting_Spring_2018_Copenhagen|Joint CETAF-ISTC / CETAF-DWG meeting (February 2018)]]
 
* [[ISTC_QoS_Workshop_Copenhagen_2018|ISTC QoS Workshop Copenhagen (June 2018)]]
* [[ISTC_DWG_Meeting_Spring_2019_Vienna|Joint CETAF-ISTC / CETAF-DWG meeting (February 2019)]]

== Background ==
* [http://wiki.pro-ibiosphere.eu/wiki/Best_practices_for_stable_URIs Best practices for stable URIs]
* [http://www.pro-ibiosphere.eu/news/4296_stable_identifiers_for_specimens_-_a_cetaf_istc_initiative_supported_by_pro-ibiosphere/ Stable identifiers for specimens – A CETAF ISTC initiative supported by pro-iBiosphere]
* [http://www.cetaf.org/sites/default/files/cetaf-istc_stable_identifiers_poster50x70.pdf Poster: CETAF stable identifiers for specimens]
* [https://academic.oup.com/database/article/3053443/Actionable, Paper in Database (Oxford) 2017 (1): Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects]
* [http://www.nature.com/nature/journal/v546/n7656/full/546033d.html?foxtrotcallback=true Paper in Nature 546, 33 (01 June 2017): Data management: Stable identifiers for collection specimens]


== Material / Results ==
* [https://sourceforge.net/projects/stablecollectionidentifiers/ Source code and example documents]
* CETAF Specimen Preview Profile: [[CSPP]]
* [http://herbal.rbge.info/ URI Tester]

[[Category: Guide for CETAF Stable Identifiers]]

Latest revision as of 10:43, 17 September 2025
