Best practices for stable URIs

From CETAF ISTC Wiki
Revision as of 12:49, 26 May 2013 by bwf>Gregor Hagedorn

Introduction

1. It is important to keep mission-critical URIs (or URLs, the web addresses) stable. Make a deliberate choice about which pages and which classes of objects you want to manage as stable. Do not aim to keep all your URIs/URLs stable forever: this often becomes unmanageable.

2. The primary purpose of this discussion is to support others in finding good URI patterns. The secondary purpose is to assess whether some institutions could voluntarily share the same pattern, to ease recognition and set a recognizable example for others to follow.

3. Linked Open Data / the Semantic Web uses http-URIs to identify the things themselves and to talk about them. The semantic web works with any kind of http-URI, including those that do not follow these best practices! The best practices are relevant only to make it reasonably likely that you will be able to keep your URIs stable.

4. In the face of changing technology, at some point you will have to use the webserver's rewrite module to keep URIs stable. The simpler the URI pattern is, the easier this becomes. Thus the first recommendation is: use rewriting from the start. Define simple URI patterns (= no ports, no extensions like .php or .aspx, no parameters with ? or &) that are rewritten to your current technology.

5. If several URIs exist, declare one as the "preferred" (canonical) URI. E.g. if both

refer to the same object, one should redirect to the other (e.g. HTTP status 301).

6. Highly recommended references:
  • Sauermann & Cyganiak 2008, Cool URIs for the Semantic Web.
  • Hyam, R.D., Drinkwater, R.E. & Harris, D.J. 2012, Stable citations for herbarium specimens on the internet: an illustration from a taxonomic revision of Duboscia (Malvaceae). Phytotaxa 73: 17–30.
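
To illustrate points 4 and 5 above, here is a minimal Python sketch of what a rewrite layer does. In practice this logic would normally live in the webserver's rewrite module rather than in application code. The patterns /specimen/... and /specimens/..., the backend address /app/show.php and the function name resolve are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical stable pattern: no port, no extension, no query string.
STABLE_PATTERN = re.compile(r"^/specimen/(?P<ident>[A-Za-z0-9-]+)$")

# Hypothetical second URI form that reaches the same objects (point 5).
ALIAS_PATTERN = re.compile(r"^/specimens/(?P<ident>[A-Za-z0-9-]+)/?$")

# Hypothetical current backend. Only this template has to change
# when the underlying technology changes.
BACKEND_TEMPLATE = "/app/show.php?class=specimen&id={ident}"


def resolve(path):
    """Return (status, target) for a requested path.

    200: serve internally from target (a rewrite, invisible to the client).
    301: redirect the client to the canonical URI given in target.
    404: the path matches no known pattern.
    """
    match = STABLE_PATTERN.match(path)
    if match:
        return 200, BACKEND_TEMPLATE.format(ident=match.group("ident"))
    match = ALIAS_PATTERN.match(path)
    if match:
        return 301, "/specimen/" + match.group("ident")
    return 404, None
```

The published URI never exposes show.php or its parameters, so replacing the backend only means editing BACKEND_TEMPLATE. The alias is never served directly; it always answers with a 301 to the preferred form.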

Parts of stable URI patterns

A recommended URI pattern is the following:

subdomain.yourdomain.org/path/variable-identifier#hash
  • subdomain: If the stable URIs use a general purpose domain with many different services, it may be desirable to add a dedicated subdomain for specific services. Using subdomains offers the flexibility that several institutions in the future share operations for a specific subdomain without affecting the stability of these URIs. If the main domain is already dedicated to a specific service, using a subdomain is probably irrelevant.
  • yourdomain.org: The registered domain, like rbge.org.uk, zoobank.org, ipni.org, naturalis.nl, or nhm.ac.uk.
  • path: The part that remains constant for different identifiers of the same class (e.g., taxa, specimens). Similar to a subdomain, this increases the ease with which identifiers can be kept stable over decades (using web server rewrite modules).
    It is possible to ignore this and use URIs like http://zoobank.org/7D39CAAA-4B4B-4588-A372-D4097162B1CD. However, this makes future rewrite rules more sensitive, because there is no constant path for them to match on.
  • variable-identifier: The part that changes for each object. It will usually be a number or code you also use otherwise, like a simple locally unique number or code (123, a123, M-2361318, ...), or it may be a UUID like 1C4EDC178AD79DD7F1A5AB856E8C5BCA.
  • #hash: This is relevant only when using the hash method to distinguish between the abstract concept or concrete object (e.g. Formica rufa or a specific physical specimen, which cannot be transmitted through the internet, only described) and the web pages about it (html, pdf, rdf-data). (The alternative method is to use 303 redirects.)
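
As a rough illustration of the parts listed above, the following Python sketch splits a URI into subdomain, domain, path, identifier and hash. The function name split_stable_uri is hypothetical. The registered domain has to be passed in explicitly, because it cannot be derived reliably from the host name alone (compare rbge.org.uk with zoobank.org).

```python
from urllib.parse import urlsplit


def split_stable_uri(uri, registered_domain):
    """Split a URI into the parts of the recommended pattern.

    registered_domain is supplied by the caller, e.g. "example.org".
    The last path segment is assumed to be the variable identifier.
    """
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if host != registered_domain and not host.endswith("." + registered_domain):
        raise ValueError("host does not belong to " + registered_domain)
    # Whatever precedes the registered domain is the subdomain (may be empty).
    subdomain = host[: -len(registered_domain)].rstrip(".")
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError("URI has no identifier")
    return {
        "subdomain": subdomain,
        "domain": registered_domain,
        "path": "/".join(segments[:-1]),  # constant part, may be empty
        "identifier": segments[-1],       # changes for each object
        "hash": parts.fragment,           # empty if the URI has no '#...'
    }
```

For example, split_stable_uri("http://specimen.example.org/res/123#object", "example.org") yields the subdomain "specimen", the path "res", the identifier "123" and the hash "object". For the ZooBank-style URI quoted above, both subdomain and path come back empty.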

Social consideration for labeling parts of stable URIs

With the hash method and using both subdomain and path for stability, three strings need to be chosen: one each for the subdomain, the path, and the hash.

1. Technically irrelevant but confusing to humans are repetitions like http://objects.example.org/objects/123#object, or concatenations of closely overlapping terms like http://objects.example.org/concepts/123#topic.

2. Terms may come from these categories:

  • Generic terms like: "resource", "portal", "content", "object", "concept", "thing", "topic", "id", "identifier". NOTE: ADDITIONAL PROPOSAL WELCOME!
  • Terms for classes of objects or concepts like: "taxon"/"taxa", "taxonconcept", "name", "term", "sample", "specimen", "treatment", "description", "morphology", "collection", "person"/"people", "organisation"/"institution", "locality".
  • An indicator of stability like "stable", "permanent", "stable-id", "purl" (= permanent URL). NOTE: ADDITIONAL PROPOSAL WELCOME!
  • Terms with no or reduced semantics like: "dx", "zb" (abbreviation of ZooBank), "res", "it", "o", "t", "s", "p".

3. In the semantic web, the word "data" should be avoided where referring to the concept or thing itself (as opposed to the data about it). A URI like data.organisation.org/specimen/123 for a specimen itself (but redirected to another URI when the data are being returned) is easily misinterpreted as referring to the data rather than the object.

4. In principle a similar concern may be raised over the use of "id" or "identifier" (the semantic web would speak about the thing by means of an identifier, not about the identifier), but these concerns are probably negligible.

5. Terms from the categories above can probably be used interchangeably for subdomain and path, i.e. specimen.example.org/object/123 and object.example.org/specimen/123 work similarly well.

If you foresee that operations for different object classes may in the future be consolidated within different consortia, it may be desirable to put the object class (like specimen) in the subdomain.

6. For the hash part, which indicates that the URI with the hash is the real thing and the one without it is the data, the choices are more limited. Examples:

specimen.example.org/res/123#specimen
specimen.example.org/res/123#object
specimen.example.org/res/123#obj
specimen.example.org/res/123#id
specimen.example.org/res/123#itself
PLEASE ADD YOUR EXAMPLES!
(See below for a brief discussion of the hash versus 303 redirection method)

Examples:

http://objects.example.org/res/123#specimen
http://specimen.example.org/stable-id/123#physical
http://id.example.org/specimen/123#obj
http://res.example.org/specimen/123#id
http://permanent.example.org/specimen/123#id
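
Some of the labeling considerations above (repetitions, and the word "data" used for the thing itself) can be checked mechanically. The Python sketch below is one possible way to do this; the function name label_warnings is hypothetical. It assumes that the first host label is the subdomain whenever the host has more than two labels, which is wrong for domains such as rbge.org.uk, and it treats a trailing "s" as a plural in a very crude way. It does not detect closely overlapping terms such as "concepts" and "topic".

```python
from urllib.parse import urlsplit


def label_warnings(uri):
    """Return human-readability warnings for the labels used in a URI."""
    parts = urlsplit(uri)
    host_labels = (parts.hostname or "").split(".")
    # Assumption: a host with more than two labels starts with the subdomain.
    subdomain = host_labels[0] if len(host_labels) > 2 else ""
    segments = [s for s in parts.path.split("/") if s]
    path_labels = segments[:-1]  # everything before the identifier
    labels = [x.lower() for x in [subdomain, *path_labels, parts.fragment] if x]

    warnings = []
    seen = set()
    for label in labels:
        # Crude plural handling: "objects" and "object" count as the same label.
        key = label[:-1] if len(label) > 3 and label.endswith("s") else label
        if key in seen:
            warnings.append("repeated label: " + label)
        seen.add(key)
        if key == "data":
            warnings.append("'data' may be read as the data, not the thing")
    return warnings
```

Applied to http://objects.example.org/objects/123#object this reports two repetitions, while http://id.example.org/specimen/123#obj passes without warnings.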

SPACE to add your preferred pattern for specimen or scientific names

Please add in your preferred pattern based on the notes above as well as new ideas. Can we achieve a set of patterns (not a single one) that others could mimic? I think this might help to spread the idea...