Import issues with CETAF identifiers: Difference between revisions

From CETAF ISTC Wiki
bwf>Andreas Plank
mNo edit summary
Line 1: Line 1:
----
'''Note:''' Unresolved or pending issues are listed at the top; resolved issues are moved to the end.
----
__TOC__
__TOC__


== data.nhm.ac.uk ==
== data.nhm.ac.uk ({{abbr|NHM}}) ==


({{Tobedone}}) A request sent with the header “Content-Type: application/rdf+xml” returns 404 (not found) instead of RDF (see https://github.com/NaturalHistoryMuseum/ckanext-nhm/issues/458) --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:06, 18 February 2020 (CET)
({{Tobedone}}) A request sent with the header “Content-Type: application/rdf+xml” returns 404 (not found) instead of RDF (see https://github.com/NaturalHistoryMuseum/ckanext-nhm/issues/458) --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:06, 18 February 2020 (CET)


== herbarium.bgbm.org ==
== specimens.kew.org ({{abbr|RBGK}}) ==
 
({{Tobedone}}) The server returns HTML instead of the requested RDF, e.g.:
<syntaxhighlight lang="bash">
wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="specimens.kew.org⁄herbarium⁄K001116483.rdf" "http://specimens.kew.org/herbarium/K001116483"
</syntaxhighlight>
--[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 14:32, 18 February 2020 (CET)
 
== herbarium.bgbm.org ({{abbr|BGBM}}) ==


({{done}}) Some RDF files contain invalid URI entries, i.e. there is a tab or space character in the URI in <code>owl:sameAs</code>, and this breaks the whole data import. The error log of the triple store loader (tdbloader2) shows something like:
({{done}}) Some RDF files contain invalid URI entries, i.e. there is a tab or space character in the URI in <code>owl:sameAs</code>, and this breaks the whole data import. The error log of the triple store loader (tdbloader2) shows something like:
Line 20: Line 32:
* http://herbarium.bgbm.org/data/rdf/B100000503 --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 16:21, 30 January 2020 (CET) {{done}} --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 11:45, 3 February 2020 (CET)
* http://herbarium.bgbm.org/data/rdf/B100000503 --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 16:21, 30 January 2020 (CET) {{done}} --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 11:45, 3 February 2020 (CET)
* http://herbarium.bgbm.org/data/rdf/B100000627 --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 16:21, 30 January 2020 (CET) {{done}} --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 11:45, 3 February 2020 (CET)
* http://herbarium.bgbm.org/data/rdf/B100000627 --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 16:21, 30 January 2020 (CET) {{done}} --[[User:Andreas Plank|Andreas Plank]] ([[User talk:Andreas Plank|talk]]) 11:45, 3 February 2020 (CET)
== specimens.kew.org ==
({{Tobedone}}) The server returns HTML instead of the requested RDF, e.g.:
<syntaxhighlight lang="bash">
wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="specimens.kew.org⁄herbarium⁄K001116483.rdf" "http://specimens.kew.org/herbarium/K001116483"
</syntaxhighlight>

Revision as of 15:33, 18 February 2020


Note: Unresolved or pending issues are listed at the top; resolved issues are moved to the end.


data.nhm.ac.uk (NHM)

(Pending) A request sent with the header “Content-Type: application/rdf+xml” returns 404 (not found) instead of RDF (see https://github.com/NaturalHistoryMuseum/ckanext-nhm/issues/458) --Andreas Plank (talk) 14:06, 18 February 2020 (CET)
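
To make such a report reproducible, the status code and the returned Content-Type can be recorded together. The following is only a sketch: the function names classify_rdf_response and probe_rdf are hypothetical, and it sends only an Accept header (an assumption about how RDF is usually negotiated); add --header 'Content-Type: application/rdf+xml' to reproduce the request described above exactly.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper names): report how a server answers an RDF request.

# Map an HTTP status and Content-Type to a short verdict. No network needed.
classify_rdf_response() {
  status="$1"
  content_type="$2"
  if [ "$status" = "404" ]; then
    echo "not-found"
  elif [ "$status" != "200" ]; then
    echo "unexpected-status"
  else
    case "$content_type" in
      application/rdf+xml*) echo "rdf" ;;      # also matches "...; charset=utf-8"
      *)                    echo "not-rdf" ;;  # e.g. text/html
    esac
  fi
}

# Request a URL asking for RDF/XML and print status, Content-Type and verdict.
# Needs curl and network access; it is only defined here, not called.
probe_rdf() {
  url="$1"
  # --write-out prints the final status and Content-Type after redirects (--location)
  result=$(curl --silent --location --output /dev/null \
                --header 'Accept: application/rdf+xml' \
                --write-out '%{http_code}|%{content_type}' "$url")
  status=${result%%|*}
  content_type=${result#*|}
  echo "$url: $status ${content_type:-(none)} -> $(classify_rdf_response "$status" "$content_type")"
}
```

Calling probe_rdf with the identifier URL of a specimen prints one line per request, which can be pasted into an issue report.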

specimens.kew.org (RBGK)

(Pending) The server returns HTML instead of the requested RDF, e.g.:

wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="specimens.kew.org⁄herbarium⁄K001116483.rdf" "http://specimens.kew.org/herbarium/K001116483"

--Andreas Plank (talk) 14:32, 18 February 2020 (CET)
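
Whether a downloaded file really holds RDF/XML can be checked before it is passed on to the import. The sketch below is a rough heuristic, not a validator: the function name sniff_payload is hypothetical, and it assumes that the RDF root element uses the conventional rdf: prefix and appears within the first 2 KB of the file.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper name): guess whether a download is RDF/XML or HTML.
sniff_payload() {
  file="$1"
  # Only the beginning of the file is inspected; root elements appear early.
  if head -c 2048 "$file" | grep -q '<rdf:RDF'; then
    echo "rdf"
  elif head -c 2048 "$file" | grep -qiE '<!DOCTYPE html|<html'; then
    echo "html"
  else
    echo "unknown"
  fi
}
```

Running sniff_payload on the file written by the wget call above would be expected to print html as long as the issue persists.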

herbarium.bgbm.org (BGBM)

(Done) Some RDF files contain invalid URI entries, i.e. there is a tab or space character in the URI in owl:sameAs, and this breaks the whole data import. The error log of the triple store loader (tdbloader2) shows something like:

Bad URI: < http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2> Code: 0/ILLEGAL_CHARACTER in SCHEME: The character violates the grammar rules for URIs/IRIs.
ERROR Bad character in IRI (space): <[space]...>

… see for instance in line 63:

<rdf:Description rdf:about="http://www.wikidata.org/entity/Q6382619">
                    <owl:sameAs rdf:resource="	http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2" />
                <owl:sameAs rdf:resource="http://viaf.org/viaf/233473288" />
          </rdf:Description>
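
Such entries can be found before the loader is run. The sketch below uses plain grep and sed on the RDF/XML text; the function names find_bad_uris and trim_uri_whitespace are hypothetical, and it assumes each attribute value sits on a single line and is written with double quotes, as in the excerpt above. It is a workaround for the import side only; the lasting fix is to correct the data at the source.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper names): locate and repair whitespace in RDF/XML URIs.

# Print "line-number:line" for every rdf:about / rdf:resource value
# that contains a space, tab or other whitespace character.
find_bad_uris() {
  grep -nE 'rdf:(about|resource)="[^"]*[[:space:]][^"]*"' "$1"
}

# Write a copy to stdout with leading and trailing whitespace removed
# from rdf:about / rdf:resource values. Whitespace inside a URI is left
# alone, so find_bad_uris should be run again on the result.
trim_uri_whitespace() {
  sed -E 's/(rdf:(about|resource)=")[[:space:]]*([^"]*[^"[:space:]])[[:space:]]*"/\1\3"/g' "$1"
}
```

For example, find_bad_uris B100000503.rdf lists the offending lines of a downloaded file (the file name is a placeholder), and trim_uri_whitespace B100000503.rdf > B100000503.fixed.rdf writes a cleaned copy for the import.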

The following objects were detected: