Import issues with CETAF identifiers
Revision as of 15:39, 18 February 2020
Note: Unresolved or pending issues are listed at the top; resolved issues are moved to the end.
data.nhm.ac.uk (NHM)
(Pending) Requesting RDF with the header “Content-Type: application/rdf+xml” returns 404 (Not Found) instead of RDF (see https://github.com/NaturalHistoryMuseum/ckanext-nhm/issues/458). --Andreas Plank (talk) 14:06, 18 February 2020 (CET)
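To tell a 404 apart from a successful but wrong response (as in the Kew and SMNS cases), it can help to look at the status code and the Content-Type the server actually returns. Below is a minimal POSIX sh sketch, assuming curl is used to dump the response headers; `summarize_headers` is a hypothetical helper name. The example request sends only `Accept`, the header HTTP content negotiation uses to ask for a representation (`Content-Type` describes a message body, which a plain GET request does not carry). Whether data.nhm.ac.uk answers differently when only `Accept` is sent is not established on this page.

```shell
# Summarise an HTTP header dump as "<status> <content-type>".
# With curl -L there is one header block per response (redirects included);
# the last status line wins, and Content-Type is reset at each new block.
summarize_headers() {
  tr -d '\r' | awk '
    /^HTTP\// { status = $2; ctype = "" }
    tolower($0) ~ /^content-type:/ {
      sub(/^[^:]*:[ \t]*/, "")   # keep only the header value
      ctype = $0
    }
    END { print status, ctype }
  '
}

# Hypothetical usage (needs network access; $url is a placeholder for a
# CETAF specimen URI):
# curl -s -L -D - -o /dev/null -H 'Accept: application/rdf+xml' "$url" | summarize_headers
```

A line such as `404 text/html` or `200 text/html; charset=UTF-8` would then point to the NHM-style or the Kew/SMNS-style problem respectively.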
specimens.kew.org (RBGK)
(Pending) Requesting RDF returns HTML instead, e.g. under Linux:
wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="specimens.kew.org⁄herbarium⁄K001116483.rdf" "http://specimens.kew.org/herbarium/K001116483"
file specimens.kew.org⁄herbarium⁄K001116483.rdf 
# specimens.kew.org⁄herbarium⁄K001116483.rdf: HTML document, ASCII text, with very long lines, with CRLF, LF line terminators
--Andreas Plank (talk) 14:32, 18 February 2020 (CET)
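The `file` output above is one way to notice the problem by hand. A stricter check could reject such downloads before import. The sketch below is a heuristic, assuming that a usable RDF/XML response declares the RDF namespace `http://www.w3.org/1999/02/22-rdf-syntax-ns#` and contains no HTML document markup; `looks_like_rdf_xml` is a hypothetical helper name, and the check does not validate the RDF itself.

```shell
# Heuristic: succeed only if the file looks like RDF/XML, not HTML.
looks_like_rdf_xml() {
  # Missing or empty download: certainly not RDF.
  [ -s "$1" ] || return 1
  # A full HTML page (doctype or <html> element) is rejected outright.
  if grep -qiE '<!DOCTYPE[[:space:]]+html|<html[[:space:]>]' "$1"; then
    return 1
  fi
  # RDF/XML must declare the RDF namespace; plain HTML fragments do not.
  grep -qF 'http://www.w3.org/1999/02/22-rdf-syntax-ns#' "$1"
}
```

For example, `looks_like_rdf_xml "specimens.kew.org⁄herbarium⁄K001116483.rdf" || echo "not RDF/XML"` would flag the download above, and the same test also catches HTML fragments that lack a `<html>` element.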
col.smns-bw.org (SMNS)
(Pending) Requesting RDF returns an HTML fragment instead, e.g. under Linux:
wget --header='Accept: application/rdf+xml'  --header='Content-Type: application/rdf+xml' --output-document="col.smns-bw.org⁄object⁄S10000227722006.rdf" "http://col.smns-bw.org/object/S10000227722006"
file col.smns-bw.org⁄object⁄S10000227722006.rdf
# col.smns-bw.org⁄object⁄S10000227722006.rdf: HTML document, ISO-8859 text, with very long lines, with CRLF line terminators
--Andreas Plank (talk) 14:38, 18 February 2020 (CET)
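The Kew and SMNS checks follow the same pattern: fetch the URI with wget into a file named after it, then inspect that file with `file`. A minimal sketch wrapping this pattern, assuming the naming convention used on this page (scheme dropped, every "/" replaced by "⁄", U+2044 FRACTION SLASH, then ".rdf" appended). Both function names are hypothetical, and unlike the commands above the wrapper sends only the `Accept` header; adding `--header='Content-Type: application/rdf+xml'` would reproduce them exactly.

```shell
# Build the local file name from a CETAF specimen URI, e.g.
# http://col.smns-bw.org/object/S10000227722006
#   -> col.smns-bw.org⁄object⁄S10000227722006.rdf
cetaf_filename() {
  # Drop the scheme ("http://" or "https://").
  _path=${1#*://}
  # Replace every "/" by "⁄" (U+2044) so host and path form one file name.
  printf '%s.rdf\n' "$(printf '%s' "$_path" | sed 's#/#⁄#g')"
}

# Fetch a URI asking for RDF/XML, then report what actually arrived.
# Needs network access plus wget and file(1).
fetch_cetaf_rdf() {
  wget --quiet --header='Accept: application/rdf+xml' \
       --output-document="$(cetaf_filename "$1")" "$1" &&
    file "$(cetaf_filename "$1")"
}
```

Running `fetch_cetaf_rdf "http://col.smns-bw.org/object/S10000227722006"` would be expected to print a `file` line like the one above as long as the issue persists.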
herbarium.bgbm.org (BGBM)
(Done) Some RDF files contain invalid URIs in owl:sameAs: the URI contains a tab or space character (in the example below, right at its start), and this breaks the whole data import. The error log of the triple store loader (tdbloader2) shows something like:
Bad URI: < http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2> Code: 0/ILLEGAL_CHARACTER in SCHEME: The character violates the grammar rules for URIs/IRIs. ERROR Bad character in IRI (space): <[space]...>
… see for instance line 63:
<rdf:Description rdf:about="http://www.wikidata.org/entity/Q6382619">
  <owl:sameAs rdf:resource="	http://purl.oclc.org/net/edu.harvard.huh/guid/uuid/a86596ea-6f4d-4b97-bf6f-8d492c0fc8b2" />
  <owl:sameAs rdf:resource="http://viaf.org/viaf/233473288" />
</rdf:Description>
The following objects were detected:
- http://herbarium.bgbm.org/data/rdf/B100000580 --Andreas Plank (talk) 16:21, 30 January 2020 (CET) Done --Andreas Plank (talk) 11:45, 3 February 2020 (CET) 
- http://herbarium.bgbm.org/data/rdf/B100000503 --Andreas Plank (talk) 16:21, 30 January 2020 (CET) Done --Andreas Plank (talk) 11:45, 3 February 2020 (CET) 
- http://herbarium.bgbm.org/data/rdf/B100000627 --Andreas Plank (talk) 16:21, 30 January 2020 (CET) Done --Andreas Plank (talk) 11:45, 3 February 2020 (CET) 
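These objects were fixed at the source. To catch the same problem in other harvested files before running tdbloader2, a sketch like the one below could list `rdf:resource` or `rdf:about` values with leading or trailing whitespace and write a trimmed copy. It assumes the whitespace is accidental padding (as in the example above) and that each attribute value sits on a single line; the function names are hypothetical, and whitespace inside a URI is left untouched.

```shell
# Print line numbers (and file names, if several files are given) of
# rdf:resource / rdf:about values that start or end with a space or tab.
find_padded_uris() {
  grep -nE 'rdf:(resource|about)="([[:space:]][^"]*|[^"]*[[:space:]])"' "$@"
}

# Copy $1 to $2, trimming leading/trailing whitespace inside those values.
# The value group must start and end with a non-space character, so the
# padding always falls outside it and is dropped.
trim_padded_uris() {
  sed -E 's/(rdf:(resource|about)=")[[:space:]]*([^"[:space:]][^"]*[^"[:space:]]|[^"[:space:]])[[:space:]]*"/\1\3"/g' "$1" > "$2"
}
```

For example, `find_padded_uris *.rdf` would list affected lines across a download directory, and `trim_padded_uris B100000580.rdf B100000580.fixed.rdf` would leave the original file unchanged.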


