2025 Spring Meeting

From CETAF ISTC Wiki

Latest revision as of 09:57, 30 April 2025

CETAF Information Science & Technology Commission - Spring 2025 Meeting

Bonn, April 7-8 2025

Venue

Leibniz Institute for the Analysis of Biodiversity Change
Museum Koenig
Raiffeisenhaus
Adenauerallee 127
53113 Bonn, Germany (First floor, on the right. Please follow the signs!)

Participants

  • Anton Güntsch (FUB-BGBM, DE)
  • Dominik Röpert (FUB-BGBM, DE)
  • David Fichtmüller (FUB-BGBM, DE)
  • Quentin Groom (Meise Botanic Garden, BE)
  • Mathias Dillen (Meise Botanic Garden, BE)
  • Maarten Trekels (Meise Botanic Garden, BE)
  • Jiri Frank (NM, CZ)
  • Adam Cironis (NM, CZ)
  • Wouter Addink (Naturalis, NL)
  • Dagmar Triebel (SNSB, DE)
  • Stefan Seifert (SNSB, DE)
  • Sabine von Mering (MfN, DE)
  • Caitlin Thorn (MfN, DE)
  • Jonas Grieb (SGN, DE)
  • Claus Weiland (SGN, DE)
  • Anke Penzlin (SGN, DE)
  • André De Mûelenaere (Africamuseum, BE)
  • Franck Theeten (Africamuseum, BE)
  • Björn Quast (LIB, Bonn, DE)
  • Birgit Rach (LIB, Bonn, DE)
  • Cristina Garilao (LIB, Hamburg, DE)
  • Laura Tilley (CETAF, BE)
  • Ana Casino (CETAF, BE)
  • Glorioso Alessio (CETAF, BE)
  • Nicky Nicolson (Kew, UK)
  • Arianna Salili-James (NHM, UK)
  • Theary Ung (MNHN, Paris, FR)
  • Laurent Gohy (ULiège, BE) - first day
  • Alex (CETAF)
  • Robin Drinkwater (RBGE, Edinburgh)

Agenda

Day 1 - April 7
12:00 - 13:00 Arrival and lunch
13:00 - 13:30 Welcome, introduction of participants, agenda, logistics
13:30 - 14:00 The new ISTC Wiki and future ISTC documentation
14:00 - 15:15 CMS Subgroup
- Introduction of the new subgroup
- Presentations of CMS-related activities in CETAF collections
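* Franck Theeten: https://istc.cetaf.org/images/1/15/01_theeten_presentation_isct_2025.pdf
* Maarten Trekels: https://docs.google.com/presentation/d/1f_mICa50WKusn94CXW0IhELABTsx4IkV/edit?usp=sharing&ouid=114384447688591455468&rtpof=true&sd=true/
* Stefan Seifert: https://istc.cetaf.org/File:03_Seifert-ISTC-2025-04-07.pdf
* Cancelled: Jonas Grieb
* Dominik Röpert: https://istc.cetaf.org/images/8/89/2025-04-07_JACQ_ISTC_Bonn.pdf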
- Discussion of possible activities of the CMS subgroup and structuring of the collaboration
15:15 - 15:45 Coffee break
15:45 - 17:00 AI, Robotics and Digital Collections Subgroup
- Introduction of the new subgroup
- Examples of currently ongoing initiatives
- Discussion of possible activities of the AI subgroup and structuring of the collaboration
17:00 - 17:30 Cancelled: Establishment of a new Center for Knowledge Literacy and Biodiversity Informatics at the LIB (Peter Grobe)
17:30 End of day one
19:00 - Dinner at Tuscolo, Gerhard-von-Are-Straße 8, 53111 Bonn (self-paid)
Day 2 - April 8
9:00 - 10:00 Type specimen initiative
10:00 - 11:00 Guided tour
11:00 - 11:20 Coffee break
11:20 - 12:20 Misc presentations (projects, initiatives, ideas, ...):
- Indexing of research expeditions and linking to semantic entities (Sabine von Mering)
- Cancelled: Enhancing geological specimen data (Laura Tilley)
- Update on MAS development in DiSSCo (Wouter Addink)
- Exsiccata series in botanical and mycological collections: Storytelling with IndExs, demo (Dagmar Triebel)
12:20 - 12:45 Funding opportunities
12:45 - 13:00 AOB
13:00 End of meeting (and packed lunch)

Minutes

Day 1

The new ISTC Wiki and future ISTC documentation

CMS Subgroup

  • Presentations are linked in the agenda
  • Question to Maarten: Can the tender for their new system be published? Answer: Tender contains some confidential data, but once this has been cleared, it can be shared.

AI, Robotics and Digital Collections Subgroup

  • Presentation is linked in the agenda
  • Presentation by Maarten:
    • DiSSCo Flanders: second phase with a stronger emphasis on specimen enrichment
    • Some work done at Ghent University
    • Common steps: straightening, segmenting, multi-plant detection, label OCR, leaf extraction, identification of specimens that have no (or only a rough) determination
    • Future ideas: morphological feature extraction
  • Laurent (digitization manager at the University of Liège): Automating Herbarium Sheet Data Extraction with AI
    • specimen QR code photographed with a camera
    • label information read directly with Google Lens
    • label text split into fields in a next step
  • Arianna: AI at the [Natural History] Museum [UK]
    • Topics: AI for digitization / collection
      • Use cases: general knowledge base built on all information in the collection, trait recognition, insect digitization
      • Robotic arm for pinned objects: scan drawer, recognize pins, select pin for handling, move it to photography station, take images of insect from multiple angles and label
    • NHM AI Lab (also for general science questions)
      • E.g. classification of animal hair or diatoms
      • Outreach via AI Coffee Lab or Hybrid Lunches, collaboration with the Turing Institute and UCL
    • DCMS x NHM AI Pilot program (UK Department for Culture, Media and Sport)
      • People could propose projects, 5 selected.
      • Topics: herbarium: chat bot that answers from the perspective of a known collector
      • Segmentation and part
      • Data extraction based on old index cards
      • AI Alt text Generation
      • Predict species of reptiles based on confiscated fashion products: is the product illegal or not
  • General Discussion/Next Steps (sorted by the questions suggested by Arianna)
    • What is currently going on in your institutions?
    • What are hurdles to AI techniques?
      • Questionable ethics of the companies behind the LLMs (copyright?)
      • Power consumption
      • Local or open-source LLMs as an alternative?
    • How can robotics benefit digital collections?
    • How can CETAF facilitate this better?
      • Big wins come from scaling it up to millions of specimens instead of just a few.
      • Run tools on all of CETAF collections, e.g. search for the signature of a collector
      • DiSSCo Machine Annotation service could help to move the data out of silos
      • This group could help with comparison of the different approaches and with quality checks. This could be done on a benchmark subset, but will require a lot of work for a high-quality ground truth. Can the creation of this dataset be automated?
      • Maybe additional funding as a COST action? Relevance?
      • Infrastructure requirements: who does this?
      • Use Case: Structure unstructured data, could be relevant for the GBIF Capacity Enhancement Program, WFO and PlantNet have a similar approach.
    • Mode in which the working group will operate
      • Many colleagues are working on it, but not in particular projects; there is general interest to continue.
      • How to keep in touch? Slack channel? Free channels will lose old posts.
      • Google Spreadsheet to have an overview of who is working on what.
      • GitHub group
      • DiSSCo will have Hackathons for the Machine Annotation Service.
      • Goal: find a specific target/goal/project that allows everybody to work on it and shows the potential, but is also small enough to be manageable
      • Best practices for organizing a hackathon: there is one from the Biohackathon community.
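
The benchmark idea raised in the discussion (comparing different extraction approaches against a ground truth on a subset) can be sketched as follows. This is only a minimal illustration, not something presented at the meeting: the tool names, specimen ids, field names and values are all hypothetical, and exact matching after trimming and lower-casing is an assumed scoring rule.

```python
from collections import defaultdict


def field_accuracy(ground_truth, predictions):
    """Return per-field exact-match accuracy of predictions against ground truth.

    Both arguments map a specimen id to a dict of field name -> value.
    Specimens or fields missing from the predictions count as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for specimen_id, truth in ground_truth.items():
        predicted = predictions.get(specimen_id, {})
        for field, expected in truth.items():
            total[field] += 1
            got = predicted.get(field, "")
            # Assumed scoring rule: ignore case and surrounding whitespace.
            if got.strip().lower() == expected.strip().lower():
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


# Hypothetical benchmark subset with manually verified values.
GROUND_TRUTH = {
    "sheet-1": {"collector": "A. Example", "country": "Belgium"},
    "sheet-2": {"collector": "B. Sample", "country": "Germany"},
}

# Hypothetical outputs of two label-extraction tools.
TOOL_OUTPUTS = {
    "tool_a": {
        "sheet-1": {"collector": "A. Example", "country": "Belgium"},
        "sheet-2": {"collector": "B. Sampel", "country": "Germany"},
    },
    "tool_b": {
        "sheet-1": {"collector": "a. example", "country": "Belgium"},
    },
}

scores = {
    tool: field_accuracy(GROUND_TRUTH, output)
    for tool, output in TOOL_OUTPUTS.items()
}
for tool, per_field in sorted(scores.items()):
    print(tool, per_field)
```

The expensive part remains the ground truth itself; the scoring is cheap once a verified subset exists.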

Day 2

Type Specimen Initiative

  • additional detailed notes in the Google Doc that is linked in the agenda
  • Sabine: Wikidata and Research Conference in Florence in June
    • Highlight the importance of the type specimen collections. Compile a list of type catalogues in Wikidata. Multilinguality is a strong advantage here, as is the connection to different external identifiers. 300+ already imported. Please send Sabine further examples of published type catalogues.
  • Nicky: maybe also publish the list to Zotero. Possible to do this via API.
  • There is a strong focus on zoology, but there are some from botany as well.
  • Also look at extra lists of typifications from other publications, e.g. Willdenowia.
  • There is a discussion in the Wikidata community on whether all of the type specimens should be in Wikidata. There are some in there already. The alternative would be to have a dedicated place.
  • TRE would be one of the places: type specimens from GBIF connected to their names from IPNI. The alternative approach would be to start from the literature and try to find the corresponding specimen. The TRE data model is a bit different from Wikidata. Not all institutions export their type records properly to GBIF, in particular via JSTOR.
  • Typified Name has been suggested as a DwC term, but it has long been under discussion. With it, sharing type data would be a lot easier.
  • Hosted Portal with a subset from GBIF of all of the specimens that claim to be types.
  • Also look at the registries, e.g. for mycology, which are there precisely for that reason. Different fields have different practices: registration,
  • In particular it is difficult to find lectotypes.
  • There are inconsistencies with how type information should be expressed in DwC. GBIF interprets it differently.
  • The TRE is a usable approach to demonstrate the data model; importing all type specimens into Wikidata could be possible.
  • While it might look like the information about types is static, there are some changes, e.g. holotypes become lectotypes. Wikidata could handle those changes, as well as contradicting information. Synchronization is still an issue; there are a lot of data changes in GBIF for type data, for example.
  • There could still be a use case for keeping a dedicated Wikibase for the type specimens and just interlinking it with Wikidata via links and federated queries.
  • The new version of the Taxonomic Concept Standard is up for review at TDWG; this might change the way that taxon names are shared with GBIF.
  • Next steps: Which other infrastructures might be interested in a type catalogue? DiSSCo or GBIF. A proof of concept will be delivered in TETTRIs, so the next steps could be built in TETTRIs Next.
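
The notes on a subset of specimens that claim to be types, and on inconsistent ways of expressing type information in DwC, can be illustrated with a small sketch. It assumes records are plain dictionaries keyed by the Darwin Core terms `catalogNumber` and `typeStatus`; the sample records, the list of recognised statuses and the "unrecognised" bucket are hypothetical choices for illustration and do not reflect how GBIF or the TRE actually interpret the field.

```python
import re
from collections import Counter

# Illustrative (not authoritative) list of status words to look for.
KNOWN_STATUSES = (
    "holotype", "isotype", "lectotype", "isolectotype",
    "neotype", "paratype", "syntype", "epitype", "type",
)


def normalise_type_status(raw):
    """Map a free-text typeStatus value to one label, or None if it is empty."""
    if not raw or not raw.strip():
        return None
    words = re.findall(r"[a-z]+", raw.lower())
    for status in KNOWN_STATUSES:
        if status in words:
            return status
    return "unrecognised"


# Hypothetical records showing the same information written in different ways.
RECORDS = [
    {"catalogNumber": "X-001", "typeStatus": "Holotype"},
    {"catalogNumber": "X-002", "typeStatus": "HOLOTYPE of Aus bus"},
    {"catalogNumber": "X-003", "typeStatus": ""},
    {"catalogNumber": "X-004", "typeStatus": "lectotype"},
    {"catalogNumber": "X-005", "typeStatus": "Typus"},
]

# Keep only records that claim to be types at all.
claimed_types = [
    (record["catalogNumber"], normalise_type_status(record.get("typeStatus")))
    for record in RECORDS
    if normalise_type_status(record.get("typeStatus")) is not None
]
counts = Counter(status for _, status in claimed_types)
print(counts)
```

Values that fall into the "unrecognised" bucket are the ones that would need manual review or a richer vocabulary.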

Other Presentations

  • Indexing of research expeditions and linking to semantic entities (Sabine)
    • Proposed as a group in TDWG a couple of years ago, is now a Task Group in TDWG
    • Goal: get identifiers for expeditions, historic and contemporary
    • Compile basic metadata: time, participants, …
    • Link to related external entities
    • Possibly an extension to DwC in the future
    • Identified several applicable Wikidata properties
    • Issues with colonial historic background, e.g. outdated country names
    • Tool to visualize journeys using Wikidata and GBIF: https://www.expeditia.info/
    • 24 April: TDWG Working Group Session on Research Expeditions
    • Discussion:
      • RAiD Identifier: https://raid.org/ Research Activity Identifier (doesn’t have a Wikidata Property yet)
      • Is the term expedition a bit outdated? Expedition might imply trips to undiscovered or underdeveloped regions. Should it be a more generic Research Event?
      • Could this be expressed using Latimer Core, as a virtual collection of specimens that have the expedition in common and it would allow for estimates how many
  • DiSSCo Machine Annotation Services (MAS), Wouter
    • Presentation: https://docs.google.com/presentation/d/1jQp0TKJhnbKCG_IHmNmQaIHOKTQIN7GAbQbsMx-The8/edit?usp=sharing
    • Annotation Motivations: Commenting, Adding, Editing, Assessing, Deleting
    • Target Types: Term, Class, Region of Interest (in images)
    • Existing services can be adapted to the DiSSCo Data Model using wrappers
    • MAS service providers need to: create a MAS service and test it in the sandbox, get the code reviewed, provide a Service Delivery Plan and SLA
    • Current Test Cases: Plant Organ Detection, AI4Labels
    • Recently: MAS Hackathon, a lot of participants, 3 new services were created.
    • INDEED tool to train a quality model that judges the parts of an AI-based herbarium sheet segmentation
    • Limitations: images are not hosted by DiSSCo, only the links; centralized storage would make processing a lot easier.
    • Quality Control is important.
  • Index of Exsiccatae (Dagmar)
    • http://indexs.botanischestaatssammlung.de/
    • Detailed documentation at Storytelling with IndExs
    • More than 2,400 series of duplicate specimens with printed, numbered labels; the oldest ones in bound volumes, but regularly integrated into the general collections of herbaria
    • Database is great for storytelling about exsiccata series (duplicate specimen series) in herbaria, e.g. towards EU funders, to point to the existing international network of collections
    • This could be a different use-case for Latimer Core.
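
The idea of a virtual collection, i.e. specimens from different institutions that share an expedition or an exsiccata series, can be sketched as a simple grouping. This is an assumption-laden illustration and not the Latimer Core data model: `institutionCode` and `catalogNumber` are Darwin Core terms, while the `expeditionId` key, the identifiers and the sample records are hypothetical.

```python
from collections import defaultdict


def virtual_collections(records, key):
    """Group records by a shared attribute; records lacking it are skipped."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(key)
        if value:
            groups[value].append(record)
    return dict(groups)


def summarise(groups):
    """Count specimens and list holding institutions for each group."""
    return {
        name: {
            "specimens": len(members),
            "institutions": sorted({m["institutionCode"] for m in members}),
        }
        for name, members in groups.items()
    }


# Hypothetical specimen records from two institutions.
RECORDS = [
    {"catalogNumber": "A-1", "institutionCode": "INST-A", "expeditionId": "expedition-1"},
    {"catalogNumber": "A-2", "institutionCode": "INST-A", "expeditionId": "expedition-1"},
    {"catalogNumber": "B-1", "institutionCode": "INST-B", "expeditionId": "expedition-1"},
    {"catalogNumber": "B-2", "institutionCode": "INST-B", "expeditionId": "expedition-2"},
    {"catalogNumber": "B-3", "institutionCode": "INST-B"},
]

summary = summarise(virtual_collections(RECORDS, "expeditionId"))
for name, info in sorted(summary.items()):
    print(name, info)
```

The same grouping would work for exsiccata series by swapping the key for a series identifier.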

Any Other Business

  • Anton will send around a draft of the new charter
  • Subgroups will meet and take all of the input from this meeting into consideration.

Logistics

Hotels

  • ACHAT Sternhotel Bonn, Markt 8, 53111 Bonn, 90€
  • IntercityHotel Bonn, Quantiusstraße 22, 53115 Bonn, 100€
  • Hotel Motel One, Am Hauptbahnhof 12, and Berliner Freiheit 36, 53111 Bonn, 100€
  • Hotel Kurfürstenhof, Baumschulallee 20, 53115 Bonn, 90€
  • Beethoven Hotel Dreesen, Bonngasse 17, Zentrum, 53111 Bonn, 90€
  • My Südstadt Bonn, Kaiserstraße 221, 53113 Bonn, 90€ (very close to Museum Koenig, but rooms are very noisy towards Kaiserstraße)
  • For those who like it unusual: BaseCamp Hostel Bonn, In d. Raste 1, 53129 Bonn, 70€

Hotels on a map: https://maps.app.goo.gl/c6P5W2taKKTZD8HJ6