ABCD Schema - Task Group on
Access to Biological Collection Data

A joint CODATA and TDWG initiative supported by GBIF


2nd Workshop, Sydney, November 7-8, 2001

Venue: Royal Botanic Garden, Sydney, as part of the Biodiversity Knowledge Management Forum conference.

Attendees: Nicolas Bailly, Walter G. Berendsohn, Stanley Blum, Alex R. Chapman, Barry J. Conn, Charles Copp, Jim Croft, Marc Geoffroy, Philip Gleeson, David G. Green, Stinger Guala, Anton Güntsch, Norman F. Johnson, Yde de Jong, Ross Mathews, Robert A. Morris, Ben Richardson, Adrian Rissone, Sabine Roscher, P. J. Schwartz, Kerstin Teske, David Vieglais, Greg Whitbread

Apologies received: Lois Blaine, Kurt Bollacker, Raul Jimenez Rosenberg, Rudolf May, Derek Munro, Paula Ross Huddleston, Hideaki Sugawara, Neil Thomson, John Wieczorek 

Meeting Report

[A report written by Charles Copp for the ENHSIN group is available via http://www.nhm.ac.uk/science/rco/enhsin/Xmlreport.doc (MS Word document)] 

The purpose of this meeting was to:

Initial business

Subgroup on content definition 

Schema development so far

Charles Copp presented an initial schema (http://url.des.schemas) that he had developed over the preceding three weeks with funding from ENHSIN (NHM). The schema is mainly a conversion of the DTD produced by the first workshop in Santa Barbara (http://www.bgbm.org/tdwg/codata/SBWorkshop.htm), extended by elements from the BioCISE information model (http://www.bgbm.org/biodivinf/docs/CollectionModel/) and the British NBN/Recorder model.

Charles pointed out that data transfer efficiency could be improved in many cases if the GatheringEvent were the root concept of a hierarchical data structure (with CollectionUnits as children of the GatheringEvent), offered either alongside or in place of the structure that uses the CollectionUnit as the root concept. This would avoid transferring redundant data in cases where many specimens were collected in the same gathering event. Nevertheless, the group decided to stay with the structure that uses CollectionUnit as the root concept for two reasons:
(1) efficiency is not an important design goal of a semantic standard (clarity, universality, completeness, and simplicity, for example, should be given higher priorities); 
(2) collection databases implemented as flat data structures (a large number) won't easily be able to export a hierarchical dataset with a normalized gathering event as the root concept and therefore won't be able to participate in a federation based on this alternative. (We think it will be easier for systems based on a hierarchical structure to export a flat version of their contents.)
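
The trade-off discussed above can be illustrated with a minimal Python sketch. It is not part of the schema: the field names (collector, date, locality, unit_id) and the sample values are hypothetical and chosen only for illustration. It shows a dataset with the gathering event as root, where event data are stored once, and the simple loop that turns it into flat records with the collection unit as root, where event data are repeated for every unit.

```python
# Hypothetical field names and sample values, for illustration only;
# they are not taken from the schema discussed at the workshop.
EVENT_FIELDS = ("collector", "date", "locality")

# GatheringEvent as root: event data stored once, units as children.
gathering_events = [
    {
        "collector": "A. Collector",
        "date": "2001-11-07",
        "locality": "Sydney",
        "units": [{"unit_id": "U1"}, {"unit_id": "U2"}, {"unit_id": "U3"}],
    },
    {
        "collector": "B. Collector",
        "date": "2001-11-08",
        "locality": "Canberra",
        "units": [{"unit_id": "U4"}],
    },
]


def flatten(events):
    """Return one self-contained record per unit, repeating the event data."""
    records = []
    for event in events:
        event_data = {field: event[field] for field in EVENT_FIELDS}
        for unit in event["units"]:
            # Each unit carries its own copy of the gathering event data.
            records.append({**unit, **event_data})
    return records


collection_units = flatten(gathering_events)
print(len(collection_units), "flat records from", len(gathering_events), "events")
```

The reverse direction is harder for a flat database, because it first has to decide which records belong to the same gathering event, for example when locality strings are spelled inconsistently.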

Additional elements

Several additional elements were noted and will be included in the next version of the schema.

Future work on the schema

From now on, the schema will evolve through the work of experts on specific "subschemas" such as botanical names or geography. To this end, the schema will be published and maintained on the BGBM server in a form that is understandable to experts who are less experienced with XML. Element definitions will consist of the following data items:

A turnaround of 30 days after a Request for Comment was considered to be appropriate. The group agreed on the following rules for the future development of the schema:

Bob Morris offered to revise the schema so that it fulfills the syntactic requirements.

Subschemas and coordinators identified so far:

Zoological names: Yde de Jong
Botanical names: Walter Berendsohn
Identifications: Walter Berendsohn
Bacteriology: Lois Blaine (to be confirmed)
Geography: Sabine Roscher (to be confirmed)
Measurements: Charles Copp

It was not decided whether or not mineralogy should be included in the schema. To provide a "slot" for minerals, the schema should include an abstract type for mineralogy. Walter Berendsohn will approach experts for the revision of schema parts not yet identified.

Subgroup on Protocol Development

Progress on a reference implementation of the protocol

Stan Blum, P. J. Schwartz (portal software) and Dave Vieglais (provider software) introduced the development of DIGIR (Distributed Generic Information Retrieval), an open source reference implementation (http://digir.sourceforge.net/) for the query protocol being developed by the "grand scheme" subgroup. A design objective in the current work is to decouple the protocol, software, and semantics. One benefit of this decoupling would be to make it easier to evolve or version the federation schema. As long as a stable schema for collection data is not available, the system will use the Darwin Core Version 2 as a simple example.

If providers of collection data register their service on a UDDI server (http://www.uddi.org/), they will not have to inform each portal individually. Portals can poll the UDDI server periodically to discover new providers. Protocol-compliant queries will be transmitted to data providers as XML documents. Result sets will also be returned to portals as XML documents.

The reference architecture will probably require each provider to publish a "meta data" data set (i.e., collection level data) that will help intelligent portals determine which providers need to be queried to answer a particular user request. The content of meta data items will be discussed on the basis of results of the BioCISE project (http://www.bgbm.org/BioCISE).
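
The portal workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DIGIR protocol itself: the XML element names (request, filter, equals, response, record), the provider list, the taxon_scope metadata field, and the example.org endpoints are all hypothetical. The sketch shows a portal using collection-level metadata to choose providers, building a query as an XML document, and reading a result set returned as an XML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical provider list, as a portal might hold it after polling a registry.
# "taxon_scope" stands in for collection-level metadata published by each provider.
providers = [
    {"name": "Herbarium A", "endpoint": "http://example.org/a", "taxon_scope": {"Plantae"}},
    {"name": "Museum B", "endpoint": "http://example.org/b", "taxon_scope": {"Animalia"}},
    {"name": "Collection C", "endpoint": "http://example.org/c", "taxon_scope": {"Plantae", "Fungi"}},
]


def select_providers(candidates, kingdom):
    """Keep only providers whose metadata says they can answer the request."""
    return [p for p in candidates if kingdom in p["taxon_scope"]]


def build_query(field, value):
    """Build a query as an XML document (element names are illustrative)."""
    request = ET.Element("request")
    query_filter = ET.SubElement(request, "filter")
    equals = ET.SubElement(query_filter, "equals", {"field": field})
    equals.text = value
    return ET.tostring(request, encoding="unicode")


def parse_result(xml_text):
    """Turn a result-set document into a list of {element name: text} records."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in record} for record in root.findall("record")]


targets = select_providers(providers, "Plantae")
query = build_query("ScientificName", "Acacia")

# A made-up response, standing in for what a provider might return.
sample_response = (
    "<response><record>"
    "<ScientificName>Acacia dealbata</ScientificName>"
    "<Collector>A. Collector</Collector>"
    "</record></response>"
)
records = parse_result(sample_response)
print([p["name"] for p in targets], query, records)
```

Because the query and the result set are plain XML documents, the field names they carry can change with the federation schema without changes to the transport code, which is the decoupling the subgroup is aiming for.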

Acknowledgements:

The organizers gratefully acknowledge support from the following organizations:



Page hosted by the Department of Biodiversity Informatics and Laboratories of the Botanic Garden and Botanical Museum Berlin-Dahlem.
Page editor: Walter Berendsohn (w.berendsohn [at] bgbm.org).

This page last updated: 06.03.2005