Resource Identification for a Biological Collection Information Service in Europe
Results of the Concerted Action Project


The complexity of collection information

Walter G. Berendsohn and Pier Luigi Nimis

Pp. 13-18 in: Berendsohn, W. G. (ed.), Resource Identification for a Biological Collection Information Service in Europe (BioCISE). - Botanic Garden and Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics.

Living materials for biotechnology, such as microbial strains, serve a number of industrial and high-priority research areas. Computerization of catalogues and standardization are therefore much further advanced in these collections than in other types of collections (see Chapter VI). Non-commercial living collections, too, are usually further advanced in their computerized management than natural history collections, simply because their inventories are not static and must be actively managed, and because the number of units in, for example, a botanical garden is usually much smaller than in a herbarium.

Entry of materials into natural history collections is more of a museum task. Once the appropriate conditions for storage are achieved, the material documents itself: information regarding the specimen is often stored with the material. Access to the information has traditionally meant accessing the material. The aim of a collection information service is, on the one hand, to help locate appropriate materials and, on the other hand, to uncouple at least some of the needed information from the object itself.

“Units” in the BioCISE Model

Any object containing, being, or being part of a living, petrified, or conserved organism is considered a unit as soon as it appears in the system. The unit may be gathered (observed or collected) in the field and derived units may recursively emerge from it through specimen processing, breeding or cultivation. In addition, Units may form Associations (e.g. host/parasite), Ensembles (lichen on a rock with fossils), and Assemblages (herd, artificial grouping). Gathering events, specimen management (acquisition, accession, storage, preservation, exchange, ownership), and taxonomic or other identifications relate to the Unit.
The term “specimen” can often be used as a synonym of unit. However, it lacks a precise, context-independent definition and is often perceived in a narrower sense than the unit. Conversely, as the example shows, a single specimen can represent a number of units.
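
The unit concept described in this box can be sketched as a small data structure. The following is a minimal illustration only, not the BioCISE model itself: the class names `Unit` and `Grouping`, their fields, and the labels in the usage lines are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # units are compared by identity, not by field values
class Unit:
    """Illustrative stand-in for a unit; all field names are assumptions."""
    label: str
    parent: Optional["Unit"] = field(default=None, repr=False)
    derived: List["Unit"] = field(default_factory=list, repr=False)
    identifications: List[str] = field(default_factory=list)

    def derive(self, label: str) -> "Unit":
        """Create a derived unit, e.g. through specimen processing or cultivation."""
        child = Unit(label=label, parent=self)
        self.derived.append(child)
        return child

    def descendants(self) -> List["Unit"]:
        """Collect all derived units recursively, depth first."""
        found: List["Unit"] = []
        for child in self.derived:
            found.append(child)
            found.extend(child.descendants())
        return found


@dataclass(eq=False)
class Grouping:
    """An Association, Ensemble or Assemblage of units."""
    kind: str
    members: List[Unit]


# Usage with made-up labels: a field unit, two derived units handled together
# as an Ensemble, and one unit derived from a derived unit.
field_unit = Unit("material gathered in the field")
lichen = field_unit.derive("lichen")
rock = field_unit.derive("rock with fossils")
slide = lichen.derive("microscopic slide")
ensemble = Grouping("Ensemble", [lichen, rock])

print([u.label for u in field_unit.descendants()])
```

Keeping the groupings separate from the derivation tree reflects the text's distinction between how units emerge from one another and how they are handled together. Other designs are equally possible.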

To uncouple information from the object in this way, the information structures present in biological collection units had to be investigated, and several information models were published. The first information model to become widely available was the result of a workshop of the Association of Systematics Collections in the United States, the so-called ASC model (ASC 1993). CDEFD and, finally, the BioCISE model (Berendsohn et al. 1999a) are later and broader approaches. The latter is intended as a reference model, i.e. a tool to interpret data rather than a model that should be turned directly into a database.

We have used a set of units represented by a specimen in the lichen collection of the herbarium in Trieste (TSB) to illustrate the degree of complexity which (at least in some cases) has to be documented in natural history collections, and to illustrate how that information is handled in the BioCISE model.

The example

The envelope (fig. 1a, 224 KB) is part of the Lichenes Selecti Exsiccati, distributed by Prof. Antonin Vezda of the Botanical Institute of the Czech Academy of Sciences in Brno. The original material was collected by P.L. Nimis, J. Poelt and A. Vezda in Sicily, and was brought to Brno by the last collector, for later distribution. Meanwhile, Poelt and Nimis studied the material, and decided that it belonged to a new species; the formal description was sent to Vezda, in order to have it on the exsiccatum itself. The holotype (i.e. the specimen the scientific name is based on) is the specimen conserved in Graz (GZU); all other specimens distributed within the exsiccatum are isotypes. The information printed on the original label (fig. 1b, 112 KB) is valid for all specimens distributed within the exsiccatum.

Further information was added on the label and inside the envelope during the history of this specific envelope. The history is as follows: GZU (Graz) received two pieces of the exsiccatum, and both were accessioned. One was retained in GZU as the holotype; the other was sent to Trieste (TSB) as a gift.

In Trieste the sample received a new accession number and was placed in the reference herbarium, which is housed separately from the main herbarium. In Trieste, a lichenologist (M. Tretiach) made a note on the label, stating that the new name is probably a synonym. A chemical analysis of the secondary, accompanying species was carried out (by Nimis), and a parasite was discovered on the main species (by M. Castello). Furthermore, Nimis made a slide of the apothecia and a drawing of the spores, which were placed inside the envelope (left picture).

Later in this booklet (chapter IV) it will be shown that potential users of a collection information service will want to use several criteria to query for this specimen. Almost all the information stated here is relevant for research purposes: the presence of the different species determined at a defined locality at the time of gathering, the host-parasite relationship, the new name based on one of the lichens present in the specimen, etc.

Representation in the model

To be able to access this information efficiently in a database that may eventually represent holdings of many millions of specimens, the information has to be digitised in a highly structured and standardized form. Fig. 2 (474 KB) shows how the BioCISE information model accommodates all the details given. As stated above, this does not mean that the database must be organised along these lines, but it should be possible to extract the data represented in this information structure from it accurately by appropriate means.

The diagram depicts, in a rather simplified form, that the information contained in the specimen in Trieste represents a large number of units in the database, some of them only of an intermediate, "virtual" nature, some of them representing materials (the specimens or parts of the specimens) at hand in a collection. The single "Gathering or Field Unit" corresponds to the materials as found and taken in the field. It serves as an interface between the gathering's circumstances (the who, how, where, and when) and all "Derived Units" stemming from the material. By identifying the material as belonging to two different species, Nimis and Poelt created two further "virtual" units, because by definition a unit has to be homogeneous as to its taxonomic identification. These two units formed an Ensemble, because they continued to be handled together. We can assume that Vezda, while creating 25 specimens from the original material, ensured that all of them contained both species, because he stated so on the label. In effect, he thus created 50 new derived units by sending the material to 25 herbaria around the world. One specimen, designated as the holotype of Caloplaca thamnoblasta by Nimis and Poelt, ended up in the Herbarium in Graz (GZU), of course still forming an Ensemble with the other species. We presently do not know whether any further work was done on the specimen there, nor what happened to the other 23 specimens.

However, we do know that several additional "Derived Unit Creation Events" took place in Trieste. Tretiach's identification changed neither the unit's circumscription nor its content, so the unit remained unchanged. The identification of yet another species (a parasite), however, created a new Ensemble of three units, two of which (the host and the parasite) form an Association. Two further events, the addition of a microscopic slide sample and of a spore drawing, brought the number of units in the specimen to 5, its present state.
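
The unit counts in this walk-through can be retraced with a short sketch. It uses plain dictionaries and is only an illustration of the bookkeeping, not the structure of Fig. 2: all keys and labels are assumptions, the two species are called "main species" and "accompanying species" as placeholders, and the position chosen for the Trieste specimen is arbitrary.

```python
# Placeholder labels for the two lichens identified in the gathered material.
SPECIES = ["main species", "accompanying species"]
N_SPECIMENS = 25  # specimens distributed within the exsiccatum

# One Gathering or Field Unit, split into two "virtual" units by identification.
gathering = {"kind": "Gathering or Field Unit", "derived": []}
virtual_units = [{"kind": "virtual", "taxon": s, "derived": []} for s in SPECIES]
gathering["derived"] = virtual_units

# Each distributed specimen holds one derived unit per species (an Ensemble).
specimens = []
for number in range(1, N_SPECIMENS + 1):
    ensemble = []
    for v in virtual_units:
        unit = {"kind": "derived", "taxon": v["taxon"], "specimen": number}
        v["derived"].append(unit)
        ensemble.append(unit)
    specimens.append({"number": number, "ensemble": ensemble, "extras": []})

derived_total = sum(len(v["derived"]) for v in virtual_units)

# Events in Trieste; which specimen went there is not recorded here,
# so the index below is arbitrary.
tsb = specimens[1]
parasite = {"kind": "derived", "taxon": "parasite", "specimen": tsb["number"]}
tsb["ensemble"].append(parasite)  # Ensemble now has three units
association = {"type": "host/parasite",
               "host": tsb["ensemble"][0],  # the main species
               "parasite": parasite}
tsb["extras"].append({"kind": "microscopic slide"})
tsb["extras"].append({"kind": "spore drawing"})

units_in_tsb = len(tsb["ensemble"]) + len(tsb["extras"])
unaccounted = N_SPECIMENS - 2  # all except the Graz and Trieste specimens

print(derived_total, units_in_tsb, unaccounted)  # 50 5 23
```

The three printed numbers correspond to the 50 derived units created by distribution, the 5 units in the Trieste specimen, and the 23 specimens whose later history is unknown.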

At first this looks like an enormous complication of the seemingly simple form of traditional data storage ("it's all there on the specimen!"). However, the possibility of networking the results, and of retrieving everything recorded for the entire set of specimens, already shows some of the benefits a highly structured system could bring about. To be realistic, we will not have this degree of detail for all collections any time soon, if ever (see previous chapter). Nevertheless, although biodiversity research has many aspects, collections are one of its main information sources. As will be shown in chapter IV, data to be used in research applications have to be of high quality and exactly described. The BioCISE model demonstrates that biological collections have a common information structure which, if properly implemented, could enhance the value of collection information systems for present and future research applications. As will be shown in the last two chapters, one of the basic prerequisites for the functioning of a European collection service is its scalability: its ability to integrate highly heterogeneous degrees of data atomisation, structuring and quality control.


© BioCISE Secretariat. Email: biocise@, FAX: +49 (30) 841729-55
Address: Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM), Freie Universität Berlin, Königin-Luise-Str. 6-8, D-14195 Berlin, Germany