CDEFD

Introduction

Over the past decade, computers have become increasingly available and easy to use. As a result, many data banks that deal with, or refer to, data on organisms have been constructed or are being planned, differing widely in content and scale. Examples include international or local checklists of organisms, phytogeographic databases, and projects to computerize natural history collections. High-quality data are often found in data files resulting from personal research.

Recent developments in computer networking now favour, at least in theory, the easy interconnection of different data banks with similar contents. A major problem, however, is that even very similar data banks are often organized on the basis of very different data structures. This is a serious obstacle to connecting several data banks into an efficient, information-providing network. Projects like the Species 2000 initiative (Bisby, this volume) have been proposed to overcome this difficulty by means of modern networking techniques. However, common data structures and standardization of data content are the most effective means of achieving compatibility.
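
The structural mismatch described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two "banks", their field names (name, country, gen, sp, ctry) and the CommonRecord structure are hypothetical and are not taken from CDEFD, IOPI or Species 2000. It only shows that two records with the same content but different layouts become comparable once both are mapped onto one agreed structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical shared structure (an assumption, not the CDEFD model)."""
    genus: str
    species_epithet: str
    country: str


def from_bank_a(row: dict) -> CommonRecord:
    # Hypothetical bank A keeps the full binomial in a single field.
    genus, epithet = row["name"].split(" ", 1)
    return CommonRecord(genus, epithet, row["country"])


def from_bank_b(row: dict) -> CommonRecord:
    # Hypothetical bank B keeps the name parts in separate, abbreviated fields.
    return CommonRecord(row["gen"], row["sp"], row["ctry"])


record_a = from_bank_a({"name": "Bellis perennis", "country": "Germany"})
record_b = from_bank_b({"gen": "Bellis", "sp": "perennis", "ctry": "Germany"})

# Once both are expressed in the common structure, they can be compared directly.
print(record_a == record_b)
```

Each additional bank needs only one mapping onto the common structure, rather than one mapping for every other bank it is to be connected with.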

Models are needed which provide project-independent data structures to be used in the design of databases that include biological data. CDEFD ("A Common Datastructure for European Floristic Databases") is a concerted action project financed under the European Commission's third framework program which has set out to provide such models to the biological community and to database designers. The core of this work is a detailed data model for botanical collections (including lichens and fungi), which was widened in scope to include other natural history collections and microbiological culture collections.

The present model is part of a larger model which aims to provide a unified view of biological information, including taxonomic, nomenclatural, ecological, bibliographical, and geographic components, as well as the results of studies (descriptors) in individual branches of the biological sciences. For taxonomic data in botany, as well as for literature citations and the data area concerned with person names (author teams, collectors, etc.), CDEFD accepted the data model elaborated for the IOPI Global Plant Checklist (Berendsohn 1994) with minor changes, which have been incorporated in version 7.3 of that model.

The data structures provided are very complex, because they attempt to incorporate all available information into a single model. To fit the particular needs of a given data bank, they can easily be modified and simplified. The model allows the designer to assess the consequences of the simplification process, particularly with regard to constraints on future extensions of information content and possible incompatibilities with other databases. The complex model thus provides a reference tool for the planning of specific databases. In addition, the model supplies guidelines for the definition of data fields and thereby provides a basis for the discussion of data standards.

One of the main problems encountered by CDEFD was the immense diversity of data found to be connected to floristics, botany and biology in general. Consequently, one of the principal tasks was to extract general structures and to construct a framework for specific data. The discussion was complicated by the fact that several categories have to be considered in parallel:

In an attempt to subdivide the information, the following large-scale data areas were identified:

CDEFD tries to provide a complete model of unit data for collections of any scale. Place of storage and administration are completely covered. Examples from microbiological and herbarium collections are presented to cover the aspect of definity. Field data are also analysed in detail, and in this context the question of substance (observation vs. collection) is treated. However, geographical and ecological collection site data proved too extensive to be covered comprehensively, and the data collected in the form of field descriptors vary strongly with the context of the collection, so only examples are provided. Examples of non-floristic descriptors and studies are provided by the models treating karyology (Berendsohn & al., in print) and secondary metabolites (Jakupovic & al., in prep.).
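
As a rough illustration of what "unit data" with a substance category might look like in code, the following sketch may help. It is not the CDEFD model: the names Unit, FieldData, Substance and storage_place are hypothetical, and the rule that a collected unit must record a place of storage is an assumption made here for the example only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Substance(Enum):
    """Whether the unit is a mere observation or a physically collected object."""
    OBSERVATION = "observation"
    COLLECTION = "collection"


@dataclass
class FieldData:
    # Hypothetical minimal field data; real field descriptors vary with context.
    collector: str
    date: str
    site_description: str
    descriptors: dict = field(default_factory=dict)


@dataclass
class Unit:
    unit_id: str
    substance: Substance
    field_data: FieldData
    storage_place: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption for this sketch: collected material must be stored somewhere.
        if self.substance is Substance.COLLECTION and self.storage_place is None:
            raise ValueError("a collected unit needs a place of storage")


sighting = Unit(
    unit_id="U1",
    substance=Substance.OBSERVATION,
    field_data=FieldData("A. Collector", "1995-06-01", "roadside meadow"),
)
specimen = Unit(
    unit_id="U2",
    substance=Substance.COLLECTION,
    field_data=FieldData("A. Collector", "1995-06-01", "roadside meadow"),
    storage_place="Herbarium X, cabinet 12",
)
print(sighting.substance.value, specimen.storage_place)
```

Keeping the field data in a separate structure reflects the point made above that observations and collections share the same kind of field information, while only collected units carry storage and administration data.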


Last updated: April 29, 1996, wgb@zedat.fu-Berlin.de