Resource Identification for a Biological Collection Information Service in Europe
Results of the Concerted Action Project


Users and uses of biological collections

Birgit Felinks, Andrea Hahn, Linda Olsvig-Whittaker and Wouter Los

Pp. 20-32 in: Berendsohn, W. G. (ed.), Resource Identification for a Biological Collection Information Service in Europe (BioCISE). - Botanic Garden and Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics.

Introduction

Throughout the project, BioCISE collected information on users and potential users of biological collections. Because of the broad, interdisciplinary approach of the project, the identification of user groups and user interests had to follow an informal procedure - BioCISE used meetings, discussions and interviews rather than questionnaires to gather information. The needs of users, especially of those from the environmental sector, were the focus of two workshops. Various interviews and discussions were conducted during national and international ecological congresses. Responses from the Taxacom and ZBIG-L e-mail lists to a query asking for "most useful queries" to a common collection access system (Beach 1998) were also most helpful. The broad approach of BioCISE, originally based on the analysis of the underlying information structures of biological collections, was justified once more by the results with respect to user communities: users are only rarely interested in a single category of collections.

To provide a framework for the results, the project focussed on the following questions: Can we categorise users into broad groups according to their primary interests in collection information? Can we categorise uses of collections? What are the common denominators among different user groups and different uses of collections, i.e. what are the most important tasks a collection information service should tackle?

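Four major groups of users may be distinguished: users from biotechnological research and industry, users from biological systematics and university research, users from the environmental sector, and users from education and entertainment.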

Uses of biological collections fall into the following categories:

This introduction already makes clear that many different communities can use collection information in variable ways. "Traditional means of access, - personal visits to collections and long diligent searches of paper records - guarantee that most [potential users of biological collections] will find other means of acquiring the information they need, or act with insufficient information" (ASC 1993). This means that they will not utilise the primary sources, collection objects, but sources like secondary literature or "expert opinion", perpetuating earlier, sometimes erroneous interpretations instead of adding to knowledge (OECD 1999).

Knowledge of particular holdings of a collection (e.g. of algal strains) may be quite common within a certain interest group (e.g. systematic phycologists). However, this does not mean that the hydrobiologist who looks into the causes for a heavy increase in biogenic toxin concentration in the local pond is aware of the existence of a reference collection to identify the culprit, and possibly learn more about it. That information may already be available, perhaps even in an on-line database. A common access system should help to access information either by pointing to relevant resources or - better still - by directly providing the needed information. It is obviously important to consider user needs from the beginning in the design process of such a system. The four identified user groups shall be introduced in more detail, and an attempt at summarising their needs and wishes will be made. Some of those may seem obvious; not all of them will be easily satisfied.

Major categories of users

Users from the fields of biotechnological research and industry

Biotechnology. Research areas where collections are used include pharmaceutics, natural products research, medical research on pathogens, vectors, tissue cultures, plant and animal breeding (including genetic engineering), research on plant pathogens and host-parasite relationships, and weed research. Collections here serve mainly to provide raw material: plant or other material either for direct use, or to serve as the research object in chemical compound analysis and synthesis. Less frequently, collections are directly used as a reference for identification by comparison (e.g. seed collections in weed research or forestry), a use that is normally mediated by systematists (see below). In view of discussions on the "intellectual property rights" on active substances isolated from indigenous plants (Gollin 1999), and considering more and more restrictions to the exploitation of wild plants, research on cultivated material will increase in importance. A further field of application is that of test organisms from different groups (test animals, cell cultures, etc.) for the indication of potential toxicity effects of products like medicines or cosmetics, or for bioassay indication for waste products in routine applications such as the screening of water quality.

Consequently, the information requested usually serves to give the user access to physical objects of a certain kind, e.g. belonging to a taxon or microbial strain with specific properties. The system should supply addresses and links to Internet sites with on-line databases or provide direct access to catalogues of commercial products. All data should be accessible fast, be regularly updated and presented in a unified form, and the materials from different sources should adhere to common quality standards. In pharmaceutics, the need for an effective way to extract data from a variety of different database systems has led to a move by drug firms to link databases by establishing a common standard (Williams 1997). The CABRI (Common Access to Biological Resources and Information) service is a working system providing such access to commercial collection materials, such as microbial strains and tissue cultures (see Chapter VI).

Users from biological systematics and university research

Systematic research and taxonomy play a key role in the access to all organism-related data. Research institutes in this field provide the basis for information exchange in all the other fields discussed here by providing a common reference system (scientific names), keeping track of the history of name changes, by enforcing the preservation of voucher specimens for the reference system (nomenclatural types) and by offering long-term preservation of vouchers in natural history collections. Systematic research is mainly conducted in systematics departments of universities and natural history museums, but often it is also an integral part of other biodiversity related activities, for example when organism inventories (checklists) are compiled in ecology. As the common reference system, organism names link the results from different areas of biological research, such as molecular sequence data or ecological observations.

Taxonomy is based on collection objects, and collection objects supplement taxonomic data as a reference system of vouchers. Nomenclature in zoology and botany depends on type specimens to preserve the original objects to which names are attached. Living objects are used to examine the variability of organisms.

Systematists often request highly specific collection information, e.g. on the presence of a single specimen in the collection (one that has been cited in a publication, for example). For taxonomic revisions - the treatment of a specific group of organisms - loans of specimens belonging to the taxon under investigation are requested for further examination. Historic collections are often of particular importance because the concept of the taxon that a previous investigator formed becomes clear by examining the material seen at that time. In addition, larger collections may include hitherto unrecognised type specimens. Data on collectors and the date of collection may be valuable to pre-select specimens, as may data on the collection site, because most revisions are restricted in their geographical scope. The name of the identifier may give the informed user an indication of the reliability of the identification. In most groups of organisms, a picture of the specimen or its label can provide valuable (original!) information in a preliminary phase of research, to decide if the specimen may be of interest for closer examination. Experienced specialists in specific groups are scarce; this already constitutes the "taxonomic impediment" in biodiversity research (ABRS 1998). A collection information service should thus help systematists to gain access to needed research materials with as little waste of precious time as possible. In appropriate groups, digitised images may also aid in the training of a new generation of taxonomists who are able to recognise taxa by looking at specimens or the organism itself.

Research in molecular systematics usually relies on a supply of fresh material of secure identification. Ideally, material from the type locality of the relevant (name giving) species of the group should be investigated, or at least the type species of the examined group. Since this can rarely be achieved, the quality of the identification and the selection of the material, both attained using "traditional" taxonomic techniques, become even more important.

Users from the environmental sector

Environmental users belong to three main categories: (i) the domain of public service observation, monitoring and preservation of the environment, including related administration; (ii) commercial services (landscape planning, monitoring and mapping projects, and impact assessment studies); and (iii) the sector of environmental education (teachers, public relations staff, and officials in contact with the general public).

Public services and administration. Users from the sectors of public services, science, and administration in environmental observation, monitoring, and preservation belong to, e.g., environment agencies on different national to supra-national levels, commissions concerned with nature protection (e.g. the IUCN Species Survival Commission for the protection of endangered species), nature reserves and national parks. They carry responsibility for nature conservation, are concerned with ecological research in universities and other projects (e.g. the long time ecological research network - LTER 1999), and work in administrative functions responsible for decisions regarding environmental protection. The scope of their work and responsibilities reaches from basic research to legislation, and from global scale (as in global change research) to concerns of an individual nature reserve.

What is expected from them (and what they expect collection information systems to help them deliver) are background research results on the functioning of ecosystems and long-term as well as short-term ecological phenomena; the provision and publication of data and information on, e.g., cumulative effects of human activities and natural processes on the environment; the identification and effective protection of areas with high conservation value and of endangered species; and the fulfilment of legal obligations like those arising from the Rio Convention, compelling all signing states to provide information on specimens stored in museums or records held by institutions specialising in observations.

These users need data as the basis for their own information evaluation and publication. They need to present suitably processed information to inform the general public. They need data in the form of digitised species lists and databased inventories in order to outline areas with high conservation value and to answer specific questions (for example, on the distribution and occurrence of endemic, endangered, or invasive species). In species protection, they need as much species-related data as possible to establish successful management plans and protection measures. Research on and understanding of long-term ecological phenomena across national and regional boundaries, contributing to the comparability of ecological information and indirect measurements of environmental changes, requires access to complete time series covering a vast spatial range to enable effective pattern analysis.

Commercial services. Like the public services concerned with the environment at large, enterprises working in this field usually are suppliers as well as users of collection information. As private or public companies, they work in landscape planning, in monitoring and mapping projects, and as consultants in impact assessment studies. While in principle they contribute to the data "pool" through their own field studies, evaluation results and decisions often have to be reached fast: contracts for impact assessments usually require almost immediate results, and there may be no time for collecting comprehensive field data by surveys and research. In this case, the professional has to rely on already published data.

Environmental education. The term "education" is here understood to include teachers at schools and universities, but also, in a broader sense, PR staff of environmental agencies and nature reserves or, e.g., rangers in national parks who are in contact with visitors. Their task, in general, is to raise public awareness of environmental and nature conservation issues at different levels, and to generate public interest in and better acceptance of environmental goals.

What they require, above all, are the tools to provide motivation: comprehensible, pre-processed data integrated with new electronic media, pictures and animated visualisations, demonstration objects, and the general possibility to involve their clientele in interactive processes such as the participation of school classes in an internet-based observation project.

In summary, users from the environmental sector want access to a completely digitised and databased inventory of biological object collections as well as survey data. Data should be unit-level if possible, though information describing collection contents and pointing the way to additional sources is highly appreciated where unit-level data are not available. The presentation of search results should provide users with choices regarding the level of detail they want to review, should offer means to flexibly switch between different visualisation modes, and should enable users to download datasets into their own files and applications.

All three categories of users from the environmental sector have in common that - when querying a database - they primarily focus on the geo-ecological, the temporal, and the taxonomic domain. Key questions users from the environmental sector want to pose to a database address the "where", "when", "what", "who", and "how many" of observations and collection objects. Discussions during the user workshops showed that access to information by a defined location is often top priority. The primary access criteria for users from the environmental domain querying a collection database are geo-ecological and temporal, while taxonomy is often used as a kind of "connector" to add organism-related information.

Users from education and entertainment

The "general public" forms a very heterogeneous group. It combines pupils and teachers from schools and other educational institutions (high schools, universities, adult education), visitors of museums, zoological and botanical gardens, "interested laypersons" and hobby researchers, but also customers of commercial services. In collections, they expect to find recreation value (a botanical garden to jog in, the exhibitions of a natural history museum to look at), demonstration objects (especially in education, using living organisms as well as preserved specimens), or a selection of goods for sale (nurseries or seed catalogues). In education, collections largely serve to provide motivation and visualise the subject. Projects stimulating observations, e.g. taking note of the first song of a blackbird heard in spring, or the first acorn found, also lead to a closer scrutiny and increased use of observational data already available - see, for example, the project Nature Detectives ("Naturdetektive", Freiberg 1999) of the German Clearing House Mechanism. Easy access to quantities of high quality (microscopic) images, video clips and audiovisual data can also help the student to become familiar with basic structures and biological processes (Schumann 1998). It should be added that "Biology with its combination of different concepts, different kinds of information, and 'attractive' pictorial information provides an ideal experimental platform to develop new learning tools" (Schalk & Los 1998).

For a number of users, the key information is access information: the visiting address of a zoological garden, the address of where to send the order or request a catalogue, or where to find a contact person for a specific question. Descriptive information on collection holdings is equally important, for example commercial catalogues of rose plants. Media sources (pictures, sounds) are also in high demand, preferably being accessible via the Internet.

Access to biological collection information

In this section, we will detail the main criteria by which users want to search a database, and the manner in which they would like to pose their query.

Requests by geo-ecological criteria

With some exceptions, specimens and observations in biological collections carry information as to their provenance, i.e. where the unit was originally collected or observed. Spatial search criteria cover geographical aspects ("Mediterranean coast"; "the Pyrenees"), political entities ("Italy"; "Burgundy", "Sevilla") and ecological designations ("moorlands", "deciduous woods").

Access tools for a database query are interactive maps and lists of location names, combined with free-text searches. Map access in particular is most attractive. The possibilities to select an area by "clicking" or circling it on the computer screen, by selecting co-ordinates or positions in grids or polygons, by entering point data and specifying a radius, by zooming in, and by selecting from a subset after entering habitat specifications ("heather") are intuitive as well as effective ways of searching by geo-ecological criteria. Supplementary lists of location and habitat names (detailing, e.g., country, province, state, floristic regions, nature reserves, parks, aquatic and terrestrial habitats) allow searches in cases where exact locations are less obvious on a map. They offer a pre-selection to choose from, and ideally also visualise the corresponding locations on a map. An additional free-text search provides a means for less foreseeable queries and more detailed specifications, also comparing them to the thesaurus data used for the lists to suggest or use cross-links where possible.
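One of the access modes named above, entering point data and specifying a radius (optionally narrowed by a habitat specification), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` structure, its field names and the sample records are hypothetical and not part of any BioCISE system, and distances are computed with the standard haversine formula on a spherical Earth.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


@dataclass
class Record:
    """Hypothetical unit-level record; the field names are assumptions."""
    taxon: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees
    habitat: str


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(records: List[Record], lat: float, lon: float,
                  radius_km: float, habitat: Optional[str] = None) -> List[Record]:
    """Records no farther than radius_km from (lat, lon), optionally of one habitat."""
    return [
        r for r in records
        if haversine_km(lat, lon, r.lat, r.lon) <= radius_km
        and (habitat is None or r.habitat == habitat)
    ]


# Illustrative sample data only (invented for this sketch).
RECORDS = [
    Record("Calluna vulgaris", 52.39, 13.06, "heather"),
    Record("Quercus robur", 52.52, 13.405, "deciduous woods"),
    Record("Erica tetralix", 48.137, 11.575, "moorlands"),
]

hits = within_radius(RECORDS, 52.52, 13.405, radius_km=50.0)
print([r.taxon for r in hits])
```

A production service would more likely rely on a spatial index or a GIS backend for polygon and grid selections; the linear scan here is chosen only to keep the sketch self-contained.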

Access to collection information by means of geo-ecological criteria is clearly most prevalent for the users from the environmental sector. Systematists working on the flora or fauna of a specific region, or investigating the geographical distribution or ecological range of a species also use it. Educational and general public use is often focused on the local resources. For biotechnology, locality data are only important in the context of gaining access to materials (e.g. natural substances), where gathering the material in the wild is the only means to obtain it.

Requests for a defined time period or time sequence

Temporal criteria fall into roughly four categories: "the present", "the (near) past", historical, and pre-historical. Questions about the present ask for up-to-date records, mainly of observation data, and possibly of vouchers: What is there to be had at present? What is the present state of knowledge? The past, further subdivided into the "near" past or short-term (up to about 20 years), and long-term (~ 100 years), is mainly addressed for retrieving time series: What observations have been made, what specimens collected over the past 20 / the last 100 years? Queries along the historical timescale ask, e.g., for herbarium specimens or natural history collection objects collected within a given period, while pre-historic criteria concern material of (sub-) fossil origin.

Correspondingly, specifying a time period is the most common approach. A user may ask what records are available from 1980 to 1990. Other approaches include asking for an exact date, specifying an end or a start date ("give me all entries before May, 1920" or, in its extreme form, "show me the first / the last record ever made"), and requesting a seasonal time series ("all records ever made in June"). Time-related questions therefore ask for a much higher degree of "tolerance" on the part of the user interface than might be expected. The interface must be able to handle specific dates and periods defined by specific start and end dates, but also be able to process requests just stating a month for seasonal series, or allow searches for "the most recent" entry without any date specified at all.
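The "tolerance" asked of the interface can be illustrated with a small filter in which every temporal constraint is optional. The following Python sketch is an assumption about how such a filter might look, not a description of an existing system; the function names and sample dates are invented for illustration.

```python
from datetime import date
from typing import Iterable, List, Optional


def filter_by_time(dates: Iterable[date],
                   start: Optional[date] = None,
                   end: Optional[date] = None,
                   month: Optional[int] = None) -> List[date]:
    """Keep dates with start <= d <= end; either bound may be omitted.

    If month is given (1-12), only dates in that calendar month are kept,
    whatever the year, which yields a seasonal series.
    """
    kept = []
    for d in dates:
        if start is not None and d < start:
            continue
        if end is not None and d > end:
            continue
        if month is not None and d.month != month:
            continue
        kept.append(d)
    return kept


def most_recent(dates: Iterable[date]) -> Optional[date]:
    """The latest date, or None if there are no records at all."""
    return max(dates, default=None)


# Illustrative collection dates only (invented for this sketch).
DATES = [date(1919, 6, 3), date(1985, 6, 14), date(1987, 9, 2), date(1995, 6, 30)]

print(filter_by_time(DATES, start=date(1980, 1, 1), end=date(1990, 12, 31)))  # a period
print(filter_by_time(DATES, end=date(1920, 4, 30)))  # "all entries before May, 1920"
print(filter_by_time(DATES, month=6))                # "all records ever made in June"
print(most_recent(DATES))                            # "the last record ever made"
```

Real collection data often carry only a year or a month on the label; handling such partial dates would need a richer date type than `datetime.date` and is left out here.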

Requests for a given taxon

Taxon names are often regarded as some kind of "handle" used to retrieve organism related information - the taxonomic details being of secondary importance. Queries are posed either by the scientific name of an organism, or by its trivial name. Preferred ways of access are selecting a taxon name from a menu, browsing through a hierarchy, or entering a name in a free-text search. Most common queries are for names on hierarchical levels from genus level downwards (genus, species, subspecies), although questions for higher ranks (e.g. family) are not unusual. For users in contexts like education or environmental campaigning, enhancing the hierarchical and list selection search by offering pictures of representative members for the selected group would further contribute to a useful search interface.

In general, entering a search term in any of the three specified ways (list selection, hierarchical browsing, free-text search) should hide taxonomic details from the user as much as possible. The main objective is "to use the taxonomic framework to establish a connection with the available knowledge accurately accessed under the 'correct' species. This means that the users in applied biology prefer the taxonomic indexing to be completely invisible and yet perfect in its functioning, comprising automated synonymic indexing as well as precise handling of the ambiguities caused by homonyms, misapplied names and pro-parte synonyms" (Bisby 1998). In contrast, a user from systematics will be interested in details such as the identification history of a specific specimen.
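The idea of an "invisible" taxonomic index can be sketched as a lookup that maps whatever name the user enters, or whatever name stands on a specimen label, to one accepted name before matching. The Python below is a minimal sketch under stated assumptions: the `ACCEPTED_NAME` table, the specimen identifiers and the function names are hypothetical, and the one-to-one mapping deliberately ignores homonyms, misapplied names and pro-parte synonyms, which would need a one-to-many index and some form of disambiguation.

```python
from typing import List, Optional

# Hypothetical synonym index: normalised name -> accepted scientific name.
ACCEPTED_NAME = {
    "solanum lycopersicum": "Solanum lycopersicum",
    "lycopersicon esculentum": "Solanum lycopersicum",  # synonym
    "tomato": "Solanum lycopersicum",                   # trivial name
}

# Hypothetical specimen list: (identifier, name as written on the label).
SPECIMENS = [
    ("specimen-1", "Lycopersicon esculentum"),
    ("specimen-2", "Solanum lycopersicum"),
    ("specimen-3", "Quercus robur"),
]


def resolve(name: str) -> Optional[str]:
    """Accepted name for a scientific or trivial name, or None if unknown."""
    normalised = " ".join(name.split()).lower()  # collapse whitespace, ignore case
    return ACCEPTED_NAME.get(normalised)


def find_specimens(query: str) -> List[str]:
    """Identifiers of all specimens whose label resolves to the same accepted name."""
    accepted = resolve(query)
    if accepted is None:
        return []
    return [sid for sid, label in SPECIMENS if resolve(label) == accepted]


print(find_specimens("Tomato"))
```

A user from applied biology only sees that a query for the trivial name returns specimens filed under both the older and the current scientific name; a systematist could be shown the label name and its resolution on request.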

Additional search criteria

In addition to these three main access criteria in searches on biological databases, there are some others deserving mention. First of all, the name of the collector or observer appears as a search criterion: What objects are there collected by xyz? Where has xyz collected, when have expeditions taken place? Incomplete specimen data, especially in historic specimens, can often be reconstructed by such questions. Systematists looking for type material often request a specific specimen, identified by collector and collection number. A second regular type of question addresses organism characteristics: What data on host-parasite relationships are available? What observations / specimens are there of flowering stages? of larval stages? However, information requests may also refer to a specific institution: What does it have, is it worth a visit? The location of the institution may also be important - what natural history collections can I find in Edinburgh?

Combined search criteria

In a real-world information system working with large data resources, users normally pose their questions as a combination of criteria. Typical combined requests in collection database searches are:

 

Figure 3: Access to collection information in the environmental sector

Desired output

The following sections will specify the users' expectations as to what the database should deliver, and in which form. What kind of information do they want to retrieve (content) and in which form should this information be delivered (presentation)? Once more, the account somewhat focuses on users from the environmental sector, as these seem to have the most complex requirements.

Data content

Answers to the question of "what kind of data do users want to get" are as diverse as the user groups, and differ with the motivation the user had to pose the query in the first place. They may best be typified by breaking them down from the most general to the most specific:

In the previous two points, taxon-related information may be of a general nature, stopping at genus level or summarizing on the rank of order, family or even higher taxa ("plants, animals, and micro-organisms"). In this case, though, at least an indication of the associated species numbers is requested as a measure of biodiversity. In the following categories, the focus changes to species level or below (e.g. population level in research on genetic resources).

In contrast to the previous categories and like the following one, the request for qualitative characteristics is expecting specimen information rather than the summaries on species level or above, which are associated with the more general requests.

Data quality

Three points top the list of users' wishes regarding the data they receive - up-to-date, high quality, complete. These demands appear as justified as they seem obvious. In environmental research, a constantly updated base of data, especially of observations, is most desirable, provided the data are also reliable. "Accurate with respect to geocode validation and taxonomical identification; complete on a variety of temporal and spatial scales" (Alkin 1998), such data could form a most useful base for further research as well as in more administrative functions.

Aspects of data reliability and "correctness" include a demand for markers of data quality and a proper treatment of possible inconsistencies arising from the merging of different datasets. Users want access to all information they need to assess the quality of a given dataset, while on the other hand they do not wish to be swamped with details. A database system should offer the possibility to trace the history of a record, including information on the motive of the gathering (scientific research, environmental impact assessment, hobby collection), the collector's name and field number, the identifier's name, and information on procedures of data evaluation and revision. All of these make it easier for the user to judge the reliability of the data (Diederich et al. 1998). In cases of merged datasets from different sources, inconsistencies or differences between the datasets must be detected and resolved (Alkin 1998), lest the resulting dataset lose data integrity and be far less useful. Should discrepancies arising in the process be irresolvable, the least demand is for a flagging of suspicious data to make the user aware of the fact.
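The minimum demand, flagging discrepancies that cannot be resolved when datasets are merged, might be sketched as follows in Python. All names, record identifiers and values are hypothetical. The design choice made here is an assumption: a conflicting field is left empty in the merged record and both source values are kept in a `conflicts` entry, so that nothing is silently chosen on the user's behalf.

```python
from typing import Any, Dict, Iterable

Dataset = Dict[str, Dict[str, Any]]  # record identifier -> field values


def merge_with_flags(first: Dataset, second: Dataset,
                     fields: Iterable[str]) -> Dataset:
    """Merge two datasets record by record and flag contradictory fields."""
    fields = list(fields)
    merged: Dataset = {}
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id, {})
        b = second.get(record_id, {})
        record: Dict[str, Any] = {}
        conflicts: Dict[str, Any] = {}
        for field in fields:
            va, vb = a.get(field), b.get(field)
            if va is not None and vb is not None and va != vb:
                conflicts[field] = (va, vb)  # keep both values for inspection
                record[field] = None         # unresolved: do not guess
            else:
                record[field] = va if va is not None else vb
        record["conflicts"] = conflicts
        merged[record_id] = record
    return merged


# Illustrative data only (invented for this sketch).
FIRST = {
    "u1": {"taxon": "Quercus robur", "year": 1920, "identifier": None},
    "u2": {"taxon": "Calluna vulgaris", "year": 1985},
}
SECOND = {
    "u1": {"taxon": "Quercus robur", "year": 1921, "identifier": "A. Example"},
    "u3": {"taxon": "Fagus sylvatica", "year": 1990},
}

merged = merge_with_flags(FIRST, SECOND, ["taxon", "year", "identifier"])
print(merged["u1"])
```

In this example the two sources disagree on the year of record `u1`, so the merged record carries no year and lists both candidates under `conflicts`, while the identifier missing from the first source is filled in from the second.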

User interface

The wish list. The query interface should be as interactive and graphic as possible. The output should on the one hand be as detailed as possible, consist of primary data and be supplied with an additional measure of quality. On the other hand, the presentation of results should provide users with a choice regarding the level of detail. A choice between visualisation modes should be available, additional processing tools should be integrated, and the export of datasets in different formats should be available for local use.

Users ask for a multi-faceted user interface, allowing selection between several possibilities of query definition, which must include list selection, hierarchical browsing, and free-text search. Additional visual help in the process of query formulation, like map visualisation or pictures of organisms, is highly welcome or even, as in geographical contexts, a prerequisite.

Three priorities rule the users' wish list for data delivery: direct, fast, and free of charge. Results from a query should be delivered by the queried access system in a compiled form, without compelling the user to collect result packages from different databases. Ideally, the information should be delivered electronically (via the Internet), promptly after a query has been sent, and the service should be free of charge.

Where appropriate, the presentation of search results should have ample support through visualisations besides the supply of original data. Users ask for a highly interactive user interface, providing a choice of presentation modes. They should be able to switch between details and data quality information, summarise large datasets (e.g. show indicative numbers of species for higher taxa in map visualisations while still being able to access the individual datasets on selection), and superimpose self-defined filters on a pre-selection (e.g., focus on ever narrowing time-frames). Additional wishes include statistical tools and data processing such as used in predictive mapping.

Initially, users want to be presented with a list or a map showing the distribution of collections and observations retrieved from a specific query. Since Geographic Information Systems (GIS) provide tools for the presentation of digital information in the form of maps on which layers of additional information may be superimposed, GIS applications are presented as examples for correlating collection and observation data with environmental variables. Many users look for uncomplicated ways to import query results (lists or maps) into their own spreadsheet files or databases to combine them with their own GIS application.
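Two of the wishes above, summarising a large result set as indicative species numbers per higher taxon and exporting the result for use in a spreadsheet, can be sketched together. The Python below is an illustration only: the input format (pairs of family and species name), the function names and the sample data are assumptions, and CSV is chosen simply because most spreadsheet and GIS tools can import it.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def species_counts(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct species per family from (family, species) pairs."""
    species_by_family = defaultdict(set)
    for family, species in records:
        species_by_family[family].add(species)
    return {family: len(names) for family, names in sorted(species_by_family.items())}


def to_csv(counts: Dict[str, int]) -> str:
    """Render the summary as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["family", "species_count"])
    for family, count in counts.items():
        writer.writerow([family, count])
    return buffer.getvalue()


# Illustrative query result only; repeated units of one species count once.
RESULT = [
    ("Fagaceae", "Quercus robur"),
    ("Fagaceae", "Fagus sylvatica"),
    ("Fagaceae", "Quercus robur"),
    ("Ericaceae", "Calluna vulgaris"),
]

print(to_csv(species_counts(RESULT)))
```

Keeping the summary separate from the unit-level records means that the individual datasets remain available when the user selects a family for closer inspection.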

The requirements of users regarding query interface and result presentation are often contradictory - intuitive simplicity with a huge range of functionality. Design and implementation of a functional user interface, therefore, will pose a major challenge. The interface will have to offer different levels of access (e.g. simple and expert search) always trying to morph into a self-explanatory interface that is characterised by minimal usage of technical language.


© BioCISE Secretariat. Email: biocise@, FAX: +49 (30) 841729-55
Address: Botanischer Garten und Botanisches Museum Berlin-Dahlem (BGBM), Freie Universität Berlin, Königin-Luise-Str. 6-8, D-14195 Berlin, Germany