Biodiversitätsinformatik / Biodiversity Informatics
Choosing a formal language for the
R1. PT1 and PT2 are congruent
R2. PT1 is included in PT2
R3. PT1 includes PT2
R4. PT1 and PT2 overlap each other
R5. PT1 and PT2 exclude each other
The relationships among several Potential Taxa thus form a directed (oriented) graph, whose nodes are the Potential Taxa and whose edges are those pairs of Potential Taxa to which the expert(s) assigned a set relationship.
Suppose that there is a pool of connected Potential Taxa from different sources. Two kinds of queries about factual information are of interest to the user:
To which taxa (actually: Potential Taxa) does certain factual information apply?
Which factual information applies to certain taxon names (actually: Potential Taxa)?
The result should not depend on which Potential Taxon the factual information was originally linked to. For this purpose a rule system based on the Potential Taxon graph has to be developed. Moreover, it should be possible to formulate flexible rules that restrict the result (e.g. depending on an assessment of the expertise of the sources or of the authors of relationships). Users can then be notified about qualitative aspects of the linkage between the transmitted facts and the Potential Taxon with which they started their query.
There are four categories for the applicability of factual information with respect to "their" Potential Taxon:
1) fully applicable, if the factual information applies to every element of the taxon,
2) partially applicable, if the factual information applies only to a subset of elements of the taxon,
3) doubtful applicable, if the factual information may apply to some elements of the taxon, and
4) not applicable, if the factual information does not apply to any element of the taxon.
Suppose that some factual information is fully applicable to the Potential Taxon PT1. Taking into account the graph with its relationships, there are at least three options for the quality of the factual information when it is transmitted to the Potential Taxon PT2:
fully applicable, if PT1 ≡ PT2 or PT1 ⊃ PT2
partially applicable, if PT1 ⊂ PT2 or PT1 ⊕ PT2
not applicable, if PT1 ! PT2
As shown, the quality of the factual information applying to PT2 depends on both the quality of the same factual information when applying to PT1 and on the set relationship between both of them.
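The three cases above can be written down as a small lookup. The following is only a minimal sketch in Visual Basic, assuming that a single basic relationship between PT1 and PT2 is known; the names BasicRelation and TransmitFull are hypothetical and do not belong to the original rule set.

```vb
' Hypothetical illustration: BasicRelation and TransmitFull are assumed names.
Public Enum BasicRelation
    brCongruent      ' R1: PT1 and PT2 are congruent
    brIsIncludedIn   ' R2: PT1 is included in PT2
    brIncludes       ' R3: PT1 includes PT2
    brOverlaps       ' R4: PT1 and PT2 overlap each other
    brExcludes       ' R5: PT1 and PT2 exclude each other
End Enum

' Applicability at PT2 of a fact that is fully applicable to PT1,
' given the single set relationship read from PT1 to PT2.
Public Function TransmitFull(ByVal Rel As BasicRelation) As String
    Select Case Rel
        Case brCongruent, brIncludes
            TransmitFull = "fully applicable"
        Case brIsIncludedIn, brOverlaps
            TransmitFull = "partially applicable"
        Case brExcludes
            TransmitFull = "not applicable"
        Case Else
            TransmitFull = "unknown relationship"
    End Select
End Function
```

For instance, TransmitFull(brIsIncludedIn) returns "partially applicable", because PT2 may contain elements that lie outside PT1.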
In the graph it is evident that an edge does not exist for every pair of Potential Taxa, although a path (a sequence of edges) between them might exist. In our example this is the case for PTi and PTk. Therefore there must be a rule that calculates the resulting set relationship when two contiguous edges with their respective set relationships are concatenated. If, for example, Bij and Bjk are both the set relationship "⊂", then it is easy to see that the resulting relationship between PTi and PTk is also "⊂", and hence factual information that is fully applicable to PTi is only partially applicable to PTk. Now assume that Bij remains "⊂" but that Bjk is "⊕". Then the resulting relationship between PTi and PTk is no longer unique: it could be "⊂", "⊕" or even "!". This forces the introduction of "combined" relationships, i.e. sets of possible basic relationships, and a corresponding extension of the rule. With this extended rule it is then possible to associate a unique "combined" relationship with each path in the graph.
In fact, two Potential Taxa can be connected in the graph through several paths. This is the case for PTi and PTl, because there is a "direct" path (the corresponding edge) and also an "indirect" path via PTj. A "combined" relationship is associated with each path. Additional rules must therefore specify how the system derives a single resulting "combined" relationship from two such "combined" relationships. At least two alternative rules are possible here.
For each oriented relationship between two potential taxa PT1 and PT2 there exists a reverse oriented relationship between PT2 and PT1, which can be likewise defined by an appropriate rule. This results in altogether at least four different rules.
The quality of factual information when transmitted from an "original" PTo to a "target" PTt thus depends (i) on the Potential Taxon graph, or more precisely on all paths from PTo to PTt and on the oriented relationships that are assigned to the edges included in these paths and (ii) on the applicability of the factual data. Computing the quality of transmitted factual information is therefore based on:
algorithms that find all paths from PTo to PTt in an oriented graph
rules that assign to each path a relationship on the basis of the relationships corresponding to the included edges and which then assign a unique final relationship to the pair (PTo, PTt) based on all paths from PTo to PTt. This last relationship is used to compute the quality of the transmitted factual information.
a rule that combines the resulting relationship with the applicability of the factual information to arrive at a relevant result.
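The first ingredient, finding all paths from PTo to PTt, can be sketched as a depth-first search. This is a minimal illustration and not the project's implementation. It assumes that the Potential Taxa are numbered 1 to NodeCount and that Adj(i, j) is True when a relationship was assigned to the ordered pair (PTi, PTj). All names (Adj, NodeCount, Visited, Paths, FindAllPaths, Walk, DemoPaths) are hypothetical, and the module-level declarations must stand at the top of a standard module.

```vb
' Hypothetical sketch: all names below are assumptions for illustration.
Private Adj() As Boolean        ' Adj(i, j) = True: edge from PTi to PTj
Private NodeCount As Long
Private Visited() As Boolean    ' nodes on the path currently being explored
Private Paths As Collection     ' each path stored as text, e.g. "1,2,3"

Public Function FindAllPaths(ByVal FromNode As Long, ByVal ToNode As Long) As Collection
    Set Paths = New Collection
    ReDim Visited(1 To NodeCount)
    Walk FromNode, ToNode, CStr(FromNode)
    Set FindAllPaths = Paths
End Function

' Depth-first search that never visits a node twice on the same path.
Private Sub Walk(ByVal Node As Long, ByVal Target As Long, ByVal Trail As String)
    Dim NextNode As Long
    If Node = Target Then
        Paths.Add Trail
        Exit Sub
    End If
    Visited(Node) = True
    For NextNode = 1 To NodeCount
        If Adj(Node, NextNode) And Not Visited(NextNode) Then
            Walk NextNode, Target, Trail & "," & CStr(NextNode)
        End If
    Next NextNode
    Visited(Node) = False       ' free the node for other paths
End Sub

Public Sub DemoPaths()
    Dim P As Variant
    ' 1 = PTi, 2 = PTj, 3 = PTl: a direct edge and an indirect path via PTj
    NodeCount = 3
    ReDim Adj(1 To 3, 1 To 3)
    Adj(1, 2) = True
    Adj(2, 3) = True
    Adj(1, 3) = True
    For Each P In FindAllPaths(1, 3)
        Debug.Print P           ' lists "1,2,3" and then "1,3"
    Next P
End Sub
```

Because every oriented relationship has a reverse, a fuller version would probably also enter each edge in the opposite direction, with the reversed relationship attached to it.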
For the formal description of such a graph as well as for the algorithms and rules, any high-level programming language can be used. These rules do not need to be edited, since they do not depend on the specific contents of the included data. As an example we used Visual Basic to define a "relationship data type" as well as the above-mentioned rules.
Definition of a datatype for "combined relationship"-objects:
' One flag per basic relationship that is still possible, plus a doubt marker
Public Type Relationship
    Congruent_to As Boolean
    Is_included_in As Boolean
    Includes As Boolean
    Overlaps As Boolean
    Excludes As Boolean
    Doubtful As Boolean
End Type
Reversal rule for "combined relationships":
Public Function reverse(Rel1 As Relationship) As Relationship
    ' Congruence, overlap and exclusion are symmetric; only inclusion swaps
    reverse = Rel1
    reverse.Is_included_in = Rel1.Includes
    reverse.Includes = Rel1.Is_included_in
End Function
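A short usage sketch of the reversal rule follows. It assumes that the Relationship type and the reverse function of this document are in the same standard module; DemoReverse is a hypothetical name.

```vb
' Hypothetical demo; relies on Relationship and reverse from this document.
Public Sub DemoReverse()
    Dim R As Relationship
    Dim Back As Relationship

    R.Is_included_in = True     ' PT1 is included in PT2
    Back = reverse(R)           ' the same edge read from PT2 to PT1

    ' Back.Includes is now True and Back.Is_included_in is False:
    ' PT2 includes PT1.
    Debug.Print Back.Includes, Back.Is_included_in
End Sub
```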
Unification rule for two "combined relationships" (strong agreement - intersection):
Public Function cons(Rel1 As Relationship, Rel2 As Relationship) As Relationship
    If Rel1.Doubtful = Rel2.Doubtful Then
        ' Equal certainty: keep only the relationships on which both agree
        cons.Congruent_to = Rel1.Congruent_to And Rel2.Congruent_to
        cons.Is_included_in = Rel1.Is_included_in And Rel2.Is_included_in
        cons.Includes = Rel1.Includes And Rel2.Includes
        cons.Overlaps = Rel1.Overlaps And Rel2.Overlaps
        cons.Excludes = Rel1.Excludes And Rel2.Excludes
        cons.Doubtful = Rel1.Doubtful
    ElseIf Rel1.Doubtful = False Then
        ' Only Rel2 is doubtful: the certain relationship Rel1 is taken over
        cons = Rel1
    Else
        ' Only Rel1 is doubtful: the certain relationship Rel2 is taken over
        cons = Rel2
    End If
End Function
Unification rule for two "combined relationships" (weak agreement - union):
Public Function large_cons(Rel1 As Relationship, Rel2 As Relationship) As Relationship
    large_cons.Congruent_to = Rel1.Congruent_to Or Rel2.Congruent_to
    large_cons.Is_included_in = Rel1.Is_included_in Or Rel2.Is_included_in
    large_cons.Includes = Rel1.Includes Or Rel2.Includes
    large_cons.Overlaps = Rel1.Overlaps Or Rel2.Overlaps
    large_cons.Excludes = Rel1.Excludes Or Rel2.Excludes
    large_cons.Doubtful = Rel1.Doubtful Or Rel2.Doubtful
End Function
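The difference between the two unification rules can be seen when two paths between the same pair of Potential Taxa yield different "combined" relationships. The sketch below assumes that the Relationship type, cons and large_cons are in the same standard module; DemoUnification and its variable names are hypothetical.

```vb
' Hypothetical demo; relies on Relationship, cons and large_cons above.
Public Sub DemoUnification()
    Dim Direct As Relationship      ' result of the direct path
    Dim Indirect As Relationship    ' result of the indirect path
    Dim Strong As Relationship
    Dim Weak As Relationship

    Direct.Is_included_in = True

    Indirect.Is_included_in = True
    Indirect.Overlaps = True
    Indirect.Excludes = True

    Strong = cons(Direct, Indirect)       ' intersection: only Is_included_in
    Weak = large_cons(Direct, Indirect)   ' union: Is_included_in, Overlaps, Excludes

    Debug.Print Strong.Is_included_in, Strong.Overlaps, Strong.Excludes
    Debug.Print Weak.Is_included_in, Weak.Overlaps, Weak.Excludes
End Sub
```

If the two paths shared no basic relationship at all, the intersection would leave every flag False, which the interpretation rule reports as a contradiction.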
Concatenation rule for two contiguous "combined relationships":
Public Function concatenate(Rel1 As Relationship, Rel2 As Relationship) As Relationship
    Dim RelNull As Relationship
    Dim RelFull As Relationship
    Dim TempRelResult As Relationship

    ' RelNull: no basic relationship is possible
    RelNull.Congruent_to = False
    RelNull.Is_included_in = False
    RelNull.Includes = False
    RelNull.Overlaps = False
    RelNull.Excludes = False
    RelNull.Doubtful = False

    ' RelFull: every basic relationship is possible
    RelFull.Congruent_to = True
    RelFull.Is_included_in = True
    RelFull.Includes = True
    RelFull.Overlaps = True
    RelFull.Excludes = True
    RelFull.Doubtful = False

    concatenate = RelNull
    TempRelResult = RelNull

    ' Congruence on one edge passes the other relationship through unchanged
    If Rel1.Congruent_to Then
        concatenate = Rel2
    End If
    If Rel2.Congruent_to Then
        TempRelResult = Rel1
        concatenate = large_cons(concatenate, TempRelResult)
        TempRelResult = RelNull
    End If

    ' First edge: PTi is included in PTj
    If Rel1.Is_included_in Then
        If Rel2.Is_included_in Then
            TempRelResult.Is_included_in = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Includes Then
            TempRelResult = RelFull
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Overlaps Then
            TempRelResult.Is_included_in = True
            TempRelResult.Overlaps = True
            TempRelResult.Excludes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Excludes Then
            TempRelResult.Excludes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
    End If

    ' First edge: PTi includes PTj
    If Rel1.Includes Then
        If Rel2.Is_included_in Then
            TempRelResult.Congruent_to = True
            TempRelResult.Is_included_in = True
            TempRelResult.Includes = True
            TempRelResult.Overlaps = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Includes Then
            TempRelResult.Includes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Overlaps Then
            TempRelResult.Includes = True
            TempRelResult.Overlaps = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Excludes Then
            TempRelResult.Includes = True
            TempRelResult.Overlaps = True
            TempRelResult.Excludes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
    End If

    ' First edge: PTi and PTj overlap
    If Rel1.Overlaps Then
        If Rel2.Is_included_in Then
            TempRelResult.Is_included_in = True
            TempRelResult.Overlaps = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Includes Then
            TempRelResult.Includes = True
            TempRelResult.Overlaps = True
            TempRelResult.Excludes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Overlaps Then
            TempRelResult = RelFull
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Excludes Then
            TempRelResult.Includes = True
            TempRelResult.Overlaps = True
            TempRelResult.Excludes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
    End If

    ' First edge: PTi and PTj exclude each other
    If Rel1.Excludes Then
        If Rel2.Is_included_in Then
            TempRelResult.Is_included_in = True
            TempRelResult.Overlaps = True
            TempRelResult.Excludes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Includes Then
            TempRelResult.Excludes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Overlaps Then
            TempRelResult.Is_included_in = True
            TempRelResult.Overlaps = True
            TempRelResult.Excludes = True
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
        If Rel2.Excludes Then
            TempRelResult = RelFull
            concatenate = large_cons(concatenate, TempRelResult)
            TempRelResult = RelNull
        End If
    End If

    ' A path is doubtful as soon as one of its edges is doubtful
    concatenate.Doubtful = Rel1.Doubtful Or Rel2.Doubtful
End Function
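The two cases discussed in the text (Bij and Bjk both "is included in", then Bjk changed to "overlaps") can be replayed with the concatenation rule. This is a usage sketch that assumes the Relationship type, large_cons and concatenate are in the same standard module; DemoConcatenate is a hypothetical name.

```vb
' Hypothetical demo; relies on Relationship, large_cons and concatenate above.
Public Sub DemoConcatenate()
    Dim Bij As Relationship
    Dim Bjk As Relationship
    Dim Bik As Relationship

    ' Case 1: PTi is included in PTj, PTj is included in PTk
    Bij.Is_included_in = True
    Bjk.Is_included_in = True
    Bik = concatenate(Bij, Bjk)
    ' Only Bik.Is_included_in is True: the result is unique
    Debug.Print Bik.Is_included_in, Bik.Overlaps, Bik.Excludes

    ' Case 2: PTi is included in PTj, PTj overlaps PTk
    Bjk.Is_included_in = False
    Bjk.Overlaps = True
    Bik = concatenate(Bij, Bjk)
    ' Is_included_in, Overlaps and Excludes are all True: three candidates
    Debug.Print Bik.Is_included_in, Bik.Overlaps, Bik.Excludes
End Sub
```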
Interpretation rule for "combined relationships":
Public Function evaluate(Category As String, Rel1 As Relationship) As String
    If (Not Rel1.Congruent_to) And (Not Rel1.Is_included_in) And (Not Rel1.Includes) _
            And (Not Rel1.Overlaps) And (Not Rel1.Excludes) Then
        ' No basic relationship is left: the paths contradict each other
        evaluate = " Contradiction !"
    ElseIf Category = " fully applicable " Then
        If (Not Rel1.Doubtful) Then
            If (Not Rel1.Excludes) Then
                If (Not Rel1.Is_included_in) And (Not Rel1.Overlaps) Then
                    evaluate = " fully applicable !"
                Else
                    evaluate = " partially applicable !"
                End If
            Else
                If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
                    evaluate = " doubtful applicable !"
                Else
                    evaluate = " not applicable !"
                End If
            End If
        Else
            If (Not Rel1.Excludes) Then
                If (Not Rel1.Is_included_in) And (Not Rel1.Overlaps) Then
                    evaluate = " fully applicable ?"
                Else
                    evaluate = " partially applicable ?"
                End If
            Else
                If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
                    evaluate = " doubtful applicable ?"
                Else
                    evaluate = " not applicable ?"
                End If
            End If
        End If
    ElseIf Category = " partially applicable " Then
        If (Not Rel1.Doubtful) Then
            If (Not Rel1.Excludes) Then
                If (Not Rel1.Includes) And (Not Rel1.Overlaps) Then
                    evaluate = " partially applicable !"
                Else
                    evaluate = " doubtful applicable !"
                End If
            Else
                If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
                    evaluate = " doubtful applicable !"
                Else
                    evaluate = " not applicable !"
                End If
            End If
        Else
            If (Not Rel1.Excludes) Then
                If (Not Rel1.Includes) And (Not Rel1.Overlaps) Then
                    evaluate = " partially applicable ?"
                Else
                    evaluate = " doubtful applicable ?"
                End If
            Else
                If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
                    evaluate = " doubtful applicable ?"
                Else
                    evaluate = " not applicable ?"
                End If
            End If
        End If
    Else
        ' All remaining categories
        If (Not Rel1.Doubtful) Then
            If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
                evaluate = " doubtful applicable !"
            Else
                evaluate = " not applicable !"
            End If
        Else
            If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
                evaluate = " doubtful applicable ?"
            Else
                evaluate = " not applicable ?"
            End If
        End If
    End If
End Function
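Concatenation and interpretation can be chained to answer a query along one path. The sketch below assumes that all listings of this document are in the same standard module; DemoEvaluate is a hypothetical name. Note that the category string has to match the literal compared inside evaluate exactly, including the spaces as they appear in the listing.

```vb
' Hypothetical demo; relies on Relationship, concatenate and evaluate above.
Public Sub DemoEvaluate()
    Dim Bij As Relationship
    Dim Bjk As Relationship
    Dim Bik As Relationship

    Bij.Is_included_in = True       ' PTi is included in PTj
    Bjk.Overlaps = True             ' PTj overlaps PTk
    Bik = concatenate(Bij, Bjk)     ' included in, overlaps or excludes

    ' Exclusion is possible but not certain, so a fact that is fully
    ' applicable to PTi falls into the "doubtful applicable" branch for PTk.
    Debug.Print evaluate(" fully applicable ", Bik)
End Sub
```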
Sometimes, depending on specific characteristics of the data or data sources under consideration, new rules must be added and/or existing ones adjusted, e.g. to:
include or exclude certain data sources which are available in the system
give preferential treatment to certain data sources for data output
weight edges depending on their source (e.g. giving higher weight to the opinion held by a certain expert for a certain taxonomic group)
define a special treatment for queries that entail some special risk (e.g. medical information or information concerning the protection of species)
Since rules of this kind are not generally foreseeable and since they may refer directly to data contents and metadata of the source, they should not be incorporated in the core rules and analysis algorithms, but should be read and applied at run-time.
So that these rules can be adjusted, they could be formulated in a formal language suited to propositional calculus. This would also facilitate the implementation of a user interface for this purpose. The programming language Prolog fulfils these requirements and we shall use it for the further description of the system. For the implementation, however, other languages can be taken into account. An implementation could also be based on a complex configuration file, from which parameters are passed to the core rules at run-time.
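As one possible illustration of a rule applied at run-time, the sketch below decides whether an edge may be used, based on a list of excluded data sources. It is only an assumption about how such a parameter might look: the function IsSourceAllowed, the semicolon-separated list format and the source names in the example are hypothetical and would in practice come from the configuration or from a rule base.

```vb
' Hypothetical sketch: name, parameters and list format are assumptions.
' Returns False when Source appears in the semicolon-separated ExcludedList.
Public Function IsSourceAllowed(ByVal Source As String, _
                                ByVal ExcludedList As String) As Boolean
    Dim Names() As String
    Dim i As Long

    IsSourceAllowed = True
    If Len(ExcludedList) = 0 Then Exit Function

    Names = Split(ExcludedList, ";")
    For i = LBound(Names) To UBound(Names)
        ' Case-insensitive comparison, ignoring surrounding blanks
        If StrComp(Trim$(Names(i)), Trim$(Source), vbTextCompare) = 0 Then
            IsSourceAllowed = False
            Exit Function
        End If
    Next i
End Function
```

A path search would then skip every edge whose source fails this test, for example IsSourceAllowed("Checklist A", "Checklist A; Flora B") returns False, while the core rules themselves stay unchanged.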
Marc Geoffroy, Anton Güntsch & Walter G. Berendsohn
First version (German only): August 2001
Revised second (German and English) version: June 2002
__________________________________________________________________________
MoreTax (Rule-based association of taxonomic concepts) is a research and development project financed by the Federal Agency for Nature Conservation of the German Ministry of the Environment.
Project co-ordinator: Walter Berendsohn
Project scientist: Marc Geoffroy
This page last updated on 12-11-2002
© Freie Universität Berlin, Botanischer Garten und Botanisches Museum Berlin-Dahlem
Page editor: M. Geoffroy.