Verifying Ontology Naming Conventions & Metadata-completeness in Protégé 4
Motivation
With the advent of the Semantic Web and RDF-based knowledge representation techniques, off-the-shelf ontology editors like Protégé 4 (P4) have gained widespread use. Although its functionality is sufficient for daily ontology editing tasks, some clean-up checks on the generated ontology, e.g. checks that are stored and carried out before each ontology release, could complement P4 in a useful way.
We here introduce a new Protégé tab plugin (OntoCheck) that checks certain properties of the active ontology and allows for improvements in the areas of
a) Metadata completeness, e.g. via cardinality checks on mandatory and obligatory annotation properties
b) Naming conventions, e.g. via lexical analysis and labeling enforcement for representational unit (RU) names and IDs.
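To make the idea of a cardinality check on annotation properties concrete, here is a minimal sketch. It is not OntoCheck's implementation: the ontology is replaced by a hypothetical in-memory dictionary, and the property names, class IDs and cardinality bounds are assumptions chosen for illustration.

```python
# Hypothetical stand-in for an ontology: class ID -> annotation property -> values.
EXAMPLE_ANNOTATIONS = {
    "EX_0001": {"label": ["NMR_instrument"], "definition": ["An instrument used for NMR."]},
    "EX_0002": {"label": ["gene", "Gene"]},
}

# Assumed policy: (minimum, maximum) number of values; None means unbounded.
REQUIRED = {"label": (1, 1), "definition": (1, None)}


def check_metadata(annotations, required):
    """Return (class_id, property, count) for every cardinality violation."""
    violations = []
    for class_id in sorted(annotations):
        props = annotations[class_id]
        for prop, (min_count, max_count) in required.items():
            count = len(props.get(prop, []))
            too_few = count < min_count
            too_many = max_count is not None and count > max_count
            if too_few or too_many:
                violations.append((class_id, prop, count))
    return violations


for violation in check_metadata(EXAMPLE_ANNOTATIONS, REQUIRED):
    print(violation)
```

In this example, EX_0002 is reported twice: once for carrying two labels and once for lacking a definition.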
This work is based on previous efforts in lexical harmonization, i.e. the OBO Foundry Naming Convention proposal.
For a review of ontology naming conventions, please see the paper "Survey-based naming conventions for use in OBO Foundry ontology development".
Download & Installation
Find the OntoCheck plugin for Protégé 4.x for download at: OntoCheck P4 plugin
Place this jar file in your Protégé 4.x plugin folder, then start Protégé and open an OWL ontology. Make the new OntoCheck tab visible by checking it in the Window/Tabs menu.
If the plugin tab window looks distorted, try adjusting the tab's borders so that there is just enough space to show the three panels in the OntoCheck view, e.g. try moving the border of the class hierarchy browser to the left. If this does not help, select "Window/Reset selected Tab to default state" and retry. If this does not work either, or you get a 'Can't load plugin' error, try to deactivate the tab and re-activate it via Window/Views/Class Views, then drop it into the field to the right of the hierarchy pane.
You can store your generated checks as well as their result lists on your hard drive. In previous versions, the stored XML was to be found in the OntoCheckSaves folder within your P4 installation directory, e.g. in: C:\Program Files\Protege_4.1\OntoCheckSaves. For these versions you need to make sure you log into your operating system with write permissions.
Documentation
The plugin is self-explanatory and provides hints when the mouse pointer is placed over an item in question. Find an overview of its features and example applications in the ppt OBML 2011 talk, or in the OBML 2011 paper (page 61).
Here you can also look at a test result table listing checks carried out on six ontologies together with some outcome quantification.
Screenshots
The Check Tab:

Save and re-load of formulated Checks:

The Compare Tab:

The Statistics Tab:

Current Limitations & Desired Feature List
At the moment the user has to amend the labels manually. In the future, RUs violating tests could be corrected automatically (OntoCure), i.e. where possible, simple syntactic corrections could be applied automatically, e.g. correcting case and separator conventions for all found violations.
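The kind of simple syntactic correction described above can be sketched as follows. This is only an illustration, not the planned OntoCure behaviour: the target convention (lower-case tokens joined by underscores, with all-uppercase acronyms preserved) and the function names are assumptions.

```python
import re


def split_tokens(name):
    """Split a name on spaces, underscores, hyphens and CamelCase boundaries."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [token for token in re.split(r"[\s_\-]+", spaced) if token]


def suggest_label(name, separator="_"):
    """Suggest a label in the assumed convention; acronyms such as NMR are kept."""
    tokens = [t if t.isupper() else t.lower() for t in split_tokens(name)]
    return separator.join(tokens)


def suggest_corrections(labels):
    """Map each label that violates the convention to its suggested replacement."""
    return {label: suggest_label(label) for label in labels if suggest_label(label) != label}


print(suggest_corrections(["GeneticInformation", "NMR-Instrument", "is_bearer_of"]))
```

Labels that already follow the assumed convention, such as is_bearer_of, are left out of the result, so the output doubles as a list of violations with proposed fixes.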
Check Tab
A future version should:
- Allow for the exploitation of user-specified lists of words representing ‘affixes of interest’, which are to be detected and alerted on when occurring in labels. This would allow checks to be automated, e.g. on stop words, negation indicators, cardinality indicators, taboo words from the metalevel and many more.
- Check and alert automatically, when an imported
ontology differs in naming conventions from the active ontology.
- Check and alert on violations of class-subclass
name patterns, i.e. situations where the head noun of a subclass is
not adequately related via a taxonomic correspondence to the head noun
of its superclass. It was observed that, for multi-token labels, such
violations are nearly always caused either by a modeling error, such as
confusion of taxonomy with partonomy, or by a bad naming practice, e.g.
‘parsimonious’ omission of the true head noun; this is a
consequence of the set-theoretic nature of OWL ontologies.
- Check that qualifier terms (differentia) appear before the part being qualified (genus), e.g. ‘NMR_instrument’ instead of ‘instrument_for_NMR’.
- The allowed checks could be expanded to relations, e.g. for object properties, check that mutually inverse relations comply name-wise with each other, e.g. that a certain relation prefix implies a corresponding circumfix in its inverse form: for ‘has_X’ the inverse should be ‘is_X_of’. Using (genetic_information) ‘has_bearer’ (gene) and (gene) ‘is_bearer_of’ (genetic_information) would avoid the hard-to-find relation name 'inheres_in' as inverse of 'is_bearer_of'.
- Check for naming clashes in equal (synonymous) fields for different classes, e.g. if there are classes with equal labels but different IDs.
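Two of the checks proposed in this list, the ‘has_X’ / ‘is_X_of’ inverse naming pattern and the detection of equal labels under different IDs, can be sketched as follows. The functions and the sample data are hypothetical and only illustrate the idea; they are not part of OntoCheck, and the label comparison (case-insensitive, ignoring surrounding whitespace) is an assumption.

```python
import re
from collections import defaultdict


def expected_inverse(name):
    """Return the inverse name implied by the has_X / is_X_of pattern, else None."""
    match = re.fullmatch(r"has_(.+)", name)
    if match:
        return f"is_{match.group(1)}_of"
    match = re.fullmatch(r"is_(.+)_of", name)
    if match:
        return f"has_{match.group(1)}"
    return None


def check_inverse_pairs(pairs):
    """Return (property, declared inverse, expected inverse) for each mismatch."""
    problems = []
    for prop, declared in pairs:
        expected = expected_inverse(prop)
        if expected is not None and expected != declared:
            problems.append((prop, declared, expected))
    return problems


def find_label_clashes(labels_by_id):
    """Return normalized labels that are shared by more than one class ID."""
    ids_by_label = defaultdict(list)
    for class_id, label in labels_by_id.items():
        ids_by_label[label.strip().lower()].append(class_id)
    return {label: sorted(ids) for label, ids in ids_by_label.items() if len(ids) > 1}


print(check_inverse_pairs([("is_bearer_of", "inheres_in"), ("has_part", "is_part_of")]))
print(find_label_clashes({"EX_1": "Gene", "EX_2": "gene ", "EX_3": "protein"}))
```

Relation names that follow neither pattern, such as inheres_in, are ignored by the inverse check and never reported on their own.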
Compare Tab
- In the result list, both variant forms will be displayed, and the differences found could be highlighted.
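One way such highlighting could work is sketched below using Python's standard difflib module. The bracket notation and the function name are assumptions for illustration; the plugin itself may present differences in another way.

```python
from difflib import SequenceMatcher


def highlight_differences(first, second):
    """Return both strings with their differing segments wrapped in brackets."""
    out_first, out_second = [], []
    matcher = SequenceMatcher(None, first, second)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        seg_first, seg_second = first[i1:i2], second[j1:j2]
        if tag == "equal":
            out_first.append(seg_first)
            out_second.append(seg_second)
        else:
            # Mark only non-empty segments so pure insertions show on one side.
            if seg_first:
                out_first.append(f"[{seg_first}]")
            if seg_second:
                out_second.append(f"[{seg_second}]")
    return "".join(out_first), "".join(out_second)


print(highlight_differences("NMR_instrument", "NMR instrument"))
```

For two labels that differ only in their separator, only the separator characters end up in brackets, which makes convention differences easy to spot in a result list.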
Count Tab
- Detect abundant prefixes, infixes, suffixes and postfixes and list them according to frequency of occurrence. If a postfix occurs often in siblings, a recommendation could be issued to use this postfix in those labels throughout, or as the superclass label.
- Detect logical operators like AND, OR, NOT in names, e.g. BioTop has CarbohydrateMoleculeOrResidue and OligoOrPolymer. These could potentially be correlated with actual logical definitions and disjoints.
- Semantic analysis could probably guide expressiveness selection, e.g. words indicating cardinality requirements, such as minimal, maximal or exact, hint at certain OWL 2 profiles.
- Provided a reasoner is activated, as part of an expressivity analysis, besides counting the hub-node-to-isolate class ratio, entailment densities and OWL flavor element usage should also be counted. Such metrics would enable the judgment whether a certain semantics or OWL flavor was chosen because it is en vogue or because it is actually needed.
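The first two items of this list, counting frequent leading and trailing name tokens and spotting logical operators in names, can be sketched as follows. This is a simplified illustration under stated assumptions: names are split into word tokens at separators and CamelCase boundaries, only the first and last token of multi-token names are counted, and all function names are hypothetical.

```python
import re
from collections import Counter

LOGICAL_OPERATORS = {"and", "or", "not"}


def tokens(name):
    """Split a name into lower-case word tokens."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [token.lower() for token in re.split(r"[\s_\-]+", spaced) if token]


def affix_frequencies(names):
    """Count first and last tokens of multi-token names, most frequent first."""
    first_tokens, last_tokens = Counter(), Counter()
    for name in names:
        parts = tokens(name)
        if len(parts) > 1:
            first_tokens[parts[0]] += 1
            last_tokens[parts[-1]] += 1
    return first_tokens.most_common(), last_tokens.most_common()


def names_with_operators(names):
    """Map each name containing a logical operator token to the operators found."""
    found = {}
    for name in names:
        operators = [t for t in tokens(name) if t in LOGICAL_OPERATORS]
        if operators:
            found[name] = operators
    return found


NAMES = ["CarbohydrateMoleculeOrResidue", "OligoOrPolymer", "NMR_instrument", "MS_instrument"]
print(affix_frequencies(NAMES)[1])
print(names_with_operators(NAMES))
```

Because matching is done on whole tokens, a name such as Organism is not flagged merely for starting with "Or".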
Contact & Questions
OntoCheck is provided free of cost, is published under the terms of the GNU General Public License and is, of course, provided without any warranty.
If there are still features missing, please contact us with new ideas not mentioned above:
Schober at imbi dot universität-freiburg dot de
Acknowledgements
This work was initiated and supervised by Daniel Schober, implemented and improved by Ilinca Tudose and hosted by Stefan Schulz and Martin Boeker. It was partly supported by the Deutsche Forschungsgemeinschaft (DFG) grant JA 1904/2-1, SCHU 2515/1-1 GoodOD (Good Ontology Design). Thanks to Timothy Redmond for helping with the Protégé API.